Archiving

WDC-MARE uses the information system PANGAEA for archiving. The technology is based on a three-tiered client/server architecture with a data set cache:

(1) A relational database serves as the backend and central archiving system, using Sybase as the RDB management software on a multiprocessor computer.

(2) The middleware is an application server with open server components for import, retrieval and editing. The components are encapsulated and communicate through standard interfaces.

(3) On the frontend, different clients provide access to the system. The graphical user interface (GUI) for data upload and metadata definitions is a proprietary application written in 4th Dimension (ACI). Middleware and frontend components follow a generic model to ensure flexible functionality and easy modification.
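The three tiers described above can be illustrated with a minimal sketch. All class and method names here are hypothetical and are not part of PANGAEA; the backend is a plain dictionary standing in for the relational database, and placing the data set cache in the middleware is an assumption made only for illustration.

```python
class Backend:
    """Stand-in for the relational database tier (here just a dict)."""

    def __init__(self, datasets):
        self._datasets = dict(datasets)
        self.reads = 0  # counts how often the backend is actually queried

    def load(self, dataset_id):
        self.reads += 1
        return self._datasets[dataset_id]


class ApplicationServer:
    """Stand-in for the middleware tier with a simple data set cache."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}  # assumption: cache keyed by data set id

    def retrieve(self, dataset_id):
        # Only go to the backend when the data set is not cached yet.
        if dataset_id not in self._cache:
            self._cache[dataset_id] = self._backend.load(dataset_id)
        return self._cache[dataset_id]


class Client:
    """Stand-in for a frontend client; it talks only to the middleware."""

    def __init__(self, server):
        self._server = server

    def show(self, dataset_id):
        return f"data set {dataset_id}: {self._server.retrieve(dataset_id)}"


backend = Backend({1: "sediment core measurements"})
client = Client(ApplicationServer(backend))
first = client.show(1)
second = client.show(1)  # served from the cache, no second backend read
```

The point of the sketch is the separation of concerns: the client never touches the backend directly, and repeated requests for the same data set are answered from the cache.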

The strictly normalized data model follows the steps by which analytical data are gathered in science. Up to 12 tables can be used to define metadata, which are related to the data during import. To ensure fast access, data sets are stored in the database cache in ISO/XML format, pre-configured as defined by the author. Each data value can be georeferenced in space and/or time, allowing individual extractions from the inventory. Data sets of up to a few million items are stored in two tables (numeric, text) organized through an index tree. Larger data sets or binary objects are stored as files, linked to their metadata description and georeference in PANGAEA. Two tape drive systems (SL8500) are used in combination with a hard disk array as file and backup archive. The system has a 1 Gbit/s Internet connection.
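As a rough illustration of georeferenced data values and the numeric/text split, the following sketch models single data items in memory. The field names, the `split_by_type` and `extract` helpers, and the sample values are all hypothetical; the actual PANGAEA schema and its index tree are not shown in this text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Union


@dataclass(frozen=True)
class DataValue:
    """One data item with an optional georeference (hypothetical layout)."""
    dataset_id: int
    parameter: str
    value: Union[float, str]
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    timestamp: Optional[datetime] = None


def split_by_type(values):
    """Route items into a numeric list and a text list, mirroring two tables."""
    numeric, text = [], []
    for v in values:
        (numeric if isinstance(v.value, (int, float)) else text).append(v)
    return numeric, text


def extract(values, lat_range=None, time_range=None):
    """Select items whose georeference falls inside the given ranges."""
    selected = []
    for v in values:
        if lat_range is not None:
            if v.latitude is None or not (lat_range[0] <= v.latitude <= lat_range[1]):
                continue
        if time_range is not None:
            if v.timestamp is None or not (time_range[0] <= v.timestamp <= time_range[1]):
                continue
        selected.append(v)
    return selected


# Made-up sample items, for illustration only.
values = [
    DataValue(1, "temperature", 2.5, 70.0, 5.0, datetime(2005, 7, 1)),
    DataValue(1, "lithology", "clay", -60.0, 0.0),
    DataValue(2, "salinity", 34.9, 72.5, 8.0, datetime(2006, 1, 15)),
]
numeric, text = split_by_type(values)
arctic = extract(values, lat_range=(66.0, 90.0))
```

Because every item carries its own georeference, a selection such as `arctic` can cut across data sets rather than being limited to whole data sets.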

Web services and clients are provided for data retrieval, download and harvesting.

A wiki documents the technology, functionality and use of PANGAEA.

Photo of one of the tape silos

Two StorageTek SL8500 libraries, each with 18 tape drives (LTO/3) and 3000 tapes (50-400 GB), are used by WDC-MARE/PANGAEA for backup and for archiving large data sets and binary objects, e.g. from seismics, bathymetry, image material or modelling. The total capacity is up to 1200 terabytes, mirrored with full redundancy in two buildings at different locations.
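The capacity figure can be checked with simple arithmetic. The sketch below assumes that the 1200 terabytes refer to one library filled entirely with 400 GB tapes, with the second library holding the mirror copy, and that decimal units (1 TB = 1000 GB) are meant; neither assumption is stated explicitly in the text.

```python
TAPES_PER_LIBRARY = 3000      # from the description above
MAX_TAPE_CAPACITY_GB = 400    # upper end of the 50-400 GB range
GB_PER_TB = 1000              # assumption: decimal units


def library_capacity_tb(tapes, tape_capacity_gb):
    """Upper bound on the capacity of one library, in terabytes."""
    return tapes * tape_capacity_gb / GB_PER_TB


capacity_tb = library_capacity_tb(TAPES_PER_LIBRARY, MAX_TAPE_CAPACITY_GB)
print(capacity_tb)  # 1200.0
```

Under these assumptions the usable capacity is that of a single library, since the second one stores the redundant copy.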