1 World Data Center Climate
Terabyte Data Storage in a Relational Database System
Michael Lautenschlager, Hannes Thiemann and Frank Toussaint
ICSU World Data Center Climate
Model and Data / Max-Planck-Institute for Meteorology
Hamburg, Germany
Workshop: Spatiotemporal Databases for Geosciences, Biomedical Sciences and Physical Sciences
Edinburgh, November 1st-2nd, 2005
WDCC Home: www.wdcc-climate.de
WDCC Contact: data_at_dkrz.de
2 Content
- Introduction of WDCC
- CERA2 Data Model
- Data Access
- Connection to Mass Storage Archive
- Summary
4 WDCC Content
October 2005: 580 experiments / 68,000 data sets
Data from Earth system modelling and related observations
ERA40
Start: approved in January 2003
Maintenance: Model and Data (MD/MPI-M) and German Climate Computing Centre (DKRZ)
5 WDCC Access
6 WDCC Size
4.6 billion BLOBs
7 WDCC DB Storage
Storage of global coverages per file or BLOB:
- all levels, all parameters, arbitrary time intervals
- all levels, all parameters, 1 moment (6 by 6 hours)
- 1 level, 1 parameter, 1 moment (1 BLOB = 1 global field)
How we get the grid data:
- Files from the climate model
- Postprocessing step 1: homogenizing time and calculation of diagnostics
- Postprocessing step 2: isolation of levels and parameters, and creation of BLOB table input
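The finest granularity above (one BLOB per level, parameter and moment) can be illustrated with a minimal sketch. It uses SQLite in place of the production database and raw packed floats in place of GRIB records. The table name `blob_temp_850hpa`, the grid size and the synthetic values are all hypothetical, chosen only to show the layout.

```python
import sqlite3
import struct

# Hypothetical grid size; real model grids are much larger.
NLAT, NLON = 4, 8


def pack_field(values):
    """Serialize one global field (flat list of floats) into a BLOB."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_field(blob):
    """Inverse of pack_field: BLOB bytes back to a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


conn = sqlite3.connect(":memory:")
# Assumed layout: one table per (parameter, level) time series,
# one row per moment, one BLOB per global field.
conn.execute(
    "CREATE TABLE blob_temp_850hpa (time_step TEXT PRIMARY KEY, field BLOB)"
)

# Four 6-hourly moments of one day, each a synthetic global field.
for hour in (0, 6, 12, 18):
    field = [float(hour + i) for i in range(NLAT * NLON)]
    conn.execute(
        "INSERT INTO blob_temp_850hpa VALUES (?, ?)",
        (f"2005-10-01T{hour:02d}:00", pack_field(field)),
    )

# Selective access: fetch a single moment without reading the others.
row = conn.execute(
    "SELECT field FROM blob_temp_850hpa WHERE time_step = ?",
    ("2005-10-01T12:00",),
).fetchone()
field_12 = unpack_field(row[0])
```

Keying each row by its time step is what makes a single moment addressable without scanning a whole file.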
8 Data Model
9 CERA 1) Concept: Semantic Data Management
- (I) Data catalogue and Unix files (pointer or BLOB-table entry)
  - Enable search and identification of data
  - Allow access to the data as they are (coarse granularity)
- (II) Application-oriented data storage
  - Time series of individual variables are stored as BLOB entries in DB tables (fine granularity)
  - Allow for fast and selective data access
  - Storage in standard data formats (GRIB, NetCDF)
  - Allow for application of standard data processing routines (PINGOs, CDOs)
1) Climate and Environmental data Retrieval and Archiving
10 WDCC Data Topology
Level 1 interface: metadata entries (XML, ASCII) and data files
Level 2 interface: separate files containing BLOB table data in an application-adapted structure (time series of single variables)
A BLOB DB table corresponds to a scalable, virtual file at the operating system level.
12 CERA Data Model
14 CERA Modules
- 3 modules
  - DATA_ACCESS: for automated data access (-> remote data access)
  - DATA_ORG: organization of grid data (-> geo-references of grid points in BLOBs)
  - CODE: matching of (internal) model code numbers
15 Data Model Functions
- The CERA2 data model
  - allows for data search according to discipline, keyword, variable, project, author, geographical region and time interval, and for data retrieval.
  - allows for specification of data processing (aggregation and selection) without attaching the primary data.
  - is flexible with respect to local adaptations, to storage of different types of geo-referenced data, and to definition of data topologies (hierarchical, network, ...).
  - is open for cooperation and interchange with other database systems (e.g. FGDC metadata standard and ISO 19115 included).
- But
  - it is not the simplest data model for every single application.
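As a rough illustration of the catalogue search described above, the following sketch filters data sets by variable, keyword and time interval. The two-table schema (`dataset`, `keyword`), the column names and the sample rows are hypothetical and far simpler than the CERA2 model. SQLite stands in for the production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified catalogue schema.
conn.executescript("""
CREATE TABLE dataset (
    name TEXT PRIMARY KEY, project TEXT, variable TEXT,
    start_date TEXT, end_date TEXT
);
CREATE TABLE keyword (dataset_name TEXT, word TEXT);
""")
conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?, ?)", [
    ("DS_A", "PROJ_X", "temperature", "1960-01-01", "1999-12-31"),
    ("DS_B", "PROJ_X", "precipitation", "1960-01-01", "1999-12-31"),
    ("DS_C", "PROJ_Y", "temperature", "2001-01-01", "2100-12-31"),
])
conn.executemany("INSERT INTO keyword VALUES (?, ?)", [
    ("DS_A", "reanalysis"), ("DS_B", "reanalysis"), ("DS_C", "scenario"),
])


def search(variable, keyword, t0, t1):
    """Return names of data sets matching variable and keyword whose
    coverage overlaps the interval [t0, t1] (ISO date strings)."""
    rows = conn.execute(
        """SELECT d.name FROM dataset d
           JOIN keyword k ON k.dataset_name = d.name
           WHERE d.variable = ? AND k.word = ?
             AND d.start_date <= ? AND d.end_date >= ?
           ORDER BY d.name""",
        (variable, keyword, t1, t0),
    ).fetchall()
    return [r[0] for r in rows]


hits = search("temperature", "reanalysis", "1970-01-01", "1980-12-31")
```

The search touches only metadata tables, so data sets can be identified before any primary data is read.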
16 Data Access
17 Web Access to WDCC
Access paths for METADATA and DATA:
- GUI: display in applet (JDBC)
- jblob script: search for DS names (JDBC); jblob f
- http: html display, xml download (ISO, DC, ...); download via http URL http//
18 Interactive Catalogue Access
Diagram labels: web browser; request URL; dynamic html pages; http / html; Servlet / JSP
- Catalogue access via WWW
- URL parsed by JSP
- integrated DB retrieval by JSP
- response in standard html
- efficient administration of detailed meta information
19 HTTP and JDBC Data Download
Data download via WWW
Diagram labels: web browser; request (html form); http file download; write to client disk; Servlet / JSP
- request handled by JSP
- return of binary file
Data download via script/batch
Diagram labels: progr. jblob; request (jdbc); jdbc file download; write to client disk
- standard client-side jdbc retrieval
- return of binary file
20 XML Interface for http Metadata Output
Diagram labels: user applications; request URL; http / XML; xsql query; raw xml; xsl mapping; xhtml; ISO xml; DC xml; ... various metadata formats
see wini.wdc-climate.de
- Metadata access via WWW
- xsql query to DB
- xml output from DB
- xsl mapping to any metadata format
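The mapping step from raw XML to a target metadata format can be sketched as follows. This is not XSL: Python's standard library has no XSLT processor, so a plain ElementTree transformation is swapped in to show the same idea. The raw element names (`ENTRY_NAME`, `AUTHOR`, `SUMMARY`), the sample record and the mapping table are hypothetical. Only the Dublin Core namespace URI is a real identifier.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical raw record, standing in for the XML returned by a DB query.
RAW = """<row>
  <ENTRY_NAME>EXAMPLE_DS</ENTRY_NAME>
  <AUTHOR>Example Author</AUTHOR>
  <SUMMARY>Example summary text</SUMMARY>
</row>"""

# Hypothetical mapping from raw column names to Dublin Core elements.
FIELD_MAP = {
    "ENTRY_NAME": "title",
    "AUTHOR": "creator",
    "SUMMARY": "description",
}


def to_dublin_core(raw_xml):
    """Map a raw XML row to a Dublin Core style record."""
    src = ET.fromstring(raw_xml)
    out = ET.Element("metadata")
    for child in src:
        target = FIELD_MAP.get(child.tag)
        if target is not None:
            el = ET.SubElement(out, f"{{{DC_NS}}}{target}")
            el.text = child.text
    return out


dc_record = to_dublin_core(RAW)
dc_xml = ET.tostring(dc_record, encoding="unicode")
```

Supporting a further output format would then mean adding another mapping table, while the database query stays unchanged.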
21 http Data Output
Diagram labels: user applications; request URL; http (plain, bin, html); Java Servlet; plain ASCII; html tables; binary objects; ... various data formats
- Data access via WWW
- URL parsed by servlet
- DB query and access by jdbc
- response in any format
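The request flow listed above (parse the URL, query the data, respond in the requested format) can be sketched in a few lines. The WDCC implementation is a Java servlet; the Python below only illustrates the dispatch logic. The query parameters `var` and `format`, the example URL and the in-memory `SAMPLE` data are hypothetical.

```python
import html
import struct
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory stand-in for the values a DB query would return.
SAMPLE = {"temp": [271.5, 272.0, 273.25]}


def render(url):
    """Parse the request URL and return (content_type, body_bytes)."""
    query = parse_qs(urlparse(url).query)
    var = query.get("var", [""])[0]
    fmt = query.get("format", ["plain"])[0]
    values = SAMPLE.get(var)
    if values is None:
        raise KeyError(f"unknown variable: {var!r}")
    if fmt == "plain":
        # Plain ASCII: one value per line.
        body = "\n".join(str(v) for v in values)
        return "text/plain", body.encode("ascii")
    if fmt == "html":
        # HTML table: one row per value.
        rows = "".join(
            f"<tr><td>{html.escape(str(v))}</td></tr>" for v in values
        )
        return "text/html", f"<table>{rows}</table>".encode("ascii")
    if fmt == "bin":
        # Binary object: little-endian 32-bit floats.
        packed = struct.pack(f"<{len(values)}f", *values)
        return "application/octet-stream", packed
    raise ValueError(f"unsupported format: {fmt!r}")


ctype, body = render("http://example.org/data?var=temp&format=plain")
```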
22 Connection to Mass Storage Archive
24 Oracle DBMS and HSM
DXDB: Unitree client on the DB machines for communication between the Oracle DB and the tape archive
25 Use of DXDB
- DXDB is used for
  - Ordinary Oracle datafiles
  - Redo logs
  - Backup
27 Migout / Migin
- Migout takes place after files haven't been modified for x minutes
- Only one migout process per DXDB filesystem
- Migin takes place immediately after a file is requested. Only the parts accessed are retrieved from the backend storage.
- One migin process per requested file.
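The two rules above can be made concrete with a small model. It is not DXDB code and uses no DXDB interface. The block size, the file names and both function names are hypothetical, and block-wise staging is only one assumed way in which "only parts accessed" could be realised.

```python
BLOCK = 1024 * 1024  # hypothetical transfer unit, not a DXDB parameter


def migout_candidates(last_modified, now, idle_minutes):
    """Files whose last modification is at least idle_minutes old.

    last_modified maps file name to modification time in seconds.
    """
    limit = idle_minutes * 60
    return sorted(
        name for name, t in last_modified.items() if now - t >= limit
    )


def migin_blocks(offset, length):
    """Block indices that must be staged from the backend to serve a
    read of length bytes starting at offset (length > 0)."""
    first = offset // BLOCK
    last = (offset + length - 1) // BLOCK
    return list(range(first, last + 1))


mtimes = {"a.dbf": 0, "b.dbf": 3000, "c.dbf": 3500}  # seconds
ready = migout_candidates(mtimes, now=3600, idle_minutes=10)
blocks = migin_blocks(offset=3 * BLOCK + 10, length=BLOCK)
```

In this model a read that straddles a block boundary stages two blocks, while the rest of the file stays on tape.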
28 Purging
29 Pro
- It works
- It's fast
- Applications don't have to wait until files are completely restored from tapes.
30 Contra
- If the backend works
- DXDB is not supported by Oracle
- Oracle's officially supported backend requirements do not necessarily match requirements from other applications like HSM systems (i.e. the connection to Unitree is not standardised).
31 Summary
- Efficient handling of detailed metadata
  - easy and structured administration of > 60 metadata tables
  - access support: Java Server Pages (JSP), Servlets, jdbc, xsql, including standard DB features (sql, views, triggers, ...)
- Efficient handling of fine granularity data
  - random access to arbitrary time steps of single parameters
  - access support: Java Server Pages (JSP), Servlets, jdbc, including standard DB features (authorisation, ...)
  - transparent migration of bulk data to tape
32 The Winter TopTen Program identifies the world's largest and most heavily used databases.
Email received on September 13th:
Congratulations on achieving Grand Prize award winner status (1) in Database Size, Other, All and TopTen Winner status in Database Size, Other, Linux; Workload, Other, Linux in Winter Corp.'s 2005 TopTen Program! ... (1) Grand prizes are awarded for first place winners in the All Environments categories only.
WDCC's CERA DB has been identified as the largest Linux DB.
http://www.wintercorp.com/VLDB/2005_TopTen_Survey/2005TopTenWinners.pdf