1
World Data Center Climate: Terabyte Data Storage
in a Relational Database System
Michael Lautenschlager, Hannes Thiemann and
Frank Toussaint
ICSU World Data Center Climate
Model and Data / Max-Planck-Institute for
Meteorology, Hamburg, Germany
WS Spatiotemporal Databases for Geosciences,
Biomedical sciences and Physical sciences
Edinburgh, November 1st-2nd, 2005
WDCC Home: www.wdcc-climate.de / WDCC Contact:
data_at_dkrz.de
2
Content
  • Introduction of WDCC
  • CERA2 Data Model
  • Data Access
  • Connection to Mass Storage Archive
  • Summary
3
(No Transcript)
4
WDCC Content
October 2005: 580 Experiments / 68,000 Data Sets
Data from Earth System Modelling and Related
Observations
ERA40
Start: Approved in January 2003
Maintenance: Model and Data (MD/MPI-M) and German Climate
Computing Centre (DKRZ)
5
WDCC Access
6
WDCC Size
4.6 Billion BLOBs
7
WDCC DB Storage
Storage of global coverages per file or BLOB
  • all levels, all parameters, arbitrary time intervals
  • all levels, all parameters, 1 moment (6 by 6 hours)
  • 1 level, 1 parameter, 1 moment (1 BLOB = 1 global field)
How we get the grid data
  • Files from climate model
  • postprocessing step 1: homogenizing time and
    calculation of diagnostics
  • postprocessing step 2: isolation of levels and
    parameters and creation of BLOB table input
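
The finest granularity on this slide (1 level, 1 parameter, 1 moment per BLOB) can be illustrated with a short Python sketch. It is not WDCC's postprocessing code: the record layout, the name `split_into_blobs` and the packing of fields with `struct` are assumptions made for illustration, and real BLOBs would hold GRIB or NetCDF records.

```python
import struct
from collections import defaultdict

def split_into_blobs(records):
    """Group model output into one BLOB per (parameter, level, time step).

    `records` is an iterable of (parameter, level, time, values) tuples,
    where `values` is the flattened global field for that combination.
    Returns {(parameter, level): [(time, blob_bytes), ...]} with each list
    sorted by time, i.e. one time series per variable and level.
    """
    series = defaultdict(list)
    for parameter, level, time, values in records:
        # Hypothetical encoding: pack the field as big-endian 32-bit floats.
        blob = struct.pack(f">{len(values)}f", *values)
        series[(parameter, level)].append((time, blob))
    for rows in series.values():
        rows.sort(key=lambda row: row[0])
    return dict(series)

# Two time steps of a model file that holds all levels and parameters.
model_output = [
    ("temperature", 850, "2005-10-01T06", [281.5, 282.0, 280.25, 279.0]),
    ("temperature", 500, "2005-10-01T06", [255.0, 254.5, 256.0, 253.75]),
    ("temperature", 850, "2005-10-01T00", [280.0, 281.0, 279.5, 278.0]),
    ("temperature", 500, "2005-10-01T00", [254.0, 253.5, 255.0, 252.75]),
]
blob_series = split_into_blobs(model_output)
```

Each key of `blob_series` then corresponds to one time series that could be loaded into its own BLOB table.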
8
Data Model
9
CERA 1) Concept: Semantic Data Management
  • (I) Data catalogue and Unix files (pointer or
    BLOB-table-entry)
  • Enable search and identification of data
  • Allow for data access as they are (coarse
    granularity)
  • (II) Application-oriented data storage
  • Time series of individual variables are stored as
    BLOB entries in DB Tables (fine granularity)
  • Allow for fast and selective data access
  • Storage in standard data format (GRIB, NetCDF)
  • Allow for application of standard data processing
    routines (PINGOs, CDOs)

1) Climate and Environmental data Retrieval and
Archiving
10
WDCC Data Topology
Level 1 Interface: Metadata entries (XML,
ASCII), Data Files
Level 2 Interface: Separate files containing
BLOB table data in application-adapted
structure (time series of single variables)
A BLOB DB table corresponds to a scalable, virtual
file at the operating system level.
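
The idea that a BLOB DB table behaves like a scalable, virtual file can be sketched as follows. Python's built-in `sqlite3` is used purely as a stand-in for the production database, and the table name `blob_table`, its columns and the helper `read_time_range` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per time step of a single variable.
conn.execute(
    "CREATE TABLE blob_table ("
    " time_step TEXT PRIMARY KEY,"
    " field BLOB NOT NULL)"
)
conn.executemany(
    "INSERT INTO blob_table VALUES (?, ?)",
    [
        ("2005-10-01T00", b"field-00"),
        ("2005-10-01T06", b"field-06"),
        ("2005-10-01T12", b"field-12"),
        ("2005-10-01T18", b"field-18"),
    ],
)

def read_time_range(conn, start, end):
    """Return the concatenated fields for start <= time_step <= end.

    Only the selected rows are read, much like seeking into a file
    instead of reading it from the beginning.
    """
    rows = conn.execute(
        "SELECT field FROM blob_table"
        " WHERE time_step BETWEEN ? AND ? ORDER BY time_step",
        (start, end),
    )
    return b"".join(row[0] for row in rows)

selected = read_time_range(conn, "2005-10-01T06", "2005-10-01T12")
```

The table can keep growing by appending rows, while a reader still retrieves only the time steps it asks for.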
11
(No Transcript)
12
CERA Data Model
13
(No Transcript)
14
CERA Modules
  • 3 Modules
  • DATA_ACCESS: for automated data access (→ remote
    data access)
  • DATA_ORG: organization of grid data (→
    geo-references of grid points in BLOBs)
  • CODE: matching of (internal) model code numbers

15
Data Model Functions
  • The CERA2 data model
  • allows for data search according to discipline,
    keyword, variable, project, author, geographical
    region and time interval and for data retrieval.
  • allows for specification of data processing
    (aggregation and selection) without attaching the
    primary data.
  • is flexible with respect to local adaptations, to
    storage of different types of geo-referenced
    data, and to definition of data topologies
    (hierarchical, network, ...).
  • is open for cooperation and interchange with
    other database systems (e.g. FGDC metadata
    standard and ISO 19115 included).
  • But
  • is not the simplest data model for each single
    application.
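
A search by variable, geographical region and time interval, as listed above, might look like the following sketch. The `dataset` table and its columns are invented for illustration and are not the CERA2 schema; `sqlite3` stands in for the real database. Note that the query touches only metadata, never the primary data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (
    name TEXT PRIMARY KEY,
    project TEXT, variable TEXT,
    lat_min REAL, lat_max REAL,
    time_start TEXT, time_end TEXT
);
INSERT INTO dataset VALUES
 ('EXP1_T2M', 'PROJ_A', 'temperature', -90, 90, '1960-01-01', '1999-12-31'),
 ('EXP1_PRECIP', 'PROJ_A', 'precipitation', -90, 90, '1960-01-01', '1999-12-31'),
 ('EXP2_T2M', 'PROJ_B', 'temperature', 30, 70, '2000-01-01', '2004-12-31');
""")

def search(conn, variable, lat, start, end):
    """Find data sets for a variable that cover a latitude
    and overlap the time interval [start, end]."""
    rows = conn.execute(
        "SELECT name FROM dataset"
        " WHERE variable = ?"
        " AND lat_min <= ? AND lat_max >= ?"
        " AND time_start <= ? AND time_end >= ?"
        " ORDER BY name",
        (variable, lat, lat, end, start),
    )
    return [row[0] for row in rows]

hits = search(conn, "temperature", 53.5, "1990-01-01", "2001-12-31")
```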

16
Data Access
17
Web Access to WDCC
METADATA DATA
GUI display in applet JDBC
jblob-script Search for DS names JDBC jblob f
http - html-display - xml-download (ISO, DC, ...) download http URL http//
18
Interactive Catalogue Access
web browser
request URL
dynamic html pages
http html
Servlet / JSP
  • Catalogue access via WWW
  • URL parsed by JSP
  • integrated DB retrieval by JSP
  • response in standard html
  • efficient administration of detailed meta
    information
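
The request flow above (URL parsed, database queried, standard html returned) can be sketched in Python. This is a stand-in for the JSP layer, not its actual code: the `entry` table, the `entry` query parameter, the example URL and the function `render_entry` are all hypothetical.

```python
import html
import sqlite3
from urllib.parse import urlparse, parse_qs

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (name TEXT PRIMARY KEY, summary TEXT)")
conn.execute(
    "INSERT INTO entry VALUES ('EXP1_T2M', '2 m temperature <6-hourly>')"
)

def render_entry(conn, url):
    """Parse the request URL, look up the entry, and return an HTML page."""
    query = parse_qs(urlparse(url).query)
    name = query.get("entry", [""])[0]
    row = conn.execute(
        "SELECT name, summary FROM entry WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return "<html><body><p>No such entry.</p></body></html>"
    # Escape database text so it cannot inject markup into the page.
    title, summary = (html.escape(value) for value in row)
    return f"<html><body><h1>{title}</h1><p>{summary}</p></body></html>"

page = render_entry(conn, "http://example.org/catalogue?entry=EXP1_T2M")
```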

19
HTTP and JDBC Data Download
Data download via WWW
web browser
request html form
write to client disk
http file download
Servlet / JSP
  • request handled by JSP
  • return of binary file
  • standard client side jdbc retrieval
  • return of binary file

Data download via script/batch
progr. jblob
request jdbc
jdbc file download
write to client disk
20
XML Interface for http Metadata Output
user applications
request URL
raw xml
xhtml
ISO xml
xsl mapping
http XML
xsql query
DC xml
... various metadata formats
see wini.wdc-climate.de
  • Metadata access via WWW
  • xsql query to DB
  • xml output from DB
  • xsl mapping to any metadata format
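
The mapping step can be illustrated with a small Python sketch. It replaces the XSL stylesheet with a plain dictionary, so it shows the idea of mapping rather than the XSL mechanism itself. The raw record, its element names and the function `to_dublin_core` are hypothetical; only the Dublin Core namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical raw record, standing in for the XML returned by the database.
RAW_XML = (
    "<ROW><ENTRY_NAME>EXP1_T2M</ENTRY_NAME>"
    "<AUTHOR>Example Author</AUTHOR></ROW>"
)

# Mapping from raw element names to Dublin Core elements
# (the role the XSL stylesheet plays on the slide).
TO_DUBLIN_CORE = {"ENTRY_NAME": "title", "AUTHOR": "creator"}

def to_dublin_core(raw_xml):
    """Map a raw XML row to a Dublin Core style record."""
    ET.register_namespace("dc", DC_NS)
    raw = ET.fromstring(raw_xml)
    record = ET.Element("record")
    for column, dc_name in TO_DUBLIN_CORE.items():
        source = raw.find(column)
        if source is not None:
            target = ET.SubElement(record, f"{{{DC_NS}}}{dc_name}")
            target.text = source.text
    return ET.tostring(record, encoding="unicode")

dc_xml = to_dublin_core(RAW_XML)
```

Supporting another metadata format then means adding another mapping, while the database query stays unchanged.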

21
http Data Output
user applications
request URL
plain ASCII
html tables
binary objects
http plain, bin, html
Java Servlet
. . .
various data formats
  • Data access via WWW
  • URL parsed by servlet
  • query DB access by jdbc
  • response in any format
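
"Response in any format" can be pictured as one query result rendered by different formatters. The sketch below covers two of the formats named on the slide (plain ASCII and html tables); the function `render`, the format keys and the sample rows are assumptions for illustration.

```python
import html

def render(rows, header, fmt):
    """Render query rows as plain ASCII or as an HTML table."""
    if fmt == "plain":
        lines = ["\t".join(header)]
        lines += ["\t".join(str(value) for value in row) for row in rows]
        return "\n".join(lines)
    if fmt == "html":
        def tr(cells, tag):
            return "<tr>" + "".join(
                f"<{tag}>{html.escape(str(cell))}</{tag}>" for cell in cells
            ) + "</tr>"
        body = "".join(tr(row, "td") for row in rows)
        return f"<table>{tr(header, 'th')}{body}</table>"
    raise ValueError(f"unsupported format: {fmt}")

rows = [("2005-10-01T00", 280.0), ("2005-10-01T06", 281.5)]
plain = render(rows, ("time", "t2m"), "plain")
table = render(rows, ("time", "t2m"), "html")
```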

22
Connection to Mass Storage Archive
23
(No Transcript)
24
Oracle DBMS HSM
DXDB: Unitree client on DB machines
for communication between Oracle DB and tape
archive
25
Use of DXDB
  • DXDB is used for
  • Ordinary Oracle datafiles
  • Redo logs
  • Backup

26
(No Transcript)
27
Migout / Migin
  • Migout takes place after files haven't been
    modified for x minutes
  • Only one migout process per dxdb-filesystem
  • Migin takes place immediately after a file is
    requested. Only parts accessed are retrieved from
    the backend storage.
  • One migin process per requested file.
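
The migout and migin rules above can be mimicked with a toy model. It is a sketch of the policy only, not of DXDB or Unitree: the class `HsmFile`, the block granularity, the minute-based clock and the explicit `purge` step are all assumptions.

```python
class HsmFile:
    """Toy model of one datafile managed by an HSM system."""

    def __init__(self, n_blocks, last_modified):
        self.on_disk = set(range(n_blocks))  # block numbers held on disk
        self.on_tape = False                 # True once a tape copy exists
        self.last_modified = last_modified   # minutes on an arbitrary clock

    def migout(self, now, idle_minutes):
        """Copy the file to tape once it has been unmodified long enough."""
        if not self.on_tape and now - self.last_modified >= idle_minutes:
            self.on_tape = True
        return self.on_tape

    def purge(self):
        """Free disk space, but only if a tape copy exists."""
        if self.on_tape:
            self.on_disk.clear()

    def migin(self, first, last):
        """Stage only the requested blocks; return those fetched from tape."""
        fetched = set(range(first, last + 1)) - self.on_disk
        self.on_disk |= fetched
        return sorted(fetched)

datafile = HsmFile(n_blocks=8, last_modified=0)
too_early = datafile.migout(now=5, idle_minutes=10)   # still recently modified
migrated = datafile.migout(now=12, idle_minutes=10)   # idle long enough
datafile.purge()
fetched = datafile.migin(2, 4)                        # partial retrieval
```

After the partial migin only blocks 2 to 4 are back on disk, which is why an application does not have to wait for the whole file.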

28
Purging
29
Pro
  • It works
  • It's fast
  • Applications don't have to wait until files are
    completely restored from tapes.

30
Contra
  • It works, if the backend works
  • DXDB is not supported by Oracle
  • Oracle's officially supported backend
    requirements do not necessarily match
    requirements from other applications like HSM
    systems (i.e. connection to Unitree is not
    standardised).

31
Summary
  • Efficient handling of detailed metadata
  • easy and structured administration of > 60
    metadata tables
  • access support: Java Server Pages (JSP),
    Servlets, jdbc, xsql, including standard DB
    features (sql, views, triggers, ... )
  • Efficient handling of fine granularity data
  • random access to arbitrary time steps of single
    parameters
  • access support: Java Server Pages (JSP),
    Servlets, jdbc, including standard DB features
    (authorisation, ... )
  • transparent migration of bulk data to tape

32
The Winter TopTen Program identifies the world's
largest and most heavily used databases.
Email received on September 13th:
Congratulations on achieving Grand Prize award
winner status (1) in Database Size, Other, All
and TopTen Winner status Database Size, Other,
Linux; Workload, Other, Linux in Winter Corp.'s
2005 TopTen Program! ....... (1) Grand prizes
are awarded for first place winners in the All
Environments categories only.
WDCC's CERA DB has been identified as the largest
Linux DB.
http://www.wintercorp.com/VLDB/2005_TopTen_Survey/2005TopTenWinners.pdf