Database File Systems in Support of eScience

1
  • Database File Systems in Support of eScience

Philip A. Adams, LLNL/National Ignition Facility
John C. Hax, Oracle Corporation
2
Science: A product of data analysis
  • "Science does not result from the launch of a
    mission or the collection of data. Rather,
    science only occurs through the analysis and
    understanding of that data."
  • Philosophy of the NASA Science Mission
    Directorate (SMD)

3
Questions to Ask
  • Are we building IT Systems that support Research
    and Analysis or Infrastructure that supports the
    collection of data?

4
Scientific Computing History
Scientific Systems
  • Scientific (minimal data shared)
  • Raw Data
  • Decentralized/Desktop Management
  • Open source software
  • Low quality of support/service
  • Best Effort
  • Mission critical operations
  • Primarily file based (HDF5, Lustre)
  • Millions of Files
  • Write once, read many
  • Background processing
  • Pipelines
  • Computationally intensive applications
  • Long running transactions
  • Output of Large Data Sets
  • Single application profile

vs.
Commercial Relational Databases
  • Enterprise (all data shared)
  • Metadata
  • Centralized management
  • Industrial strength software
  • High quality of support/service
  • SLA guarantees
  • Mission critical operations
  • Database files
  • Read and Update
  • Enforced data integrity
  • Interactive processing
  • Interactive workflows
  • Transaction-intensive applications
  • Short running transactions (< 8 hours)
  • Output of Individual Rows
  • Mixed application profile

5
Filesystems and Legacy Databases: The Gap
Database Benefits
  • Superior query/search capability over filesystems: SQL standard
  • Easy manipulation of data: functions, PL/SQL, Java, C, PHP, Perl
  • Low latency, interactive data access suited for application
    access
  • Provides a structured way of storing data and ensuring data
    integrity: tables/constraints
  • Superior backup and recovery capabilities: RMAN, redo/archive
    logging, block and point-in-time recovery, block-level
    corruption detection
  • Institutional resources
vs.
Filesystem Benefits
  • Provide maximum scalability to meet data volume and ingestion
    requirements
  • HDF5
  • GFS (Google Filesystem)
  • Lustre
  • Ubiquity of accessing filesystems
  • Number of protocols
  • NFS, SMB, CIFS and FTP
  • Able to access the data right from the OS
  • Windows, Mac, Linux, Solaris, HP-UX
  • Application programming interfaces support native access
  • file open (fopen), file close (fclose)
  • importing the java.io package
  • ifstream/ofstream C++ file I/O classes

6
Data Challenges
  • Physical Limitations
  • I/O Intensive - limitations on max IOPS
  • Network speeds - time to ship data to compute
    nodes
  • Multiple Data Silos
  • Governance issues
  • Pedigree of the data
  • Multiple access policies to get to the data
  • Duplicate data stored in each silo
  • Need to scale disparate systems as data grows
  • Increased effort required for Scientists,
    Developers, Administrators
  • Correlating the data across data silos
  • Coordinated backup and recovery plan
  • Multiple Data Aggregation Efforts

7
The Result: The Split Architecture, a step in
the wrong direction
  • These drawbacks include but are not limited to:
  • Data curation
  • Security
  • Availability
  • Recoverability
  • Manageability
  • Because no common database and filesystem access
    protocol was available, the burden shifted to the
    application developers and scientific researchers
    to make sense of the two silos of information

8
How much of an issue is this?
  • Level 0 (Raw) data is typically enriched with
    data from other sources.
  • What happens when/if a diagnostic is found to
    have incorrect calibration data?
  • Without strict relationships, this could be a
    nightmare. It may be easier to rerun analysis to
    reproduce the Level 1, 2 and 3 data. However, an
    unknown quantity of Level 4 content has been
    generated from this data and is stored on many
    researchers' workstations and file shares.

Lack of pedigree in data analysis can result in
instrument/machine damage, increased financial
costs, or embarrassment to scientific researchers
who rely on the data
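
The calibration scenario above can be sketched as a lineage lookup. This is a minimal illustration, assuming each derived product records which inputs it was built from. The product names and the derived_from mapping are hypothetical and are not taken from any NIF system.

```python
from collections import deque

# Hypothetical lineage records: each product maps to the inputs it was derived from.
derived_from = {
    "level1/shot42": ["level0/shot42", "calibration/diag7"],
    "level2/shot42": ["level1/shot42"],
    "level3/shot42": ["level2/shot42"],
    "level4/paper_fig3": ["level3/shot42"],
    "level1/shot43": ["level0/shot43", "calibration/diag9"],
}


def affected_products(bad_input, lineage):
    """Return every product that depends, directly or indirectly, on bad_input."""
    # Invert the mapping so we can walk from an input to the products that consume it.
    consumers = {}
    for product, inputs in lineage.items():
        for source in inputs:
            consumers.setdefault(source, []).append(product)

    # Breadth-first walk over the consumers of the bad input.
    affected = set()
    queue = deque([bad_input])
    while queue:
        current = queue.popleft()
        for product in consumers.get(current, []):
            if product not in affected:
                affected.add(product)
                queue.append(product)
    return sorted(affected)


print(affected_products("calibration/diag7", derived_from))
```

When relationships like these are stored alongside the files, Level 4 content that would otherwise be scattered across workstations can be found with a query instead of by hand.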
9
Future of Scientific Computing and Analysis
  • Data Intensive
  • Collaborative
  • Data Intensive Collaborative Science
10
Data Intensive Collaborative Science
  • Drivers of collaboration: Cost, Complexity, Knowledge Base,
    Interdependence
  • Enablers of collaboration: The Web, Network Capacity,
    Clustering/Grid Technologies, Moore's Law, Standards
11
What's driving the data volumes?
  • Better and more diverse instrumentation
  • Flexible optics
  • Coordinated multi-instrument observatories
  • Increased Precision
  • Genomics
  • Diverse types of data generated: SQL/Scalar, XML,
    Image, Monte Carlo simulations, Audio/Video,
    telemetry, and spectrometers

12
Database Filesystems
  • Bridge the Gap between Filesystems and Relational
    Database Systems
  • Maintain Filesystem Performance
  • Leverage multiple access methods
  • Single Security Mechanism
  • Unified Administrative Tools
  • Data Pedigree
  • Unified Architecture and Skill sets
  • Leverage Institutional Resources for IT
  • Enabling Collaboration around Data
  • Optimized for Data Access

13
Pedigree with a database filesystem
14
Modern databases have much to offer in the realm
of data analysis
  • RDF/OWL can allow semantic searching of data
  • Predictive Analytics
  • Spatial Data Analysis
  • Text Mining of Unstructured Content

15
Some of the native data mining techniques and
algorithms available
  • Classification: Logistic Regression, Naive Bayes,
    Support Vector Machine, Decision Tree
  • Regression: Multiple Regression
  • Attribute Importance: Minimum Description Length
  • Anomaly Detection: One-Class Support Vector Machine
  • Clustering: Enhanced K-Means, Orthogonal Partitioning
    Clustering
  • Association: Apriori
  • Feature Extraction: Non-negative Matrix Factorization

16
Key Components of SecureFiles Architecture
  • Delta Update
  • Write Gather Cache
  • Transformation Management
  • Inode Management
  • Space management
  • I/O Management

Finally, the database can accept both structured
and unstructured data in an efficient manner
17
National Ignition Facility
18
UCRL-PRES-236394
National Ignition Facility and 11g SecureFiles
NLIT 2009
Philip A. Adams, Sr. Systems Architect
National Ignition Facility, Lawrence Livermore National Laboratory
June 1-3, 2009
This work performed under the auspices of the
U.S. Department of Energy by Lawrence Livermore
National Laboratory under Contract
DE-AC52-07NA27344
19
Overview of the National Ignition Facility
  • The National Ignition Facility (NIF) is known as
    the world's largest and most energetic laser
  • When fully operational, its 192 beams will
    converge 1.8 MJ of laser energy onto a single
    target to achieve thermonuclear ignition
  • NIF will enable experiments that produce
    temperatures and densities like those in the Sun
    or in an exploding nuclear weapon

NIF-1107-14129.ppt
Oracle, 11/12/07
19
20
Overview of the National Ignition Facility
  • The 192 laser beams of NIF will generate:
  • A peak power of 500 trillion watts, 1000 times
    the electric generating power of the United
    States
  • A pulse energy of 1.8 million joules of
    ultraviolet light
  • A pulse length of three to twenty billionths of a
    second
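
These figures are related by power = energy / time. As a rough consistency check, the sketch below computes how long a pulse would have to be for 1.8 MJ to be delivered at a constant 500 trillion watts. It assumes a flat pulse shape, which is a simplification and not something the slides state.

```python
PULSE_ENERGY_J = 1.8e6   # 1.8 million joules, from the slide
PEAK_POWER_W = 500e12    # 500 trillion watts, from the slide


def duration_for_power(energy_j, power_w):
    """Time in seconds needed to deliver energy_j at a constant power_w."""
    return energy_j / power_w


duration_ns = duration_for_power(PULSE_ENERGY_J, PEAK_POWER_W) * 1e9
print(f"{duration_ns:.1f} ns")   # prints "3.6 ns"
```

The result falls inside the quoted pulse length of three to twenty billionths of a second.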

21
The Optics make NIF work
  • Optical components
  • 7500 large optics including 3072 laser glass
    slabs as well as large lenses, mirrors, and
    crystals
  • More than 15,000 small optical components
  • Precision optics
  • Total area of 33,000 square feet (3/4 of an acre)
  • More than 40 times the total precision optical
    surface in the world's largest telescope (Keck
    Observatory, Hawaii)

22
Example of Optic Damage
3 ns
2 µm
23
On high quality optical surfaces, initiated damage
sites are very small
24
Performance Gains found in NIF with 11g
SecureFiles
  • Test Environment
  • Database Server
  • HP Blade Server w/ 4-way AMD Opteron CPUs
  • RHEL 4 32-bit kernel
  • 11g Oracle Database 32-bit version
  • Single Instance
  • ASM
  • Dual Port Fibre Channel Mezzanine Card (2 Gbit)
  • Application Server
  • Dell PowerEdge 2650 w/ 2-way Intel Xeon CPUs
  • RHEL 4 32-bit kernel
  • 10g Oracle Application Server
  • 10g Oracle CMSDK (Content Management Software
    Development Toolkit)

25
Performance Gains found in NIF with 11g
SecureFiles
  • Test Environment
  • SAN Storage
  • 3PAR S400
  • Production Environment
  • 11g RAC Environment
  • 10g CMSDK Clustered Application Server
    Environment

26
Measure the throughput of the environment
  • Perform dd tests to the disks to establish the
    theoretical max
  • WRITE
  • > dd if=/dev/zero of=/dev/raw/raw6 count=10000 bs=1M
  • READ
  • > dd if=/dev/raw/raw6 of=/dev/null count=10000 bs=1M
  • MONITOR
  • > iostat -xdk 3 100
  • We saw 180 MB/sec Read/Write throughput to the
    disks

Warning: Be sure not to perform dd write tests on
your ASM configured storage or else you'll damage
it
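
dd reports the bytes copied and the elapsed time, and throughput is their ratio. The sketch below applies the same idea to an ordinary scratch file instead of a raw device, so it can be run without touching ASM storage. It is an illustration only: the writes pass through the filesystem, so the figure will not match a raw-device dd test. The function name and default sizes are assumptions.

```python
import os
import tempfile
import time


def measure_write_throughput(total_mb=16, block_mb=1):
    """Write total_mb of zeros in block_mb chunks to a temp file; return MB/sec."""
    block = bytes(block_mb * 1024 * 1024)
    with tempfile.NamedTemporaryFile(delete=False) as scratch:
        path = scratch.name
        start = time.perf_counter()
        for _ in range(total_mb // block_mb):
            scratch.write(block)
        scratch.flush()
        os.fsync(scratch.fileno())  # push data to storage before stopping the clock
        elapsed = time.perf_counter() - start
    written_mb = os.path.getsize(path) / (1024 * 1024)
    os.remove(path)  # clean up the scratch file
    return written_mb / elapsed


print(f"{measure_write_throughput():.0f} MB/sec")
```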
27
Create a few test tables
  • Create a test table for BasicFiles and a test
    table for SecureFiles

BasicFile Example
  CREATE TABLE "FOO_BASICFILE_TABLE" (
    "PKEY" NUMBER(4) NOT NULL,
    "DOCUMENT" BLOB
  )
  TABLESPACE "LOB_DEMO"
  LOB ("DOCUMENT") STORE AS BASICFILE (TABLESPACE "LOB_DEMO");

SecureFiles Example
  CREATE TABLE "FOO_SECUREFILE_TABLE" (
    "PKEY" NUMBER(4) NOT NULL,
    "DOCUMENT" BLOB
  )
  TABLESPACE "LOB_DEMO"
  LOB ("DOCUMENT") STORE AS SECUREFILE (TABLESPACE "LOB_DEMO");
28
Throughput Results of Table Tests
  • Speed tests from database server (Oracle 11.1.0
    DB, using Oracle jdk 1.5.0_11 in OH/jdk, using
    ojdbc5.jar)
  • Inserting twenty 32MB image files per test

29
SecureFile vs. BasicFile Server Results
30
Measure the throughput of the network
  • Used a tool called iperf, available at
    http://sourceforge.net/projects/iperf/
  • On Server run
  • ./iperf -s -f M
  • On Client run
  • ./iperf -f M -c blackstone
    ------------------------------------------------------------
    Client connecting to blackstone.llnl.gov, TCP port 5001
    TCP window size: 0.06 MByte (default)
    ------------------------------------------------------------
    [  5] local XXX.XXX.XXX.XXX port 58590 connected with XXX.XXX.XXX.XXX port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  5]  0.0-10.0 sec  1120 MBytes  112 MBytes/sec
31
Throughput Results of Client-Server Tests
  • Speed tests from database server (Oracle 10.1.2
    Client, using jdk 1.5.0_11 and ojdbc14.jar)
  • Inserting twenty 32MB image files per test

32
SecureFile vs. BasicFile Client Results
33
SecureFile Performance Benefits
  • During our testing, we've seen a 2-20 times
    increase in performance using SecureFiles over
    traditional BasicFiles
  • We've seen performance with SecureFiles that is
    equivalent to or better than writing the same
    file to our NFS-mounted NetApp

34
Database Tuning to optimize for SecureFiles
  • Create a separate tablespace for your LOB data
  • Use uniform extents; 1M seems best overall
  • Tried 32M/64M extents with no performance
    increase; your mileage may vary
  • Enable Automatic Segment Space Management on the
    tablespace
  • Create large enough redo log files
  • We used 200M to 1024M to reduce log file switches
    during heavy loads
35
Database Tuning to optimize for SecureFiles
  • Utilize the AWR Snapshots before and after a
    SecureFile load and note the wait conditions
  • SQL> EXECUTE dbms_workload_repository.create_snapshot();
  • PL/SQL procedure successfully completed.
  • Run the AWR report
  • $ORACLE_HOME/rdbms/admin/awrrpt.sql

36
Conclusion
  • The ultimate goal of science is to create new
    knowledge and new discoveries.
  • Database Filesystems have a number of features
    which can benefit the scientific community and
    ease the burden of pedigree, data management, and
    analysis
  • Using a database filesystem will enable data
    intensive collaborative science.
  • As new discoveries are made and data volumes
    increase, it is imperative to have a robust
    database system that is capable not only of
    managing the pedigree of that data, but also of
    serving as a knowledge repository for the future.

37
For More Information
http://search.oracle.com
SecureFiles
or http://www.oracle.com/