Natasha Balac and Roman Olschanowsky - PowerPoint PPT Presentation

Provided by: sdsc
Transcript and Presenter's Notes

1
  • Natasha Balac and Roman Olschanowsky
  • Data Application Group
  • User Services and Development Department
  • San Diego Supercomputer Center

http://datacentral.sdsc.edu
2
What is Data Central?
  • Data Central makes it possible to store, manage,
    analyze, share and publish data collections,
    thereby enabling access and collaboration in the
    broader scientific community
  • Eligible researchers can request a data
    allocation from SDSC (with or without a compute
    allocation) that permits expanded access to
    SDSC's Data Central facilities and services for
    data collections management, data analysis and
    data mining

3
Why SDSC Data Central?
  • Today's scientists and engineers are increasingly
    dependent on valued community data collections
    and databases
  • SDSC has experienced increasing demand from the
    domain communities for collaborations on data
    management, including:
    • publishing data in digital libraries
    • sharing data through the Web and data grids
    • creating, optimizing, and porting large-scale
      databases
    • analyzing and data mining large-scale data

4
A Deluge of Data
  • Today, data comes from everywhere:
    • Scientific instruments
    • Experiments
    • Sensors and sensor nets
    • New devices
  • And is used by everyone:
    • Scientists
    • Consumers
    • Educators
    • General public
  • IT environments must support unprecedented
    diversity, globalization, integration, scale, and
    use

[Image labels: Life Sciences; Preservation and Archiving; Astronomy]
5
What does SDSC Data Central offer?
  • SDSC has been actively collaborating with many
    researchers and national-scale projects on their
    data management efforts
  • We offer expertise and resources for:
    • Public Data Collections and Database Hosting
    • Long-term storage (tape and disk)
    • Remote data management and access (SRB)
    • Data Analysis and Data Mining
    • Professional, qualified 24/7 support

6
SDSC Data Resources
  • 540 TB Storage-area Network (SAN)
  • 1 PB On-line disk
  • 6 PB StorageTek tape library capacity
  • DB2, Oracle, MySQL
  • Storage Resource Broker
  • GPFS-WAN with 226 TB

[Image captions: Petabyte-scale high-performance tape storage system; High-performance SATA SAN disk storage system]
7
Data Resources Available through DataCentral
  • Disk
    • 400 Terabytes SATA SAN, Fibre Channel Attached
    • Enables multiple high-end computers, using a
      range of operating systems, to share data rapidly
      and seamlessly
    • Growing data storage capabilities are integrated
      with high-end computational resources such as
      SDSC's 15.6 Teraflop DataStar IBM supercomputer
      and parallel I/O
    • Accessible: Mounted, Web, SRB, GridFTP
  • Tape
    • 6 Petabyte Capacity, High Speed Robotic Silos
    • Disk cache front end, transparently mounted via
      Sun SAM-QFS file system
    • Accessible: Mounted, Web, SRB, GridFTP

8
Data Resources Available through DataCentral
  • Databases
    • DB2, Oracle, MySQL servers
    • High Availability, High Performance
    • Accessible: Standard RDBMS connectivity, client
      software installed on most systems
  • Software
    • Storage Resource Broker (SRB): State-of-the-art
      data management and collaboration software for
      grid file access
    • Powerful software applications covering a range
      of disciplines including bioscience, geoscience,
      astronomy, chemistry, medicine, etc.
    • A wide array of data analysis, mining and
      visualization tools

9
Data Resources Available through DataCentral
  • Expertise in:
    • high-performance large data management
    • data migration
    • database application tuning, porting and
      optimization
    • SQL query tuning
    • portal creation and collection publication
    • schema design
    • database selection (Oracle, DB2, MySQL,
      PostgreSQL)
    • data migration, upload and sharing through the
      grid
    • data analysis and mining

10
Data Resources Available through DataCentral
Quality User Support
  • Consulting
    • Phone, Web, e-mail
    • M-F, 9 a.m. - 5 p.m.
  • 24x7 Help Desk/Operational Support
  • Training
  • Documentation
  • User Portals
  • Targeted Optimization and Porting (TOP)
  • Strategic Applications Collaborations (SAC)
  • Strategic Community Collaborations (SCC)

11
Strategic Collaborations
  • Strategic Data Applications Collaborations (SDAC)
  • SDSC expert staff paired with domain scientists
    for projects lasting 3-12 months
  • Strategic Community Collaborations (SCC)

12
Enabling Data Science
  • Many users with large data needs
    • have requirements that extend above and beyond
      what their home environments can support
    • are increasingly dependent on valued community
      data collections and databases used community-wide
  • SDSC is experiencing increasing demand from the
    domain communities for collaborations on:
    • publishing data in digital libraries
    • sharing data through the Web and data grids
    • creating, optimizing, and porting large-scale
      databases
    • analyzing and data mining large-scale data
  • Comprehensive data environment that incorporates
    access to the full spectrum of data-enabling
    resources

13
SDSC Data Allocations Environment (datacentral.sdsc.edu)
Services
  • Parallel File-system: High-speed, Temporary
  • Data Parking (SAN): High-speed, Short-term
  • Data Collections (SATA): Moderate-speed, Long-term
  • Data Sharing (SATA): Moderate-speed, Medium-term
Disk
Local Back-up (e.g., Tape)
HPSS/SAMFS
Offsite Back-up
14
Data Science Support Systems
[Diagram labels: Archival Systems; Blue Gene/L (Due 12/04); 6 PB; DataStar IBM Power4; Expertise, Networking, Visualization, Storage and Compute Resources; 2.8/5.7 TF; 10.4 TF]
15
Partial list of databases and data collections currently housed at SDSC
  • Protein Data Bank (protein data)
  • National Virtual Observatory (astronomical data)
  • UCSD Libraries Image Collection (ArtStore)
  • National Science Digital Library (education
    collection)
  • SCEC (earthquake data)
  • BIRN (neuroscience data)
  • Encyclopedia of Life (genomic data)
  • TreeBase (phylogeny and ontology information)
  • Transport Classification Database (protein
    information)
  • Library of Congress data
  • CKAAPS (protein evolutionary information)
  • AfCS Molecule Pages (protein information)
  • SLAC-JCSG (structural genomics data)
  • APOPTOSIS DB (proteins related to cell death
    data)
  • NAVDAT (geochemistry data)
  • QRC (NSF data on Supercomputer Centers and PACI)
  • Network Topology Data (Skitter project)
  • UC Merced Library
  • Biology Workbench Databases (mirrors and
    originals of over 80 biology databases)
  • 2 Micron All Sky Survey (astronomy data)
  • Digital Palomar Observatory Sky Survey Collection
    (astronomy data)
  • Sloan Digital Sky Survey Collection (astronomy
    data)
  • Interpro Mirror (protein data)
  • HPWREN (Wireless Network Analysis Data)
  • HPWREN (sensor network data)
  • Security logs and archives (security information)
  • EarthRef Digital Archive (earth science
    information)
  • GERM (earth reservoir information)
  • Braindata (Rutgers neuroscience collection)
  • HyperLTER (hyperspectral images)
  • SIO-Explorer (oceanographic voyages)
  • Transana (classroom video)
  • WebBase (web crawls)
  • Alexandria Digital Library (photographs)
  • Backscatter Data (from UCSD network telescope)
  • Digital Earth Data Library (earth sciences
    related datasets)
  • GEON (PaleoGeographic Atlas project)
  • IMDC (Internet measurement data catalog)
  • Seamount Catalogue (bathymetric seamount maps)
  • Hayden Planetarium Collection (astronomical data)
  • TeraGrid Data (science and engineering
    collections)
  • Biocyc (collection of pathway/genome DBs)
  • Digital Embryo (human embryology)
  • National Archives (persistent archive)
  • San Diego Conservation Resources Network
    (sensitive species map server)
  • LDAS (land data assimilation system)
  • ROADNET (sensor data)
  • NPACI Data Grid (scientific simulation output)
  • Salk (biology data archive)
  • Backbone Packet Header Traces (OC48, OC12)
  • TeraGrid (science and engineering collections)
  • CHRONOS (analytical tools for chronostratigraphy)
  • ERESE (educational Earth science portal)
  • TeraBridge (Sensor stream data)
  • C5 Landscape (UCSD Art dept)

16
Getting an Allocation: It's Free!
  • Who should apply?
    • Open to researchers affiliated with US
      educational institutions
    • Proposals merit-reviewed quarterly by the Data
      Allocations Committee
  • Types of Allocations
    • Expedited Allocations
      • 1 TB or less of disk and tape, 1st year
      • 5 GB database, 1st year
      • Yearly review
    • Medium Allocations
      • Under 30 TB
    • Large Allocations
      • Larger than 30 TB
  • Data Allocations
    • Getting Started: http://datacentral.sdsc.edu
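
The size thresholds for the allocation types listed above can be sketched as a small lookup. This is only an illustration, not an SDSC tool: the function name `allocation_type` is hypothetical, sizes are assumed to be in terabytes of disk and tape, and the separate 5 GB database limit for expedited allocations is ignored. The slide does not say how a request of exactly 30 TB is classified, so the sketch assumes it counts as Large.

```python
def allocation_type(requested_tb: float) -> str:
    """Return the allocation category for a requested size in terabytes.

    Thresholds follow the slide: 1 TB or less is Expedited,
    under 30 TB is Medium, larger than 30 TB is Large.
    """
    if requested_tb <= 0:
        raise ValueError("requested size must be positive")
    if requested_tb <= 1:
        return "Expedited"
    if requested_tb < 30:
        return "Medium"
    # Assumption: the slide leaves exactly 30 TB unspecified;
    # this sketch treats it as Large.
    return "Large"


if __name__ == "__main__":
    for size in (0.5, 10, 45):
        print(f"{size} TB -> {allocation_type(size)}")
```

Whatever the category, the actual request still goes through the quarterly merit review by the Data Allocations Committee described on this slide.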

17
Thank You
  • SDSC Data Resources and Allocations
  • http://datacentral.sdsc.edu/