http://esd.lbl.gov/BWC/ - PowerPoint PPT Presentation

About This Presentation
Title:

http://esd.lbl.gov/BWC/

Description:

Support End Science. Project Motivation. Data is now being ... Science Portal. Data Harvesting and. Transformations. Data Cleaning, Models, Analysis Tools ...

Slides: 22
Provided by: maria280
Learn more at: http://www.ias.sdsmt.edu
Tags: bwc | esd | gov | http | lbl


Transcript and Presenter's Notes

1
http://esd.lbl.gov/BWC/
Designing CyberInfrastructure to Support End Science
Deb Agarwal (UCB and LBNL)
Catharine van Ingen (MSFT)
Berkeley Water Center, Microsoft TCI
IndoFlux Meeting, Chennai, India, July 13, 2006
2
Project Motivation
  • Data is now being gathered into common data
    archives
  • Data archives provide an opportunity for
    cross-discipline and cross-site investigations
  • Data analysis techniques which worked well on
    small data sets often do not scale
  • Current CS tools have evolved in support of other
    disciplines; investigate their ability to
    facilitate data analysis

3
Distributed Data Sets
Science Portal
Building BWC Water Cyberinfrastructure to Connect
Data, Resources, and People
4


Carbon Community Workbench
(Diagram; labels recovered from the slide)
  • Data providers host: Ameriflux climate data, Statsgo
    soils data, MODIS products
  • Web service interface to data and tools
  • Web-based workbench access
  • Compute resources
  • Ecology Toolbox: design workflow
  • Choose Ameriflux area/transect, time range, data
    type
  • Data harvest: sites 1-16
  • Import other datasets: Climate, Statsgo, MODIS
  • Data cleaning tools: gap fill (technique A), gap
    fill (technique B)
  • Version control
  • Data mining and analysis tools: statistical and
    graphical analysis
  • Modeling tools: Canoak model (site 1, site 9)
  • Visualization tools: network display of LAI,
    statistical and graphical analysis
  • Knowledge generation tools
  • Variables shown: LAI, Temp, Fpar, Veg Index, Surf
    Refl, NPP, Albedo
5
Approach
  • Work closely with the end scientists to define,
    prototype, and test the system
  • Provide a solution that leverages both
    server-based and local desktop/laptop
    environments
  • Leverage commercial tools to the extent possible

6
Some Critical Capabilities
  • Support for versioning of data sets
  • Work with multiple data sets
  • Advanced data selection and plotting capabilities
  • Select data relative to an event
  • Simple calculation across any specified date
    range
  • Statistical information available
  • Plots - scatter, diurnal, time series,
    probability density function, tiled, correlation
  • Ability to access capabilities from desktop
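
Three of the capabilities listed above (selecting data relative to an event, a simple calculation across a date range, and the averages behind a diurnal plot) can be sketched in a few lines of Python. This is a minimal illustration and not the BWC implementation: the half-hourly sampling interval, the function names, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical half-hourly records: (timestamp, value). The values are
# invented for illustration and do not come from any Ameriflux site.
def make_series(start, steps, step_minutes=30):
    return [(start + timedelta(minutes=step_minutes * i), float(i % 48))
            for i in range(steps)]

def select_around_event(series, event, before, after):
    """Return records with event - before <= timestamp <= event + after."""
    lo, hi = event - before, event + after
    return [(t, v) for t, v in series if lo <= t <= hi]

def mean_over_range(series, start, end):
    """Simple calculation across a date range: mean over start <= t < end."""
    values = [v for t, v in series if start <= t < end]
    return sum(values) / len(values) if values else None

def diurnal_cycle(series):
    """Average value for each hour of day (the data behind a diurnal plot)."""
    buckets = defaultdict(list)
    for t, v in series:
        buckets[t.hour].append(v)
    return {hour: sum(vs) / len(vs) for hour, vs in sorted(buckets.items())}

series = make_series(datetime(2006, 7, 1), steps=48 * 4)  # four days of data
event = datetime(2006, 7, 2, 12, 0)                       # hypothetical event time
window = select_around_event(series, event,
                             timedelta(hours=6), timedelta(hours=6))
daily_mean = mean_over_range(series, datetime(2006, 7, 1), datetime(2006, 7, 2))
cycle = diurnal_cycle(series)
```

The same three helpers could sit behind a desktop tool or a server-side service, which fits the slide's goal of reaching the capabilities from the desktop.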

7
Data Pipeline
CSV Files
Excel Pivot Table and Chart
ORNL Ameriflux Site
Data Cube
BWC SQL Server Database
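
As a rough sketch of the pipeline named on this slide, the Python below loads CSV rows into a relational table and runs the kind of aggregate query that a cube or pivot table would serve. It assumes the flow runs from CSV files into the database and then out to the cube and Excel, which the slide labels suggest but do not state. SQLite stands in for SQL Server, and the table name, column names, and sample rows are invented for the example.

```python
import csv
import io
import sqlite3

# Invented sample standing in for a site CSV file; column names are hypothetical.
CSV_TEXT = """site,year,day,precip_mm
Tonzi,2005,1,2.0
Tonzi,2005,2,0.0
Metolius,2005,1,5.5
Metolius,2005,2,1.5
"""

def load_csv(conn, text):
    """Parse CSV rows and insert them into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS obs "
        "(site TEXT, year INTEGER, day INTEGER, precip_mm REAL)"
    )
    rows = [(r["site"], int(r["year"]), int(r["day"]), float(r["precip_mm"]))
            for r in csv.DictReader(io.StringIO(text))]
    conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

def totals_by_site(conn):
    """Aggregate query of the kind a cube or pivot table would serve."""
    cur = conn.execute(
        "SELECT site, SUM(precip_mm) FROM obs GROUP BY site ORDER BY site"
    )
    return dict(cur.fetchall())

conn = sqlite3.connect(":memory:")  # stand-in for the BWC SQL Server database
n_loaded = load_csv(conn, CSV_TEXT)
totals = totals_by_site(conn)
```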
8
Data Cleaning and Versioning
Excel spreadsheet of current data
BWC SQL Server Database
Investigator updated spreadsheet
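
One way to picture the cleaning-and-versioning loop on this slide: each spreadsheet an investigator returns becomes a new version, and earlier versions stay retrievable. The sketch below is an assumption about how such versioning could work, not a description of the BWC database schema. The class name, the (site, timestamp) keys, and the use of -9999.0 as a missing-value marker are hypothetical.

```python
class VersionedDataset:
    """Keeps every version of a dataset; updates never overwrite old values."""

    def __init__(self, initial):
        self._versions = [dict(initial)]  # version 0 is the data as ingested

    @property
    def latest(self):
        return len(self._versions) - 1

    def apply_update(self, changes):
        """Create a new version from the latest one plus the edited cells."""
        new = dict(self._versions[-1])
        new.update(changes)
        self._versions.append(new)
        return self.latest

    def get(self, version=None):
        """Return a copy of the requested version (latest by default)."""
        v = self.latest if version is None else version
        return dict(self._versions[v])

    def diff(self, old, new):
        """Map each changed key to its (old value, new value) pair."""
        a, b = self._versions[old], self._versions[new]
        keys = sorted(set(a) | set(b))
        return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# Keys are (site, timestamp) pairs; -9999.0 marks a missing value in this sketch.
KEY = ("Tonzi", "2005-01-01T00:30")
ds = VersionedDataset({("Tonzi", "2005-01-01T00:00"): 1.2, KEY: -9999.0})
v1 = ds.apply_update({KEY: 1.4})  # investigator's corrected value
changed = ds.diff(0, v1)
```

Storing full copies per version is only workable for small examples; a database would more likely record the changed rows together with a version number.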
9
Analysis Services Data Cube
  • An organized view of the data
  • A multi-dimensional view into the data
  • Can integrate multiple data sources
  • Define measures and dimensions
  • Measure: a value you want to be able to plot
  • Dimension: an axis you want to be able to use to
    select data and as a plot axis
  • Calculations define new measures
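
The measure, dimension, and calculation vocabulary above can be illustrated without Analysis Services. In this hedged Python sketch, a dimension is a key you group by, a measure is a value you sum, and a calculation derives a new measure from aggregated ones. The fact rows, column names, and numbers are invented for the example and are not project data.

```python
from collections import defaultdict

# Invented fact rows: dimension values (site, year, season) plus one raw measure.
FACTS = [
    {"site": "SiteA", "year": 2004, "season": "summer", "precip_mm": 5.0},
    {"site": "SiteA", "year": 2004, "season": "winter", "precip_mm": 45.0},
    {"site": "SiteB", "year": 2004, "season": "summer", "precip_mm": 10.0},
    {"site": "SiteB", "year": 2004, "season": "winter", "precip_mm": 30.0},
]

def aggregate(facts, dimensions, measure):
    """Sum one measure over every combination of the chosen dimensions."""
    cells = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        cells[key] += row[measure]
    return dict(cells)

def summer_fraction(facts):
    """A calculated measure: summer precipitation as a fraction of the site total."""
    by_site = aggregate(facts, ["site"], "precip_mm")
    by_site_season = aggregate(facts, ["site", "season"], "precip_mm")
    return {site: by_site_season.get((site, "summer"), 0.0) / total
            for (site,), total in by_site.items() if total}

by_site = aggregate(FACTS, ["site"], "precip_mm")
fractions = summer_fraction(FACTS)
```

A real cube precomputes and indexes these aggregates so that pivot-table queries stay fast; the sketch recomputes them on every call.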

12
Precipitation trends and totals
Plot created by Gretchen Miller of UC Berkeley
  • Summer precipitation
      • Tonzi and Vaira: 2% of total
      • Metolius: 24% of total
      • Walker Branch: 40% of total

13
Other applications
Plot created by Gretchen Miller of UC Berkeley
14
Observations by latitude
Plot created by Gretchen Miller of UC Berkeley
15
Observations by ecosystem type
Plot created by Gretchen Miller of UC Berkeley
16
Some Lessons Learned so Far
  • Consistent data naming and units are critical to
    easy ingest of large amounts of data
  • Commercial tools do not necessarily provide all
    the right analysis capabilities directly
  • Scaling capabilities of the tools are not yet clear
  • We will need tools to aid in notification of PIs
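
The first lesson, consistent naming and units, is the kind of check that can run at ingest time. The sketch below maps site-specific column names to one canonical name and converts values to one canonical unit, rejecting anything it does not recognize. The alias table, unit labels, and function name are hypothetical; a real mapping would have to be agreed with the data providers.

```python
# Hypothetical alias table and unit conversions, for illustration only.
CANONICAL_NAMES = {"ta": "air_temp", "tair": "air_temp", "air_temp": "air_temp",
                   "precip": "precip", "ppt": "precip"}
CANONICAL_UNITS = {"air_temp": "degC", "precip": "mm"}
CONVERTERS = {
    ("K", "degC"): lambda v: v - 273.15,
    ("degF", "degC"): lambda v: (v - 32.0) * 5.0 / 9.0,
    ("in", "mm"): lambda v: v * 25.4,
}

def normalize(name, unit, value):
    """Map a column to its canonical name and unit, or raise if unknown."""
    key = name.strip().lower()
    if key not in CANONICAL_NAMES:
        raise KeyError(f"unrecognized variable name: {name!r}")
    canonical = CANONICAL_NAMES[key]
    target = CANONICAL_UNITS[canonical]
    if unit == target:
        return canonical, target, value
    try:
        convert = CONVERTERS[(unit, target)]
    except KeyError:
        raise ValueError(f"no conversion from {unit!r} to {target!r}") from None
    return canonical, target, convert(value)

temp = normalize("TA", "K", 293.15)     # Kelvin to degrees Celsius
rain = normalize(" PPT ", "in", 2.0)    # inches to millimetres
```

Failing loudly on an unknown name or unit, instead of guessing, keeps a bad column out of the database until someone has looked at it.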

17
Portal Deployment
  • Behind the portal is a collection of databases
    and data cubes
  • Distribution for ease of use
      • Only see the data of interest
      • Private data remains stable
  • Distribution for scaling
      • Smaller queries on smaller databases take fewer
        resources
      • Larger databases and cubes can be replicated
        across machines
  • Batch-job-like infrastructure for managing very
    long-running queries

19
Acknowledgements
  • Science Team
      • Dennis Baldocchi
      • Bev Law
      • Gretchen Miller
  • Cyberinfrastructure
      • Matt Rodriguez
      • Monte Goode
  • Microsoft
      • Tony Hey
      • Nolan Li
  • Oak Ridge National Lab CDIAC personnel
  • Berkeley Water Center
      • Yoram Rubin
      • Susan Hubbard

20
URLs and Connection Coordinates
  • Web Site
      • http://esd.lbl.gov/BWC
  • Blog
      • http://dsd.lbl.gov/BWC/amfluxblog
  • E-mail
      • bwc-tci_at_lists.berkeley.edu

21
http://esd.lbl.gov/BWC/