Title: Semantics
1. eScience Opportunities for Applied Computing
Dan Fay, www.microsoft.com/science
2. Science @ Microsoft
Life Sciences
Social Sciences
Earth Sciences
Accelerating Discovery
New Materials, Technologies, Processes
Multidisciplinary Research
Computer and Information Sciences
Math and Physical Science
3. A Data Deluge in Science
- Data collection
- Sensor networks, satellite surveys, high-throughput laboratory instruments, astronomical telescopes, supercomputers, LHC
- Data processing, analysis, visualization
- Legacy codes, workflows, data mining, indexing, searching, graphics
- Archiving
- Digital repositories, libraries, preservation
SensorMap
Functionality: map navigation
Data: sensor-generated temperature, video camera feed, traffic feeds, etc.
Scientific visualizations (NSF Cyberinfrastructure report, March 2007)
4. Emergence of a New Research Paradigm?
- A thousand years ago: Experimental Science
- Description of natural phenomena
- Last few hundred years: Theoretical Science
- Newton's Laws, Maxwell's Equations
- Last few decades: Computational Science
- Simulation of complex phenomena
- Today: eScience or Data-centric Science
- Unify theory, experiment, and simulation
- Using data exploration and data mining
- Data captured by instruments
- Data generated by simulations
- Data generated by sensor networks
- Scientists overwhelmed with data
- Computer Science and IT companies have technologies that will help
- (With thanks to Jim Gray)
5. The Perfect Data Storm
- The era of remote sensing, cheap ground-based sensors, and web service access to agency repositories is here
- Extracting and deriving the data needed for the science remains problematic
- Specialized knowledge
- Finding the right needle in the haystack
6. The Data Pipeline
7. Dynameomics
High-throughput molecular dynamics to simulate representative proteins from all known folds
- Valerie Daggett, University of Washington
- Perform MD simulations of representatives of all folds (41K structures in PDB → 1130 fold families)
- Top 30 folds
- Many are potential biomedical targets
- Current status
- > 650 proteins simulated
- > 4744 simulations
- > 64 TB of data
- > 1.26 × 10^8 structures
- Housed in a novel hybrid SQL/OLAP database using SQL Server
- We invite you to experience it! www.dynameomics.org
8. The Cosmic Genome Project
- The Sloan Digital Sky Survey is the first major astronomical survey project
- 5-color images of ¼ of the sky
- Pictures of 300 million celestial objects
- Distances to the closest 1 million galaxies
- Jim Gray from Microsoft Research worked with astronomer Alex Szalay to build the public SkyServer archive for the survey
- New model of scientific publishing
- Have to publish the data before astronomers publish their analysis
9. Public Use of the SkyServer
- Poster child of 21st-century data publishing
- 380 million web hits in 6 years
- 930,000 distinct users vs. 10,000 astronomers
- 1600 scientific papers
- Delivered 50,000 hours of lectures to high schools
- Delivered 100B rows of data
- Citizen science: GalaxyZoo
- Goal of 1 million visual galaxy classifications by the public
- Allows the general public to search for photographs and classify different types of galaxies
10. Hanny van Arkel's Voorwerp
11. World Wide Telescope
Seamless Rich Social Media Virtual Sky
Web application for science and education
- Participants
- Alyssa Goodman, Harvard University
- Alex Szalay, Johns Hopkins University
- Curtis Wong, Jonathan Fay, Microsoft Research
- Goals
- Integration of data sets and one-click contextual access
- Easy access and use
- In just over two months, a million users have downloaded, installed, and launched the application (2,206,497 unique sessions)
- We invite you to experience it! www.worldwidetelescope.org
12. Berkeley Water Center
Understanding regional hydrology
- Project organization
- Jim Hunt, Dennis Baldocchi, UC Berkeley
- Deb Agarwal, Lawrence Berkeley Laboratory
- Catharine van Ingen, MSR
- Goals
- Enable rapid scientific data browsing for availability and applicability
- Enable environmental science via data synthesis from multiple sources
- Progress
- Environmental Data Server, www.fluxdata.org (SharePoint), serves 921 site-years of carbon-climate field data from 160 field teams to 60 paper-writing teams (800M values)
- Multiple projects now leveraging the same SQL Server database and data cube approach
- CUAHSI consortium: 100 universities collaborating on hydrology
13. Carbo-Climate Synthesis (BWC, Dennis Baldocchi et al.)
- SharePoint site: www.fluxnet.org
- 921 site-years of data from 240 sites around the world; 80 site-years now being added
- American data subset is public and served more widely
- Summary data greatly simplify initial data discovery
- Communal field science: each investigator acts independently
- Cross-site studies and integration with modeling increasingly important
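The "site-year" unit and the role of summary data can be illustrated with a minimal Python sketch. It is not the Berkeley Water Center's implementation: the record layout `(site, year, n_values)` and the site identifiers in the usage note are hypothetical, chosen only to show how a small summary table supports data discovery before anyone touches the raw values.

```python
from collections import defaultdict

def summarize(records):
    """Summarize data availability.

    records: iterable of (site, year, n_values) tuples (hypothetical layout).
    Returns (total site-years, {year: number of sites reporting that year}).
    A site-year is counted once per (site, year) with at least one value.
    """
    years_by_site = defaultdict(set)
    sites_by_year = defaultdict(set)
    for site, year, n_values in records:
        if n_values > 0:  # ignore empty submissions
            years_by_site[site].add(year)
            sites_by_year[year].add(site)
    site_years = sum(len(years) for years in years_by_site.values())
    sites_per_year = {y: len(s) for y, s in sorted(sites_by_year.items())}
    return site_years, sites_per_year
```

For example, calling `summarize` on records for two made-up sites "A" and "B" returns the total number of site-years together with a per-year site count, which is the kind of overview a data browser can show first.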
14. Browsing for Data Availability, Applicability, Early Science
15. Browsing the Whole Dataset
Daily Rg 2005, 72 sites
Daily Rg 2000-2006, 200 sites
18. Data Depot: Social Data Aggregation and Analysis
- http://datadepot.msresearch.us
Removal of CO2 from the air, by latitude over the course of a year.
Sensors
Phones
Applications
Internet
Web: datadepot.msresearch.us
Contact: counts@microsoft.com
19. Trident Scientific Workflow Workbench (Univ. of Washington and Monterey Bay Aquarium Research Institute)
Scientific workflow workbench to automate the data processing pipelines of the world's first plate-scale undersea observatory
- Goals
- From raw data to usable data products
- Focusing on cleaning, analysis, re-gridding, interpolation
- Support real-time, on-demand visualizations
- Custom activities and workflow libraries for authoring
- Visual programming accessible via a browser
- Trial cloud services for science
- Proof points
- A scientific workflow workbench for a number of science projects, reusable workflows, automatic provenance capture
- Demonstrate scientific use of Windows WF, HPCS, SQL Server, and cloud service SSDS
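The idea of a pipeline that turns raw data into usable products while capturing provenance automatically can be sketched in a few lines of Python. This is an illustration only, not Trident's API (Trident is built on Windows WF): the names `clean`, `regrid`, `run_workflow`, and `fingerprint` are hypothetical, and the sketch assumes a time series of `(time, value)` pairs with distinct timestamps.

```python
import hashlib
import json

def fingerprint(data):
    """Short content hash used to identify a data product in the provenance log."""
    return hashlib.sha256(json.dumps(data).encode()).hexdigest()[:12]

def clean(data):
    """Cleaning step: drop samples whose value is NaN, then sort by time."""
    return sorted((t, v) for t, v in data if v == v)  # NaN != NaN

def regrid(data, step=1.0):
    """Re-gridding step: linearly interpolate onto a regular time grid.

    Assumes data is sorted by time with distinct timestamps.
    """
    if len(data) < 2:
        return list(data)
    t0, t_end = data[0][0], data[-1][0]
    n = int((t_end - t0) / step + 1e-9)
    out, i = [], 0
    for k in range(n + 1):
        t = t0 + k * step
        # advance to the segment [data[i], data[i + 1]] that contains t
        while i + 2 < len(data) and data[i + 1][0] < t:
            i += 1
        (ta, va), (tb, vb) = data[i], data[i + 1]
        out.append((t, va + (vb - va) * (t - ta) / (tb - ta)))
    return out

def run_workflow(data, steps):
    """Run steps in order, recording which input produced which output."""
    provenance = []
    for step in steps:
        result = step(data)
        provenance.append({
            "step": step.__name__,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        data = result
    return data, provenance
```

Calling `run_workflow(raw, [clean, regrid])` returns the gridded series plus one provenance record per step. Because each record stores the hash of its input and output, any data product can be traced back through the chain of steps that produced it.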
20. Resources
- Microsoft Research
- http://research.microsoft.com
- Microsoft Research downloads: http://research.microsoft.com/research/downloads
- Science at Microsoft
- http://www.microsoft.com/science
- Scholarly Communications
- http://www.microsoft.com/scholarlycomm
- CodePlex
- http://www.codeplex.com
- The Faculty Connection
- http://www.microsoft.com/education/facultyconnection
- MSDN Academic Alliance
- http://msdn.microsoft.com/en-us/academic