Title: Distributed Environmental Data Analysis System (DEDAS)
1 Distributed Environmental Data Analysis System (DEDAS)
- Special Interest Group on Data Integration
2 Environmental Megatrends
- Short- to long-range transport. Pollutants (e.g. ozone, PM2.5, POPs) travel across state and national boundaries.
- New regulatory approach. Compliance evaluation is now based on weight of evidence, and the effectiveness of controls needs to be tracked.
- From command-and-control to participatory management. The participating stakeholders now include federal, state, local, industry and international members.
3 The Air Quality Manager's Challenge
- Broader user community. The information systems need to be extended to reach all the stakeholders (federal, state, local, industry, international).
- A richer set of data and analysis. Establishing causality, weighing evidence and tracking emissions require the analysis of air quality, meteorology, emissions and effects data.
- Increasing demand for analysis. Secondary pollutants, along with a more open environmental management style, require broader and more detailed data analysis.
4 The Researcher/Analyst's Challenge
"The researcher cannot get access to the data; if he can, he cannot read them; if he can read them, he does not know how good they are; and if he finds them good he cannot merge them with other data."
Information Technology and the Conduct of Research: The User's View, National Academy Press, 1989
Air Quality Data Integration and Living Data
Inventory
5 Opportunities
- Rich AQ data availability. Abundant high-grade routine and research monitoring data from EPA and other agencies are now available.
- New information technologies. Effective data management, along with distributed analysis, exploration and communication tools, allows cooperation (sharing) and coordination among diverse groups.
- More cooperative spirit. The stakeholders increasingly recognize the need for and the benefits of collaboration (sharing) and coordination.
6 Air Quality Management: Sensory Data to Action
- Multi-sensory data are collected through Monitoring and delivered for Assessment.
- Assessment performs data analysis to turn data into useful knowledge for decision making and actions.
7 Analysis: From Raw Data to Refined Knowledge
- Data refinery. Data analysis can be viewed as a refinery that transforms raw sensory data into knowledge usable for management.
- Multi-step processing. Data refining has many parallel and sequential steps, usually performed by different analysts.
- Value-adding chain. Each step in the analysis is part of a value-adding chain.
- Example of data-to-knowledge refining: Environmental Status Report
  - Primary data are gathered from providers of sensory data.
  - Data are filtered, aggregated and fused into secondary data, figures and tables.
  - The report describes the pollutant pattern and possibly causality.
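The filter, aggregate and fuse steps of the refining example can be sketched in a few lines of Python. This is only an illustration: the station names, dates, values, the -999 sentinel and the function names are all hypothetical and are not part of DEDAS.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical primary data: (station, day, hourly ozone in ppb; None = missing)
primary = [
    ("A", "2000-07-01", 40.0),
    ("A", "2000-07-01", 60.0),
    ("A", "2000-07-01", None),
    ("B", "2000-07-01", 30.0),
    ("B", "2000-07-01", -999.0),  # assumed sentinel for an invalid reading
]

# Hypothetical second source: daily maximum temperature (deg C) per station
weather = {("A", "2000-07-01"): 31.0, ("B", "2000-07-01"): 27.5}


def filter_valid(records):
    """Filtering: drop missing and negative (invalid) values."""
    return [r for r in records if r[2] is not None and r[2] >= 0]


def aggregate_daily(records):
    """Aggregation: hourly values -> daily mean per (station, day)."""
    groups = defaultdict(list)
    for station, day, value in records:
        groups[(station, day)].append(value)
    return {key: mean(values) for key, values in groups.items()}


def fuse(daily_ozone, daily_temp):
    """Fusion: join two sources on the shared (station, day) key."""
    return {
        key: {"ozone_ppb": o3, "temp_c": daily_temp[key]}
        for key, o3 in daily_ozone.items()
        if key in daily_temp
    }


# Each step adds value to the output of the previous one.
secondary = fuse(aggregate_daily(filter_valid(primary)), weather)
```

Each function could be run by a different analyst, which is the sense in which the steps form a value-adding chain.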
9 Environmental Data and Use Features
- Multidimensional. The key data dimensions are space (x, y, z) and time (t).
- Need for current and historical data. Daily (hourly) as well as long-term strategic management decisions need to be supported.
- Data from many sources. For full context, data from multiple sources need to be combined and analyzed, e.g.:
  - Air quality data (collected by many federal, state and local agencies)
  - Weather data (from the National Weather Service)
  - Possibly satellite data (from NASA or NOAA)
10 Distributed Environmental Data Analysis System (DEDAS)
- Specifications
  - Use standardized forms of data, metadata and access protocols
  - Support distributed data archives, each run by its own providers
  - Provide tools for data exploration, analysis and presentation
- Features
  - The data are organized as multidimensional data cubes
  - The multidimensional data cubes are distributed but shared
  - Analysis is supported by built-in and user-defined functions
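A minimal sketch of what a multidimensional data cube with built-in and user functions might look like, assuming a sparse in-memory store. The class `DataCube`, its methods and the sample values are hypothetical illustrations and do not describe the actual DEDAS design; distribution across archives is not modeled here.

```python
from statistics import mean


class DataCube:
    """Toy sparse data cube: values indexed by named dimensions."""

    def __init__(self, dims):
        self.dims = tuple(dims)
        self.cells = {}  # coordinate tuple -> value

    def put(self, value, **coords):
        """Store one value at the given coordinates (one per dimension)."""
        key = tuple(coords[d] for d in self.dims)
        self.cells[key] = value

    def slice(self, **fixed):
        """Return a sub-cube in which the given dimensions are held fixed."""
        out = DataCube(self.dims)
        for key, value in self.cells.items():
            coord = dict(zip(self.dims, key))
            if all(coord[d] == v for d, v in fixed.items()):
                out.cells[key] = value
        return out

    def aggregate(self, over, func=mean):
        """Collapse dimension `over` with `func` (default or user-supplied)."""
        keep = tuple(d for d in self.dims if d != over)
        groups = {}
        for key, value in self.cells.items():
            coord = dict(zip(self.dims, key))
            groups.setdefault(tuple(coord[d] for d in keep), []).append(value)
        out = DataCube(keep)
        for group_key, values in groups.items():
            out.cells[group_key] = func(values)
        return out


# Space (x, y) and time (t) as dimensions; z is omitted for brevity.
cube = DataCube(["x", "y", "t"])
cube.put(40.0, x=0, y=0, t=1)
cube.put(60.0, x=0, y=0, t=2)
cube.put(30.0, x=1, y=0, t=1)

time_mean = cube.aggregate(over="t")  # default function: mean over time
time_range = cube.aggregate(over="t", func=lambda v: max(v) - min(v))  # user function
first_hour = cube.slice(t=1)  # a spatial map at one time step
```

Slicing fixes a dimension (a map at one time, or a time series at one location), while aggregation collapses it.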
11 A Possible Architecture of DEDAS
- There are four types of nodes in the system: Data Providers, Organizers, Transformers and Users.
- The Users receive data on demand from the Providers through DEDAS.
12 The DEDAS Cast: Data Providers, Organizers, Transformers and Users
- Data Providers supply primary data to the system, through SQL or other data servers.
- Data Organizers populate the data cubes with primary data from the Providers.
- Transformers add value to the primary data by processing (e.g. filtering, aggregation, fusion). They produce secondary data in virtual data cubes accessible to the users.
- Users are the analysts who access DEDAS and produce knowledge from the data.
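The four roles and the on-demand data flow can be sketched as follows. All class names, method names, sample rows and the threshold of 80 are assumptions made for illustration; in particular, the in-memory `Provider` merely stands in for an SQL or other data server, and the threshold is arbitrary, not a regulatory value.

```python
class Provider:
    """Supplies primary data on request (stand-in for a data server)."""

    def __init__(self, name, rows):
        self.name = name
        self._rows = rows

    def query(self, parameter):
        return [r for r in self._rows if r["parameter"] == parameter]


class Organizer:
    """Populates a cube, here a dict keyed by (site, day), from many providers."""

    def __init__(self, providers):
        self.providers = providers

    def build_cube(self, parameter):
        cube = {}
        for provider in self.providers:
            for row in provider.query(parameter):
                cube[(row["site"], row["day"])] = row["value"]
        return cube


class Transformer:
    """Virtual cube: secondary data are computed only when a user asks."""

    def __init__(self, organizer, parameter, func):
        self.organizer = organizer
        self.parameter = parameter
        self.func = func

    def get(self):
        primary = self.organizer.build_cube(self.parameter)
        return {key: self.func(value) for key, value in primary.items()}


# Two hypothetical providers, each running its own archive.
state = Provider("state", [
    {"site": "A", "day": 1, "parameter": "ozone", "value": 95.0},
    {"site": "A", "day": 1, "parameter": "pm25", "value": 12.0},
])
federal = Provider("federal", [
    {"site": "B", "day": 1, "parameter": "ozone", "value": 40.0},
])

organizer = Organizer([state, federal])
# Arbitrary illustrative threshold, not a regulatory standard.
high_ozone = Transformer(organizer, "ozone", lambda v: v > 80.0)

# The User requests secondary data; the request propagates back to the Providers.
user_view = high_ozone.get()
```

Nothing is copied or computed until `get()` is called, which is one way to read "virtual data cubes" and "data on demand".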
13 The Data Warehouse
14 User Interaction with Data Cubes
15 Benefits of DEDAS
- Access to data. Data in DEDAS can be easily found, accessed, processed and presented.
- Recycling data. Data are a costly resource. The system can help with managing, accessing and documenting one's own data, and sharing it with others for re-use.
- Saving time and money. The data, tools and other resources in the shared system can leverage the dollars and time available for specific projects.
16
- The output from individual sensors is collected and archived by many different organizations, like EPA, NASA and USGS, as well as state and local agencies. Even though most organizations are eager to share their data, the actual data sharing is very tedious and inefficient. There are no general data formatting and access standards, so the process is done by hand, the hard way.
- To get to the point, I think that environmental data management and analysis could benefit greatly from a distributed OLAP approach. All three aspects of distributed data usage are now falling into place:
  - (1) multidimensional data storage and query processing: OLAP
  - (2) standard data description and transmission protocols: XML
  - (3) multi-platform data viewers: Java
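As an illustration of point (2), a self-describing XML message for a handful of observations might be produced and read back as sketched below, using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`dataset`, `obs`, `parameter`, `units`, `site`, `time`, `value`) are invented for this sketch and do not correspond to any existing standard or to a DEDAS schema.

```python
import xml.etree.ElementTree as ET


def to_xml(parameter, units, observations):
    """Serialize observations into a small self-describing XML document."""
    root = ET.Element("dataset", {"parameter": parameter, "units": units})
    for obs in observations:
        ET.SubElement(root, "obs", {
            "site": obs["site"],
            "time": obs["time"],
            "value": str(obs["value"]),
        })
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the document back into metadata plus a list of observations."""
    root = ET.fromstring(text)
    meta = {"parameter": root.get("parameter"), "units": root.get("units")}
    observations = [
        {"site": e.get("site"), "time": e.get("time"), "value": float(e.get("value"))}
        for e in root.iter("obs")
    ]
    return meta, observations


# Hypothetical sample data
data = [
    {"site": "A", "time": "2000-07-01T13:00", "value": 42.5},
    {"site": "B", "time": "2000-07-01T13:00", "value": 38.0},
]

message = to_xml("ozone", "ppb", data)
meta, received = from_xml(message)
```

Because the metadata (parameter, units) travel with the values, a receiver on any platform can interpret the message without prior agreement on a binary format.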