Title: The Canopy Database Project: Tools for Research Information Integration
1 The Canopy Database Project: Tools for Research Information Integration
http://canopy.evergreen.edu
- Judy Cushing, Nalini Nadkarni
- Mike Finch, Anne Fiala
- Youngmi Kim, Aaron Crosland and others
- The Evergreen State College
Collaborating Ecologists: Van Pelt, Bond, Dial, Ishii, Keim, Parker, Shaw, Sillett, Sumida, et al.
Collaborating Computer Scientists: Dave Maier, Lois Delcambre, Travis Brooks (OHSU)
Collaborating LTER Information Managers: Eda, Nicole, Kristin, Ken, Jonathan, James Brunt and others?
NSF CISE and BIO 04-xxx, 03-xxx, 01-31952,
01-9309 99-75510, 9630316, 93-07771
2 Canopy DB Vision
PI and IM use of database technology components can ease metadata provision, data validation and archiving, and data mining for synthesis, BUT:
- Researchers aren't programmers.
- The technology must be easy to use and increase research productivity.
3 The Underlying Idea: Database Design with Domain-Specific Components
- Validate generated databases with rules
- e.g., Stem depends on study area and plot, and includes the species table
Capitalize on core components for tools: visualization, metadata provision, data acquisition and validation, research protocol, statistical analysis.
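The Stem rule above could be checked mechanically along the following lines. This is only a minimal sketch, not the project's actual rule engine: the `TEMPLATES` catalogue, its `depends_on` / `includes` keys, and both function names are hypothetical, chosen to mirror the slide's wording.

```python
# Hypothetical component catalogue; names and structure are illustrative only.
# "depends_on": components that must already be in the design.
# "includes":   tables the component brings along automatically.
TEMPLATES = {
    "study_area": {"depends_on": [], "includes": []},
    "plot": {"depends_on": ["study_area"], "includes": []},
    "species": {"depends_on": [], "includes": []},
    "stem": {"depends_on": ["study_area", "plot"], "includes": ["species"]},
}


def expand_includes(selected, templates=TEMPLATES):
    """Return the selected components plus any tables they include (one level)."""
    expanded = set(selected)
    for name in selected:
        expanded.update(templates[name]["includes"])
    return expanded


def missing_dependencies(selected, templates=TEMPLATES):
    """Map each component in the design to the dependencies it still lacks."""
    chosen = expand_includes(selected, templates)
    report = {}
    for name in sorted(chosen):
        missing = sorted(set(templates[name]["depends_on"]) - chosen)
        if missing:
            report[name] = missing
    return report
```

Under these assumptions, selecting only `stem` pulls in `species` automatically and reports `plot` and `study_area` as missing; a design passes when the report is empty.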
4 Approach
- Pathfinder Projects
- Ecologists design and carry out field research at several sites.
- Find research, archiving and data mining bottlenecks.
- Determine spatial data structures.
- Reverse-engineer components.
- Database Tools for the Field Ecologist
- Design field databases: DataBank.
- Visualize data using those databases: CanopyView.
- Lab-specific metadata acquisition.
- Hand-held (Palm Pilot) field data acquisition.
- Reality-check with LTER Information Managers.
- Web Accessible Research Reference -- BCD
5 Research Bottlenecks: Database Technology for Researcher Productivity
Metadata Generation
- Archive in Lab (common types)
Data Visualization
Statistical analysis
Data validation (against metadata)
Data and metadata capture
- Database and Protocol Design
- Research Reference Tools
Information Synthesis
6 Recent Work
- Finding and maintaining the components
- Ecology theory: spatial categorization of the canopy
- Template Editor
- Refine existing software
- Template-embedded semantic metadata, carried forward
- DataBank now stand-alone
- Generate Excel, as well as Access and other RDBMS
- New visualizations
- Collaborate with other eco-informatics projects
- Closer integration with EML, Morpho
- LTER IM collaboration: Kaplan, Melendez-Colom, Ramsey, Vanderbilt, Walsh.
- Outreach to computer science community and agencies
- NSF/USGS/NASA/EPA, JIIS special issue, dg.o
7 Future Work
- Carry out collaborative field studies
- Develop and test synthesis hypotheses
- Develop theoretical constructs on canopy structure-function
- Develop statistical protocols that guide study design
- Create and enhance informatics tools
- Build theory-based components
- Build better UIs, data import validation, more visualization
- Build parameterized queries for standard statistical scripts
- Develop better metadata capture and evolution
- Develop or adapt warehouse interface to other tools
- Field test tools from the get-go
8 How DataBank Works
Mike Finch
9 Research Bottlenecks: Database Technology
Research Productivity Gain
EML Generation
CanopyView
- DataBank Database Generator
- BCD
10 Conclusions
- Database design is a complex web app
- Sociological aspects are important
- Proprietary data
- Technology adoption
- Integrative ecology is new
- Defining an intuitive, adequate set of templates is hard
- Spatial is special.
- Visualization is cool.
11 DataBank Workflow
Database Components
shopping cart
DB design
Empty DB
convert: SQL, MSSQL, MSAccess
12 DataBank Software Architecture
Internet Browser (IE 5, Netscape 6)
Web Server (Apache)
Access Field DB
Enhydra (Middleware)
DataBank Backend (Java)
Viz Toolkit, JDK
DB (SQL Server)
13 Canopy DataBank
- What is it?
- End-user database design with components (aka templates)
- Variable- and table-level metadata inherent
- Study-level metadata available from the BCD
- Technology
- HTML, Java, Enhydra, SQLServer, Access, JTK
- Aim to produce XML/EML for exchange and archive
- Status
- Some templates (mostly spatial tree structure)
- About 5 field studies
- Some visualization
14 DataBank Architecture (workflow)
template.xml, descr.xml, pic.gif, bigpc.gif
shopping cart
DB design
Empty DB
TEOF: internal object representation
TDM: convert SQL, MSSQL, MSAccess
schema element dependencies entities
observation attributes
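The workflow above (templates, shopping cart, DB design, empty DB) might be sketched as follows. This is an illustration under stated assumptions only: the `TEMPLATES` dictionary is a hypothetical stand-in for the contents of template.xml, the column names are invented, and SQLite is used purely so the example runs without a server. DataBank itself targets SQL Server and Access, and its backend is Java.

```python
import sqlite3

# Hypothetical, simplified stand-ins for template definitions.
# Table and column names are invented for illustration.
TEMPLATES = {
    "plot": {
        "columns": [("plot_id", "INTEGER PRIMARY KEY"), ("area_m2", "REAL")],
    },
    "stem": {
        "columns": [
            ("stem_id", "INTEGER PRIMARY KEY"),
            ("plot_id", "INTEGER REFERENCES plot(plot_id)"),
            ("dbh_cm", "REAL"),
        ],
    },
}


def design_to_ddl(cart, templates=TEMPLATES):
    """Turn the 'shopping cart' of chosen templates into CREATE TABLE statements."""
    statements = []
    for name in cart:
        cols = ", ".join(
            f"{col} {sqltype}" for col, sqltype in templates[name]["columns"]
        )
        statements.append(f"CREATE TABLE {name} ({cols});")
    return statements


def build_empty_db(cart, templates=TEMPLATES):
    """Create an empty in-memory database from the design (SQLite as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    for stmt in design_to_ddl(cart, templates):
        conn.execute(stmt)
    conn.commit()
    return conn
```

Calling `build_empty_db(["plot", "stem"])` yields a connection whose tables exist but hold no rows, which corresponds to the "Empty DB" step; a real conversion step would emit dialect-specific DDL for each target system.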
15 Next Steps
- XML/EML for data exchange
- Outreach to CS community
- VLDB Panel on Ecosystem Informatics (August)
- NSF BDEI PIs Meetings Forum (May, Nov)
- Further define and support spatial data structures -- additional collaborator(s)?
- Visualization (!!!)
16 Discussion
Are we on the right track with visualization?
What off-the-shelf viz. tools are available?
Who might consult with us on visualization?
How about spatial scaling?
How to refine our spatial categorization scheme?
What collaborators (data sets) should we seek?
How is modeling linked to visualization?
Comments about DataBank?