Title: Christine Laney
1. Information management issues and the Trends project: a drawing board for making cross-site comparisons feasible
- Christine Laney
- Ken Ramsey
- Mark Servilla
2. THANK YOU!!
- LTER Information Managers: you know who you are
- LTER Network Office: Mark, James B., Bob, Duane, Marshall, Inigo, James V.
- NCEAS: Mark, Callie, Will, Jim, Rick
- Jornada staff: Ken, Justin, and technicians
3. Trends in Long-Term Ecological Data: a multi-agency synthesis project
- Objectives
  - to create a platform for synthesis by producing a compendium of easily accessible long-term graphs and data from long-term ecological research sites
  - to illustrate the utility of this platform in addressing important within-site and network-level scientific questions
4. Products
- Folio-sized book to be published by Oxford Univ. Press
- Website (data, metadata, graphs) for synthesis and analysis
5. Book organization
- Introduction: value and importance of long-term research
- Within-site graphs/tables arranged by four themes in the LTER Planning Process
  - Climate and variability in the physical environment, including disturbance characteristics
  - Human population and economy
  - Biogeochemistry (e.g., atmospheric deposition, surface water chemistry)
  - Biotic structure (e.g., ANPP, plant biomass, species richness)
- Among-site comparison graphs, e.g., atmospheric chemistry, N fertilization, climate variability, ENSO signal responses
- Site descriptions and photos organized by biomes
6. Website: Trends Data Store (www.ecotrends.info)
- Initial design
  - Static datasets, metadata, and book graphs listed by chapter and figure number, some search capability
  - Metadata for static data provide access to raw data and the script used to generate the derived product
  - Prototype near completion
- Final design
  - Routinely harvested and derived data from ongoing projects
  - Metadata links back to sources and revisions
  - Search, sort, analysis, and graphing tools
  - Prototype in development
7. Coming soon: www.ecotrends.info
8. Participating Sites
9. Process
- Selecting variables
  - Submitted broad request for long-term data
  - Downloaded data from other online compilations
  - Examined submitted data for consistent variables across sites (e.g., precipitation, nitrogen)
  - Refined data request and requested additional data from sites for variables that should exist but may not have been submitted (e.g., ANPP, species richness)
  - Generated wish list of variables that may be important for cross-site and network-level questions, but for which long-term data don't yet exist at very many sites (e.g., soil respiration, foliar nutrients). This will be used in planning grant activities.
- Selecting variables for the webpage
  - Use variables from book in static form first
  - Update data sets with time and include additional variables
10. Progress to date
- Contributors: 26 LTER (84%), 13 FS (12%), 6 ARS sites (3%), Santa Rita ER (<1%)
- Climate datasets: 300
- Biogeochemistry datasets: 150
- Biotic datasets: 100
- Others: 50
- Total: over 600 datasets, plus 190 illustrative graphs
- Human population and economy: collected for all LTER sites from census data (funded by NSF supplement)
- Metadata: most data have at least rudimentary metadata; few have full EML with attribute-level description of the datasets.
11. What we're doing with the data
- Downloading and storing data and documentation
- Writing R or SAS scripts to generate:
  - Datasets containing monthly or annual averages or totals, depending on the variable
  - Strict time plots with simple linear regression
  - Tables that record all derived statistics
  - Plots that show change over time among different sites for each variable
  - Anomaly plots of monthly climate data
- Generating metadata with EML for each derived product. Metadata contains links to original data and associated scripts.
- Recording each product (data, metadata, graphs), along with links between products, in a multi-purpose database.
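The derived products listed above (annual totals, a simple linear trend, monthly anomalies) can be sketched in a few lines. The project's own scripts were written in R or SAS; the Python below is only an illustrative stand-in, and the function names and the `(year, month, value)` record layout are assumptions made for this sketch.

```python
from collections import defaultdict
from statistics import mean


def annual_totals(records):
    """Sum (year, month, value) records into {year: total}."""
    totals = defaultdict(float)
    for year, _month, value in records:
        totals[year] += value
    return dict(totals)


def linear_trend(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def monthly_anomalies(records):
    """Subtract each calendar month's long-term mean from that month's values."""
    by_month = defaultdict(list)
    for _year, month, value in records:
        by_month[month].append(value)
    climatology = {month: mean(values) for month, values in by_month.items()}
    return [(y, m, v - climatology[m]) for y, m, v in records]


# Invented monthly precipitation values (hypothetical, not Trends data).
records = [
    (2000, 1, 10.0), (2000, 2, 20.0),
    (2001, 1, 14.0), (2001, 2, 26.0),
    (2002, 1, 18.0), (2002, 2, 32.0),
]
totals = annual_totals(records)
years = sorted(totals)
slope, intercept = linear_trend(years, [totals[y] for y in years])
anomalies = monthly_anomalies(records)
```

Whether a variable is summed or averaged depends on the variable (totals for precipitation, averages for temperature), so a real script would choose the aggregation per variable.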
12. MULTI-SITE ANALYSES
Nitrate in precipitation
Step 1. Graph similar data through time for sites with those data.
Step 2. Determine trend line by site.
13. Step 3. Compare slopes of trend lines among sites.
14. Step 4. Compare slopes spatially.
Mean change in total deposition of N in nitrate form in precipitation
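Steps 2 and 3 above can be sketched as follows, assuming each site supplies a `{year: value}` series. The site names and values are invented, and the slides do not say how slopes were compared statistically, so the ranking here is only one simple way to line sites up; it does not test whether slopes differ significantly.

```python
from statistics import mean


def ols_slope(years, values):
    """Least-squares slope of values against years (units per year)."""
    x_bar, y_bar = mean(years), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    return sxy / sxx


def slopes_by_site(series_by_site):
    """Step 2: one trend-line slope per site, from {site: {year: value}}."""
    slopes = {}
    for site, observations in series_by_site.items():
        years = sorted(observations)
        slopes[site] = ols_slope(years, [observations[y] for y in years])
    return slopes


def rank_sites(slopes):
    """Step 3: order sites from steepest decline to steepest increase."""
    return sorted(slopes.items(), key=lambda item: item[1])


# Invented values for two hypothetical sites (not Trends data).
nitrate = {
    "SITE_A": {1990: 2.0, 1991: 1.8, 1992: 1.6},
    "SITE_B": {1990: 1.0, 1991: 1.1, 1992: 1.2},
}
ranking = rank_sites(slopes_by_site(nitrate))
```

Because each site's slope is fitted to its own years, sites with different record lengths can still be compared, although slopes from very short records deserve caution. Step 4 would attach each slope to the site's coordinates for mapping.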
15. Challenges, solutions, and opportunities
- Obtaining data
- Quality and quantity of data and documentation
- Utilizing data toward specific goals
- Properly documenting received data and products derived from the data
- Making final products accessible to editorial committee and available on website
16. Obtaining data: a time-intensive and inconsistent process on both sides!
- Located data on individual websites
  - Few had their long-term data separated out from short-term data
  - Unable to search for long-term data
- Utilized Metacat via LTER, KNB, Morpho
  - Slow search engine
  - Unable to search for particular record lengths
  - Unable to sort filtered records by time
  - Metadata often available without attached data files
- No pre-knowledge of types of available long-term data beyond basics (precip, temp, etc.)
- Result: a lot of emails and phone calls!
18. Quality and quantity of data and documentation
- Lots of great data, varied level of detailed metadata in text or EML format
- Small problems with single datasets → large problems with many datasets
- Online data sometimes not quality-checked or ready for use, but no markers to say so
- Examples
19. Looks nice, but...
21. The nit-picky details
- Dates as an example
  - 2-digit years
  - range of dates in single cell (e.g., 02/01-03/2006 or 02/01/2006,02/03/2006)
  - date with a letter appended to the end (e.g., 02/01/1999A)
  - single-digit day and month, especially when there are no delimiters between month, day, and year
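The date problems listed above can be handled with a small, defensive parser. The sketch below is hypothetical: it assumes US month/day/year order, treats a range such as 02/01-03/2006 as a first and last day, assumes the appended letter is a flag to keep alongside the date, and uses an arbitrary pivot for 2-digit years that would need confirming with each site. Undelimited dates are rejected rather than guessed.

```python
import re
from datetime import date

# month / first day [- last day] / 2- or 4-digit year
_DATE = re.compile(r"(\d{1,2})/(\d{1,2})(?:-(\d{1,2}))?/(\d{2}|\d{4})")


def expand_year(year, pivot=30):
    """Map a 2-digit year to 4 digits; the pivot of 30 is an assumption."""
    if year >= 100:
        return year
    return 1900 + year if year >= pivot else 2000 + year


def parse_cell(cell, pivot=30):
    """Return (dates, flag) for one spreadsheet cell holding date text."""
    cell = cell.strip()
    flag = None
    trailing = re.fullmatch(r"(.*\d)([A-Za-z])", cell)
    if trailing:  # e.g. 02/01/1999A -> keep "A" as a flag
        cell, flag = trailing.group(1), trailing.group(2)
    dates = []
    for part in cell.split(","):
        match = _DATE.fullmatch(part.strip())
        if match is None:
            # Includes undelimited forms, which cannot be split safely.
            raise ValueError(f"ambiguous or unsupported date: {part!r}")
        month, first, last, year = match.groups()
        year = expand_year(int(year), pivot)
        dates.append(date(year, int(month), int(first)))
        if last is not None:
            dates.append(date(year, int(month), int(last)))
    return dates, flag
```

Raising an error on anything unrecognized is deliberate: a date that cannot be parsed unambiguously is better sent back to the data provider than silently misread.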
22. Preferred data formats for synthesis
- Simple ASCII delimited with commas, spaces, tabs, etc., with headers, or very simple Excel spreadsheets. If fixed-width, give widths and spaces.
- Metadata in separate file
- All data in single file, not separated by year. If not possible, each file in exactly the same format.
- Complex formatting systems, like multiple sheets or several tables in one sheet, are more difficult to interpret and extract information from.
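The "each file in exactly the same format" request matters because per-year files are usually stitched together by script. A minimal sketch, assuming comma-delimited text with a header row; the file names and column names are invented for illustration:

```python
import csv
import io


def combine_yearly_files(file_texts):
    """Concatenate delimited files that share one header; refuse mismatched layouts.

    file_texts maps a file name to that file's text content.
    """
    header, rows = None, []
    for name, text in file_texts.items():
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader)
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError(f"{name}: header {file_header} differs from {header}")
        rows.extend(row for row in reader if row)  # skip blank lines
    return header, rows


# Hypothetical per-year files with identical layouts.
files = {
    "precip_2005.csv": "date,precip_mm\n2005-01-01,0.0\n",
    "precip_2006.csv": "date,precip_mm\n2006-01-01,1.2\n",
}
header, rows = combine_yearly_files(files)
```

If one year's file renames or reorders a column, the function stops instead of silently misaligning values.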
24. Utilizing data toward specific goals
- Selected variables with specified summary time spans (monthly or yearly) and specified units.
- Converting short time scales to longer time scales: OK
- Converting long time scales to shorter: impossible
- Unit conversion often simple
  - °F → °C
  - W/m² → MJ/m²
- Can be really difficult
  - Flow in m from a weir → m³/s using weir dimensions
  - Raw shield count data without calibrations given → moisture: impossible
- Missing data leads to bias in particular months/years, especially with totals.
- Lots of consultation with metadata and PIs. What happens when metadata is incomplete and PIs are unavailable?
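The simple conversions and the missing-data caution above can be sketched as follows. The function names and the `max_missing` threshold are assumptions for illustration; the slides do not state what missing-data rule the project applied. The W/m² conversion assumes the input is a mean irradiance over a known period (one day by default).

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def mean_w_m2_to_mj_m2(mean_w_m2, seconds=86_400):
    """Convert mean irradiance (W/m2) over a period to energy (MJ/m2).

    1 W = 1 J/s, so energy = power * seconds; divide by 1e6 for MJ.
    """
    return mean_w_m2 * seconds / 1e6


def monthly_total(daily_values, max_missing=0):
    """Sum daily values, or return None when too many days are missing.

    Missing days are represented as None. A total computed over an
    incomplete month is biased low, so by default any gap rejects the month.
    """
    missing = sum(value is None for value in daily_values)
    if missing > max_missing:
        return None
    return sum(value for value in daily_values if value is not None)
```

For averages rather than totals, a few missing days bias the result less, so a looser threshold may be reasonable; that choice belongs in the derived product's metadata either way.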
26. Properly documenting received data and products derived from the data
- Morphing system
  - Hierarchical folder system with emails
  - Attempted EML documentation. Help from NCEAS.
  - Concurrent Versions System (CVS) and multipurpose SQL Server / MySQL database
- Documentation of deriving data and graphs
  - EML template
  - Scripts
  - Metacat (versioning)
28. Trends editorial page: jornada-www.nmsu.edu
29. Voting page
30. Trends IM meeting, 15 min breakout
- Site involvement/commitment to Trends
  - Within site
    - Percentage of IM time/resources spent compared to PIs
    - Percentage of time/resources spent on Trends compared to time spent on site needs
    - Too much, enough, too little?
  - Among sites
    - Has there been communication between sites about Trends data requests?
    - Has Trends triggered any new collaborations or strengthened old ones?
- Communication
  - Progress reports often and/or adequate enough?
  - Recommendations for further communications
31. Trends IM meeting, 15 min breakout
- Keeping track of data use and proper citation
  - Now (by the Trends project itself)
  - In the future via the website
32. Trends IM meeting, 15 min breakout
- International site involvement
  - Interest in Trends project: how can ILTER sites use the current set of data in their own research?
  - Reasons pro and con for initiating a similar effort among ILTER sites
  - What would it take to do a Trends-like project at the international level?
  - List of contacts