Title: A View of the Ameriflux Data June ORNL Download
1A View of the Ameriflux Data June ORNL Download
- 19 July 2006
- Catharine van Ingen
- Microsoft Research E-Science Group
2Outline
- Intentions and goals
- Brief overview of the data downloaded
- Various views of the data
- Available data (by count) viewed by site, type, etc.
- Why data curation and data analysis are intertwined
3Intentions and Goals
- Give the computing team an overview of the data
- We're trying to understand performance, schema, and web page design
- Give some examples of what the tools can do
- There are many more possibilities
- Have a little fun with numbers
4Ameriflux Overview
- 149 Sites across the Americas
- Each site reports a minimum of 22 common measurements.
- Communal science: each principal investigator acts independently to prepare and publish data.
- Data published to and archived at Oak Ridge.
- Total data reported to date on the order of 110M half-hourly measurements.
- http://public.ornl.gov/ameriflux/
5June ORNL Download Overview
- Data automatically downloaded on June 16, 2006 from http://cdiac.esd.ornl.gov/programs/ameriflux/data_system/aamer.html
- 61 sites reporting data
- 627 unique measurement column headings
- 110M valid measurements
- Early data checking performed
- Data are always valid single-precision floating-point numbers
- Data gaps indicated by -9999., -99999., 9999.
- One and only one measurement from the same site at the same time unless identified as a repeat measurement
- Data-type-specific range and other sanity checks pending
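The early checks above can be sketched in a few lines of Python. This is only an illustration, not the tooling actually used: it assumes the three gap markers listed on the slide are exact sentinel values and that each measurement arrives as a raw text field. The function names are hypothetical.

```python
import math

# Gap markers as listed on the slide; treating them as exact sentinels is an assumption.
GAP_SENTINELS = {-9999.0, -99999.0, 9999.0}

def is_valid(value):
    """True if value is a finite number that is not one of the gap markers."""
    return math.isfinite(value) and value not in GAP_SENTINELS

def count_valid(raw_fields):
    """Count the non-gap measurements in a list of raw text fields."""
    count = 0
    for field in raw_fields:
        try:
            value = float(field)
        except ValueError:
            continue  # not a number at all, so it fails the early check
        if is_valid(value):
            count += 1
    return count

sample = ["21.5", "-9999.", "9999.", "-99999.", "abc", "0.0"]
print(count_valid(sample))
```

Note that 9999. is a positive marker, so a range check per data type would still be needed to separate gaps from genuinely large values.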
6June ORNL Measurement Classification
- The discovered column headings are represented as
- {datumtype}{repeat}_{offset}_{offset}{extended datumtype}{units}
- Datumtype: the short (fewer than 16 characters) name for the data.
- Examples: TA, PREC, or LE.
- Repeat: an optional number indicating that multiple measurements were taken at the same site and offset.
- Example: TA2.
- _offset_offset: major and minor part of the z offset.
- Examples: SWC_10 (SWC at 10 cm) or TA_10_7 (TA at 10.7 m).
- Extended datumtype: any remaining column text.
- Examples: fir, E, sfc.
- Units: measurement units (should only be one!)
- Examples: w/m2 or deg C.
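As a rough illustration of the classification, the sketch below splits a column name into datumtype, repeat, offset, and extended datumtype. It is not the parser used for the download. The list of known datumtypes is a small hypothetical subset, units are left out because the slide does not say how they are delimited, and unknown names fall back to "Other" with the full text as the extended datumtype.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical subset of datumtype names; the real list is not given on the slide.
KNOWN_DATUMTYPES = ("PREC", "SWC", "CO2", "TA", "LE", "Rn")

# Optional repeat digits, optional _major[_minor] offset, then any remaining text.
_REST = re.compile(
    r"^(?P<repeat>\d+)?"
    r"(?:_(?P<major>\d+)(?:_(?P<minor>\d+))?)?"
    r"(?P<ext>.*)$"
)

class Heading(NamedTuple):
    datumtype: str
    repeat: int               # 0 means no repeat number was present
    offset: Optional[float]   # major.minor, or None if no offset was present
    extended: str

def parse_heading(name: str) -> Heading:
    # Try longer names first so the longest known name wins on a shared prefix.
    for dt in sorted(KNOWN_DATUMTYPES, key=len, reverse=True):
        if name.startswith(dt):
            m = _REST.match(name[len(dt):])
            repeat = int(m["repeat"]) if m["repeat"] else 0
            offset = None
            if m["major"] is not None:
                offset = float(f"{m['major']}.{m['minor'] or 0}")
            return Heading(dt, repeat, offset, m["ext"].lstrip("_"))
    # Unknown name: keep the whole text as the extended datumtype.
    return Heading("Other", 0, None, name)

for column in ("TA2", "SWC_10", "TA_10_7"):
    print(column, parse_heading(column))
```

The prefix match is deliberately naive. A real parser would need the full datumtype list and a rule for headings where the offset follows the extended text.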
7What the Classification Means
- Both the datumtype and extended datumtype are sometimes necessary to uniquely name some measurement types
- Only 37 datumtypes are currently used
- These account for 94% of all data
- "Other" is a catch-all for all others; the extended datumtype is used to differentiate them
- Examples are Albedo or NEEP
- The extended datumtype also modifies the datumtype
- Examples are LE_actual and LE_potential or U_x, U_y, U_z.
- Each of datumtype, extended datumtype, repeat and offset may be necessary to uniquely specify a specific measurement
- Examples are SWC4_10, Rn2_sfc, FC_WPL_LE_4_65
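A small sketch of why all four parts may be needed as a key. The records are hand-parsed for illustration, so the field values are assumptions and not taken from the download. Grouping by fewer fields makes distinct headings collide.

```python
from collections import defaultdict

# Hand-parsed illustrative records: (datumtype, repeat, offset, extended datumtype).
# The field values are assumptions made for this example.
columns = {
    "LE_actual":    ("LE", 0, None, "actual"),
    "LE_potential": ("LE", 0, None, "potential"),
    "SWC_10":       ("SWC", 0, 10.0, ""),
    "SWC4_10":      ("SWC", 4, 10.0, ""),
}

def collisions(key_fields):
    """Group headings by the chosen field indexes; return groups with more than one heading."""
    groups = defaultdict(list)
    for heading, record in columns.items():
        groups[tuple(record[i] for i in key_fields)].append(heading)
    return [sorted(names) for names in groups.values() if len(names) > 1]

print(collisions((0,)))           # datumtype only
print(collisions((0, 2, 3)))      # everything except repeat
print(collisions((0, 1, 2, 3)))   # full key
```

With the full key no headings collide. Dropping the repeat merges SWC_10 with SWC4_10, and using the datumtype alone also merges the two LE variants.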
8And now for some plots
9Claimers and Disclaimers
- Y-axis is always the count of available (non-gap) data measurements
- X-axis and color used to differentiate two other attributes
- Often too many attributes to use color to distinguish reliably
- Goal is to look at the entire data set rather than to show specific details
- Some blurry rendering due to conversions and/or too many axis legends or plotted values
- Detailed drilldown with fewer contributing attributes or sites or ? is possible
- Some cut-and-paste errors may occur
- This isn't science or computer science, but a step to information science
- Comments in blue are what a non-scientist sees in the figures
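The plotting convention above (count of non-gap values on the Y-axis, one attribute on the X-axis, a second as color) amounts to a count grouped by two attributes. A minimal sketch, assuming each measurement is a dict with a "value" key plus attribute keys. The record layout and the site names are hypothetical, and this is not the tool that produced the plots.

```python
from collections import Counter

# Gap markers from the download overview slide.
GAP_SENTINELS = {-9999.0, -99999.0, 9999.0}

def availability(records, x_attr, color_attr):
    """Count non-gap values for each (x, color) pair."""
    counts = Counter()
    for rec in records:
        if rec["value"] not in GAP_SENTINELS:
            counts[(rec[x_attr], rec[color_attr])] += 1
    return counts

# Made-up records for two hypothetical sites.
records = [
    {"site": "SiteA", "year": 2004, "value": 1.5},
    {"site": "SiteA", "year": 2004, "value": -9999.0},  # gap, not counted
    {"site": "SiteB", "year": 2004, "value": 2.0},
    {"site": "SiteA", "year": 2005, "value": 0.3},
]

by_year_and_site = availability(records, "year", "site")
for (year, site), n in sorted(by_year_and_site.items()):
    print(year, site, n)
```

Swapping the two attribute names gives the "by site, colored by year" view from the same data.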
10Total Data Available by Year Colored by Site
Overall number of sites and data taken is growing
11Total Data Availability by Month Colored by Site
While there is some tendency away from taking
data in the winter, most sites report throughout
the year
12Total Data Availability by Site Colored by Year
Many sites come and go after 4-5 years
13Total Data Availability by Site Colored by Type
Sites report more data either because of
longevity or ?
14Total Data Availability by Type Colored by Site
Data type reporting is far from uniform across
type
15Other Data Types Colored by Site
There is a long tail in the "other" extended types. How generally useful are these?
16Other Data Types Colored by Extended Type
A very few sites account for the vast bulk of
other extended data types
17Non-Zero Repeat Counts by Site Colored by
Repeat Count
A very few sites account for the vast bulk of
repeated measurements
18Non-Zero Repeat Counts by Type Colored by Repeat Count
Are repeats just another way of reporting a
different offset?
19Non-Zero Offsets by Type Colored by Offset Magnitude
Measuring soil properties (SWC, TA) at different
offsets must be important for science. Measuring
others (TA, CO2, RH) may be important or just
convenient?
20Non-Zero Offsets by Offset Colored by Type
Soil property measurements tend to be reported at
common offsets.
21Extended Data Type Colored by Major Data Type
The most common extended data type (_cum) is just a derived value (cumulative) and affects only PREC (rainfall). Are PAR_OUT, Rg_OUT and Rgl_OUT similar? Some extended data types are just unit conversion issues.
22Why data curation and data analysis are intertwined, or, now for something fun
Thanks to Gretchen Miller (Gmiller_at_berkeley.edu) for the idea
23Average Reported Temperature by Latitude
What's going on at higher latitudes? (It should be getting colder.)
24Data Availability by Month at Higher Latitudes
Colder month data is missing at northern sites!
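The effect on the last two slides can be reproduced with made-up numbers. The sketch below uses invented monthly temperatures for a hypothetical northern site. It only shows how averaging the reported months alone inflates the mean when winter data is missing, and it does not use any Ameriflux values.

```python
from statistics import mean

# Made-up monthly mean temperatures (deg C) for a hypothetical northern site, Jan..Dec.
monthly_temp = [-20, -18, -10, 0, 8, 14, 17, 15, 8, 0, -10, -18]

def reported_mean(temps, reported_months):
    """Average only the months (1-12) for which data was reported."""
    return mean(temps[m - 1] for m in reported_months)

full_year = reported_mean(monthly_temp, range(1, 13))
summer_only = reported_mean(monthly_temp, range(5, 10))  # May through September

print(f"all 12 months reported: {full_year:.1f} deg C")
print(f"only May-Sep reported:  {summer_only:.1f} deg C")
```

A site that reports only its warm months looks far warmer than it is, which is why the availability check has to come before the average.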
25Questions, Comments, Suggestions to
bwc-tci_at_lists.berkeley.edu