Title: LTER EML Best Practices
1. LTER EML Best Practices
- Second KNB Data Management Workshop
- 2-4 February 2005
- Mark Servilla
- LTER Network Office
- University of New Mexico, Albuquerque
2. Agenda
- Introduction
- Goals and Motivation: Why EML Best Practices?
- LTER Metadata Tiers and recommended EML elements
- Additional Recommendations
3. Introduction
4. Goals and Motivation
Why do we need an EML Best Practices document?
- Guidelines to achieve the following goals:
- Maximize interoperability of LTER EML documents to facilitate data synthesis
- Minimize heterogeneity of LTER EML documents to simplify development and re-use of software tools and style sheets
- Identify useful subsets of the EML to support specific functionality tiers targeted by the LTER NIS Advisory Committee (NISAC)
- Provide guidance to sites in their initial implementation of EML, and a roadmap for improving their implementation to achieve higher functionality
5. Information Entropy over Time
entropy: a process of degradation or running down or a trend to disorder (Merriam-Webster)
6. LTER Tiered Trajectory for Metadata
7. Best Practices: Metadata Completeness
- Identification
- Discovery
- Evaluation
- Access
- Integration
- Semantic Use
8. Level 1 - Identification
- Description: Minimum content for adequate data set discovery
- Major Elements Added:
- Title
- Creator
- Contact
- Publisher
- Publication Date
- Keywords
- Abstract
- Dataset/distribution (i.e. URL for dataset
information)
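The Level 1 list above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the mapping from the slide's element names to EML paths (`pubDate`, `keywordSet/keyword`, and so on) and the function name `missing_level_1` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the Level 1 names on the slide to element paths
# relative to <dataset>; adjust it to the EML version in use.
LEVEL_1_PATHS = {
    "Title": "title",
    "Creator": "creator",
    "Contact": "contact",
    "Publisher": "publisher",
    "Publication Date": "pubDate",
    "Keywords": "keywordSet/keyword",
    "Abstract": "abstract",
    "Dataset/distribution": "distribution",
}


def missing_level_1(eml_text):
    """Return the Level 1 names that are absent from the first <dataset>."""
    root = ET.fromstring(eml_text)
    dataset = root if root.tag == "dataset" else root.find("dataset")
    if dataset is None:
        return list(LEVEL_1_PATHS)
    return [name for name, path in LEVEL_1_PATHS.items()
            if dataset.find(path) is None]


# Hypothetical, deliberately incomplete document used only for this sketch.
SAMPLE = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.1"
                     packageId="knb-lter-fls.1.1" system="FLS">
  <dataset>
    <title>Long-term Ground Arthropod Monitoring Dataset</title>
    <creator><organizationName>FLS LTER</organizationName></creator>
  </dataset>
</eml:eml>"""

print(missing_level_1(SAMPLE))
```

A site could run a check like this before submitting a document, then extend the mapping for higher levels.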
9. Level 1 Code Example
<?xml version="1.0" encoding="UTF-8"?>
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.1"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="eml://ecoinformatics.org/eml-2.0.1
                             http://someserver.fls.edu/eml-2.0.1/eml.xsd"
         packageId="knb-lter-fls.1.1" system="FLS" scope="system">
  <dataset id="FLS-1" system="FLS">
    <alternateIdentifier>FLS-1</alternateIdentifier>
    <shortName>Arthropods</shortName>
    <title>
      Long-term Ground Arthropod Monitoring Dataset at
      Silver City, NM USA from 1998 to 2004
    </title>
    . . .
  </dataset>
10. Level 1 Code Example cont.
<creator id="pers-1" system="FLS">
  <individualName>
    <givenName>John</givenName>
    <surName>Ecologist</surName>
  </individualName>
  <organizationName>FLS LTER</organizationName>
  <address id="addr-1" system="FLS">
    <deliveryPoint>Department of Ecology</deliveryPoint>
    <deliveryPoint>University of New Mexico</deliveryPoint>
    <deliveryPoint>PO Box 1234</deliveryPoint>
    <city>Albuquerque</city>
    <administrativeArea>NM</administrativeArea>
    <postalCode>87131-1234</postalCode>
  </address>
  <phone phonetype="voice">(505) 999-9999</phone>
  <electronicMailAddress>jeco@unm.edu</electronicMailAddress>
  <onlineUrl>http://www.unm.edu/jeco</onlineUrl>
</creator>
11. Level 2 - Discovery
- Description: Level 1 content, plus coverage information to support targeted searches
- Major Elements Added:
- Geographic Coverage
- Taxonomic Coverage
- Temporal Coverage
12. Level 2 Code Example
<coverage>
  <geographicCoverage>
    <geographicDescription>
      Silver City, NM USA
    </geographicDescription>
    <boundingCoordinates>
      <westBoundingCoordinate>-112.373634</westBoundingCoordinate>
      <eastBoundingCoordinate>-111.612936</eastBoundingCoordinate>
      <northBoundingCoordinate>33.708829</northBoundingCoordinate>
      <southBoundingCoordinate>33.298975</southBoundingCoordinate>
      <boundingAltitudes>
        <altitudeMinimum>304</altitudeMinimum>
        <altitudeMaximum>627</altitudeMaximum>
        <altitudeUnits>meter</altitudeUnits>
      </boundingAltitudes>
    </boundingCoordinates>
  </geographicCoverage>
  ...
13. Level 2 Code Example cont.
  ...
  <temporalCoverage>
    <rangeOfDates>
      <beginDate>
        <calendarDate>1998-11-12</calendarDate>
      </beginDate>
      <endDate>
        <calendarDate>2003-12-31</calendarDate>
      </endDate>
    </rangeOfDates>
  </temporalCoverage>
  <taxonomicCoverage>
    <generalTaxonomicCoverage>
      Orthopteran insects (grasshoppers) were identified using
      the 2004 BigKey to Orthoptera
    </generalTaxonomicCoverage>
    <taxonomicClassification>
      <taxonRankName>Kingdom</taxonRankName>
      <taxonRankValue>Animalia</taxonRankValue>
      ...
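Coverage metadata only supports targeted searches if the values are internally consistent. The sketch below is an illustration rather than anything from the slides: it checks a `<coverage>` fragment for out-of-range coordinates, inverted latitude bounds, and an inverted date range. The function name `check_coverage` is hypothetical, and the code assumes decimal-degree coordinates and full `YYYY-MM-DD` calendar dates.

```python
import xml.etree.ElementTree as ET
from datetime import date


def check_coverage(coverage_xml):
    """Return a list of human-readable problems in a <coverage> fragment."""
    coverage = ET.fromstring(coverage_xml)
    problems = []

    box = coverage.find("geographicCoverage/boundingCoordinates")
    if box is not None:
        # Assumes all four coordinates are present as decimal degrees.
        west, east, north, south = (
            float(box.findtext(side + "BoundingCoordinate"))
            for side in ("west", "east", "north", "south")
        )
        if not all(-180 <= lon <= 180 for lon in (west, east)):
            problems.append("longitude outside -180..180")
        if not all(-90 <= lat <= 90 for lat in (north, south)):
            problems.append("latitude outside -90..90")
        if south > north:
            problems.append("south bound is north of north bound")

    dates = coverage.find("temporalCoverage/rangeOfDates")
    if dates is not None:
        # Assumes full ISO dates; year-only values would need extra handling.
        begin = date.fromisoformat(dates.findtext("beginDate/calendarDate").strip())
        end = date.fromisoformat(dates.findtext("endDate/calendarDate").strip())
        if begin > end:
            problems.append("beginDate is after endDate")

    return problems


# Values copied from the Level 2 example; the fragment is closed here so it parses.
SAMPLE_COVERAGE = """<coverage>
  <geographicCoverage>
    <geographicDescription>Silver City, NM USA</geographicDescription>
    <boundingCoordinates>
      <westBoundingCoordinate>-112.373634</westBoundingCoordinate>
      <eastBoundingCoordinate>-111.612936</eastBoundingCoordinate>
      <northBoundingCoordinate>33.708829</northBoundingCoordinate>
      <southBoundingCoordinate>33.298975</southBoundingCoordinate>
    </boundingCoordinates>
  </geographicCoverage>
  <temporalCoverage>
    <rangeOfDates>
      <beginDate><calendarDate>1998-11-12</calendarDate></beginDate>
      <endDate><calendarDate>2003-12-31</calendarDate></endDate>
    </rangeOfDates>
  </temporalCoverage>
</coverage>"""

print(check_coverage(SAMPLE_COVERAGE))
```

West and east are not compared with each other, because a bounding box that crosses the antimeridian can legitimately have west greater than east.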
14. Level 3 - Evaluation
- Description: Level 2 content, plus data set details to enable end-user evaluation of the methodology and data entities
- Major Elements Added:
- Intellectual Rights
- Project
- Methods
- Data Table/Entity Group
- Data Table/Attributes (constrained by current version of EML)
15. Level 3 Code Example
<intellectualRights>
  <section>
    <para>
      The dataset is released to the public and
      may be used for academic or commercial purposes
      subject to the following restrictions:
    </para>
    <para>
      <itemizedlist>
        <listitem>
          <para>
            LTER will make every effort possible
            to control and document the quality of
            the data it publishes. Data are made
            available "as is"...
          </para>
        </listitem>
        ...
      </itemizedlist>
      ...
16. Level 3 Code Example cont.
...
<project>
  <title>Fictitious LTER Site (FLS) permanent monitoring program</title>
  <personnel id="pers-30" system="FLS">
    <individualName>
      <salutation>Dr.</salutation>
      <givenName>Eva</givenName>
      <surName>Scientist</surName>
    </individualName>
    <address>
      <references>addr-1</references>
    </address>
    <role>principalInvestigator</role>
  </personnel>
  <abstract>
    <para>
      The FLS basic monitoring program consists of monitoring of
      arthropod populations, plant net primary productivity, and bird
      populations. Monitoring takes place at 3 sites, 4 times a year.
      Climate parameters are continuously measured at all stations.
    </para>
  </abstract>
</project>
17. Level 3 Code Example cont.
<methods>
  <methodStep>
    <description>
      <para>
        FLS Protocol for Surveying Ground Arthropods has been...
      </para>
    </description>
    <protocol>
      <title>
        FLS Protocol for Surveying Ground Arthropods
      </title>
      <creator>
        <references>pers-1</references>
      </creator>
      <pubDate>2000-02-23</pubDate>
      <abstract>
        <para>
          This protocol is being used by FLS arthropod...
        </para>
      </abstract>
      <keywordSet>
        <keyword keywordType="theme">Ecology</keyword>
        ...
      </keywordSet>
      <distribution>
        <online>
          <url>http://fls.univ.edu/protocols/arthro.html</url>
        </online>
      </distribution>
    </protocol>
  </methodStep>
  ...
18. Level 3 Code Example cont.
  <methodStep>
    <instrumentation>
      SBE MicroCAT 37-SM (S/N 1790)
      manufacturer: Sea-Bird Electronics (model 37-SM MicroCAT)
      parameter: Conductivity (accuracy 0.0003 S/m, readability 0.00001 S/m, range 0 to 7 S/m)
      last calibration: Feb 28, 2001
    </instrumentation>
    <instrumentation>
      SBE MicroCAT 37-SM (S/N 1790)
      manufacturer: Sea-Bird Electronics (model 37-SM MicroCAT)
      parameter: Pressure (water) (accuracy 0.2m, readability 0.0004m, range 0 to 20m)
      last calibration: Feb 28, 2001
    </instrumentation>
    <instrumentation>
      SBE MicroCAT 37-SM (S/N 1790)
      manufacturer: Sea-Bird Electronics (model 37-SM MicroCAT)
      parameter: Temperature (water) (accuracy 0.002C, readability 0.0001C, range -5 to 35C)
      last calibration: Feb 28, 2001
    </instrumentation>
  </methodStep>
  ...
</methods>
19. Level 3 Code Example cont.
...
<dataTable>
  <entityName>arthro_hab</entityName>
  <entityDescription>
    Habitat description for the sampling locations
  </entityDescription>
  <attributeList>
    <attribute>
      <attributeName>temp</attributeName>
      <attributeDefinition>Water Temperature</attributeDefinition>
      <storageType>float</storageType>
      <measurementScale>
        <interval>
          <unit>
            <standardUnit>celsius</standardUnit>
          </unit>
          <precision>0.001</precision>
          <numericDomain>
            <numberType>real</numberType>
            ...
20. Level 3 Code Example cont.
    <attribute>
      <attributeName>cond</attributeName>
      <attributeLabel>Conductivity</attributeLabel>
      <attributeDefinition>
        measured with SeaBird Electronics CTD-911
      </attributeDefinition>
      <storageType>float</storageType>
      <measurementScale>
        <ratio>
          <unit>
            <customUnit>siemensPerMeter</customUnit>
          </unit>
          <precision>0.0001</precision>
          <numericDomain>
            <numberType>real</numberType>
            <bounds>
              <minimum exclusive="false">0</minimum>
              <maximum exclusive="false">40</maximum>
            </bounds>
          </numericDomain>
        </ratio>
      </measurementScale>
    </attribute>
  </attributeList>
  ...
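The `bounds` element in the conductivity attribute is what lets software reject implausible values. As a rough illustration (not from the slides), the sketch below reads one `<numericDomain>` fragment and tests a value against it, honoring the `exclusive` flag. The function name `in_bounds` is hypothetical, and the code assumes at most one `<bounds>` element.

```python
import xml.etree.ElementTree as ET


def in_bounds(value, numeric_domain_xml):
    """Return True if value satisfies the <bounds> of a <numericDomain>."""
    domain = ET.fromstring(numeric_domain_xml)
    bounds = domain.find("bounds")  # assumption: a single <bounds> element
    if bounds is None:
        return True

    minimum = bounds.find("minimum")
    if minimum is not None:
        limit = float(minimum.text)
        exclusive = minimum.get("exclusive") == "true"
        if value < limit or (exclusive and value == limit):
            return False

    maximum = bounds.find("maximum")
    if maximum is not None:
        limit = float(maximum.text)
        exclusive = maximum.get("exclusive") == "true"
        if value > limit or (exclusive and value == limit):
            return False

    return True


# Bounds copied from the conductivity attribute in the example.
CONDUCTIVITY_DOMAIN = """<numericDomain>
  <numberType>real</numberType>
  <bounds>
    <minimum exclusive="false">0</minimum>
    <maximum exclusive="false">40</maximum>
  </bounds>
</numericDomain>"""

print(in_bounds(12.5, CONDUCTIVITY_DOMAIN))
print(in_bounds(40.1, CONDUCTIVITY_DOMAIN))
```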
21. Level 3 Code Example cont.
...
<additionalMetadata>
  <unitList>
    <unit id="siemensPerMeter" name="siemensPerMeter"
          unitType="conductance" parentSI="siemen"
          multiplierToSI="1">
      <description>
        electrical conductance of a solution (conductivity)
      </description>
    </unit>
  </unitList>
</additionalMetadata>
...
22. Level 4 - Access
- Description: Level 3 content plus data access details to support automated data retrieval
- Major Elements Added:
- Access
- Physical
23. Level 4 Code Example
<access authSystem="FLS">
  <allow>
    <principal>PUBLIC</principal>
    <permission>read</permission>
  </allow>
  <allow>
    <principal>uid=fls,o=LTER,dc=ecoinformatics,dc=org</principal>
    <permission>all</permission>
  </allow>
</access>
24. Level 4 Code Example
<dataTable>
  ...
  <physical>
    <objectName>flslter.299.1</objectName>
    <size unit="bytes">59847</size>
    <dataFormat>
      <textFormat>
        <numHeaderLines>1</numHeaderLines>
        <attributeOrientation>column</attributeOrientation>
        <simpleDelimited>
          <fieldDelimiter>,</fieldDelimiter>
        </simpleDelimited>
      </textFormat>
    </dataFormat>
    <distribution>
      <online>
        <url>http://fls.unm.edu/flslter.296.1</url>
      </online>
    </distribution>
    ...
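The `physical` description is what makes automated retrieval possible: a program can learn the delimiter and header length from the metadata instead of guessing. The following sketch is an illustration under stated assumptions, not the presenters' tooling. It parses already-downloaded text (fetching the URL is left out because the example host is fictitious), the function name `read_table` is hypothetical, and it assumes no quoted field contains a line break.

```python
import csv
import xml.etree.ElementTree as ET


def read_table(physical_xml, data_text):
    """Split data_text into rows using the <physical> text format details."""
    physical = ET.fromstring(physical_xml)
    text_format = physical.find("dataFormat/textFormat")
    header_lines = int(text_format.findtext("numHeaderLines", default="0"))
    delimiter = text_format.findtext("simpleDelimited/fieldDelimiter", default=",")
    # Drop the header lines, then let the csv module handle the delimiter.
    lines = data_text.splitlines()[header_lines:]
    return list(csv.reader(lines, delimiter=delimiter))


# Format details copied from the example; the fragment is closed here so it parses.
SAMPLE_PHYSICAL = """<physical>
  <objectName>flslter.299.1</objectName>
  <dataFormat>
    <textFormat>
      <numHeaderLines>1</numHeaderLines>
      <attributeOrientation>column</attributeOrientation>
      <simpleDelimited>
        <fieldDelimiter>,</fieldDelimiter>
      </simpleDelimited>
    </textFormat>
  </dataFormat>
</physical>"""

# Hypothetical data file contents, invented for this sketch.
SAMPLE_DATA = "site,temp\n1,12.5\n2,13.0\n"

print(read_table(SAMPLE_PHYSICAL, SAMPLE_DATA))
```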
25. Level 5 - Integration
- Description: Level 4 content plus complete attribute and quality control details to support computer-assisted data integration and re-sampling. Integration-level metadata should support computer-mediated access and processing of data, and therefore requires that all aspects of the data package be fully described.
- Major Elements Added:
- Attribute List (full descriptions)
- Measurement Scale
- Units
- Constraint
- Quality Control
26. Level 5 Code Example
...
<constraint id="pkarthro_taxa">
  <primaryKey>
    <constraintName>pkarthro_taxa</constraintName>
    <key>
      <attributeReference>dbo.arthro_taxa.taxon</attributeReference>
    </key>
  </primaryKey>
</constraint>
<constraint id="arthro_taxa.taxonNotNull">
  <notNullConstraint>
    <constraintName>arthro_taxa.taxonNotNull</constraintName>
    <key>
      <attributeReference>dbo.arthro_taxa.taxon</attributeReference>
    </key>
  </notNullConstraint>
</constraint>
...
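Constraints like these can be enforced when data are integrated. The sketch below is illustrative only: it extracts primary-key and not-null constraints from a fragment and reports the rows that violate them. The wrapper element `<entity>`, the function names, the sample rows, and the rule that the column name is the last dot-separated part of `attributeReference` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def load_constraints(xml_text):
    """Return (kind, column) pairs such as ("primaryKey", "taxon")."""
    root = ET.fromstring(xml_text)
    found = []
    for constraint in root.iter("constraint"):
        for kind in ("primaryKey", "notNullConstraint"):
            for ref in constraint.findall(kind + "/key/attributeReference"):
                # Assumption: the column is the last dot-separated component.
                found.append((kind, ref.text.strip().rsplit(".", 1)[-1]))
    return found


def violations(rows, constraints):
    """Return (row_index, kind) for every row that breaks a constraint."""
    problems = []
    for kind, column in constraints:
        seen = set()
        for index, row in enumerate(rows):
            value = row.get(column)
            empty = value is None or value == ""
            if kind == "notNullConstraint" and empty:
                problems.append((index, kind))
            elif kind == "primaryKey" and not empty:
                if value in seen:
                    problems.append((index, kind))
                seen.add(value)
    return problems


# The two constraints from the example, wrapped in a hypothetical parent element.
SAMPLE_CONSTRAINTS = """<entity>
  <constraint id="pkarthro_taxa">
    <primaryKey>
      <constraintName>pkarthro_taxa</constraintName>
      <key><attributeReference>dbo.arthro_taxa.taxon</attributeReference></key>
    </primaryKey>
  </constraint>
  <constraint id="arthro_taxa.taxonNotNull">
    <notNullConstraint>
      <constraintName>arthro_taxa.taxonNotNull</constraintName>
      <key><attributeReference>dbo.arthro_taxa.taxon</attributeReference></key>
    </notNullConstraint>
  </constraint>
</entity>"""

# Invented rows: a duplicate key at index 1 and an empty value at index 2.
ROWS = [{"taxon": "Acrididae"}, {"taxon": "Acrididae"}, {"taxon": ""}]

print(violations(ROWS, load_constraints(SAMPLE_CONSTRAINTS)))
```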
27. Level 5 Code Example
    </measurementScale>
    <method>
      <qualityControl>
        <description>
          <para>
            Passage of clouds during a profile reduces the incident
            radiation, and leads to erroneous estimates of Kd.
            Variation of incident irradiance was described in two
            ways (before binning): 1) the coefficient of variation (cv)
            over the 10m depth interval, and 2) difference...
          </para>
        </description>
      </qualityControl>
    </method>
    ...
28. Level 6 - Semantic
- Description: Level 5 content plus semantic information (currently under development by SEEK, and may require extension to the EML schema)
29. Additional Recommendations
- packageId and Metacat document naming convention
- LDAP access control in Metacat
- Organizational citation
Metacat, and by extension the Metacat harvester, rely on numerical data set ids and revision numbers for document management and synchronization.
- packageId attributes for EML contributed to the KNB Metacat should be formed as follows: knb-lter-[site].[dataset number].[revision], e.g. knb-lter-sev.187.4
In this example, knb-lter-sev is the Scope, 187 is the UniqueID, and 4 is the Revision.
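The naming convention above is easy to validate before a document is contributed. Here is a minimal sketch; the class name `PackageId`, the function name `parse_package_id`, and the assumption that site codes are lower-case letters are illustrative choices, not part of the recommendation.

```python
import re
from typing import NamedTuple


class PackageId(NamedTuple):
    scope: str
    unique_id: int
    revision: int


# Assumption: the site code consists of lower-case letters only (e.g. "sev").
_PACKAGE_ID = re.compile(r"^(knb-lter-[a-z]+)\.(\d+)\.(\d+)$")


def parse_package_id(text):
    """Split a packageId into scope, unique id, and revision."""
    match = _PACKAGE_ID.match(text)
    if match is None:
        raise ValueError("not a knb-lter packageId: " + repr(text))
    return PackageId(match.group(1), int(match.group(2)), int(match.group(3)))


print(parse_package_id("knb-lter-sev.187.4"))
```

Keeping the dataset number and revision as integers makes it straightforward to compare revisions when synchronizing documents.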
Metacat access control format conforms to the LDAP Distinguished Name concept:
<principal>uid=FLS,o=lter,dc=ecoinformatics,dc=org</principal>
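A principal in this form can be assembled and taken apart with plain string handling. The sketch below is a simplified illustration: the function names are hypothetical, the default organization and domain components are copied from the example above, and the parser does not handle escaped commas the way a full LDAP library would.

```python
def make_principal(uid, organization="lter"):
    """Build a Distinguished Name string in the form shown above."""
    return "uid={},o={},dc=ecoinformatics,dc=org".format(uid, organization)


def parse_principal(dn):
    """Split a simple Distinguished Name into (attribute, value) pairs."""
    parts = []
    for component in dn.split(","):  # naive: escaped commas are not handled
        key, _, value = component.strip().partition("=")
        parts.append((key, value))
    return parts


print(make_principal("FLS"))
print(parse_principal("uid=FLS,o=lter,dc=ecoinformatics,dc=org"))
```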
The Organization field on the Metacat query results page is populated using the first eml:eml/dataset/creator/organizationName element in the document, so it is recommended that for LTER-contributed data sets the LTER site be included as the first creator:
<organizationName>Sevilleta LTER</organizationName>
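To see why creator order matters, the lookup described above can be imitated in a few lines. This is a sketch of the described behavior, not Metacat's actual code; the function name `organization_for_results` and both sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET


def organization_for_results(eml_text):
    """Return the first dataset/creator/organizationName in document order."""
    root = ET.fromstring(eml_text)
    name = root.findtext("dataset/creator/organizationName")
    return name.strip() if name is not None else None


# Hypothetical document with the LTER site listed as the first creator.
SITE_FIRST = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.1">
  <dataset>
    <creator><organizationName>Sevilleta LTER</organizationName></creator>
    <creator><organizationName>Department of Ecology</organizationName></creator>
  </dataset>
</eml:eml>"""

# Same creators in the opposite order.
SITE_SECOND = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.1">
  <dataset>
    <creator><organizationName>Department of Ecology</organizationName></creator>
    <creator><organizationName>Sevilleta LTER</organizationName></creator>
  </dataset>
</eml:eml>"""

print(organization_for_results(SITE_FIRST))
print(organization_for_results(SITE_SECOND))
```

With the site listed second, the department name would appear in the Organization column instead of the LTER site.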
30. Credits
- James Brunt (LNO)
- Corinna Gries (CAP)
- Jeanine McGann (LNO)
- Margaret O'Brien (SBC)
- Ken Ramsey (JRN)
- Wade Sheldon (GCE)
31. Acknowledgements
This material is based upon work supported by:
- The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.
- The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.
- The Andrew W. Mellon Foundation.
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)
Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON