EaGLe: Data Archiving and Metadata - PowerPoint PPT Presentation

Provided by: CSCE99


1
EaGLe Data Archiving and Metadata
  • The EaGLe Legacy

R-8286750
2
Why archive the EaGLe data?
  • To ensure its preservation for future generations
    of scientists
  • To ensure it is broadly available for current
    scientists to use
  • To create the broadest possible public benefit
    from this taxpayer-funded program
  • To help EPA retain the data collected or
    created through its funding
  • Because we wish that earlier researchers had
    archived their data for us to use

3
EaGLe Data Committee Mission Statement
  • Develop an information management plan to archive
    EaGLe data with appropriate metadata so that EPA
    can make it readily available
  • Ensure that data usefulness outlives the EaGLe
    project (and does not require continued
    maintenance by EaGLe researchers)

4
DATA
1 Types of data
2 What to archive
3 Data objects
4 Data packages
5 What is metadata?
6 Why collect metadata?
7 The cons of metadata
8 The locs of metadata
9 Sample metadata file 1
10 Sample metadata file 2
11 Getting in gear
12 EaGLe metadata entry
13 Metadata checklist
14 Checklist continued
15 Data file formats

METADATA, EML and XML
A) Metadata standards
B) What is EML?
C) Ecological metadata
D) What good is EML?
E) EML vocabulary
F) What is XML?
G) What good is XML?
H) Do I have to learn XML?

METADATA RETRIEVAL
IJ) EIMS overview
K) Data flow: You to EIMS
L) EaGLe home page
M-N) Global search
O-S) Metadata report
T-U) Searches

COST-EFFECTIVENESS
1 Seems awfully complicated
2 How much will it cost me?
3 How long does it take?
4 What good are metadata?
5 Who needs metadata?

SECURITY ISSUES
(1) Access controls
(2) Approval process
(3) The locs of metadata

MORE INFORMATION
Optional data archival
Do NOT archive
Non-standardized metadata
EaGLe contacts
End
5
1 EaGLe Data Types
  • Geospatial Imagery
  • Genomic
  • Remote Sensing
  • Biological
  • Routine Monitoring

6
2 What data must be archived?
  • All new data created or collected using EaGLe
    funds
      • Field data
      • Genomics experiments
      • New GIS coverages
      • New remote sensing data
      • Other images, models
  • All important summary, supplemental, and
    explanatory information
      • Journal articles
      • Poster sessions
      • Presentations
      • Rules governing data QC or transforms
      • SOPs, protocols, experimental design documents,
        QA/QC documents

7
3 Types of Data Objects
  • Literature objects
      • Journal articles, bibliographies, books,
        Adobe .pdf files, etc.
  • Flat files
      • Stand-alone tables (e.g., SAS tables),
        spreadsheet data
  • Relational databases
      • Many normalized tables joined by relational rules
      • Data views and query objects that combine bits
        from separate tables
  • Graphical objects
      • Maps, photos, digital sounds, presentations, Web
        sites
  • Material objects
      • Soil samples, stained slides, microfiche,
        posters, video tapes, etc.

8
4 What is a Data Package?
  • Together, electronic data objects and their
    metadata file constitute a Data Package.
  • The metadata file is like the box, inventory tag
    and instruction manual
  • The data themselves are the content of the
    package
  • Data inventory requires good-quality metadata
  • Even material objects can have electronic metadata
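
The "box and inventory tag" role of metadata can be made concrete with a file manifest. The sketch below is a hypothetical illustration, not part of the EaGLe MEF or EIMS: it assumes a data package is a folder of files and records each file's relative name, size in bytes, and SHA-256 checksum, so a later custodian can confirm the contents have not changed.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: str) -> list[dict]:
    """Inventory every file in a (hypothetical) data package folder."""
    root = Path(package_dir)
    manifest = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        manifest.append({
            "file": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return manifest


def verify_manifest(package_dir: str, manifest: list[dict]) -> list[str]:
    """List files that are missing or whose checksum no longer matches."""
    root = Path(package_dir)
    changed = []
    for entry in manifest:
        path = root / entry["file"]
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            changed.append(entry["file"])
    return changed
```

Storing the manifest alongside the metadata file is one way to support long-term integrity checks across storage migrations.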

9
5 What's metadata?
  • Metadata means "beside the data" or "data about
    data"
  • Metadata files contain summary and reference data
    about primary data objects
  • Any information needed to identify, decode,
    interpret, track, store, locate, assign ownership
    of, or control access to a data object
  • Everyday examples
      • Library card catalogue, key to map symbols,
        checkbook register
  • Scientific metadata examples
      • Particulate matter instruments: equipment models
        and settings, detection limits, replication,
        sample handling details
      • Journal article citation, methods citation
  • Sample indented metadata

10
6 Why Collect Metadata?
  • Long-term storage
      • Keep EaGLe data safely banked for future reuse
      • Support long-term data tracking and retrieval
  • Data broadcasting
      • Publish metadata via the Environmental Research
        and Science Library (ERSL) public interface
  • Foster collaborative and cross-cutting research
      • Meta-analyses made possible: small dataset mergers
      • Cross-regional data, cross-media data
      • Longitudinal time-series analyses: data
        recombining

11
7 The "cons" of metadata
  • Content: What is in the data object?
      • Data descriptions, citation info, electronic file
        formats
  • Contacts: Who owns the data?
      • Authors, contact person, organization
  • Context: What is the provenance of the data?
      • Applicable knowledge areas, methods, project
        origins, etc.

12
8 The "locs" of metadata
  • Location
      • Where is the electronic file located?
      • What is the geographic coverage of the data
        object?
  • Locks
      • Final version (protected against inadvertent
        updates)
      • Viewing access controls
      • Editing/downloading access controls
      • Release date, expiration date

13
9 Sample Indented Metadata file
14
10 Sample Indented Metadata File 2
  • Switch to Normal view
  • Click on the icon
  • Press the Page Down key to view the PDF
  • When finished, press the ESC key to restore Normal
    view
  • Use the slide show icon to resume

15
11 Getting in Gear
  • Feb. 1, 2004: Begin metadata creation.
  • Summer 2004: Begin EaGLe data uploading.
  • Jan. 2005: EaGLe metadata completed.
  • End of no-cost extensions (early 2006): Most
    EaGLe datasets archived but password-protected.
  • Jan. 2008: Most EaGLe data released to the public.

16
12 Metadata Creation / Data Uploading
  • Metadata Entry Form (MEF)
  • Generates an EML-compliant metadata file in XML
    format
  • Automatic upload to ERSL
  • Data packages stored in EIMS repository (ERSL
    backend)
  • EaGLe Portal: intranet interface for grantees
  • Review, Approval, and Release Processes
  • Post-Release Search, Store and Update
  • Searchable Metadata Records in one area of
    EIMS/ERSL
  • Actual Datasets stored in EIMS/ERSL Repository

17
13 Metadata Checklist
  • General Information
      • Dataset Title
      • Point of Contact
      • Time period of the information contained in the
        dataset
      • Abstract (brief description) of the dataset
      • Geographic coverage of the dataset
      • Data format (e.g., shapefile, coverage,
        spreadsheet)
  • Dataset Creation
      • Formal authors
      • Others who contributed
      • Research objectives for the dataset
      • Common misinterpretations of the data, if any

18
14 Metadata Checklist (continued)
  • Dataset Contents
      • Was a georeferencing system used? If so, what is
        it?
      • What does each dataset record describe?
      • What are the attributes that describe these
        features?
      • Define each attribute and provide measurement
        units. Also provide resolution and estimated
        accuracy, if possible
      • Define or reference coded attributes (e.g., FIPS
        codes, error codes)
  • Dataset Processes
      • Citation of source of original data, if
        applicable (e.g., GIS data)
      • Types of major data processing steps
      • Detailed methodology of data collection,
        including study designs, protocols, equipment,
        analyses, etc., and any changes in data
        collection procedures during the study
      • Record any QA tests performed and their results
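
Much of the checklist can be screened automatically before submission. The sketch below assumes a metadata record held as a Python dictionary; the field names are hypothetical stand-ins for the checklist items and are not the actual MEF field names. Items the checklist marks as conditional ("if any", "if applicable", "if possible") are not treated as required.

```python
# Hypothetical field names standing in for the checklist items. Conditional
# items ("if any", "if applicable", "if possible") are deliberately not required.
REQUIRED_FIELDS = {
    "General Information": [
        "title", "point_of_contact", "time_period", "abstract",
        "geographic_coverage", "data_format",
    ],
    "Dataset Creation": ["authors", "research_objectives"],
    "Dataset Contents": ["record_description", "attributes"],
    "Dataset Processes": ["processing_steps", "methodology"],
}


def is_blank(value) -> bool:
    """Treat None, whitespace-only strings, and empty collections as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missing_items(record: dict) -> dict[str, list[str]]:
    """Return absent or blank checklist items, grouped by checklist section."""
    gaps = {}
    for section, fields in REQUIRED_FIELDS.items():
        absent = [name for name in fields if is_blank(record.get(name))]
        if absent:
            gaps[section] = absent
    return gaps


def attributes_missing_units(record: dict) -> list[str]:
    """Name each attribute lacking a definition or a unit (use e.g. 'n/a' for text fields)."""
    return [
        attr.get("name", "<unnamed>")
        for attr in record.get("attributes", [])
        if is_blank(attr.get("definition")) or is_blank(attr.get("unit"))
    ]
```

Text attributes such as site codes have no measurement unit; an explicit placeholder like "n/a" distinguishes "not applicable" from "forgotten".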

19
15 Data File Formats
Acceptable
  • Files converted into character-delimited ASCII
    files (e.g., comma-delimited .csv files)
  • jpeg, jpg, tiff, gif, img, png, geo-tiff, ecw,
    ArcView, simple html or htm, xml, LaTeX, TeX, pdf
    (method files)
  • Programs in a programming language (must have text
    support)
Unacceptable
  • Excel spreadsheets (convert to .csv)
  • Presentation files such as PowerPoint (convert to
    .pdf)
  • Word-processing files (convert to ASCII)
  • Proprietary files
  • RTF files
  • Special characters (Greek letters and other
    symbols not found in ASCII)
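
A quick pre-submission screen could sort files by extension using these lists. This is a sketch under stated assumptions: the extension spellings (`.xls`/`.xlsx` for Excel, `.ppt`/`.pptx` for PowerPoint, `.doc`/`.docx` for Word, `.shp` for ArcView) are guesses at what the slide's format names cover, and anything not listed, including proprietary formats and program files, is flagged for human review rather than rejected outright.

```python
from pathlib import PurePath

ACCEPTABLE = {
    ".csv", ".tsv", ".txt",                       # character-delimited ASCII
    ".jpeg", ".jpg", ".tif", ".tiff", ".gif", ".img", ".png", ".ecw",
    ".shp",                                       # ArcView shapefile (assumed)
    ".html", ".htm", ".xml", ".tex", ".pdf",
}
CONVERT = {
    ".xls": ".csv", ".xlsx": ".csv",              # Excel spreadsheets
    ".ppt": ".pdf", ".pptx": ".pdf",              # PowerPoint presentations
    ".doc": ".txt", ".docx": ".txt",              # word processing -> ASCII text
}
UNACCEPTABLE = {".rtf"}


def screen_file(name: str) -> str:
    """Classify a file name by extension (case-insensitive)."""
    ext = PurePath(name).suffix.lower()
    if ext in ACCEPTABLE:
        return "acceptable"
    if ext in CONVERT:
        return f"convert to {CONVERT[ext]}"
    if ext in UNACCEPTABLE:
        return "unacceptable"
    return "review"  # proprietary or unlisted formats need a human decision


def non_ascii_lines(text: str) -> list[int]:
    """1-based numbers of lines containing non-ASCII characters (e.g., Greek letters)."""
    return [i for i, line in enumerate(text.splitlines(), start=1) if not line.isascii()]
```

For example, `screen_file("summary.xlsx")` returns `"convert to .csv"`, and a header line such as `δ15N` would be reported by `non_ascii_lines`.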

20
A) Standards for Metadata Creation
  • FGDC Content Standard for Digital Geospatial
    Metadata: http://www.fgdc.gov/metadata/contstan.html,
    http://www.fgdc.gov/metadata/metadata.html
  • National Biological Information Infrastructure:
    http://www.nbii.gov/
  • Ecological Metadata Language:
    http://knb.ecoinformatics.org/software/eml
  • Knowledge Network for Biocomplexity (MORPHO):
    http://knb.ecoinformatics.org/
  • Dublin Core Metadata Element Set:
    www.dublincore.org
  • Encoded Archival Description (EAD):
    http://www.loc.gov/ead/
  • Data Documentation Initiative:
    http://www.icpsr.umich.edu/DDI/

21
B) So, what's EML?
  • Ecological Metadata Language
  • A metadata standard designed to handle
    cross-disciplinary research
  • A wrapper that holds metadata for many
    different types of primary data (geospatial,
    biological, genomic, etc.)
  • Widely accepted standard in the ecological
    communities of interest.
  • A container that meshes with other types of
    metadata standards
  • A metadata standard based on XML vocabulary.
  • An information tree that can graft on new
    branches of knowledge when they become necessary
    to the knowledge community

22
C) EML Standard for Ecological Metadata
  • Core: definitions and units of the columns
    (fields or attributes) in all data tables
  • Methods, procedures, and protocols
  • Research questions and hypotheses
  • Site selection
  • Authors, contacts, and proper citation for use
  • Sampling extent: spatial, biological, temporal
  • Sample Indented Metadata
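
To show how these elements nest, here is a sketch that builds a bare-bones EML-style document with Python's standard `xml.etree.ElementTree`. It assumes the EML 2.0.1 namespace URI and uses only a few elements (`dataset`, `title`, `creator`, `abstract`, `contact`, `dataTable`, `attributeList`); the package ID and `system` value are hypothetical. A schema-valid EML file needs more, for example measurement scales and physical format descriptions, so this is an illustration of the structure rather than a substitute for the MEF or Morpho.

```python
import xml.etree.ElementTree as ET

EML_NS = "eml://ecoinformatics.org/eml-2.0.1"  # assumed EML version
ET.register_namespace("eml", EML_NS)


def add_party(parent, tag, surname, given=None):
    """Add a creator/contact party with an individualName (givenName before surName)."""
    party = ET.SubElement(parent, tag)
    name = ET.SubElement(party, "individualName")
    if given:
        ET.SubElement(name, "givenName").text = given
    ET.SubElement(name, "surName").text = surname
    return party


def eml_skeleton(package_id, title, creator, contact, abstract, table_name, attributes):
    """Build a minimal EML-style tree; `attributes` is a list of (name, definition) pairs."""
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "local"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    add_party(dataset, "creator", *creator)
    ET.SubElement(ET.SubElement(dataset, "abstract"), "para").text = abstract
    add_party(dataset, "contact", *contact)
    table = ET.SubElement(dataset, "dataTable")
    ET.SubElement(table, "entityName").text = table_name
    attr_list = ET.SubElement(table, "attributeList")
    for name, definition in attributes:
        attr = ET.SubElement(attr_list, "attribute")
        ET.SubElement(attr, "attributeName").text = name
        ET.SubElement(attr, "attributeDefinition").text = definition
    return root
```

Calling `ET.tostring(root, encoding="unicode")` on the result yields the XML text, with the root serialized as `eml:eml`.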

23
D) What good is EML?
  • Ease of data interchange with other scientists
  • Enhances precision in data documentation
  • Forces clarity in defining measurement units,
    missing-data codes, and other interpretative codes
  • Enforces data access rules
  • Improves rapid search capability
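
Declared units and missing-data codes let software read a table without guesswork. In this sketch the attribute definitions (names, types, units, and the `NA` and `-999` codes) are hypothetical; the code parses a small CSV and turns declared missing-data codes into `None` instead of treating them as measurements.

```python
import csv
import io

# Hypothetical attribute metadata: type, unit (documentation only), missing-data codes.
ATTRIBUTES = {
    "site": {"type": str, "unit": None, "missing": {"NA"}},
    "temp": {"type": float, "unit": "degC", "missing": {"-999"}},
}


def read_with_metadata(csv_text: str) -> list[dict]:
    """Parse a CSV using the declared types; missing-data codes become None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for name, spec in ATTRIBUTES.items():
            value = raw[name].strip()
            row[name] = None if value in spec["missing"] else spec["type"](value)
        rows.append(row)
    return rows


def mean_ignoring_missing(rows: list[dict], name: str):
    """Average one attribute, skipping records flagged as missing."""
    values = [row[name] for row in rows if row[name] is not None]
    return sum(values) / len(values) if values else None
```

With the rows `A,12.5`, `B,-999`, and `NA,13.5`, the mean temperature is 13.0; read naively, the `-999` would be averaged in as a real value and drag the mean to about -324.3.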

24
E) EML Specialty Terms
Common usage → EML term
  • Field, independent variable, column name, header
    → Attribute
  • Abstract, brief, executive summary → Abstract
  • Project officer, principal investigator → Party
25
F) What is XML?
  • eXtensible Markup Language
  • A subset of Standard Generalized Markup Language
    (SGML)
  • A method for marking up plain text
  • To distinguish clearly between
      • content (text)
      • document structure (title, paragraph, line, etc.)
      • Note: textual attributes (bold, large, italic,
        etc.) are NOT included.
  • To make electronic documents readily
    machine-readable
      • Makes document structures explicit and modular
      • Permits easy transformations between document
        formats

26
G) What good is XML?
  • Allows document contents to be re-used in new
    ways
  • Allows document elements to be stored just like
    tables of numerical data
  • Enforces precise translation of document look
    and feel from one presentation mode (hard-copy)
    to another (web)
  • Transparency of markup to future readers
  • Can accommodate new kinds of text markup as needed
    (audio tags, motion tags, etc.)
  • Converts information to platform- and
    software-independent formats to maximize long-term
    utility
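
Because XML records content and structure but not appearance, the same source can be re-used in several ways. The sketch below parses a small, hypothetical XML fragment with `xml.etree.ElementTree` and produces two outputs from it: HTML for the web and (heading, text) rows that can be stored like a data table. The element names `report`, `section`, and `para` are illustrative, not part of any EaGLe schema.

```python
import xml.etree.ElementTree as ET
from html import escape

# A hypothetical marked-up document: content plus structure, no formatting.
SOURCE = """<report>
  <title>Tank temperature study</title>
  <section heading="Methods"><para>Readings taken daily.</para></section>
  <section heading="Results"><para>No tank exceeded 15 degC.</para></section>
</report>"""


def to_html(xml_text: str) -> str:
    """Render for the web; the look is chosen here, not stored in the source."""
    root = ET.fromstring(xml_text)
    parts = [f"<h1>{escape(root.findtext('title', ''))}</h1>"]
    for section in root.iter("section"):
        parts.append(f"<h2>{escape(section.get('heading', ''))}</h2>")
        for para in section.iter("para"):
            parts.append(f"<p>{escape(para.text or '')}</p>")
    return "\n".join(parts)


def to_rows(xml_text: str) -> list[tuple[str, str]]:
    """Store the same elements like a data table: one (heading, text) row per paragraph."""
    root = ET.fromstring(xml_text)
    return [
        (section.get("heading", ""), para.text or "")
        for section in root.iter("section")
        for para in section.iter("para")
    ]
```

Neither function modifies `SOURCE`; adding a third output format means writing another renderer, not re-keying the text.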

27
H) Do I have to learn XML?
  • NO!
  • The Metadata Entry Form automatically creates a
    valid XML document
  • Data entered into the form automatically follows
    the EML constraints on mandatory inclusion of
    metadata elements
  • Only system administrators and metadata
    librarians need XML expertise

28
IJ) EIMS overview
29
K) Data Flow: From You to EIMS and Back
(Diagram: EaGLe and EIMS, linked by three steps)
  • Metadata entry into the existing EaGLe system
  • Data load into EIMS
  • Data update/retrieval from the EaGLe intranet
    portal into EIMS
30
L) EaGLe Prototype Home Page
31
M) EaGLe Prototype Global Search
32
N) EaGLe Prototype Search Results
33
O) EaGLe Metadata Report
Links to headers in the Metadata Report
34
P) EaGLe Metadata Report (continued)
35
Q) EaGLe Metadata Report (continued)
36
R) EaGLe Metadata Report (continued)
37
S) EaGLe Prototype Simple Search
38
T) EaGLe Prototype Advanced Search
39
U) EaGLe Prototype Advanced Search (continued)
40
Optional Data Archival
  • Historical data owned by EaGLe researchers
  • Data used strictly for QA/QC
      • e.g., temperature of experimental tanks
  • Work that produced no analyzable data
      • Qualitative reports
      • Pilot data

41
Do NOT Archive
  • Data not owned by EaGLe researchers
  • Data already archived elsewhere
      • e.g., many GIS coverages
  • Dirty data
      • Without quality controls
      • Containing many missing values, duplicates, etc.
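
Before deciding whether a table is too "dirty" to archive, a quick screen can count exact duplicate records and missing values. The sketch assumes a CSV with a header row and treats empty cells plus a hypothetical set of missing-data codes as missing; how many is "many" is left to the researcher.

```python
import csv
import io
from collections import Counter

MISSING_CODES = {"", "NA", "-999"}  # hypothetical codes; use those declared in your metadata


def screen_table(csv_text: str) -> dict:
    """Count exact duplicate records and missing cells in a CSV with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [tuple(cell.strip() for cell in row) for row in reader if row]
    # Every copy of a record beyond the first counts as a duplicate.
    duplicates = sum(n - 1 for n in Counter(rows).values() if n > 1)
    missing = Counter()
    for row in rows:
        # Pad short rows so absent trailing cells are counted as missing.
        padded = row + ("",) * (len(header) - len(row))
        for name, cell in zip(header, padded):
            if cell in MISSING_CODES:
                missing[name] += 1
    total_cells = len(rows) * len(header)
    return {
        "rows": len(rows),
        "duplicate_rows": duplicates,
        "missing_by_column": dict(missing),
        "missing_fraction": sum(missing.values()) / total_cells if total_cells else 0.0,
    }
```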

42
Non-standardized metadata
  • Field notes
  • Marginalia
  • Large object free text fields
  • Index cards
  • Voice recordings
  • Personal communications
  • Mental notes (non-transcribed knowledge)

43
Who is working on EaGLe data archiving?
  • EaGLe data committee (EDC)
      • Valerie Brady (chair)
      • Terry Brown (GLEI)
      • Peter Noble (CEER-GOM)
      • Lexia Valdes (ACE INC)
      • Webb Sprague (PEEIR)
      • Chris Pfeiffer (ASC)
  • Environmental Information Management System
    (EIMS)
      • John Sykes (USEPA EIMS)
  • Computer Sciences Corporation (CSC)
      • Derek Lane
      • Susan Eversole
      • Steve Walata III
      • Geoff Blair
      • Wally Schwab
      • And others

44
1 Seems awfully complicated
  • ...but it's easier than statistics
  • No need to learn the whole of EML to use the
    relevant bits
  • No more complicated than programming a VCR
      • Time, Date, Channel, Skip commercials
  • Similar to writing a journal article
      • Abstract, Background, Protocol
      • Methods, Analysis, Discussion, Results
      • Caveats, Secondary analysis potential
      • Author Names, Affiliations, Bibliography
  • The EaGLe MEF or Morpho user interface allows
    production of the most useful metadata

45
2 How much does it cost to collect metadata?
  • Estimate the value of your research results
      • Total amount of research grant(s) plus 15% added
        value
  • Divide by number of years project is funded
  • Allocate 10% of the resulting $/efforts to metadata
    collection
  • Distribute amounts evenly over years; don't stint!
  • Collecting metadata at the beginning of a study
    captures important data decisions and research
    design elements
  • Use metadata collection as an ad hoc method of
    data quality control during each year of the
    study.
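
The budgeting rule reduces to a short formula: annual metadata budget = (grant total × 1.15) ÷ funded years × 0.10. The sketch below assumes the symbols lost from the slide were percent signs (15% added value, 10% allocation) and uses `Decimal` so dollar amounts stay exact; the grant figures in the example are made up.

```python
from decimal import Decimal, ROUND_HALF_UP

ADDED_VALUE = Decimal("0.15")     # assumed: "plus 15% added value"
METADATA_SHARE = Decimal("0.10")  # assumed: "allocate 10% ... to metadata collection"


def annual_metadata_budget(grant_total: Decimal, funded_years: int) -> Decimal:
    """Yearly amount to set aside for metadata collection, rounded to cents."""
    if funded_years <= 0:
        raise ValueError("funded_years must be positive")
    research_value = grant_total * (1 + ADDED_VALUE)
    per_year = research_value / funded_years
    return (per_year * METADATA_SHARE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For a hypothetical $1,000,000 grant over 4 years: 1,000,000 × 1.15 = 1,150,000; ÷ 4 = 287,500; × 0.10 = 28,750.00 per year, spent evenly in every year of the study.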

46
3 How much time is this going to take?
  • Between 8 and 40 hours per data group
      • All similar data bundled together; not a
        per-dataset cost!
      • More complex datasets take more time
      • Loading or linking to pre-written material can
        save time
  • Training for use of Metadata Entry Form
      • One-time 3-hour training session
      • Minimum 3 hours hands-on practice
      • Availability of live help during first solo MEF
        work
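
Because the 8 to 40 hour range applies per data group rather than per dataset, the estimate depends on how datasets are bundled. This sketch assumes each dataset is tagged with a hypothetical group label, and it adds the one-time 3-hour training plus the minimum 3 hours of practice for a first-time MEF user.

```python
HOURS_PER_GROUP = (8, 40)  # low and high per-group estimates from the slide
TRAINING_HOURS = 3 + 3     # one-time training session + minimum hands-on practice


def metadata_hours(dataset_groups: dict[str, str], first_time_user: bool = True) -> tuple[int, int]:
    """Return (low, high) hours; dataset_groups maps a dataset name to its data group."""
    n_groups = len(set(dataset_groups.values()))
    low, high = (n_groups * hours for hours in HOURS_PER_GROUP)
    overhead = TRAINING_HOURS if first_time_user else 0
    return low + overhead, high + overhead
```

Three datasets bundled into two groups come to 16 to 80 hours of metadata work, or 22 to 86 hours including training.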

47
4 What Good are Metadata?
  • High-quality metadata serve 5 purposes
  • Data integrity maintenance over the long term
    (20-year rule)
      • Across expected changes in data storage
        technology, compression, etc.
  • Tracking, searching for, and retrieving datasets
      • Like a library card catalogue: where to find
        data, where to shelve it
  • Scientific collaboration
      • Joint analysis and secondary analysis potential
  • Cathedral effect
      • Pooling data across regions contributes to an
        environmental big picture
      • Longitudinal studies: building science efforts
        upon a shared data foundation
  • Economical
      • Extending the shelf life of data gives taxpayers
        more return on investment

48
5 Who needs the EaGLe metadata?
Other scientists
Today's Colleagues
Scientific Collaborators
Tomorrow's meta-analysts
The next generation
Archivists
Data Librarians
The Public
Data Exchange Tools (CDX)
Citizens and Citizen Groups
Legislators and other decision-makers
49
1) Data Access and Security
  • Only registered users may enter or edit a
    metadata record
  • Record-level edit permissions required for input
    and update
  • Only registered Data Librarians can release
    records to a designated user base (Public, EPA
    Only, Group, Owner)
  • Confidential records can be restricted to a
    subset of users
      • EPA Only: accessible only to EPA-registered
        users
      • Group: accessible only to members of a specified
        group of users (including system users outside
        the EPA firewall, if necessary)
      • Owner: accessible only by the designated owner
        of the EIMS record
  • Post-release: any Internet user may view metadata
    records.
  • Separate access controls for actual datasets
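
The release levels can be modeled as a small permission check. The sketch below is hypothetical and is not the EIMS implementation: it assumes each metadata record carries an owner, an optional group, a release level (`Public`, `EPA Only`, `Group`, or `Owner`), a release date, and a set of users with record-level edit permission. It covers metadata records only, since datasets have separate controls.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

LEVELS = {"Public", "EPA Only", "Group", "Owner"}


@dataclass
class User:
    name: str
    registered: bool = False
    epa: bool = False
    data_librarian: bool = False
    groups: set = field(default_factory=set)


@dataclass
class Record:
    owner: str
    level: str = "Owner"                 # one of LEVELS
    group: Optional[str] = None          # used only when level == "Group"
    release_date: Optional[date] = None  # None: not yet released
    editors: set = field(default_factory=set)


def can_view(record: Record, user: Optional[User], today: date) -> bool:
    """Decide whether a user (None = anonymous Internet user) may see the metadata."""
    if user is not None and user.name == record.owner:
        return True
    if record.release_date is None or record.release_date > today:
        return False
    if record.level == "Public":
        return True
    if user is None or not user.registered:
        return False
    if record.level == "EPA Only":
        return user.epa
    if record.level == "Group":
        return record.group in user.groups
    return False  # "Owner": only the owner, handled above


def can_edit(record: Record, user: User) -> bool:
    """Registered users with record-level edit permission (or the owner) may edit."""
    return user.registered and (user.name == record.owner or user.name in record.editors)


def release(record: Record, user: User, level: str, on: date) -> None:
    """Only registered Data Librarians may release a record to a user base."""
    if not (user.registered and user.data_librarian):
        raise PermissionError("only registered Data Librarians can release records")
    if level not in LEVELS:
        raise ValueError(f"unknown release level: {level}")
    record.level, record.release_date = level, on
```

Keeping the release date on the record lets a librarian schedule a public release in advance, matching the plan to keep data password-protected until a later public release.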

50
Generations of Research
  • For a true confluence of research efforts,
    clarity in metadata is the key