Title: DATA SHARING ISSUES, METADATA, ARCHIVES, AND COMPREHENSION
Slide 1: DATA SHARING ISSUES, METADATA, ARCHIVES, AND COMPREHENSION
- Urgency
  - NEESgrid (www.neesgrid.org) schedule (Prudhomme and Mish, 2001):
    - Characterize the earthquake engineering community's use of data and metadata by January 2002.
    - Distribute preliminary metadata standards by May 2002.
    - Publish standards for data and metadata models and representations by September 2002.
  - Consortium Development of NEES (www.nees.org)
    - Working groups on data issues are looking for interested volunteers.
Slide 2: Identify/define uses of data and metadata
- To help me remember what I did last time
- To permit other researchers to duplicate the test
- Real-time remote PI interaction
- To allow numerical simulation
- Interactive decision making during the experiment
- Years after the test
- Automated control of the experiment
- Visualization
- Research and education, sponsors
- Data search/query/filter
- Artificial intelligence, inverse/system identification
- Software sharing by common interface (OpenSees)
Slide 3: Use of data
- Data search/query/filter
- Artificial intelligence, inverse/system identification
- Software sharing by common interface (OpenSees)
Slide 4: Experience of the geotech community
- CWRU database on element tests
- VELACS (USC)
- COSMOS and IRIS
- PEER structures databases (UCSD, UW)
- UCD (cgm.engr.ucdavis.edu)
Slide 5: Other community examples
- Atmosphere/ocean research (NCAR, NOAA, Navy)
  - Example of a flux vector interchanged between programs
  - User-specific API to interface with a black box
  - CORBA (Common Object Request Broker Architecture): a specification for objects that may be accessed from many platforms and languages (Java, Fortran, etc.)
- Fluid flow
  - Visualization code runs with the solver
  - OpenGL
  - Generic flux vector
  - Connection of mismatched meshes (regular and scattered)
  - Meshing experimental data with numerical data
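Connecting mismatched meshes, for example mapping readings from scattered sensor locations onto a regular numerical grid, can be sketched with inverse-distance weighting (IDW). IDW is swapped in here only as one simple interpolation choice; the slide does not say which method was used, and the sensor coordinates and readings below are hypothetical.

```python
import math

def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from scattered samples.

    points: list of (x, y) sensor locations; values: readings at those points.
    If `target` coincides with a sample point, that sample's value is returned.
    """
    num = 0.0
    den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return v  # exact hit: avoid division by zero
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

def scattered_to_grid(points, values, xs, ys, power=2.0):
    """Map scattered readings onto the regular grid defined by coordinate lists xs, ys.

    Returns one row per y coordinate, each row holding one value per x coordinate.
    """
    return [[idw(points, values, (x, y), power) for x in xs] for y in ys]

# Hypothetical sensor layout (m) and readings (e.g. peak acceleration, g)
sensors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
readings = [0.10, 0.20, 0.30, 0.40]
grid = scattered_to_grid(sensors, readings, xs=[0.0, 0.5, 1.0], ys=[0.0, 0.5, 1.0])
print(round(grid[1][1], 6))  # grid center is equidistant from all sensors: 0.25
```

Because each grid value is a weighted average of the measurements, it always lies between the smallest and largest reading; a production mesh coupler would more likely use the same shape functions as the numerical model.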
Slide 6: Data use and format
- Think ahead for uses
- Needs assessment
- Format changes
- Visualization of large data sets is demanding
- What is data?
- Format
- Access tools (input and output)
- Don't store the same data twice just because it exists in a different format (calibration?)
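One way to avoid storing data twice is to archive only the raw recorded voltages together with the calibration metadata, and to compute engineering units when the data are read. A minimal sketch, assuming a linear sensor whose sensitivity is given in mV/g (as in the PCB accelerometer record on the XML slide, 100 mV/g) and a hypothetical amplifier gain; the class and field names are illustrative, not a NEES schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RawChannel:
    """Raw record plus the calibration metadata needed to interpret it (hypothetical schema)."""
    raw_volts: List[float]        # voltages as recorded, after amplification
    sensitivity_mv_per_g: float   # sensor sensitivity, e.g. 100 mV/g
    amplifier_gain: float         # amplifier gain applied before recording

    def in_g(self) -> List[float]:
        """Acceleration in g, derived on demand instead of being archived a second time."""
        volts_per_g = self.sensitivity_mv_per_g * self.amplifier_gain / 1000.0
        return [v / volts_per_g for v in self.raw_volts]

ch = RawChannel(raw_volts=[0.0, 0.5, -1.0], sensitivity_mv_per_g=100.0, amplifier_gain=5.0)
print(ch.in_g())  # [0.0, 1.0, -2.0]
```

With this design, a corrected calibration factor changes only the metadata; the archived raw record never has to be rewritten.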
Slide 7: Formats, coding
Slide 8: What are the benefits of standardization?
- Knowledge of the data format at one facility is transferable to others.
  - E.g., numerical simulation of tests at CWRU and UCD.
- Training of experimenters may be transferable.
- User interfaces to databases may be shareable, so each facility may not have to develop its own interfaces independently.
  - Search, query, automated I/O, visualization
Slide 9: Barriers to standardization and how to overcome them
- Need a killer app that assumes a standard
- The gap between civil engineering and information technology
Slide 10: Killer App features
- To help me remember what I did last time: automated metadata documentation
- To permit other researchers to duplicate the test
- Real-time remote PI interaction: teleparticipation
- To allow numerical simulation
- Interactive decision making during the experiment
- Years after the test
- Automated control of the experiment
Slide 11: Killer App features (2)
- Visualization
- Data search/query/access/filter
- Web portal for all of the above?
Slide 12: Metadata Design
- Determine the structure of metadata to optimize:
  - Intuitive query language
  - Readability by computers and humans
  - Completeness without redundancy
  - Flexibility and evolution
- Curation by the NEES SI (System Integrator) and Consortium
- Write code: XML document type definitions (DTDs)
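"Completeness without redundancy" is the kind of rule a DTD content model enforces: every required element is present, and none appears twice. A minimal sketch that performs the same check in Python with the standard `xml.etree.ElementTree` module; the list of required child elements is a hypothetical choice modeled on the `Sensor` record in this talk, not a published NEES DTD.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical required children of <Sensor>, modeled on the example record.
REQUIRED = ["Type", "Manufacturer", "Model", "CalibrationDate", "Sensitivity", "Range"]

def check_sensor(xml_text):
    """Return a list of problems: missing required elements, duplicates, missing SN."""
    sensor = ET.fromstring(xml_text)
    counts = Counter(child.tag for child in sensor)
    problems = [f"missing <{tag}>" for tag in REQUIRED if counts[tag] == 0]
    problems += [f"duplicate <{tag}>" for tag, n in counts.items() if n > 1]
    if "SN" not in sensor.attrib:
        problems.append("missing SN attribute")
    return problems

# A deliberately flawed record: no calibration date, Model entered twice.
record = """<Sensor SN="PCB3245">
  <Type>Piezoelectric Accelerometer</Type>
  <Manufacturer>PCB</Manufacturer>
  <Model>352</Model>
  <Model>352</Model>
  <Sensitivity Unit="mV/g">100</Sensitivity>
  <Range>50g</Range>
</Sensor>"""
print(check_sensor(record))  # ['missing <CalibrationDate>', 'duplicate <Model>']
```

In practice a validating parser would apply the DTD itself; the point of the sketch is only what "complete" and "non-redundant" mean operationally.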
Slide 13: Strawman metadata structure
- Project Identifiers
- Catalog of Materials, Objects, Sensors and Apparatus
- Sequence of Model Test Events and Measurements
- Sensor Channel Gain Lists (1)
- Image Data
- Control Data Files
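The six-part strawman structure maps naturally onto nested records. A minimal sketch using Python dataclasses; every field name below is a hypothetical placeholder chosen to mirror the slide's section titles, not a proposed standard.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProjectIdentifiers:          # Section 1: project identifiers
    project_id: str
    title: str
    investigators: List[str] = field(default_factory=list)

@dataclass
class Catalog:                     # Section 2: catalog of physical objects
    materials: List[str] = field(default_factory=list)
    objects: List[str] = field(default_factory=list)
    sensors: List[str] = field(default_factory=list)     # e.g. serial numbers
    apparatus: List[str] = field(default_factory=list)

@dataclass
class Event:                       # Section 3: one entry in the test sequence
    name: str
    description: str

@dataclass
class ChannelAssignment:           # Section 4: one row of a sensor-channel-gain list
    sensor_sn: str
    channel: int
    gain: float

@dataclass
class ModelTest:                   # top-level record tying the sections together
    project: ProjectIdentifiers
    catalog: Catalog
    events: List[Event] = field(default_factory=list)
    channel_gain_list: List[ChannelAssignment] = field(default_factory=list)
    images: List[str] = field(default_factory=list)         # Section 5: file names/URLs
    control_files: List[str] = field(default_factory=list)  # Section 6: control data

mt = ModelTest(
    project=ProjectIdentifiers("P-001", "Hypothetical model test"),
    catalog=Catalog(sensors=["PCB3245"]),
    events=[Event("E01", "hypothetical shaking event")],
    channel_gain_list=[ChannelAssignment("PCB3245", channel=3, gain=10.0)],
)
```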
Slide 14: Discussion Items
- Philosophical issues related to the culture of data sharing?
  - The data producer should get the first shot at publication.
  - How long should we allow a data generator to ponder the data before other people can have access?
  - How do we publish electronic data?
- Give academic credit to data publishers.
Slide 15: XML
<ModelTest>
  <Catalog>
    <Sensors>
      <Sensor SN="PCB3245">
        <Type>Piezoelectric Accelerometer</Type>
        <Manufacturer>PCB</Manufacturer>
        <Model>352</Model>
        <CalibrationDate>092899</CalibrationDate>
        <Sensitivity Unit="mV/g">100</Sensitivity>
        <Range>50g</Range>
        <SensorData>http://www.pcb.com/pcb3245</SensorData>
      </Sensor>
    </Sensors>
  </Catalog>
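Because the record is plain XML, standard tools can read it. A minimal sketch with Python's `xml.etree.ElementTree` that builds a sensor table keyed by serial number; the closing `</ModelTest>` tag is added here so the fragment is well-formed, and the layout of the returned dictionary is an illustrative choice.

```python
import xml.etree.ElementTree as ET

DOC = """<ModelTest>
  <Catalog>
    <Sensors>
      <Sensor SN="PCB3245">
        <Type>Piezoelectric Accelerometer</Type>
        <Manufacturer>PCB</Manufacturer>
        <Model>352</Model>
        <CalibrationDate>092899</CalibrationDate>
        <Sensitivity Unit="mV/g">100</Sensitivity>
        <Range>50g</Range>
        <SensorData>http://www.pcb.com/pcb3245</SensorData>
      </Sensor>
    </Sensors>
  </Catalog>
</ModelTest>"""

def sensor_table(xml_text):
    """Map each sensor serial number to a dict of selected metadata fields."""
    root = ET.fromstring(xml_text)
    table = {}
    for s in root.iterfind("./Catalog/Sensors/Sensor"):
        sens = s.find("Sensitivity")
        table[s.get("SN")] = {
            "type": s.findtext("Type"),
            "model": s.findtext("Model"),
            "sensitivity": float(sens.text),
            "sensitivity_unit": sens.get("Unit"),
            "datasheet": s.findtext("SensorData"),
        }
    return table

sensors = sensor_table(DOC)
print(sensors["PCB3245"]["sensitivity"], sensors["PCB3245"]["sensitivity_unit"])  # 100.0 mV/g
```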
Slide 16: There must be convenient interfaces to complex data structures. An automatic metadata generator should do most of the work. TEDS (Transducer Electronic Data Sheets), SCEDS, and automated geometry definition will make the job doable.
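An automatic metadata generator could take a transducer's self-description, such as its TEDS, and emit the catalog entry without hand typing. A minimal sketch, assuming the TEDS contents have already been decoded into a Python dictionary with the hypothetical keys shown; decoding an actual IEEE 1451.4 TEDS is out of scope here.

```python
import xml.etree.ElementTree as ET

def sensor_element(teds):
    """Build a <Sensor> catalog entry from already-decoded TEDS fields (hypothetical keys)."""
    sensor = ET.Element("Sensor", SN=teds["serial"])
    ET.SubElement(sensor, "Type").text = teds["type"]
    ET.SubElement(sensor, "Manufacturer").text = teds["manufacturer"]
    ET.SubElement(sensor, "Model").text = teds["model"]
    ET.SubElement(sensor, "CalibrationDate").text = teds["cal_date"]
    sens = ET.SubElement(sensor, "Sensitivity", Unit=teds["sens_unit"])
    sens.text = str(teds["sensitivity"])
    return sensor

teds = {"serial": "PCB3245", "type": "Piezoelectric Accelerometer",
        "manufacturer": "PCB", "model": "352", "cal_date": "092899",
        "sensitivity": 100, "sens_unit": "mV/g"}
print(ET.tostring(sensor_element(teds), encoding="unicode"))
```

The experimenter then only adds what the transducer cannot know about itself, such as where it sits in the model.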
Slide 17: Discussion Items
- At what metadata level do we refer to other archives instead of re-archiving? Example:
  - Accelerometer amplifier gain in the archive for each test event
  - Accelerometer calibration in the test archive
  - Date and method of calibration in the facility archive
  - Cross-axis sensitivity in the manufacturer's archive
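One way to refer to other archives instead of re-archiving is to resolve each metadata item by searching the archives from the most specific level to the most general, so each fact is stored once at its natural level. A minimal sketch using Python's `collections.ChainMap`; the archive contents and key names are placeholder values mirroring the accelerometer example, not real calibration data.

```python
from collections import ChainMap

# Each archive holds only the facts that belong at its level (placeholder values).
event_archive = {"amplifier_gain": 10.0}                     # one per test event
test_archive = {"sensitivity_mv_per_g": 100.0}               # calibration used in this test
facility_archive = {"calibration_date": "092899",            # facility calibration records
                    "calibration_method": "placeholder"}
manufacturer_archive = {"cross_axis_sensitivity_pct": 5.0}   # from the manufacturer

ARCHIVES = [("event", event_archive), ("test", test_archive),
            ("facility", facility_archive), ("manufacturer", manufacturer_archive)]

# A lookup searches the maps in order, most specific (event) first.
metadata = ChainMap(*(archive for _, archive in ARCHIVES))

def source_of(key):
    """Name the archive that supplies `key`, or None if no archive has it."""
    for name, archive in ARCHIVES:
        if key in archive:
            return name
    return None

print(metadata["amplifier_gain"], source_of("calibration_date"))  # 10.0 facility
```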
Slide 18: Strawman metadata hierarchy
- Section 1 of the outline in Table 1 contains metadata associated with the research project.
- Section 2 is a catalog of physical objects used to construct or test the model. This includes the apparatus used to test the model, passive materials and markers placed in the model, and sensors used in the model tests.
Slide 19: Strawman metadata hierarchy
- Section 5 describes image data. This could include photographs, video camera data, and/or engineering drawings of the configuration.
- Section 6 describes the data required to control the experiment. These data could specify the location of a CPT sounding, the rate of penetration of a penetrometer, or command files to control a shaker.
Slide 20: Strawman metadata hierarchy
- Section 3 describes the sequencing of events. An entry in the sequence can be the measurement of the location of an object, or an event involving activation of an actuator or a penetrometer sounding.
- Section 4 contains the sensor-channel-gain lists. These document which sensors are plugged into which amplifier channels, the sequence in which the sensor data were recorded, and the parameters that define gains and filters.
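The sensor-channel-gain list is what makes a raw multiplexed record interpretable: it says which sensor each channel carries and in what order the channels were recorded. A minimal sketch, assuming samples were stored interleaved (one complete scan of all channels after another) in the order of the list, and that gains are simple multipliers; the list entries, including the serial numbers other than PCB3245, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sensor-channel-gain list, in recording (scan) order:
# (sensor serial number, amplifier channel, gain)
CHANNEL_LIST: List[Tuple[str, int, float]] = [
    ("PCB3245", 1, 10.0),
    ("PCB3246", 2, 10.0),
    ("LVDT07", 3, 1.0),
]

def demultiplex(samples: List[float], channel_list=CHANNEL_LIST) -> Dict[str, List[float]]:
    """Split an interleaved sample stream into one voltage series per sensor.

    Gains are divided out, giving the voltage at the sensor output; conversion to
    engineering units would additionally need each sensor's calibration.
    """
    n = len(channel_list)
    if len(samples) % n:
        raise ValueError("sample count is not a whole number of scans")
    series = {}
    for i, (sn, _channel, gain) in enumerate(channel_list):
        series[sn] = [v / gain for v in samples[i::n]]
    return series

stream = [1.0, 2.0, 0.3, 1.5, 2.5, 0.4]   # two scans of three channels
print(demultiplex(stream))
# {'PCB3245': [0.1, 0.15], 'PCB3246': [0.2, 0.25], 'LVDT07': [0.3, 0.4]}
```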
Slide 21: CAD of geometry and instrument location numbers
- Printable version of the report (PDF) describing the experiment, with automatically generated data time histories
- Excel spreadsheets of metadata
- ASCII data files of sensor readings during about 90 simulated earthquakes (about 1 MB each)
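Plain ASCII data files can be read by almost any tool. A minimal sketch, assuming a hypothetical layout of one header line of column names followed by whitespace-separated numeric columns with time first; the real file layout is documented in the project's metadata spreadsheet and is not reproduced here.

```python
import io

def read_ascii(f):
    """Parse a header line of column names followed by whitespace-delimited numeric rows."""
    names = f.readline().split()
    columns = {name: [] for name in names}
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        for name, value in zip(names, line.split()):
            columns[name].append(float(value))
    return columns

def peaks(columns, skip=("time",)):
    """Peak absolute value of each data column (e.g. peak acceleration per sensor)."""
    return {k: max(abs(v) for v in vals) for k, vals in columns.items() if k not in skip}

sample = io.StringIO("time A1 A2\n0.00 0.01 -0.02\n0.01 -0.15 0.05\n0.02 0.08 0.03\n")
print(peaks(read_ascii(sample)))  # {'A1': 0.15, 'A2': 0.05}
```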
Slide 22: Excel spreadsheet describing calibration factors, amplifier channel numbers, gains, data file format, ...
Slide 23: Event BV, page 3 of the PDF document: semiautomatic plot generation using a MathCAD program; central vertical array of accelerometer data
Slide 24: NEES Collaboratory
[Diagram labels: NEESgrid; System Integrator; NEES Consortium; NEES Consortium Development; Site Council; Simulation/Experimental Facilities; Site A; Site B; Site C; Other site 1; Other site 2; Earthquake Researchers; Educators; Students; Professional Engineers; Other Practitioners]
Slide 25: UC Davis Research Network
[Diagram: two prototype OLS routers, with links to Sacramento and Merced and to Berkeley and Santa Cruz]