Title: MI3 Station History Information Management System
1. MI3 Station History Information Management System
Design, Details and Directions
- Jeff Arnfield
- Station History Program Manager
- National Climatic Data Center, Asheville, NC
- National Oceanic and Atmospheric Administration
2. MI3 Presentation Roadmap
- Background
- System Overview and Walkthrough
- Enhancements Underway
- Challenges
- Questions
3. The Ultimate Metadata Problem
4. Metadata: Our Big Picture
Observing Systems
Satellite Granule
Station Histories
Datasets
Standards
Inventories
5. MI3 Goals
- Integrate, enhance and increase access
- Initial focus: manage station histories
- Widely accessible
- Support data ingest and access needs
- Accommodate NOAA and non-NOAA stations
- Contain wide variety of station details
- Handle new observing systems, programs and phenomena without recoding
- Track information sources, log all changes
- Integrate with inventories, other details
6. First Step: Document Imaging
- Much station info available only on paper
- Reviewed, collated and imaged station info documents
- 500 different forms, 50 commonly used
- 450,000 documents
- 750,000 pages
- 37,000 stations
- Images available on the web using WSSRD
- Web Store Search Retrieve Display
- Commercial service of Information Manufacturing Corporation
- Security via individual user accounts, controlled by NCDC
- Used by NOAA, state climatologists, others
- Privacy concerns limit access
- Content will be incorporated into database
7. WSSRD Station History Images
8. Starting State: Standalone Station Sets
- Variety of station repositories
- usually designed for specific task, project, system
- systems, spreadsheets, ASCII files
- Both formal and ad hoc, updated and static
- variety of information sources
- variable freshness, accuracy, detail
- some lacked historical values
- Varying accessibility and usability
- Updates did not propagate to all systems
- Systems contained conflicting values
- Seldom integrated with related information
9. The nature of the subject matter
- Subject matter is challenging
- Each station has many, many details
- Each detail may vary independently
- Each detail has its own period of validity
- Variations in station management and identification practices
- No widespread standard for handling station information
- Station practices are challenging
- Stations may participate in multiple networks and programs
- Stations may be known by different names and identifiers
- There may be multiple information sources for a given station
- Few networks provided an automated station information feed
- Frequent historical information backfill and correction
10. MI3 Design Inputs
- Existing NCDC systems
- Current and projected production requirements
- Existing and projected data holdings and sources
- Available station metadata sources
- US CRN metadata requirements team output
- NCDC subject matter expert workshops
- Projects with NOAA, national, international
partners
11. MI3 Database Design Options
- Synchronized entries in all tables when anything
changes
- Easy to find latest value
- Easier to query
- May improve performance
- Simpler to develop
- Good for batch submission
- Greater redundancy
- More difficult to detect changes
- Changes may affect many versions
- Single change could require new version
- Independent atomic values with begin and end dates
- Fewer records
- Minimizes redundancy
- Changes easy to detect
- Single update is logically propagated
- Potential performance impact
- Date management more complex
- Development more difficult
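The second option, independent atomic values with begin and end dates, can be illustrated with a minimal sketch. The class name `Fact`, the function `value_as_of`, the detail names and the sample values are all hypothetical and are not taken from the MI3 schema. End dates are assumed to be inclusive, with `None` meaning the value is still in effect.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One atomic station detail with its own period of validity."""
    station_key: int
    name: str             # hypothetical detail name, e.g. "elevation_m"
    value: str
    begin: date
    end: Optional[date]   # None means the value is still in effect


def value_as_of(facts, station_key, name, when):
    """Return the value of one detail on a given day, or None if unknown."""
    for f in facts:
        if (f.station_key == station_key and f.name == name
                and f.begin <= when and (f.end is None or when <= f.end)):
            return f.value
    return None


# Made-up sample records: a change to one detail adds one record
# and leaves every other detail untouched.
facts = [
    Fact(1, "elevation_m", "650", date(1990, 1, 1), date(1999, 12, 31)),
    Fact(1, "elevation_m", "661", date(2000, 1, 1), None),
    Fact(1, "observer", "A. Smith", date(1990, 1, 1), None),
]
```

Here `value_as_of(facts, 1, "elevation_m", date(1995, 6, 1))` picks the first elevation record. The sketch also shows the cost noted on the slide: finding a value always involves date comparisons.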
12. MI3 Development Environment
- Web-based user interface
- ColdFusion
- JavaScript
- Oracle stored procedures
- Oracle database
- Separate end-user query instance minimizes impact
on production, increases availability
13. MI3 System Organization
- Information grouped into subject areas
- Identity
- Updates
- Location
- Involved Parties
- Data Programs
- Datasets / Products
- Equipment
- Phenomena / Observing Practices
- Location Map
- Remarks
- Administrative Options
14. MI3 Capabilities and Features
- Flexible search options
- One interface for query and maintenance (privilege-based)
- Tabbed interface by subject area, easily expanded
- Overview grids showing all time periods
- Drill-down capability to an integrated form view
- Further drill-down to individual fact level
- Generates critical production reports
- Date management functions simplify views/reports
- Direct query, CGI access from other systems
15. MI3 System Security
- Initial security at the subject area tab-level
- Can configure for no access, read, read/write
- At database level, it's either read or read/write
- Security enhancements underway
- Control update privileges at the station group and data program level
- Provides more granular database security
- Users can maintain only their stations
16. MI3 Content
- Initial content ported from legacy system
- Many issues discovered and corrected
- More than 33,300 stations, including
- 27,250 Cooperative stations (11,740 currently open)
- 886 ASOS
- 76 CRN stations
- 160 RADAR sites
- 534 AWOS sites (others in process)
- About 5,200 other stations (mostly surface, includes historical)
- Associations with 17 different datasets
17. MI3 Home Page
http://mi3.ncdc.noaa.gov
18. MI3 Search Options
19. MI3 Search Operators
20. MI3 Additional Search Parameters
21. MI3 Advanced Search Options
22. MI3 Search Results
23. Station Details: Identity Grid
24. Station Details: Identity Grid
25. Station Details: Updates Grid
26. Station Details: Location Grid
27. Station Details: Location Form
28. Station Details: Location Drilldown
29. Station Details: Locking a station
30. Station Details: Location Update
31. Station Details: Personnel
32. Station Details: Data Products
33. Station Details: Equipment
34. Station Details: Equipment Update
35. Station Details: Phenomena
36. Station Details: Phenomena Update
37. Station Details: Phenomena Update
38. Station Details: Map
39. MI3 Getting Information
40. MI3 External Access via CGI
- MI3 station details window can be instantiated by any system using a CGI call to open a web browser
- MI3 opens a station details window as GUEST user
- If multiple stations have used that ID (it happens), a list is presented so user can select the station(s) of interest
- http://arachne.ncdc.noaa.gov/mi3qry/displaystation.cfm?idtypeabbr=ICAO&idvalue=KAVL
- idtypeabbr is an abbreviation for the ID type
- idvalue is a valid station ID of the specified type
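A calling system could assemble the station details URL as in the following sketch. The host, path and the two parameter names come from the slide. The function name `station_details_url` is hypothetical, and the query string is assumed to use the standard `=` and `&` separators.

```python
from urllib.parse import urlencode

# Host and path as given on the slide.
BASE_URL = "http://arachne.ncdc.noaa.gov/mi3qry/displaystation.cfm"


def station_details_url(id_type_abbr, id_value):
    """Build the URL that opens a station details window (hypothetical helper)."""
    # idtypeabbr: abbreviation for the ID type; idvalue: a station ID of that type.
    query = urlencode({"idtypeabbr": id_type_abbr, "idvalue": id_value})
    return f"{BASE_URL}?{query}"


url = station_details_url("ICAO", "KAVL")
```

The resulting string could then be passed to the standard library's `webbrowser.open` or embedded as a link in another system's page.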
41. MI3 Station ID Types
- Abbreviations for ID types currently supported
- ICAO - International Civil Aviation Organization; 4 alphanumerics
- WBAN - Weather Bureau Army Navy; 5 digits
- COOP - NWS Cooperative ID; 6 digits
- FAA - Federal Aviation Administration Call Sign; 3 characters
- WMO - World Meteorological Organization Index Number; 5 digits
- NWSLI - NWS Location Identifier; 3-5 alphanumerics, inconsistent for older stations
- GOES - Geostationary Operational Environmental Satellite; format varies, used for GOES data transmission, currently entered only for CRN stations
- CRN - Internal CRN network station ID; 4 digits
- NCDCSTNID - NCDC Station ID; 8 digits, internal management
- List is table driven, new types easily added
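The ID formats listed above lend themselves to a table-driven check, sketched below. This is not MI3 code: the dictionary `ID_PATTERNS` and the function `looks_valid` are hypothetical, "alphanumerics" and "characters" are assumed to mean upper-case letters and digits, and GOES is left out because its format varies.

```python
import re

# One pattern per ID type, read off the lengths given on the slide.
# Adding a new ID type means adding one entry, not changing the code.
ID_PATTERNS = {
    "ICAO": r"[A-Z0-9]{4}",
    "WBAN": r"[0-9]{5}",
    "COOP": r"[0-9]{6}",
    "FAA": r"[A-Z0-9]{3}",
    "WMO": r"[0-9]{5}",
    "NWSLI": r"[A-Z0-9]{3,5}",
    "CRN": r"[0-9]{4}",
    "NCDCSTNID": r"[0-9]{8}",
}


def looks_valid(id_type, id_value):
    """Return True/False for known ID types, None when no fixed format is known."""
    pattern = ID_PATTERNS.get(id_type.upper())
    if pattern is None:
        return None
    return re.fullmatch(pattern, id_value.upper()) is not None
```

A check like this catches only malformed IDs. It cannot tell whether an ID was ever assigned, and older NWSLI values may legitimately fail it.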
42. MI3 Data Flows
43. MI3 Enhancements Underway
- Enhanced security by station group, data program
- Data inventories
- Will replace a current system, providing added functionality
- Different levels of granularity
- Stations and geographic areas
- Presentation options selectable by user
- Digital image management
- Simple map of query results
- Print / export search results
- Display and maintain related remarks by subject tab
- Revised, expanded documentation
44. MI3 Planned Enhancements
- GIS interface
- Additional station reports and views
- Advanced station management utilities
- Generic auto-ingest for station information
- Data collection and QC workflow interface for other networks
- Links to external information
- Collection-level FGDC metadata
- Network reference information
- Station history document images
45. MI3 Content Expansion Underway
- Research, correction of existing stations
- Automated update of WMO stations
- Update geographic details via GIS
- Add details to operational networks
- Data inventories
46. MI3 Content Plans
- Other NERON stations
- Add historical details to current stations
- Upper Air Stations
- Air Force Master Station Catalog
- Other NOAA stations
- Other national, international stations
47. MI3 Adding A New Station Group
- Identify information source(s)
- Identify affected systems and processes
- Quantify volume of information
- Station count
- Information volume per station
- Map information to MI3, identify gaps
- Identify overlap with current stations
- Define QC requirements, develop QC processes
- Develop ingest processes
- Manual
- Automated (if warranted/feasible)
- Define QA/QC and data entry rules, workflow and personnel
- Develop critical reports and extracts
- For operational networks
- Ensure ongoing refresh from source agency(s)
- Identify anticipated update frequency and volume
48. Station History Management Issues
- Dealing with multiple networks
- Tracking Changes
- Date management
- Logical period of validity for each detail
- Constructing coherent views of the data
- Maintaining logical integrity during update
- Station ID management through the years
49. MI3 Dealing with dates
- Each station may have many, many details
- Details may vary independently
- Each detail has its own period of validity
- Relational design spreads details across many tables
- Some tables may contain no detail for a station for a given time period
50. Producing a simple data view...
51. ...requires joining three tables
- Join on Station Key column
- Must also match by date
- Data change independently
- No table contains all dates
- A table may not contain data for a given period
52. We need the dates for this view
Build a list of dates from all tables in the
view, discarding duplicates
53. The Context Date Pair Algorithm
Build date pairs from list, removing those pairs
for which no table in the view contains data
The final date pairs are used to drive the joins
in our output view query. All source tables may
be outer-joined to the chronology table.
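The steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not the MI3 stored-procedure code. Each "table" is an in-memory list of `(begin, end, value)` rows for a single station, end dates are treated as exclusive, `date.max` stands in for an open end date, and the function names, table names and sample values are invented.

```python
from datetime import date

OPEN = date.max  # stands in for "no end date yet"


def context_date_pairs(tables):
    """tables: dict of table name -> list of (begin, end, value) rows."""
    # Step 1: collect every begin and end date from all tables, dropping duplicates.
    dates = sorted({d for rows in tables.values() for b, e, _ in rows for d in (b, e)})
    # Step 2: consecutive dates form candidate pairs; keep a pair only if
    # at least one table has a row covering it.
    pairs = []
    for start, stop in zip(dates, dates[1:]):
        if any(b <= start and stop <= e for rows in tables.values() for b, e, _ in rows):
            pairs.append((start, stop))
    return pairs


def build_view(tables):
    """Outer-join every table to the chronology of date pairs."""
    view = []
    for start, stop in context_date_pairs(tables):
        row = {"begin": start, "end": stop}
        for name, rows in tables.items():
            # None plays the role of the NULL an outer join would produce.
            row[name] = next((v for b, e, v in rows if b <= start and stop <= e), None)
        view.append(row)
    return view


# Made-up sample: three tables whose rows change independently.
tables = {
    "name": [(date(1990, 1, 1), date(1995, 1, 1), "OLD NAME"),
             (date(1997, 1, 1), OPEN, "NEW NAME")],
    "location": [(date(1990, 1, 1), date(1993, 1, 1), "site A"),
                 (date(1997, 1, 1), OPEN, "site B")],
    "equipment": [(date(1998, 1, 1), OPEN, "gauge")],
}
pairs = context_date_pairs(tables)
view = build_view(tables)
```

Because every begin and end date is in the list, each candidate pair lies either wholly inside or wholly outside any given row, so a simple containment test is enough. In the sample, the 1995 to 1997 pair is dropped because no table holds data for it.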
54. The Context Date Pair Schema
55. A Sample Context
56. Context Date Pair Benefits
- Resolves date pairs into coherent views
- Enables ad hoc queries by power users
- End-users benefit because special data requests are easier to accommodate
- Common schema / body of code
- supports all system queries, views and reports
- easy to define new contexts as needed
- single maintenance point
- permits future enhancements
57. Other Date Management Concerns
- An associative (child) record's period of validity must fall within that of its parents
- Peer-level child records (siblings) should not have overlapping dates
- Context date pairs do not help with this parent/child and sibling record logical date consistency
- Currently enforced in user interface
- Developing stored procedures to handle at database level
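The two consistency rules above can be expressed as simple checks. The sketch below is illustrative only and is unrelated to the stored procedures mentioned on the slide. Periods are `(begin, end)` tuples with inclusive dates, `None` marks an open end date, and both function names are hypothetical.

```python
from datetime import date


def within_parent(child, parent):
    """True if the child's period of validity falls inside the parent's."""
    c_begin, c_end = child
    p_begin, p_end = parent
    if c_begin < p_begin:
        return False
    if p_end is None:          # open-ended parent covers any later end date
        return True
    return c_end is not None and c_end <= p_end


def overlapping_siblings(periods):
    """Return the pairs of sibling periods whose dates overlap."""
    ordered = sorted(periods, key=lambda p: p[0])
    clashes = []
    for i, (b1, e1) in enumerate(ordered):
        last_day = date.max if e1 is None else e1
        for b2, e2 in ordered[i + 1:]:
            # Sorted by begin date, so b1 <= b2; they overlap if b2 starts
            # on or before the last day of the earlier period.
            if b2 <= last_day:
                clashes.append(((b1, e1), (b2, e2)))
    return clashes
```

Running such checks before an update is committed would reject a child record that starts before its parent, or a second sibling that begins while the first is still in effect.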
58. What, and where, is a station?
- An airport has surface, upper air and NEXRAD observing systems
- Is this one station or three?
- If it's one station, what's the location?
- We manage this case as three separate stations
- Each is a distinct observing system
- Any may change without affecting the others
- Equipment is different
- Locations usually differ, sometimes greatly
59. A Station By Any Other ID...
- Station IDs have not been assigned and managed consistently over time
- In some networks, name changes may result in an ID change
- location and equipment may not change
- complicates data and metadata management and use
- IDs may be reassigned to other currently operating stations, leaving a fragmented record
- Sometimes there is no consistent, unchanging station identifier available
- Stations are known by different IDs, and sometimes names, in different networks and programs
60. Questions
MI3 home page: http://mi3.ncdc.noaa.gov