Title: Nonbio necro sciences Jim Frew, Bob Mann
1 Non-bio (necro-?) sciences (Jim Frew, Bob Mann)
- Examples of current practice and issues
  - Astronomy: Bob Mann, Alex Szalay
  - Earth Sciences: Jim Frew, Dave Maier
  - Others
- Draw up list of issues
- Discussion
2 Some provenance and data derivation issues in astronomy
- Bob Mann
- Institute for Astronomy, Edinburgh Univ.
- National e-Science Centre
3 Outline
- Trends in astronomy and implications for provenance
- Two provenance issues
  - Recording provenance in the FITS data format
  - Provenance in database federation
- Alex Szalay
  - Provenance in pipelines and databases
  - Annotations in astronomy databases
4 Evolution in astronomical practice
- Collectivisation and the empowerment of the individual
  - Fewer individual observational programmes and more sky surveys
  - More people access the data, via archives
- The specialist is dead, long live the generalist
  - Use multi-wavelength data
  - Expertise in classes of astronomical object, not observational techniques
5 Implications for provenance
- More science being done with data that the individual scientist didn't take
  - about which the scientist knows less
- More reliance on pipeline processing
- More science with catalogues of source attributes derived from primary data
- More science being done through combining data from multiple sources (more later)
6 FITS: Flexible Image Transport System
- Format of a FITS file (http://fits.gsfc.nasa.gov)
  - Primary Header: metadata describing instrument, observation and file contents
  - Primary Data Array: array of 0-999 dimensions, usually a 2D image
  - Zero or more Extensions
    - Array, ASCII Table or Binary Table, each with Header
- (New FITS-inspired XML format: VOTable)
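
The layout above can be sketched in a few lines of Python. This is a minimal illustration, not a FITS library: it relies on two conventions of the FITS standard (headers are made of 80-character cards, terminated by an END card and padded to a multiple of 2880 bytes), and the helper names `read_header_cards` and `make_header` are made up for this example.

```python
BLOCK_SIZE = 2880  # FITS files are organised in 2880-byte blocks
CARD_SIZE = 80     # each header entry ("card") is 80 ASCII characters


def read_header_cards(data):
    """Return (cards, header_length) for the header at the start of `data`."""
    cards = []
    for offset in range(0, len(data), CARD_SIZE):
        card = data[offset:offset + CARD_SIZE].decode("ascii")
        if card[:8].rstrip() == "END":
            # The header is padded out to a whole number of blocks.
            used = offset + CARD_SIZE
            blocks = -(-used // BLOCK_SIZE)  # ceiling division
            return cards, blocks * BLOCK_SIZE
        cards.append(card)
    raise ValueError("no END card found")


def make_header(entries):
    """Build header bytes from card strings (helper for this demo only)."""
    text = "".join(entry.ljust(CARD_SIZE) for entry in entries + ["END"])
    padded_length = -(-len(text) // BLOCK_SIZE) * BLOCK_SIZE
    return text.ljust(padded_length).encode("ascii")


demo = make_header([
    "SIMPLE  =                    T / file does conform to FITS standard",
    "BITPIX  =                   16 / number of bits per data pixel",
    "NAXIS   =                    0 / number of data axes",
])
cards, header_length = read_header_cards(demo)
print(len(cards), header_length)
```

The returned header length is where the primary data array (or, with NAXIS = 0, the first extension) would begin.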
7 FITS header entries
- Keyword-value pairs plus optional comment
  - e.g. PLTSCALE = '67.14 ' / arcsec/mm plate scale
- Three types of header keyword
  - Mandatory, e.g. NAXIS
  - Optional, e.g. DATAMAX
  - Additional, i.e. user-defined, but not from restricted list (mandatory and optional)
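
As a rough sketch of how such an entry breaks down, the function below splits one header card into keyword, value and comment. It assumes the fixed layout of the FITS standard (keyword in columns 1-8, `= ` in columns 9-10, comment after a `/`) and ignores finer points such as complex values; `parse_card` is a name chosen for this example.

```python
def _convert(text):
    """Turn the text of a non-string value into bool, int, float or str."""
    if text == "T":
        return True
    if text == "F":
        return False
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_card(card):
    """Split one FITS header card into (keyword, value, comment)."""
    card = card.ljust(80)
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Commentary cards such as HISTORY carry free text and no value.
        return keyword, None, card[8:].strip()
    rest = card[10:].lstrip()
    if rest.startswith("'"):
        # String value: '' inside the quotes stands for one literal quote.
        chars = []
        i = 1
        while i < len(rest):
            if rest[i] == "'":
                if rest[i + 1:i + 2] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(rest[i])
            i += 1
        value = "".join(chars).rstrip()
        comment = rest[i + 1:].partition("/")[2].strip()
    else:
        raw, _, comment_text = rest.partition("/")
        value = _convert(raw.strip())
        comment = comment_text.strip()
    return keyword, value, comment


print(parse_card("PLTSCALE= '67.14   '           / arcsec/mm plate scale"))
print(parse_card("NAXIS   =                    2 / number of data axes"))
```

Note that the plate scale comes back as the string `'67.14'`, because the card stores it as a quoted string rather than as a number.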
8 Provenance in FITS headers
- Many optional keywords related to provenance
  - ORIGIN, DATE-OBS, TELESCOP, INSTRUME, OBSERVER, REFERENC
- plus HISTORY: "The text should contain a history of steps and procedures associated with the processing of the associated data. Any number of HISTORY card images may appear in a header." (FITS Standard)
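
A minimal sketch of pulling these provenance entries out of a header follows. The keyword list is the one given above; the card values are invented for illustration, and the value handling is deliberately crude (it would mis-split a string value containing `/`).

```python
PROVENANCE_KEYWORDS = (
    "ORIGIN", "DATE-OBS", "TELESCOP", "INSTRUME", "OBSERVER", "REFERENC",
)


def provenance_summary(cards):
    """Return (fields, history) collected from a list of header cards."""
    fields = {}
    history = []
    for card in cards:
        keyword = card[:8].strip()
        if keyword == "HISTORY":
            # Any number of HISTORY cards may appear; keep them in order.
            history.append(card[8:].strip())
        elif keyword in PROVENANCE_KEYWORDS and card[8:10] == "= ":
            # Simplification: cut at the first "/" and drop the quotes.
            value = card[10:].split("/", 1)[0].strip().strip("'").strip()
            fields[keyword] = value
    return fields, history


# Hypothetical header cards, for illustration only.
example_cards = [
    "ORIGIN  = 'EXAMPLE OBSERVATORY' / made-up value",
    "DATE-OBS= '1998-09-29'",
    "NAXIS   =                    2 / number of data axes",
    "HISTORY bias subtracted",
    "HISTORY flat fielded",
]
fields, history = provenance_summary(example_cards)
print(fields)
print(history)
```

The keyword values come out as structured fields, while the HISTORY text remains an ordered list of free-text lines.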
9 Example FITS header extracts (1)
- SIMPLE = T / file does conform to FITS standard
- BITPIX = 32 / number of bits per data pixel
- NAXIS = 2 / number of data axes
- NAXIS1 = 648 / length of data axis 1
- NAXIS2 = 648 / length of data axis 2
- EXTEND = T / FITS dataset may contain extensions
- BUNIT = 'Primary Array' / Units of the image
- XPROC0 = 'evselect table''product/P0059750201PNU002PIEVLI0000.FITEVENTS'' w
- CONTINUE 'ithfilteredsetno filteredset''filtered.fits'' keepfilteroutputno
- CONTINUE ' destructyes flagcolumn''EVFLAG'' flagbit-1 filtertype''expres
- CONTINUE 'sion'' expression''GTI(intermediate/GlobalHK-all-1-Attitude_GTI-X0
- CONTINUE '000000000.fits, TIME) GTI(intermediate/pnEvents-epn-1-EPIC_flare
- CONTINUE '_GTI-U0020000000.fitsSTDGTI, TIME) (RAWY12) (PATTERN
- CONTINUE ' (PI in (20012000) (PI500 (PI
- CONTINUE 'ATTERN0)) (FLAG 0x2fa0024) 0'' dssblock'''' writedssyes
- CONTINUE ' cleandssno updateexposureyes filterexposureyes blockstocopy'''
- CONTINUE ''' attributestocopy'''' energycolumn''PHA'' withzcolumnno zcolu
-
New Keyword
Multi-line entry
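
A multi-line entry like the one above has to be stitched back together before it can be interpreted. The sketch below assumes the usual FITS long-string convention, in which a string value ending in `&` is continued by the string on the following CONTINUE card. The cards used here are short made-up stand-ins, not the XPROC0 entry from the slide.

```python
def _string_value(text):
    """Return the contents of the first quoted string in `text`."""
    i = text.index("'") + 1
    out = []
    while i < len(text):
        if text[i] == "'":
            if text[i + 1:i + 2] == "'":
                out.append("'")  # '' is an escaped quote
                i += 2
                continue
            break
        out.append(text[i])
        i += 1
    return "".join(out).rstrip()


def join_continued(cards):
    """Merge CONTINUE cards into the string keyword that precedes them."""
    merged = {}
    current = None
    for card in cards:
        keyword = card[:8].strip()
        if (keyword == "CONTINUE" and current is not None
                and merged[current].endswith("&")):
            # Drop the trailing "&" and append the continuation text.
            merged[current] = merged[current][:-1] + _string_value(card[8:])
        elif card[8:10] == "= " and card[10:].lstrip().startswith("'"):
            current = keyword
            merged[current] = _string_value(card[10:])
        else:
            current = None
    return merged


# Hypothetical cards modelled loosely on a processing-history keyword.
long_cards = [
    "XPROC0  = 'taskname input=''in.fits'' w&'",
    "CONTINUE  'ithfilter=no keep&'",
    "CONTINUE  'output=no'",
]
print(join_continued(long_cards)["XPROC0"])
```

Even once rejoined, the result is a command line whose meaning depends on the software that wrote it, so it is only machine-interpretable to someone who knows that package.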
10 Example FITS header extracts (2)
End of header entries generated at telescope
- XTENSION = 'IMAGE ' / Image extension
- BITPIX = 16 / Bits per pixel
- NAXIS = 2 / Number of axes
-
- HISTORY This is the end of the header written by the ING observing-system.
- WAT0_001 = 'systemimage'
- WAT1_001 = 'wtypezpx axtypera projp11.0 projp3220.0'
- WAT2_001 = 'wtypezpx axtypedec projp11.0 projp3220.0'
-
- TRIM = 'Sep 2 1614 Trim data section is 512098,14100'
- BP-FLAG = 'Sep 2 1614 Bad pixel file is /home/jrl/wfcred/stds/A5506-4.bad'
- BT-FLAG = 'Sep 2 1614 Overscan section is 150,14128 with mean1514.871'
- BI-FLAG = 'Sep 2 1614 Zero level correction image is /data/cass03a/was/mframe
- FF-FLAG = 'Sep 2 1614 Flat field image is /data/cass03d/was/mframes/r_9362689
- ILLUMCOR = 'Sep 2 1614 Illumination image is tmpill.pl with scale0.9655418'
Keywords describing data reduction process
11 Example FITS header extracts (3)
House-keeping provenance metadata
- SIMPLE = T / file does conform to FITS standard
- BITPIX = 16 / number of bits per data pixel
-
- NHKLINES = 146 / Number of lines from house-keeping file
- HKLIN001 = 'JOB.JOBNO UKJ349' /
- HKLIN002 = 'JOB.DATE-MES 19980929' /
-
- HISTORY 'SuperCOSMOS image analysis and mapping mode (IAM and MM)' /
- HISTORY 'data written by xydcomp_ss.' /
- HISTORY 'Any questions/comments/suggestions/bug reports should be sent' /
- HISTORY 'to N.Hambly_at_roe.ac.uk' /
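
The house-keeping lines above can be gathered into a simple record, as sketched below. This assumes every HKLINnnn string has the form "NAME VALUE", which holds for the two lines shown but is only a guess for the remaining ones; `housekeeping` is a name invented for this example.

```python
import re

# Matches cards such as: HKLIN001= 'JOB.JOBNO UKJ349' /
HK_CARD = re.compile(r"^HKLIN(\d{3})= '(.*?)'")


def housekeeping(cards):
    """Collect HKLINnnn cards into a {name: value} dictionary."""
    lines = {}
    for card in cards:
        match = HK_CARD.match(card)
        if match:
            lines[int(match.group(1))] = match.group(2).strip()
    record = {}
    for index in sorted(lines):
        # Assumption: the first space separates the name from its value.
        name, _, value = lines[index].partition(" ")
        record[name] = value.strip()
    return record


hk_cards = [
    "NHKLINES=                  146 / Number of lines from house-keeping file",
    "HKLIN001= 'JOB.JOBNO UKJ349'  /",
    "HKLIN002= 'JOB.DATE-MES 19980929' /",
]
print(housekeeping(hk_cards))
```

The names and values are now accessible to a program, but their meaning (what a JOB.JOBNO is, for example) is still defined only within the project that wrote them.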
12 FITS provenance - summary
- Header keywords designed for recording provenance information, esp. HISTORY
- HISTORY cards written in free text: not readily machine-interpretable
- Project-specific provenance keywords: not readily interpretable at all outside project
13 Provenance in database federation
- Sky survey databases in many wavebands
- New science from federating them
- Need to associate entries in different DBs
- Unified Column Descriptors (UCDs)
  - Taxonomy based on collation of column names from hundreds of databases
- Location on sky provides natural indexing
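
To illustrate how UCDs help with federation, the sketch below labels the columns of two tables with UCD-style strings and finds the quantities they have in common. The table and column names are hypothetical, and the labels are written in the later dotted UCD spelling purely for illustration; they are not taken from any particular survey.

```python
# Hypothetical column-to-UCD annotations for two survey tables.
OPTICAL = {"RA": "pos.eq.ra", "DEC": "pos.eq.dec", "BMAG": "phot.mag"}
XRAY = {"ra_deg": "pos.eq.ra", "dec_deg": "pos.eq.dec", "cts": "phot.count"}


def columns_by_ucd(table):
    """Invert a {column: ucd} mapping into {ucd: [columns]}."""
    index = {}
    for column, ucd in table.items():
        index.setdefault(ucd, []).append(column)
    return index


def shared_quantities(table_a, table_b):
    """Return {ucd: (columns in a, columns in b)} for UCDs present in both."""
    a, b = columns_by_ucd(table_a), columns_by_ucd(table_b)
    return {ucd: (a[ucd], b[ucd]) for ucd in sorted(a.keys() & b.keys())}


print(shared_quantities(OPTICAL, XRAY))
```

Here the two tables disagree on column names but agree, via the labels, that both carry sky positions, which is what makes position the natural key for joining them.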
14 Matching by proximity not always adequate
Need to know more about astrophysical properties
of two populations to know which of the red
objects is the most likely counterpart to the
cyan source
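
The ambiguity can be made concrete with a small positional cross-match. This is a sketch only: the coordinates, object names and search radius are invented, and the separation is computed with the standard haversine formula rather than any particular survey's matching code.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def candidates_within(source, catalogue, radius_arcsec):
    """Return [(separation_arcsec, name)] sorted by separation."""
    matches = []
    for name, ra, dec in catalogue:
        sep = angular_separation_deg(source[0], source[1], ra, dec) * 3600
        if sep <= radius_arcsec:
            matches.append((sep, name))
    return sorted(matches)


# Hypothetical positions (degrees): one "cyan" source, three "red" objects.
cyan_source = (150.0, 2.0)
red_objects = [
    ("red-1", 150.0, 2.0005),
    ("red-2", 150.0, 1.9992),
    ("red-3", 150.01, 2.0),
]
matches = candidates_within(cyan_source, red_objects, radius_arcsec=5.0)
for sep, name in matches:
    print(f"{name}: {sep:.2f} arcsec")
```

Two objects fall inside the search radius. Picking the nearest is only a guess; choosing between them properly requires the astrophysical properties of the two populations, which position alone does not provide.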
15 Recording association provenance
- Might want to record associations in DBs
- Users want to know whether to trust them
- Complex probabilistic association algorithms
- Difficult to describe easily
- Associations may change in light of new data
- Can users challenge them via annotation?
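
One possible shape for such a record is sketched below. It is purely illustrative: the field names, the algorithm label and the identifiers are assumptions made for this example, not an existing schema. The idea is that each association carries the algorithm and inputs that produced it, can be annotated by users, and is superseded rather than overwritten when new data arrive.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Association:
    """A proposed link between entries in two databases (hypothetical schema)."""
    source_id: str
    counterpart_id: str
    probability: float
    algorithm: str
    algorithm_version: str
    input_catalogues: tuple
    annotations: list = field(default_factory=list)
    superseded_by: Optional["Association"] = None

    def annotate(self, author, text):
        """Attach a user comment, e.g. a challenge to the association."""
        self.annotations.append((author, text))

    def revise(self, counterpart_id, probability, algorithm_version,
               input_catalogues):
        """Create a newer association and keep this one as history."""
        newer = Association(self.source_id, counterpart_id, probability,
                            self.algorithm, algorithm_version,
                            input_catalogues)
        self.superseded_by = newer
        return newer

    def current(self):
        """Follow the chain of revisions to the latest association."""
        record = self
        while record.superseded_by is not None:
            record = record.superseded_by
        return record


# All identifiers and values below are made up for illustration.
first = Association("cyan-1", "red-2", 0.62, "example-matcher", "1.0",
                    ("survey-A", "survey-B"))
first.annotate("a.user", "red-1 looks like the better counterpart")
second = first.revise("red-1", 0.90, "1.1",
                      ("survey-A", "survey-B", "survey-C"))
print(first.current().counterpart_id, len(first.annotations))
```

Keeping the superseded record, rather than replacing it, lets a user see why an association they relied on has changed.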
16 Summary
- Astronomers record lots of provenance info
- Want machine-interpretability
- Some astronomical provenance is complex
- Want means of describing algorithms
- Starting to get links between databases and online copies of scientific papers
- No culture of annotation by users - yet