JSOC Pipeline Processing Overview - PowerPoint PPT Presentation

Slides: 21
Provided by: rasmusmu
Learn more at: http://hmi.stanford.edu

Transcript and Presenter's Notes

1
JSOC Pipeline Processing Overview
  • Rasmus Munk Larsen, Stanford University
  • rmunk_at_quake.stanford.edu
  • 650-725-5485

2
Overview
  • Hardware overview
  • JSOC data model
  • Pipeline infrastructure subsystems
  • Pipeline modules

3
JSOC Connectivity
[Diagram labels: Stanford, DDS, NASA AMES, LMSAL, MOC, White Net, 1 Gb private line]
4
JSOC Hardware configuration
5
JSOC data model: Motivation
  • Evolved from MDI dataset concept to
  • Enable record level access to meta-data for
    queries and browsing
  • Accommodate more complex data models required by
    higher-level processing
  • Main design features
  • Lesson learned from MDI: separate meta-data
    (keywords) and image data
  • No need to re-write large image files when only
    keywords change (lev1.8 problem)
  • No out-of-date keyword values in FITS headers -
    can bind to most recent values on export
  • Data access through query-like dataset names
  • All access in terms of (sets of) data records,
    which are the atomic units of a data series
  • A dataset name is a query specifying a set of
    data records
  • jsochmi_lev1_V3000-3020 (21 records from a
    series with known epoch and cadence)
  • jsochmi_lev0_fgt_obs2008-11-07_020000/8hcam
    doppler (8 hours worth of filtergrams)
  • Storage and tape management must be transparent
    to user
  • Chunking of data records into storage units for
    efficient tape/disk usage done internally
  • Completely separate storage unit and meta-data
    databases: a more modular design
  • MDI data and modules will be migrated to use new
    storage service
  • Store meta-data (keywords) in relational database
  • Can use power of relational database to search
    and index data records
  • Easy and fast to create time series of any
    keyword value (for trending etc.)

6
JSOC data model
  • JSOC data will be organized according to a data
    model with the following classes:
  • Series: A sequence of like data records,
    typically data products produced by a particular
    analysis
  • Attributes include: Name, Owner, primary search
    index, Storage unit size, Storage group
  • Record: Single measurement/image/observation with
    associated meta-data
  • Attributes include: ID, Storage Unit ID, Storage
    Unit Slot
  • Contains: Keywords, Links, Data segments
  • Records are the main data objects seen by module
    programmers
  • Keyword: Named meta-data value, stored in
    database
  • Attributes include: Name, Type, Value, Physical
    unit
  • Link: Named pointer from one record to another,
    stored in database
  • Attributes include: Name, Target series, target
    record id or primary index value
  • Used to capture data dependencies and processing
    history
  • Data Segment: Named data container representing
    the primary data on disk belonging to a record
  • Attributes include: Name, filename, datatype,
    naxis, axis[0...naxis-1], storage format
  • Can be either structure-less (any file) or
    n-dimensional array stored in tiled, compressed
    file format
  • Storage Unit: A chunk of data records from the
    same series stored in a single directory tree
  • Attributes include: Online location, offline
    location, tape group, retention time
  • Managed by the Storage Unit Manager in a manner
    transparent to most module programmers

7
JSOC data model
JSOC Data Series
[Diagram: a list of data series, the records of one series, and the contents of a single record]
Series shown: hmi_lev0_cam1_fg, aia_lev0_cont1700, hmi_lev1_fd_M, hmi_lev1_fd_V, aia_lev0_FE171
Data records for series hmi_lev1_fd_V: hmi_lev1_fd_V12345 through hmi_lev1_fd_V12353
Single hmi_lev1_fd_V data record
Keywords:
  RECORDNUM = 12345 (unique serial number)
  SERIESNUM = 5531704 (slots since epoch)
  T_OBS = 2009.01.05_23:22:40_TAI
  DATAMIN = -2.537730543544E+03
  DATAMAX = 1.935749511719E+03
  ...
  P_ANGLE = LINK:ORBIT,KEYWORD:SOLAR_P
Links:
  ORBIT = hmi_lev0_orbit, SERIESNUM = 221268160
  CALTABLE = hmi_lev0_dopcal, RECORDNUM = 7
  L1 = hmi_lev0_cam1_fg, RECORDNUM = 42345232
  R1 = hmi_lev0_cam1_fg, RECORDNUM = 42345233
Data Segments: V_DOPPLER
Storage Unit Directory
8
JSOC subsystems
  • SUMS: Storage Unit Management System
  • Maintains database of storage units and their
    location on disk and tape
  • Manages JSOC storage subsystems: disk array,
    robotic tape library
  • Scrubs old data from disk cache to maintain
    enough free workspace
  • Loads and unloads tapes to/from tape drives and
    robotic library
  • Allocates disk storage needed by pipeline
    processes through DRMS
  • Stages storage units requested by pipeline
    processes through DRMS
  • Design features
  • RPC client-server protocol
  • Oracle DBMS (to be migrated to PostgreSQL)
  • DRMS: Data Record Management System
  • Maintains database holding:
  • Master tables with definitions of all JSOC series
    and their keyword, link and data segment
    definitions
  • One table per series containing record meta-data,
    e.g. keyword values
  • Provides distributed transaction processing
    framework for pipeline
  • Provides full meta-data searching through JSOC
    query language
  • Multi-column indexed searches on primary index
    values allow fast and simple querying for
    common cases
  • Inclusion of free-form SQL clauses allows
    advanced querying

9
Pipeline software/hardware architecture
[Architecture diagram; component labels listed below]
  • Pipeline program module
  • JSOC Science Libraries, Utility Libraries, File I/O
  • DRMS Library
  • Record calls: OpenRecords, CloseRecords, GetKeyword,
    SetKeyword, GetLink, SetLink, OpenDataSegment,
    CloseDataSegment
  • Data Segment I/O
  • Record Cache (Keywords, Links, Data paths)
  • DRMS socket protocol
  • Data Record Management Service (DRMS)
  • Storage Unit Management Service (SUMS)
  • Storage unit calls: AllocUnit, GetUnit, PutUnit
  • Storage unit transfer
  • JSOC Disks
  • Robotic Tape Archive
  • Database Server
  • SQL queries
  • Record Catalogs
  • Series Tables, Record Tables, Storage Unit Tables
10
JSOC Pipeline Workflow
[Workflow diagram; labels listed below]
  • Pipeline Operator
  • Pipeline processing plan
  • Processing script, "mapfile": list of pipeline
    modules with needed datasets for input, output
  • PUI: Pipeline User Interface (scheduler)
  • DRMS session
  • Module1, Module2, Module3
  • Processing History Log
  • DRMS: Data Record Management Service
  • SUMS: Storage Unit Management System
11
Analysis modules: co-I contributions and
collaboration
  • Contributions from co-I teams
  • Software for intermediate and high level analysis
    modules
  • Data series definitions
  • Keywords, links, data segments, size of storage
    units, primary index keywords etc.
  • Documentation
  • Test data and intended results for verification
  • Time
  • Explain algorithms and implementation
  • Help with verification
  • Collaborate on improvements if required (e.g.
    performance or maintainability)
  • Contributions from HMI team
  • Pipeline execution environment
  • Software and hardware resources (development
    environment, libraries, tools)
  • Time
  • Help with defining data series
  • Help with porting code to JSOC API
  • If needed, collaborate on algorithmic
    improvements, tuning for JSOC hardware,
    parallelization
  • Verification

12
HMI module status and MDI heritage
[Flow diagram of HMI data products; box labels listed below]
Intermediate and high level data products
Primary observables
Internal rotation
Heliographic Doppler velocity maps
Spherical harmonic time series
Mode frequencies and splitting
Internal sound speed
Full-disk velocity, sound speed maps (0-30 Mm)
Local wave frequency shifts
Ring diagrams
Doppler velocity
Carrington synoptic v and cs maps (0-30 Mm)
Time-distance cross-covariance function
Tracked tiles of Dopplergrams
Wave travel times
High-resolution v and cs maps (0-30 Mm)
Egression and ingression maps
Wave phase shift maps
Deep-focus v and cs maps (0-200 Mm)
Far-side activity index
Stokes I,V
Line-of-sight magnetograms
Line-of-sight magnetic field maps
Stokes I,Q,U,V
Full-disk 10-min averaged maps
Vector magnetograms, fast algorithm
Vector magnetic field maps
Vector magnetograms, inversion algorithm
Coronal magnetic field extrapolations
Tracked tiles
Tracked full-disk 1-hour averaged continuum maps
Coronal and solar wind models
Continuum brightness
Solar limb parameters
Brightness feature maps
Brightness images
13
Example Global Seismology Pipeline
14
Questions to be discussed at working sessions
  • List of standard science data products
  • Which data products, including intermediate ones,
    should be produced by JSOC to accomplish the
    science goals of the mission?
  • What cadence, resolution, coverage etc. should
    each data product have?
  • Which data products should be computed on the fly
    and which should be archived?
  • What are the challenges to be overcome for each
    analysis technique?
  • Detailing each branch of the processing pipeline
  • What are the detailed steps in each branch?
  • Can some of the computational steps be
    encapsulated in general tools that can be shared
    among different branches (example: tracking)?
  • What are the CPU and I/O resource requirements of
    computational steps?
  • Contributed analysis modules
  • What groups or individuals will contribute code,
    and incorporate it in the pipeline?
  • If multiple candidate techniques and/or
    implementations exist, which should be included
    in the pipeline?
  • What is the test plan and what data is needed to
    verify the approach?

15
JSOC Series Definition
16
Global Database Tables
17
Database tables for example series hmi_fd_v
  • Tables specific to each series contain per-record
    values of:
  • Keywords
  • Record numbers of records pointed to by links
  • DSIndex: an index identifying the SUMS storage
    unit containing the data segments of a record
  • Series sequence counter used for generating
    unique record numbers

18
Pipeline batch processing
  • A pipeline batch is encapsulated in a single
    database transaction
  • If no module fails, all data records are committed
    and become visible to other clients of the JSOC
    catalog at the end of the session
  • If a failure occurs, all data records are deleted
    and the database is rolled back
  • It is possible to commit data produced up to
    intermediate checkpoints during sessions

[Diagram: pipeline batch as an atomic transaction; labels listed below]
  • Register session
  • Module 1, Module 2.1, Module 2.2, Module N
  • Commit Data, Deregister
  • DRMS API (one per module and session step)
  • Input data records
  • Output data records
  • DRMS Service Session Master
  • Record Series Database
  • SUMS
19
Example of module code
  • A module doing a (naïve) Doppler velocity
    calculation could look as shown below.
  • Usage:
  • doppler DRMSSESSION=helios:33546
    "2009.09.01_16:00:00_TAI" "2009.09.01_17:00:00_TAI"

    extern CmdParams_t cmdparams;  /* command line args */
    extern DRMS_Env_t *drms_env;   /* DRMS environment */

    int module_main(void)
    {
      DRMS_RecordSet_t *filtergrams, *dopplergram;
      int first_frame, status;
      char query[1024], *start, *end;

      /* Start and end times come from the command line. */
      start = cmdparms_getarg(&cmdparams, 1);
      end = cmdparms_getarg(&cmdparams, 2);
      snprintf(query, sizeof(query), "hmi_lev0_fg[T_Obs=%s-%s]", start, end);
      filtergrams = drms_open_records(drms_env, query, "RD", &status);
      if (filtergrams->num_recs == 0)
      {
        printf("Sorry, no filtergrams found for that time interval.\n");
        return -1;
      }
      first_frame = 0;
      /* Start looping over record set. */
      for (;;)
      {
        first_frame = find_next_framelist(first_frame, filtergrams);
        if (first_frame == -1)  /* No more complete framelists. Exit. */
          break;
        dopplergram = drms_create_records(drms_env, "hmi_fd_v", 1, &status);
        if (status)
          return -1;
        compute_dopplergram(first_frame, filtergrams, dopplergram);
        drms_close_records(drms_env, dopplergram);
      }
      return 0;
    }
20
Example continued
    int compute_dopplergram(int first_frame,
                            DRMS_RecordSet_t *filtergrams,
                            DRMS_RecordSet_t *dopplergram)
    {
      int i, k, n_rows, n_cols, tuning;
      DRMS_Segment_t *fg[10], *dop;
      short *fg_data[10];
      char *pol;
      double *dop_data;

      /* Get pointers for doppler data array. */
      dop = drms_open_datasegment(dopplergram->records[0], "v_doppler", "RDWR");
      n_cols = drms_getaxis(dop, 0);
      n_rows = drms_getaxis(dop, 1);
      dop_data = (double *)drms_getdata(dop, 0, 0);
      /* Get pointers for filtergram data arrays. */
      for (i = first_frame; i < first_frame + 10; i++)
      {
        k = i - first_frame;  /* index into the 10-element local arrays */
        fg[k] = drms_open_datasegment(filtergrams->records[i], "intensity", "RD");
        fg_data[k] = (short *)drms_getdata(fg[k], 0, 0);
        pol = drms_getkey_string(filtergrams->records[i], "Polarization");
        tuning = drms_getkey_int(filtergrams->records[i], "Tuning");
        printf("Using filtergram (%s, %d)\n", pol, tuning);
      }
      /* Do the actual Doppler computation. */
      calc_v(fg_data, dop_data);
      return 0;
    }