JSOC Pipeline Processing Overview


Provided by: rasmusmu

Learn more at: http://hmi.stanford.edu


1
JSOC Pipeline Processing Overview

Rasmus Munk Larsen, Stanford University
rmunk_at_quake.stanford.edu
650-725-5485

2
Overview

Hardware overview
JSOC data model
Pipeline infrastructure subsystems
Pipeline modules

3
JSOC Connectivity

[Network diagram. Labels: Stanford, DDS, NASA Ames, LMSAL, MOC, "1 Gb private line", "White Net".]

4
JSOC Hardware configuration
5
JSOC data model: Motivation

- Evolved from MDI dataset concept to
  - Enable record-level access to meta-data for queries and browsing
  - Accommodate more complex data models required by higher-level processing
- Main design features
  - Lesson learned from MDI: separate meta-data (keywords) and image data
    - No need to re-write large image files when only keywords change (lev1.8 problem)
    - No out-of-date keyword values in FITS headers; can bind to most recent values on export
  - Data access through query-like dataset names
    - All access is in terms of (sets of) data records, which are the atomic units of a data series
    - A dataset name is a query specifying a set of data records
      - jsoc:hmi_lev1_V[3000-3020] (21 records from a series with known epoch and cadence)
      - jsoc:hmi_lev0_fg[t_obs=2008-11-07_02:00:00/8h][cam=doppler] (8 hours' worth of filtergrams)
  - Storage and tape management must be transparent to user
    - Chunking of data records into storage units for efficient tape/disk usage done internally
    - Completely separate storage unit and meta-data databases: more modular design
    - MDI data and modules will be migrated to use new storage service
  - Store meta-data (keywords) in relational database
    - Can use power of relational database to search and index data records
    - Easy and fast to create time series of any keyword value (for trending etc.)
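
The idea that a dataset name is a query selecting a set of records can be illustrated with a small C sketch. This is not the JSOC name parser: it assumes a bracketed record-number range of the form series[first-last], which is only a guess at the notation, and the names RecordRange, parse_dataset_name and record_count are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parsed form of a dataset name such as "hmi_lev1_V[3000-3020]".
   The bracket syntax is an assumption made for this sketch. */
typedef struct {
    char series[64];  /* series name */
    long first;       /* first record number in the set */
    long last;        /* last record number in the set (inclusive) */
} RecordRange;

/* Parse "series[first-last]". Returns 0 on success, -1 on malformed input. */
int parse_dataset_name(const char *name, RecordRange *out)
{
    int consumed = 0;

    assert(name != NULL && out != NULL);
    if (sscanf(name, "%63[^[][%ld-%ld]%n",
               out->series, &out->first, &out->last, &consumed) != 3)
        return -1;
    /* %n is only reached if the closing ']' matched; also reject trailing text. */
    if (consumed == 0 || name[consumed] != '\0')
        return -1;
    if (out->first > out->last)
        return -1;
    return 0;
}

/* Number of records selected by the query. */
long record_count(const RecordRange *r)
{
    assert(r != NULL);
    return r->last - r->first + 1;
}
```

Under these assumptions, parsing "hmi_lev1_V[3000-3020]" yields a range for which record_count reports 21 records, matching the first example above.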

6
JSOC data model

JSOC data will be organized according to a data model with the following classes:
- Series: a sequence of like data records, typically data products produced by a particular analysis
  - Attributes include: name, owner, primary search index, storage unit size, storage group
- Record: single measurement/image/observation with associated meta-data
  - Attributes include: ID, storage unit ID, storage unit slot
  - Contains keywords, links, data segments
  - Records are the main data objects seen by module programmers
- Keyword: named meta-data value, stored in database
  - Attributes include: name, type, value, physical unit
- Link: named pointer from one record to another, stored in database
  - Attributes include: name, target series, target record ID or primary index value
  - Used to capture data dependencies and processing history
- Data segment: named data container representing the primary data on disk belonging to a record
  - Attributes include: name, filename, datatype, naxis, axis[0..naxis-1], storage format
  - Can be either structure-less (any file) or an n-dimensional array stored in a tiled, compressed file format
- Storage unit: a chunk of data records from the same series stored in a single directory tree
  - Attributes include: online location, offline location, tape group, retention time
  - Managed by the Storage Unit Manager in a manner transparent to most module programmers
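
The classes above map naturally onto C structures. The sketch below is illustrative only: every type and field name is hypothetical and simply mirrors the attributes listed on the slide, not the actual DRMS headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; they mirror the slide's attribute lists. */
typedef enum { KW_INT, KW_DOUBLE, KW_STRING } KeywordType;

typedef struct {
    const char *name;
    KeywordType type;
    union { long i; double d; const char *s; } value;
    const char *unit;            /* physical unit */
} Keyword;

typedef struct {
    const char *name;
    const char *target_series;
    long target_recnum;          /* target record id */
} Link;

typedef struct {
    const char *name;
    const char *filename;
    int naxis;                   /* number of dimensions */
    int axis[8];                 /* axis[0..naxis-1] */
} Segment;

typedef struct {
    long id;
    long storage_unit_id;
    int storage_unit_slot;
    const Keyword *keywords;  size_t n_keywords;
    const Link *links;        size_t n_links;
    const Segment *segments;  size_t n_segments;
} Record;

/* Look up a keyword by name; returns NULL if the record has no such keyword. */
const Keyword *record_keyword(const Record *rec, const char *name)
{
    size_t k;

    assert(rec != NULL && name != NULL);
    for (k = 0; k < rec->n_keywords; k++)
        if (strcmp(rec->keywords[k].name, name) == 0)
            return &rec->keywords[k];
    return NULL;
}
```

A record here only points at its keywords, links and segments; where the segment files physically live is left to the storage unit fields, which matches the slide's separation of meta-data from bulk data.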

7
JSOC data model
[Diagram: JSOC data series, the data records of series hmi_lev1_fd_V, and the contents of a single hmi_lev1_fd_V data record.]

JSOC data series shown: hmi_lev0_cam1_fg, aia_lev0_cont1700, hmi_lev1_fd_M, hmi_lev1_fd_V, aia_lev0_FE171

Data records shown for series hmi_lev1_fd_V: hmi_lev1_fd_V12345 through hmi_lev1_fd_V12353

Single hmi_lev1_fd_V data record:
- Keywords
    RECORDNUM = 12345 (unique serial number)
    SERIESNUM = 5531704 (slots since epoch)
    T_OBS     = 2009.01.05_23:22:40_TAI
    DATAMIN   = -2.537730543544E+03
    DATAMAX   = 1.935749511719E+03
    ...
    P_ANGLE   = LINK:ORBIT, KEYWORD:SOLAR_P
- Links
    ORBIT    = hmi_lev0_orbit, SERIESNUM = 221268160
    CALTABLE = hmi_lev0_dopcal, RECORDNUM = 7
    L1       = hmi_lev0_cam1_fg, RECORDNUM = 42345232
    R1       = hmi_lev0_cam1_fg, RECORDNUM = 42345233
- Data segments: V_DOPPLER
- Storage unit directory
8
JSOC subsystems

SUMS: Storage Unit Management System
- Maintains database of storage units and their location on disk and tape
- Manages JSOC storage subsystems: disk array, robotic tape library
  - Scrubs old data from disk cache to maintain enough free workspace
  - Loads and unloads tapes to/from tape drives and robotic library
- Allocates disk storage needed by pipeline processes through DRMS
- Stages storage units requested by pipeline processes through DRMS
- Design features
  - RPC client-server protocol
  - Oracle DBMS (to be migrated to PostgreSQL)
DRMS: Data Record Management System
- Maintains database holding
  - Master tables with definitions of all JSOC series and their keyword, link and data segment definitions
  - One table per series containing record meta-data, e.g. keyword values
- Provides distributed transaction processing framework for pipeline
- Provides full meta-data searching through JSOC query language
  - Multi-column indexed searches on primary index values allow fast and simple querying for common cases
  - Inclusion of free-form SQL clauses allows advanced querying
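
How a primary-index range and a free-form SQL clause might combine into one catalog query can be sketched as follows. This is a minimal illustration, not the JSOC query language: the function name build_record_query, the idea of one SQL table named after the series, and the column names are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build a SELECT over a (hypothetical) per-series table.
   index_col/first/last express the fast primary-index range;
   extra_where is an optional free-form SQL condition (may be NULL).
   Returns the query length, or -1 if buf is too small. */
int build_record_query(char *buf, size_t buflen,
                       const char *series_table, const char *index_col,
                       long first, long last, const char *extra_where)
{
    int n;

    assert(buf != NULL && series_table != NULL && index_col != NULL);
    if (extra_where != NULL && strlen(extra_where) > 0)
        n = snprintf(buf, buflen,
                     "SELECT * FROM %s WHERE %s BETWEEN %ld AND %ld AND (%s)",
                     series_table, index_col, first, last, extra_where);
    else
        n = snprintf(buf, buflen,
                     "SELECT * FROM %s WHERE %s BETWEEN %ld AND %ld",
                     series_table, index_col, first, last);
    if (n < 0 || (size_t)n >= buflen)
        return -1;               /* encoding error or truncated output */
    return n;
}
```

The sketch pastes strings together for readability; code that accepts untrusted input would normally use parameterized queries instead.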

9
Pipeline software/hardware architecture
[Architecture diagram. Recoverable labels, grouped by component:]
- Pipeline program module: JSOC science libraries, utility libraries, file I/O
- DRMS library: OpenRecords, CloseRecords, GetKeyword, SetKeyword, GetLink, SetLink, OpenDataSegment, CloseDataSegment; data segment I/O; record cache (keywords, links, data paths)
- DRMS socket protocol
- Data Record Management Service (DRMS): AllocUnit, GetUnit, PutUnit; SQL queries
- Storage Unit Management Service (SUMS): storage unit transfer; SQL queries
- JSOC disks
- Robotic tape archive
- Database server: record catalogs, series tables, record tables, storage unit tables
10
JSOC Pipeline Workflow
[Workflow diagram. Recoverable labels:]
- Pipeline operator
- Pipeline processing plan
- Processing script, "mapfile": list of pipeline modules with needed datasets for input and output
- PUI: Pipeline User Interface (scheduler)
- DRMS session containing Module1, Module2, Module3
- Processing history log
- DRMS: Data Record Management Service
- SUMS: Storage Unit Management System
11
Analysis modules: co-I contributions and collaboration

- Contributions from co-I teams
  - Software for intermediate and high level analysis modules
  - Data series definitions
    - Keywords, links, data segments, size of storage units, primary index keywords etc.
  - Documentation
  - Test data and intended results for verification
  - Time
    - Explain algorithms and implementation
    - Help with verification
    - Collaborate on improvements if required (e.g. performance or maintainability)
- Contributions from HMI team
  - Pipeline execution environment
  - Software & hardware resources (development environment, libraries, tools)
  - Time
    - Help with defining data series
    - Help with porting code to JSOC API
    - If needed, collaborate on algorithmic improvements, tuning for JSOC hardware, parallelization
    - Verification

12
HMI module status and MDI heritage
[Flowchart from primary observables to intermediate and high level data products. Box labels, in the order extracted:]
- Internal rotation
- Heliographic Doppler velocity maps
- Spherical harmonic time series
- Mode frequencies and splitting
- Internal sound speed
- Full-disk velocity, sound speed maps (0-30 Mm)
- Local wave frequency shifts
- Ring diagrams
- Doppler velocity
- Carrington synoptic v and cs maps (0-30 Mm)
- Time-distance cross-covariance function
- Tracked tiles of Dopplergrams
- Wave travel times
- High-resolution v and cs maps (0-30 Mm)
- Egression and ingression maps
- Wave phase shift maps
- Deep-focus v and cs maps (0-200 Mm)
- Far-side activity index
- Stokes I,V
- Line-of-sight magnetograms
- Line-of-sight magnetic field maps
- Stokes I,Q,U,V
- Full-disk 10-min averaged maps
- Vector magnetograms, fast algorithm
- Vector magnetic field maps
- Vector magnetograms, inversion algorithm
- Coronal magnetic field extrapolations
- Tracked tiles
- Tracked full-disk 1-hour averaged continuum maps
- Coronal and solar wind models
- Continuum brightness
- Solar limb parameters
- Brightness feature maps
- Brightness images
13
Example Global Seismology Pipeline
14
Questions to be discussed at working sessions

- List of standard science data products
  - Which data products, including intermediate ones, should be produced by JSOC to accomplish the science goals of the mission?
  - What cadence, resolution, coverage etc. should each data product have?
  - Which data products should be computed on the fly and which should be archived?
  - What are the challenges to be overcome for each analysis technique?
- Detailing each branch of the processing pipeline
  - What are the detailed steps in each branch?
  - Can some of the computational steps be encapsulated in general tools that can be shared among different branches (example: tracking)?
  - What are the CPU and I/O resource requirements of computational steps?
- Contributed analysis modules
  - What groups or individuals will contribute code, and incorporate it in the pipeline?
  - If multiple candidate techniques and/or implementations exist, which should be included in the pipeline?
  - What is the test plan and what data is needed to verify the approach?

15
JSOC Series Definition
16
Global Database Tables
17
Database tables for example series hmi_fd_v

- Tables specific to each series contain per-record values of
  - Keywords
  - Record numbers of records pointed to by links
  - DSIndex: an index identifying the SUMS storage unit containing the data segments of a record
- Series sequence counter used for generating unique record numbers
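
The role of the series sequence counter and of DSIndex can be shown with a toy in-memory model. In the real system these would live in database tables; the struct and function names below (SeriesRow, SeriesSequence, sequence_reserve, create_rows) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one row of a per-series table. */
typedef struct {
    long long recnum;   /* unique record number within the series */
    long long dsindex;  /* identifies the SUMS storage unit holding the data segments */
} SeriesRow;

/* Hypothetical stand-in for the series sequence counter. */
typedef struct {
    long long next;     /* next unused record number */
} SeriesSequence;

/* Reserve n consecutive record numbers and return the first one. */
long long sequence_reserve(SeriesSequence *seq, int n)
{
    long long first;

    assert(seq != NULL && n > 0);
    first = seq->next;
    seq->next += n;
    return first;
}

/* Create n new rows whose data segments all live in storage unit dsindex. */
void create_rows(SeriesSequence *seq, SeriesRow *rows, int n, long long dsindex)
{
    long long first;
    int k;

    assert(rows != NULL);
    first = sequence_reserve(seq, n);
    for (k = 0; k < n; k++) {
        rows[k].recnum = first + k;
        rows[k].dsindex = dsindex;
    }
}
```

Several records sharing one dsindex value reflects the chunking of records into storage units; a database-backed counter would additionally have to be safe under concurrent sessions, which this sketch ignores.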

18
Pipeline batch processing

- A pipeline batch is encapsulated in a single database transaction
  - If no module fails, all data records are committed and become visible to other clients of the JSOC catalog at the end of the session
  - If a failure occurs, all data records are deleted and the database is rolled back
  - It is possible to commit data produced up to intermediate checkpoints during sessions

[Diagram: pipeline batch as an atomic transaction. Steps shown: Register session; Module 1; Module 2.1 and Module 2.2; ... Module N; Commit data & deregister. Each step goes through the DRMS API, with input data records and output data records exchanged with the DRMS service (session master), which is backed by the record & series database and SUMS.]
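
The all-or-nothing behaviour of a pipeline batch can be modelled in a few lines of C. This is a toy in-memory model under stated assumptions, not the DRMS implementation: Session, Module, run_batch and the two example modules are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RECORDS 32

/* Hypothetical in-memory model of a session. */
typedef struct {
    long visible[MAX_RECORDS];  size_t n_visible;  /* committed, seen by other clients */
    long pending[MAX_RECORDS];  size_t n_pending;  /* created in this session, uncommitted */
} Session;

typedef int (*Module)(Session *);   /* returns 0 on success, nonzero on failure */

/* Called by a module to register a new output record. */
int session_create_record(Session *s, long recnum)
{
    assert(s != NULL);
    if (s->n_pending == MAX_RECORDS)
        return -1;
    s->pending[s->n_pending++] = recnum;
    return 0;
}

/* Commit: pending records become visible. Can also serve as a checkpoint. */
void session_commit(Session *s)
{
    size_t k;

    assert(s != NULL && s->n_visible + s->n_pending <= MAX_RECORDS);
    for (k = 0; k < s->n_pending; k++)
        s->visible[s->n_visible++] = s->pending[k];
    s->n_pending = 0;
}

/* Rollback: discard everything created since the last commit. */
void session_rollback(Session *s)
{
    assert(s != NULL);
    s->n_pending = 0;
}

/* Run modules in order as one atomic batch. */
int run_batch(Session *s, const Module *modules, size_t n_modules)
{
    size_t m;

    for (m = 0; m < n_modules; m++) {
        if (modules[m](s) != 0) {
            session_rollback(s);
            return -1;
        }
    }
    session_commit(s);
    return 0;
}

/* Two toy modules used to exercise the sketch. */
int module_ok(Session *s)   { return session_create_record(s, 1000 + (long)s->n_pending); }
int module_fail(Session *s) { (void)s; return 1; }
```

Running a batch that contains module_fail leaves nothing visible, while a batch of two module_ok calls makes both records visible at the end. Calling session_commit between modules would correspond to the intermediate checkpoints mentioned on the slide.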
19
Example of module code

A module doing a (naïve) Doppler velocity calculation could look as shown below.

Usage:
    doppler DRMSSESSION=helios:33546 "2009.09.01_16:00:00_TAI" "2009.09.01_17:00:00_TAI"

    extern CmdParams_t cmdparams;   /* command line args */
    extern DRMS_Env_t *drms_env;    /* DRMS environment */

    int module_main(void)
    {
      DRMS_RecordSet_t *filtergrams, *dopplergram;
      int first_frame, status;
      char query[1024], *start, *end;

      start = cmdparms_getarg(&cmdparams, 1);
      end   = cmdparms_getarg(&cmdparams, 2);
      sprintf(query, "hmi_lev0_fg[T_Obs=%s-%s]", start, end);
      filtergrams = drms_open_records(drms_env, query, "RD", &status);
      if (filtergrams->num_recs == 0)
      {
        printf("Sorry, no filtergrams found for that time interval.\n");
        return -1;
      }
      first_frame = 0;
      /* Start looping over record set. */
      for (;;)
      {
        first_frame = find_next_framelist(first_frame, filtergrams);
        if (first_frame == -1)   /* No more complete framelists. Exit. */
          break;
        dopplergram = drms_create_records(drms_env, "hmi_fd_v", 1, &status);
        if (status)
          return -1;
        compute_dopplergram(first_frame, filtergrams, dopplergram);
        drms_close_records(drms_env, dopplergram);
      }
      return 0;
    }
20
Example continued
    int compute_dopplergram(int first_frame, DRMS_RecordSet_t *filtergrams,
                            DRMS_RecordSet_t *dopplergram)
    {
      int i, n_rows, n_cols, tuning;
      DRMS_Segment_t *fg[10], *dop;
      short *fg_data[10];
      char *pol;
      double *dop_data;

      /* Get pointers for doppler data array. */
      dop = drms_open_datasegment(dopplergram->records[0], "v_doppler", "RDWR");
      n_cols = drms_getaxis(dop, 0);
      n_rows = drms_getaxis(dop, 1);
      dop_data = (double *)drms_getdata(dop, 0, 0);

      /* Get pointers for filtergram data arrays.
         The local arrays hold 10 frames, so index them relative to first_frame. */
      for (i = first_frame; i < first_frame + 10; i++)
      {
        fg[i - first_frame] = drms_open_datasegment(filtergrams->records[i], "intensity", "RD");
        fg_data[i - first_frame] = (short *)drms_getdata(fg[i - first_frame], 0, 0);
        pol = drms_getkey_string(filtergrams->records[i], "Polarization");
        tuning = drms_getkey_int(filtergrams->records[i], "Tuning");
        printf("Using filtergram (%s, %d)\n", pol, tuning);
      }
      /* Do the actual Doppler computation. */
      calc_v(fg_data, dop_data);
      return 0;
    }
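
The slides do not show calc_v. As a purely illustrative stand-in, the sketch below estimates a velocity for one pixel from the depth-weighted centroid of an absorption line sampled at a few wavelength offsets, then applies the Doppler relation v = c Δλ/λ0. This centroid method is swapped in for illustration and is not the HMI algorithm; the function name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SPEED_OF_LIGHT 299792458.0   /* m/s */

/* Toy single-pixel velocity estimate (an assumption, not the HMI algorithm).
   intensity[k] is the filtergram value at wavelength offset dlambda[k] from
   the rest wavelength lambda0 (same length unit as dlambda); continuum is the
   intensity outside the line. Returns 0 and stores the velocity in m/s
   (positive = redshift), or -1 if the line has no measurable depth. */
int centroid_velocity(const double *intensity, const double *dlambda, size_t n,
                      double continuum, double lambda0, double *v_out)
{
    double depth_sum = 0.0, moment = 0.0;
    size_t k;

    assert(intensity != NULL && dlambda != NULL && v_out != NULL && lambda0 > 0.0);
    for (k = 0; k < n; k++) {
        double depth = continuum - intensity[k];   /* absorption line depth */
        depth_sum += depth;
        moment += depth * dlambda[k];
    }
    if (depth_sum <= 0.0)
        return -1;
    /* Centroid offset of the line, converted to a velocity. */
    *v_out = SPEED_OF_LIGHT * (moment / depth_sum) / lambda0;
    return 0;
}
```

A full-disk version would call such a routine once per pixel, reading the intensities from the filtergram arrays and writing the result into the Dopplergram array.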
