Data Processing Pipeline with Transaction-Oriented Data Sharing

Slides: 19
Provided by: thomas171
1
Data Processing Pipeline with Transaction-Oriented Data Sharing
2
Agenda
  • Data Processing Pipeline
  • Traditional method
  • Pipeline
  • File Transaction Service
  • Transactions
  • FEI features
  • Components
  • Service Federation
  • Conclusion
  • Q&A

3
Data Processing Pipeline
4
The Players
  • Mission-specific telemetry processor
      • Acquire and process raw science data
      • Data decompression
      • Reconciliation
      • Geometric correction
  • Collection of science modules
      • Mosaics
      • Map projections
      • Pattern recognition
      • Limb detection
  • Data product
      • Images, metadata, non-image binary data
  • Product catalog and distribution

5
Steps in Traditional Method
  • Analyst starts telemetry processor to
      • Acquire and process raw science data
      • Catalog and output data products to operations
        storage
  • Analyst notifies science users of new product
    availability (via email, phone call, disk pulling
    scripts)
  • Upon being notified, science users
      • Log in and initiate file transfer to acquire the
        product
      • Execute science modules on the data products to
        produce new data products (e.g. create mosaics)
        and store them in operations storage
      • Notify other science users (return to Step 2)

6
What is wrong?
  • Requires too much user interaction
  • Performance bottleneck and expensive
  • Limited security
  • Email
  • Requires users to physically log in to operations
    machines to acquire data products
  • Raises new security and scalability concerns
  • Unsecured file transfer
  • Race conditions
  • Prone to premature releases
  • Users might obtain partially written files
  • Unmanaged concurrent updates
  • No product integrity verification

7
The Pipeline
  • Requires a product registration and delivery
    service that acts as a
      • Security manager
          • Authentication, authorization, accounting, and
            communication
      • Traffic cop
          • Monitors concurrent accesses
          • Concurrent updates to the same file are
            serialized
          • Files in a transaction are inaccessible to others
      • Messenger
          • Automatic delivery
          • Clients register local application modules to
            be dispatched upon delivery
      • Insurance agent
          • Data corruption detection on registration and
            delivery
          • Restart subscription and resume delivery
          • All registered products are time-stamped for
            time-based queries
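
A minimal sketch of how time-stamped registration could support resuming a subscription through a time-based query. The `Registry` class, its method names, and the file names are hypothetical illustrations, not the FEI API; the slides do not describe the actual interface.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for a product registry (not the FEI API)."""
    entries: List[Tuple[float, str]] = field(default_factory=list)  # (timestamp, name)

    def register(self, name: str, timestamp: float) -> None:
        # Every registered product is stamped with its registration time.
        self.entries.append((timestamp, name))

    def registered_after(self, since: float) -> List[str]:
        # Time-based query: products registered strictly after `since`, oldest first.
        return [name for ts, name in sorted(self.entries) if ts > since]


registry = Registry()
registry.register("sol001.img", 100.0)
registry.register("sol002.img", 200.0)
registry.register("sol003.img", 300.0)

# A subscriber whose last received product was stamped 100.0 resumes from there.
missed = registry.registered_after(100.0)
```

Here a subscriber only has to remember the timestamp of the last product it received in order to pick up where it left off.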

8
Data Processing Pipeline
  • File Exchange Interface (FEI) developed at JPL
  • A transaction-oriented file registration and
    delivery service
  • Organizes files in user-defined file types
  • Subscription and automatic dispatch of local
    application modules for pipeline processing

9
File Transaction Service
  • The portal to Mars at marsrovers.nasa.gov/home
    includes nearly 1,000 Web pages filled with
    computer animations, panoramic and catalog of
    other images, some in 3-D, published online
    almost as soon as the scientists receive them.
  • -- The New York Times, January 27, 2004

10
Transaction
  • Database transaction - ACID properties
      • Atomicity
          • A sequence of operations must all execute with
            correct results, or a rollback must occur
      • Consistency
          • The consistency and integrity of the data must
            be preserved
      • Isolation
          • Updates must be performed in isolation to prevent
            other processes from accessing the data involved
      • Durability
          • Data must be available and persisted without
            ambiguity after commit

11
File Exchange Interface (FEI)
  • High performance file transaction service for
    science data product registration and
    distribution
  • Content-independent file management
  • Optimized for file query, delivery, reliability
    and high load
  • Used by every major space mission to manage and
    distribute science data products worldwide

12
FEI Features
  • Transaction-oriented file registration and
    sharing
  • File query and retrieval with restart
  • Automatic file delivery
  • File integrity verification
  • Security (authentication, authorization,
    accounting, communication)
  • File receipts (delivery accountability)
  • 64-bit file size and resume transfer
  • Virtual file type

13
File Transaction
  • Prevents concurrent updates to the same file
  • Prevents updates to a file that is being read by
    others
  • Automatically rolls back all incomplete transactions
  • A file is invisible to others until it has been
    successfully registered with the server
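
One common way to get the "invisible until registered" and "rollback on failure" behaviour on a local filesystem is to stage the file under a hidden name and rename it into place atomically. The sketch below illustrates that general technique; it is an assumption for illustration, not a description of how FEI is implemented, and `register_file` and `visible_files` are hypothetical names.

```python
import os
import tempfile


def register_file(registry_dir: str, name: str, data: bytes) -> str:
    """Write `data` so that `name` appears in `registry_dir` only once complete."""
    # Stage in the same directory so the final rename stays on one filesystem.
    fd, staging_path = tempfile.mkstemp(dir=registry_dir, prefix=".staging-")
    try:
        with os.fdopen(fd, "wb") as staging:
            staging.write(data)
            staging.flush()
            os.fsync(staging.fileno())
        final_path = os.path.join(registry_dir, name)
        os.replace(staging_path, final_path)  # atomic "commit"
        return final_path
    except BaseException:
        # Rollback: discard the incomplete staging file.
        if os.path.exists(staging_path):
            os.remove(staging_path)
        raise


def visible_files(registry_dir: str) -> list:
    # Readers skip hidden staging files, so they never see partial writes.
    return sorted(f for f in os.listdir(registry_dir) if not f.startswith("."))


with tempfile.TemporaryDirectory() as d:
    register_file(d, "mosaic.img", b"pixels")
    listing = visible_files(d)
    try:
        register_file(d, "broken.img", None)  # the write fails, triggering rollback
    except TypeError:
        pass
    after_failure = os.listdir(d)
```

After the failed registration, the directory still contains only the file that was committed successfully.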

14
Delivery and Integrity
  • Automatic file delivery
      • New files registered with the server are delivered
        to subscribed users
      • Subscriber client can register application
        modules to be invoked when a new file is received
      • Subscriber client can resume/restart a subscription
        session
  • File integrity verification
      • Checksums are computed on the fly by both the
        client and server
      • Mismatched checksum values indicate potential
        file corruption
      • File checksum values can be stored in the server
        registry as part of the file metadata
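
The on-the-fly checksum idea can be sketched as hashing each chunk while it is being copied, so no second pass over the file is needed. SHA-256 is an assumption here, since the slides do not name a checksum algorithm, and `transfer` is a hypothetical helper rather than part of FEI.

```python
import hashlib
import io


def transfer(source, sink, chunk_size: int = 64 * 1024) -> str:
    """Copy `source` to `sink` in chunks, hashing the bytes as they pass."""
    digest = hashlib.sha256()  # assumption: the slides do not name an algorithm
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        sink.write(chunk)
    return digest.hexdigest()


payload = b"science data product" * 1000
received = io.BytesIO()
sender_checksum = transfer(io.BytesIO(payload), received)

# The receiver hashes what it actually got and compares the two values.
receiver_checksum = hashlib.sha256(received.getvalue()).hexdigest()
intact = sender_checksum == receiver_checksum

# A single flipped byte yields a different checksum.
corrupted = bytearray(received.getvalue())
corrupted[0] ^= 0xFF
corruption_detected = hashlib.sha256(bytes(corrupted)).hexdigest() != sender_checksum
```

The sender's checksum could also be stored alongside the file's metadata so that later deliveries can be verified against it.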

15
FEI Architecture
  • Component-based framework
  • Pluggable service components using virtual
    component design pattern
  • Select service components that are best fit for
    mission requirements

16
Service Federation
  • Why?
      • Limited physical resources on a single host
        machine
      • High-profile missions, such as MER, require
        support for hundreds of concurrent users
  • Solution
      • An architecture that allows one or more servers
        to subscribe to the same registry

17
Service Federation
18
Conclusion
  • The pipeline offers
      • Automated science data processing
          • Reduced manual interaction
          • Reduced cost
      • Delivery of processed data to our science users
        at a higher rate
      • Loose coupling between science modules with
        reactive behavior
          • Producers publish data products to FEI
          • Consumers receive data products and trigger
            local processing modules automatically
          • Consumers become producers by publishing
            locally processed products to FEI
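
The producer/consumer chain described in the conclusion can be illustrated with a small in-memory publish/subscribe sketch. The `Exchange` class, the file type names, and the handler functions are hypothetical and only stand in for the registration and delivery service; they do not reflect the real FEI client interface.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Exchange:
    """Hypothetical in-memory stand-in for the registration and delivery service."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)
        self.log: List[str] = []

    def subscribe(self, file_type: str, handler: Callable[[str], None]) -> None:
        # Register a local processing module for one file type.
        self.subscribers[file_type].append(handler)

    def publish(self, file_type: str, name: str) -> None:
        # Record the product, then deliver it to every subscriber of that type.
        self.log.append(f"{file_type}:{name}")
        for handler in list(self.subscribers[file_type]):
            handler(name)


exchange = Exchange()


def make_mosaic(raw_name: str) -> None:
    # A consumer of raw images becomes a producer of mosaics.
    exchange.publish("mosaic", f"mosaic-of-{raw_name}")


def project_map(mosaic_name: str) -> None:
    # A consumer of mosaics becomes a producer of map projections.
    exchange.publish("map", f"map-of-{mosaic_name}")


exchange.subscribe("raw", make_mosaic)
exchange.subscribe("mosaic", project_map)
exchange.publish("raw", "sol001.img")
```

Publishing a single raw product triggers the whole chain without either module knowing about the other, which is the loose coupling the slide refers to.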