SciDAC SDM Center All Hands Meeting, October 5-7, 2005
Transcript and Presenter's Notes
1
Parallel I/O Middleware Optimizations and Future Directions
Northwestern University PIs: Alok Choudhary, Wei-keng Liao
Graduate Students: Jianwei Li, Avery Ching, Kenin Coloma
ANL Collaborators: Bill Gropp, Rob Ross, Rajeev Thakur, Rob Latham
  • SciDAC SDM Center All Hands Meeting, October 5-7, 2005

2
Outline
  • Progress and accomplishments (Wei-keng Liao)
  • Parallel netCDF
  • Client-side file caching in MPI-IO
  • Data-type I/O for non-contiguous file access in PVFS
  • Future research directions (Alok Choudhary)
  • I/O middleware
  • Autonomic and active storage systems

3
Parallel NetCDF
  • NetCDF defines
  • A set of APIs for file access
  • A machine-independent file format
  • Parallel netCDF work
  • New APIs for parallel access
  • Maintaining the same file format
  • Tasks
  • Built on top of MPI for portability and high
    performance
  • Support C and Fortran interfaces
  • Support external data representations

4
PnetCDF Current Status
  • Version 1.0.0 was released on July 27, 2005
  • Supported platforms
  • Linux clusters, IBM SP, SGI Origin, Cray X, NEC SX
  • Two sets of parallel APIs are complete
  • High-level APIs (mimicking the serial netCDF APIs)
  • Flexible APIs (extended to utilize MPI derived datatypes; see the sketch after this list)
  • Both fully supported in C and Fortran
  • Support for large files (> 4 GB)
  • Test suites
  • Self test codes ported from Unidata netCDF
    package to validate against single-process
    results
  • Parallel test codes for both sets of APIs
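As a rough illustration of the two API flavors, the hedged sketch below writes one 1-D integer variable with both the high-level call and its flexible-API equivalent. The file name, variable layout, and even partitioning across ranks are illustrative assumptions, and error checking is omitted.

```c
/* Sketch: the same parallel write expressed with the high-level and the
 * flexible PnetCDF APIs. File name, variable layout, and the even
 * partitioning across ranks are illustrative assumptions; error checks
 * are omitted for brevity. */
#include <mpi.h>
#include <pnetcdf.h>

#define NX 1024   /* global length of the 1-D variable (assumed) */

int main(int argc, char **argv) {
    int rank, nprocs, ncid, dimid, varid;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* collective create; the file keeps the standard netCDF format */
    ncmpi_create(MPI_COMM_WORLD, "out.nc", NC_CLOBBER, MPI_INFO_NULL, &ncid);
    ncmpi_def_dim(ncid, "x", NX, &dimid);
    ncmpi_def_var(ncid, "var", NC_INT, 1, &dimid, &varid);
    ncmpi_enddef(ncid);

    MPI_Offset count = NX / nprocs;   /* even partition assumed */
    MPI_Offset start = rank * count;
    int buf[NX];
    for (MPI_Offset i = 0; i < count; i++) buf[i] = rank;

    /* high-level API: mimics the serial netCDF interface, typed by name */
    ncmpi_put_vara_int_all(ncid, varid, &start, &count, buf);

    /* flexible API: the same write described with an MPI datatype
     * (here simply MPI_INT, but any datatype matching the buffer works) */
    ncmpi_put_vara_all(ncid, varid, &start, &count, buf, count, MPI_INT);

    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}
```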

5
Illustrative PnetCDF Users
  • FLASH: astrophysical thermonuclear application from the ASCI/Alliances Center at the University of Chicago
  • ACTM: atmospheric chemical transport model, LLNL
  • WRF-ROMS: regional ocean model system I/O module from the Scientific Data Technologies group, NCSA
  • ASPECT: data understanding infrastructure, ORNL
  • pVTK: parallel visualization toolkit, ORNL
  • PETSc: portable, extensible toolkit for scientific computation, ANL
  • PRISM: PRogram for Integrated Earth System Modeling, with users from C&C Research Laboratories, NEC Europe Ltd.
  • ESMF: Earth System Modeling Framework, National Center for Atmospheric Research
  • More

6
PnetCDF Future Work
  • Non-blocking I/O APIs
  • Performance improvement for data type conversion
  • Type conversion while packing non-contiguous
    buffers
  • Extending PnetCDF for newer applications, e.g.,
    data analysis and mining
  • Collaboration with application users

7
File Caching in MPI-IO
[Figure: I/O software stack - Applications, Parallel netCDF, MPI-IO, PVFS, storage devices]
8
File Caching for Parallel Apps
  • Why file caching?
  • Improves performance for repeated file access
  • Enables a write-behind strategy
  • Accumulates multiple small writes to better utilize network bandwidth
  • May balance the workload for irregular I/O patterns
  • Useful for checkpointing
  • Enables data pre-fetching
  • Useful for read-only applications (parallel data mining, visualization)
  • Why not just use traditional caching strategies?
  • Each client performs caching independently → cache incoherence
  • I/O servers are in charge of cache coherence control → potential I/O serialization
  • Inadequate for parallel environments where application clients frequently read/write shared files

9
Caching Sub-system in MPI-IO
  • Application-aware file caching
  • A user-level implementation in MPI-IO library
  • MPI communicators define the subsets of processes
    operating on a shared file
  • Processes cooperate with each other to perform
    caching
  • Data cached in one client can be directly
    accessed by another
  • Moves cache coherence control from servers to
    clients
  • Distributed coherence control (less overhead)
  • Supports both collective and independent I/O

10
Design
  • Cache metadata
  • File-block based granularity
  • Cyclically stored in all processes
  • Global cache pool
  • Comprises local memory of all processes
  • A single copy of file data to avoid coherence issues
  • Two implementations
  • Using an I/O thread (POSIX thread)
  • Using the MPI remote-memory-access (RMA) facility (see the sketch below)
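To make the RMA-based variant concrete, here is a minimal, hedged sketch of the underlying MPI-2 one-sided mechanism: each process exposes part of its cache memory through an MPI window, and a peer fetches a cached page with a passive-target lock/get/unlock sequence. The page size, window layout, and page-ownership rule are illustrative assumptions, not the actual design of the caching sub-system.

```c
/* Sketch: reading a file page cached in a peer process's memory with
 * MPI-2 one-sided operations. The fixed page size, one-page-per-rank
 * window layout, and owner choice are illustrative assumptions only. */
#include <mpi.h>

#define PAGE_SIZE 65536

int main(int argc, char **argv) {
    int rank, nprocs;
    char *local_page;
    static char remote_page[PAGE_SIZE];
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* each process contributes one page of memory to the global cache pool */
    MPI_Alloc_mem(PAGE_SIZE, MPI_INFO_NULL, &local_page);
    MPI_Win_create(local_page, PAGE_SIZE, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* passive-target access: the owning process does not have to participate,
     * so data cached in one client can be read directly by another */
    int owner = (rank + 1) % nprocs;   /* assumed owner of the wanted page */
    MPI_Win_lock(MPI_LOCK_SHARED, owner, 0, win);
    MPI_Get(remote_page, PAGE_SIZE, MPI_BYTE, owner, 0, PAGE_SIZE, MPI_BYTE, win);
    MPI_Win_unlock(owner, win);

    MPI_Win_free(&win);
    MPI_Free_mem(local_page);
    MPI_Finalize();
    return 0;
}
```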

11
Example Read Operation
12
Future Work
  • Data pre-fetching
  • Instructional (through MPI info) and
    non-instructional (based on sequential access)
  • Collective write-behind for data check-pointing
  • Stand-alone distributed lock sub-system
  • Using MPI-2 remote-memory access facility
  • Design new MPI file hints for caching (see the hint sketch after this list)
  • Application I/O pattern study
  • Structured/unstructured AMR
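For reference, MPI file hints are delivered through an MPI_Info object at open time, which is where any new caching hints would plug in. In the hedged sketch below, "cb_buffer_size" is an existing ROMIO hint, while "client_cache_size" is a purely hypothetical name standing in for the proposed caching hints; unrecognized hints are simply ignored by MPI-IO implementations.

```c
/* Sketch: passing MPI-IO hints at file-open time. "cb_buffer_size" is a
 * standard ROMIO hint; "client_cache_size" is a hypothetical placeholder
 * for the new caching hints discussed above, not a real hint name. */
#include <mpi.h>

void open_with_hints(char *path, MPI_File *fh) {
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "cb_buffer_size", "16777216");    /* 16 MB collective buffer */
    MPI_Info_set(info, "client_cache_size", "67108864"); /* hypothetical: 64 MB cache */
    MPI_File_open(MPI_COMM_WORLD, path, MPI_MODE_CREATE | MPI_MODE_RDWR, info, fh);
    MPI_Info_free(&info);
}
```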

13
Data-type I/O in PVFS
[Figure: I/O software stack - Applications, Parallel netCDF, MPI-IO, PVFS, storage devices]
14
Non-contiguous I/O
  • Four types
  • Contiguous both in memory and file
  • Contiguous in memory, non-contiguous in file
  • Non-contiguous in memory, contiguous in file
  • Non-contiguous both in memory and file
  • Each segment is an I/O request of (offset, length)

15
Implementations
  • POSIX I/O
  • One call per (offset, length)
  • Generates a large number of I/O requests
  • Data sieving
  • A single (offset, length) covering multiple segments
  • Accesses unused data and introduces consistency-control overhead
  • List I/O
  • A single call handles multiple non-contiguous accesses
  • Passes multiple (offset, length) pairs across the network

[Figure: with POSIX I/O, the application process issues one I/O request per segment to the client-side file system; with list I/O, it issues a single list I/O request to the client-side file system, which crosses the network to the server-side file system]
16
Data-type I/O
  • A single request all the way to the servers
  • Abandons the offset-length pair representation
  • Borrows the MPI datatype concept to describe non-contiguous access patterns (see the sketch below)
  • New file system data types
  • New file system interfaces
  • An implementation in PVFS
  • Both client and server sides

[Figure: the application process issues a datatype I/O request to the PVFS client, which sends a single request across the network to the PVFS server]
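As a hedged illustration of the idea at the layer above, the sketch below describes a strided (non-contiguous) file region with an MPI derived datatype and hands it to MPI-IO as a file view; datatype I/O carries this kind of structured description down to the PVFS servers instead of flattening it into offset-length pairs. The global extent, the cyclic layout, and the helper function are assumptions for illustration; the PVFS-level datatypes and interfaces themselves are not shown.

```c
/* Sketch: a non-contiguous (strided) file access described once with an
 * MPI derived datatype instead of many (offset, length) pairs. The global
 * extent NX, the cyclic layout, and this helper are illustrative only. */
#include <mpi.h>

#define NX 1024   /* global number of integers in the file (assumed) */

void write_strided(MPI_File fh, int rank, int nprocs, const int *buf) {
    MPI_Datatype filetype;
    int count = NX / nprocs;   /* elements owned by this process */

    /* each process owns every nprocs-th element, starting at its rank:
     * a genuinely non-contiguous pattern in the file */
    MPI_Type_vector(count, 1, nprocs, MPI_INT, &filetype);
    MPI_Type_commit(&filetype);

    /* one structured description of the whole access pattern */
    MPI_File_set_view(fh, (MPI_Offset)(rank * sizeof(int)), MPI_INT, filetype,
                      "native", MPI_INFO_NULL);

    /* collective write of all locally owned elements in a single call */
    MPI_File_write_all(fh, (void *)buf, count, MPI_INT, MPI_STATUS_IGNORE);

    MPI_Type_free(&filetype);
}
```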
17
Summary of Accomplishments
  • High-level I/O
  • Parallel netCDF
  • Low-level I/O
  • MPI-IO file caching
  • Parallel file system
  • Data-type I/O in PVFS

18
Future Research
19
Typical Components in I/O Systems
  • Based on many current applications
  • High-level
  • E.g., NetCDF, HDF, ABC
  • Applications use these
  • Mid-level
  • E.g., MPI-IO
  • Performance experience
  • Low-level
  • E.g., File systems
  • Critical for the performance of the layers above
  • More access information is lost as more components are used

[Figure: compute nodes run the applications and a client-side file system, connected over a network to multiple I/O servers; end-to-end performance is critical]
20
(No Transcript)
21
Decouple What from How and Be Proactive
[Figure: "Current" vs. "Goal" I/O architectures. Current: user burdened, ineffective interfaces, non-communicating layers. Goal: applications (App1-App4) over an I/O software optimization layer providing caching, collective I/O, load balancing, reorganization, and fault tolerance for regular/irregular, small/large, local/remote access to streaming/FS/DM/HSS datasets and configurations, with speed, bandwidth, latency, and QoS as the driving metrics]
22
Component Design for I/O
  • Application-aware
  • Capture applications' file access information
  • Relationship between files, objects, users
  • Environment-aware
  • Network (reliability, security), storage devices
    (active disks)
  • Context-aware
  • Binding data attributes to files, indexing for
    fast search
  • High-performance I/O needs support from
  • Languages and compilers
  • I/O libraries
  • File systems
  • Storage devices

23
Component Interface Design
  • Informative
  • Should deliver access/storage information
    top-down/bottom-up
  • Flexibility
  • Should describe arbitrary data distribution in
    memory buffers, files, storage devices
  • Functionality
  • Asynchronous operations, read-ahead,
    write-behind, replications
  • Provides ability for additional innovation
  • Object-based I/O
  • For hardware control (I/O co-processor, active
    disk, object-based file systems, etc.)

24
Future Work in MPI-IO
  • Investigate interface extensions
  • Client-side caching sub-system
  • Implementations for various I/O strategies: buffering, pre-fetching, replication, migration
  • Adaptive caching mechanisms and algorithms for optimizing different access patterns
  • Distributed mutual-exclusion locking sub-system
  • Shared resources, such as files and memory
  • Pipelined locking (overlap lock waiting time with I/O)
  • Work with HDF5 and parallel netCDF
  • Design I/O strategies for metadata and data
  • Metadata: small, overlapping, repeated accesses with strong consistency requirements
  • Array data: large, less frequently updated

25
Future Work in Parallel File Systems
  • File caching (focus on parallel apps)
  • File versioning
  • Alternative to file locking
  • Reliability and availability aspects
  • Guarantee atomicity in the presence of client or
    I/O system failure
  • Can enable efficient RAID-type schemes in PFS
    (because of atomicity)
  • Dynamic rebalancing of I/O
  • File list locks
  • Locks on multiple regions in a single request

26
Active Storage System (reconfigurable system)
[Figure: prototype active storage testbed - an ML310 host and four ML310 boards (ML310-board1 through ML310-board4) connected by a switch, with a link to the external network]
  • Xilinx XC2VP30 Virtex-II Pro family
  • 30,816 logic cells (3424 CLBs)
  • 2 PPC405 embedded cores
  • 2,448 Kb of BRAM (136 x 18 Kb blocks)
  • 136 dedicated 18x18 multiplier blocks
  • Software
  • Data Mining
  • Encryption
  • Functions and runtime libs
  • Linux micro-kernel

27
MineBench - data mining benchmark suite