Testing Efficiency of Parallel I/O Software

Learn more at: https://sdm.lbl.gov

1
Testing Efficiency of Parallel I/O Software
  • Weikuan Yu, Jeffrey Vetter
  • December 6, 2006

2
Testing Parallel IO at ORNL
  • Earlier analysis of scientific codes running at ORNL/LCF
      • Most users have limited I/O capability in their applications because they have not had access to portable, widespread parallel IO (PIO)
      • Direct use of MPI-IO is rare
      • Little use of high-level IO middleware such as PnetCDF or HDF5
      • Large variance in the performance of parallel IO across different software stacks (as demonstrated in the VH1 experiments)
  • Ongoing work
      • Collecting application IO access patterns and Lustre server IO traces with TAU, CrayPat, mpiP, etc.
      • Testing other parallel IO components over Lustre
      • Analysis, benchmarking, and optimization of data-intensive scientific codes

3
Parallel IO Optimization at ORNL
  • Parallel IO over Lustre
      • Lustre, as a new file system, still relies on a generic ADIO implementation
      • Successive generations of platforms at ORNL demand efficient parallel IO
  • Performance on Jaguar
      • Good read/write bandwidth for a large, shared single file
      • Not scalable for small reads/writes or for parallel IO management operations (metadata)
  • Approaches to optimization
      • Provide a dedicated ADIO implementation that is well tuned for Lustre
      • Investigate parameters for adjusting the striping pattern
  • Exploited Lustre file joining
      • Regular files can be joined in place
      • Split writing and hierarchical striping
      • Developed a prototype on an 80-node Linux cluster
      • Paper submitted to CCGrid 2006; available on request

4
Some Characteristics of Lustre IO Performance
  • Performance can be significantly affected by stripe width
  • Need to introduce flexibility in the striping pattern
  • Exploit file joining to grow the stripe width as file size increases
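
The effect of stripe width on a single request can be illustrated with a small model. This is a minimal sketch that assumes plain round-robin (RAID-0 style) placement of fixed-size stripes over the file's storage targets; the function names and parameters are illustrative and are not part of any Lustre or MPI-IO API.

```python
MB = 1 << 20

def stripe_location(offset, stripe_size, stripe_width):
    """Map a byte offset to (OST index in the layout, offset inside that OST's object).

    Assumes simple round-robin striping; this is a model, not a Lustre call.
    """
    stripe_no = offset // stripe_size
    ost_index = stripe_no % stripe_width
    object_offset = (stripe_no // stripe_width) * stripe_size + offset % stripe_size
    return ost_index, object_offset

def osts_touched(offset, length, stripe_size, stripe_width):
    """Return the sorted OST indices that a contiguous request [offset, offset+length) hits."""
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return sorted({s % stripe_width for s in range(first, last + 1)})

# An 8 MB request spreads over more targets as the stripe width grows,
# while a small request stays on one target regardless of width.
wide = osts_touched(0, 8 * MB, MB, 4)
narrow = osts_touched(0, 8 * MB, MB, 2)
small = osts_touched(0, 512 * 1024, MB, 4)
```

Under this model a small file gains nothing from a wide layout, which is one way to read the suggestion that the stripe width should grow with the file size.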

5
Explore Lustre File Joining
  • Split writing
      • Create/write a shared file as multiple small files, a.k.a. subfiles
      • A temporary structure holds the file attributes
      • Subfiles are joined at close time

[Figure: Diagram of Split Writing. Labels: open (File Attributes), read/write (Subfiles), close (Joined file)]

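
The open / read-write / close sequence of split writing can be emulated on any local file system. The sketch below is an assumption-laden stand-in: the `SplitFile` class and its attribute dictionary are hypothetical, each writer is assumed to own one contiguous region of the shared file, and the join at close time is done by copying, whereas Lustre file joining works in place.

```python
import os
import tempfile

class SplitFile:
    """Emulates split writing: one subfile per writer, joined when the file is closed."""

    def __init__(self, path, num_writers):
        self.path = path
        # Temporary structure holding the attributes of the logical shared file.
        self.attrs = {"num_writers": num_writers, "sizes": [0] * num_writers}
        self.subfiles = [f"{path}.sub{i}" for i in range(num_writers)]
        for name in self.subfiles:
            open(name, "wb").close()  # create empty subfiles at open time

    def write(self, writer, data):
        """Append data to the subfile owned by the given writer."""
        with open(self.subfiles[writer], "ab") as f:
            f.write(data)
        self.attrs["sizes"][writer] += len(data)

    def close(self):
        """Join the subfiles in writer order (copying here, unlike an in-place join)."""
        with open(self.path, "wb") as out:
            for name in self.subfiles:
                with open(name, "rb") as f:
                    out.write(f.read())
                os.remove(name)
        return sum(self.attrs["sizes"])

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "shared.dat")
    shared = SplitFile(target, num_writers=3)
    for rank in range(3):
        shared.write(rank, bytes([rank]) * 4)
    total = shared.close()
    with open(target, "rb") as f:
        joined = f.read()
```

Because every writer creates and fills its own subfile, no writer contends with another on a single shared object until the join.
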
6
Hierarchical Striping
  • Hierarchical striping
      • Create another level of striping pattern for subfiles
      • Allows maximum coverage of Lustre storage targets
      • Mitigates the impact of striping overhead

[Figure: Diagram of Hierarchical Striping (HS) (HS width N1, HS size Sw). Shows subfile 0, subfile 1, ..., subfile n laid out over ost0, ost1, ost2, ost3, ..., with each subfile using stripe width 2 and stripe size w.]

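
One way to read the diagram is as a two-level address mapping: the joined file is divided into consecutive regions, one per subfile, and each subfile is striped round-robin over its own small set of storage targets. The sketch below encodes that reading. The layout rule, the function names, and the assumption that subfile i owns OSTs `i * stripe_width` onward are all illustrative guesses, not the authors' implementation.

```python
def hs_locate(offset, hs_width, hs_size, stripe_width, stripe_size):
    """Return (subfile index, OST index) for a byte offset in the joined file.

    Assumed layout: subfile k holds bytes [k * hs_size, (k + 1) * hs_size) and
    stripes them round-robin over OSTs k * stripe_width .. k * stripe_width + stripe_width - 1.
    """
    if not 0 <= offset < hs_width * hs_size:
        raise ValueError("offset outside the joined file")
    subfile = offset // hs_size
    stripe_in_subfile = (offset % hs_size) // stripe_size
    ost = subfile * stripe_width + stripe_in_subfile % stripe_width
    return subfile, ost

def osts_covered(hs_width, hs_size, stripe_width, stripe_size):
    """Return the sorted OST indices used by the whole joined file."""
    offsets = range(0, hs_width * hs_size, stripe_size)
    return sorted({hs_locate(o, hs_width, hs_size, stripe_width, stripe_size)[1]
                   for o in offsets})

# Three subfiles, each holding 4 stripes of size 1 over 2 OSTs.
covered = osts_covered(hs_width=3, hs_size=4, stripe_width=2, stripe_size=1)
```

In this model each subfile keeps a narrow stripe width of 2, yet the joined file as a whole reaches six targets, which matches the stated goals of wide coverage with low per-file striping overhead.
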
7
Evaluation
Table 1: Scalability of Management Operations

No. of Processes        Original      New
Create (milliseconds)
4                       8.05          8.75
8                       11.98         8.49
16                      20.81         8.63
32                      37.37         8.98
Resize (milliseconds)
4                       182.67        0.56
8                       355.28        0.81
16                      712.68        1.03
32                      1432.5        1.36

Table 2: Performance of Collective Read/Write

No. of Processes        Original      New
Write (MB/sec)
16                      396.5         1272.7
32                      775.9         1086.0
Read (MB/sec)
16                      275.64        538.87
32                      506.93        635.14
Read/Write an existing joined file
Write (MB/sec)          87.5          88.4
Read (MB/sec)           123.2         122.5

  • Write/read performance improved dramatically for new files
  • Reading or writing an existing joined file performs poorly because the IO path for joined files in Lustre is not optimized
  • Scalability of file open and file resize improved dramatically
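
The size of these improvements can be checked directly from the table values. The short script below takes each row as an (Original, New) pair and computes a time ratio for the management operations and a bandwidth ratio for collective IO; the dictionary and function names are illustrative only.

```python
# (Original, New) pairs keyed by number of processes, copied from the tables.
create_ms = {4: (8.05, 8.75), 8: (11.98, 8.49), 16: (20.81, 8.63), 32: (37.37, 8.98)}
resize_ms = {4: (182.67, 0.56), 8: (355.28, 0.81), 16: (712.68, 1.03), 32: (1432.5, 1.36)}
write_mbs = {16: (396.5, 1272.7), 32: (775.9, 1086.0)}
read_mbs = {16: (275.64, 538.87), 32: (506.93, 635.14)}

def time_speedup(original, new):
    """Ratio of elapsed times; values above 1 mean the new scheme is faster."""
    return original / new

def bandwidth_gain(original, new):
    """Ratio of bandwidths; values above 1 mean the new scheme is faster."""
    return new / original

for procs in sorted(create_ms):
    print(f"{procs:>2} procs: create x{time_speedup(*create_ms[procs]):.2f}, "
          f"resize x{time_speedup(*resize_ms[procs]):.0f}")
for procs in sorted(write_mbs):
    print(f"{procs:>2} procs: write x{bandwidth_gain(*write_mbs[procs]):.2f}, "
          f"read x{bandwidth_gain(*read_mbs[procs]):.2f}")
```

Two patterns stand out in the ratios: the original create and resize times roughly double each time the process count doubles while the new times stay nearly flat, and at 4 processes the new create is slightly slower than the original.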

8
Results on Scientific Benchmarks: MPI-Tile-IO and BT-IO
  • The IO pattern represented by BT-IO can be improved when the number of iterations is small. It may help further if an arbitrary number of files can be joined.
  • Write performance in MPI-Tile-IO can be improved dramatically
  • Read performance in MPI-Tile-IO cannot be improved by file joining because reading an existing joined file does not perform well

9
Conclusions
  • Parallel IO over Lustre
      • Split writing can improve metadata management operations
      • Striping overhead can be mitigated by carefully increasing the stripe width
  • Lustre file joining
      • Race conditions when multiple processes join files
      • Low read/write performance on an existing joined file
      • Arbitrary hierarchical striping is not possible because only a limited number of files can be joined
      • Needs improvement before production use in parallel IO
  • Next steps
      • Continue optimization of parallel IO at ORNL
      • Adapt the earlier techniques to liblustre on XT3/XT4
      • Develop/exploit other features, such as group locks and dynamic stripe width
      • Adapt parallel I/O and parallel file systems to wide-area collaborative science with other IO protocols such as pNFS and Logistical Networking