Title: Testing Efficiency of Parallel I/O Software
1. Testing Efficiency of Parallel I/O Software
- Weikuan Yu, Jeffrey Vetter
- December 6, 2006
2. Testing Parallel IO at ORNL
- Earlier analysis of scientific codes running at ORNL/LCF
  - Most users have limited I/O capability in their applications because they have not had access to portable, widespread parallel IO (PIO)
  - Seldom direct use of MPI-IO
  - Little use of high-level IO middleware such as PnetCDF or HDF5
  - Large variance in performance of parallel IO using different software stacks (as demonstrated in VH1 experiments)
- Ongoing work
  - Collect application IO access patterns and Lustre server IO traces with TAU, CrayPat, mpiP, etc.
  - Testing other parallel IO components over Lustre
  - Analysis, benchmarking and optimization of data-intensive scientific codes
3. Parallel IO Optimization at ORNL
- Parallel IO over Lustre
  - As a new file system, Lustre still relies on a generic ADIO implementation
  - Generations of platforms at ORNL demand efficient parallel IO
- Performance with Jaguar
  - Good read/write bandwidth for a large shared single file
  - Not scalable for small reads/writes and parallel IO management operations (metadata)
- Approaches for Optimizations
  - Providing a specific ADIO implementation well-tuned for Lustre
  - Investigating parameters for adjusting striping pattern
- Exploited Lustre file joining
  - Regular files can be joined in place
  - Split writing and hierarchical striping
  - Developed a prototype on an 80-node Linux cluster
  - Paper submitted to CCGrid 2006, available if interested
4. Some Characteristics of Lustre IO Performance
- Performance can be significantly affected by stripe width
- Need to introduce flexibility in the striping pattern
- Exploit file joining for growing stripe width with increasing file size
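To make the stripe-width point concrete, here is a minimal sketch of round-robin (RAID-0 style) striping, the layout Lustre uses for regular files. The function names `locate` and `osts_touched` are hypothetical helpers for illustration, not part of any Lustre API, and the 1 MiB stripe size and stripe count of 8 are assumed values.

```python
def locate(offset, stripe_size, stripe_count):
    """Map a byte offset to (OST index within the layout, offset inside that OST object)."""
    stripe_no = offset // stripe_size            # which stripe the byte falls in
    ost_index = stripe_no % stripe_count         # stripes rotate round-robin over the OSTs
    object_offset = (stripe_no // stripe_count) * stripe_size + offset % stripe_size
    return ost_index, object_offset


def osts_touched(file_size, stripe_size, stripe_count):
    """Count how many OSTs actually hold data for a file of the given size."""
    if file_size <= 0:
        return 0
    n_stripes = -(-file_size // stripe_size)     # ceiling division
    return min(n_stripes, stripe_count)


MiB = 1 << 20
# Assumed layout: 1 MiB stripes over 8 OSTs.
for size in (1 * MiB, 4 * MiB, 64 * MiB):
    print(size // MiB, "MiB file uses", osts_touched(size, MiB, 8), "OSTs")
```

A small file uses only a few of the OSTs in a wide layout, while a large file on a narrow layout cannot spread its load. That is the motivation for letting the stripe width grow with file size.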
5. Explore Lustre File Joining
- Split writing
  - Create/write a shared file as multiple small files, aka subfiles
  - Temporary structure to hold file attributes
  - Subfiles joined at closing time
[Diagram of Split Writing: open creates the file attributes; read/write operates on the subfiles; close produces the joined file]
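The open / read-write / close flow of split writing can be sketched with ordinary local files. This is only an illustration under stated assumptions: `SplitWriter` is a hypothetical class, and plain concatenation stands in for the Lustre join, which joins files in place rather than copying data.

```python
import os
import shutil
import tempfile


class SplitWriter:
    """Write one logical file as per-writer subfiles and join them at close."""

    def __init__(self, path, n_writers):
        self.path = path
        # Temporary structure holding the attributes of the logical file.
        self.attrs = {"n_subfiles": n_writers, "sizes": [0] * n_writers}
        self.tmpdir = tempfile.mkdtemp(prefix="subfiles_")
        self.subfiles = [
            open(os.path.join(self.tmpdir, f"sub{i}"), "wb")
            for i in range(n_writers)
        ]

    def write(self, writer, data):
        """Append data to the subfile owned by the given writer."""
        self.subfiles[writer].write(data)
        self.attrs["sizes"][writer] += len(data)

    def close(self):
        """Join the subfiles in writer order and return the total size."""
        for f in self.subfiles:
            f.close()
        # Stand-in for the join step: copy each subfile into the target in order.
        with open(self.path, "wb") as out:
            for f in self.subfiles:
                with open(f.name, "rb") as src:
                    shutil.copyfileobj(src, out)
        shutil.rmtree(self.tmpdir)
        return sum(self.attrs["sizes"])
```

Because each writer touches only its own subfile until close, creation and size bookkeeping do not have to be coordinated across all processes on every operation.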
6. Hierarchical Striping
- Hierarchical striping
  - Create another level of striping pattern for subfiles
  - Allow maximum coverage of Lustre storage targets
  - Mitigate the impact of striping overhead
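[Diagram of Hierarchical Striping (HS): subfiles 0 through n are each striped across two OSTs (subfile 0 on ost0 and ost1, subfile 1 on ost2 and ost3, ..., subfile n on ost2n and ost2n+1). Each subfile holds S consecutive stripes (0 to S-1 in subfile 0, S to 2S-1 in subfile 1, and so on), alternating between its two OSTs. HS width: N+1; HS size: S*w. Per-subfile stripe width: 2; stripe size: w.]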
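The two-level layout in the diagram can be written as a small mapping from a global stripe number to its subfile and OST. This is a sketch that assumes each subfile holds S consecutive stripes and is striped over two OSTs, as the diagram suggests; `hs_locate` and its parameter names are hypothetical.

```python
def hs_locate(stripe_no, stripes_per_subfile, osts_per_subfile=2):
    """Map a global stripe number to (subfile, OST, stripe index inside the subfile)."""
    subfile = stripe_no // stripes_per_subfile      # first level: which subfile
    local = stripe_no % stripes_per_subfile         # stripe index inside that subfile
    # Second level: round-robin over the OSTs owned by this subfile.
    ost = subfile * osts_per_subfile + local % osts_per_subfile
    return subfile, ost, local


# Assumed example: S = 4 stripes per subfile, 3 subfiles, 2 OSTs per subfile.
for k in range(12):
    subfile, ost, local = hs_locate(k, 4)
    print(f"stripe {k}: subfile {subfile}, ost{ost}, local stripe {local}")
```

With these assumed values, 3 subfiles cover 6 distinct OSTs while each subfile keeps a stripe width of only 2. The layout gets wide coverage of storage targets without paying the overhead of one very wide stripe.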
7. Evaluation

Table 1: Scalability of Management Operations

No. of Processes        Original      New
Create (milliseconds)
4                       8.05          8.75
8                       11.98         8.49
16                      20.81         8.63
32                      37.37         8.98
Resize (milliseconds)
4                       182.67        0.56
8                       355.28        0.81
16                      712.68        1.03
32                      1432.5        1.36

Table 2: Performance of Collective Read/Write

No. of Processes        Original      New
Write (MB/sec)
16                      396.5         1272.7
32                      775.9         1086.0
Read (MB/sec)
16                      275.64        538.87
32                      506.93        635.14
Read/Write an existing joined file
Write (MB/sec)          87.5          88.4
Read (MB/sec)           123.2         122.5
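- Write/Read performance improved dramatically for new files (for example, 16-process write bandwidth rose from 396.5 to 1272.7 MB/sec)
- Read/Write of an existing joined file performs poorly because of a non-optimized IO path for joined files in Lustre
- Scalability of file open and file resize improved dramatically (at 32 processes, resize time dropped from 1432.5 ms to 1.36 ms)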
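The size of these improvements can be computed directly from the table values. The sketch below assumes the first figure in each row is the Original measurement and the last is the New one; `time_speedup` and `bandwidth_gain` are hypothetical helper names.

```python
# (original, new) pairs keyed by number of processes, copied from Tables 1 and 2.
create_ms = {4: (8.05, 8.75), 8: (11.98, 8.49), 16: (20.81, 8.63), 32: (37.37, 8.98)}
resize_ms = {4: (182.67, 0.56), 8: (355.28, 0.81), 16: (712.68, 1.03), 32: (1432.5, 1.36)}
write_mbs = {16: (396.5, 1272.7), 32: (775.9, 1086.0)}
read_mbs = {16: (275.64, 538.87), 32: (506.93, 635.14)}


def time_speedup(original, new):
    """Ratio for latencies: values above 1 mean the new scheme is faster."""
    return original / new


def bandwidth_gain(original, new):
    """Ratio for bandwidths: values above 1 mean the new scheme moves more data."""
    return new / original


for name, table, ratio in (
    ("create", create_ms, time_speedup),
    ("resize", resize_ms, time_speedup),
    ("write", write_mbs, bandwidth_gain),
    ("read", read_mbs, bandwidth_gain),
):
    for procs, (original, new) in sorted(table.items()):
        print(f"{name:6s} {procs:2d} procs: {ratio(original, new):8.2f}x")
```

Under this reading, create time with the original scheme grows with the process count while the new scheme stays near 9 ms, so the ratio is slightly below 1 at 4 processes and above 4 at 32 processes.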
8. Results on Scientific Benchmarks: MPI-Tile-IO and BT/IO
- The IO pattern represented by BT-IO can be improved if the number of iterations is small. It may help if an arbitrary number of files can be joined.
- Write performance in MPI-Tile-IO can be improved dramatically
- Read performance in MPI-Tile-IO cannot be improved by file joining because reading an existing joined file does not perform well
9. Conclusions
- Parallel IO over Lustre
  - Split writing can improve metadata management operations
  - Stripe overhead can be mitigated with careful augmentation of stripe width
- Lustre file joining
  - Race conditions when multiple processes join files
  - Low read/write performance on an existing joined file
  - Arbitrary hierarchical striping is not possible because only a limited number of files can be joined
  - Needs improvement before production use in parallel IO
- Next Steps
  - Continue optimization of parallel IO at ORNL
  - Adapt the earlier techniques to liblustre on XT3/XT4
  - Develop/exploit other features, such as group locks and dynamic stripe width
  - Adapt parallel I/O and parallel FS to wide-area collaborative science with other IO protocols such as pNFS and Logistical Networking