1 Profile-Guided I/O Partitioning
- Yijian Wang
- David Kaeli
- Electrical and Computer Engineering Department
- Northeastern University
- {yiwang, kaeli}@ece.neu.edu
2 Outline
- Introduction
- Related work
- Profile-guided I/O partitioning
- Benchmarks
- Experimental results
- Conclusions and future work
3 Introduction
- The I/O bottleneck
- The growing gap between the speed of processors and I/O devices
- Some applications access disks very frequently
- I/O intensive applications
- Multimedia applications
- Database applications
- Parallel scientific applications
4 Related work
- Fast disks
- FC-connected SCSI disks
- Smart caching I/O controller (EMC, IO Integrity)
- Parallel I/O
- Parallel disks (e.g., RAID)
- Parallel file systems (NFS, PIOF, HPS, etc.)
- Runtime parallel systems (MPI-IO, ROMIO, ADIO)
- Compiler technology
- Loop tiling, compiler-directed collective I/O
- To achieve high performance, I/O should be parallelized at multiple levels (application, file system, disks)
5 I/O Partitioning
- Our target applications are parallel scientific codes running on Beowulf clusters
- I/O is parallelized at both the application level (using MPI and MPI-IO) and the disk level (using file partitioning)
- Ideally, every process accesses only files on its local disk, though this is typically not possible due to data sharing (a minimal sketch of this pattern follows this list)
- How can we recognize the access patterns?
- dynamically (profiling)
- statically (compiler)
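To make the ideal case concrete, here is a minimal mpi4py sketch, not the authors' code, in which each rank reads its own partition file; the path layout is an assumption:

    # Hypothetical: each MPI rank reads its own partition file,
    # ideally placed on the node's local disk.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # One partition file per process; the path is illustrative.
    path = "/local/scratch/part.%d" % rank

    fh = MPI.File.Open(MPI.COMM_SELF, path, MPI.MODE_RDONLY)
    buf = bytearray(fh.Get_size())
    fh.Read_at(0, buf)   # one contiguous read of the whole local partition
    fh.Close()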
6 Profile generation
1. Run the application
2. Capture I/O traces
3. Apply our partitioning algorithm
4. Rerun the tuned application
7 I/O traces and partitioning
- For every process and every contiguous file access, we capture the following I/O profile information (see the sketch after this list)
- Process ID
- File ID
- Address
- Chunk size
- I/O operation (read/write)
- Timestamp
- Generate a partition for every process
- Optimal partitioning is NP-complete, so we use a greedy heuristic (next slide)
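A minimal sketch of capturing these records, assuming an mpi4py application; the TracedFile wrapper and in-memory log are illustrative, not the authors' instrumentation:

    import time
    from mpi4py import MPI

    trace = []  # in-memory log; a real tool would persist this per process

    class TracedFile:
        """Wraps an MPI.File handle and logs one record per contiguous access."""
        def __init__(self, comm, path, amode):
            self.fh = MPI.File.Open(comm, path, amode)
            self.rank = comm.Get_rank()
            self.path = path

        def read_at(self, offset, buf):
            # (process ID, file ID, address, chunk size, op, timestamp)
            trace.append((self.rank, self.path, offset, len(buf), "R", time.time()))
            self.fh.Read_at(offset, buf)

        def write_at(self, offset, buf):
            trace.append((self.rank, self.path, offset, len(buf), "W", time.time()))
            self.fh.Write_at(offset, buf)

        def close(self):
            self.fh.Close()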
8 Our Greedy Algorithm

For each MPI-IO process:
    create a file partition
For each contiguous data chunk:
    identify the process that most frequently accesses this chunk
    assign the chunk to that process's partition
For each partition:
    reorder data in the partition based on first access to each chunk
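A runnable Python sketch of this greedy pass, assuming trace records shaped like the tuples logged above; the chunk key and names are illustrative:

    from collections import Counter, defaultdict

    def partition(trace):
        """Greedy pass: assign each chunk to the rank that touches it most
        often, then order each partition by first-access time."""
        counts = defaultdict(Counter)   # chunk -> Counter of accessing ranks
        first_seen = {}                 # chunk -> earliest access timestamp
        for rank, fid, offset, size, op, ts in trace:
            chunk = (fid, offset, size)  # illustrative chunk key
            counts[chunk][rank] += 1
            if chunk not in first_seen or ts < first_seen[chunk]:
                first_seen[chunk] = ts

        parts = defaultdict(list)       # rank -> list of assigned chunks
        for chunk, by_rank in counts.items():
            winner, _ = by_rank.most_common(1)[0]
            parts[winner].append(chunk)

        # Reorder each partition by first access so reruns stay sequential.
        for chunks in parts.values():
            chunks.sort(key=lambda c: first_seen[c])
        return dict(parts)

Running partition(trace) on the log from the tracing shim yields one chunk list per rank, ordered by first access; each list can then be materialized as that process's partition file.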
9 Benchmarks
- NAS Parallel Benchmarks (NPB 2.4)/BT
- Computational fluid dynamics
- Generates a 1.6 GB file dynamically and then reads it
- Writes/reads sequentially in chunks of 2040 bytes
- SPEChpc96/seismic
- Seismic processing
- Generates a 1.5 GB file dynamically and then reads it back
- Writes sequential chunks of 96 KB and reads sequential chunks of 2 KB
- mpi-tile-io
- From the Parallel I/O Benchmarking Consortium
- Tiled access to a two-dimensional matrix (1 GB), with overlap
- Writes/reads sequential chunks of 32 KB, with 2 KB of overlap
- All applications use MPI and MPI-IO for computation, communication, and I/O
10-15 Experimental results (figures only; no transcript available)
16 Conclusions and future work
- We obtain scalable speedups due to
- creating parallel I/O channels
- reducing disk seek time
- reducing communication overhead
- I/O access patterns are generally independent of data values for the applications studied
- Future work: investigating static (compile-time) approaches to I/O partitioning
17 Northeastern University Computer Architecture Research Group
http://www.ece.neu.edu/groups/nucar
- This project is supported by the NSF-funded Center for Subsurface Sensing and Imaging Systems (CenSSIS)