File System Benchmarks

Provided by: jianzho
1
File System Benchmarks
2
Outline
  • Benchmarking tools
  • Iozone
  • Bonnie
  • Shell script (mkdir)
  • Personal notes
  • Fsync, fdatasync
  • Mount options

3
Benchmarking tools
  • Iozone: http://www.iozone.org/
  • Bonnie: http://www.coker.com.au/bonnie/
  • Shell script (for mkdir and rmdir)

4
Iozone
  • Benchmark features
  • ANSI C source
  • POSIX async I/O
  • Mmap() file I/O
  • Normal file I/O
  • Single stream measurement
  • Multiple stream measurement
  • Distributed fileserver measurements (Cluster)
  • POSIX pthreads
  • Multi-process measurement
  • Excel importable output for graph generation
  • Latency plots
  • 64bit compatible source
  • Large file compatible
  • Stonewalling in throughput tests to eliminate
    straggler effects
  • Processor cache size configurable
  • Selectable measurements with fsync, O_SYNC
  • Builds for AIX, BSDI, HP-UX, IRIX, FreeBSD,
    Linux, OpenBSD, NetBSD, OSFV3, OSFV4, OSFV5, SCO
    OpenServer, Solaris, Windows95/98/NT
  • Tests for read, write, re-read, re-write, read
    backwards, read strided, fread, fwrite, random
    read, pread, mmap, aio_read, aio_write

5
Definitions of the tests
  • Write: This test measures the performance of
    writing a new file. When a new file is written
    not only does the data need to be stored but also
    the overhead information for keeping track of
    where the data is located on the storage media.
    This overhead is called the metadata. It
    consists of the directory information, the space
    allocation and any other data associated with a
    file that is not part of the data contained in
    the file. It is normal for the initial write
    performance to be lower than the performance of
    rewriting a file due to this overhead
    information.
  • Re-write: This test measures the performance of
    writing a file that already exists. When a file
    is written that already exists the work required
    is less as the metadata already exists. It is
    normal for the rewrite performance to be higher
    than the performance of writing a new file.
  • Read: This test measures the performance of
    reading an existing file.
  • Re-Read: This test measures the performance of
    reading a file that was recently read. It is
    normal for the performance to be higher as the
    operating system generally maintains a cache of
    the data for files that were recently read. This
    cache can be used to satisfy reads and improves
    the performance.
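The write/re-write and read/re-read pairs above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Iozone itself is implemented; the function names and the decision to include fsync in the write timing are assumptions made for this sketch.

```python
import os
import tempfile
import time


def timed_write(path, size, record):
    """Write `size` bytes in `record`-byte chunks; return throughput in KB/s.

    The file is opened without O_TRUNC, so calling this a second time on
    the same path overwrites the existing blocks (a re-write).
    """
    buf = b"\0" * record
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    start = time.perf_counter()
    try:
        for _ in range(size // record):
            os.write(fd, buf)
        os.fsync(fd)  # assumption: count the flush, so the cache does not hide the cost
    finally:
        os.close(fd)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (size / 1024) / elapsed


def timed_read(path, record):
    """Read the whole file in `record`-byte chunks; return throughput in KB/s."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    start = time.perf_counter()
    try:
        while True:
            chunk = os.read(fd, record)
            if not chunk:
                break
            total += len(chunk)
    finally:
        os.close(fd)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (total / 1024) / elapsed
```

Calling `timed_write` twice on the same path gives a write figure followed by a re-write figure; calling `timed_read` twice gives read followed by re-read, where the second call is likely to be served from the operating system cache.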

6
  • Random Read: This test measures the performance
    of reading a file with accesses being made to
    random locations within the file. The performance
    of a system under this type of activity can be
    impacted by several factors, such as the size of
    the operating system's cache, the number of
    disks, seek latencies, and others.
  • Random Write: This test measures the performance
    of writing a file with accesses being made to
    random locations within the file. Again, the
    performance of a system under this type of
    activity can be impacted by several factors, such
    as the size of the operating system's cache, the
    number of disks, seek latencies, and others.
  • Random Mix: This test measures the performance of
    reading and writing a file with accesses being
    made to random locations within the file. Again,
    the performance of a system under this type of
    activity can be impacted by several factors, such
    as the size of the operating system's cache, the
    number of disks, seek latencies, and others. This
    test is only available in throughput mode. Each
    thread/process runs either the read or the write
    test. The distribution of read/write is done on a
    round-robin basis. More than one thread/process
    is required for proper operation.
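A rough Python sketch of the random access pattern and of the round-robin role split used by the random mix test follows. The helper names, the record-aligned offsets, and the choice that the first worker reads are assumptions for illustration; the slides do not say how Iozone picks offsets or which role comes first.

```python
import os
import random
import tempfile
from typing import List


def random_offsets(file_size: int, record: int, count: int, seed: int = 0) -> List[int]:
    """Pick `count` record-aligned offsets uniformly within the file."""
    rng = random.Random(seed)
    records = file_size // record
    return [rng.randrange(records) * record for _ in range(count)]


def random_read(path: str, record: int, count: int, seed: int = 0) -> int:
    """Read `count` records at random offsets; return the total bytes read."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        return sum(len(os.pread(fd, record, off))
                   for off in random_offsets(size, record, count, seed))
    finally:
        os.close(fd)


def mix_roles(workers: int) -> List[str]:
    """Assign read and write roles to workers in round-robin order."""
    return ["read" if i % 2 == 0 else "write" for i in range(workers)]
```

With a single worker `mix_roles` yields only a reader, which shows why the random mix test needs more than one thread or process.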

7
  • Backwards Read: This test measures the
    performance of reading a file backwards. This may
    seem like a strange way to read a file but in
    fact there are applications that do this. MSC
    Nastran is an example of an application that
    reads its files backwards. With MSC Nastran,
    these files are very large (Gbytes to Tbytes in
    size). Although many operating systems have
    special features that enable them to read a file
    forward more rapidly, there are very few
    operating systems that detect and enhance the
    performance of reading a file backwards.
  • Record Rewrite: This test measures the
    performance of writing and re-writing a
    particular spot within a file. This hot spot can
    have very interesting behaviors. If the size of
    the spot is small enough to fit in the CPU data
    cache then the performance is very high. If the
    size of the spot is bigger than the CPU data
    cache but still fits in the TLB then one gets a
    different level of performance. If the size of
    the spot is larger than the CPU data cache and
    larger than the TLB but still fits in the
    operating system cache then one gets another
    level of performance, and if the size of the spot
    is bigger than the operating system cache then
    one gets yet another level of performance.
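The backwards read pattern can be sketched as follows, assuming record-aligned accesses that start with the last (possibly partial) record. The function name is hypothetical, and the alignment choice is an assumption rather than something the slides specify.

```python
import os
import tempfile


def read_backwards(path, record):
    """Yield the file's records from the last one to the first."""
    size = os.path.getsize(path)
    n_records = (size + record - 1) // record  # round up to cover a partial tail
    fd = os.open(path, os.O_RDONLY)
    try:
        for index in reversed(range(n_records)):
            # pread takes an explicit offset, so no separate seek is needed
            yield os.pread(fd, record, index * record)
    finally:
        os.close(fd)
```

Each access moves to a lower offset than the previous one, which defeats the forward read-ahead that most operating systems apply.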

8
  • Strided Read: This test measures the performance
    of reading a file with a strided access behavior.
    An example would be: read at offset zero for a
    length of 4 Kbytes, then seek 200 Kbytes, then
    read for a length of 4 Kbytes, then seek 200
    Kbytes, and so on. Here the pattern is to read 4
    Kbytes, seek 200 Kbytes, and repeat. This is
    typical behavior for applications that keep data
    structures within a file and access a particular
    region of each structure. Most operating systems
    do not detect this behavior or implement any
    techniques to enhance performance under this
    type of access. This access behavior can also
    sometimes produce interesting performance
    anomalies. An example would be if the
    application's stride causes a particular disk, in
    a striped file system, to become the bottleneck.
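The read-then-seek pattern described above can be sketched like this. The sketch assumes the 200 Kbyte seek is relative to the end of the previous read, so consecutive reads start 204 Kbytes apart; the function name and the returned offset list are for illustration only.

```python
import os
import tempfile


def strided_read(path, record=4 * 1024, stride=200 * 1024):
    """Read `record` bytes, skip `stride` bytes, and repeat until end of file.

    Returns the list of offsets at which a read returned data.
    """
    offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            chunk = os.pread(fd, record, offset)
            if not chunk:
                break  # reached end of file
            offsets.append(offset)
            offset += record + stride  # assumption: the seek starts where the read ended
    finally:
        os.close(fd)
    return offsets
```

On a striped file system, a stride that is a multiple of the stripe width would send every one of these reads to the same disk, which is the bottleneck case the slide mentions.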

9
  • Fwrite: This test measures the performance of
    writing a file using the library function
    fwrite(). This is a library routine that performs
    buffered write operations. The buffer is within
    the user's address space. If an application were
    to write in very small size transfers then the
    buffered blocked I/O functionality of fwrite()
    can enhance the performance of the application by
    reducing the number of actual operating system
    calls and increasing the size of the transfers
    when operating system calls are made. This test
    is writing a new file so again the overhead of
    the metadata is included in the measurement.
  • Frewrite: This test measures the performance of
    writing a file using the library function
    fwrite(). This is a library routine that performs
    buffered blocked write operations. The buffer
    is within the user's address space. If an
    application were to write in very small size
    transfers then the buffered blocked I/O
    functionality of fwrite() can enhance the
    performance of the application by reducing the
    number of actual operating system calls and
    increasing the size of the transfers when
    operating system calls are made. This test is
    writing to an existing file so the performance
    should be higher as there are no metadata
    operations required.
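The effect of user-space buffering can be illustrated in Python, whose `io.BufferedWriter` plays a role similar to the stdio buffer behind fwrite(). This is an analogy rather than a measurement of fwrite() itself, and `CountingFile` and `count_os_writes` are hypothetical helpers written for this sketch.

```python
import io
import os
import tempfile


class CountingFile(io.FileIO):
    """Raw file object that counts the write() calls reaching the OS layer."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.write_calls = 0

    def write(self, data):
        self.write_calls += 1
        return super().write(data)


def count_os_writes(path, record, count, buffer_size):
    """Issue `count` small buffered writes; return how many raw writes resulted."""
    raw = CountingFile(path, "w")
    with io.BufferedWriter(raw, buffer_size=buffer_size) as out:
        for _ in range(count):
            out.write(b"x" * record)
        out.flush()  # push any remaining buffered bytes down to the raw file
        return raw.write_calls
```

A thousand 16-byte writes through an 8 Kbyte buffer reach the raw file in only a handful of larger transfers, which is the saving in operating system calls that the Fwrite and Frewrite descriptions refer to.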

10
  • Fread: This test measures the performance of
    reading a file using the library function
    fread(). This is a library routine that performs
    buffered blocked read operations. The buffer is
    within the user's address space. If an
    application were to read in very small size
    transfers then the buffered blocked I/O
    functionality of fread() can enhance the
    performance of the application by reducing the
    number of actual operating system calls and
    increasing the size of the transfers when
    operating system calls are made.
  • Freread: This test is the same as fread above
    except that in this test the file that is being
    read was read in the recent past. This should
    result in higher performance as the operating
    system is likely to have the file data in cache.

11
Test options
  • For nfs: -azc -U /mnt/nfs -n y -g y -q 1M -y 1K
    -b xxx-y.wks, with file size y = 10M, 100M, 1G
  • For ext3, lustre: -a -n y -g y -q 1M -y 1K -b
    xxx-y.wks, with file size y = 10M, 100M, 1G
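The option sets above can be assembled per file size with a small helper. This is a sketch: `iozone_args` and its `output_prefix` parameter (standing in for the "xxx" placeholder on the slide) are hypothetical, while the flags themselves are the ones listed above.

```python
def iozone_args(size, output_prefix, mount_point=None):
    """Build an iozone argument list for one file size, e.g. "10M".

    With `mount_point` set, the NFS variant is used (-azc -U <mount point>);
    otherwise the plain automatic mode (-a) used for ext3 and lustre.
    """
    args = ["iozone"]
    if mount_point is not None:
        args += ["-azc", "-U", mount_point]
    else:
        args += ["-a"]
    args += [
        "-n", size,   # minimum file size
        "-g", size,   # maximum file size (same value, so one size per run)
        "-q", "1M",   # maximum record size
        "-y", "1K",   # minimum record size
        "-b", f"{output_prefix}-{size}.wks",  # spreadsheet output file
    ]
    return args
```

For example, `iozone_args("10M", "nfs", mount_point="/mnt/nfs")` produces the NFS command line for the 10M run, and the list can be passed to `subprocess.run` on a machine where iozone is installed.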

12
Iozone result (ext3-10M)
13
Iozone result (lustre-10M)
14
Iozone result (nfs-10M)
15
Iozone result (ext3-100M)
16
Iozone result (lustre-100M)
17
Iozone result (ext3-1G)
18
Iozone result (lustre-1G)
19
Bonnie
  • A program to test hard drives and file systems
    for performance or the lack thereof. There are
    many different types of file system operations
    which different applications use to different
    degrees. Bonnie tests some of them and for each
    test gives a result of the amount of work done
    per second and the percentage of CPU time this
    took. For performance results higher numbers are
    better; for CPU usage lower numbers are better.
  • There are two sections to the program's
    operations.
  • Test the IO throughput in a fashion that is
    designed to simulate some types of database
    applications.
  • Test creation, reading, and deleting many small
    files in a fashion similar to the usage
    patterns of programs such as Squid or INN.
  • bon_csv2html
  • bon_csv2html < input file > html file

20
Bonnie result (ext3)
  • bonnie -s 1g -n 32:1024:16:100
  • http://ds127.ee.ncku.edu.tw/qq/bonnie.html

21
Bonnie result (lustre,nfs)
22
Shell script (mkdir)
  • #!/bin/bash
  • for ((i = 0; i < 20; i++)); do
  •   j=$((1000 * (i + 1)))
  •   # cd /mnt/nfs
  •   cd /mnt/lustre
  •   echo "creating directories t$i- ($j) $(date '+%H:%M:%S.%N')"
  •   /home/working/lustre-1.0.4/tests/mkdirmany tdir$i $j
  •   cd ..
  •   # umount nfs
  •   umount lustre
  •   echo "done $(date '+%H:%M:%S.%N')"
  •   # mount ost1:/ost /mnt/nfs
  •   lconf -v --node client /etc/lustre/config.xml
  •   # cd nfs
  •   cd lustre
  •   echo "deleting directories $(date '+%H:%M:%S.%N')"
  •   for ((k = 0; k < 10; k++))
  •   do
  •     rm -rf tdir$i-$k[0-4]*; rm -rf tdir$i-$k[5-9]*; rm -rf tdir$i-$k*
  •   done
  • done
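The same create-then-delete measurement can be sketched in Python for a quick local check. This is a stand-in for the mkdirmany tool used by the script, not a reimplementation of it; the function names and the `tdir-<k>` naming are assumptions, and the unmount/remount step between the two phases is left out.

```python
import os
import tempfile
import time


def time_mkdirs(parent, count):
    """Create `count` directories under `parent`; return elapsed seconds."""
    start = time.perf_counter()
    for k in range(count):
        os.mkdir(os.path.join(parent, f"tdir-{k}"))
    return time.perf_counter() - start


def time_rmdirs(parent, count):
    """Remove the directories created by time_mkdirs; return elapsed seconds."""
    start = time.perf_counter()
    for k in range(count):
        os.rmdir(os.path.join(parent, f"tdir-{k}"))
    return time.perf_counter() - start
```

Pointing `parent` at a directory on the file system under test and raising `count` in steps of 1000 mirrors the loop in the script.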

23
Mkdir result (ext3)
24
Mkdir result (lustre)
25
Mkdir result (nfs)
26
Some notes
  • IDE drive write cache (hdparm -W)
  • fdatasync(2): flushes user data
  • fsync(2): flushes user data and metadata
  • Mount options
  • async, sync, dirsync
  • atime, noatime
  • Reboot, remount, sleep

27
sync(8)
  • Writes any data buffered in memory out to disk.
    This can include (but is not limited to)
    modified superblocks, modified inodes, and
    delayed reads and writes. This must be
    implemented by the kernel; the sync program does
    nothing but exercise the sync(2) system call.

28
fsync(2), fdatasync(2)
  • SYNOPSIS
  • #include <unistd.h>
  • int fsync(int fd);
  • int fdatasync(int fd);
  • fsync
  • copies all in-core parts of a file to disk, and
    waits until the device reports that all
    parts are on stable storage. It also updates
    metadata stat information. It does not
    necessarily ensure that the entry in the
    directory containing the file has also reached
    disk. For that an explicit fsync on the file
    descriptor of the directory is also needed.
  • fdatasync
  • does the same as fsync but only flushes user
    data, not metadata such as the mtime or atime.
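A minimal sketch of the calls described above, using Python's `os` wrappers around fsync(2) and fdatasync(2). The `durable_write` helper is hypothetical; it also syncs the containing directory, following the note that the directory entry needs its own fsync.

```python
import os
import tempfile


def durable_write(path, data, data_only=False):
    """Write `data` to `path` and flush it before returning.

    data_only=True uses fdatasync (user data only); otherwise fsync is used.
    """
    # fdatasync is not available on every platform, so fall back to fsync
    sync = getattr(os, "fdatasync", os.fsync) if data_only else os.fsync
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            view = view[os.write(fd, view):]  # handle short writes
        sync(fd)
    finally:
        os.close(fd)
    # The directory entry is flushed separately, via a descriptor for the directory.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

In a benchmark, timing this function with and without `data_only=True` gives a rough view of what the extra metadata flush costs on a given file system.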

29
fsync(2), fdatasync(2) (cont.)
  • NOTES
  • In case the hard disk has write cache enabled,
    the data may not really be on permanent storage
    when fsync/fdatasync return.
  • When an ext2 file system is mounted with the
    sync option, directory entries are also
    implicitly synced by fsync.
  • On kernels before 2.4, fsync on big files can
    be inefficient. An alternative might be to use
    the O_SYNC flag to open(2).
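The O_SYNC alternative mentioned in the last note can be sketched as follows. The helper name is hypothetical; the only point is that the flag goes to open(2), after which each write is synchronous and no separate fsync call is issued.

```python
import os
import tempfile


def write_with_osync(path, record, count):
    """Write `count` records through a descriptor opened with O_SYNC.

    Returns the number of bytes written.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    written = 0
    try:
        for _ in range(count):
            # with O_SYNC, write() returns only after the data is handed to the device
            written += os.write(fd, b"\0" * record)
    finally:
        os.close(fd)
    return written
```

The caveat about drive write caches still applies: with the cache enabled, a synchronous write may return before the data is on permanent storage.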

30
Mount
  • async: All I/O to the file system should be done
    asynchronously.
  • atime: Update inode access time for each access.
    This is the default.
  • noatime: Do not update inode access times on
    this file system.
  • sync: All I/O to the file system should be done
    synchronously.
  • dirsync: All directory updates within the file
    system should be done synchronously. This affects
    the following system calls: creat, link,
    unlink, symlink, mkdir, rmdir, mknod and rename.
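Before a benchmark run it helps to confirm which of these options are actually in effect. The sketch below parses text in the format of /proc/mounts on Linux; the `mount_options` helper is hypothetical and takes the file contents as a string so it can be tried on any sample.

```python
def mount_options(mount_point, mounts_text):
    """Return the option list for `mount_point`, or None if it is not mounted.

    `mounts_text` uses the /proc/mounts layout:
    device, mount point, file system type, options, dump, pass.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")  # a later entry for the same point wins
    return found
```

For example, `"noatime" in mount_options("/mnt/nfs", open("/proc/mounts").read())` checks whether access-time updates are disabled on that mount.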

31
Mount options for ext3
  • data=journal / data=ordered / data=writeback:
    Specifies the journalling mode for file data.
    Metadata is always journaled.
  • journal: All data is committed into the journal
    prior to being written into the main file system.
  • ordered: This is the default mode. All data is
    forced directly out to the main file system prior
    to its metadata being committed to the journal.
  • writeback: Data ordering is not preserved - data
    may be written into the main file system after
    its metadata has been committed to the journal.
    This is rumoured to be the highest-throughput
    option. It guarantees internal file system
    integrity, however it can allow old data to
    appear in files after a crash and journal
    recovery.

32
Mount options for NFS
  • rsize=8192,wsize=8192
  • This will make your nfs connection faster than
    with the default buffer size of 4096. (NFSv2 does
    not work with larger values of rsize and wsize.)

33
Mounting Lustre
  • mount -t lustre_lite -o
    osc=lov1,mdc=MDC_ds127.ee.ncku.edu.tw_mds1_MNT_client
    config /mnt/lustre