1
FireTower Tech, LLP
  • Understanding Disk I/O
  • By Charles Pfeiffer
  • (804) 901-3992
  • CJPfeiffer@FireTowerTech.com
  • www.FireTowerTech.com

2
Agenda
  • Arrive 0900-0910
  • Section 1 0910-1000
  • Break 1000-1010
  • Section 2 1010-1100
  • Break 1100-1110
  • Section 3 1110-1200
  • Break 1200-1330
  • Section 4 1330-1420
  • Break 1420-1430
  • Section 5 1430-1520
  • Break 1520-1530
  • Q&A 1530-1630

3
Section 1
  • General Information
  • RAID
  • Throughput v. Response Time

4
Who Is This Guy?
  • Been an independent consultant for 11 years
  • Sun Certified Systems Administrator
  • Oracle Certified Professional
  • Taught Performance and Optimization class at
    Learning Tree
  • Taught UNIX Administration class at Virginia
    Commonwealth University
  • Primarily focus on complete system performance
    analysis and tuning

5
What Is He Talking About?
  • Disks are horrible!
  • Disks are slow!
  • Disks are a real pain to tune properly!
  • Multiple interfaces and points of bottlenecking!
  • What is the best way to tune disk IO? Avoid it!
  • Disks are sensitive to minor changes!
  • Disks don't play well in the SAN Box!
  • You never get what you pay for!
  • Thankfully, disks are cheap!

6
What Is He Talking About? (continued)
  • Optimize IO for specific data transfers
  • Small IO is easy, based on response time
  • Improved with parallelism, depending on IOps
  • Improved with better quality disks
  • Large IO is much more difficult
  • Increase transfer size. Larger IO slows response
    time!
  • Spend money on quantity not quality. Stripe
    wider!
  • You don't get what you expect (label spec)
  • You don't even come close!

7
Where Do Vendors Get The Speed Spec From?
  • 160 MBps capable does not mean 160 MBps sustained
  • Achieved in optimal conditions
  • Perfectly sized and contiguous disk blocks
  • Streamline disk processing
  • Achieved via a disk-to-disk transfer
  • No OS or FileSystem

8
What Do I Need To Know?
  • What is good v. bad?
  • What are realistic expectations in different
    cases?
  • How can you get the real numbers for yourself?
  • What should you do to optimize your IO?

9
Why Do I Care?
  • IO is the slowest part of the computer
  • IO improves slower than other components
  • CPU performance doubles every year or two
  • Memory and disk capacity double every year or two
  • Disk IO Throughput doubles every 10 to 12 years!
  • A cheap way to gain performance
  • Disks are bottlenecks!
  • Disks are cheap. SANs are not, but disk arrays
    are!

10
What Do Storage Vendors Say?
  • Buy more controllers
  • Sure, if you need them
  • How do you know what you need?
  • Dont just buy them to see if it helps
  • Buy more disks
  • The average SAN disk performs at about 1%
  • 50 disks performing at 1% = ½ disk
  • Try getting 20 disks to perform at 5% instead
    (= 1 whole disk)

11
What Do Storage Vendors Say? (continued)
  • Buy more cache
  • Sure, but it's expensive
  • Get all you can get out of the cheap disks first
  • Fast response time is good
  • Not if you are moving large amounts of data
  • Large transfers shouldn't get super-fast
    response time
  • Fast response time means you are doing small
    transfers

12
What Do Storage Vendors Say? (continued)
  • Isolate the IO on different subsystems
  • Just isolate the IO on different disks
  • Disks are the bottleneck, not controllers, cache,
    etc.
  • Again, expensive. Make sure you are maximizing
    the disks first.

13
What Do Storage Vendors Say? (continued)
  • Remove hot spots
  • Yes, but don't do this blindly!
  • Contiguous blocks reduce IOps
  • Balance contention (waits) v. IOps (requests)
    carefully!
  • RAID-5 is best
  • No it's not, it's just easier for them!

14
The Truth About SAN
  • SAN = scalability
  • Yeah, but internal disk capacity has caught up
  • SAN != easy to manage
  • SAN = performance
  • Who told you that lie?
  • SAN definitely != performance

15
The Truth About SAN (continued)
  • But I can stripe wider and I have cache, so
    performance must be good
  • You share IO with everyone else
  • You have little control over what is on each disk
  • Hot Spots v. Fragmentation
  • Small transfer sizes
  • Contention

16
How Should I Plan?
  • What do you need?
  • Quick response for small data sets
  • Move large chunks of data fast
  • A little of both
  • Corvettes v. Dump Trucks
  • Corvettes get from A to B fast
  • Dump Trucks get a ton of dirt from A to B fast

17
RAID Performance Penalties
  • Loss of performance for RAID overhead
  • Applies against each disk in the RAID
  • The penalties are
  • RAID-0: none
  • RAID-1, 01, 10: 20%
  • RAID-2: 10%
  • RAID-3, 30: 25%
  • RAID-4: 33%
  • RAID-5, 50: 43%

18
Popular RAID Configurations
  • RAID-0 (Stripe or Concatenation)
  • Don't concatenate unless you have to
  • No fault-tolerance, great performance, cheap
  • RAID-1 (Mirror)
  • Great fault-tolerance, no performance gain,
    expensive
  • RAID-5 (Stripe With Parity)
  • Medium fault-tolerance, low performance gain,
    cheap

19
Popular RAID Configurations (continued)
  • RAID-01 (Two or more stripes, mirrored)
  • Great performance/fault-tolerance, expensive
  • RAID-10 (Two or more mirrors, striped)
  • Great performance/fault-tolerance, expensive
  • Better than RAID-01
  • Not all hardware/software offer it yet

20
RAID-10 Is Better Than RAID-01
  • Given six disks
  • RAID-01
  • Stripe disks one through three (Stripe A)
  • Stripe disks four through six (Stripe B)
  • Mirror stripe A to stripe B
  • Lose Disk two. Stripe A is gone
  • Requires you to rebuild the stripe

21
RAID-10 Is Better Than RAID-01 (continued)
  • RAID-10
  • Mirror disk one to disk two
  • Mirror disk three to disk four
  • Mirror disk five to disk six
  • Stripe all six disks
  • Lose Disk two. Just disk two is gone
  • Only requires you to rebuild disk two as a
    submirror

22
The Best RAID For The Job
23
Throughput Is Opposite Of Response Time
24
Common Throughput Speeds (MBps)
  • Serial: 0.014
  • IDE: 16.7, Ultra IDE: 33
  • USB1: 1.5, USB2: 60
  • Firewire: 50
  • ATA/100: 12.5, SATA: 150,
  • Ultra SATA: 187.5

25
Common Throughput Speeds (MBps) (continued)
  • FW SCSI: 20, Ultra SCSI: 40,
  • Ultra3 SCSI: 80, Ultra160 SCSI: 160
  • Ultra320 SCSI: 320
  • Gb Fiber: 120, 2Gb Fiber: 240,
  • 4Gb Fiber: 480

26
Expected Throughput
  • Vendor specs are maximum (burst) speeds
  • You won't get burst speeds consistently
  • Except for disk-to-disk with no OS (e.g. EMC BCV)
  • So what should you expect?
  • Fiber: 80% of spec as best-case in ideal
    conditions
  • SCSI: 70% of spec as best-case in ideal
    conditions
  • Disk: 60% of spec as best-case in ideal
    conditions
  • But even that is before we get to transfer size

27
BREAK
  • See you in 10 minutes

28
Section 2
  • Transfer Size
  • Mkfile
  • Metrics

29
Transfer Size
  • Amount of data moved in one IO
  • Must be contiguous block IO
  • Fragmentation carries a large penalty!
  • Device IOps limits restrict throughput
  • Maximum transfer size allowed is different for
    different file systems and devices
  • Is Linux good or bad for large IO?

30
Transfer Size Limits
  • Controllers: unlimited
  • Disks and W2K3 NTFS: 2 MB
  • Remember the vendor Speed Spec
  • W2K NTFS, VxFS and UFS: 1 MB

31
Transfer Size Limits (continued)
  • NT NTFS and ext3: 512 KB
  • ext2: 256 KB
  • FAT16: 128 KB
  • Old Linux: 64 KB
  • FAT: 32 KB

32
So Linux Is Bad?!
  • Again, what are you using the server for?
  • Transactional (OLTP) DB: fine
  • Web server, small file share: fine
  • DW, large file share: might be a problem!

33
Good Transfer Sizes
  • Small IO / Transactional DB
  • Should be 8K to 128K
  • Tend to average 8K to 32K
  • Large IO / Data Warehouse
  • Should be 64K to 1M
  • Tend to average 16K to 64K
  • Not very proportional compared to Small IO!
  • And it takes some tuning to get there!

34
Find Your AVG Transfer Size
  • iostat -exn (from a live Solaris server)
  •                extended device statistics        ---- errors ----
  • r/s  w/s   kr/s   kw/s  wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
  • 2.8  1.1  570.7  365.3   0.0  0.1    2.9   19.0   1   3   0   0   0   0 d10
  • Average transfer size = (kr/s + kw/s) / (r/s + w/s)
  • (570.7 + 365.3) / (2.8 + 1.1) = 240K

35
Find Your AVG Transfer Size (continued)
  • PerfMon

36
Find Your AVG Transfer Size (continued)
  • Average transfer size = Avg Disk Bytes/sec /
    Avg Disk Transfers/sec
  • Allow PerfMon to run for several minutes
  • Look at the average field for Disk Bytes/sec
  • Look at the average field for Disk Transfers/sec
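  • The same counters can be captured from a command
    prompt with typeperf (counter paths assume
    English counter names):
  • typeperf -si 10 "\PhysicalDisk(_Total)\Disk Bytes/sec" "\PhysicalDisk(_Total)\Disk Transfers/sec"
  • Divide the first average by the second to get
    bytes per transfer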

37
The mkfile Test
  • Simple, low-overhead write of a contiguous (as
    much as possible) empty file
  • There really is no comparable Windows tool! Get
    cygwin/SFU on Windows to run the same test
  • time mkfile 100m /mountpoint/testfile
  • Real is total time spent
  • Sys is time spent on hardware (writing blocks)
  • User is time spent at keyboard/monitor
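  • Where mkfile isn't available (it is
    Solaris-specific), a dd write of zeros makes a
    comparable low-overhead test:
  • time dd if=/dev/zero of=/mountpoint/testfile bs=1048576 count=100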

38
The mkfile Test (continued)
  • User time should be minimal
  • Time in user space in the kernel
  • Not interacting with hardware
  • Waiting for user input, etc.
  • Unless it's waiting for you to respond to a
    prompt, like to overwrite a file

39
The mkfile Test (continued)
  • System time should be 80% of real time
  • Time in system space in the kernel
  • Interacting with hardware
  • Doing what you want, reading from disk, etc.
  • Real - (System + User) = WAIT
  • Any time not directly accounted for by the kernel
    is time spent waiting for a resource
  • Usually this is waiting for disk access
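  • Worked example (made-up numbers): real 10.0s,
    sys 7.4s, user 0.1s gives 10.0 - (7.4 + 0.1) =
    2.5s, i.e. 25% of the run spent waiting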

40
The mkfile Test (continued)
  • Common causes for waits
  • Resource contention (disk or non-disk)
  • Disks are too busy
  • Need wider stripes
  • Not using all of the disks in a stripe
  • Disks repositioning
  • Many small transfers due to fragmentation
  • Bad block/stripe/transfer sizes

41
The Right Block Size
  • Smaller for small IO, bigger for large IO
  • The avg size of data written to disk per
    individual write
  • In most cases you want to be at one extreme
  • As big as you can for large IO / as small as you
    can for small IO
  • Balance performance v. wasted space. Disks are
    cheap!
  • Is there an application block size?
  • If so, the OS block size should match it

42
More iostat Metrics
  • iostat -exn (from a live Solaris server)
  •                extended device statistics        ---- errors ----
  • r/s  w/s   kr/s   kw/s  wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
  • 2.8  1.1  570.7  365.3   0.0  0.1    2.9   19.0   1   3   0   0   0   0 d10
  • %w (wait) = 1. Should stay near 0
  • %b (busy) = 3. Should stay low
  • asvc_t = 19 (ms response). Most argue that this
    should be lower.
    Again, response v. throughput.

43
iostat On Windows
  • Not so easy
  • PerfMon can get you %b
  • PhysicalDisk: % Disk Time
  • Not available in cygwin or SFU
  • So what do you do for %w or asvc_t?
  • Not much
  • You can ID wait issues as demonstrated later
  • Depend on the array/SAN tools

44
vmstat Metrics
  • vmstat
  • procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
  •  r  b  w   swpd  free  buff cache  si  so  bi  bo  in  cs us sy id wa
  •  0  0  0 163608 77620     0     0   3   1   1   0   5  11  1  3 96  0
  • b/w = blocked/waiting processes
  • Should stay at or near 0
  • us(er) v. sy(stem) CPU time
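  • Note that the first line of vmstat output is the
    average since boot; run something like vmstat 5 6
    and read the later samples for live numbers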

45
vmstat Metrics (continued)
  • Is low CPU idle bad?
  • Low is not 0
  • Idle cycles = money wasted
  • Need to be able to process all jobs at peak
  • Don't need to be able to process all jobs at peak
    and have idle cycles for show!
  • Better off watching the run/wait/block queues
  • Run queue should stay small relative to the
    number of CPUs

46
vmstat On Windows
  • Cygwin works (b/w consolidated to b)

47
vmstat On Windows (continued)
  • PerfMon
  • System, idle and user time come from the
    Processor counters (% Privileged Time,
    % Idle Time, % User Time)

48
vmstat on Windows (continued)
  • PerfMon
  • Run queue (Processor Queue Length) should be
    read per processor
  • Block/wait queue is the blocking queue length

49
Additional Metrics
  • Do not swap!
  • On UNIX you should never swap
  • Use your native OS commands to verify
  • Don't trust vmstat
  • On Windows some swap is OK
  • Use PerfMon to check Pages/sec.
  • It should stay low
  • Use free in cygwin
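  • On Solaris, for example, swap -s summarizes swap
    allocation, and a sustained non-zero scan rate
    (the sr column of vmstat) is the honest sign of
    memory pressure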

50
Additional Metrics (continued)
  • Network IO issues will make your server appear
    slow
  • netstat -in displays errors/collisions
  • Collisions are common on auto-negotiate networks
  • Hard set the switch and server link speed/mode
  • Use net statistics workstation on Windows
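  • A Solaris sketch for hard-setting 100 Mb
    full-duplex on an hme interface (driver name and
    values depend on your hardware):
  • ndd -set /dev/hme adv_100fdx_cap 1
  • ndd -set /dev/hme adv_autoneg_cap 0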

51
BREAK
  • See you in 10 minutes

52
Section 3
  • Measuring Oracle IO
  • IO Factors/Equations
  • Striping A Stripe

53
Measuring Oracle IO
  • Install Statspack
  • @?/rdbms/admin/spcreate
  • Schedule snapshots
  • @?/rdbms/admin/spauto
  • Take your own snapshots
  • exec statspack.snap
  • Get a report
  • @?/rdbms/admin/spreport
  • Everybody gets a report
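  • A minimal SQL*Plus session sketch (run as the
    perfstat user that spcreate builds; snap level 7
    adds segment detail):
  • SQL> exec statspack.snap(i_snap_level => 7)
  • SQL> @?/rdbms/admin/spreport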

54
Measuring Oracle IO (continued)
  • Read the report
  • Instance Efficiency Percentages
  • Buffer Hit %
  • Execute to Parse %
  • In-memory Sort %
  • Top 5 Timed Events
  • db file sequential read is usually at the top and
    is in the most need of tuning

55
Measuring Oracle IO (continued)
  • Queries
  • Check Elapsed Time / Executions to find the long
    running queries
  • Don't forget to tune semi-fast queries that are
    executed many times
  • Tablespace/Datafile IO
  • Physical reads
  • Identify hot spots
  • May need to move/add files
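  • A sketch of the same check outside the report,
    against v$sql (elapsed_time is in microseconds):
  • select sql_text, executions,
    elapsed_time/greatest(executions,1)/1e6 sec_per_exec
    from v$sql where executions > 0
    order by sec_per_exec desc;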

56
Measuring Oracle IO (continued)
  • Memory Advisories
  • Buffer cache
  • PGA
  • Shared Pool

57
IO Performance Factors
  • Controller overhead: 0.3 ms
  • Burst controller/disk speed: varies. Vendor
    spec.
  • Average Transfer Size: varies. Can be anything
    between the block size and the lesser of the
    device/FS/OS limits
  • Average Seek Time: varies. Vendor spec. Most
    range between 1 and 10 ms

58
IO Equations
  • Controller Transfer Time (ms) =
    (avg transfer size / burst controller speed)
    + controller overhead
  • Controller IOps Limit =
    1000 / controller transfer time
  • Controller Transfer Rate =
    controller IOps limit x avg transfer size

59
IO Equations (continued)
  • Rotational Delay (ms) = 1 / (RPM / 30,000)
    (half a rotation on average)
  • IO Time (ms) = avg seek time + rotational delay
    + transfer time + controller overhead
  • Disk IOps Limit = 1000 / IO time
  • Disk Transfer Rate =
    disk IOps limit x avg transfer size
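  • Worked example (assumed specs): 10K RPM, 5 ms
    seek, 8K transfers at 80 MBps burst. Rotational
    delay = 30,000 / 10,000 = 3 ms; transfer time is
    about 0.1 ms; IO time = 5 + 3 + 0.1 + 0.3 = 8.4
    ms; so roughly 119 IOps and only about 0.9 MBps
    per disk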

60
IO Equations (continued)
  • Optimal Disks Per Controller =
    controller IOps limit / disk IOps limit
  • NOT
  • controller speed spec / disk speed spec
  • IOps weigh heavier against disks than against
    controllers

61
IO Equations (continued)
  • Stripe Size = (block size x multiblock read/write
    count) / disks in the stripe
  • What if I have nested stripes? (Don't!)
  • Outer Stripe Size = (block size x multiblock
    read/write count) / columns in the outer stripe
  • Inner Stripe Size = outer stripe size / disks in
    the inner stripe

62
Striping A Stripe
  • Nested stripes must be planned carefully
  • The wrong stripe sizes can lead to degraded
    performance and wasted space
  • Assume we have 16 disks
  • The backend is configured as four RAID-5 luns,
    each one containing four disks
  • We want to stripe the four luns into one large
    volume on the OS with DiskSuite
  • Set Block Size high (e.g. 8K) and assume 32 for
    multiblock count
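  • A DiskSuite sketch of the outer stripe (device
    names are hypothetical; -i sets the interlace
    computed on the next slide):
  • metainit d10 1 4 c1t0d0s0 c1t1d0s0 c1t2d0s0 c1t3d0s0 -i 64k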

63
Striping A Stripe (continued)
  • The outer stripe size should be 64K
  • (8K x 32) / 4 columns in the outer stripe
  • The inner stripe size should be 16K
  • 64K / 4 disks in the inner stripe
  • Can't always be dead on
  • Round down to the next available size

64
Striping A Stripe (continued)
  • We throw out parity disks and just use data disks
    for the illustrations in this example
  • Whiteboard

65
Striping A Stripe (continued)
  • We need to write 256K of data
  • Data is divided into 64K chunks
  • Each 64K chunk is handed to one column in the
    outer stripe (a column represents an inner stripe
    set)
  • Each 64K chunk is divided into 16K chunks
  • Each 16K chunk is written to one column (one
    disk) in the inner stripe.
  • Perfect fit. All disks are used equally.

66
Striping A Stripe (continued)
  • 64K Outer Stripe Size Diagram
  • 16K to each inner stripe

67
Striping A Stripe (continued)
  • Same scenario, but use a 32K outer stripe size
    with the 16K inner stripe size
  • Data divided into 32K chunks
  • Each 32K chunk handed to one column in the outer
    stripe
  • Each 32K chunk divided into two 16K chunks

68
Striping A Stripe (continued)
  • The 16K chunks are written to two disks
  • You lose up to half of the performance value for
    the write and for future reads.

69
Striping A Stripe (continued)
  • 32K Outer Stripe Size Diagram
  • 16K to each inner stripe

70
Striping A Stripe (continued)
  • Same scenario, 128K outer stripe size
  • Data is divided into two 128K chunks
  • Third and Fourth RAID-5 sets (inner stripe
    columns) are never hit
  • Data fits nicely within the other two RAID sets
  • 128K divided into 16K chunks
  • Two chunks written to each of four disks

71
Striping A Stripe (continued)
  • 128K Outer Stripe Size Diagram
  • 16K to each inner stripe

72
Striping A Stripe (continued)
  • So you lost the use of half of the RAID-5 sets in
    your outer stripe
  • But you made good use of the other two
  • What if the outer stripe size had been 256K?
  • Lose the use of all but one RAID set
  • Basically, only four of the 16 disks get used

73
BREAK
  • See you in 10 minutes

74
Section 4
  • Oracle Disk Layout
  • Tuning
  • RamSan

75
Oracle Disk Layout
  • Many (myself included) say stripe wide
  • Don't do so at the expense of other good
    practices
  • Separating IO is as important as, or more
    important than, striping IO
  • Depends on the type of IO
  • Depends on the parallelism of the application
  • Stay away from ASM!
  • Oracle loves to push/sell it
  • Requires an extra DB
  • ASM DB must be online for you to start your DB
  • You lose control over what goes where

76
Oracle Disk Layout (continued)
  • Striping is good, but make sure you retain
    control
  • You need to know what is on each disk. This
    theory kills the big SAN concept
  • Redo logs should be on their own independent
    disks even at the expense of striping because
    they are perfectly sequential
  • Tables and Indexes should be separated and
    striped very wide on their own set of disks
  • If you have multiple high IO tablespaces then
    each of them should be contained on their own
    subset of disks
  • Control files should be isolated and striped
    minimally (to conserve disks)

77
Disk Device Cache
  • Write Cache v. Read Cache
  • Writers block writers
  • Writers block readers
  • Readers block writers
  • Readers block readers
  • Cache it all! Cache is available in many places
  • Disk, Controller, FileSystem, Kernel
  • Don't double-cache one and zero-cache the other

78
Disk Device Cache (continued)
  • Don't double-cache reads if you have a lot of
    memory for buffering on the host. Use the disk
    system cache for writes.
  • You read the same data many times, it is easy to
    cache at the host
  • Reads are faster than writes. We know where the
    blocks to read are located. We have to plan
    where to store the blocks for a write.

79
Sequential v. Random Optimization
  • Sequential IO is 10 times faster than Random IO
  • Reorg/Defrag often to make data sequential
  • Cache writes to improve sequential layout
    percentage
  • Cache reads to aid with the performance of Random
    IO

80
Sequential v. Random Optimization (continued)
  • Random IO requires more disk seeks and more Iops
  • Use small transfer/stripe/block sizes
  • Number of disks is less important
  • Use disks with fast seek time
  • Sequential IO requires more throughput and
    streaming disks
  • Use large transfer/stripe/block sizes
  • Use a lot of disks
  • Use disks with better RPM

81
Tune Something
  • Kernel Parameters
  • maxphys = maximum transfer size limit
  • Yes, there is a limit that restricts you from
    reaching the maximum potential of the filesystem
    and/or disk device when you want to
  • Who thought that was a good idea?
  • Set it to 1M, which is the hard maximum

82
Tune Something (continued)
  • Kernel Parameters
  • sd_max_throttle = number of IO requests allowed
    to wait in queue for a busy device.
  • Commonly set to 256 / (number of LUNs per target)
  • sd_io_time = amount of time an IO request can
    wait before timing out.
  • Commonly set to 120 (seconds)
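  • A minimal /etc/system sketch for these (values
    are illustrative; a reboot is required):
  • set maxphys=1048576
  • set sd:sd_max_throttle=32
  • set sd:sd_io_time=120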

83
Tune Something (continued)
  • Filesystem Parameters
  • maxcontig = maximum number of contiguous blocks.
    Should be (maximum transfer size) / (filesystem
    block size). Set it really high if you aren't
    sure. It is just a ceiling.
  • Direct/Async IO cache: follow your application
    specs. If you don't have app specs, try different
    combinations. Large, sequential writes should
    NOT be double-cached. Async is usually best, but
    there are no guarantees from app to app

84
Tune Something (continued)
  • Filesystem Parameters
  • noatime/dfratime: why waste time updating inode
    access times? They will be updated the next time
    some change happens to the file. Do you really
    need to know in-between? If you do, fine, but
    this is extra overhead.
  • forcedirectio: don't cache writes. Good for
    large, sequential writes.
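  • A mount sketch combining these (device and mount
    point are hypothetical):
  • mount -F ufs -o noatime,forcedirectio /dev/dsk/c1t0d0s6 /u01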

85
Tune Something (continued)
  • Filesystem Searching
  • Many people like a small number of large
    filesystems because space management is easier
  • Filesystems are also starting points for searches
  • Searches are done using inodes
  • Try not to have too many inodes in one filesystem

86
Tune Something (continued)
  • Driver (HBA, Veritas, etc.) Parameters
  • Investigate conf files in /kernel/drv
  • Check limits on transfer sizes (e.g. vol_maxio
    for Veritas). These should usually be set to 1M
    per controller.
  • Check settings/limits for things like
    direct/async IO and cache. Make sure it falls in
    line with the rest of your configuration

87
Tune Something (continued)
  • Driver (HBA, Veritas, etc.) Parameters
  • Parameters for block shifting if you are using
    DMP (e.g. Veritas dmp_pathswitch_blks_shift
    should be 15).
  • lun_queue_depth limits the number of queued IO
    requests per lun.
  • Sun says 25. EMC says 32. Emulex says 20 (but
    their default is 30).
  • This is very confusing. Anything between 20 and
    32 is probably good?
  • Well, it should really be calculated for your
    configuration, not copied from a vendor default.

88
Tune Something (continued)
  • Others.
  • We could have a one week class.
  • The previous parameters follow the 90/10 rule and
    give you the most bang for the buck.
  • 10% of the parameters will give you 90% of the
    benefits.
  • This list is more like 3%, but still yields about
    90% of the benefits

89
Tune Something (continued)
  • What about Windows?
  • Sorry, not much we can do
  • Can't tune the kernel for Disk IO like you can
    for Network IO
  • Can't tune NTFS
  • At the mercy of Microsoft's Best Fit
  • HBA drivers do have parameters that can be tuned
    in a config file or in the registry

90
RAMSAN
  • Do IO on RAM, not on disk
  • Memory is much faster than disk!
  • Random memory outruns sequential disk
  • Bottleneck shifts from 320 MBps (haha!) disk to 4
    Gbps fiber channel adapter
  • Want more than 4 Gbps? Just add more HBAs
  • What can your system bus(es) handle?
  • No need to optimize transfer size, stripe, etc.

91
RAMSAN (continued)
  • Problem: data is lost when power is cycled
  • Most RAMSANs have battery backup and flush to
    disk when power is lost
  • Data is also flushed to disk throughout the day
    when performance levels are low
  • Only blocks that have a new value are flushed to
    disk
  • Block 1 is 0 and is flushed to disk
  • Block 1 is updated to 1
  • Block 1 is updated to 0
  • Flush cycle runs, but block 1 doesn't need to be
    copied to disk
  • Major performance improvement over similar cache
    monitors

92
RAMSAN (continued)
  • A leading product: TMS Tera-RamSan
  • www.superssd.com
  • 3,200,000 IOps
  • 24 GBps
  • Super High Dollar
  • Everyone gets some PDFs

93
RAMSAN (continued)
  • Solid State Disks by
  • TMS
  • Solid Data Systems
  • Dynamic Solutions
  • Infiniband

94
BREAK
  • See you in 10 minutes

95
Section 5
  • IO Calculator
  • Wrap Up

96
Disk IO Performance Calculator
  • Spreadsheet of Performance Equations and
    automated formulas
  • Allows you to plug-n-play numbers and gauge the
    performance impacts
  • Helps determine what you need to get the bottom
    line throughput you are looking for
  • Helps determine the number of disks you can use
    per controller

97
Disk IO Performance Calculator (continued)
  • Works for both large IO and small IO
  • Contains examples to provide a better
    understanding of how different IO components
    impact each other.
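  • A rough shell re-creation of the calculator's
    per-disk math from the Section 3 equations (a
    sketch; the inputs are assumptions to replace
    with your own specs):
  • echo "3 15000 8 80 0.3" | awk '{ seek=$1; rpm=$2; kb=$3; burst=$4; ovh=$5;
    rot = 30000/rpm; xfer = (kb/1024)/burst*1000;
    io = seek + rot + xfer + ovh;
    printf "IO time %.2f ms, %.0f IOps, %.1f MBps per disk\n", io, 1000/io, (1000/io)*kb/1024 }'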

98
Let's See The Calculator
99
Large Transfer Size v. Small Transfer Size
  • 986 IOps v. 1,064 IOps
  • 238 MBps v. 9 MBps
  • 8 disks / controller v. 33 disks / controller

100
12 Disks v. 36 Disks (Small Transfer Size)
  • 1,064 IOps v. 3,191 IOps
  • 9 MBps v. 27 MBps
  • 33 disks / controller v. 33 disks / controller

101
10K RPM v. 15K RPM (36 Disks, Small Transfer Size)
  • 3,191 IOps v. 3,588 IOps
  • 27 MBps v. 31 MBps
  • 33 disks / controller v. 29 disks / controller

102
6ms Seek v. 3ms Seek (15K RPM, 36 Disks, Small
Transfer)
  • 3,588 IOps v. 5,730 IOps
  • 31 MBps v. 49 MBps
  • 29 disks / controller v. 18 disks / controller
  • About as good as it gets.
  • 3ms Seek, 15K RPM
  • Yet 36 disks on two controllers only pushes 49
    MBps due to small (normal) transfer size

103
Back to Large Transfer Size (3 ms Seek, 15K RPM,
36 Disks)
  • 5,730 IOps v. 5,021 IOps
  • 49 MBps v. 1,210 MBps
  • 18 disks / controller v. 5 disks / controller
  • 1.2 GBps is pretty good
  • But 36 disks x 160 MBps = 5.6 GBps
  • Again, only in ideal test conditions
  • Max Transfer Size on every transfer
  • No OS/Filesystem overhead

104
Speed v. IOps
  • Notice we never came close to the speed threshold
    (multiply number of disks by consistent speed)
    for the disks before maxing out IOps
  • Notice that we did come close on two controllers
    with the large transfer size. If you push that
    much IO, you do need more controllers, but notice
    how big that number is

105
Large IO Requires A Large Transfer Size
  • Large IO requires large (not necessarily fast)
    individual transfers
  • You have to tune your transfer size
  • Avoid fragmentation
  • Use good stripe sizes
  • Use good block sizes

106
Now Let's Really See The Calculator
  • Refer To The Spreadsheet
  • Everyone gets their own copy
  • What tests do you want to run? Follow Along.
  • Feel free to contact the developer at any time
  • Charles Pfeiffer, CRT Sr. Consultant
  • (804) 901-3992
  • CJPfeiffer@FireTowerTech.com

107
Summary
  • You don't get the label spec in throughput. Not
    even close!
  • Throughput is the opposite of response time!
  • RAID decreases per-disk performance!
  • Make up for it with more disks

108
Summary (continued)
  • Striping a stripe requires careful planning
  • The wrong stripe size will decrease performance
  • Big money disk systems don't necessarily have
    big benefits
  • The range from high-quality to low-quality isn't
    that severe
  • Quantity tends to win out over quality in disks
  • Make your vendor agree to reasonable
    expectations!
  • Use the IO Calculator!

109
This Presentation
  • This document is not for commercial re-use or
    distribution without the consent of the author
  • Neither CRT nor the author guarantees this
    document to be error free
  • Submit questions/corrections/comments to the
    author
  • Charles Pfeiffer, CJPfeiffer@FireTowerTech.com

110
BREAK
  • See you in 10 minutes

111
Are We Done Yet?
  • Final Q&A
  • Contact Me
  • 804.901.3992
  • CJPfeiffer@FireTowerTech.com