Title: Performance Designs
1 Performance Designs
- Designing for Specific Performance and Testing
Brendon Higgins CEng MBCS CITP
NetApp User Group, June 2009
2 Presenter
- Brendon Higgins CEng MBCS CITP
- NetApp Certified Data Management Administrator
- Data Centre Services Engineer
- DLA Piper UK LLP
- Brendon.Higgins@dla.com
- Author of images used on web forums
3 Why Bother Attending?
- Learn about storage performance at the design stage, based on a case study example
- How is performance measured?
- What performance should be expected?
- What performance is being delivered?
4 Introduction to DLA Piper
- Global Legal Firm
- More than 8,000 people across more than 65 offices in 28 countries
- 2,262,000,000 annual revenue
- DLA Piper represents more than 140 of the top 250 Fortune 500 clients and nearly half of the FTSE 350, or their subsidiaries
5 Overview of units of performance
- Cars
- Maximum speed
- Capacity (cc)
- BHP
- 0-60 time
- Cost
- Storage Systems
- Throughput
- Capacity (GB)
- IOPS
- Latency
- Cost
6 Warning: Maths!
7 Storage - Mebibyte
- The megabyte (MB) unit is used in NetApp storage as a measure of capacity, and it is based on the binary system
- 2^20 or 1,048,576 bytes
Kilobyte 2^10, Megabyte 2^20, Gigabyte 2^30, Terabyte 2^40, Petabyte 2^50
Multiples of 1024 (see the sketch below)
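As a quick sanity check of the binary multiples above, a minimal Python sketch (illustrative only):

    # Binary (base-2) capacity units, as used for NetApp storage sizes.
    for name, power in [("Kilobyte", 10), ("Megabyte", 20), ("Gigabyte", 30),
                        ("Terabyte", 40), ("Petabyte", 50)]:
        print(f"{name}: 2^{power} = {2 ** power:,} bytes")
    # e.g. Megabyte: 2^20 = 1,048,576 bytes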
8 Throughput
- Throughput is a measure of the total data moved through a channel in a given time
- Megabytes per second (MB/s) are the units used in this presentation
- Standard S.I. prefixes indicate multiplication by 1,000
- 1,000 = kilo
- 1,000,000 = mega
- 1,000,000,000 = giga
- Duplex communication is assumed between the devices (a two-way street)
9 Quoted Throughput
- It is important to understand how a quoted throughput figure was measured:
- Maximum theoretical
- Achievable sustained
- Peak measured
- "Good"
- Guideline figures - but what type of measure?
10 Moving Data
[Diagram: 10 GB of data moved between two devices in 5 minutes]
11 Moving Data
- 10 GB = 10,240 MB
- 10,240 / 300
- 34 MB per sec moved
[Diagram: 10 GB moved in 5 minutes = 34 MB/s]
12 Moving Data
- 10 GB = 10,240 MB
- 10,240 / 300
- 34 MB per sec moved
[Diagram: 10 GB moved in 5 minutes - 34 or 36 MB/s?]
- But throughput is measured in (decimal) MB/s
- 10,737,418,240 bytes / 300 = 35,791,394 bytes/s; / 10^6 = 36 MB/s, or 288 Mbps (x8 for bits) - see the sketch below
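The 34 vs 36 MB/s gap comes purely from the choice of divisor. A minimal Python sketch of the same sum, using the slide's 10 GB / 5 minute example:

    # 10 GB (binary) moved in 5 minutes, expressed in binary and decimal megabytes.
    size_bytes = 10 * 2 ** 30        # 10,737,418,240 bytes
    seconds = 5 * 60                 # 300 s

    binary_mb_s = size_bytes / 2 ** 20 / seconds     # 10,240 MB / 300 ~= 34 MB/s
    decimal_mb_s = size_bytes / 10 ** 6 / seconds    # 35,791,394 bytes/s ~= 36 MB/s
    megabits_s = round(decimal_mb_s) * 8             # 36 x 8 = 288 Mb/s, as quoted

    print(round(binary_mb_s), round(decimal_mb_s), megabits_s)   # 34 36 288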
13 Maths Error
- The classic mistake is to interchange throughput MB/s (decimal) with binary MB
- 1,000,000 bytes vs 1,048,576 bytes
- 4.9% error margin
14 Maths Error
- The classic mistake is to interchange throughput MB/s (decimal) with binary MB
- 1,000,000 bytes vs 1,048,576 bytes
- 4.9% error margin
- Or worse, using network megabits (Mb/s) when describing megabytes
- 131,072 vs 1,048,576
- 1/8 of the required value
- 87.5% error margin!
- Remember NASA's Mars Climate Orbiter? A $125m craft destroyed after a mix-up between imperial and metric measurements
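The two error margins above are easy to verify; a short Python check (illustrative):

    # Error from mixing decimal MB, binary MB and megabits.
    mib = 2 ** 20            # 1,048,576 bytes - binary megabyte
    mb = 10 ** 6             # 1,000,000 bytes - decimal megabyte
    mbit = 2 ** 20 // 8      # 131,072 bytes   - a "megabyte" misread as megabits

    print(f"MB vs binary MB: {(mib - mb) / mb:.1%} error")       # ~4.9%
    print(f"Mb vs binary MB: {(mib - mbit) / mib:.1%} error")    # 87.5% - only 1/8 of the data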
15 Input/Output Operations Per Second
16 IOPS
- The measurement is taken as
- Total number of IOPS
- Average number of random read IOPS
- Average number of random write IOPS
- Average number of sequential read IOPS
- Average number of sequential write IOPS
- Each IOP in a NetApp system is between 4 KB (4,096 bytes) and 128 KB, but can peak at 256 KB (see the sketch below)
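Because the IO size can range from 4 KB to 256 KB, an IOPS figure on its own says little about throughput. A minimal sketch (the 1,000 IOPS figure is purely illustrative, not from the deck):

    # Throughput implied by a given IOPS rate depends entirely on the IO size.
    def throughput_mb_s(iops, io_size_kb):
        """Decimal MB/s for an IOPS rate at a given IO size (binary KB)."""
        return iops * io_size_kb * 1024 / 1_000_000

    for io_kb in (4, 64, 128, 256):
        print(f"1,000 IOPS at {io_kb:>3} KB -> {throughput_mb_s(1000, io_kb):.1f} MB/s")
    # 4 KB -> 4.1, 64 KB -> 65.5, 128 KB -> 131.1, 256 KB -> 262.1 MB/s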
17 Rotational latency
Time in milliseconds (ms) required to move the head to the data and rotate the data under it. Sequential vs random access.
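Not on the slide, but a standard rule of thumb: average rotational latency is half a revolution, so it follows directly from spindle speed. A small sketch:

    # Average rotational latency = time for half a platter revolution.
    for rpm in (7200, 10000, 15000):
        avg_ms = 60_000 / rpm / 2          # ms per revolution, halved
        print(f"{rpm:>5} RPM -> {avg_ms:.1f} ms average rotational latency")
    # 7200 -> 4.2 ms, 10000 -> 3.0 ms, 15000 -> 2.0 ms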
18 Cost
- What would you do if it was your money?
19 What performance should be expected?
20 Real World Performance
- Predicting the future and assumptions
- Guesstimation for specification
- Multiplying averages in design
- Performance metrics of little value
- Beware FUD - Fear, Uncertainty, and Doubt
21 iSCSI vs FCP
- A classic conversation point for block-level performance, and a good illustration of FUD. Ask the person next to you which they think is best.
- There is no correct answer, as it depends on the other components which make up the system and on external factors such as cost.
Calm down, calm down!
22 Design Requirement
- As fast as possible. What else?
23 Case Study Design Requirement
- The application supplier's required specifications for the SQL server, based on observed history with similar-sized deployments:
- 100 GB of storage in the first year
- Then growth to 500 GB over the next 5 years
- At least 400 IOPS
- At least 425 MB/s throughput
- Latency not greater than 12 ms
24 Theoretical performance characteristics of devices
Top Secret
- Competitive advantage
- http://www.spec.org/
- http://en.wikipedia.org/wiki/List_of_device_bandwidths
25 Graph of Disk Performance
[Graph: random 4 KB workload IOPS per disk - can vary]
- 120 IOPS and below for 10k RPM disks
- 220 IOPS and below for 15k RPM disks
- 50-60 IOPS and below for SATA disks
26 Case Study Options
- Single filer: 3 shelves, 42 disks
- Dual filers (active/active): 3 shelves with software disk ownership, 21 disks per filer
27 NetApp Storage Recapitulation
- All NetApp operations are made with 4 KB blocks
- Data write latency is host to filer memory
- Back-to-back CPs (CP generated CP)
- An aggregate is the physical unit
- Only data disks affect performance
28 Following Example Disk Configuration
- 1 spare disk per filer
- 20 disks in each aggregate
- RAID group size of 16, with 2 groups
- RAID-DP, so a parity and a double-parity disk per RAID group
- 16 data disks available in each aggregate (see the sketch below)
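The 16 data disks follow directly from the RAID layout; a one-line check in Python:

    # 20 disks per aggregate, 2 RAID-DP groups, each losing 2 disks to parity.
    data_disks = 20 - 2 * 2
    print(data_disks)   # 16 data disks per aggregate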
29 Calculating Data Rates in Aggregates
- 16x 300 GB 15K FC data disks in the aggregate
- Each NetApp disk IOP is equal to either 4, 64 or 128 KB of storage
- The throughput for each IOP/s = (1024 x 4) / 1000 = 4.096 KB/s
30 Calculating Data Rates in Aggregates
- 16x 300 GB 15K FC data disks in the aggregate
- Each NetApp disk IOP is equal to either 4, 64 or 128 KB of storage
- The throughput for each IOP/s = (1024 x 4) / 1000 = 4.096 KB/s
- 16x data disks @ 220 IOPS = 3,520 IOPS
- Each IOPS @ 4.096, 65.536 or 131.072 KB/s
31 Calculating Data Rates in Aggregates
- 16x 300 GB 15K FC data disks in the aggregate
- Each NetApp disk IOP is equal to either 4, 64 or 128 KB of storage
- The throughput for each IOP/s = (1024 x 4) / 1000 = 4.096 KB/s
- 16x data disks @ 220 IOPS = 3,520 IOPS
- Each IOPS @ 4.096, 65.536 or 131.072 KB/s
- 3,520 x 4.096 = 14,418 KBps, or 14 MB/s
- 3,520 x 65.536 = 230,686.72 KBps, or 231 MB/s
- 3,520 x 131.072 = 461,373.44 KBps, or 461 MB/s (see the sketch below)
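The three MB/s figures can be reproduced directly from the per-disk IOPS and the IOP size; a minimal Python sketch using the slide's numbers:

    # Aggregate throughput from conservative per-disk IOPS and the NetApp IOP sizes.
    data_disks = 16
    iops_per_disk = 220                          # conservative 15k FC figure
    total_iops = data_disks * iops_per_disk      # 3,520 IOPS

    for io_kb in (4, 64, 128):
        kbps_per_iop = io_kb * 1024 / 1000       # 4.096, 65.536 or 131.072 KB/s
        print(f"{io_kb:>3} KB IOPs: {total_iops * kbps_per_iop:,.0f} KBps "
              f"(~{total_iops * kbps_per_iop / 1000:.0f} MB/s)")
    # 4 KB -> ~14 MB/s, 64 KB -> ~231 MB/s, 128 KB -> ~461 MB/s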
32 What performance is being delivered?
33 Stress test the design and get host-based IO reports
- A realistic simulation of the IO pattern of the application can be created using 3rd-party tools
- Two of the free tools available are:
- Microsoft's SQLIO.exe from www.microsoft.com/downloads
- Intel's IOMeter from www.iometer.org/
- IOMeter also has a GUI
34 SQLIO.exe 8KB Sequential - Report
- 8 threads reading for 300 secs from file E:\SQLIOtest\sqlio_1v1f.dat
- using 8KB sequential IOs
- using specified size: 50000 MB for file E:\SQLIOtest\sqlio_1v1f.dat
- CUMULATIVE DATA:
- throughput metrics:
- IOs/sec: 6837.39
- MBs/sec: 53.41
- latency metrics:
- Min_Latency(ms): 0
- Avg_Latency(ms): 9
- Max_Latency(ms): 1449
- histogram:
- ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
- %: 59  3  2  2  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 14
35 SQLIO.exe 8KB Random - Report
- 8 threads reading for 300 secs from file E:\SQLIOtest\sqlio_1v1f.dat
- using 8KB random IOs
- using specified size: 50000 MB for file E:\SQLIOtest\sqlio_1v1f.dat
- CUMULATIVE DATA:
- throughput metrics:
- IOs/sec: 3386.12
- MBs/sec: 26.45
- latency metrics:
- Min_Latency(ms): 0
- Avg_Latency(ms): 18
- Max_Latency(ms): 774
- histogram:
- ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
- %:  5  0  1  2  3  5  6  6  5  5  4  4  4  3  3  3  3  3  2  2  2  2  2  2 26
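The two reports are internally consistent: MBs/sec is simply IOs/sec multiplied by the 8 KB block size (SQLIO appears to report binary megabytes here, which these figures match). A quick Python cross-check:

    # Cross-check SQLIO's MBs/sec against IOs/sec x block size.
    def sqlio_mb_s(ios_per_sec, block_kb=8):
        return ios_per_sec * block_kb / 1024     # binary MB per second

    print(f"Sequential: {sqlio_mb_s(6837.39):.2f} MB/s")   # ~53.42 (report: 53.41)
    print(f"Random:     {sqlio_mb_s(3386.12):.2f} MB/s")   # ~26.45 (report: 26.45)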
36 Filer Statistics
- Filer
- Sysstat
- Lun Status
- Statit
- FilerView
- Host
- Perfstat
- System Manager
- Operations Manager
- Others
37 NetApp System Manager
38 Statit Disks During 8k Stress Test
disk     ut%   xfers   ureads chain usecs   writes chain usecs   cpreads chain usecs
/aggr4/plex0/rg0
2c.16      1    3.59     0.69  1.00  6692     1.71  3.37   932      1.19  2.98   373
2c.17      2    4.14     0.69  1.00 15962     2.40  2.98   760      1.06  2.75   282
2c.18     95  205.80   203.85  1.98  9107     1.37  2.40  4256      0.58  2.09  7261
2c.19     93  202.93   201.56  1.98  9121     0.69  3.81  4051      0.69  2.42  3984
2c.20     95  206.12   204.91  1.98  8987     0.66  4.00  4890      0.55  2.00  7095
1b.21     94  207.78   206.54  1.98  9216     0.69  3.85  4980      0.55  2.67  5750
2c.22     93  203.09   201.77  1.98  8973     0.74  3.57  4180      0.58  2.23  6531
1b.32     95  208.44   206.81  1.99  9397     0.79  3.33  6350      0.84  1.78  5140
2c.33     95  209.55   207.78  1.98  9446     0.79  3.30  5768      0.98  2.08  6974
1b.34     94  203.64   202.08  1.98  9038     0.76  3.41  5677      0.79  1.90  6263
2c.35     95  207.94   206.44  1.98  9513     0.76  3.41  6566      0.74  2.11  7593
2c.36     95  208.60   207.10  1.98  9463     0.82  3.19  6636      0.69  2.54  5712
1b.37     96  212.27   210.97  1.98  9585     0.74  3.50  5500      0.55  1.81  9895
2c.38     95  208.68   207.17  1.98  9348     0.76  3.38  6143      0.74  2.00  6357
1b.48     95  207.17   206.01  1.98  9572     0.69  3.81  7949      0.47  2.06  7243
1b.49     94  206.09   204.83  1.98  9206     0.76  3.45  6440      0.50  2.00  5289
/aggr4/plex0/rg1
39 Statit Throughput
- The sum of:
- (Read Xfers x (Chain length x IOP size))
- + (Write Xfers x (Chain length x IOP size))
- + (CP Read Xfers x (Chain length x IOP size))
- for each data disk gives the throughput in KBps (see the sketch below)
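As a sketch of that formula in Python, assuming the 4 KB / 4.096 KB IOP size used throughout the deck:

    # Per-disk throughput (KBps) from statit transfer rates and chain lengths.
    IOP_KB = 4.096   # 4 KB IOP expressed in decimal KB

    def disk_throughput_kbps(reads, read_chain, writes, write_chain, cp_reads, cp_chain):
        return (reads * read_chain + writes * write_chain + cp_reads * cp_chain) * IOP_KB

    # Rounded values for disk 2c.18 from the statit output above:
    print(f"{disk_throughput_kbps(203, 1.98, 1.37, 2.40, 0.58, 2.09):.0f} KBps per disk")
    # ~1,665 KBps per disk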
40 Measuring Data Rates
Read Xfers = 203, Chains = 1.98
Write Xfers = 1.37, Chains = 2.4
CP Read Xfers = 0.58, Chains = 2.09
(Statit output from slide 38 repeated, highlighting the read, write and CP read transfers and chain lengths for disk 2c.18)
41 Measuring Data Rates
Read Xfers = 203, Chains = 1.98
Write Xfers = 1.37, Chains = 2.4
CP Read Xfers = 0.58, Chains = 2.09
(203 x (1.98 x 4.096)) + (1.37 x (2.4 x 4.096)) + (0.58 x (2.09 x 4.096))
gives the output per disk, then multiply by 16 data disks
42 Measuring Random Data Rates
Read Xfers = 203, Chains = 1.98
Write Xfers = 1.37, Chains = 2.4
CP Read Xfers = 0.58, Chains = 2.09
(203 x (1.98 x 4.096)) + (1.37 x (2.4 x 4.096)) + (0.58 x (2.09 x 4.096))
gives the output per disk, then multiply by 16 data disks
Throughput = 26,636 KBps, or 26 MB/s
sqlio v1.5.SG throughput metrics: IOs/sec 3386.12, MBs/sec 26.45
43 Measuring Serial Data Rates
- Read Xfers = 256, Chains = 3.7
- Write Xfers = 0.74, Chains = 3.82
- CP Read Xfers = 0.86, Chains = 2.48
- (256 x (3.7 x 4.096)) + (0.74 x (3.82 x 4.096)) + (0.86 x (2.48 x 4.096))
- gives the output per disk, then multiply by 16 data disks
- 62,400 KBps, or 62 MB/s
sqlio v1.5.SG throughput metrics: IOs/sec 6837.39, MBs/sec 53.41
NB: The system was not idle during the test (see the sketch below)
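Putting slides 42 and 43 together, a short Python sketch that scales the per-disk figures to the 16 data disks and compares them with the SQLIO host-side numbers:

    # Aggregate throughput for the two stress tests, from statit per-disk figures.
    IOP_KB, DATA_DISKS = 4.096, 16

    workloads = {
        "random 8KB":     (203, 1.98, 1.37, 2.40, 0.58, 2.09),   # SQLIO: 26.45 MBs/sec
        "sequential 8KB": (256, 3.70, 0.74, 3.82, 0.86, 2.48),   # SQLIO: 53.41 MBs/sec
    }
    for name, (r, rc, w, wc, cp, cpc) in workloads.items():
        per_disk = (r * rc + w * wc + cp * cpc) * IOP_KB
        print(f"{name}: {per_disk * DATA_DISKS:,.0f} KBps")
    # random: ~26,636 KBps; sequential: ~62,400 KBps - matching slides 42 and 43
    # (sequential is higher than the SQLIO figure as the system was not idle)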
44 Calculating Worst Case Data Rate
- 16x data disks @ 220 IOPS = 3,520 IOPS
- Each IOPS @ 4.096 KB/s
- 3,520 x 4.096 = 14,418 KBps, or 14 MB/s
- Conservative numbers have been used for the IOPS, and no estimate of the cache/chaining advantage has been made. Performance can only be higher, provided nothing has failed on the system.
45 Case Study Observed Results
46 SQL IOP Types
- Sequential writes to Transaction Logs
- Random reads from the Data Files
47 A Week of Live Data
48 Production System
- Once the new SQL server entered production, the applications team began reporting performance issues which they believed to be caused by the storage.
- The previous slide showed the typical performance and throughput, which did not look like a storage bottleneck. This issue was resolved with another tool.
49 SQL Server 2005 Waits and Queues
50 Getting Help
- NetApp Forums and Communities sites
51 Q&A
- Sorry - time limit, with a 2nd presentation due
- Again, the forums are a great asset