Title: Rules of Thumb in Data Engineering
1Rules of Thumb in Data Engineering
- Jim Gray
- International Conference on Data Engineering
- San Diego, CA
- 4 March 2000
- Gray_at_Microsoft.com, http://research.Microsoft.com/Gray/Talks/
2Credits and Thank You!!
- Prashant Shenoy, U. Mass, Amherst: analysis of web caching rules. shenoy_at_cs.umass.edu
- Terrance Kelly, U. Michigan: lots of advice on fixing the paper, tpkelly_at_mynah.eecs.umich.edu; interesting work on caching at http://ai.eecs.umich.edu/tpkelly/papers/wcp.pdf
- Dave Lomet, Paul Larson, Surajit Chaudhuri: how big should database pages be?
- Remzi Arpaci-Dusseau, Kim Keeton, Erik Riedel: discussions about balanced systems and IO
- Windsor Hsu, Alan Smith, Honesty Young: also studied TPC-C and balanced systems (very nice work!) http://golem.cs.berkeley.edu/windsorh/DBChar/
- Anastassia Ailamaki, Kim Keeton: CPI measurements
- Gordon Bell: discussions on balanced systems.
3and Apology..
- Printed/Published paper has MANY bugs!
- Conclusions OK (sort of?), but typos, flaws, errors.
- Revised version at http://research.microsoft.com/Gray/ and in CoRR and MS Research tech report archive by 15 March 2000.
- Sorry!
Sorry!
Woops!
4Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
5Trends: Moore's Law
- Performance/Price doubles every 18 months
- 100x per decade
- Progress in next 18 months = ALL previous progress
- New storage = sum of all old storage (ever)
- New processing = sum of all old processing.
- E. coli double every 20 minutes!
15 years ago
6Trends: ops/s/$ Had Three Growth Phases
- 1890-1945: mechanical, relay; 7-year doubling
- 1945-1985: tube, transistor, ...; 2.3-year doubling
- 1985-2000: microprocessor; 1.0-year doubling
7Trends: Gilder's Law: 3x bandwidth/year for 25 more years
- Today:
- 10 Gbps per channel
- 4 channels per fiber: 40 Gbps
- 32 fibers/bundle: 1.2 Tbps/bundle
- In lab: 3 Tbps/fiber (400 x WDM)
- In theory: 25 Tbps per fiber
- 1 Tbps = USA 1996 WAN bisection bandwidth
- Aggregate bandwidth doubles every 8 months!
1 fiber = 25 Tbps
8Trends: Magnetic Storage Densities
- Amazing progress
- Ratios have changed
- Capacity grows 60%/y
- Access speed grows 10x more slowly
9Trends: Density Limits
- The end is near!
- Products: 11 Gbpsi; Lab: 35 Gbpsi; "limit": 60 Gbpsi
- But the limit keeps rising, and there are alternatives: NEMS, fluorescent, holographic, DNA?
[Figure: Bit density vs. time, 1990 to 2008, in b/µm2 and Gb/in2 (log scale from 1 b/µm2 = 0.6 Gb/in2 to 3,000 b/µm2 = 2,000 Gb/in2), showing CD, DVD and ODD, the wavelength limit, and the superparamagnetic limit.]
Figure adapted from Franco Vitaliano, "The NEW new media: the growing attraction of nonmagnetic storage", Data Storage, Feb 2000, pp 21-32, www.datastorage.com
10Trends: promises. NEMS (Nano Electro Mechanical Systems) (http://www.nanochip.com/), also Cornell, IBM, CMU, ...
- 250 Gbpsi by using tunneling electronic microscope
- Disk replacement
- Capacity: 180 GB now, 1.4 TB in 2 years
- Transfer rate: 100 MB/sec read/write
- Latency: 0.5 msec
- Power: 23 W active, .05 W standby
- $10k/TB now, $2k/TB in 2002
11Consequence of Moore's law: Need an address bit every 18 months.
- Moore's law gives you 2x more in 18 months.
- RAM
- Today we have 10 MB to 100 GB machines (24-36 bits of addressing)
- In 9 years we will need 6 more bits: 30-42 bit addressing (4 TB RAM).
- Disks
- Today we have 10 GB to 100 TB file systems/DBs (33-47 bit file addresses)
- In 9 years, we will need 6 more bits: 40-53 bit file addresses (100 PB files)
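The arithmetic behind "an address bit every 18 months" can be checked with a short sketch. It assumes only the slide's 18-month doubling period; the function names are illustrative and not from the talk.

```python
def doublings(years: float, months_per_doubling: float = 18.0) -> float:
    """Number of capacity doublings (= extra address bits) over `years`."""
    return years * 12.0 / months_per_doubling


def growth_factor(years: float, months_per_doubling: float = 18.0) -> float:
    """Capacity multiplier after `years` of doubling every 18 months."""
    return 2.0 ** doublings(years, months_per_doubling)


extra_bits_in_9_years = doublings(9)      # 6 doublings, so 6 more address bits
growth_per_decade = growth_factor(10)     # roughly 100x per decade
print(extra_bits_in_9_years, round(growth_per_decade))
```

Six more bits on top of today's 36-bit and 47-bit upper bounds gives the 42-bit and 53-bit figures quoted on the slide.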
12Architecture could change this
- 1-level store
- System 48, AS400 has 1-level store.
- Never re-uses an address.
- Needs 96-bit addressing today.
- NUMAs and Clusters
- Willing to buy a $100 M computer?
- Then add 6 more address bits.
- Only 1-level store pushes us beyond 64-bits
- Still, these are logical addresses, 64-bit
physical will last many years
13Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
14Storage Latency: How Far Away is the Data?
- Tape/optical robot: 10^9 (Andromeda, 2,000 years)
- Disk: 10^6 (Pluto, 2 years)
- Memory: 100 (Olympia, 1.5 hr)
- On-board cache: 10 (this hotel, 10 min)
- On-chip cache: 2 (this room)
- Registers: 1 (my head, 1 min)
15Storage Hierarchy: Speed and Capacity vs Cost Tradeoffs
[Figure: two log-log plots against access time (10^-9 to 10^3 seconds). "Price vs Speed" plots $/MB and "Size vs Speed" plots typical system size in bytes, each for cache, main memory, secondary (disc), online tape, nearline tape and offline tape.]
16Disks Today
- Disk is 8 GB to 80 GB, 10-30 MBps, 5k-15k rpm (6 ms-2 ms rotational latency), 12 ms-7 ms seek, $7K/IDE-TB, $20k/SCSI-TB
- For shared disks most time is spent waiting in queue for access to arm/controller
[Figure: disk access time broken into Wait, Seek, Rotate and Transfer.]
17Standard Storage Metrics
- Capacity
- RAM: MB and $/MB: today at 512 MB and $3/MB
- Disk: GB and $/GB: today at 40 GB and $20/GB
- Tape: TB and $/TB: today at 40 GB and $10k/TB (nearline)
- Access time (latency)
- RAM: 100 ns
- Disk: 15 ms
- Tape: 30 second pick, 30 second position
- Transfer rate
- RAM: 1-10 GB/s
- Disk: 20-30 MB/s (arrays can go to 10 GB/s)
- Tape: 5-15 MB/s (arrays can go to 1 GB/s)
18New Storage Metrics: Kaps, Maps, SCAN
- Kaps: How many kilobyte objects served per second
- The file server, transaction processing metric
- This is the OLD metric.
- Maps: How many megabyte objects served per second
- The Multi-Media metric
- SCAN: How long to scan all the data
- The data mining and utility metric
- And
- Kaps/$, Maps/$, TBscan/$
19For the Record (good 1999 devices packaged in system, http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf)
X 100
Tape is 1Tb with 4 DLT readers at 5MBps each.
20For the Record (good 1999 devices packaged in system, http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf)
Tape is 1Tb with 4 DLT readers at 5MBps each.
21Disk Changes
- Disks got cheaper: $20k -> $1K (or even $200)
- $/Kaps etc. improved 100x (Moore's law!) (or even 500x)
- One-time event (went from mainframe prices to PC prices)
- Disk data got cooler (10x per decade)
- 1990 disk: 1 GB and 50 Kaps and 5 minute scan
- 2000 disk: 70 GB and 120 Kaps and 45 minute scan
- So
- 1990: 1 Kaps per 20 MB
- 2000: 1 Kaps per 500 MB
- Disk scans take longer (10x per decade)
- Backup/restore takes a long time (too long)
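The "cooling" figures follow from dividing capacity by access rate. A minimal sketch, using the 1990 and 2000 disk numbers from this slide and assuming 1 GB = 1000 MB (the function name is illustrative):

```python
def mb_per_kaps(capacity_gb: float, kaps: float) -> float:
    """Megabytes of stored data behind each kilobyte access per second."""
    return capacity_gb * 1000.0 / kaps


disk_1990 = mb_per_kaps(1, 50)      # 20 MB per Kaps
disk_2000 = mb_per_kaps(70, 120)    # about 583 MB per Kaps (slide rounds to 500)
cooling = disk_2000 / disk_1990     # roughly 25x to 30x cooler
print(disk_1990, round(disk_2000), round(cooling))
```

The larger the number, the "cooler" the data: each byte on the disk can be touched less often.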
22Storage Ratios Changed
- 10x better access time
- 10x more bandwidth
- 100x more capacity
- Data 25x cooler (1Kaps/20MB vs 1Kaps/500MB)
- 4,000x lower media price
- 20x to 100x lower disk price
- Scan takes 10x longer (3 min vs 45 min)
- DRAM/disk media price ratio changed
- 1970-1990: 100:1
- 1990-1995: 10:1
- 1995-1997: 50:1
- today: $0.03/MB disk vs $3/MB DRAM, 100:1
23Data on Disk Can Move to RAM in 10 years
100:1
10 years
24More Kaps and Kaps/$ but...
- Disk accesses got much less expensive: better disks, cheaper disks!
- But disk arms are expensive: the scarce resource
- 45 minute scan vs 5 minutes in 1990
25Disk vs Tape
- Disk
- 40 GB
- 20 MBps
- 5 ms seek time
- 3 ms rotate latency
- $7/GB for drive, $3/GB for ctlrs/cabinet
- 4 TB/rack
- 1 hour scan
- Tape
- 40 GB
- 10 MBps
- 10 sec pick time
- 30-120 second seek time
- $2/GB for media, $8/GB for drive + library
- 10 TB/rack
- 1 week scan
Guesstimates: Cern 200 TB 3480 tapes 2 col 50GB Rack 1 TB 20 drives
The price advantage of tape is narrowing, and the performance advantage of disk is growing. At $10K/TB, disk is competitive with nearline tape.
26Caveat: Tape vendors may innovate
- Sony DTF-2 is 100 GB, 24 MBps, 30 second pick time
- So, 2x better
- Prices not clear
- http://bpgprod.sel.sony.com/DTF/seismic/dtf2.html
27It's Hard to Archive a Petabyte: It takes a LONG time to restore it.
- At 1 GBps it takes 12 days!
- Store it in two (or more) places online (on disk?): a geo-plex
- Scrub it continuously (look for errors)
- On failure,
- use other copy until failure repaired,
- refresh lost copy from safe copy.
- Can organize the two copies differently (e.g. one by time, one by space)
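The 12-day figure is a straightforward bandwidth calculation. A minimal sketch, assuming decimal units (1 PB = 10^15 bytes, 1 GBps = 10^9 bytes/s) and a sustained transfer with no overhead; the names are illustrative:

```python
SECONDS_PER_DAY = 86_400


def transfer_days(n_bytes: float, bytes_per_second: float) -> float:
    """Days needed to move n_bytes at a sustained rate."""
    return n_bytes / bytes_per_second / SECONDS_PER_DAY


PETABYTE = 10**15
ONE_GBPS = 10**9  # bytes per second

restore_days = transfer_days(PETABYTE, ONE_GBPS)  # about 11.6 days
print(round(restore_days, 1))
```

Any real restore would be slower than this ideal, which is the argument for keeping a second online copy instead of restoring from an archive.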
28The Absurd 10x (5 year) Disk
- 2.5 hr scan time (poor sequential access)
- 1 aps / 5 GB (VERY cold data)
- It's a tape!
1 TB
100 MB/s
200 Kaps
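The scan time and coldness of this hypothetical disk can be recomputed from its three parameters. A minimal sketch, assuming decimal units; the helper names are illustrative:

```python
def scan_hours(capacity_bytes: float, bytes_per_second: float) -> float:
    """Hours to read the whole disk sequentially."""
    return capacity_bytes / bytes_per_second / 3600.0


def gb_per_access_per_second(capacity_gb: float, accesses_per_second: float) -> float:
    """Gigabytes of data behind each access per second."""
    return capacity_gb / accesses_per_second


hours = scan_hours(10**12, 100 * 10**6)           # about 2.8 hours for 1 TB at 100 MB/s
coldness = gb_per_access_per_second(1000, 200)    # 5 GB per access per second
print(round(hours, 1), coldness)
```

The slide rounds the scan to 2.5 hours; either way the device behaves more like a tape than like a 1990 disk.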
29How to cool disk data
- Cache data in main memory
- See 5 minute rule later in presentation
- Fewer-larger transfers
- Larger pages (512 -> 8KB -> 256KB)
- Sequential rather than random access
- Random 8KB IO is 1.5 MBps
- Sequential IO is 30 MBps (20:1 ratio is growing)
- Raid1 (mirroring) rather than Raid5 (parity).
30Stripes, Mirrors, Parity (RAID 0, 1, 5)
- RAID 0: Stripes
- Bandwidth
- RAID 1: Mirrors, Shadows, ...
- Fault tolerance
- Reads faster, writes 2x slower
- RAID 5: Parity
- Fault tolerance
- Reads faster
- Writes 4x or 6x slower.
Block layout across disks:
RAID 0: 0,3,6,.. | 1,4,7,.. | 2,5,8,..
RAID 1: 0,1,2,.. | 0,1,2,..
RAID 5: 0,2,P2,.. | 1,P1,4,.. | P0,3,5,..
31RAID 10 (stripes of mirrors) wins: wastes space, saves arms
- RAID 5 (6 disks, 1 vol)
- Performance
- 675 reads/sec
- 210 writes/sec
- Write
- 4 logical IO,
- 2 seek + 1.7 rotate
- SAVES SPACE
- Performance degrades on failure
- RAID 1 (6 disks, 3 pairs)
- Performance
- 750 reads/sec
- 300 writes/sec
- Write
- 2 logical IO
- 2 seek + 0.7 rotate
- SAVES ARMS
- Performance improves on failure
32Shows Best Page Index Page Size 16KB
33Auto Manage Storage
- 1980 rule of thumb
- A DataAdmin per 10GB, SysAdmin per mips
- 2000 rule of thumb
- A DataAdmin per 5TB
- SysAdmin per 100 clones (varies with app).
- Problem
- 5 TB is $60k today, $10k in a few years.
- Admin cost >> storage cost!!!!
- Challenge
- Automate ALL storage admin tasks
34Summarizing storage rules of thumb (1)
- Moore's law: 4x every 3 years, 100x more per decade
- Implies 2 bits of addressing every 3 years.
- Storage capacities increase 100x/decade
- Storage costs drop 100x per decade
- Storage throughput increases 10x/decade
- Data cools 10x/decade
- Disk page sizes increase 5x per decade.
35Summarizing storage rules of thumb (2)
- RAM:Disk and Disk:Tape cost ratios are 100:1 and 3:1
- So, in 10 years, disk data can move to RAM since prices decline 100x per decade.
- A person can administer a million dollars of disk storage: that is 1 TB to 100 TB today
- Disks are replacing tapes as backup devices. You can't backup/restore a Petabyte quickly, so geoplex it.
- Mirroring rather than Parity to save disk arms
36Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
37Standard Architecture (today)
38Amdahl's Balance Laws
- Parallelism law: If a computation has a serial part S and a parallel component P, then the maximum speedup is (S+P)/S.
- Balanced system law: A system needs a bit of IO per second per instruction per second: about 8 MIPS per MBps.
- Memory law: α = 1: the MB/MIPS ratio (called alpha (α)) in a balanced system is 1.
- IO law: Programs do one IO per 50,000 instructions.
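The parallelism law can be illustrated with a small sketch. The bound (S+P)/S is from the slide; the finite-processor form (S+P)/(S+P/N) is the usual statement of Amdahl's law and is added here as an assumption to show how the bound is approached.

```python
def speedup(serial: float, parallel: float, processors: int) -> float:
    """Speedup with `processors` workers sharing the parallel part evenly."""
    return (serial + parallel) / (serial + parallel / processors)


def max_speedup(serial: float, parallel: float) -> float:
    """Limit of speedup() as the processor count grows without bound."""
    return (serial + parallel) / serial


# Example: 1 unit of serial work and 9 units of parallel work.
for n in (1, 9, 100, 10_000):
    print(n, round(speedup(1.0, 9.0, n), 2))
print("bound:", max_speedup(1.0, 9.0))
```

With a 10% serial part, no number of processors gets past a 10x speedup.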
39Amdahl's Laws Valid 35 Years Later?
- Parallelism law is algebra: so SURE!
- Balanced system laws?
- Look at tpc results (tpcC, tpcH) at http://www.tpc.org/
- Some imagination needed:
- What's an instruction (CPI varies from 1-3)?
- RISC, CISC, VLIW, clocks per instruction, ...
- What's an I/O?
40TPC systems
- Normalize for CPI (clocks per instruction)
- TPC-C has about 7 ins/byte of IO
- TPC-H has 3 ins/byte of IO
- TPC-H needs ½ as many disks, sequential vs random
- Both use 9GB 10 krpm disks (need arms, not bytes)
41TPC systems: What's alpha (MB/MIPS)?
- Hard to say
- Intel: 32 bit addressing (4 GB limit). Known CPI.
- IBM, HP, Sun have 64 GB limit. Unknown CPI.
- Look at both, guess CPI for IBM, HP, Sun
- Alpha is between 1 and 6
System              Mips               Memory   Alpha
Amdahl              1                  1        1
tpcC Intel          8x262 = 2 Gips     4 GB     2
tpcH Intel          8x458 = 4 Gips     4 GB     1
tpcC IBM 24 cpus    ? = 12 Gips        64 GB    6
tpcH HP 32 cpus     ? = 16 Gips        32 GB    2
42Instructions per IO?
- We know 8 mips per MBps of IO
- So, 8KB page is 64 K instructions
- And 64KB page is 512 K instructions.
- But, sequential has fewer instructions/byte (3 vs 7 in tpcH vs tpcC).
- So, 64KB page is 200 K instructions.
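These page costs are page size times instructions per byte. A minimal sketch, assuming 1 KB = 1024 bytes and reading "8 MIPS per MBps" as 8 instructions per byte; the function name is illustrative:

```python
KB = 1024


def instructions_per_io(page_bytes: int, instructions_per_byte: float) -> float:
    """Instructions a balanced system executes per IO of the given size."""
    return page_bytes * instructions_per_byte


random_8kb = instructions_per_io(8 * KB, 8)          # 64 K instructions
random_64kb = instructions_per_io(64 * KB, 8)        # 512 K instructions
sequential_64kb = instructions_per_io(64 * KB, 3)    # 192 K, about 200 K
print(random_8kb // KB, random_64kb // KB, sequential_64kb // KB)
```

The sequential case uses the 3 instructions per byte measured for TPC-H rather than the balanced-system figure of 8.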
43Amdahl's Balance Laws Revised
- Laws right, just need "interpretation" (imagination?)
- Balanced System Law: A system needs 8 MIPS per MBps of IO, but instruction rate must be measured on the workload.
- Sequential workloads have low CPI (clocks per instruction),
- random workloads tend to have higher CPI.
- Alpha (the MB/MIPS ratio) is rising from 1 to 6. This trend will likely continue.
- One random IO per 50k instructions.
- Sequential IOs are larger: one sequential IO per 200k instructions
44PAP vs RAP
- Peak Advertised Performance vs Real Application
Performance
45Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
46Standard IO (Infiniband) in 5 Years?
- Probably
- Replace PCI with something better; will still need a mezzanine bus standard
- Multiple serial links directly from processor
- Fast (10 GBps/link) for a few meters
- System Area Networks (SANs) ubiquitous (VIA morphs to SIO?)
47Ubiquitous 10 GBps SANs in 5 years
- 1 Gbps Ethernet is a reality now.
- Also FiberChannel, MyriNet, GigaNet, ServerNet, ATM, ...
- 10 Gbps x4 WDM deployed now (OC192)
- 3 Tbps WDM working in lab
- In 5 years, expect 10x, wow!!
[Figure labels: 1 GBps, 120 MBps (1 Gbps), 80 MBps, 5 MBps, 40 MBps, 20 MBps]
48Networking
- WANs are getting faster than LANs: G8 = OC192 = 8 Gbps is "standard"
- Link bandwidth improves 4x per 3 years
- Speed of light (60 ms round trip in US)
- Software stacks have always been the problem.
Time = SenderCPU + ReceiverCPU + bytes/bandwidth
This has been the problem
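The time model on this slide can be sketched directly. The formula is from the slide; the 100 µs CPU costs in the example are hypothetical values chosen only to show how the software terms can dominate the wire term, and 120 MBps stands for a 1 Gbps link.

```python
def message_time(sender_cpu_s: float, receiver_cpu_s: float,
                 n_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time = SenderCPU + ReceiverCPU + bytes/bandwidth, in seconds."""
    return sender_cpu_s + receiver_cpu_s + n_bytes / bandwidth_bytes_per_s


# Hypothetical 8 KB message: 100 microseconds of CPU on each side, 120 MBps wire.
cpu_each = 100e-6
total = message_time(cpu_each, cpu_each, 8192, 120e6)
wire = 8192 / 120e6
cpu_share = (2 * cpu_each) / total
print(f"total={total * 1e6:.0f} us, wire={wire * 1e6:.0f} us, cpu share={cpu_share:.0%}")
```

Under these assumed costs, making the wire faster barely changes the total; only cutting the per-message CPU work does.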
49The Promise of SAN/VIA: 10x in 2 years, http://www.ViArch.org/
- Yesterday:
- 10 MBps (100 Mbps Ethernet)
- 20 MBps tcp/ip saturates 2 cpus
- round-trip latency 250 µs
- Now
- Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, ...
- Fast user-level communication
- tcp/ip 100 MBps, 10% cpu
- round-trip latency is 15 µs
- 1.6 Gbps demoed on a WAN
50How much does wire-time cost? $/Mbyte?
Link                  Cost       Time
Gbps Ethernet         .2 µ$      10 ms
100 Mbps Ethernet     .3 µ$      100 ms
OC12 (650 Mbps)       $.003      20 ms
DSL                   $.0006     25 sec
POTs                  $.002      200 sec
Wireless              $.80       500 sec
51The Network Revolution
- Networking folks are finally streamlining LAN case (SAN).
- Offloading protocol to NIC
- ½ power point is 8KB
- Min round trip latency is 50 µs.
- 3k ins + .1 ins/byte
- "High-Performance Distributed Objects over a System Area Network", Li, L.; Forin, A.; Hunt, G.; Wang, Y.; MSR-TR-98-68
52Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
53The Five Minute Rule
- Trade DRAM for Disk Accesses
- Cost of an access (Drive_Cost / Access_per_second)
- Cost of a DRAM page ($/MB / pages_per_MB)
- Break even has two terms:
- Technology term and an Economic term
- Grew page size to compensate for changing ratios.
- Now at 5 minutes for random, 10 seconds sequential
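54The 5 Minute Rule Derived
T = TimeBetweenReferences to Page
\[ \text{Disk access cost per } T = \frac{\text{DiskPrice}}{\text{AccessesPerSecond}} \times \frac{1}{T} \]
\[ \text{Cost of a RAM page} = \frac{\text{RAM\_\$\_Per\_MB}}{\text{PagesPerMB}} \]
- Breakeven:
\[ \frac{\text{RAM\_\$\_Per\_MB}}{\text{PagesPerMB}} = \frac{\text{DiskPrice}}{T \times \text{AccessesPerSecond}} \]
\[ T = \frac{\text{DiskPrice} \times \text{PagesPerMB}}{\text{RAM\_\$\_Per\_MB} \times \text{AccessesPerSecond}} \]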
55Plugging in the Numbers
              PPM/aps         disk$/RAM$        Break even
Random        128/120 ≈ 1     1000/3 ≈ 300      5 minutes
Sequential    1/30 ≈ .03      ≈ 300             10 seconds
- Trend is longer times because disk $ is not changing much while RAM $ is declining 100x/decade
5 minute and 10 second rule
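The break-even interval can be recomputed from the figures on this slide. A minimal sketch, assuming a $1000 disk and $3/MB RAM as in the table; the function and parameter names are illustrative:

```python
def break_even_seconds(pages_per_mb: float, accesses_per_second: float,
                       disk_price: float, ram_price_per_mb: float) -> float:
    """Reference interval at which caching a page in RAM costs the same as
    re-reading it from disk: a technology term times an economic term."""
    technology = pages_per_mb / accesses_per_second
    economic = disk_price / ram_price_per_mb
    return technology * economic


# Random IO: 128 pages per MB (8 KB pages), 120 accesses per second.
random_t = break_even_seconds(128, 120, 1000, 3)      # about 356 s, roughly 5 minutes
# Sequential IO: 1 page per MB (1 MB transfers), 30 transfers per second.
sequential_t = break_even_seconds(1, 30, 1000, 3)     # about 11 s
print(round(random_t), round(sequential_t))
```

Without the slide's rounding of each term, the results come out near 6 minutes and 11 seconds, the same order of magnitude as the stated rules.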
56When to Cache Web Pages.
- Caching saves user time
- Caching saves wire time
- Caching costs storage
- Caching only works sometimes
- New pages are a miss
- Stale pages are a miss
57The 10 Instruction Rule
- Spend 10 instructions/second to save 1 byte
- Cost of instruction: I = ProcessorCost / (MIPS x LifeTime)
- Cost of byte: B = RAM_$_Per_B / LifeTime
- Breakeven: N x I = B, so N = B/I = (RAM_$_Per_B x MIPS) / ProcessorCost
- ≈ (3E-6 x 5E8)/500 = 3 ins/B for Intel
- ≈ (3E-6 x 3E8)/10 ≈ 10 ins/B for ARM
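The break-even can be evaluated with a short sketch. It uses only the Intel figures quoted on this slide (3E-6 $ per byte of RAM, 5E8 instructions per second, a $500 processor); the function name is illustrative.

```python
def break_even_instructions_per_byte(ram_price_per_byte: float,
                                     instructions_per_second: float,
                                     processor_cost: float) -> float:
    """N = B / I: instructions per second worth spending to save one byte.
    LifeTime appears in both costs and cancels out."""
    return ram_price_per_byte * instructions_per_second / processor_cost


intel = break_even_instructions_per_byte(3e-6, 5e8, 500)
print(round(intel, 2))  # about 3 instructions per byte
```

Cheaper processors push the break-even up, which is why the rule is quoted as roughly 10 instructions per byte.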
58Web Page Caching Saves People Time
- Assume people cost $20/hour (or $.2/hr ???)
- Assume 20% hit in browser, 40% in proxy
- Assume 3 second server time
- Caching saves people time: $28/year to $150/year of people time, or .28 cents to $1.5/year.
59Web Page Caching Saves Resources
- Wire cost is a penny (wireless) to 100 µ$ (LAN)
- Storage is 8 µ$/mo
- Breakeven: wire cost = storage rent at 4 to 7 months
- Add people cost: breakeven is 4 years. "Cheap people" ($.2/hr): 6 to 8 months.
60Caching
- Disk caching
- 5 minute rule for random IO
- 11 second rule for sequential IO
- Web page caching
- If page will be re-referenced in 18 months (with free users) or 15 years (with valuable users), then cache the page in the client/proxy.
- Challenge: guessing which pages will be re-referenced, detecting stale pages (page velocity)
61Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb