Title: Rules of Thumb in Data Engineering
1Rules of Thumb in Data Engineering
- Jim Gray
- UC Santa Cruz
- 7 May 2002
- Gray_at_Microsoft.com, http://research.Microsoft.com/~Gray/Talks/
2Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
3Meta-Message: Technology Ratios Matter
- Price and performance change.
- If everything changes in the same way, then nothing really changes.
- If some things get much cheaper/faster than others, then that is real change.
- Some things are not changing much:
- Cost of people
- Speed of light
-
- And some things are changing a LOT
4Trends: Moore's Law
- Performance/Price doubles every 18 months
- 100x per decade
- Progress in next 18 months = ALL previous progress
- New storage = sum of all old storage (ever)
- New processing = sum of all old processing.
- E. coli doubles every 20 minutes!
15 years ago
5Trends: ops/s/$ Had Three Growth Phases
- 1890-1945
- Mechanical
- Relay
- 7-year doubling
- 1945-1985
- Tube, transistor,..
- 2.3 year doubling
- 1985-2000
- Microprocessor
- 1.0 year doubling
6So: a problem
- Suppose you have a ten-year compute job on the world's fastest supercomputer. What should you do?
- ? Commit $250M now?
- ? Program for 9 years: software speedup 2^6 = 64x, Moore's law speedup 2^6 = 64x, so 4,000x speedup. Spend $1M (not $250M) on hardware; runs in 2 weeks, not 10 years.
- Homework problem: What is the optimum strategy?
7Storage capacity beating Moore's law
- $1k/TB today (raw disk)
- $100/TB by end of 2007
-
8Trends: Magnetic Storage Densities
- Amazing progress
- Ratios have changed
- Improvements: Capacity 60%/y, Bandwidth 40%/y, Access time 16%/y
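The annual improvement rates above compound very differently over a decade, which is why the ratios change. A minimal sketch of that arithmetic follows; the function names are illustrative and not from the talk.

```python
import math


def decade_factor(annual_rate: float, years: int = 10) -> float:
    """Compound an annual improvement rate (0.60 means 60%/year) over `years`."""
    return (1.0 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed to double at a constant annual improvement rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # Rates taken from the slide: capacity 60%/y, bandwidth 40%/y, access time 16%/y.
    for name, rate in [("capacity", 0.60), ("bandwidth", 0.40), ("access time", 0.16)]:
        print(f"{name}: {decade_factor(rate):.1f}x per decade, "
              f"doubles every {doubling_time_years(rate):.1f} years")
```

Capacity compounds to roughly 100x per decade while access time improves only about 4x, so the gap between what a disk holds and how fast it can be reached keeps widening.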
9Trends: Density Limits
- The end is near!
- Products: 23 Gbpsi; Lab: 50 Gbpsi; "limit": 60 Gbpsi
- But the limit keeps rising, and there are alternatives
[Figure: Bit density vs. time, 1990 to 2008, on a log scale from 1 to 3,000 b/µm² (0.6 to 2,000 Gb/in²). Marked levels: CD, DVD, ODD, wavelength limit, superparamagnetic limit. Candidate alternatives: NEMS, fluorescent, holographic, DNA.]
Figure adapted from Franco Vitaliano, "The NEW new media: the growing attraction of nonmagnetic storage", Data Storage, Feb 2000, pp 21-32, www.datastorage.com
10Trends: promises. NEMS (Nano Electro Mechanical Systems) (http://www.nanochip.com/), also Cornell, IBM, CMU
- 250 Gbpsi by using tunneling electronic microscope
- Disk replacement
- Capacity: 180 GB now, 1.4 TB in 2 years
- Transfer rate: 100 MB/sec R/W
- Latency: 0.5 msec
- Power: 23 W active, .05 W standby
- $10k/TB now, $2k/TB in 2004
11Consequence of Moore's law: Need an address bit every 18 months.
- Moore's law gives you 2x more in 18 months.
- RAM
- Today we have 10 MB to 100 GB machines (24-36 bits of addressing)
- In 9 years we will need 6 more bits: 30-42 bit addressing (4 TB RAM).
- Disks
- Today we have 10 GB to 100 TB file systems/DBs (33-47 bit file addresses)
- In 9 years, we will need 6 more bits: 40-53 bit file addresses (100 PB files)
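The "one address bit every 18 months" rule is just a count of doublings. A small sketch, assuming a constant 18-month doubling period (the helper names are hypothetical):

```python
def extra_address_bits(years: float, doubling_months: float = 18.0) -> int:
    """Each doubling of capacity needs one more address bit."""
    return round(years * 12.0 / doubling_months)


def future_bits(bits_today: int, years: float) -> int:
    """Address bits needed after `years` of doubling every 18 months."""
    return bits_today + extra_address_bits(years)


if __name__ == "__main__":
    print(extra_address_bits(9))   # 6 doublings in 9 years
    print(future_bits(36, 9))      # largest RAM today: 36 bits, then 42
    print(future_bits(47, 9))      # largest file system today: 47 bits, then 53
```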
12Architecture could change this
- 1-level store
- System/38 and AS/400 have a 1-level store.
- Never re-uses an address.
- Needs 96-bit addressing today.
- NUMAs and Clusters
- Willing to buy a $100M computer?
- Then add 6 more address bits.
- Only 1-level store pushes us beyond 64 bits
- Still, these are logical addresses; 64-bit physical will last many years
13Trends: Gilder's Law: 3x bandwidth/year for 25 more years
- Today:
- 40 Gbps per channel (λ)
- 12 channels per fiber (WDM): 500 Gbps
- 32 fibers/bundle = 16 Tbps/bundle
- In lab: 3 Tbps/fiber (400 x WDM)
- In theory: 25 Tbps per fiber
- 1 Tbps = USA 1996 WAN bisection bandwidth
- Aggregate bandwidth doubles every 8 months!
1 fiber = 25 Tbps
14Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
15How much storage do we need?
- Soon everything can be recorded and indexed
- Most bytes will never be seen by humans.
- Data summarization, trend detection, and anomaly detection are key technologies
- See Mike Lesk, "How much information is there": http://www.lesk.com/mlesk/ksg97/ksg.html
- See Lyman & Varian, "How much information": http://www.sims.berkeley.edu/research/projects/how-much-info/
[Figure: storage scale running Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with markers for A Book, A Photo, A Movie, All LoC books (words), All Books MultiMedia, and Everything Recorded.]
Small prefixes (negative powers of ten): 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli
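16Storage Latency: How Far Away is the Data?
[Figure: relative latency of each storage level, with a distance and travel-time analogy]
- Registers: 1 (my head, 1 min)
- On-chip cache: 2 (this room)
- On-board cache: 10 (this campus, 10 min)
- Memory: 100 (Springfield, 1.5 hr)
- Disk: 10^6 (Pluto, 2 years)
- Tape/optical robot: 10^9 (Andromeda, 2,000 years)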
17Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs
[Figure: two log-scale plots against access time (10^-9 to 10^3 seconds). "Size vs Speed" plots typical system size (bytes); "Price vs Speed" plots $/MB. Levels shown: cache, main, secondary (disc), online tape, nearline tape, offline tape.]
18Disks Today
- Disk is 18 GB to 180 GB, 10-50 MBps, 5k-15k rpm (6 ms-2 ms rotational latency), 12 ms-7 ms seek, $1K/IDE-TB, $6k/SCSI-TB
- For shared disks, most time is spent waiting in queue for access to arm/controller
[Figure: phases of a disk access: wait, seek, rotate, transfer]
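The rotational latencies quoted above follow directly from the spindle speed: on average the head waits half a revolution. A minimal sketch (the function name is illustrative):

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


if __name__ == "__main__":
    for rpm in (5_000, 10_000, 15_000):
        print(f"{rpm} rpm: {avg_rotational_latency_ms(rpm):.1f} ms")
```

At 5k rpm this gives 6 ms and at 15k rpm 2 ms, matching the range on the slide.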
19The Street Price of a Raw Disk TB: about $1K/TB
20Standard Storage Metrics
- Capacity
- RAM: MB and $/MB; today at 512 MB and $200/GB
- Disk: GB and $/GB; today at 80 GB and $7k/TB
- Tape: TB and $/TB; today at 40 GB and $7k/TB (nearline)
- Access time (latency)
- RAM: 1-100 ns
- Disk: 5-15 ms
- Tape: 30 second pick, 30 second position
- Transfer rate
- RAM: 1-10 GB/s
- Disk: 10-50 MB/s (arrays can go to 10 GB/s)
- Tape: 5-15 MB/s (arrays can go to 1 GB/s)
21New Storage Metrics: Kaps, Maps, SCAN
- Kaps: How many kilobyte objects served per second
- The file server, transaction processing metric
- This is the OLD metric.
- Maps: How many megabyte objects served per second
- The Multi-Media metric
- SCAN: How long to scan all the data
- The data mining and utility metric
- And: Kaps/$, Maps/$, TBscan/$
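The three metrics can be estimated from a drive's basic parameters. The sketch below assumes a simple service-time model (seek + rotate + transfer, no queueing), which is an assumption of this example rather than something the talk specifies. The drive figures are the 160 GB, 40 MBps, 4 ms seek, 2 ms rotate disk from the "Disk vs Tape" slide; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    capacity_gb: float      # decimal gigabytes
    bandwidth_mbps: float   # sequential transfer rate, MB/s
    seek_ms: float          # average seek time
    rotate_ms: float        # average rotational latency

    def accesses_per_second(self, object_bytes: float) -> float:
        """Random accesses per second for objects of the given size."""
        position_s = (self.seek_ms + self.rotate_ms) / 1000.0
        transfer_s = object_bytes / (self.bandwidth_mbps * 1e6)
        return 1.0 / (position_s + transfer_s)

    def kaps(self) -> float:
        """Kilobyte objects served per second."""
        return self.accesses_per_second(1e3)

    def maps(self) -> float:
        """Megabyte objects served per second."""
        return self.accesses_per_second(1e6)

    def scan_hours(self) -> float:
        """Time to read the whole disk sequentially."""
        return self.capacity_gb * 1e9 / (self.bandwidth_mbps * 1e6) / 3600.0


if __name__ == "__main__":
    disk = Disk(capacity_gb=160, bandwidth_mbps=40, seek_ms=4, rotate_ms=2)
    print(f"Kaps: {disk.kaps():.0f}, Maps: {disk.maps():.0f}, "
          f"scan: {disk.scan_hours():.1f} hours")
```

Under these assumptions the drive serves on the order of 160 Kaps and takes about an hour to scan, in line with the figures the talk quotes for a 2002 disk.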
22For the Record (good 2002 devices packaged in system: http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf)
[Table not recovered from the slide; one surviving label reads "X 100".]
Tape slice is 8 TB with 1 DLT reader at 6 MBps per 100 tapes.
23For the Record (good 2002 devices packaged in system: http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf)
Tape is 1 TB with 4 DLT readers at 5 MBps each.
24Disk Changes
- Disks got cheaper: $20k -> $200
- $/Kaps etc. improved 100x (Moore's law!) (or even 500x)
- One-time event (went from mainframe prices to PC prices)
- Disks got cooler (50x in decade)
- 1990: 1 Kaps per 20 MB
- 2002: 1 Kaps per 1,000 MB
- Disk scans take longer (10x per decade)
- 1990 disk: 1 GB and 50 Kaps and 5 minute scan
- 2002 disk: 160 GB and 160 Kaps and 1 hour scan
- So... backup/restore takes a long time (too long)
25Storage Ratios Changed
- 10x better access time
- 10x more bandwidth
- 100x more capacity
- Data 25x cooler (1Kaps/20MB vs 1Kaps/GB)
- 4,000x lower media price
- 20x to 100x lower disk price
- Scan takes 10x longer (3 min vs 1hr)
- RAM/disk media price ratio changed
- 1970-1990: 100:1
- 1990-1995: 10:1
- 1995-1997: 50:1
- today: $1/GB disk vs $200/GB RAM, so 200:1
26More Kaps and Kaps/$ but...
- Disk accesses got much less expensive: better disks, cheaper disks!
- But disk arms are expensive: the scarce resource
- 1 hour scan vs 5 minutes in 1990
27Data on Disk Can Move to RAM in 10 years
[Figure labels: 100:1 price ratio, 10 years]
28The Absurd 10x (4 year) Disk
- 2.5 hr scan time (poor sequential access)
- 1 aps / 5 GB (VERY cold data)
- It's a tape!
[Figure labels: 1 TB, 100 MB/s, 200 Kaps]
29Disk vs Tape
- Disk
- 160 GB
- 40 MBps
- 4 ms seek time
- 2 ms rotate latency
- $1/GB for drive, $1/GB for ctlrs/cabinet
- 60 TB/rack
- 1 hour scan
- Tape
- 80 GB
- 10 MBps
- 10 sec pick time
- 30-120 second seek time
- $2/GB for media, $5/GB for drive + library
- 20 TB/rack
- 1 week scan
Guesstimates: CERN: 200 TB; 3480 tapes; 2 col = 50 GB; rack = 1 TB = 8 drives
The price advantage of tape is gone, and the performance advantage of disk is growing. At $10K/TB, disk is competitive with nearline tape.
30Caveat: Tape vendors may innovate
- Sony DTF-2 is 100 GB, 24 MBps, 30 second pick time
- So, 2x better
- Prices not clear
- http://bpgprod.sel.sony.com/DTF/seismic/dtf2.html
31It's Hard to Archive a Petabyte: It takes a LONG time to restore it.
- At 1 GBps it takes 12 days!
- Store it in two (or more) places online (on disk?): a geo-plex
- Scrub it continuously (look for errors)
- On failure,
- use other copy until failure repaired,
- refresh lost copy from safe copy.
- Can organize the two copies differently (e.g. one by time, one by space)
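The 12-day figure is a straight division of size by bandwidth. A short sketch using decimal units (1 PB = 10^15 bytes, 1 GBps = 10^9 bytes/s); the function name is illustrative:

```python
def transfer_days(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to move `size_bytes` at a sustained transfer rate."""
    return size_bytes / rate_bytes_per_s / 86_400.0


if __name__ == "__main__":
    petabyte = 1e15
    print(f"{transfer_days(petabyte, 1e9):.1f} days at 1 GBps")
    print(f"{transfer_days(petabyte, 100e6):.0f} days at 100 MBps")
```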
32Auto Manage Storage
- 1980 rule of thumb
- A DataAdmin per 10GB, SysAdmin per mips
- 2002 rule of thumb
- A DataAdmin per 5TB
- SysAdmin per 100 clones (varies with app).
- Problem
- 5 TB is >$5k today, $500 in a few years.
- Admin cost >> storage cost!!!!
- Challenge
- Automate ALL storage admin tasks
33How to cool disk data
- Cache data in main memory
- See 5 minute rule later in presentation
- Fewer-larger transfers
- Larger pages (512 -> 8KB -> 256KB)
- Sequential rather than random access
- Random 8KB IO is 1.5 MBps
- Sequential IO is 30 MBps (20:1 ratio is growing)
- Raid1 (mirroring) rather than Raid5 (parity).
34Stripes, Mirrors, Parity (RAID 0, 1, 5)
- RAID 0 Stripes
- bandwidth
- RAID 1 Mirrors, Shadows,
- Fault tolerance
- Reads faster, writes 2x slower
- RAID 5 Parity
- Fault tolerance
- Reads faster
- Writes 4x or 6x slower.
[Figure: block placement per disk]
RAID 0: 0,3,6,.. | 1,4,7,.. | 2,5,8,..
RAID 1: 0,1,2,.. | 0,1,2,..
RAID 5: 0,2,P2,.. | 1,P1,4,.. | P0,3,5,..
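The RAID 5 placement in the figure can be reproduced by rotating the parity block one disk to the left on each stripe. The sketch below is one way to generate that layout, plus the XOR used to compute parity and rebuild a lost block; it is an illustration under those assumptions, not a description of any particular controller.

```python
from __future__ import annotations


def raid5_layout(num_disks: int, num_stripes: int) -> list[list[str]]:
    """Return, for each disk, the ordered labels of the blocks it holds."""
    disks: list[list[str]] = [[] for _ in range(num_disks)]
    block = 0
    for stripe in range(num_stripes):
        # Parity starts on the last disk and moves left by one per stripe.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        for d in range(num_disks):
            if d == parity_disk:
                disks[d].append(f"P{stripe}")
            else:
                disks[d].append(str(block))
                block += 1
    return disks


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks: parity, or a rebuilt block."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)


if __name__ == "__main__":
    for i, contents in enumerate(raid5_layout(3, 3)):
        print(f"disk {i}: {','.join(contents)}")
    a, b = b"\x01\x02", b"\x0f\xf0"
    parity = xor_blocks([a, b])
    # Losing block a: rebuild it from the parity and the surviving block.
    print(xor_blocks([parity, b]) == a)
```

Rebuilding a block requires reading every surviving disk in the stripe, which is why RAID 5 performance degrades after a failure.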
35RAID 10 (stripes of mirrors) Wins: wastes space, saves arms
- RAID 5 (6 disks 1 vol)
- Performance
- 675 reads/sec
- 210 writes/sec
- Write
- 4 logical IO,
- 2 seek + 1.7 rotate
- SAVES SPACE
- Performance degrades on failure
- RAID1 (6 disks, 3 pairs)
- Performance
- 750 reads/sec
- 300 writes/sec
- Write
- 2 logical IO
- 2 seek + 0.7 rotate
- SAVES ARMS
- Performance improves on failure
36Shows Best Index Page Size: 16KB
37Summarizing storage rules of thumb (1)
- Moore's law: 4x every 3 years, 100x more per decade
- Implies 2 bits of addressing every 3 years.
- Storage capacities increase 100x/decade
- Storage costs drop 100x per decade
- Storage throughput increases 10x/decade
- Data cools 10x/decade
- Disk page sizes increase 5x per decade.
38Summarizing storage rules of thumb (2)
- RAM:Disk and Disk:Tape cost ratios are 100:1 and 1:1
- So, in 10 years, disk data can move to RAM since prices decline 100x per decade.
- A person can administer a million dollars of disk storage: that is 1 TB - 100 TB today
- Disks are replacing tapes as backup devices. You can't backup/restore a Petabyte quickly, so geoplex it.
- Mirroring rather than Parity to save disk arms
39Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
40Standard Architecture (today)
41Amdahl's Balance Laws
- Parallelism law: If a computation has a serial part S and a parallel component P, then the maximum speedup is (S+P)/S.
- Balanced system law: A system needs a bit of IO per second per instruction per second: about 8 MIPS per MBps.
- Memory law: α = 1: the MB/MIPS ratio (called alpha (α)) in a balanced system is 1.
- IO law: Programs do one IO per 50,000 instructions.
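The parallelism law can be checked numerically. The sketch below uses the standard finite-processor form of Amdahl's law, speedup = (S+P)/(S + P/N), which approaches the slide's (S+P)/S as N grows; the function names are illustrative.

```python
def speedup(serial: float, parallel: float, processors: float) -> float:
    """Speedup on `processors` processors when only the parallel part scales."""
    return (serial + parallel) / (serial + parallel / processors)


def max_speedup(serial: float, parallel: float) -> float:
    """Limit of the speedup as the number of processors goes to infinity."""
    return (serial + parallel) / serial


if __name__ == "__main__":
    s, p = 1.0, 9.0   # a job that is 10% serial
    for n in (1, 9, 100, 10_000):
        print(f"N={n}: {speedup(s, p, n):.2f}x")
    print(f"limit: {max_speedup(s, p):.0f}x")
```

With 10% serial work the limit is 10x, and nine processors already deliver half of it.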
42Amdahl's Laws Valid 35 Years Later?
- Parallelism law is algebra, so SURE!
- Balanced system laws?
- Look at tpc results (tpcC, tpcH) at http://www.tpc.org/
- Some imagination needed:
- What's an instruction (CPI varies from 1-3)?
- RISC, CISC, VLIW, clocks per instruction
- What's an I/O?
43TPC systems
- Normalize for CPI (clocks per instruction)
- TPC-C has about 7 ins/byte of IO
- TPC-H has 3 ins/byte of IO
- TPC-H needs ½ as many disks, sequential vs random
- Both use 9GB 10 krpm disks (need arms, not bytes)
44TPC systems: What's alpha (MB/MIPS)?
- Hard to say:
- Intel: 32 bit addressing (4 GB limit). Known CPI.
- IBM, HP, Sun have 64 GB limit. Unknown CPI.
- Look at both, guess CPI for IBM, HP, Sun
- Alpha is between 1 and 6

System               Mips                 Memory   Alpha
Amdahl               1                    1        1
tpcC Intel           8 x 262 = 2 Gips     4 GB     2
tpcH Intel           8 x 458 = 4 Gips     4 GB     1
tpcC IBM 24 cpus     ? = 12 Gips          64 GB    6
tpcH HP 32 cpus      ? = 16 Gips          32 GB    2
45Instructions per IO?
- We know 8 mips per MBps of IO
- So, 8KB page is 64 K instructions
- And 64KB page is 512 K instructions.
- But, sequential has fewer instructions/byte (3 vs 7 in tpcH vs tpcC).
- So, 64KB page is 200 K instructions.
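These page budgets are the page size multiplied by the instructions spent per byte of IO: 8 MIPS per MBps is 8 instructions per byte, and the sequential TPC-H figure is about 3. A short sketch, assuming 1 KB = 1024 bytes (the function name is hypothetical):

```python
KB = 1024


def instructions_per_io(page_bytes: int, instructions_per_byte: float) -> float:
    """Instruction budget for one IO of the given page size."""
    return page_bytes * instructions_per_byte


if __name__ == "__main__":
    print(instructions_per_io(8 * KB, 8) / KB)    # 64 K instructions
    print(instructions_per_io(64 * KB, 8) / KB)   # 512 K instructions
    print(instructions_per_io(64 * KB, 3) / KB)   # 192 K, roughly 200 K
```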
46Amdahl's Balance Laws Revised
- Laws right, just need "interpretation" (imagination?)
- Balanced System Law: A system needs 8 MIPS/MBps of IO, but instruction rate must be measured on the workload.
- Sequential workloads have low CPI (clocks per instruction),
- random workloads tend to have higher CPI.
- Alpha (the MB/MIPS ratio) is rising from 1 to 6. This trend will likely continue.
- One random IO per 50k instructions.
- Sequential IOs are larger: one sequential IO per 200k instructions
47PAP vs RAP (a y2k perspective)
- Peak Advertised Performance vs Real Application
Performance
48Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
49Standard IO (Infiniband) next Year?
- Probably
- Replace PCI with something better; will still need a mezzanine bus standard
- Multiple serial links directly from processor
- Fast (10 GBps/link) for a few meters
- System Area Networks (SANs) ubiquitous (VIA morphs to Infiniband?)
50Ubiquitous 10 GBps SANs in 5 years
- 1 Gbps Ethernet is a reality now.
- Also FiberChannel, MyriNet, GigaNet, ServerNet, ATM
- 10 Gbps x4 WDM deployed now (OC192)
- 3 Tbps WDM working in lab
- In 5 years, expect 10x, wow!!
[Figure labels: 1 GBps, 120 MBps (1 Gbps), 80 MBps, 5 MBps, 40 MBps, 20 MBps]
51Networking
- WANs are getting faster than LANs: G8 = OC192 = 9 Gbps is standard
- Link bandwidth improves 4x per 3 years
- Speed of light (60 ms round trip in US)
- Software stacks have always been the problem.
Time = SenderCPU + ReceiverCPU + bytes/bandwidth
The CPU terms have been the problem for small (10KB or less) messages
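The time formula above shows why small messages are dominated by software overhead. The sketch below evaluates it and finds the half-power point, the message size at which delivered bandwidth is half the wire bandwidth. The 40 µs per-side CPU overhead and the 100 MBps wire are hypothetical values chosen for illustration, not measurements from the talk.

```python
def message_time_s(nbytes: float, bandwidth_bytes_per_s: float,
                   sender_cpu_s: float, receiver_cpu_s: float) -> float:
    """Time = SenderCPU + ReceiverCPU + bytes/bandwidth."""
    return sender_cpu_s + receiver_cpu_s + nbytes / bandwidth_bytes_per_s


def effective_bandwidth(nbytes: float, bandwidth_bytes_per_s: float,
                        sender_cpu_s: float, receiver_cpu_s: float) -> float:
    """Bytes per second actually delivered for messages of this size."""
    return nbytes / message_time_s(nbytes, bandwidth_bytes_per_s,
                                   sender_cpu_s, receiver_cpu_s)


def half_power_bytes(bandwidth_bytes_per_s: float,
                     sender_cpu_s: float, receiver_cpu_s: float) -> float:
    """Message size where wire time equals the fixed CPU overhead."""
    return (sender_cpu_s + receiver_cpu_s) * bandwidth_bytes_per_s


if __name__ == "__main__":
    wire = 100e6        # hypothetical: 100 MBps link
    cpu = 40e-6         # hypothetical: 40 microseconds of CPU per side
    for size in (100, 1_000, 8_000, 100_000):
        mbps = effective_bandwidth(size, wire, cpu, cpu) / 1e6
        print(f"{size:>7} bytes: {mbps:6.1f} MBps")
    print(f"half-power point: {half_power_bytes(wire, cpu, cpu):.0f} bytes")
```

Cutting the fixed per-message overhead moves the half-power point toward smaller messages.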
52The Promise of SAN/VIA: 10x in 2 years (http://www.ViArch.org/)
- Yesterday:
- 10 MBps (100 Mbps Ethernet)
- 20 MBps tcp/ip saturates 2 cpus
- round-trip latency 250 µs
- Now
- Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet
- Fast user-level communication
- tcp/ip 100 MBps, 10% cpu
- round-trip latency is 15 µs
- 1.6 Gbps demoed on a WAN
53The Network Revolution
- Networking folks are finally streamlining LAN case (SAN).
- Offloading protocol to NIC
- ½ power point is 8KB
- Min round trip latency is 50 µs.
- 3k ins + .1 ins/byte
- "High-Performance Distributed Objects over a System Area Network", Li, L.; Forin, A.; Hunt, G.; Wang, Y., MSR-TR-98-68
54How much does wire-time cost? $/MByte?
- Link: cost per MB, time per MB
- Gbps Ethernet: .2 µ$, 10 ms
- 100 Mbps Ethernet: .3 µ$, 100 ms
- OC12 (650 Mbps): $.003, 20 ms
- DSL: $.0006, 25 sec
- POTs: $.002, 200 sec
- Wireless: $.80, 500 sec
55Data delivery costs $1/GB today
- Rent for "big" customers: $300/megabit per second per month
- Improved 3x in last 6 years (!).
- That translates to $1/GB at each end.
- Overhead (routers, people, ..) makes it $6/GB at each end.
- You can mail a 160 GB disk for $20.
- That's 16x cheaper
- If overnight, it's 4 MBps.
- 7 disks = 30 MBps (1/4 Gbps)
- TeraScale SneakerNet: 7 x 160 GB ≈ 1 TB
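The $1/GB figure follows from the monthly rent if the link is kept fully busy. The sketch below works that out and compares it with mailing a disk; it assumes a 30-day month and decimal units, and the function names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600


def network_dollars_per_gb(rent_per_mbps_month: float) -> float:
    """Cost per GB when a rented 1 Mbps link runs flat out for a month."""
    bytes_per_month = 1e6 / 8 * SECONDS_PER_MONTH
    return rent_per_mbps_month / (bytes_per_month / 1e9)


def mailed_dollars_per_gb(postage: float, disk_gb: float) -> float:
    """Cost per GB of shipping a full disk."""
    return postage / disk_gb


if __name__ == "__main__":
    per_end = network_dollars_per_gb(300)        # about $0.93, i.e. ~$1/GB
    mailed = mailed_dollars_per_gb(20, 160)      # $0.125/GB
    print(f"network: ${per_end:.2f}/GB at each end")
    print(f"mail:    ${mailed:.3f}/GB")
    # Using the slide's rounded $1/GB at each of the two ends:
    print(f"mail is {2 * 1.0 / mailed:.0f}x cheaper")
```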
56Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
57The Five Minute Rule
- Trade DRAM for disk accesses
- Cost of an access: Drive_Cost / Accesses_per_second
- Cost of a DRAM page: $/MB / pages_per_MB
- Break even has two terms:
- Technology term and an Economic term
- Grew page size to compensate for changing ratios.
- Now at 5 minutes for random, 10 seconds sequential
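58The 5 Minute Rule Derived
With T = TimeBetweenReferences to the page:

$$\text{Disk access cost per } T = \frac{\mathrm{DiskPrice}}{\mathrm{AccessesPerSecond}} \cdot \frac{1}{T}$$

$$\text{Cost of a RAM page} = \frac{\mathrm{RAM\_\$\_Per\_MB}}{\mathrm{PagesPerMB}}$$

- Breakeven:

$$\frac{\mathrm{RAM\_\$\_Per\_MB}}{\mathrm{PagesPerMB}} = \frac{\mathrm{DiskPrice}}{T \times \mathrm{AccessesPerSecond}}$$

$$T = \frac{\mathrm{DiskPrice} \times \mathrm{PagesPerMB}}{\mathrm{RAM\_\$\_Per\_MB} \times \mathrm{AccessesPerSecond}}$$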
59Plugging in the Numbers

              PPM/aps          disk$/RAM$        Break Even
Random        128/120 ≈ 1      1000/3 ≈ 300      5 minutes
Sequential    1/30 ≈ .03       ≈ 300             10 seconds

- Trend is longer times because disk $ is not changing much while RAM $ is declining 100x/decade
5 minute & 10 second rule
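The break-even interval is the product of the technology term (pages per MB over accesses per second) and the economic term (disk price over RAM price per MB). The sketch below plugs in the table's numbers, reading "1000/3" as a $1,000 drive against RAM at $3/MB; that reading and the function name are assumptions of this example.

```python
def break_even_seconds(pages_per_mb: float, accesses_per_second: float,
                       disk_price: float, ram_price_per_mb: float) -> float:
    """Reference interval below which keeping the page in RAM is cheaper."""
    technology_term = pages_per_mb / accesses_per_second
    economic_term = disk_price / ram_price_per_mb
    return technology_term * economic_term


if __name__ == "__main__":
    # Random IO: 8 KB pages (128 per MB) at 120 accesses/second.
    random_t = break_even_seconds(128, 120, 1000, 3)
    # Sequential IO: 1 MB transfers (1 per MB) at 30 per second.
    sequential_t = break_even_seconds(1, 30, 1000, 3)
    print(f"random:     {random_t / 60:.1f} minutes")
    print(f"sequential: {sequential_t:.0f} seconds")
```

The unrounded results are about 6 minutes and 11 seconds, which the talk rounds to the 5 minute and 10 second rules.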
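60The 10 Instruction Rule
- Spend 10 instructions/second to save 1 byte
- Cost of an instruction:

$$I = \frac{\mathrm{ProcessorCost}}{\mathrm{MIPS} \times \mathrm{LifeTime}}$$

- Cost of a byte:

$$B = \frac{\mathrm{RAM\_\$\_Per\_B}}{\mathrm{LifeTime}}$$

- Breakeven: N x I = B, so

$$N = \frac{B}{I} = \frac{\mathrm{RAM\_\$\_Per\_B} \times \mathrm{MIPS}}{\mathrm{ProcessorCost}}$$

- ~ (3E-6 x 5E8)/500 ~ 3 ins/B for Intel
- ~ (3E-6 x 3E8)/10 ~ 10 ins/B for ARM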
61Trading Storage for Computation
- You can spend 10 bytes of RAM to save 1 instruction/second.
- Rent for disk: $1/GB (forever)
- Processor costs $10 to $1,000/mips: $10 - $1,000 for 100 Tera Ops.
- So $1/TeraOp (or a penny per TeraOp)
- 1 GB ~ 1 Top; 1 MB ~ 1 Gop; 1 KB ~ 1 Mop
- Save a 1KB object on disk if it costs more than 10 ms to compute.
62When to Cache Web Pages.
- Caching saves user time
- Caching saves wire time
- Caching costs storage
- Caching only works sometimes
- New pages are a miss
- Stale pages are a miss
63Web Page Caching Saves People Time
- Assume people cost $20/hour (or $.2/hr ???)
- Assume 20% hit in browser, 40% in proxy
- Assume 3 second server time
- Caching saves $28/year to $150/year of people time, or .28 cents to $1.5/year.
64Web Page Caching Saves Resources
- Wire cost is a penny (wireless) to 100 µ$ (LAN)
- Storage is 8 µ$/mo
- Breakeven: wire cost = storage rent at 18 months to 300 years
- Add people cost: breakeven is >15 years. With "cheap people" ($.2/hr): >3 years.
65Caching
- Disk caching
- 5 minute rule for random IO
- 10 second rule for sequential IO
- Web page caching
- If the page will be re-referenced within 18 months (with free users) or 15 years (with valuable users), then cache the page in the client/proxy.
- Challenge: guessing which pages will be re-referenced, and detecting stale pages (page velocity)
66Meta-Message: Technology Ratios Matter
- Price and performance change.
- If everything changes in the same way, then nothing really changes.
- If some things get much cheaper/faster than others, then that is real change.
- Some things are not changing much:
- Cost of people
- Speed of light
-
- And some things are changing a LOT
67Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb