Title: Long Term Storage Trends and You
1 Long Term Storage Trends and You
- Jim Gray, Microsoft Research
- 28 Sept 2006
Storage bricks, 200x
Illiac Disk, 1968
Minoan Phaistos Disk, 1700 BC: about 1KB; no one can read it
2 The Abstract
- We are headed for a world of 10TB disk drives, 64GB flash cards, and massive main memories. This talk begins with an exploration of these storage trends and how they impact storage:
- heat: everything has to get colder,
- utilities have to be redesigned to deal with scan times measured in days, and
- massive replication is needed to mask failures.
- I assume we all agree that "tape is dead", so I am robbed of that lunatic idea, but I am still left with two crazy ideas:
- smart disks and
- the death of SAN.
- These contrarian ideas are related, of course.
- The second half of the talk discusses the tape postmortem and these two crazy ideas.
3 The Reality
- This is an update of a 6-year old talk:
- Rules of Thumb in Data Engineering, MSR-TR-99-100, 1999; Proc. ICDE 2000 (pdf, talk).
- In light of 6 years of change and progress.
- A brief note on some recent studies.
4 What's New / Surprising
- Not a big surprise, just amazing!
- exponential growth in capacity
- latency lags bandwidth
- 5 minute rule is 30 minute rule
- FLASH is coming
- low end storage (GBs now, 100 GBs soon)
- low latency storage (fraction of ms)
- high $/byte but good $/access
- Smart Disks still seem far off, but...
5 To Blob or Not To Blob (1/2)
- Folklore:
- DB is good for billions of small things
- Files are good for thousands of big things
- Put another way:
- DB is bad at big objects
- File systems have trouble with billions of files.
- This is a fact, not a law of nature
- DB and FS could learn each other's tricks.
- But what is big and small? Put another way: what is the break-even size?
6 To Blob or Not To Blob (2/2)
- Folklore: BLOBs win for things less than 1MB.
- Refinement: if fragmentation, BLOBs win below 250KB.
- Humor: most files are less than 250KB (but most bytes are in big files).
- To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem? Russell Sears, Catharine Van Ingen, Jim Gray, MSR-TR-2006-45, April 2006
7 How Reliable are Cheap Disks? (1/5)
- Prices, Specs, and Gurus suggest: SCSI good, SATA bad.
- 3x cheaper but
- 10x shorter MTTF
- 10x shorter warranty
- 100x higher Uncorrectable Error on Read (UER)
- Spec Sheet says 1 UER every 10 Terabytes!
- So, we measured, and here is what we saw
8 How Reliable are Cheap Disks? (2/5)
DISK DRIVE FAILURES
- Things fail much more often than predicted
- Vendors say 0.5%/year
- Customers see 10x that rate
- Vendors say
- 60% are no trouble found
- 30% are mis-handling (dropped/cooked/bent pins)
- 10% are real failures.
- Will UERs be worse than the specs? We need to worry about ctlr, pci, ram, software, ...
9 How Reliable are Cheap Disks? (3/5)
Observed failure rates.
System             Type          Part Years  Fails  Fails %/Year
TerraServer SAN    SCSI 10krpm          858     24           2.8
TerraServer SAN    controllers           72      2           2.8
TerraServer SAN    san switch             9      1          11.1
TerraServer Brick  SATA 7krpm           138     10           7.2
Web Property 1     SCSI 10krpm       15,805    972           6.0
Web Property 1     controllers          900    139          15.4
Web Property 2     PATA 7krpm        22,400    740           3.3
Web Property 2     motherboard        3,769     66           1.7
Empirical Measurements of Disk Failure Rates and
Error Rates, Jim Gray, Catharine van Ingen,
MSR-TR-2005-166, December 2005
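The last column of the observed-failure table is just failures divided by part-years. A minimal sketch of that arithmetic, with the rows copied from the table (the function name `annual_failure_pct` is only illustrative); recomputed values can differ from the slide in the last digit:

```python
# (system, part, part_years, failures) rows copied from the observed-failure table.
OBSERVED = [
    ("TerraServer SAN", "SCSI 10krpm", 858, 24),
    ("TerraServer SAN", "controllers", 72, 2),
    ("TerraServer SAN", "san switch", 9, 1),
    ("TerraServer Brick", "SATA 7krpm", 138, 10),
    ("Web Property 1", "SCSI 10krpm", 15_805, 972),
    ("Web Property 1", "controllers", 900, 139),
    ("Web Property 2", "PATA 7krpm", 22_400, 740),
    ("Web Property 2", "motherboard", 3_769, 66),
]


def annual_failure_pct(part_years, failures):
    """Failures per part-year of operation, expressed as a percentage."""
    return 100.0 * failures / part_years


if __name__ == "__main__":
    for system, part, years, fails in OBSERVED:
        rate = annual_failure_pct(years, fails)
        print(f"{system:18} {part:12} {rate:5.1f} %/year")
```

Against the vendors' 0.5%/year, every row here is several times higher, which is the "customers see 10x" point of the previous slide.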
10 How Reliable are Cheap Disks? (4/5)
- The experiment
- Do 180,000 times (~1.8PB, ~1E16 bits):
- Create and write a 10GB disk file
- Read it to check the checksum
- On various office systems for 4 months (8 drive years)
- Expected 114 UER events; observed 3 or 4 UER events
- Two events corrected by OS on retry -- 1 real one
- no disk failures
- a file-system corruption (due to controller, we guess)
- Many reboots due to security patches
- 4 system hangs (bad controllers / drivers).
- UER better than advertised (checked end-to-end)
- Empirical Measurements of Disk Failure Rates and Error Rates, MSR-TR-2005-166
11 Moral: Design For Failure (5/5)
- Things break
- disks break
- controllers break
- systems break
- software breaks
- data centers break
- networks break
- Design for independent failure modes
- guard against operations errors
- guard against sympathetic failures
- guard against viruses
- Simple recovery is testable
- The cost of reliability is simplicity. Few are willing to pay that price. (T. Hoare)
12 It's Hard to Archive a Petabyte: It takes a LONG time to restore it.
- At 1GBps it takes 12 days!
- Store it in two (or more) places online: a geo-plex
- Scrub it continuously (look for errors)
- On failure,
- use other copy until failure repaired,
- refresh lost copy from safe copy.
- Can organize the two copies differently (e.g. one by time, one by space)
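The 12-day figure is a straight division of size by bandwidth. A small sketch, assuming decimal units (1 PB = 10^15 bytes, 1 GBps = 10^9 bytes/s); the helper name `transfer_days` is hypothetical:

```python
PB = 10**15  # bytes, decimal petabyte (assumption)
GB = 10**9   # bytes, decimal gigabyte (assumption)


def transfer_days(num_bytes, bytes_per_second):
    """Days needed to stream num_bytes at a sustained rate."""
    return num_bytes / bytes_per_second / 86_400


if __name__ == "__main__":
    # One petabyte at 1 GBps: about 11.6 days, which the slide rounds to 12.
    print(f"{transfer_days(PB, GB):.1f} days")
```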
13 Why 4 copies
- duplex storage masks MOST failures
- But... when one is broken you are worried
- So, triplex it (a la GFS, Cosmos, Blue)
- And you need geo-plex anyway
- So, why not 2+2 rather than 3+3?
- Symmetric and simple = good.
14 Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
15 Meta-Message: Technology Ratios Matter
- Price and Performance change.
- If everything changes in the same way, then nothing really changes.
- If some things get much cheaper/faster than others, then that is real change.
- Some things are not changing much:
- Cost of people
- Speed of light
- And some things are changing a LOT
16 The Perfect Memory (ratio problems)
- Store name-value pairs
- Read value given name (or predicate?) instantly!
- Capacity has grown 2x/year (or 2x/2y)
- But ratios are changing:
- Latency lags bandwidth (Patterson, http://portal.acm.org/citation.cfm?id=1022596)
- Bandwidth lags capacity
- Pipelining (prefetch) can hide latency
- No way to fake bandwidth: you have to pay for it!
17 Find Useful Ways To Waste Space
- 1 TB disks now
- 100TB disks in 10 years? (or ...)
- Cost: $1/GB now, $10/TB in future
- Smart disks eventually (or now if you count xbox, ipod, ...)
- Petabyte: 1,400 disks now, 140 disks in 2012
- Simple math:
- 30M seconds/year,
- 1GBps = 30 PB/y
- Find creative ways to waste 99% of capacity but not use any bandwidth (ice cold data)
18 Technology Trends
- 1 TB disks now
- 100TB disks in 10 years? (or ...)
- Cost: $1/GB now, $10/TB in future
- Smart disks eventually (or now if you count xbox, ipod, ...)
- Petabyte: 1,400 disks now, 300 disks in 2010
- Simple math:
- 30M seconds/year,
- 1GBps = 30 PB/y
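The "simple math" can be checked directly. A sketch assuming a 365-day year and decimal units; `petabytes_per_year` is an illustrative name:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000; the slide rounds this to 30M


def petabytes_per_year(bytes_per_second):
    """Petabytes (10**15 bytes) moved in one year at a sustained byte rate."""
    return bytes_per_second * SECONDS_PER_YEAR / 1e15


if __name__ == "__main__":
    # 1 GBps sustained for a year: about 31.5 PB, i.e. roughly 30 PB/y.
    print(f"{petabytes_per_year(1e9):.1f} PB/year")
```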
19 Technology Trend Implication
- Find creative ways to waste 99% of capacity but not use any bandwidth (ice cold data):
- replication
- snapshots
- archive
- Pipeline-Prefetch rewards
- sequential access patterns
- very large transfers
- large = 1MB now,
- large = 100MB in future
- Dataflow programming: stream data to programs.
20 Technology Trend Implication
- Q: For an infinite disk, how long does it take to
- check disk (scrub)
- defragment
- reorganize
- backup
- A: A LONG time
- Doing all four takes 4x longer
- Nightly/weekly << 4 x Infinity
- Short-term fix:
- combine utility scans
- one pass algorithms.
- Van Ingen, Where have all the IOPS gone? MSR-TR-2005-181
21 Bandwidth: links and parallel links
- Today:
- 40 Gbps per channel (?)
- 12 channels per fiber (wdm): 500 Gbps
- 32 fibers/bundle: 16 Tbps/bundle
- In lab: 20 Tbps/fiber (400 x WDM)
- 1 Tbps = USA 1996 WAN bisection bandwidth
- Serial links are fast and can be used in parallel
1 fiber = 25 Tbps
22 Free Storage: like free puppies
- Storage is cheap ($1k/TB)
- Storage management is not: $100K/TB/year (or less); opX > 100 capX
- Goal: opX << capX
23 Trends: Moore's Law
- Performance/Price doubles every 18 months
- 100x per decade
- Progress in next 18 months = ALL previous progress
- New storage = sum of all old storage (ever)
- New processing = sum of all old processing.
- E. coli doubles every 20 minutes!
15 years ago
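Both claims on this slide follow from steady doubling. A minimal sketch, assuming growth is a clean exponential with an 18-month doubling time; the function names are illustrative:

```python
def growth_factor(years, doubling_years=1.5):
    """Total growth after `years` if the quantity doubles every `doubling_years`."""
    return 2.0 ** (years / doubling_years)


def sum_of_previous(periods):
    """Sum of the output of all earlier doubling periods (1 + 2 + 4 + ...)."""
    return sum(2 ** k for k in range(periods))


if __name__ == "__main__":
    # Ten years of 18-month doublings: about 100x.
    print(f"decade growth: {growth_factor(10):.0f}x")
    # Output of period n (2**n) versus everything produced before it (2**n - 1).
    n = 10
    print(2 ** n, sum_of_previous(n))
```

The second print shows why the next period's progress matches all previous progress: the geometric sum of earlier periods is one less than the next term.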
24 Trends: ops/s/$ Had Three Growth Phases
- 1890-1945
- Mechanical
- Relay
- 7-year doubling
- 1945-1985
- Tube, transistor,..
- 2.3 year doubling
- 1985-2010
- Microprocessor
- 1.0 year doubling
25 So: a problem
- Suppose you have a ten-year compute job on the world's fastest supercomputer. What should you do?
- ? Commit $250M now?
- ? Program for 9 years: software speedup 2^6 = 64x, Moore's law speedup 2^6 = 64x, so 4,000x speedup; spend $1M (not $250M) on hardware; runs in 2 weeks, not 10 years.
- Homework problem: What is the optimum strategy?
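The 4,000x figure combines six hardware doublings with an equal software gain. A sketch of that arithmetic only; the 64x software speedup is the slide's assumption, not something derived here, and `combined_speedup` is an illustrative name:

```python
def doublings(years, doubling_years=1.5):
    """Number of Moore's-law doublings that fit in `years`."""
    return years / doubling_years


def combined_speedup(years_waited, software_speedup, doubling_years=1.5):
    """Hardware speedup from waiting, multiplied by a given software speedup."""
    hardware_speedup = 2.0 ** doublings(years_waited, doubling_years)
    return hardware_speedup * software_speedup


if __name__ == "__main__":
    # 9 years = 6 doublings = 64x hardware; times 64x software = 4096x.
    print(doublings(9), combined_speedup(9, software_speedup=64))
```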
26 Storage Capacity Beating Moore's Law
- $500/TB today (raw disk)
- $50/TB by 2010
- 2005: shipped 350M drives (28% increase over 2004), 0.1 Zetta Byte (!)
27 Trends: Magnetic Storage Densities
- Amazing progress
- Ratios have changed
- Improvements: Capacity 60%/y, Bandwidth 40%/y, Access time 16%/y
28 Trends: Density Limits
- The end is near!
- In 2000: products at 23 Gbpsi, lab at 50 Gbpsi, limit 60 Gbpsi
- But: limit keeps rising and there are alternatives
- Today: products at 245 Gbpsi, limit at 5 Tbpsi
[Figure: Bit Density vs Time, 1990-2008, log scale in b/µm2 and Gb/in2, with labels CD, DVD, ODD, Wavelength Limit, Superparamagnetic Limit, and "? NEMS, Fluorescent? Holographic, DNA?"]
Figure adapted from Franco Vitaliano, "The NEW new media: the growing attraction of nonmagnetic storage", Data Storage, Feb 2000, pp 21-32
29 Consequence of Moore's law: Need an address bit every 18 months.
- Moore's law gives you 2x more in 18 months.
- RAM
- Today we have 1 GB to 1 TB machines (30-40 bits of addressing)
- In 9 years we will need 6 more bits: 36-46 bit addressing (64GB - 64TB ram).
- Disks
- Today we have 10 GB to 10 TB files and DBs (33-43 bit file addresses)
- In 9 years, we will need 6 more bits: 40-50 bit file addresses (1 PB files (! (?)))
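One doubling needs one more address bit, so the bit counts above are base-2 logarithms of the capacity. A sketch for the RAM case, assuming binary units (1 GB = 2^30 bytes); both function names are illustrative:

```python
def address_bits(num_bytes):
    """Smallest number of bits able to address num_bytes distinct bytes."""
    return max(num_bytes - 1, 0).bit_length()


def extra_bits(years, doubling_years=1.5):
    """Additional address bits needed after `years` of steady doubling."""
    return years / doubling_years


if __name__ == "__main__":
    print(address_bits(2**30), address_bits(2**40))  # 1 GB and 1 TB machines
    print(extra_bits(9))                             # bits added over 9 years
```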
30 Architecture could change this
- 1-level store
- System/38, AS/400 have 1-level store.
- Never re-uses an address.
- Needs 96-bit addressing today.
- NUMAs and Clusters
- Willing to buy a $100M computer?
- Then add 6 more address bits.
- Only 1-level store pushes us beyond 64 bits
- Still, these are logical addresses; 64-bit physical will last many years
31 Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
32 How much storage do we need?
- Soon everything can be recorded and indexed
- Most bytes will never be seen by humans.
- Data summarization, trend detection, and anomaly detection are key technologies
- See Mike Lesk, How much information is there: http://www.lesk.com/mlesk/ksg97/ksg.html
- See Lyman & Varian, How much information: http://www.sims.berkeley.edu/research/projects/how-much-info/
[Figure: storage scale from Kilo through Mega, Giga, Tera, Peta, Exa, Zetta to Yotta, marking A Book, A Photo, A Movie, All LoC books (words), All Books MultiMedia, and Everything Recorded]
Prefixes: 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli
33 Storage Latency: How Far Away is the Data?
Storage level        Relative latency  Analogy      Travel time
Registers            1                 My Head      1 min
On Chip Cache        2                 This Room
On Board Cache       10                This Campus  10 min
Memory               100               Olympia      1.5 hr
Disk                 10^6              Pluto        2 Years
Tape/Optical Robot   10^9              Andromeda    2,000 Years
34 Storage Hierarchy: Speed and Capacity vs Cost Tradeoffs
[Figure: two plots against Access Time (seconds), 10^-9 to 10^3. "Price vs Speed" plots $/GB and "Size vs Speed" plots Typical System (bytes); each marks Cache, Main, Secondary Online Disc, Nearline Tape, and Offline Tape.]
35 Disks Today
- Disk is 30GB to 1 TB, 10-80 MBps, 5k-15k rpm (6ms-2ms rotational latency), 10ms-3ms seek
- $/TB: $.5K ATA, $1.2K SCSI
- For shared disks, most time is spent waiting in queue for access to arm/controller
[Figure: access time breakdown into Wait, Seek, Rotate, Transfer]
36 The Street Price of a Raw disk TB: about $1K/TB
37 Standard Storage Metrics
- Capacity
- RAM: MB and $/MB: today at 4GB and $100/GB
- Disk: GB and $/GB: today at 700GB and $500/TB
- Tape: TB and $/TB: today at 400GB and $300/TB (nearline)
- Access time (latency)
- RAM: 1-100 ns
- Disk: 5-15 ms
- Tape: 30 second pick, 30 second position
- Transfer rate
- RAM: 1-10 GB/s
- Disk: 50 MB/s; arrays can go to 1GB/s
- Tape: 50 MB/s; arrays can go to 1GB/s
38 New Storage Metrics: Kaps, Maps, SCAN
- Kaps: How many kilobyte objects served per second
- The file server, transaction processing metric
- This is the OLD metric.
- Maps: How many megabyte objects served per sec
- The Multi-Media metric
- SCAN: How long to scan all the data
- the data mining and utility metric
- And
- Kaps/$, Maps/$, TBscan/$
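The three metrics can be estimated from a drive's access time, bandwidth, and capacity. A rough sketch, assuming a simple service-time model (one positioning delay plus transfer time, no queueing) and the 750 GB, 50 MBps, 4 ms seek, 2 ms rotate drive quoted elsewhere in this talk; the function names are illustrative:

```python
def objects_per_second(object_bytes, access_seconds, bytes_per_second):
    """Objects served per second: one access delay plus transfer per object."""
    return 1.0 / (access_seconds + object_bytes / bytes_per_second)


def scan_hours(capacity_bytes, bytes_per_second):
    """Hours to read the whole device sequentially."""
    return capacity_bytes / bytes_per_second / 3600.0


if __name__ == "__main__":
    access = 0.004 + 0.002   # seek + rotate, seconds (assumed simple sum)
    bandwidth = 50e6         # bytes/s
    capacity = 750e9         # bytes
    print(f"Kaps: {objects_per_second(1e3, access, bandwidth):.0f}")
    print(f"Maps: {objects_per_second(1e6, access, bandwidth):.0f}")
    print(f"SCAN: {scan_hours(capacity, bandwidth):.1f} hours")
```

Under this model Kaps is dominated by arm positioning, Maps by a mix of positioning and transfer, and SCAN purely by bandwidth, which is why the three metrics age so differently.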
39 For the Record (good 2002 devices packaged in system: http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf)
Tape slice is 8Tb with 1 LTO reader at 50MBps per 100 tapes.
40 For the Record (good 2002 devices packaged in system: http://www.tpc.org/results/individual_results/Compaq/compaq.5500.99050701.es.pdf)
Tape is 1Tb with 4 DLT readers at 5MBps each.
41 Disk Changes
- Disks got cheaper: $20k -> $200
- $/Kaps etc. improved 100x (Moore's law!) (or even 500x)
- One-time event (went from mainframe prices to PC prices)
- Disks got cooler (50x per decade)
- 1990: 1 Kaps per 20 MB (1GB disk)
- 2006: 1 Kaps per 10,000 MB (.75TB disk)
- Disk scans take longer (10x per decade)
- 1990 disk: 1GB and 50 Kaps and 5 minute scan
- 2006 disk: 750GB and 150 Kaps and 5 hour scan
- So... Backup/restore takes a long time (too long)
42 Storage Ratios Changed
- 10x better access time
- 10x more bandwidth
- 100x more capacity
- Data 25x cooler (1 Kaps/20MB vs 1 Kaps/GB)
- 4,000x lower media price
- 20x to 100x lower disk price
- Scan takes 10x longer (3 min vs 1hr)
- RAM/disk media price ratio changed
- 1970-1990: 100:1
- 1990-1995: 10:1
- 1995-1997: 50:1
- 2006: $0.5/GB disk, $100/GB RAM: 200:1
43 More Kaps and Kaps/$
- Disk accesses got much less expensive: better disks, cheaper disks!
- But disk arms are expensive: the scarce resource
- 5 hour scan vs 5 minutes in 1990
Assumptions: 15krpm; Dell TPC-C pricing for SCSI disks, cabinets, and controllers, depreciated over 3 years.
44 Data on Disk Can Move to RAM in 10 years
100:1
10 years
45 The Absurd Disk Has Arrived
- 2.5 hr scan time (poor sequential access)
- 1 Kaps / 10 GB (VERY cold data)
- It's a tape!
The disk: 1 TB, 100 MB/s, 100 Kaps
46 FLASH: The Gap Filler?
- Flash chips are 4GB today; cards 64GB.
- $20/GB
- 1/5 RAM price
- 20x disk price, but 20x better Kaps
- Predicted to double each year to Tbit
- doubled each year since 1997
- Will eat disk market from below
- cameras, ipods, then laptops, then ...
- similar to cost/page or cost/first-page in printers
- Block-oriented read-write (2KB)
- 20MB/s per chip
- read 16 chips in parallel (64KB page, 320MB/s)
- 125 µs latency on read (25 fixed, 100 transfer)
- Write has 2ms latency (clear the page)
- Pages can only be written 1M times (approximately).
Year  Chip Gbit  Package GB
2006         16           4
2007         32           8
2008         64          16
2009        128          32
2010        256          64
2011        512         128
2012       1024         256
$80 package
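The 125 µs read latency and the 320 MB/s aggregate rate both fall out of the per-chip numbers. A small sketch, assuming a 2KB block means 2,000 bytes and that chips read fully in parallel; the function names are illustrative:

```python
def read_latency_us(block_bytes, fixed_us=25.0, chip_bytes_per_s=20e6):
    """Read latency in microseconds: fixed setup plus transfer from one chip."""
    return fixed_us + block_bytes / chip_bytes_per_s * 1e6


def parallel_bandwidth_mbps(chips, chip_mbps=20.0):
    """Aggregate bandwidth when `chips` chips are read in parallel."""
    return chips * chip_mbps


if __name__ == "__main__":
    print(f"{read_latency_us(2_000):.0f} us per 2KB block")
    print(f"{parallel_bandwidth_mbps(16):.0f} MB/s across 16 chips")
```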
47 Flash CERTAINLY Represents an Opportunity To Rethink
- A Non-Volatile disk buffer (inside drive?)
- Low latency (100us) cache near cpu
- WAL Cache for Databases
- Quick restart
- FLASH is a block-oriented device: it likes sequential read/write; it likes big (64KB) reads/writes
A Design for High-Performance Flash Disks, Andrew Birrell, Michael Isard, Chuck Thacker, Ted Wobber, December 2005, MSR-TR-2005-176
48 Disk vs Tape
- Tape
- 400 GB ($80/cartridge)
- 40 MBps
- 10 sec pick time
- 30-120 second seek time
- $200/TB for media, $800/TB for drive+library
- 1 week scan
- Disk
- 750 GB
- 50 MBps
- 4 ms seek time
- 2 ms rotate latency
- $0.5/GB for drive, $0.5/GB for ctlrs/cabinet
- 3.6 PB/rack
- 5 hour scan
Guesstimates: CERN: 200 TB; 3480 tapes; 2 col = 50GB; Rack = 1 TB = 1.25 drives
The price advantage of tape is gone, and the performance advantage of disk is growing. At $1K/TB, disk is competitive with nearline tape.
49 Auto Manage Storage
- 1980 rule of thumb:
- A DataAdmin per 10GB, SysAdmin per mips
- 2006 rule of thumb:
- A DataAdmin per 50TB (WITH GOOD TOOLS)
- A DataAdmin per ½ TB with crappy tools!
- SysAdmin per 100 clones (varies with app).
- Problem:
- 5TB is >$5k today, $500 in a few years.
- Admin cost >> storage cost !!!!
- Challenge:
- Automate ALL storage admin tasks
50 How to cool disk data
- Cache data in main memory
- See 30 minute rule later in presentation
- Fewer, larger transfers
- Larger pages (512B -> 8KB -> 256KB)
- Sequential rather than random access
- Random 8KB IO is 1 MBps
- Sequential IO is 60 MBps (60:1 ratio is growing)
- Raid1 (mirroring) rather than Raid5 (parity).
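The 1 MBps random figure is page size times accesses per second. A sketch assuming about 120 random accesses per second per arm (the rate used later in the five-minute-rule numbers) and 8KB = 8,192 bytes; `random_mbps` is an illustrative name:

```python
def random_mbps(page_bytes, accesses_per_second):
    """Throughput in MB/s when every page costs one random access."""
    return page_bytes * accesses_per_second / 1e6


if __name__ == "__main__":
    rand = random_mbps(8 * 1024, 120)  # 8KB pages, ~120 accesses/s (assumed)
    seq = 60.0                         # MB/s sequential, from the slide
    print(f"random: {rand:.2f} MB/s, sequential/random ratio: {seq / rand:.0f}:1")
```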
51 Stripes, Mirrors, Parity (RAID 0, 1, 5)
- RAID 0: Stripes
- bandwidth
- RAID 1: Mirrors, Shadows, ...
- Fault tolerance
- Reads faster, writes 2x slower
- RAID 5: Parity
- Fault tolerance
- Reads faster
- Writes 4x or 6x slower.
[Figure: block layout per disk. RAID 0: (0,3,6,..), (1,4,7,..), (2,5,8,..). RAID 1: (0,1,2,..), (0,1,2,..). RAID 5: (0,2,P2,..), (1,P1,4,..), (P0,3,5,..).]
52 RAID 10 (stripes of mirrors) Wins: wastes space, saves arms
- RAID 5 (6 disks, 1 vol)
- Performance
- 675 reads/sec
- 210 writes/sec
- Write
- 4 logical IO,
- 2 seek + 1.7 rotate
- SAVES SPACE
- Performance degrades on failure
- RAID 1 (6 disks, 3 pairs)
- Performance
- 750 reads/sec
- 300 writes/sec
- Write
- 2 logical IO
- 2 seek + 0.7 rotate
- SAVES ARMS
- Performance improves on failure
53 Best Index Page Size: > 64KB
Small pages have few entries, so little benefit; big pages waste RAM and bandwidth.
Best near 100KB
54 Summarizing storage rules of thumb (1)
- Moore's law: 4x every 3 years, 100x more per decade
- Ratios change!!!
- Implies 2 bits of addressing every 3 years.
- Storage capacities increase 100x/decade
- Storage costs drop 100x per decade
- Storage throughput increases 10x/decade
- Data cools 10x/decade
- Disk page sizes increase 5x per decade.
55 Summarizing storage rules of thumb (2)
- RAM:Disk and Disk:Tape cost ratios are 100:1 and 1:1
- Prices decline 100x per decade, so, in 10 years, disk data can move to RAM.
- A person should be able to administer a million dollars of storage: that is 1PB today
- Disks are replacing tapes as backup devices. You can't backup/restore a Petabyte quickly, so geoplex it.
- Mirroring rather than Parity to save disk arms
56 Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
57 Standard Architecture (today)
58 Amdahl's Balance Laws
- parallelism law: If a computation has a serial part S and a parallel component P, then the maximum speedup is (S+P)/S.
- balanced system law: A system needs a bit of IO per second per instruction per second: about 8 MIPS per MBps.
- memory law: α = 1: the MB/MIPS ratio (called alpha (α)) in a balanced system is 1.
- IO law: Programs do one IO per 50,000 instructions.
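The parallelism law is easy to check numerically. A minimal sketch: `max_speedup` is the (S+P)/S limit stated above, and `speedup` is the standard finite-worker form of the same law (added here for illustration, not taken from the slide):

```python
def max_speedup(serial, parallel):
    """Upper bound on speedup when the parallel part takes zero time."""
    return (serial + parallel) / serial


def speedup(serial, parallel, workers):
    """Speedup with a finite number of workers sharing the parallel part."""
    return (serial + parallel) / (serial + parallel / workers)


if __name__ == "__main__":
    # A job that is 10% serial can never run more than 10x faster.
    print(max_speedup(1, 9), speedup(1, 9, 9))
```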
59 Amdahl's Laws Valid 40 Years Later?
- Parallelism law is algebra, so SURE!
- Balanced system laws?
- Look at tpc results (tpcC, tpcH) at http://www.tpc.org/
- Some imagination needed:
- What's an instruction (CPI varies from 1-3)?
- RISC, CISC, VLIW, ... clocks per instruction, ...
- What's an I/O?
60 TPC systems: Disk/CPU and I/B
- Normalize for CPI (clocks per instruction)
- TPC-C has about 14 ins/byte of IO
- TPC-H has 1 ins/byte of IO
61 TPC systems: What's alpha (MB/MIPS)?
- Hard to say:
- Intel: 32 bit addressing (4GB limit). Known CPI.
- IBM, HP, Sun have 64 GB limit. Unknown CPI.
- Look at both, guess CPI for IBM, HP, Sun
- Alpha is between 4 and 16
System                Mips    Memory  Alpha  Disks/cpu
Amdahl                1       1       1      1
tpcC Intel 4x3Ghz     6Gips   24GB    4      25..100
tpcH Intel 4x2.4Ghz   10Gips  64GB    16     10..40
62 Instructions per IO?
- We know 8 mips per MBps of IO
- So, 8KB page is 64 K instructions
- And 64KB page is 512 K instructions.
- But, sequential has fewer instructions/byte (3 vs 7 in tpcH vs tpcC).
- So, 64KB page is 200 K instructions.
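These page budgets are instructions per byte times page size. A sketch assuming 8 MIPS per MBps means 8 instructions per byte and K = 1,024; `instructions_per_io` is an illustrative name:

```python
KB = 1024


def instructions_per_io(page_bytes, instructions_per_byte):
    """Instruction budget per IO for a given page size and instruction density."""
    return page_bytes * instructions_per_byte


if __name__ == "__main__":
    print(instructions_per_io(8 * KB, 8) // KB, "K for an 8KB page")
    print(instructions_per_io(64 * KB, 8) // KB, "K for a 64KB page")
    # Sequential work at about 3 instructions/byte: 192 K, roughly 200 K.
    print(instructions_per_io(64 * KB, 3) // KB, "K for a sequential 64KB page")
```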
63 Amdahl's Balance Laws Revised
- Laws right, just need interpretation (imagination?)
- Balanced System Law: A system needs 8 MIPS/MBps of IO, but instruction rate must be measured on the workload.
- Sequential workloads have low CPI (clocks per instruction),
- random workloads tend to have higher CPI.
- Alpha (the MB/MIPS ratio) is rising from 1 to 16. This trend will likely continue.
- One random IO per 50k instructions.
- Sequential IOs are larger: one sequential IO per 200k instructions
64 PAP vs RAP (a 2006 perspective)
- Peak Advertised Performance vs Real Application
Performance
65 Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
66 Standard IO (Infiniband) next Year?
- Probably
- Replace PCI with something better; will still need a mezzanine bus standard
- Multiple serial links directly from processor
- Fast (10 GBps/link) for a few meters
- System Area Networks (SANs) ubiquitous (VIA morphs to Infiniband?)
(i.e. 2001)
In 2006: Infiniband got marginalized by 10Gbps Ethernet. It has low latency, but that is a niche. PCI-Express came along.
67 Ubiquitous 10 GBps SANs in 5 years
- 1Gbps Ethernet is reality now.
- Also FiberChannel, MyriNet, GigaNet, ServerNet, ATM, ...
- 10 Gbps x4 WDM deployed now (OC192)
- 3 Tbps WDM working in lab
- In 5 years, expect 10x, wow!!
[Figure: link speeds labeled 5 MBps, 20 MBps, 40 MBps, 80 MBps, 120 MBps (1Gbps), and 1 GBps]
68 Networking
- WANs are getting faster than LANs: G8 = OC192 = 9Gbps is standard
- Link bandwidth improves 4x per 3 years
- Speed of light (60 ms round trip in US)
- Software stacks have always been the problem.
Time = SenderCPU + ReceiverCPU + bytes/bandwidth
This has been the problem for small (10KB or less) messages
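The message-time model shows why small messages are dominated by software. A sketch in which the 100 µs per-side CPU cost is a purely hypothetical placeholder and the link is 1 Gbps; the function name is illustrative:

```python
def message_time(num_bytes, sender_cpu_s, receiver_cpu_s, bytes_per_second):
    """Time = SenderCPU + ReceiverCPU + bytes/bandwidth, all in seconds."""
    return sender_cpu_s + receiver_cpu_s + num_bytes / bytes_per_second


if __name__ == "__main__":
    cpu = 100e-6   # hypothetical per-side software cost, not from the talk
    link = 125e6   # 1 Gbps expressed in bytes/s
    for size in (1_000, 10_000, 1_000_000):
        total = message_time(size, cpu, cpu, link)
        print(f"{size:>9} bytes: {total * 1e6:8.0f} us, "
              f"software share {2 * cpu / total:.0%}")
```

With these placeholder costs the software share falls only once messages get well past 10KB, which matches the slide's remark about small messages.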
69 The Promise of SAN/VIA: 10x in 2 years (http://www.ViArch.org/)
- Yesterday:
- 10 MBps (100 Mbps Ethernet)
- 20 MBps tcp/ip saturates 2 cpus
- round-trip latency 250 µs
- Now:
- Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, ...
- Fast user-level communication
- tcp/ip 100 MBps, 10% cpu
- round-trip latency is 15 us
- 1.6 Gbps demoed on a WAN
70 The Network Revolution
- Networking folks are finally streamlining LAN case (SAN).
- Offloading protocol to NIC
- ½ power point is 8KB
- Min round trip latency is 50 µs.
- 3k ins + .1 ins/byte
- High-Performance Distributed Objects over a System Area Network, Li, L.; Forin, A.; Hunt, G.; Wang, Y., MSR-TR-98-68
71 How much does wire-time cost? $/Mbyte?
Link                Cost    Time
Gbps Ethernet       .2µ$    10 ms
100 Mbps Ethernet   .3µ$    100 ms
OC12 (650 Mbps)     $.003   20 ms
DSL                 $.0006  25 sec
POTs                $.002   200 sec
Wireless            $.80    500 sec
72 Data delivery costs $1/GB today
- Rent for big customers: $30/megabit per second per month
- Improved 3x in last 6 years (!).
- That translates to $0.1/GB at each end.
- Overhead (routers, people, ...) makes it $1/GB at each end.
- You can mail a 750 GB disk for $20.
- That's 30x .. 3x cheaper
- If overnight, it's 7 MBps.
- 7 disks: 50 MBps (1/4 Gbps)
- TeraScale SneakerNet
7 x 750 GB = 5 TB
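The per-GB figures can be reproduced from the rental and postage prices. A sketch assuming a 30-day month, a fully used link, and 24-hour delivery; the function names are illustrative, and the results land near the slide's rounded numbers rather than exactly on them:

```python
SECONDS_PER_MONTH = 30 * 24 * 3600


def rented_dollars_per_gb(dollars_per_mbps_month):
    """Cost per GB of a fully used link rented by the Mbps-month."""
    gb_per_month = 1e6 / 8 * SECONDS_PER_MONTH / 1e9  # GB moved by 1 Mbps
    return dollars_per_mbps_month / gb_per_month


def mailed_dollars_per_gb(postage_dollars, disk_gb):
    """Cost per GB of mailing a full disk."""
    return postage_dollars / disk_gb


def overnight_mbps(disk_gb, hours=24):
    """Effective MB/s of a disk that arrives after `hours`."""
    return disk_gb * 1e9 / (hours * 3600) / 1e6


if __name__ == "__main__":
    print(f"rented link: ${rented_dollars_per_gb(30):.3f}/GB")
    print(f"mailed disk: ${mailed_dollars_per_gb(20, 750):.3f}/GB")
    print(f"overnight disk: {overnight_mbps(750):.1f} MB/s")
```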
73 Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
74 The Five Minute Rule
- Trade DRAM for Disk Accesses
- Cost of an access: Drive_Cost / Accesses_per_second
- Cost of a DRAM page: $/MB / pages_per_MB
- Break even has two terms:
- Technology term and an Economic term
- Grew page size to compensate for changing ratios.
- Now at 5 minutes for random, 10 seconds sequential
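75 The 5 Minute Rule Derived
T = TimeBetweenReferences to Page

$$\text{DiskAccessCost}/T = \frac{\text{DiskPrice}}{\text{AccessesPerSecond}} \cdot \frac{1}{T}$$

$$\text{CostOfRAMPage} = \frac{\text{RAM\_\$\_Per\_MB}}{\text{PagesPerMB}}$$

- Breakeven:

$$\frac{\text{RAM\_\$\_Per\_MB}}{\text{PagesPerMB}} = \frac{\text{DiskPrice}}{T \times \text{AccessesPerSecond}}$$

$$T = \frac{\text{DiskPrice} \times \text{PagesPerMB}}{\text{RAM\_\$\_Per\_MB} \times \text{AccessesPerSecond}}$$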
76 Plugging in the Numbers
            PPM/aps       $disk/$RAM         Break Even
Random      128/120 ≈ 1   200/0.1 ≈ 2,000    28 minutes
Sequential  1/60 ≈ .01    ≈ 2,000            30 seconds
- Trend is longer times because disk $ is not changing much while RAM $ is declining 100x/decade
30 Minutes & 30 second rule
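The break-even interval is the product of the technology ratio (pages per MB over accesses per second) and the economic ratio (disk price over RAM price per MB). A sketch using the inputs in the table; `break_even_seconds` is an illustrative name, and the unrounded results come out near, not exactly at, the slide's rounded 28 minutes and 30 seconds:

```python
def break_even_seconds(pages_per_mb, accesses_per_second,
                       disk_price, ram_price_per_mb):
    """T = (PagesPerMB / AccessesPerSecond) * (DiskPrice / RAM_$_Per_MB)."""
    technology = pages_per_mb / accesses_per_second
    economics = disk_price / ram_price_per_mb
    return technology * economics


if __name__ == "__main__":
    random_t = break_even_seconds(128, 120, 200, 0.1)    # 8KB pages, random IO
    sequential_t = break_even_seconds(1, 60, 200, 0.1)   # 1MB pages, sequential
    print(f"random: {random_t / 60:.0f} minutes")
    print(f"sequential: {sequential_t:.0f} seconds")
```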
77 When to Cache Web Pages.
- Caching saves user time
- Caching saves wire time
- Caching costs storage
- Caching only works sometimes
- New pages are a miss
- Stale pages are a miss
78 Web Page Caching Saves People Time
- Assume people cost $20/hour (or $.2/hr ???)
- Assume 20% hit in browser, 40% in proxy
- Assume 3 second server time
- Caching saves people time: $28/year to $150/year of people time, or 28 cents to $1.5/year.
79 Web Page Caching Saves Resources
- Wire cost is a penny (wireless) to 100µ$ (LAN)
- Storage is 8 µ$/mo
- Breakeven: wire cost = storage rent: 18 months to 300 years
- Add people cost: breakeven > 15 years. Cheap people ($.2/hr): > 3 years.
80 Caching
- Disk caching
- 30 minute rule for random IO
- 30 second rule for sequential IO
- Web page caching
- If page will be re-referenced in 18 months with free users, or 15 years with valuable users, then cache the page in the client/proxy.
- Challenge: guessing which pages will be re-referenced; detecting stale pages (page velocity)
81 Meta-Message: Technology Ratios Matter
- Price and Performance change.
- If everything changes in the same way, then nothing really changes.
- If some things get much cheaper/faster than others, then that is real change.
- Some things are not changing much:
- Cost of people
- Speed of light
- And some things are changing a LOT
82 Outline
- Moore's Law and consequences
- Storage rules of thumb
- Balanced systems rules revisited
- Networking rules of thumb
- Caching rules of thumb
83 What's New / Surprising
- Not a big surprise, just amazing!
- exponential growth in capacity
- latency lags bandwidth lags capacity
- 5 minute rule is 30 minute rule
- FLASH is coming
- low end storage (GBs now, 100 GBs soon)
- low latency storage (fraction of ms)
- high $/byte but good $/access
- Smart Disks still seem far off, but...