CS252 Graduate Computer Architecture Lecture 22: I/O Introduction

Transcript and Presenter's Notes

1
CS252 Graduate Computer Architecture
Lecture 22: I/O Introduction
  • November 16th, 2003
  • Prof. John Kubiatowicz
  • http://www.cs.berkeley.edu/~kubitron/courses/cs252-F03

2
Review: Error Correction
  • Motivation:
  • DRAM is dense ⇒ signals are easily disturbed
  • High capacity ⇒ higher probability of failure
  • Approach: Redundancy
  • Add extra information so that we can recover from errors
  • Can we do better than just creating complete copies?
  • Block Codes: data coded in blocks
  • k data bits coded into n encoded bits
  • Measure of overhead: rate of the code = k/n
  • Often called an (n,k) code
  • Consider data as vectors in GF(2), i.e. vectors of bits
  • Code space is the set of all 2^n vectors; data space is the set of 2^k vectors
  • Encoding function: C = f(d)
  • Decoding function: d = f(C)
  • Not all possible code vectors, C, are valid!
  • Systematic codes: the original data appears within the coded data

3
General Idea: Code Vector Space
  • Not every vector in the code space is valid
  • Hamming Distance (d):
  • Minimum number of bit flips to turn one code word into another
  • Number of errors that we can detect: (d-1)
  • Number of errors that we can fix: ½(d-1) (see the sketch below)
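
A minimal sketch (not from the slides) of how the minimum Hamming distance d of a code sets the detect/correct bounds above; hamming_distance and code_bounds are hypothetical helper names, and the 3-bit repetition code is only an illustration.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two code words differ."""
    return bin(a ^ b).count("1")

def code_bounds(codewords):
    """Minimum distance d of a code, and how many errors it can detect/correct."""
    d = min(hamming_distance(x, y)
            for i, x in enumerate(codewords)
            for y in codewords[i + 1:])
    return d, d - 1, (d - 1) // 2   # (d, detectable, correctable)

# Example: the 3-bit repetition code {000, 111} has d = 3,
# so it detects up to 2 bit errors and corrects 1.
print(code_bounds([0b000, 0b111]))   # (3, 2, 1)
```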

4
Review: Code Types
  • Linear Codes: code is generated by G and lies in the null space of H
  • Hamming Codes: design the H matrix
  • d = 3 ⇒ columns nonzero, distinct
  • d = 4 ⇒ columns nonzero, distinct, odd-weight
  • Reed-Solomon codes:
  • Based on polynomials in GF(2^k) (i.e. k-bit symbols)
  • Data as coefficients, code space as values of the polynomial:
  • P(x) = a0 + a1x + ... + a(k-1)x^(k-1)
  • Coded: P(0), P(1), P(2), ..., P(n-1)
  • Can recover the polynomial as long as we get any k of the n values
  • Alternatively, as long as no more than n-k coded symbols are erased, we can recover the data (sketched below)
  • Side note: multiplication by a constant a in GF(2^k) can be represented by a k×k matrix: a·x
  • Decompose the unknown vector into k bits: x = x0 + 2x1 + ... + 2^(k-1)·x(k-1)
  • Each column is the result of multiplying a by 2^i
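
A minimal sketch (not from the slides) of the erasure-coding idea behind Reed-Solomon: treat the k data symbols as polynomial coefficients, publish n evaluations, and recover the data from any k surviving points. For readability it works in the prime field GF(257) rather than GF(2^k); encode and decode are hypothetical helper names.

```python
MOD = 257  # a prime modulus, so every nonzero element has a multiplicative inverse

def encode(data, n):
    """data = [a0, ..., a_(k-1)]; return the polynomial's values at x = 0..n-1, mod MOD."""
    return [sum(a * pow(x, i, MOD) for i, a in enumerate(data)) % MOD for x in range(n)]

def decode(points, k):
    """Recover the k coefficients from any k (x, y) pairs via Lagrange interpolation."""
    xs, ys = zip(*points[:k])
    coeffs = [0] * k
    for j in range(k):
        basis, denom = [1], 1            # basis: coefficients of prod_{m != j} (x - x_m)
        for m in range(k):
            if m == j:
                continue
            new = [0] * (len(basis) + 1)
            for i, c in enumerate(basis):
                new[i] = (new[i] - xs[m] * c) % MOD    # constant-term contribution
                new[i + 1] = (new[i + 1] + c) % MOD    # x * c contribution
            basis = new
            denom = denom * (xs[j] - xs[m]) % MOD
        scale = ys[j] * pow(denom, -1, MOD) % MOD      # y_j / prod(x_j - x_m)
        for i, c in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * c) % MOD
    return coeffs

data = [42, 7, 200]                            # k = 3 data symbols
code = encode(data, n=6)                       # n = 6 coded symbols: tolerates n-k = 3 erasures
survivors = [(x, code[x]) for x in (1, 4, 5)]  # any k = 3 of the n values suffice
print(decode(survivors, k=3))                  # [42, 7, 200]
```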

5
Motivation: Who Cares About I/O?
  • CPU Performance: 60% per year
  • I/O system performance limited by mechanical delays (disk I/O)
  • < 10% per year (I/O per sec or MB per sec)
  • Amdahl's Law: system speed-up is limited by the slowest part!
  • 10% I/O & 10x CPU ⇒ 5x Performance (lose 50%)
  • 10% I/O & 100x CPU ⇒ 10x Performance (lose 90%)  (checked in the sketch below)
  • I/O bottleneck:
  • Diminishing fraction of time in CPU
  • Diminishing value of faster CPUs
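
A minimal sketch (not from the slides) checking the Amdahl's Law numbers above; amdahl_speedup is a hypothetical helper, and the slide's 5x/10x figures are rounded.

```python
def amdahl_speedup(io_fraction: float, cpu_speedup: float) -> float:
    """Overall speedup when only the non-I/O fraction of the work is accelerated."""
    return 1.0 / (io_fraction + (1.0 - io_fraction) / cpu_speedup)

print(amdahl_speedup(0.10, 10))    # ~5.3x  (the slide's "5x")
print(amdahl_speedup(0.10, 100))   # ~9.2x  (the slide's "10x")
```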

6
I/O Systems
[Diagram: Processor (with cache) connected over a Memory-I/O bus to main memory and several I/O controllers serving disks, graphics, and a network; devices signal the processor via interrupts]
7
Technology Trends
Disk capacity now doubles every 18 months; before 1990 it doubled every 36 months.
Today: processing power doubles every 18 months.
Today: memory size doubles every 18 months (4X/3yr).
Today: disk capacity doubles every 18 months.
Disk positioning rate (seek + rotate) doubles every ten years!
The I/O GAP
8
Storage Technology Drivers
  • Driven by the prevailing computing paradigm
  • 1950s: migration from batch to on-line processing
  • 1990s: migration to ubiquitous computing
  • computers in phones, books, cars, video cameras,
  • nationwide fiber optical network with wireless tails
  • Effects on storage industry:
  • Embedded storage
  • smaller, cheaper, more reliable, lower power
  • Data utilities
  • high capacity, hierarchically managed storage

9
Historical Perspective
  • 1956 IBM RAMAC; early 1970s Winchester
  • Developed for mainframe computers, proprietary interfaces
  • Steady shrink in form factor: 27 in. to 14 in.
  • 1970s developments
  • 5.25 inch floppy disk formfactor (microcode into mainframe)
  • Early emergence of industry-standard disk interfaces
  • ST506, SASI, SMD, ESDI
  • Early 1980s
  • PCs and first-generation workstations
  • Mid 1980s
  • Client/server computing
  • Centralized storage on file server
  • accelerates disk downsizing: 8 inch to 5.25 inch
  • Mass market disk drives become a reality
  • industry standards: SCSI, IPI, IDE
  • 5.25 inch drives for standalone PCs; end of proprietary interfaces

10
Disk History
Data density (Mbit/sq. in.) and capacity of the unit shown (MBytes):
1973: 1.7 Mbit/sq. in., 140 MBytes
1979: 7.7 Mbit/sq. in., 2,300 MBytes
source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"
11
Historical Perspective
  • Late 1980s / Early 1990s
  • Laptops, notebooks, (palmtops)
  • 3.5 inch, 2.5 inch, (1.8 inch formfactors)
  • Formfactor plus capacity drives the market, not so much performance
  • Recently: bandwidth improving at 40% per year
  • Challenged by DRAM, flash RAM in PCMCIA cards
  • still expensive, Intel promises but doesn't deliver
  • unattractive MBytes per cubic inch
  • Optical disk fails on performance (e.g., NeXT) but finds a niche (CD ROM)

12
Disk History
1989: 63 Mbit/sq. in., 60,000 MBytes
1997: 1,450 Mbit/sq. in., 2,300 MBytes
1997: 3,090 Mbit/sq. in., 8,100 MBytes
source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"
13
MBits per square inch: DRAM as % of Disk over time
9 v. 22 Mb/si
470 v. 3000 Mb/si
0.2 v. 1.7 Mb/si
source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"
14
Disk Performance Model /Trends
  • Capacity
  • 100% per year (2X / 1.0 yrs)
  • Transfer rate (BW)
  • 40% per year (2X / 2.0 yrs)
  • Rotation + Seek time
  • 8% per year (1/2 in 10 yrs)
  • MB/$
  • > 100% per year (2X / <1.5 yrs)
  • Fewer chips + areal density

15
Photo of Disk Head, Arm, Actuator
[Photo labels: Spindle, Arm, Head, Actuator]
16
Nano-layered Disk Heads
  • Special sensitivity of the disk head comes from the Giant Magneto-Resistive (GMR) effect
  • IBM is the leader in this technology
  • Same technology as the TMJ-RAM breakthrough described in an earlier class.

Coil for writing
17
Disk Device Terminology
  • Several platters, with information recorded magnetically on both surfaces (usually)
  • Bits recorded in tracks, which in turn are divided into sectors (e.g., 512 Bytes)
  • Actuator moves the head (at the end of the arm, one per surface) over the track (seek), selects the surface, waits for the sector to rotate under the head, then reads or writes
  • Cylinder: all tracks under the heads

18
Disk Performance Example
Disk Latency = Queuing Time + Seek Time + Rotation Time + Xfer Time + Ctrl Time
Order-of-magnitude times for 4K byte transfers:
Seek: 12 ms or less
Rotate: 4.2 ms @ 7200 RPM = 0.5 rev / (7200 RPM / (60 s/min))   (8.3 ms @ 3600 RPM)
Xfer: 1 ms @ 7200 RPM (2 ms @ 3600 RPM)
Ctrl: 2 ms (big variation)
Disk Latency = Queuing Time + (12 + 4.2 + 1 + 2) ms = QT + 19.2 ms
Average Service Time = 19.2 ms
19
Disk Time Example
  • Disk Parameters:
  • Transfer size is 8K bytes
  • Advertised average seek is 12 ms
  • Disk spins at 7200 RPM
  • Transfer rate is 4 MB/sec
  • Controller overhead is 2 ms
  • Assume that the disk is idle, so no queuing delay
  • What is the Average Disk Access Time for a Sector? (worked out in the sketch below)
  • Ave seek + ave rot delay + transfer time + controller overhead
  • 12 ms + 0.5/(7200 RPM/60) + 8 KB/(4 MB/s) + 2 ms
  • 12 + 4.15 + 2 + 2 ≈ 20 ms
  • Advertised seek time assumes no locality: in practice typically 1/4 to 1/3 of the advertised seek time, so 20 ms ⇒ 12 ms
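
A minimal sketch (not from the slides) reproducing the access-time arithmetic above; disk_access_time_ms is a hypothetical helper, and 1 MB is treated as 1000 KB so the transfer term matches the slide's 2 ms.

```python
def disk_access_time_ms(seek_ms, rpm, transfer_kb, rate_mb_s, ctrl_ms):
    rotation_ms = 0.5 / (rpm / 60.0) * 1000.0   # half a revolution, on average
    transfer_ms = transfer_kb / rate_mb_s       # KB / (MB/s) = ms, with 1 MB = 1000 KB
    return seek_ms + rotation_ms + transfer_ms + ctrl_ms

# 12 ms seek + ~4.17 ms rotation + 2 ms transfer + 2 ms controller ≈ 20 ms
print(disk_access_time_ms(seek_ms=12, rpm=7200, transfer_kb=8, rate_mb_s=4, ctrl_ms=2))
```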

20
State of the Art Ultrastar 72ZX
  • 73.4 GB, 3.5 inch disk
  • 2¢/MB
  • 10,000 RPM; 3 ms = 1/2 rotation
  • 11 platters, 22 surfaces
  • 15,110 cylinders
  • 7 Gbit/sq. in. areal density
  • 17 watts (idle)
  • 0.1 ms controller time
  • 5.3 ms avg. seek
  • 50 to 29 MB/s (internal)

[Diagram labels: Track, Sector, Cylinder, Track Buffer, Platter, Arm, Head]
source: www.ibm.com; www.pricewatch.com, 2/14/00
21
CS 252 Administrivia
  • Upcoming schedule of project events in CS 252
  • People who have not talked to me about their projects should do so soon!
  • Oral Presentations: Tue/Wed 12/9-10
  • Final projects due: Friday 12/12?
  • Final topics:
  • Some queuing theory
  • Multiprocessing
  • Final Midterm: 12/1 (Monday after Thanksgiving)
  • Alternative: 12/3?

22
Alternative Data Storage Technologies: Early 1990s

Technology            Cap (MB)    BPI     TPI   BPI×TPI (M)  Xfer (KB/s)  Access Time
Conventional Tape:
  Cartridge (0.25")      150     12000    104       1.2           92      minutes
  IBM 3490 (0.5")        800     22860     38       0.9         3000      seconds
Helical Scan Tape:
  Video (8mm)           4600     43200   1638      71            492      45 secs
  DAT (4mm)             1300     61000   1870     114            183      20 secs
Magnetic & Optical Disk:
  Hard Disk (5.25")     1200     33528   1880      63           3000      18 ms
  IBM 3390 (10.5")      3800     27940   2235      62           4250      20 ms
  Sony MO (5.25")        640     24130  18796     454             88      100 ms

23
Tape vs. Disk
  • Longitudinal tape uses the same technology as hard disk; it tracks disk's density improvements
  • Disk head flies above the surface; tape head lies on the surface
  • Disk is fixed; tape is removable
  • Inherent cost-performance based on geometries:
  • fixed rotating platters with gaps
  • (random access, limited area, 1 medium / reader)
  • vs.
  • removable long strips wound on a spool
  • (sequential access, "unlimited" length, multiple media / reader)
  • New technology trend:
  • Helical Scan (VCR, Camcorder, DAT)
  • Spins head at an angle to the tape to improve density

24
Current Drawbacks to Tape
  • Tape wear out:
  • Helical: 100s of passes; up to 1000s for longitudinal
  • Head wear out:
  • 2000 hours for helical
  • Both must be accounted for in the economic / reliability model
  • Long rewind, eject, load, spin-up times are not inherent; there has just been no need in the marketplace (so far)
  • Designed for archival use

25
Automated Cartridge System
STC 4400 (8 feet × 10 feet)
  • 6000 x 0.8 GB 3490 tapes = 5 TBytes in 1992; $500,000 O.E.M. price
  • 6000 x 10 GB D3 tapes = 60 TBytes in 1998
  • Library of Congress: all the information in the world in 1992, ASCII of all books = 30 TB

26
Relative Cost of Storage Technology: Late 1995 / Early 1996
  • Magnetic Disks
  • 5.25"   9.1 GB    $2129 ($0.23/MB)         $1985 ($0.22/MB)
  • 3.5"    4.3 GB    $1199 ($0.27/MB)         $999 ($0.23/MB)
  • 2.5"    514 MB    $299 ($0.58/MB)          1.1 GB  $345 ($0.33/MB)
  • Optical Disks
  • 5.25"   4.6 GB    $1695 + $199 ($0.41/MB)  $1499 + $189 ($0.39/MB)
  • PCMCIA Cards
  • Static RAM   4.0 MB    $700  ($175/MB)
  • Flash RAM    40.0 MB   $1300 ($32/MB)
  •              175 MB    $3600 ($20.50/MB)

27
Manufacturing Advantages of Disk Arrays
[Diagram: conventional disk product families use 4 disk designs (14", 10", 5.25", 3.5") spanning the low end to the high end; a disk array uses 1 disk design (3.5")]
28
Replace a Small Number of Large Disks with a Large Number of Small Disks! (1988 Disks)

                 Data Capacity   Volume        Power   Data Rate   I/O Rate      MTTF       Cost
IBM 3390 (K)     20 GBytes       97 cu. ft.    3 KW    15 MB/s     600 I/Os/s    250 KHrs   $250K
IBM 3.5" 0061    320 MBytes      0.1 cu. ft.   11 W    1.5 MB/s    55 I/Os/s     50 KHrs    $2K
x70              23 GBytes       11 cu. ft.    1 KW    120 MB/s    3900 I/Os/s   ??? Hrs    $150K

Disk arrays have potential for: large data and I/O rates, high MB per cu. ft., high MB per KW; reliability?
29
Array Reliability
  • Reliability of N disks = Reliability of 1 Disk ÷ N
  • 50,000 Hours ÷ 70 disks = 700 hours
  • Disk system MTTF: drops from 6 years to 1 month!
  • Arrays (without redundancy) are too unreliable to be useful!

Hot spares support reconstruction in parallel with access: very high media availability can be achieved
30
Redundant Arrays of Disks
Files are "striped" across multiple
spindles  Redundancy yields high data
availability
Disks will fail Contents reconstructed from data
redundantly stored in the array
Capacity penalty to store it Bandwidth penalty
to update
Mirroring/Shadowing (high capacity
cost) Horizontal Hamming Codes
(overkill) Parity Reed-Solomon Codes Failure
Prediction (no capacity overhead!) VaxSimPlus
Technique is controversial
Techniques
31
Redundant Arrays of Disks: RAID 1: Disk Mirroring/Shadowing
[Diagram: each disk paired with its "shadow" in a recovery group]
  • Each disk is fully duplicated onto its "shadow"
  • Very high availability can be achieved
  • Bandwidth sacrifice on write: logical write = two physical writes
  • Reads may be optimized
  • Most expensive solution: 100% capacity overhead

Targeted for high I/O rate, high availability environments
32
Redundant Arrays of Disks: RAID 3: Parity Disk
[Diagram: a logical record (e.g., 10010011 11001101 10010011 ...) is striped as physical records across the data disks, with a parity disk P covering the recovery group]
  • Parity computed across the recovery group to protect against hard disk failures
  • 33% capacity cost for parity in this configuration
  • Wider arrays reduce capacity costs, but decrease expected availability and increase reconstruction time
  • Arms logically synchronized, spindles rotationally synchronized: logically a single high-capacity, high-transfer-rate disk
  • Targeted for high bandwidth applications: Scientific, Image Processing (parity idea sketched in code below)
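
A minimal sketch (not from the slides) of the parity idea behind RAID 3: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt by XOR-ing the survivors. The parity helper and the one-byte blocks are purely illustrative.

```python
def parity(blocks):
    """XOR byte-wise across equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"\x93", b"\xcd", b"\x93"]    # three striped data blocks (cf. the bit patterns above)
p = parity(data)                      # parity block stored on the parity disk

# Disk holding data[1] fails: rebuild its block from the survivors plus the parity.
rebuilt = parity([data[0], data[2], p])
assert rebuilt == data[1]
```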
33
Redundant Arrays of Disks: RAID 5: High I/O Rate Parity
[Diagram: data blocks D0, D1, D2, ... are laid out in stripe units across the disk columns, with the parity block P for each stripe rotated to a different disk as logical disk addresses increase]
  • A logical write becomes four physical I/Os
  • Independent writes possible because of interleaved parity
  • Reed-Solomon Codes ("Q") for protection during reconstruction
  • Targeted for mixed applications
34
Problems of Disk Arrays: Small Writes
RAID-5 Small Write Algorithm: 1 Logical Write = 2 Physical Reads + 2 Physical Writes
[Diagram: to write new data D0' into a stripe D0, D1, D2, D3, P: (1) read old data D0, (2) read old parity P, XOR the old data with the new data and with the old parity to form P', then (3) write D0' and (4) write P'; sketched in code below]
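
A minimal sketch (not from the slides) of the small-write update shown above: new parity = old parity XOR old data XOR new data, so only the target disk and the parity disk are touched (2 reads + 2 writes). xor_blocks and small_write are hypothetical helper names.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(old_data: bytes, new_data: bytes, old_parity: bytes):
    """Return (new data, new parity) to write back to the two disks."""
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    return new_data, new_parity

# Sanity check against recomputing parity over the full 4-disk stripe.
stripe = [bytes([i]) * 4 for i in range(4)]
full_parity = b"\x00" * 4
for blk in stripe:
    full_parity = xor_blocks(full_parity, blk)
new_d0 = b"\xff" * 4
_, new_parity = small_write(stripe[0], new_d0, full_parity)
assert new_parity == xor_blocks(xor_blocks(xor_blocks(new_d0, stripe[1]), stripe[2]), stripe[3])
```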
35
Subsystem Organization
host → host adapter (manages interface to host, DMA) → array controller (control, buffering, parity logic) → single-board disk controllers (physical device control), often piggy-backed in small format devices
  • Striping software off-loaded from host to array controller
  • No application modifications
  • No reduction of host performance
36
System Availability: Orthogonal RAIDs
[Diagram: an array controller feeding several string controllers, each driving a string of disks; each data recovery group is spread orthogonally across the strings]
  • Data Recovery Group: unit of data redundancy
  • Redundant Support Components: fans, power supplies, controller, cables
  • End to End Data Integrity: internal parity protected data paths
37
System-Level Availability
[Diagram: fully dual redundant configuration: two hosts, each with an I/O controller, connected to duplicated array controllers and shared recovery groups of disks]
  • Goal: No Single Points of Failure
  • With duplicated paths, higher performance can be obtained when there are no failures
38
OceanStore: Global-Scale Persistent Storage
  • Global-Scale Persistent Storage

39
Utility-based Infrastructure
[Diagram: providers such as Canadian OceanStore, Sprint, AT&T, IBM, and Pac Bell cooperating in a single utility]
  • Service provided by confederation of companies
  • Monthly fee paid to one service provider
  • Companies buy and sell capacity from each other

40
Important P2P Technology: Decentralized Object Location and Routing (DOLR)
41
The Path of an OceanStore Update
42
Archival Dissemination of Fragments
43
Aside: Why erasure coding? High durability/overhead ratio!
[Plot: Fraction of Blocks Lost Per Year (FBLPY)]
  • Exploit the law of large numbers for durability!
  • With 6-month repair, FBLPY:
  • Replication: 0.03
  • Fragmentation: 10^-35

44
The Dissemination Process: Achieving Failure Independence
  • Anti-correlation
  • Analysis/Models
  • Mutual Information
  • Human Input
  • Data Mining

Independent Set Generation
Effective Dissemination
45
The Berkeley PetaByte Archival Service
  • OceanStore Concepts Applied to Tape-less backup
  • Self-Replicating, Self-Repairing, Self-Managing
  • No need for actual Tape in system
  • (Although could be there to keep with tradition)

46
But what about queue time? Or: why nonlinear response
[Plot: Response Time (ms), 0 to 300, vs. Throughput / Utilization (% of total BW), 0 to 100]
Metrics: Response Time and Throughput; latency grows as Tser × u/(1 - u), where u = utilization (plotted numerically below)
Response time = Queue time + Device Service time
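
A minimal sketch (not from the slides) of why this curve is nonlinear, assuming the slide's formula with exponential service (C = 1) and a hypothetical Tser of 20 ms: response time = Tser + Tser × u/(1 - u), which blows up as utilization u approaches 1.

```python
Tser = 20.0   # hypothetical service time in ms
for u in (0.1, 0.5, 0.7, 0.9, 0.95):
    response = Tser + Tser * u / (1.0 - u)   # service time + queueing delay
    print(f"u = {u:.2f}  response = {response:6.1f} ms")
```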
47
Introduction to Queueing Theory
[Diagram: a black-box queueing system with Arrivals entering and Departures leaving]
  • Queueing theory applies to long-term, steady-state behavior ⇒ Arrival rate = Departure rate
  • Little's Law: Mean number of tasks in system = arrival rate × mean response time (numeric example below)
  • Observed by many; Little was the first to prove it
  • Simple interpretation: you should see the same number of tasks in queue when entering as when leaving
  • Applies to any system in equilibrium, as long as nothing in the black box is creating or destroying tasks
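
A minimal sketch (not from the slides) of Little's Law with hypothetical numbers that match the disk example later in the lecture: 10 requests/s each spending 25 ms in the system gives 0.25 tasks in the system on average.

```python
arrival_rate = 10.0     # requests per second
mean_response = 0.025   # seconds per request in the system
print(arrival_rate * mean_response)   # L = lambda * W = 0.25 tasks in the system
```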

48
A Little Queuing Theory: Use of Random Distributions
  • Server spends a variable amount of time with customers
  • Weighted mean: m1 = (f1×T1 + f2×T2 + ... + fn×Tn)/F = Σ p(T)×T
  • Variance: σ² = (f1×T1² + f2×T2² + ... + fn×Tn²)/F − m1² = Σ p(T)×T² − m1²
  • Squared coefficient of variance: C = σ²/m1²   (see the sketch below)
  • Unitless measure (100 ms² vs. 0.1 s²)
  • Exponential distribution: C = 1: most short relative to average, a few others long; 90% < 2.3 × average, 63% < average
  • Hypoexponential distribution: C < 1: most close to average; C = 0.5 ⇒ 90% < 2.0 × average, only 57% < average
  • Hyperexponential distribution: C > 1: further from average; C = 2.0 ⇒ 90% < 2.8 × average, 69% < average
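
A minimal sketch (not from the slides) of the moment formulas above for a discrete service-time distribution given as (time, frequency) pairs; moments and the example numbers are hypothetical.

```python
def moments(samples):
    F = sum(f for _, f in samples)
    m1 = sum(f * t for t, f in samples) / F                  # weighted mean
    var = sum(f * t * t for t, f in samples) / F - m1 ** 2   # sigma^2
    C = var / m1 ** 2                                        # squared coefficient of variance
    return m1, var, C

# Example: most requests take 10 ms, a few take 100 ms -> C ≈ 2 (hyperexponential-like).
print(moments([(10.0, 9), (100.0, 1)]))   # (19.0, 729.0, ~2.02)
```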

49
A Little Queuing Theory: Variable Service Time
  • Disk response times: C ≈ 1.5 (majority of seeks < average)
  • Yet we usually pick C = 1.0 for simplicity
  • Memoryless, exponential distribution
  • Many complex systems are well described by the memoryless distribution!
  • Another useful value is the average time you must wait for the server to complete its current task: m1(z)
  • Called the Average Residual Wait Time
  • Not just 1/2 × m1, because that doesn't capture the variance
  • Can derive: m1(z) = 1/2 × m1 × (1 + C)
  • No variance ⇒ C = 0 ⇒ m1(z) = 1/2 × m1
  • Exponential ⇒ C = 1 ⇒ m1(z) = m1
50
A Little Queuing Theory: Average Wait Time
  • Calculating average wait time in queue, Tq:
  • All customers in line must complete; avg time m1 = Tser = 1/μ
  • If something is at the server, it takes on average m1(z) to complete
  • Chance the server is busy = u = λ/μ; average delay is u × m1(z)
  • Tq = u × m1(z) + Lq × Tser
  • Tq = u × m1(z) + λ × Tq × Tser
  • Tq = u × m1(z) + u × Tq
  • Tq × (1 - u) = m1(z) × u
  • Tq = m1(z) × u/(1 - u) = Tser × 1/2 × (1 + C) × u/(1 - u)   (coded below)
  • Notation:
  • λ = average number of arriving customers/second
  • Tser = average time to service a customer
  • u = server utilization (0..1): u = λ × Tser
  • Tq = average time/customer in queue
  • Lq = average length of queue: Lq = λ × Tq
  • m1(z) = average residual wait time = Tser × 1/2 × (1 + C)
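
A minimal sketch (not from the slides) of the closed form just derived; queue_wait is a hypothetical helper. With C = 1 it reduces to Tq = Tser × u/(1 - u).

```python
def queue_wait(arrival_rate, service_time, C=1.0):
    u = arrival_rate * service_time           # server utilization
    residual = 0.5 * service_time * (1 + C)   # m1(z), average residual wait time
    return residual * u / (1.0 - u)           # Tq

print(queue_wait(arrival_rate=10, service_time=0.020, C=1.0))   # 0.005 s = 5 ms
```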

51
A Little Queuing Theory: M/G/1 and M/M/1
  • Assumptions so far:
  • System in equilibrium
  • Times between two successive arrivals in line are random
  • Server can start on the next customer immediately after the prior one finishes
  • No limit to the queue; works First-In-First-Out
  • Afterward, all customers in line must complete; each takes avg Tser
  • Described: memoryless or Markovian request arrival (M for C = 1, exponentially random), General service distribution (no restrictions), 1 server: M/G/1 queue
  • When service times have C = 1: M/M/1 queue
  • Tq = Tser × u × (1 + C) / (2 × (1 - u)) = Tser × u / (1 - u)
  • Tser = average time to service a customer
  • u = server utilization (0..1): u = λ × Tser
  • Tq = average time/customer in queue

52
A Little Queuing Theory: An Example
  • Processor sends 10 × 8KB disk I/Os per second; requests & service are exponentially distributed; avg. disk service = 20 ms
  • On average, how utilized is the disk?
  • What is the number of requests in the queue?
  • What is the average time spent in the queue?
  • What is the average response time for a disk request?
  • Notation (worked in code below):
  • λ = average number of arriving customers/second = 10
  • Tser = average time to service a customer = 20 ms (0.02 s)
  • u = server utilization (0..1): u = λ × Tser = 10/s × 0.02 s = 0.2
  • Tq = average time/customer in queue = Tser × u / (1 - u) = 20 × 0.2/(1 - 0.2) = 20 × 0.25 = 5 ms (0.005 s)
  • Tsys = average time/customer in system: Tsys = Tq + Tser = 25 ms
  • Lq = average length of queue: Lq = λ × Tq = 10/s × 0.005 s = 0.05 requests in queue
  • Lsys = average tasks in system: Lsys = λ × Tsys = 10/s × 0.025 s = 0.25
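
A minimal sketch (not from the slides) reproducing the arithmetic of this example end to end.

```python
lam, Tser = 10.0, 0.020             # arrivals/s, service time (s)
u = lam * Tser                      # utilization = 0.2
Tq = Tser * u / (1.0 - u)           # 0.005 s = 5 ms (M/M/1, C = 1)
Tsys = Tq + Tser                    # 0.025 s = 25 ms
Lq, Lsys = lam * Tq, lam * Tsys     # 0.05 and 0.25 tasks
print(u, Tq, Tsys, Lq, Lsys)
```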

53
A Little Queuing Theory: Yet Another Example
  • Processor accesses memory over a network
  • DRAM service properties:
  • speed: 9 cycles + 2 cycles/word
  • With 8-word cache lines: Tser = 25 cycles, μ = 1/25 = 0.04 ops/cycle
  • Deterministic service time! (C = 0)
  • Processor behavior:
  • CPI = 1, 40% memory, 7.5% cache misses
  • Rate: λ = 1 inst/cycle × 0.4 × 0.075 = 0.03 ops/cycle
  • Notation (worked in code below):
  • λ = average number of arriving customers/cycle = 0.03
  • Tser = average time to service a customer = 25 cycles
  • u = server utilization (0..1): u = λ × Tser = 0.03 × 25 = 0.75
  • Tq = average time/customer in queue = Tser × u × (1 + C) / (2 × (1 - u)) = (25 × 0.75 × 1/2)/(1 - 0.75) = 37.5 cycles
  • Tsys = average time/customer in system: Tsys = Tq + Tser = 62.5 cycles
  • Lq = average length of queue: Lq = λ × Tq = 0.03 × 37.5 = 1.125 requests in queue
  • Lsys = average tasks in system: Lsys = λ × Tsys = 0.03 × 62.5 = 1.875
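
A minimal sketch (not from the slides) reproducing this example with the M/G/1 formula and deterministic service (C = 0).

```python
lam, Tser, C = 0.03, 25.0, 0.0                 # ops/cycle, cycles, deterministic service
u = lam * Tser                                 # 0.75
Tq = Tser * u * (1 + C) / (2.0 * (1.0 - u))    # 37.5 cycles
Tsys = Tq + Tser                               # 62.5 cycles
print(u, Tq, Tsys, lam * Tq, lam * Tsys)       # 0.75 37.5 62.5 1.125 1.875
```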

54
Summary
  • Disk industry growing rapidly; improvements:
  • bandwidth: 40%/yr
  • areal density: 60%/year; $/MB faster?
  • queue + controller + seek + rotate + transfer
  • Advertised average seek time benchmark is much greater than the average seek time in practice
  • Response time vs. Bandwidth tradeoffs
  • Queueing theory for (C = 1): Tq = Tser × u/(1 - u)
  • Value of faster response time:
  • 0.7 sec off response saves 4.9 sec and 2.0 sec (70%) total time per transaction ⇒ greater productivity
  • everyone gets more done with faster response, but a novice with fast response = an expert with slow