IO, Disks, and RAID


Provided by: ranveer7

Learn more at: https://www.cs.cornell.edu


1
I/O, Disks, and RAID
2
Goals for Today

Review I/O
How does a computer system interact with its
environment?
Disks
How does a computer system permanently store
data?
Prelim graded!
Discuss and pass back today
RAID
How to make storage both efficient and reliable?

3
The Requirements of I/O

So far in this course
We have learned how to manage CPU, memory
What about I/O?
Without I/O, computers are useless (disembodied
brains?)
But thousands of devices, each slightly
different
How can we standardize the interfaces to these
devices?
Devices unreliable: media failures and
transmission errors
How can we make them reliable?
Devices unpredictable and/or slow
How can we manage them if we don't know what they
will do or how they will perform?
Some operational parameters
Byte/Block
Some devices provide single byte at a time (e.g.
keyboard)
Others provide whole blocks (e.g. disks,
networks, etc)
Sequential/Random
Some devices must be accessed sequentially (e.g.
tape)
Others can be accessed randomly (e.g. disk, cd,
etc.)
Polling/Interrupts
Some devices require continual monitoring
Others generate interrupts when they need service

4
Modern I/O Systems
5
Example Device-Transfer Rates (Sun Enterprise
6000)

Device Rates vary over many orders of magnitude
System better be able to handle this wide range
Better not have high overhead/byte for fast
devices!
Better not waste time waiting for slow devices

6
The Goal of the I/O Subsystem

Provide Uniform Interfaces, Despite Wide Range of
Different Devices
This code works on many different devices:
    #include <stdio.h>
    FILE *fp = fopen("/dev/something", "w");
    for (int i = 0; i < 10; i++)
        fprintf(fp, "Count %d\n", i);
    fclose(fp);
Why? Because the code that controls devices (the device
driver) implements a standard interface.
We will try to get a flavor of what is involved
in actually controlling devices in the rest of the
lecture
Can only scratch the surface!

7
Want Standard Interfaces to Devices

Block Devices: e.g. disk drives, tape drives,
DVD-ROM
Access blocks of data
Commands include open(), read(), write(), seek()
Raw I/O or file-system access
Memory-mapped file access possible
Character Devices: e.g. keyboards, mice, serial
ports, some USB devices
Single characters at a time
Commands include get(), put()
Libraries layered on top allow line editing
Network Devices: e.g. Ethernet, Wireless,
Bluetooth
Different enough from block/character to have own
interface
Unix and Windows include socket interface
Separates network protocol from network operation
Includes select() functionality
Usage: pipes, FIFOs, streams, queues, mailboxes

8
How Does User Deal with Timing?

Blocking Interface: "Wait"
When requesting data (e.g. read() system call), put
process to sleep until data is ready
When writing data (e.g. write() system call), put
process to sleep until device is ready for data
Non-blocking Interface: "Don't Wait"
Returns quickly from read or write request with
count of bytes successfully transferred
Read may return nothing, write may write nothing
Asynchronous Interface: "Tell Me Later"
When requesting data, take pointer to user's buffer,
return immediately; later kernel fills buffer and
notifies user
When sending data, take pointer to user's buffer,
return immediately; later kernel takes data and
notifies user
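A minimal POSIX sketch of the non-blocking interface, assuming a Unix-like system: once O_NONBLOCK is set on a pipe's read end, read() returns -1 with errno set to EAGAIN (or EWOULDBLOCK) instead of putting the process to sleep. The helper names make_nonblocking_pipe and read_if_ready are hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create a pipe and switch its read end (fds[0]) to non-blocking mode. */
int make_nonblocking_pipe(int fds[2]) {
    if (pipe(fds) == -1)
        return -1;
    int flags = fcntl(fds[0], F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Non-blocking read: returns the byte count, 0 if no data is ready yet
   (a blocking read would have slept), or -1 on a real error.
   With this convention, end-of-file (read() == 0) also maps to 0. */
ssize_t read_if_ready(int fd, void *buf, size_t len) {
    assert(buf != NULL);
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* nothing available: caller is free to do other work */
    return n;
}
```

Reading the empty pipe returns 0 immediately; after write(fds[1], "hi", 2), the next call returns 2. A plain blocking read() on the same empty pipe would instead sleep until data arrives.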

9
Life Cycle of An I/O Request
User Program
Kernel I/O Subsystem
Device Driver Top Half
Device Driver Bottom Half
Device Hardware
10
A Kernel I/O Structure
11
Device Drivers

Device Driver: device-specific code in the kernel
that interacts directly with the device hardware
Supports a standard, internal interface
Same kernel I/O system can interact easily with
different device drivers
Special device-specific configuration supported
with the ioctl() system call
Device Drivers typically divided into two pieces:
Top half: accessed in call path from system calls
Implements a set of standard, cross-device calls
like open(), close(), read(), write(), ioctl(),
strategy()
This is the kernel's interface to the device
driver
Top half will start I/O to device, may put thread
to sleep until finished
Bottom half: run as interrupt routine
Gets input or transfers next block of output
May wake sleeping threads if I/O now complete

12
I/O Device Notifying the OS

The OS needs to know when
The I/O device has completed an operation
The I/O operation has encountered an error
I/O Interrupt
Device generates an interrupt whenever it needs
service
Pro: handles unpredictable events well
Con: interrupts have relatively high overhead
Polling
OS periodically checks a device-specific status
register
I/O device puts completion information in status
register
Could use timer to invoke lower half of drivers
occasionally
Pro: low overhead
Con: may waste many cycles on polling if
I/O operations are infrequent or unpredictable
Some devices combine both polling and interrupts
For instance, a high-bandwidth network device:
Interrupt for first incoming packet
Poll for following packets until hardware empty
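A sketch of the polling side, assuming a hypothetical status register reachable through a pointer (for example, memory-mapped) with made-up DONE and ERROR bits; a real device defines its own layout. The loop is bounded so that wasted polls on a slow device end in a timeout instead of spinning forever.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status-register bits; real devices define their own. */
#define STATUS_DONE  0x1u
#define STATUS_ERROR 0x2u

/* Poll the status register until DONE or ERROR is set, giving up after
   max_polls reads. 'volatile' forces a fresh load on every iteration, which
   a memory-mapped register needs. Returns 0 on completion, -1 on device
   error, -2 on timeout (all those polls were wasted cycles). */
int poll_device(volatile const uint32_t *status, unsigned long max_polls) {
    assert(status != NULL);
    for (unsigned long i = 0; i < max_polls; i++) {
        uint32_t s = *status;            /* one load = one poll */
        if (s & STATUS_ERROR)
            return -1;
        if (s & STATUS_DONE)
            return 0;
    }
    return -2;
}
```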

13
How does the processor actually talk to the
device?

CPU interacts with a Controller
Contains a set of registers that can be read and
written
May contain memory for request queues or
bit-mapped images
Regardless of the complexity of the connections
and buses, processor accesses registers in two
ways
I/O instructions: in/out instructions
Example from the Intel architecture: out 0x21,AL
Memory-mapped I/O: load/store instructions
Registers/memory appear in physical address space
I/O accomplished with load and store instructions

14
Transferring Data To/From Controller

Programmed I/O
Each byte transferred via processor in/out or
load/store
Pro: simple hardware, easy to program
Con: consumes processor cycles proportional to
data size
Direct Memory Access
Give controller access to memory bus
Ask it to transfer data to/from memory directly
Sample interaction with DMA controller (from
book)
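A sketch of programmed I/O against a hypothetical controller whose status and data registers appear in the physical address space (the struct layout, the DEV_READY bit, and pio_write are assumptions, not a real device). The CPU polls and stores once per byte, so its work grows with the transfer size; this is the cost DMA hands off to the controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-mapped controller registers. */
struct dev_regs {
    volatile uint32_t status;   /* bit 0: ready to accept a byte */
    volatile uint32_t data;     /* a store here hands one byte to the device */
};

#define DEV_READY 0x1u

/* Programmed I/O: the CPU moves every byte itself, so the loop (and the
   busy-waiting inside it) scales linearly with len. Returns bytes written. */
size_t pio_write(struct dev_regs *dev, const uint8_t *buf, size_t len) {
    assert(dev != NULL && (buf != NULL || len == 0));
    size_t i;
    for (i = 0; i < len; i++) {
        while (!(dev->status & DEV_READY))
            ;                   /* busy-wait until the device can take a byte */
        dev->data = buf[i];     /* one store per byte transferred */
    }
    return i;
}
```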

15
Main components of Intel Chipset: Pentium 4

Northbridge
Handles memory
Graphics
Southbridge: I/O
PCI bus
Disk controllers
USB controllers
Audio
Serial I/O
Interrupt controller
Timers

16
The Memory Hierarchy

Each level acts as a cache for the layer below it

CPU
registers, L1 cache
L2 cache
primary memory
disk storage (secondary memory)
random access
tape or optical storage (tertiary memory)
sequential access
17
Disks
18
What does the disk look like?
19
Some parameters

2-30 heads (platters × 2)
diameter 14" to 2.5"
700-20480 tracks per surface
16-1600 sectors per track
sector size:
64-8k bytes
512 for most PCs
note inter-sector gaps
capacity 20M-300G
main adjectives: BIG, slow

20
Disk overheads

To read from disk, we must specify:
cylinder #, surface #, sector #, transfer size,
memory address
Total time includes:
Seek time: to get to the track
Latency time: to get to the sector, and
Transfer time: to get bits off the disk

(Figure: disk surface labeled with a track, a sector, seek time, and rotation delay)
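The three components above can be combined into a rough estimate. This sketch assumes the request lies within a single track and uses the average rotational latency of half a revolution; disk_access_ms is a hypothetical helper name.

```c
#include <assert.h>

/* Expected time (ms) to read `bytes` contiguous bytes from one track:
   seek + average rotational latency (half a revolution) + transfer.
   Assumes the data fits on a single track (no head or track switch). */
double disk_access_ms(double avg_seek_ms, double rpm,
                      double bytes_per_track, double bytes) {
    assert(rpm > 0 && bytes_per_track > 0 && bytes >= 0);
    double rotation_ms = 60000.0 / rpm;           /* one full revolution */
    double latency_ms  = rotation_ms / 2.0;       /* wait half a turn on average */
    double transfer_ms = rotation_ms * (bytes / bytes_per_track);
    return avg_seek_ms + latency_ms + transfer_ms;
}
```

For example, with a 5 ms average seek, 10,000 rpm (6 ms per revolution) and 300 sectors of 512 bytes per track, reading one sector takes about 5 + 3 + 0.02 = 8.02 ms: seek and rotation dominate, and the transfer itself is almost free.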
21
Modern disks
22
52 years ago

On 13th September 1956, the IBM 305 RAMAC computer
system was the first to use disk storage
The 8 GB 1-inch drive in his right hand holds 80,000 times
more data than the 24-inch RAMAC disk in his
left

23
Disks vs. Memory

| | Disk | Memory |
| Smallest write | sector | (usually) bytes |
| Atomic write | sector | byte, word |
| Random access | 5 ms (not on a good curve) | 50 ns (faster all the time) |
| Sequential access | 200 MB/s | 200-1000 MB/s |
| Cost | $0.002 per MB | $0.10 per MB |
| Crash | doesn't matter (non-volatile) | contents gone (volatile) |

24
Disk Structure

Disk drives addressed as 1-dim arrays of logical
blocks
the logical block is the smallest unit of
transfer
This array is mapped sequentially onto disk sectors
Address 0 is the 1st sector of the 1st track of the
outermost cylinder
Addresses increment within a track, then across the
tracks of the cylinder, then across cylinders,
from outermost to innermost
Translating a logical address to (cylinder, track, sector)
is theoretically possible, but usually difficult:
Some sectors might be defective
Number of sectors per track is not a constant
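The sequential mapping described above corresponds to the classic CHS-to-logical-block formula. This sketch assumes an idealized uniform geometry (a fixed number of sectors per track, no defective sectors), which, as the slide notes, real drives do not have; it matches only the virtual geometry a drive reports. chs_to_lba is a hypothetical helper name, and sectors are numbered from 1 as in CHS addressing.

```c
#include <assert.h>

/* Idealized mapping: block 0 is sector 1, head 0, cylinder 0; numbering runs
   through one track, then the other tracks (heads) of the same cylinder,
   then moves to the next cylinder. */
unsigned long chs_to_lba(unsigned long cyl, unsigned head, unsigned sector,
                         unsigned heads_per_cyl, unsigned sectors_per_track) {
    assert(head < heads_per_cyl);
    assert(sector >= 1 && sector <= sectors_per_track);
    return (cyl * heads_per_cyl + head) * sectors_per_track + (sector - 1);
}
```

With 16 heads and 63 sectors per track, (0, 0, 1) maps to block 0, (0, 1, 1) to block 63, and (1, 0, 1) to block 1008.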

25
Non-uniform sectors / track

Maintain same data rate with Constant Linear
Velocity
Approaches
Reduce bit density per track for outer layers
Have more sectors per track on the outer layers
(virtual geometry)

26
Disk Scheduling

The operating system tries to use hardware
efficiently
for disk drives ⇒ fast access time and high disk
bandwidth
Access time has two major components
Seek time is the time to move the heads to the
cylinder containing the desired sector
Rotational latency is the additional time spent waiting for
the disk to rotate the desired sector under the disk head.
Minimize seek time
Seek time is roughly proportional to seek distance
Disk bandwidth is the total number of bytes
transferred, divided by the total time between
the first request for service and the completion
of the last transfer.

27
Disk Scheduling (Cont.)

Several scheduling algorithms exist to service disk I/O
requests.
We illustrate them with a request queue (cylinders 0-199):
98, 183, 37, 122, 14, 124, 65, 67
Head pointer 53

28
FCFS
Illustration shows total head movement of 640
cylinders.
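FCFS simply serves requests in arrival order, so its total head movement is the sum of the distances between consecutive requests. A minimal sketch (fcfs_movement is a hypothetical name) that reproduces the 640 cylinders for the example queue:

```c
#include <assert.h>
#include <stdlib.h>

/* Total head movement (cylinders) when requests are served in arrival order. */
int fcfs_movement(int head, const int *req, int n) {
    assert(req != NULL || n == 0);
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += abs(req[i] - head);   /* seek from current position */
        head = req[i];
    }
    return total;
}
```

For the queue {98, 183, 37, 122, 14, 124, 65, 67} with the head at 53, this returns 640.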
29
SSTF

Selects request with minimum seek time from
current head position
SSTF scheduling is a form of SJF scheduling
may cause starvation of some requests.
Illustration shows total head movement of 236
cylinders.
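A sketch of SSTF under stated assumptions: at most MAX_REQ pending requests, and ties broken in favor of the request that appears first in the queue (the slide does not specify a tie rule). It reproduces the 236 cylinders for the example queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_REQ 64

/* SSTF: repeatedly serve the pending request closest to the head.
   Ties go to the request that appears first in the queue. */
int sstf_movement(int head, const int *req, int n) {
    assert(n >= 0 && n <= MAX_REQ);
    bool done[MAX_REQ] = { false };
    int total = 0;
    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (!done[i] &&
                (best < 0 || abs(req[i] - head) < abs(req[best] - head)))
                best = i;
        total += abs(req[best] - head);
        head = req[best];
        done[best] = true;
    }
    return total;
}
```

Starting at 53, the service order is 65, 67, 37, 14, 98, 122, 124, 183. Note how a steady stream of requests near the head could keep 183 waiting indefinitely, which is the starvation risk mentioned above.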

30
SSTF (Cont.)
31
SCAN

The disk arm starts at one end of the disk and
moves toward the other end, servicing requests;
head movement is reversed when it gets to the
other end of the disk, and
servicing continues.
Sometimes called the elevator algorithm.
Illustration shows total head movement of 236
cylinders (head initially moving toward cylinder 0).
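A sketch of SCAN with the arm first moving toward cylinder 0, which is the direction that yields 236 cylinders for the example. The name scan_toward_zero and the MAX_REQ limit are illustrative assumptions. If no requests remain above the head after the downward sweep, this version simply stops, since running on to cylinder 0 would serve nothing.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_REQ 64

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* SCAN (elevator), arm initially moving toward cylinder 0. Writes the
   service order into order[] and returns total head movement in cylinders. */
int scan_toward_zero(int head, const int *req, int n, int *order) {
    assert(n >= 0 && n <= MAX_REQ);
    int s[MAX_REQ];
    for (int i = 0; i < n; i++)
        s[i] = req[i];
    qsort(s, (size_t)n, sizeof s[0], cmp_int);

    int split = 0;                          /* s[0..split-1] are <= head */
    while (split < n && s[split] <= head)
        split++;

    int pos = head, total = 0, k = 0;
    for (int i = split - 1; i >= 0; i--) {  /* sweep down */
        total += pos - s[i];
        pos = s[i];
        order[k++] = s[i];
    }
    if (split < n) {                        /* requests remain above the head: */
        total += pos;                       /* run on to cylinder 0 and reverse */
        pos = 0;
        for (int i = split; i < n; i++) {   /* sweep up */
            total += s[i] - pos;
            pos = s[i];
            order[k++] = s[i];
        }
    }
    return total;
}
```

For the example queue the service order is 37, 14, (cylinder 0), 65, 67, 98, 122, 124, 183, for 53 + 183 = 236 cylinders.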

32
SCAN (Cont.)
33
C-SCAN

Provides a more uniform wait time than SCAN.
The head moves from one end of the disk to the
other,
servicing requests as it goes.
When it reaches the other end it immediately
returns to beginning of the disk
No requests serviced on the return trip.
Treats the cylinders as a circular list
that wraps around from the last cylinder to the
first one.

34
C-SCAN (Cont.)
35
C-LOOK

Version of C-SCAN
Arm only goes as far as last request in each
direction,
then reverses direction immediately,
without first going all the way to the end of the
disk.
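C-SCAN and C-LOOK differ only in how far the arm travels before wrapping around, so one sketch can cover both with a flag. Assumptions: the arm starts moving toward higher cylinders, the largest cylinder is max_cyl, and the return seek is counted in the total (textbooks differ on whether to count it). circular_scan is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_REQ 64

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* look == false: C-SCAN, run on to max_cyl, return to cylinder 0, sweep up.
   look == true:  C-LOOK, stop at the highest request, jump to the lowest.
   Writes the service order into order[]; returns total head movement. */
int circular_scan(int head, const int *req, int n, int max_cyl,
                  bool look, int *order) {
    assert(n >= 0 && n <= MAX_REQ);
    int s[MAX_REQ];
    for (int i = 0; i < n; i++)
        s[i] = req[i];
    qsort(s, (size_t)n, sizeof s[0], cmp_int);

    int split = 0;                          /* s[split..n-1] are >= head */
    while (split < n && s[split] < head)
        split++;

    int pos = head, total = 0, k = 0;
    for (int i = split; i < n; i++) {       /* upward sweep */
        total += s[i] - pos;
        pos = s[i];
        order[k++] = s[i];
    }
    if (split > 0) {                        /* requests wait below the start */
        if (look) {
            total += pos - s[0];            /* jump straight to lowest request */
            pos = s[0];
        } else {
            total += (max_cyl - pos) + max_cyl;  /* to the end, back to 0 */
            pos = 0;
        }
        for (int i = 0; i < split; i++) {   /* second upward sweep */
            total += s[i] - pos;
            pos = s[i];
            order[k++] = s[i];
        }
    }
    return total;
}
```

For the example queue (head 53, cylinders 0-199), both serve 65, 67, 98, 122, 124, 183, 14, 37; counting the return seek, C-SCAN moves 382 cylinders and C-LOOK 322.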

36
C-LOOK (Cont.)
37
Selecting a Good Algorithm

SSTF is common and has a natural appeal
SCAN and C-SCAN perform better under heavy load
Performance depends on number and types of
requests
Requests for disk service can be influenced by
the file-allocation method.
Disk-scheduling algorithm should be a separate OS
module,
allowing it to be replaced with a different
algorithm if necessary.
Either SSTF or LOOK is a reasonable default algorithm

38
Summary

I/O Device Types
Many different speeds (0.1 bytes/sec to
GBytes/sec)
Different Access Patterns
Block Devices, Character Devices, Network Devices
Different Access Timing
Blocking, Non-blocking, Asynchronous
I/O Controllers: hardware that controls the actual
device
Processor accesses them through I/O instructions or
load/store to special physical memory
They report their results through either interrupts or
a status register that the processor looks at
occasionally (polling)
Device Driver: device-specific code in kernel
Disks
Latency = Seek + Rotational + Transfer
Also, queuing time
Rotational latency on average ½ rotation
Improve performance (decrease queuing time) via
scheduling

39
Announcements

Homework 4 available later tonight
It is a programming assignment, so start early
Prelims graded
Mean 67.7 (Median 67), Stddev 14.2, High 96 out
of 100!
Good job!
Re-grade policy
Submit written re-grade request to Nazrul.
Entire prelim will be re-graded.
We were generous the first time
If still unhappy, submit another re-grade
request.
Nazrul will re-grade herself
If still unhappy, submit a third re-grade
request.
I will re-grade. Final grade is law.

40
Grade distribution
41
Question 2

Algorithm
(1) Pick up a knife
(2) Pick up a fork
(3) Cut out a slice of pizza and eat it
(4) Return the knife and fork to the pile
Correctness Constraints
wait for a knife and then a fork, in that order!
Key: deadlock cannot occur since the algorithm
defines a partial order on resource acquisition (knife before fork);
thus, no circular waiting exists

42
Question 3

32-bit virtual address and 32-bit physical
address, 8 kB pages
Bits for offset? Bits for index?
13 and 19, respectively
Bytes required for PTE? Bytes required for page
table?
3 bytes and $2^{19} \times 3\ \text{B} = 1.5$ MB, respectively
43
Question 3 continued

32-bit virtual address and 24-bit physical
address, 8 kB pages
Bits for offset? Bits for index?
13 and 19, respectively
Bytes required for PTE? Bytes required for page
table?
2 bytes and $2^{19} \times 2\ \text{B} = 1$ MB, respectively
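The arithmetic behind both answers can be checked with a short sketch. The number of control bits per PTE (valid, dirty, and so on) is not given in the question; CTRL_BITS = 4 is an assumption, and any value from 0 to 5 yields the same byte counts here. page_table_size is a hypothetical helper for a single-level table with one PTE per virtual page.

```c
#include <assert.h>

/* Assumption: each PTE holds the frame number plus CTRL_BITS control bits,
   rounded up to whole bytes. */
#define CTRL_BITS 4

struct pt_size {
    unsigned offset_bits, index_bits, pte_bytes;
    unsigned long long table_bytes;   /* one PTE per virtual page */
};

static unsigned log2u(unsigned long long x) {   /* x must be a power of two */
    unsigned b = 0;
    while (x > 1) { x >>= 1; b++; }
    return b;
}

struct pt_size page_table_size(unsigned va_bits, unsigned pa_bits,
                               unsigned long long page_bytes) {
    assert(page_bytes != 0 && (page_bytes & (page_bytes - 1)) == 0);
    struct pt_size r;
    r.offset_bits = log2u(page_bytes);             /* 8 kB -> 13 bits */
    r.index_bits  = va_bits - r.offset_bits;       /* 32 - 13 = 19 */
    unsigned frame_bits = pa_bits - r.offset_bits;
    r.pte_bytes   = (frame_bits + CTRL_BITS + 7) / 8;
    r.table_bytes = (1ULL << r.index_bits) * r.pte_bytes;
    return r;
}
```

With a 32-bit physical address the frame number needs 19 bits, giving 3-byte PTEs and a 1,572,864-byte (1.5 MB) table; with 24 bits it needs 11, giving 2-byte PTEs and a 1,048,576-byte (1 MB) table.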
44
Question 4

Give a brief definition of the term working
set?
Virtual memory pages touched within a window of
time (or window of page references).

45
Question 5 CPU Scheduling

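CPU utilization with 10 I/O-bound processes and 1
CPU-bound process
I/O-bound: compute for 1 ms, sleep for 10 ms
CPU-bound: computes indefinitely
Context-switch overhead is 0.1 ms
CPU utilization with 1 ms quantum?
The scheduler incurs a 0.1 ms context-switching cost
for every context switch, regardless of process
type
CPU util = execTime / (execTime + contextSwitch)
$1/(1 + 0.1) \approx 0.9090$
CPU utilization with 10 ms quantum?
$\dfrac{n_{I/O}\, ex_I + n_{CPU}\, ex_C}{n_{I/O}(ex_I + cs) + n_{CPU}(ex_C + cs)} = \dfrac{10 \cdot 1 + 1 \cdot 10}{10(1 + 0.1) + 1(10 + 0.1)} = \dfrac{20}{11 + 10.1} = \dfrac{20}{21.1} \approx 0.9478673$
where $n_{I/O} = 10$ and $n_{CPU} = 1$ are the process counts, $ex_I = 1$ ms and $ex_C = 10$ ms are the run times per turn, and $cs = 0.1$ ms is the context-switch cost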
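Both answers follow from one formula over a scheduling round in which every process runs once. This sketch assumes, as the question implies, that each I/O-bound process is ready again by its next turn and that every dispatch costs one context switch; rr_utilization is a hypothetical name.

```c
#include <assert.h>

/* Fraction of one round-robin round spent doing useful work.
   I/O-bound processes run min(burst, quantum); CPU-bound ones use the
   whole quantum; each dispatch adds cs_ms of overhead. */
double rr_utilization(int n_io, double io_burst_ms, int n_cpu,
                      double quantum_ms, double cs_ms) {
    assert(quantum_ms > 0 && cs_ms >= 0);
    double io_run  = io_burst_ms < quantum_ms ? io_burst_ms : quantum_ms;
    double useful  = n_io * io_run + n_cpu * quantum_ms;
    double elapsed = n_io * (io_run + cs_ms) + n_cpu * (quantum_ms + cs_ms);
    return useful / elapsed;
}
```

rr_utilization(10, 1.0, 1, 1.0, 0.1) gives 11 / 12.1 ≈ 0.909 and rr_utilization(10, 1.0, 1, 10.0, 0.1) gives 20 / 21.1 ≈ 0.948, matching the two answers: a longer quantum amortizes the switch cost over more CPU-bound work.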

46
Question 5 continued

What strategy can a process employ to maximize
the amount of CPU time allocated to that process?
Multilevel(-feedback) queue
Use a large fraction of assigned quantum
then relinquish the CPU before end of quantum
thus, increasing the priority associated with the
process
Round robin
Use entire quantum
Or say no specific strategy
Alternatively, use more threads

47
How is the disk formatted?

After manufacturing, the disk has no information
It is a stack of platters coated with magnetizable
metal oxide
Before use, each platter receives low-level
format
Format has series of concentric tracks
Each track contains some sectors
There is a short gap between sectors
Preamble allows h/w to recognize start of sector
Also contains cylinder and sector numbers
Data is usually 512 bytes
ECC field used to detect and recover from read
errors

48
Cylinder Skew

Why cylinder skew?
How much skew?
Example: if
10000 rpm
Drive rotates in 6 ms
Track has 300 sectors
New sector every 20 µs
If track-to-track seek time is 800 µs
40 sectors pass during the seek
Cylinder skew = 40 sectors
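The skew calculation in the example generalizes to a small function. It rounds up on the reasoning that arriving even slightly after the next logical sector has passed costs almost a full revolution; cylinder_skew is a hypothetical name.

```c
#include <assert.h>

/* Sectors by which each track should be offset from the previous one so
   that, after a track-to-track seek of seek_us microseconds, the head
   arrives just before the next logical sector. */
unsigned cylinder_skew(double rpm, unsigned sectors_per_track, double seek_us) {
    assert(rpm > 0 && sectors_per_track > 0 && seek_us >= 0);
    double rotation_us = 60e6 / rpm;                 /* 10000 rpm -> 6000 us */
    double sector_us = rotation_us / sectors_per_track;
    unsigned skew = (unsigned)(seek_us / sector_us);
    if (skew * sector_us < seek_us)
        skew++;                                      /* round up */
    return skew;
}
```

cylinder_skew(10000, 300, 800) returns 40, as on the slide; a slightly slower 810 µs seek would need 41.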

49
Formatting and Performance

If 10K rpm, 300 sectors of 512 bytes per track
153,600 bytes every 6 ms ⇒ 24.4 MB/sec transfer
rate (1 MB = 2^20 bytes)
If the disk controller buffer can store only one
sector
For 2 consecutive reads, the 2nd sector flies past
during the memory transfer of the 1st sector
Idea: use single/double interleaving
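One simple way to generate interleaved layouts, sketched under assumptions: logical sectors are placed skip physical slots apart around the track, and when the next slot is already taken, the next free slot is used. With 8 sectors this yields the usual single (0, 4, 1, 5, 2, 6, 3, 7) and double (0, 3, 6, 1, 4, 7, 2, 5) interleavings. interleave and MAX_SECTORS are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SECTORS 1024

/* layout[slot] = logical sector stored in physical slot `slot` of a track of
   n sectors, leaving `skip` physical sectors between consecutive logical
   ones (0: no interleaving, 1: single, 2: double). */
void interleave(int *layout, int n, int skip) {
    assert(n > 0 && n <= MAX_SECTORS && skip >= 0);
    bool used[MAX_SECTORS] = { false };
    int slot = 0;
    for (int logical = 0; logical < n; logical++) {
        while (used[slot])
            slot = (slot + 1) % n;     /* slot taken: move to the next free one */
        layout[slot] = logical;
        used[slot] = true;
        slot = (slot + skip + 1) % n;  /* leave time to copy the sector out */
    }
}
```

The gap gives a one-sector controller buffer time to ship the previous sector to memory before the next logical sector reaches the head, at the cost of needing two or three revolutions to read a whole track.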

50
Disk Partitioning

Each partition is like a separate disk
Sector 0 is the MBR (Master Boot Record)
Contains boot code + partition table
Partition table has starting sector and size of
each partition
High-level formatting
Done for each partition
Specifies boot block, free list, root directory,
empty file system
What happens on boot?
BIOS loads the MBR; the boot program checks which
partition is active
It reads the boot sector from that partition, which then
loads the OS kernel, etc.
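A sketch of what the boot program does with the partition table, based on the classic PC MBR layout: 446 bytes of boot code, four 16-byte partition entries starting at byte 446, and the signature bytes 0x55 0xAA at offsets 510-511. Within an entry, byte 0 is the status (0x80 = active), bytes 8-11 the starting sector (LBA) and bytes 12-15 the size in sectors, both little-endian. find_active_partition is a hypothetical helper; it ignores extended partitions and GPT.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian 32-bit value. */
static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Find the active partition in a 512-byte MBR. Returns its index (0-3) and
   fills start/size, or returns -1 if the signature is missing or no entry
   is marked active. */
int find_active_partition(const uint8_t mbr[512], uint32_t *start, uint32_t *size) {
    assert(mbr && start && size);
    if (mbr[510] != 0x55 || mbr[511] != 0xAA)
        return -1;
    for (int i = 0; i < 4; i++) {
        const uint8_t *e = mbr + 446 + 16 * i;   /* i-th partition entry */
        if (e[0] == 0x80) {
            *start = le32(e + 8);
            *size  = le32(e + 12);
            return i;
        }
    }
    return -1;
}
```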

51
Handling Errors

A disk track with a bad sector
Solutions
Substitute a spare for the bad sector (sector
sparing, also called forwarding)
Shift all sectors to bypass the bad one (sector
slipping)

52
RAID Motivation

Disks are improving, but not as fast as CPUs
1970s seek time 50-100 ms.
2000s seek time < 5 ms.
Factor of 20 improvement in 3 decades
We can use multiple disks for improving
performance
By Striping files across multiple disks (placing
parts of each file on a different disk), parallel
I/O can improve access time
Striping reduces reliability
100 disks have 1/100th mean time between failures
of one disk
So, we need Striping for performance, but we need
something to help with reliability / availability
To improve reliability, we can add redundant data
to the disks, in addition to Striping

53
RAID

A RAID is a Redundant Array of Inexpensive Disks
In industry, I is for Independent
The alternative is SLED, single large expensive
disk
Disks are small and cheap, so it's easy to put
lots of disks (10s to 100s) in one box for
increased storage, performance, and availability
The RAID box with a RAID controller looks just
like a SLED to the computer
Data plus some redundant information is Striped
across the disks in some way
How that Striping is done is key to performance
and reliability.

54
Some Raid Issues

Granularity
fine-grained Stripe each file over all disks.
This gives high throughput for the file, but
limits to transfer of 1 file at a time
coarse-grained Stripe each file over only a few
disks. This limits throughput for 1 file but
allows more parallel file access
Redundancy
uniformly distribute redundancy info on disks
avoids load-balancing problems
concentrate redundancy info on a small number of
disks: partition the set into data disks and
redundant disks

55
Raid Level 0

Level 0 is nonredundant disk array
Files are Striped across disks, no redundant info
High read throughput
Best write throughput (no redundant info to
write)
Any disk failure results in data loss
Reliability worse than SLED

(Figure: four data disks; stripes 0-11 assigned round-robin: disk 0 holds stripes 0, 4, 8; disk 1 holds 1, 5, 9; disk 2 holds 2, 6, 10; disk 3 holds 3, 7, 11)
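The round-robin striping above reduces to a division: stripe unit b lives on disk b mod n at position b / n on that disk. A minimal sketch with hypothetical names (raid0_loc, raid0_map):

```c
#include <assert.h>

/* Location of logical stripe unit b in a RAID 0 array of n_disks disks. */
struct raid0_loc { unsigned disk; unsigned long offset; };

struct raid0_loc raid0_map(unsigned long b, unsigned n_disks) {
    assert(n_disks > 0);
    struct raid0_loc loc = { (unsigned)(b % n_disks), b / n_disks };
    return loc;
}
```

With four disks, stripe 6 maps to disk 2, offset 1, and stripe 11 to disk 3, offset 2. Consecutive stripes land on different disks, which is what allows parallel transfers, and also why losing any one disk destroys every file striped over it.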
56
Raid Level 1

Mirrored Disks
Data is written to two places
On failure, just use surviving disk
On read, choose fastest to read
Write performance is the same as a single drive; read
performance can be up to 2x better
Expensive

(Figure: four data disks holding stripes 0-11 round-robin, as in RAID 0, plus four mirror disks holding identical copies)
57
Parity and Hamming Code

What do you need to do in order to detect and
correct a one-bit error?
Suppose you have a binary number, represented as
a collection of bits <b3, b2, b1, b0>, e.g. 0110
Detection is easy
Parity
Count the number of bits that are on, see if it's
odd or even
EVEN parity is 0 if the number of 1 bits is even
$\mathrm{Parity}(\langle b_3, b_2, b_1, b_0 \rangle) = p_0 = b_0 \oplus b_1 \oplus b_2 \oplus b_3$
$\mathrm{Parity}(\langle b_3, b_2, b_1, b_0, p_0 \rangle) = 0$ if all bits are intact
$\mathrm{Parity}(0110) = 0$, $\mathrm{Parity}(01100) = 0$
$\mathrm{Parity}(11100) = 1 \Rightarrow$ ERROR!
Parity can detect a single error, but can't tell
you which of the bits got flipped
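Even parity is just the XOR of all the bits. A minimal sketch (parity is a hypothetical helper) that reproduces the three examples, treating the stored word as <b3, b2, b1, b0, p0> with p0 in the lowest bit:

```c
#include <assert.h>

/* Even parity of the low nbits bits of x: 0 if the number of 1 bits is
   even, 1 if it is odd (the XOR of all the bits). */
unsigned parity(unsigned x, unsigned nbits) {
    assert(nbits <= 32);
    unsigned p = 0;
    for (unsigned i = 0; i < nbits; i++)
        p ^= (x >> i) & 1u;
    return p;
}
```

parity(0x6, 4) is 0, so 0110 is stored as 01100, and parity(0xC, 5) is 0 on a clean read. Reading back 11100 instead gives parity(0x1C, 5) == 1, flagging an error, but nothing in that single bit says which position flipped.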

58
Parity and Hamming Code

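Detection and correction require more work
Hamming codes can detect double-bit errors or
correct single-bit errors (doing both at once needs an extra overall parity bit)
7/4 Hamming Code
$h_0 = b_0 \oplus b_1 \oplus b_3$
$h_1 = b_0 \oplus b_2 \oplus b_3$
$h_2 = b_1 \oplus b_2 \oplus b_3$
$H_0(\langle 1101 \rangle) = 0$
$H_1(\langle 1101 \rangle) = 1$
$H_2(\langle 1101 \rangle) = 0$
$\mathrm{Hamming}(\langle 1101 \rangle) = \langle b_3, b_2, b_1, h_2, b_0, h_1, h_0 \rangle = \langle 1100110 \rangle$
If a bit is flipped, e.g. $\langle 1110110 \rangle$:
recomputing from the received data bits gives $\mathrm{Hamming}(\langle 1111 \rangle)$: $\langle h_2, h_1, h_0 \rangle = \langle 111 \rangle$; compared
to the received $\langle 010 \rangle$, the mismatch $\langle 101 \rangle$ marks $h_2$ and $h_0$ as in error. Error occurred in
bit 5 (counting positions from 1 at the right).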
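A sketch of the 7/4 Hamming code with the bit order used above: bit (pos - 1) of the integer holds codeword position pos, so positions 7..1 hold <b3, b2, b1, h2, b0, h1, h0> and the check bits sit at positions 1, 2 and 4. The syndrome (XOR of the positions of all 1 bits) is 0 for a valid codeword and otherwise names the flipped position. Function names are hypothetical.

```c
#include <assert.h>

/* Encode data d = <b3 b2 b1 b0> into a 7-bit codeword. */
unsigned hamming_encode(unsigned d) {
    assert(d < 16);
    unsigned b0 = d & 1, b1 = (d >> 1) & 1, b2 = (d >> 2) & 1, b3 = (d >> 3) & 1;
    unsigned h0 = b0 ^ b1 ^ b3, h1 = b0 ^ b2 ^ b3, h2 = b1 ^ b2 ^ b3;
    return b3 << 6 | b2 << 5 | b1 << 4 | h2 << 3 | b0 << 2 | h1 << 1 | h0;
}

/* XOR of the positions (1-7) of all 1 bits: 0 if the codeword is valid,
   otherwise the position of a single flipped bit. */
unsigned hamming_syndrome(unsigned w) {
    assert(w < 128);
    unsigned s = 0;
    for (unsigned pos = 1; pos <= 7; pos++)
        if ((w >> (pos - 1)) & 1)
            s ^= pos;
    return s;
}

/* Correct at most one flipped bit, then extract <b3 b2 b1 b0>. */
unsigned hamming_decode(unsigned w) {
    unsigned s = hamming_syndrome(w);
    if (s)
        w ^= 1u << (s - 1);
    return ((w >> 6) & 1) << 3 | ((w >> 5) & 1) << 2 |
           ((w >> 4) & 1) << 1 | ((w >> 2) & 1);
}
```

hamming_encode(0xD) returns 0x66 (1100110); the corrupted word 0x76 (1110110) has syndrome 5, and hamming_decode(0x76) flips bit 5 back and returns 1101.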

59
Raid Level 2

Bit-level Striping with Hamming (ECC) codes for
error correction
All 7 disk arms are synchronized and move in
unison
Complicated controller
Single access at a time
Tolerates only one error, but with no performance
degradation

(Figure: bits 0-3 on four data disks; Hamming bits 4-6 on three ECC disks)
60
Raid Level 3

Use a parity disk
Each bit on the parity disk is a parity function
of the corresponding bits on all the other disks
A read accesses all the data disks
A write accesses all data disks plus the parity
disk
On disk failure, read remaining disks plus parity
disk to compute the missing data

Single parity disk can be used to detect and
correct errors
(Figure: bits 0-3 on four data disks; parity on a separate parity disk)
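Because parity is an XOR, the contents of a failed disk are the XOR of the parity with all surviving data disks. A byte-wise sketch, assuming the failed disk's index is known (the disk reports its own failure), with hypothetical names compute_parity and rebuild:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* parity[j] = data[0][j] ^ data[1][j] ^ ... ^ data[n-1][j] */
void compute_parity(const uint8_t *const *data, size_t n, size_t len,
                    uint8_t *parity) {
    memset(parity, 0, len);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < len; j++)
            parity[j] ^= data[i][j];
}

/* Rebuild the contents of failed disk `lost` from parity and the survivors. */
void rebuild(const uint8_t *const *data, size_t n, size_t len,
             const uint8_t *parity, size_t lost, uint8_t *out) {
    assert(lost < n);
    memcpy(out, parity, len);
    for (size_t i = 0; i < n; i++)
        if (i != lost)
            for (size_t j = 0; j < len; j++)
                out[j] ^= data[i][j];
}
```

Parity alone only detects that something is wrong; correction works here because a failed disk identifies itself, so the missing position is known.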
61
Raid Level 4

Combines Level 0 and 3: block-level parity with
Stripes
A small read accesses only the disk holding the block
A small write updates one data disk plus the parity
disk (new parity computed from all data disks, or from old data and old parity)
Heavy load on the parity disk

(Figure: stripes 0-11 on four data disks, as in RAID 0; parity blocks P0-3, P4-7, P8-11 on a dedicated parity disk)
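The read-modify-write form of a small write explains the parity bottleneck: with new_parity = old_parity ^ old_data ^ new_data, only one data disk and the parity disk are touched, but in RAID 4 every such write hits the same parity disk. A sketch with a hypothetical update_parity helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small-write parity update: after reading the old data block and the old
   parity block, new_parity = old_parity ^ old_data ^ new_data. */
void update_parity(uint8_t *parity, const uint8_t *old_data,
                   const uint8_t *new_data, size_t len) {
    assert(parity && old_data && new_data);
    for (size_t j = 0; j < len; j++)
        parity[j] ^= old_data[j] ^ new_data[j];
}
```

That is two reads and two writes per small write; RAID 5 spreads the parity blocks over all disks so these updates no longer queue on a single drive.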
62
Raid Level 5

Block Interleaved Distributed Parity
Like parity scheme, but distribute the parity
info over all disks (as well as data over all
disks)
Better read performance, large write performance
Reads can outperform SLEDs and RAID-0

(Figure: stripes 0-11 and parity blocks P0-3, P4-7, P8-11 distributed over five data and parity disks, with the parity block on a different disk in each row)
63
Raid Level 6

Level 5 with a second, independent parity block per stripe
Can tolerate two failures
What are the odds of having two concurrent
failures ?
May outperform Level-5 on reads, slower on writes

64
RAID 01 and 10
65
Stable Storage

Handling disk write errors
Write lays down bad data
Crash during a write corrupts original data
What do we want to achieve? Stable storage:
When a write is issued, the disk either correctly
writes data, or it does nothing, leaving existing
data intact
Model
An incorrect disk write can be detected by
looking at the ECC
It is very rare that same sector goes bad on
multiple disks
CPU is fail-stop

66
Approach

Use 2 identical disks
corresponding blocks on both drives are the same
3 operations:
Stable write: write the 1st disk, retrying until successful, then
write the 2nd disk
Stable read: read from the 1st disk; if ECC error, then
try the 2nd
Crash recovery: scan corresponding blocks on both
disks
If one block is bad, replace it with the good one
If both are good but differ, replace the block in the 2nd with the
one in the 1st
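A simulation of the two-disk scheme, under loud assumptions: each "disk" is one in-memory block, a toy FNV-1a checksum stands in for the ECC, and writes always succeed, so the retry loop of a real stable write is omitted. All names (struct block, stable_write, stable_read, recover) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* Simulated disk block: data plus a checksum standing in for the ECC. */
struct block { uint8_t data[BLOCK]; uint32_t ecc; };

static uint32_t checksum(const uint8_t *d) {    /* toy FNV-1a, not a real ECC */
    uint32_t h = 2166136261u;
    for (int i = 0; i < BLOCK; i++) { h ^= d[i]; h *= 16777619u; }
    return h;
}

static bool block_ok(const struct block *b) { return b->ecc == checksum(b->data); }

static void disk_write(struct block *b, const uint8_t *d) {
    memcpy(b->data, d, BLOCK);
    b->ecc = checksum(d);
}

/* Stable write: complete the 1st copy before touching the 2nd. */
void stable_write(struct block *d1, struct block *d2, const uint8_t *d) {
    disk_write(d1, d);
    disk_write(d2, d);
}

/* Stable read: use the 1st copy unless its ECC check fails. */
const uint8_t *stable_read(const struct block *d1, const struct block *d2) {
    return block_ok(d1) ? d1->data : (block_ok(d2) ? d2->data : NULL);
}

/* Crash recovery for one pair of corresponding blocks. */
void recover(struct block *d1, struct block *d2) {
    bool ok1 = block_ok(d1), ok2 = block_ok(d2);
    if (ok1 && !ok2)
        *d2 = *d1;
    else if (!ok1 && ok2)
        *d1 = *d2;
    else if (ok1 && ok2 && memcmp(d1->data, d2->data, BLOCK) != 0)
        *d2 = *d1;                 /* both good but different: 1st wins */
}
```

If a crash hits between the two disk_write calls, the 1st copy holds the new data and the 2nd the old; recovery copies the 1st over the 2nd, so the write takes effect completely. If the crash corrupts the 1st copy mid-write, its ECC fails and the old data from the 2nd is restored, so the write has no effect.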

67
CD-ROMs

Spiral makes 22,188 revolutions around the disk
(approx. 600/mm).
Unwound, it would be 5.6 km long. Rotation rate varies from 530 rpm (inner tracks) to
200 rpm (outer tracks)

68
CD-ROMs

Logical data layout on a CD-ROM
