Title: IO System: Disks
I/O System: Disks
UNIVERSITY of WISCONSIN-MADISON, Computer Sciences Department
CS 537 Introduction to Operating Systems
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Haryadi S. Gunawi
- Topics in this lecture
  - Disk internals
  - Disk layout
  - Disk scheduling
  - Device drivers
  - More disk internals
I/O System
[Figure: the I/O stack, top to bottom: user processes → file system → I/O system → device driver (these three layers are part of the OS) → device controller → disk]
Disk Terminology
[Figure: disk anatomy labeling the spindle, read/write head, platter, top and bottom surfaces, sector, track, and cylinder]
ZBR (Zoned Bit Recording): more sectors on outer tracks
Terminology
- A disk is a stack of one or more platters
  - A CD or floppy disk has only one platter
- A platter can have two active surfaces (top and bottom)
- Each surface is divided into concentric tracks
  - Usually tens of thousands of tracks per surface
- A track is divided into sectors
  - Outer tracks have more sectors
- A sector is the unit of disk reads and writes
  - A sector is a chunk of bytes (usually 512 bytes)
- A cylinder is the set of tracks that line up vertically across all surfaces
  - Track 1 from all surfaces makes up a cylinder
- A disk head (carried by the disk arm) reads sectors from a surface
  - One disk head per active surface
- The disk head can only move from one track to another
  - In real life, the disk arm pivots sideways (like a pendulum) rather than moving in and out
- To read the requested sector
  - The disk head must move to the correct track
  - Then wait until the requested sector rotates under the disk head
From file to bytes
- From the user down to the bits on the disk
  - (Note: the disk is spinning all the time)
- User says: read the 1st byte of fileA
- FS knows where fileA is located on the disk (i.e., the block number)
  - (Of course, this location information is also obtained from the disk)
- FS uses block numbers
  - A block may consist of multiple sectors
  - Common block size: 4 KB (8 sectors)
  - Common sector size: 512 bytes
- FS sends a low-level request to the block layer (still inside the OS): "I want to read block X"
- Block layer converts block X to sector Y
- Device driver
  - Converts sector Y into a track, platter, rotational position, etc.
  - Sends commands to the device controller
- Device controller
  - Moves the head to the desired track (if it is not already on the right track)
  - Decides which head to turn on (which platter surface)
  - Waits for the rotation until the sector is exactly under the head
[Figure: user process (read fileA) → file system (fileA → block number) → I/O system (block number → sector number) → device driver (sector number → platter, track, position, etc.) → device controller → disk]
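The first two translation steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions: `file_block_map` is a hypothetical stand-in for the FS metadata that records where each of fileA's blocks lives, and the block layer simply multiplies by 8 sectors per 4 KB block.

```python
from typing import Tuple

BLOCK_SIZE = 4096                              # common FS block size (4 KB)
SECTOR_SIZE = 512                              # common sector size
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE  # 8 sectors per block

# Hypothetical per-file block map: entry i is the disk block number holding
# the file's i-th 4 KB block. A real FS reads this information from the disk.
file_block_map = {"fileA": [120, 121, 987]}


def file_offset_to_block(name: str, offset: int) -> int:
    """File system step: byte offset within a file -> disk block number."""
    return file_block_map[name][offset // BLOCK_SIZE]


def block_to_sector(block: int) -> int:
    """Block layer step: block X -> first sector Y of that block."""
    return block * SECTORS_PER_BLOCK


def locate_byte(name: str, offset: int) -> Tuple[int, int]:
    """Return (block number, sector number) holding the given byte of the file."""
    block = file_offset_to_block(name, offset)
    sector = block_to_sector(block) + (offset % BLOCK_SIZE) // SECTOR_SIZE
    return block, sector


print(locate_byte("fileA", 0))     # (120, 960): the 1st byte of fileA
print(locate_byte("fileA", 9192))  # (987, 7897): 3rd block, 2nd sector within it
```

The remaining steps (sector to platter and track, then head movement) happen in the driver and controller and are not modeled here.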
Disk Performance
- How long does it take to read or write n sectors?
  - $T_{\text{I/O}}(n) = T_{\text{positioning}} + T_{\text{transfer}}(n)$
  - $T_{\text{positioning}} = T_{\text{seek}} + T_{\text{rotational}}$
  - $T_{\text{transfer}}(n) = \frac{n}{\text{RPS} \times \text{sectors/track}}$, where $\text{RPS} = \text{RPM}/60$ is rotations per second
- Seek: time to position the head over the destination cylinder
  - Today: on the order of 10 ms
- Rotation: wait for the sector to rotate underneath the head
  - Today: max 15,000 RPM (250 Hz, so 4 ms per rotation in the worst case)
  - Spinning is faster than moving the head from one track to another
- Transfer rate
  - Today: max 114 MB/s
- Two classes of disk
  - IDE/ATA: Integrated Drive Electronics / Advanced Technology Attachment (low-end / nearline)
  - SCSI: Small Computer System Interface (high-end / enterprise)
  - The numbers above are current SCSI performance (IDE/ATA is worse)
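The formulas above can be turned into a small estimator. This is a sketch, assuming the average rotational delay is half a rotation and the n sectors are contiguous on one track; `access_time_ms` and the 1,024 sectors/track in the example are illustrative, not parameters of a real drive.

```python
def access_time_ms(n_sectors: int, avg_seek_ms: float, rpm: int, sectors_per_track: int) -> float:
    """Estimate the time to read n contiguous sectors on one track (simplified model)."""
    rotation_ms = 60_000 / rpm                  # one full rotation in ms
    avg_rotational_delay_ms = rotation_ms / 2   # assumption: on average, wait half a rotation
    positioning_ms = avg_seek_ms + avg_rotational_delay_ms
    transfer_ms = n_sectors / sectors_per_track * rotation_ms
    return positioning_ms + transfer_ms


# 15,000 RPM -> 4 ms per rotation; assumed 10 ms seek and 1,024 sectors/track.
print(access_time_ms(8, avg_seek_ms=10, rpm=15_000, sectors_per_track=1024))  # 12.03125
```

Even for 8 sectors, positioning (12 ms) dwarfs transfer (about 0.03 ms), which is why the rest of the lecture focuses on reducing seeks and rotation.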
Disk Calculations
[Figure: a surface with tracks t1 through t64K; each track holds 1K sectors (s1, s2, s3, ...) of 512 bytes]
- Example disk
  - Surfaces: 4
  - Tracks/surface: 64K
  - Sectors/track: 1K (assumption: the same on every track)
  - Bytes/sector: 512
  - RPM: 7200
    - 120 rotations/sec, so (1/120) s ≈ 8.3 ms per rotation
  - Seek cost: 4 ms to 16 ms
  - Assume zero cost for transferring data
- Capacity?
  - Surfaces × tracks/surface × sectors/track × bytes/sector
  - 4 × 64K × 1K × 512 B = 128 GB
- How many disk heads? 4
- How many cylinders? 64K
- How many sectors/cylinder?
  - Sectors/track × tracks/cylinder
  - 1K × 4 = 4K sectors
- What is the maximum transfer rate (bandwidth)?
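The calculations above can be checked mechanically. A sketch using the slide's example disk (with its assumption of 1K sectors on every track); the peak-bandwidth line assumes one track's worth of bytes per rotation and that head and track switches cost nothing.

```python
KB = 1024

# Example disk from the slide
surfaces = 4
tracks_per_surface = 64 * KB
sectors_per_track = 1 * KB      # assumed identical on every track (no ZBR)
bytes_per_sector = 512
rpm = 7200

capacity_bytes = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector
heads = surfaces                               # one head per active surface
cylinders = tracks_per_surface
sectors_per_cylinder = sectors_per_track * surfaces

rotations_per_sec = rpm / 60                   # 120
rotation_ms = 1000 / rotations_per_sec         # ~8.3 ms
bytes_per_track = sectors_per_track * bytes_per_sector
# Peak rate while streaming a single track; ignores head/track switch costs.
peak_bytes_per_sec = bytes_per_track * rotations_per_sec

print(capacity_bytes // 2**30, "GB")           # 128 GB
print(heads, cylinders, sectors_per_cylinder)  # 4 65536 4096
print(round(rotation_ms, 1), "ms/rotation")    # 8.3 ms/rotation
print(peak_bytes_per_sec / 2**20, "MB/s")      # 60.0 MB/s
```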
Disk Calculations (2)
- Average positioning time for a random request?
  - Avg. seek time + avg. rotational time (half a rotation) = 10 ms + 4 ms = 14 ms
- Some parameters
  - Full rotational time = 8.3 ms (say just 8 ms)
- Time and bandwidth for random requests (every sector at a random location) of size
  - 4 KB = 8 sectors: 8 × 14 ms = 112 ms
  - 128 KB = 256 sectors: 256 × 14 ms = 3,584 ms
  - 1 MB? = 2048 sectors: 2048 × 14 ms = 28,672 ms
- For sequential requests
  - How many bytes per track? 1K sectors × 512 B = 512 KB
  - 4 KB? 8 sectors
    - 1 positioning time = 14 ms
    - Transfer time = (8 sectors / 1K sectors) × 1 full rotational time = 0.0625 ms
  - 128 KB? 256 sectors
    - Still fits in a track, hence 1 positioning time = 14 ms
    - Transfer time = (256 sectors / 1K sectors) × 1 full rotational time = 256/1024 × 8 ms = 2 ms
  - 1 MB?
    - Does not fit in a track; it fits in 2 tracks (at best) or 3 tracks (at worst). Say it fits in 2 tracks.
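A sketch of the random-versus-sequential arithmetic above, using the slide's rounded numbers (14 ms positioning, 8 ms per rotation, 1K sectors of 512 bytes per track). "Random" is modeled as every sector paying a full positioning time, and track switches are assumed free, so treat results for multi-track requests (such as 1 MB) as optimistic.

```python
POSITION_MS = 14.0        # avg seek (10 ms) + avg rotational delay (4 ms)
ROTATION_MS = 8.0         # full rotation, rounded down from 8.3 ms
SECTORS_PER_TRACK = 1024
SECTOR_BYTES = 512


def random_ms(n_sectors: int) -> float:
    """Every sector is at a random location: pay full positioning for each one."""
    return n_sectors * POSITION_MS


def sequential_ms(n_sectors: int) -> float:
    """Position once, then stream; assumes track switches are free."""
    return POSITION_MS + n_sectors / SECTORS_PER_TRACK * ROTATION_MS


def bandwidth_mb_per_s(n_sectors: int, elapsed_ms: float) -> float:
    return n_sectors * SECTOR_BYTES / 2**20 / (elapsed_ms / 1000)


for size_kb in (4, 128):
    n = size_kb * 1024 // SECTOR_BYTES
    r, s = random_ms(n), sequential_ms(n)
    print(f"{size_kb} KB: random {r} ms ({bandwidth_mb_per_s(n, r):.3f} MB/s), "
          f"sequential {s} ms ({bandwidth_mb_per_s(n, s):.3f} MB/s)")
```

For 128 KB this gives 3,584 ms randomly versus 16 ms sequentially, a 224x difference.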
Disk Abstraction
- The OS does not know about the internal complexity of the disk
  - The disk exports an array of Logical Block Numbers (LBNs)
  - The disk maps internal sectors to LBNs
- How should the disk map internal sectors to LBNs?
  - Goal: sequential accesses (i.e., contiguous LBNs) should achieve the best performance
  - Approaches
    - Traditional ordering (left picture)
    - Serpentine ordering (right picture)
    - Start from the outermost cylinder
[Figure: the FS reads block 5 from the disk's linear LBN array (1, 2, 3, ..., 40); the left and right pictures show where consecutive LBNs land on the tracks under traditional and serpentine ordering]
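The two orderings can be sketched as functions from an LBN to a (cylinder, surface, sector) location. Both are simplified, hypothetical models: real drives do not publish their layouts and also apply ZBR, skew, and remapping. The serpentine variant assumes tracks are grouped into bands of cylinders, with consecutive LBNs filling one surface across the band (outer to inner) before moving to the next surface; some descriptions also reverse direction on alternate surfaces, which is omitted here. LBNs are 0-based.

```python
from typing import NamedTuple


class Location(NamedTuple):
    cylinder: int  # 0 = outermost cylinder
    surface: int
    sector: int


def traditional(lbn: int, surfaces: int, sectors_per_track: int) -> Location:
    """Fill a track, then the same cylinder on the next surface, then step inward."""
    track, sector = divmod(lbn, sectors_per_track)
    cylinder, surface = divmod(track, surfaces)
    return Location(cylinder, surface, sector)


def serpentine(lbn: int, surfaces: int, sectors_per_track: int, band: int) -> Location:
    """Assumed variant: fill `band` cylinders on one surface (outer to inner), then the next surface."""
    track, sector = divmod(lbn, sectors_per_track)
    band_index, within_band = divmod(track, band * surfaces)
    surface, offset = divmod(within_band, band)
    return Location(band_index * band + offset, surface, sector)


# Toy geometry (assumed): 2 surfaces, 10 sectors/track, bands of 100 cylinders.
for lbn in (0, 10, 1000):
    print(lbn, traditional(lbn, 2, 10), serpentine(lbn, 2, 10, band=100))
```

With this toy geometry, LBN 10 lands on the other surface of cylinder 0 under traditional ordering (a head switch) but on cylinder 1 of the same surface under serpentine ordering (a short seek), and LBN 1000 is the first to use surface 1 in the serpentine layout.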
Disk Scheduling
- Goal: minimize positioning time
  - Performed by both the OS and the disk itself. Why?
- FCFS: schedule requests in the order received
  - Ex: tracks 100, 90, 110, 80, 120, 70, 130 (head starts at track 100)
  - Service sequence = incoming sequence
  - Seek distance = 10 + 20 + 30 + 40 + 50 + 60 = 210 tracks
  - Advantage: fair
  - Disadvantage: high seek cost and rotational delay
- Shortest seek time first (SSTF)
  - Handle the nearest cylinder next
  - Ex: tracks 100, 90, 110, 80, 120, 70, 130
  - Service sequence: 100, 110, 120, 130, 90, 80, 70
  - Seek distance = 10 + 10 + 10 + 40 + 10 + 10 = 90 tracks
  - Advantage: reduces arm movement (seek time)
  - Disadvantage: unfair, can starve some requests
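FCFS and SSTF from the example above, as a minimal Python sketch. It assumes the head starts at track 100 (which is what makes the slide's seek totals come out) and breaks SSTF ties toward the higher track, since the slide serves 110 before 90 without stating a tie-break rule.

```python
from typing import List


def total_seek(order: List[int], start: int) -> int:
    """Sum of track distances the head travels while serving requests in order."""
    distance, pos = 0, start
    for track in order:
        distance += abs(track - pos)
        pos = track
    return distance


def fcfs(requests: List[int]) -> List[int]:
    return list(requests)  # serve in arrival order


def sstf(requests: List[int], start: int) -> List[int]:
    pending, pos, order = list(requests), start, []
    while pending:
        # Nearest track next; ties go to the higher track (an assumption that
        # reproduces the slide's choice of 110 over 90).
        nxt = min(pending, key=lambda t: (abs(t - pos), -t))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order


requests = [100, 90, 110, 80, 120, 70, 130]
print(total_seek(fcfs(requests), start=100))   # 210
order = sstf(requests, start=100)
print(order, total_seek(order, start=100))     # [100, 110, 120, 130, 90, 80, 70] 90
```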
Disk Scheduling
- SCAN (elevator): move from the outer cylinder in, then back out again
  - Ex: tracks 100, 90, 110, 80, 120, 70, 130
  - Service sequence: 100, 110, 120, 130, MAX (e.g., 300), 90, 80, 70, 0
- LOOK: similar to SCAN, except the head turns around at the last request instead of the disk edge
  - Ex: tracks 100, 90, 110, 80, 120, 70, 130
  - Service sequence: 100, 110, 120, 130, 90, 80, 70
  - Seek distance (same as SSTF): 90 tracks
  - Advantage: fairer to requests, with performance similar to SSTF
- Circular LOOK (C-LOOK)
  - Move the head only from the outer cylinder inward (then start over)
  - Why? Reduce the maximum delay for tracks at the edge
  - Ex: tracks 100, 90, 110, 80, 120, 70, 130
  - Service sequence: 100, 110, 120, 130, 70, 80, 90
  - Seek distance = 10 + 10 + 10 + 60 + 10 + 10 = 110 tracks
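LOOK and C-LOOK can be sketched the same way, again assuming the head starts at track 100 and is initially moving toward higher track numbers. As on the slide, C-LOOK's jump back from 130 to 70 is counted in the seek distance. `total_seek` is repeated so the block runs on its own.

```python
from typing import List


def total_seek(order: List[int], start: int) -> int:
    distance, pos = 0, start
    for track in order:
        distance += abs(track - pos)
        pos = track
    return distance


def look(requests: List[int], start: int) -> List[int]:
    """Sweep toward higher tracks, then reverse; turn around at the last request, not the edge."""
    up = sorted(t for t in requests if t >= start)
    down = sorted((t for t in requests if t < start), reverse=True)
    return up + down


def c_look(requests: List[int], start: int) -> List[int]:
    """Sweep toward higher tracks only, then jump back to the lowest request and sweep again."""
    up = sorted(t for t in requests if t >= start)
    wrapped = sorted(t for t in requests if t < start)
    return up + wrapped


requests = [100, 90, 110, 80, 120, 70, 130]
for policy in (look, c_look):
    order = policy(requests, start=100)
    print(policy.__name__, order, total_seek(order, start=100))
# look [100, 110, 120, 130, 90, 80, 70] 90
# c_look [100, 110, 120, 130, 70, 80, 90] 110
```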
Disk Scheduling
- Real goal: minimize positioning time
- Trend: rotation time increasingly dominates positioning time
  - Very difficult for the OS to predict
  - ZBR, track and cylinder skew, serpentine layout, bad block remapping, caching, ...
  - The disk controller can calculate positioning time
- Shortest positioning time first (SPTF)
  - Incorporates rotational time
- Technique to prevent starvation
  - Two queues
  - Handle requests in the current queue
  - Add newly arriving requests to the other queue
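A sketch of SPTF with the two-queue batching described above. The cost model is hypothetical: `seek_ms` (1 ms settle plus 0.01 ms per track), an 8 ms rotation, the platter at angle 0 at time 0, and transfer time ignored. Only the disk controller knows the real numbers, which is the point of the slide. Requests submitted while a batch is being served go to `pending` and wait for the next batch, so a steady stream of nearby arrivals cannot postpone an old request forever.

```python
from typing import List, NamedTuple

ROTATION_MS = 8.0  # assumed: 7200 RPM rounded to 8 ms per rotation


class Request(NamedTuple):
    name: str
    track: int
    angle: float  # angular position of the sector, as a fraction of a rotation [0, 1)


def seek_ms(distance: int) -> float:
    """Hypothetical seek model: fixed settle cost plus a per-track cost."""
    return 0.0 if distance == 0 else 1.0 + 0.01 * distance


def positioning_ms(head_track: int, now_ms: float, req: Request) -> float:
    """Seek, then wait for the sector to rotate under the head (platter at angle 0 at t=0)."""
    seek = seek_ms(abs(req.track - head_track))
    angle_on_arrival = ((now_ms + seek) / ROTATION_MS) % 1.0
    rotational_wait = ((req.angle - angle_on_arrival) % 1.0) * ROTATION_MS
    return seek + rotational_wait


class TwoQueueSPTF:
    def __init__(self, head_track: int = 0) -> None:
        self.head_track = head_track
        self.now_ms = 0.0
        self.pending: List[Request] = []  # the "other" queue: new arrivals wait here

    def submit(self, req: Request) -> None:
        self.pending.append(req)

    def run_batch(self) -> List[str]:
        """Freeze the pending queue as the current batch and serve it in SPTF order."""
        current, self.pending = self.pending, []
        served = []
        while current:
            best = min(current, key=lambda r: positioning_ms(self.head_track, self.now_ms, r))
            self.now_ms += positioning_ms(self.head_track, self.now_ms, best)
            self.head_track = best.track
            current.remove(best)
            served.append(best.name)
        return served


sched = TwoQueueSPTF(head_track=100)
sched.submit(Request("A", track=100, angle=0.75))  # no seek, but 6 ms of rotation away
sched.submit(Request("B", track=300, angle=0.50))  # 3 ms seek + 1 ms rotation
print(sched.run_batch())  # ['B', 'A']
```

SSTF would serve A first because it needs no seek, but A's sector is still three quarters of a rotation away; SPTF prefers B, which costs 4 ms of positioning versus 6 ms for A.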
Linux Disk Scheduling (options)
- Elevator scheduler (C-LOOK)
- Deadline scheduler
  - Requests also wait in FIFO queues with expiration times
  - Requests whose deadline has expired are serviced first
- Anticipatory scheduling
  - Assume a block request will soon be followed by a nearby one
  - Ex: current queue contains blk 100, 2000
    - read(blk 100) → think time → read(blk 101)
    - Think time is short (processing and memory accesses)
    - Hence, rather than reading blk 2000, WAIT a while
    - If no request arrives, go back to elevator mode
- CFQ: Completely Fair Queueing
  - Other algorithms do not account for the owner of the request
  - Distributes disk time fairly across processes
  - Uses anticipatory and deadline techniques
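The deadline idea can be sketched as an elevator queue plus a FIFO of expiration times. This is a simplified, hypothetical model rather than the Linux implementation: it uses one assumed 500 ms expiry for every request, while the Linux deadline scheduler keeps separate read and write FIFOs with different expiry times and dispatches in batches.

```python
import bisect
from collections import deque
from typing import Deque, List, Optional, Tuple


class DeadlineScheduler:
    """Sketch: C-LOOK order by block number, overridden when the oldest request expires."""

    def __init__(self, expire_ms: float = 500.0) -> None:
        self.expire_ms = expire_ms                     # assumed single expiry for all requests
        self.sorted_blocks: List[int] = []             # pending blocks in ascending order
        self.fifo: Deque[Tuple[float, int]] = deque()  # (deadline, block) in arrival order
        self.head = 0                                  # last block dispatched

    def submit(self, block: int, now_ms: float) -> None:
        bisect.insort(self.sorted_blocks, block)
        self.fifo.append((now_ms + self.expire_ms, block))

    def dispatch(self, now_ms: float) -> Optional[int]:
        if not self.sorted_blocks:
            return None
        deadline, oldest = self.fifo[0]
        if deadline <= now_ms:
            block = oldest                             # expired: serve it first
        else:
            i = bisect.bisect_left(self.sorted_blocks, self.head)
            # C-LOOK: next block at or above the head, else wrap to the lowest one
            block = self.sorted_blocks[i % len(self.sorted_blocks)]
        self.sorted_blocks.remove(block)
        self.fifo.remove(next(e for e in self.fifo if e[1] == block))
        self.head = block
        return block


s = DeadlineScheduler(expire_ms=500)
s.submit(2000, now_ms=0)
s.submit(100, now_ms=0)
s.submit(110, now_ms=400)
s.submit(120, now_ms=400)
print([s.dispatch(t) for t in (450, 600, 610, 620)])  # [100, 2000, 110, 120]
```

Without the deadline, C-LOOK would serve 100, 110, 120, 2000; because block 2000 has waited past its 500 ms deadline by t = 600, it jumps ahead of the nearby blocks.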
Summary
- CPU performance: 8x in 3 years (exponential)
- Disk speed: linear improvement
  - Seek time: 1.2x improvement annually
- I/O performance is critical!
  - I/O has always been the bottleneck
- Hence, I/O goals are
  - Reduce I/O if possible
  - Design a fast I/O mechanism
Device Drivers (Extra)
Device Drivers
- Mechanism: encapsulate the details of a device
  - Example disk drivers: ATA, SCSI, flash disk, iSCSI, etc.
  - The file system is not aware of device details (it only sees LBNs)
- Much of the OS code is in device drivers
  - Drivers are responsible for many of the errors as well
  - 85% of Windows XP crashes are caused by device drivers
- The device driver interacts with the device controller
  - Read status registers, read data
  - Write control registers, provide data for write operations
- How does the device driver access the controller?
  - Special instructions
    - E.g., IN, OUT
    - Valid only in kernel mode; no longer popular
  - Memory-mapped I/O
    - Map a section of the address space to the device for I/O
    - Communication then takes the form of reads/writes to these special memory addresses
    - Protect by placing them in the kernel address space only
    - May map part of the device into the user address space for fast access (e.g., the data path, but not the control path)
Device Drivers: Starting I/O
- Programmed I/O (PIO)
  - The CPU must initiate and watch every byte
  - Disadvantage: large overhead for large transfers
- Direct Memory Access (DMA)
  - Offload work from the CPU to a special-purpose processor responsible for large transfers
  - Example: write a chunk of data from memory to disk
  - We can bypass the CPU so that data is not copied back and forth through the CPU cache
  - Basic procedure
    - CPU: write a DMA command block into main memory
      - Pointer to source and destination address
      - Size of transfer
    - CPU: inform the DMA controller of the address of the command block
    - DMA controller: handles the transfer with the I/O device controller
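The DMA procedure above can be simulated with Python's `ctypes`. The command-block layout (`src_addr`, `dst_addr`, `length`) and the `dma_controller` function are hypothetical stand-ins, the "disk" is a dictionary, and the addresses are process virtual addresses; a real driver would hand the controller physical (bus) addresses, and the copy would be done by DMA hardware rather than by `ctypes.string_at`.

```python
import ctypes


class DMACommandBlock(ctypes.Structure):
    """Hypothetical command-block layout; real DMA engines define their own formats."""
    _fields_ = [
        ("src_addr", ctypes.c_uint64),  # where the data comes from (memory buffer)
        ("dst_addr", ctypes.c_uint64),  # where it goes (here: a disk sector number)
        ("length", ctypes.c_uint64),    # size of the transfer in bytes
    ]


disk = {}  # simulated disk: sector number -> bytes


def dma_controller(command_block_addr: int) -> None:
    """Simulated DMA engine: reads the command block, then performs the copy itself."""
    cmd = DMACommandBlock.from_address(command_block_addr)
    disk[cmd.dst_addr] = ctypes.string_at(cmd.src_addr, cmd.length)


# CPU step 1: build the command block in main memory.
buffer = ctypes.create_string_buffer(b"hello, disk", 4096)
cmd = DMACommandBlock(src_addr=ctypes.addressof(buffer), dst_addr=960, length=len(buffer))

# CPU step 2: tell the DMA controller where the command block lives.
dma_controller(ctypes.addressof(cmd))

print(disk[960][:11], len(disk[960]))  # b'hello, disk' 4096
```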
Device Drivers: When is I/O complete?
- Polling
  - Handshake by setting and clearing flags
    - Controller sets a flag when done
    - CPU repeatedly checks the flag
  - Disadvantage: busy-waiting
    - CPU wastes cycles when the I/O device is slow
    - Must be attentive to the device, or could lose data
- Interrupts: handle asynchronous events
  - Controller asserts the interrupt request line when done
  - CPU jumps to the appropriate interrupt service routine (ISR)
    - Interrupt vector: table of ISR addresses
    - Indexed by interrupt number
  - Low-priority interrupts are postponed until higher-priority ones finish
  - Combine with DMA: do not interrupt the CPU for every byte
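A simulation of the two completion styles. Everything here is hypothetical: `SimulatedController` fakes a memory-mapped status register with a `bytearray`, the done bit and register offset are made up, and IRQ 14 is just an example key in the vector table. The polling loop shows the cost the slide warns about: every check before the device finishes is wasted CPU work.

```python
import struct

STATUS_REG = 0   # hypothetical offset of a 32-bit status register
DONE = 0x1       # hypothetical "done" bit the controller sets


class SimulatedController:
    """Stands in for a device whose registers are memory-mapped (not a real driver API)."""

    def __init__(self, busy_reads: int) -> None:
        self.regs = bytearray(4)      # backing bytes for the mapped register window
        self.busy_reads = busy_reads  # reads that happen before the device finishes

    def read32(self, offset: int) -> int:
        # Each call models one load from the mapped status register.
        if self.busy_reads > 0:
            self.busy_reads -= 1
        else:
            struct.pack_into("<I", self.regs, offset, DONE)  # controller sets the flag
        return struct.unpack_from("<I", self.regs, offset)[0]

    def write32(self, offset: int, value: int) -> None:
        struct.pack_into("<I", self.regs, offset, value)


def poll_until_done(ctrl: SimulatedController) -> int:
    """Busy-wait on the done flag; returns how many times the CPU checked."""
    checks = 0
    while True:
        checks += 1
        if ctrl.read32(STATUS_REG) & DONE:
            ctrl.write32(STATUS_REG, 0)  # CPU clears the flag to complete the handshake
            return checks


# Interrupt alternative: a vector table of ISRs indexed by interrupt number.
def disk_isr() -> str:
    return "disk I/O complete"


interrupt_vector = {14: disk_isr}  # hypothetical IRQ number for the disk


def raise_interrupt(irq: int) -> str:
    return interrupt_vector[irq]()  # the CPU "jumps" to the ISR found in the table


print(poll_until_done(SimulatedController(busy_reads=1000)))  # 1001 checks, 1000 wasted
print(raise_interrupt(14))                                     # disk I/O complete
```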
More Disk Internals (Extra)
Positioning
- The drive's servo system keeps the head on track
  - How does the disk head know where it is?
  - Platters are not perfectly aligned and tracks are not perfectly concentric (runout), so it is difficult to stay on track
  - More difficult as disk density increases
    - More bits per inch (BPI), more tracks per inch (TPI)
- Use servo bursts
  - Record placement information every few (3-5) sectors
  - When the head crosses a servo burst, figure out the location and adjust as needed
Buffering
- Disks contain internal memory (2 MB to 16 MB) used as a cache
- Read-ahead: track buffer
  - Read the contents of the entire track into memory during the rotational delay
- Write caching with volatile memory
  - Immediate reporting: claim data is written to disk when it is not
  - Data could be lost on power failure
  - Use only for user data, not file system metadata
- Command queueing
  - Have multiple outstanding requests to the disk
  - The disk can reorder (schedule) requests for better performance
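A sketch of a read-ahead track buffer: on a miss, the whole track is read into the cache, so later reads of neighboring sectors hit in memory. The sizes, the LRU eviction policy, and `_read_track_from_media` are assumptions for illustration; drives do not document their cache policies.

```python
from collections import OrderedDict
from typing import List


class TrackBuffer:
    """Read-ahead track buffer with LRU eviction (sizes and policy are assumptions)."""

    def __init__(self, sectors_per_track: int, capacity_tracks: int) -> None:
        self.sectors_per_track = sectors_per_track
        self.capacity_tracks = capacity_tracks
        self.tracks = OrderedDict()  # track number -> sector data, least recently used first
        self.hits = 0
        self.misses = 0

    def _read_track_from_media(self, track: int) -> List[str]:
        # Hypothetical stand-in for reading a whole track during one rotation.
        first = track * self.sectors_per_track
        return [f"data@{s}" for s in range(first, first + self.sectors_per_track)]

    def read_sector(self, sector: int) -> str:
        track, index = divmod(sector, self.sectors_per_track)
        if track in self.tracks:
            self.hits += 1
            self.tracks.move_to_end(track)          # mark as most recently used
        else:
            self.misses += 1
            self.tracks[track] = self._read_track_from_media(track)
            if len(self.tracks) > self.capacity_tracks:
                self.tracks.popitem(last=False)     # evict the least recently used track
        return self.tracks[track][index]


buf = TrackBuffer(sectors_per_track=1024, capacity_tracks=4)
for sector in range(8):      # a sequential 4 KB read, sector by sector
    buf.read_sector(sector)
print(buf.hits, buf.misses)  # 7 1
```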
Reliability
- Disks fail more often...
  - When continuously powered on
  - With heavy workloads
  - Under high temperatures
- How do disks fail?
  - The whole disk can stop working (e.g., the motor dies)
  - Transient problem (e.g., cable disconnected)
  - Individual sectors can fail (e.g., head crash or scratch)
    - Data can be corrupted, or a block can be unreadable/unwritable
- Disks can internally fix some sector problems
  - ECC (error correction code): detect/correct bit flips
  - Retry sector reads and writes: try 20-30 different offset and timing combinations for the heads
  - Remap sectors: do not use bad sectors in the future
- How does this impact the performance contract?
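Sector remapping can be sketched as an indirection table from logical sectors to spare sectors. The spare-area placement (right after the last regular sector) and the `RemappingDisk` class are hypothetical; real drives decide internally where spares live.

```python
class RemappingDisk:
    """Sketch of bad-sector remapping into a spare area (layout assumed)."""

    def __init__(self, total_sectors: int, spare_sectors: int) -> None:
        # Spares are assumed to sit right after the last regular sector.
        self.spares = list(range(total_sectors, total_sectors + spare_sectors))
        self.remap = {}  # logical sector -> spare physical sector
        self.media = {}  # physical sector -> stored data

    def physical(self, sector: int) -> int:
        return self.remap.get(sector, sector)

    def mark_bad(self, sector: int) -> None:
        """Called once ECC and retries give up: never use this physical sector again."""
        if not self.spares:
            raise OSError("no spare sectors left")
        self.remap[sector] = self.spares.pop(0)

    def write(self, sector: int, data: bytes) -> None:
        self.media[self.physical(sector)] = data

    def read(self, sector: int):
        return self.media.get(self.physical(sector))


disk = RemappingDisk(total_sectors=1000, spare_sectors=10)
disk.mark_bad(42)
disk.write(42, b"payload")
print([disk.physical(s) for s in (41, 42, 43)])  # [41, 1000, 43]
print(disk.read(42))                              # b'payload'
```

The output hints at one answer to the question on the slide: after remapping, sectors 41, 42, and 43 are no longer physically adjacent, so a "sequential" read of them must seek to the spare area and back.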