Title: Disk fundamentals
The virtual layering
- Virtual layering of the disk storage system, from the hardware up:
- Disk controller firmware: controller chips or a card that map the physical disk geometry for different drive brands and models
- BIOS: low-level functions to read/write sectors or format tracks
- OS API: services to open/close files, set file properties, and read/write files
Virtual levels of disk access
Common to all systems
- Physical partitioning of data
- Access to data at the file level
- Map filenames to physical locations
Hardware level
- Platters
- Sides
- Tracks
- Cylinders
- Sectors
OS level
- The OS-level view of the disk is in terms of partitions, directories, and files
Assembly access to disk
- Direct disk access is readily available through the BIOS under MS-DOS and Windows 95/98/ME. It can be used to:
- Store and retrieve data in a special format (such as Hamming or Huffman codes)
- Recover lost data
- Perform diagnostics
- Under NT, XP, or Windows 7 you must use the Win32 API for disk manipulation, or write device drivers with high privilege
Tracks, cylinders, sectors
- A disk is made up of multiple platters
- The platters are attached to a spindle that rotates at constant speed
- Above the surface of each platter is a read/write (r/w) head that records magnetic pulses
- The heads move in or out as a group
- See text sketch p 465
Tracks, cylinders, sectors
- The surface of a disk is formatted into invisible concentric bands called tracks, where data is stored magnetically.
- A disk will have thousands of tracks.
- Moving the r/w head from one track to another is called seeking.
- Rotational latency (not mentioned in the text) is the time it takes a particular sector to rotate around under the head.
- Seek time is one performance measure for a disk.
- Rotational speed (RPM) is another performance measure; 7200 RPM is typical.
- The outermost track is track 0, and track numbers increase as you move toward the center.
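Rotational latency is usually averaged as half a revolution: once the seek completes, the target sector is, on average, half a turn away. The minimal sketch below assumes a constant rotational speed and ignores seek and controller time. The function names are illustrative.

```python
def revolution_ms(rpm: float) -> float:
    """Time for one full revolution, in milliseconds."""
    return 60_000 / rpm

def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a sector to reach the head: half a revolution."""
    return revolution_ms(rpm) / 2

for rpm in (5400, 7200, 10_000):
    print(f"{rpm:>6} RPM: {avg_rotational_latency_ms(rpm):.2f} ms average latency")
```

At 7200 RPM one revolution takes about 8.33 ms, so the average latency is about 4.17 ms.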
Tracks, cylinders, sectors
- All tracks readable from a given r/w head position together form a cylinder.
- A file would typically be stored on disk using adjacent cylinders. This reduces seek time.
- A sector is a 512-byte portion of a track.
- Physical sectors are magnetically marked at the factory using low-level formatting. Their size does not change regardless of the OS used. A hard disk may have 63 or more sectors per track.
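Each cylinder contributes one track per head, so raw capacity follows directly from the geometry. The sketch below assumes a fixed number of sectors per track (no zoned recording) and 512-byte sectors. The 615/4/17 geometry is only an illustrative value, not taken from the slides.

```python
SECTOR_BYTES = 512  # standard sector size described above

def chs_capacity(cylinders: int, heads: int, sectors_per_track: int,
                 sector_bytes: int = SECTOR_BYTES) -> int:
    """Raw capacity in bytes of a drive with a fixed CHS geometry."""
    tracks = cylinders * heads            # one track per head in each cylinder
    return tracks * sectors_per_track * sector_bytes

size = chs_capacity(615, 4, 17)           # illustrative early-drive geometry
print(size, "bytes =", round(size / 2**20, 1), "MiB")
```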
Photo of hard disk with reflective platter visible
A platter from a 5.25" hard disk, with 20 concentric tracks drawn over the surface. Each track is divided into 16 imaginary sectors.
Figure 1
Sectors and tracks
- A sector is the basic unit of data storage on a hard disk. The term "sector" comes from the mathematical term for a pie-shaped angular section of a circle, bounded on two sides by radii and on the third by the perimeter of the circle (see Figure 1). In its simplest form, a hard disk is composed of groups of predefined sectors that form circles. Each such circle of sectors is a single track, and a group of concentric tracks makes up a single surface of a disk platter. Early hard disks had just a single one-sided platter. Today's hard disks are composed of several platters with tracks on both sides, which together make up the entire hard disk capacity. Early hard disks had the same number of sectors on every track, and in fact the number of sectors per track was fairly standard between models. Advances in drive technology have since allowed the number of sectors per track (SPT) to vary significantly, but more about that later.
More about disks
- When a hard disk is prepared with its default values, each sector can store 512 bytes of data. Without elaborating, a few operating system disk setup utilities permit this 512-byte sector size to be modified. However, 512 is the standard and is found on virtually all hard drives by default. Each sector actually holds more than 512 bytes of information: additional bytes are needed for control structures and for the information necessary to manage the drive, locate data, and perform other functions. Exact sector structure depends on the drive manufacturer and model, but a sector usually contains the following elements:
- ID information: Within each sector a small space is left to identify the sector's number and location. It is used to locate the sector on the disk and to hold status information about the sector itself. For example, a single bit indicates whether the sector has been marked defective and remapped.
- Synchronization fields: These are used internally by the drive controller to guide the read process.
- Data: The actual data in the sector.
- ECC: Error-correcting code used to ensure data integrity.
- Gaps: Often referred to as spacers, gaps separate sector areas and give the controller time to process what it has read before processing additional data.
- Servo information: In addition to the sectors, each of which contains the items above, space on each track is allocated for servo information on drives that use embedded servo technology. Most, if not all, modern drives now employ this technology.
Aside: Zoned Bit Recording
- We would be remiss in our discussion of drive sectors, tracks, and performance without mentioning major improvements such as Zoned Bit Recording. One method used to increase capacity and data access speeds on hard disks is to make better use of the larger, outer tracks of the disk. Early hard disks were extremely primitive, and their controllers could not handle arrangements in which the number of sectors changes from track to track. As a result, every track had the same number of sectors, with the standard set at 17 sectors per track.
- As you can see from our sketch above (Figure 1), tracks are concentric circles, and those on the outside of the platter are much larger in circumference than those closer to the center. There is a limit to how tightly the inner tracks can be packed with bits, so developers packed them as tightly as the technology of the time allowed. They then assigned the same number of sectors to the outer tracks by lowering the bit density there. In effect, the inner tracks were packed so tightly that there was no room for error, while the outer tracks were underutilized: in theory they could hold many more sectors at the same linear bit density as the inner tracks.
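The underutilization can be quantified with a simplified model. Assume, hypothetically, one track per whole unit of radius from `r_inner` to `r_outer`, with the innermost track packed at 17 sectors. A track's length grows in proportion to its radius, so at the same linear density a track's sector count grows in the same proportion. The names and dimensions are illustrative, not real drive data.

```python
def sectors_at_inner_density(spt_inner: int, r_inner: int, r: int) -> int:
    """Sectors a track of radius r could hold at the innermost track's linear density.

    Circumference (2*pi*r) grows linearly with radius, so the number of
    equal-length sectors grows in the same proportion (rounded down).
    """
    return spt_inner * r // r_inner

def compare_layouts(spt_inner: int, r_inner: int, r_outer: int):
    """Sector totals for one surface: fixed SPT vs. density-limited SPT."""
    radii = range(r_inner, r_outer + 1)                  # hypothetical track radii
    fixed = spt_inner * len(radii)                       # same SPT on every track
    possible = sum(sectors_at_inner_density(spt_inner, r_inner, r) for r in radii)
    return fixed, possible

fixed, possible = compare_layouts(17, 20, 40)
print(f"fixed SPT: {fixed} sectors, at inner density: {possible} sectors")
```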
Zoned Bit Recording
- To build larger drives and to improve utilization and performance, drive developers created a technology called zoned bit recording (ZBR), also known as multiple zone recording or simply zone recording. With this technology, tracks are grouped into zones based on their distance from the center of the disk, and each zone is assigned a number of sectors per track. As you move from the innermost part of the disk to the outer edge, you pass through different zones, each containing more sectors per track than the one before. This makes more efficient use of the larger tracks on the outside of the disk. In essence, with ZBR the size (or length) of a sector remains reasonably constant over the entire surface of the disk. This is in stark contrast to very early hard disks without ZBR, whose tracks all held the same fixed number of sectors (17, as noted above) regardless of track size.
- An interesting added benefit of zoned bit recording is that the raw data transfer rate of the disk, also called the media transfer rate (a bit of a misnomer), is considerably higher when reading the outside cylinders than when reading the inside ones. The platters spin at a constant angular velocity, so every track passes under its head once per revolution. Because the outer cylinders hold more sectors per track, more data passes under the heads in each revolution. Without ZBR, the higher linear speed of the outer tracks (at the periphery of the platter) would bring no gain, since they would hold no more data than the tracks at the core of the platter.
- Note that constant angular velocity does not apply to all drive technologies; older CD-ROM drives, for example, used constant linear velocity instead.
- Data is written to the outer tracks of a drive first, so the drive fills from the outside in. The fastest data transfer therefore occurs when the drive is new and its data resides on the outer tracks. Many people benchmark their systems and hard drives when new, make some tweaks and changes, and then rerun the benchmarks weeks or months later. They are often unpleasantly surprised to find that the disk seems to be getting slower. In fact, the disk has probably not changed at all: the second benchmark may simply have run on tracks closer to the center of the disk. Most people who take benchmarking seriously defragment their drives before running tests, because file system fragmentation can also affect benchmark results.
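The transfer-rate effect follows from constant angular velocity: a full track passes the head once per revolution, so the bytes per second scale with the sectors per track. The sketch below uses a hypothetical zone table (real drives publish no such standard values) at 7200 RPM.

```python
def media_transfer_rate(spt: int, rpm: float, sector_bytes: int = 512) -> float:
    """Raw bytes/second from one track: the whole track passes the head each revolution."""
    revolutions_per_second = rpm / 60
    return spt * sector_bytes * revolutions_per_second

# Hypothetical zone table: sectors per track, outermost zone first.
zones = {"outer": 1000, "middle": 750, "inner": 500}
for name, spt in zones.items():
    mb_per_s = media_transfer_rate(spt, 7200) / 1e6
    print(f"{name:>6}: {mb_per_s:.1f} MB/s")
```

Because the spindle speed is the same for every zone, the outer zone in this example delivers twice the raw rate of the inner zone.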
Fragmentation
- Disk storage becomes fragmented over time, just like main memory.
- A fragmented file is not located in contiguous disk sectors. This slows access time.
Translation
- Translation is the process of converting physical geometry into a logical structure.
- The drive itself, or a controller card, performs this operation.
- The OS works with logical (not physical) sector numbers.
Aside: Logical Block Addressing
- Before Logical Block Addressing, all hard drives were accessed via CHS (Cylinder, Head, Sector) or Extended CHS addressing. A location on the drive was specified by its cylinder, head, and sector address; more precisely, the drive was accessed through its "geometry". Extended CHS was a transitional change in the way a drive was accessed, designed to work around the 504 MiB barrier. However, addressing was still done in terms of cylinder, head, and sector numbers, which were then translated one or more times before the drive itself was accessed.
- By contrast, logical block addressing (LBA) is a completely different method of addressing sectors. It was new to the IDE/EIDE interface, but LBA was first developed for SCSI hard drives. With LBA, each sector is assigned a unique sector number instead of being addressed by the drive's cylinder, head, and sector geometry. In essence, LBA addresses the drive's sectors linearly. It begins at sector 1 of head 0, cylinder 0 as LBA 0 and proceeds in sequence to the last physical sector on the drive, which on a standard 540 MB drive, for instance, would be LBA 1,065,456. LBA was new in the ATA-2 specification, but it has always been the one and only addressing mode in SCSI. ATA-2 has since been superseded by later revisions of the AT Attachment specification, such as ATA-7. Note also that, for a given geometry, LBA by itself does not allow you to address more sectors than CHS-style addressing would.
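The linear numbering described above corresponds to the standard CHS-to-LBA formula. Cylinders and heads are numbered from 0, and sectors within a track from 1. In the sketch below, `heads` and `spt` stand for whatever geometry the drive or BIOS reports; the function names are illustrative.

```python
def chs_to_lba(c: int, h: int, s: int, heads: int, spt: int) -> int:
    """Cylinder/head/sector to logical block address (LBA 0 = C0, H0, S1)."""
    if not 1 <= s <= spt:
        raise ValueError("sector numbers run from 1 to spt")
    return (c * heads + h) * spt + (s - 1)

def lba_to_chs(lba: int, heads: int, spt: int):
    """Inverse mapping: logical block address back to (cylinder, head, sector)."""
    c, rem = divmod(lba, heads * spt)
    h, s0 = divmod(rem, spt)
    return c, h, s0 + 1

print(chs_to_lba(1023, 15, 63, heads=16, spt=63))
```

With a 1,024/16/63 geometry, the last sector (cylinder 1023, head 15, sector 63) maps to LBA 1,032,191, matching the figure quoted on the next slide.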
Logical Block Addressing
- To use LBA, it must be supported by both the BIOS and the operating system. Because it is a new method of communicating with the hard drive, the drive itself must also support LBA; all newer hard drives do. We often review other sites to make sure we provide accurate information, and with respect to LBA we came upon a unique but inaccurate statement. One purported authority on computer systems stated that when drives supporting LBA are auto-detected by a BIOS that supports LBA, they will be set up to use that mode. This is inaccurate and misleading, as there is nothing in the BIOS code that will set up your drive to use LBA mode. If you have ever used Fdisk, you may recall that during the drive setup process you are asked whether you want to enable LBA. Hence, it is a function of the operating system, so don't expect your BIOS to somehow mysteriously set up your drive.
- It is true that a drive enabled for LBA is not subject to the 504 MiB drive size barrier, but there is still considerable confusion about Logical Block Addressing and what it does. Many knowledgeable technicians and users believe that LBA addressing itself avoids the 504 MiB barrier, but this is not quite accurate. Logical Block Addressing does not get around the barrier, because it is just another way to address the same geometry. If you were still limited to 1,024 cylinders, 16 heads, and 63 sectors, your logical sectors would still be numbered from 0 through 1,032,191, with the 504 MiB barrier still in place. What does avoid the barrier is that LBA mode automatically enables geometry translation. This translation is required because the operating system calling the BIOS Int 13h routines knows nothing about LBA. It is therefore the translation part of LBA that really gets around the barrier.
- When LBA is enabled, the BIOS enables geometry translation. This translation may be done the same way as in Extended CHS (large) mode, via the drive's geometry, or it may use a different algorithm called LBA-assist translation. It is this translated geometry that is presented to the operating system for use in Int 13h calls. The basic difference between LBA and ECHS lies in how the BIOS translates the parameters of these calls. With ECHS, it translates them from the translated geometry to the drive's logical geometry. With LBA, it translates them from the translated geometry directly into a logical block (sector) number.
- LBA is currently the dominant form of hard disk addressing. When disks reached the 8.4 GB limit of the Int 13h interface in 1998-1999, it became impossible to express the geometry of large hard disks using cylinder, head, and sector numbers, translated or not, while staying within the Int 13h limits of 1,024 cylinders, 256 heads, and 63 sectors. This is one of the reasons that today's hard drives no longer indicate their classical geometry.
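Both barriers come from multiplying out CHS field limits. The 504 MiB barrier arises when the BIOS Int 13h limits (1,024 cylinders, 256 heads, 63 sectors) are combined with the ATA register limits (65,536 cylinders, 16 heads, 255 sectors) by taking the smaller value of each field. The ATA figures are added here as background; the sketch below is only an arithmetic check, not a model of any particular BIOS.

```python
SECTOR = 512
INT13H = (1024, 256, 63)      # BIOS Int 13h: cylinders, heads, sectors per track
ATA = (65536, 16, 255)        # ATA (IDE) CHS register limits

def capacity(geometry) -> int:
    """Bytes addressable with the given (cylinders, heads, sectors) limits."""
    cylinders, heads, spt = geometry
    return cylinders * heads * spt * SECTOR

# Without translation, a CHS address must fit both interfaces at once.
combined = tuple(min(a, b) for a, b in zip(INT13H, ATA))
print(combined, capacity(combined) / 2**20, "MiB")   # the 504 MiB barrier
print(capacity(INT13H) / 1e9, "GB")                  # the ~8.4 GB Int 13h limit
```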
Disk partitioning
- A single hard drive may be partitioned into logical units called partitions or volumes, each represented by a drive letter such as C, D, E, ...
- A partition may be primary or extended, and a drive may contain both types.
- A primary partition can be bootable.
- An extended partition may be further divided into an essentially unlimited number of logical partitions. Each is mapped to a drive letter and cannot be bootable, but each may be formatted with a different file system.
Multiboot systems
- It is common to create multiple primary partitions, each booting a different OS.
- Mathlab is dual boot.
- In industry, you might have primary partitions for development and production.
- Logical partitions hold data. Different operating systems can access the same file systems; both Linux and DOS can read FAT32 disks.
FDISK.exe under MS-DOS
- Creates and removes partitions
- Does not preserve data on the partitions it changes
- Later versions of Windows (Windows 2000 and later) provide a Disk Management utility instead
File systems
- Every OS has some disk management system.
- At the lowest level it manages partitions; at the next level up, files and directories.
- It must keep track of the location, size, and attributes of each file.
FAT (File Allocation Table; see also a later slide)
- Maps logical sectors to clusters (the basic storage unit)
- Maps files and directories to sequences of clusters
- A cluster is the smallest unit of space used by a file, consisting of one or more adjacent disk sectors.
Wikipedia: FAT
- File Allocation Table (FAT) is a file system developed by Microsoft for MS-DOS and was the primary file system for consumer versions of Microsoft Windows up to and including Windows Me. FAT as it applies to flexible/floppy and optical disk cartridges (FAT12 and FAT16 without long file name support) has been standardized as ECMA-107 and ISO/IEC 9293. The file system is partially patented.
- The FAT file system is relatively uncomplicated and is supported by virtually all existing operating systems for personal computers. This ubiquity makes it an ideal format for floppy disks and solid-state memory cards, and a convenient way of sharing data between disparate operating systems installed on the same computer (a dual boot environment).
- The most common implementations have a serious drawback: when files are deleted and new files are written to the media, directory fragments tend to become scattered over the entire media, making reading and writing a slow process. Defragmentation is one solution, but it is often a lengthy process in itself and has to be performed regularly to keep the FAT file system clean.
Wikipedia: NTFS
- NTFS (New Technology File System) is the standard file system of Windows NT, including its later versions Windows 2000, Windows XP, Windows Server 2003, Windows Server 2008, and Windows Vista.
- NTFS replaced Microsoft's previous FAT file system, used in MS-DOS and early versions of Windows. NTFS has several improvements over FAT and HPFS (High Performance File System). These include improved support for metadata, advanced data structures that improve performance, reliability, and disk space utilization, and additional extensions such as security access control lists and file system journaling. The exact specification is a trade secret, although (since NTFS v3.00) it can be licensed commercially from Microsoft through their Intellectual Property Licensing program.
XP disk management tool
Cluster sizes for a 1.25-2 GB volume
- FAT type: FAT16 vs. FAT32
- Cluster size: 32 KiB (FAT16), 4 KiB (FAT32)
- Number of FAT entries: 65,526 (FAT16), 524,208 (FAT32)
- Size of FAT: 128 KiB (FAT16), 2 MiB (FAT32)
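The FAT sizes in the table are roughly the entry count multiplied by the entry width. The sketch below makes simplifying assumptions: one entry per cluster, with reserved entries, the boot area, and the second FAT copy ignored. That is why a full 2 GiB volume gives slightly more entries here (65,536 and 524,288) than the table shows.

```python
def fat_entries(volume_bytes: int, cluster_bytes: int) -> int:
    """Roughly one FAT entry per cluster on the volume (simplified)."""
    return volume_bytes // cluster_bytes

def fat_size_bytes(entries: int, entry_bits: int) -> int:
    """Bytes for one copy of the FAT, rounded up to a whole byte."""
    return (entries * entry_bits + 7) // 8

print(fat_size_bytes(65_526, 16) / 1024, "KiB for FAT16")
print(fat_size_bytes(524_208, 32) / 2**20, "MiB for FAT32")
```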
Clusters used by FAT
- A chain of clusters is referenced by a FAT that keeps track of all clusters used by a file. The pictures show a cluster chain and an example of wasted space.
Figures: a cluster chain whose sectors are spread across cluster 1 and cluster 2; a file occupying clusters with 4096, 4096, and 1000 bytes used
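Because a file always occupies whole clusters, the unused tail of its last cluster is wasted ("slack") space. The sketch below uses the 4096 + 4096 + 1000 byte file from the example and the FAT32 and FAT16 cluster sizes from the earlier table. The function names are illustrative.

```python
def clusters_needed(file_bytes: int, cluster_bytes: int) -> int:
    """A file always occupies whole clusters (ceiling division)."""
    return (file_bytes + cluster_bytes - 1) // cluster_bytes

def slack_bytes(file_bytes: int, cluster_bytes: int) -> int:
    """Space allocated to the file but left unused in its last cluster."""
    return clusters_needed(file_bytes, cluster_bytes) * cluster_bytes - file_bytes

size = 4096 + 4096 + 1000                 # two full clusters plus 1000 bytes
for cluster in (4 * 1024, 32 * 1024):     # FAT32 vs. FAT16 cluster sizes
    print(cluster, clusters_needed(size, cluster), slack_bytes(size, cluster))
```

The larger FAT16 cluster fits this file in one cluster but wastes far more space, which is the inefficiency noted for FAT16 on large volumes.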
FAT12
- Still supported by Windows and Linux
- Cluster size is typically 512 bytes, well suited to small files
- Each table entry is 12 bits
- A volume holds fewer than 4087 clusters
FAT16
- The standard file system for hard drives formatted under MS-DOS
- Supported by all versions of Windows and Linux
- Drawbacks:
- Storage is inefficient on volumes over 1 GB due to the large cluster size
- Each table entry is 16 bits, limiting the total number of clusters that can be addressed
- A volume holds between 4087 and 65,526 clusters
- The boot sector has no backup, so a read error can be catastrophic
- No built-in security or individual user permissions
FAT32
- Introduced with the OSR2 (OEM Service Release 2) release of Windows 95 and later refined
- A single file can be up to 4 GB minus 1 byte (2^32 - 1 bytes), since the file size field is 32 bits
- Each table entry is 32 bits (the upper 4 bits are reserved)
- A volume holds from 65,526 up to 268,435,456 clusters
- Windows formatting tools limit new FAT32 volumes to 32 GB
- Smaller clusters than FAT16 on volumes from 1 GB to 8 GB, resulting in less waste
- The boot record has a backup of critical information
NTFS
- Supported under NT, 2000, and XP
- Handles large volumes, possibly spread over multiple drives
- For disks larger than 2 GB, the default cluster size is 4 KB
- Supports Unicode filenames up to 255 characters long
- Permissions
- Built-in encryption
- Change journal can track file revisions
- Disk quotas for individuals or groups of users
- Robust recovery from data errors; automatically repairs errors
- Supports disk mirroring across multiple disks (a mirror is a copy)
ECC and Hamming
- Hamming code is a fairly expensive single-error-correction scheme developed by Richard Hamming at Bell Labs.
- Bits at power-of-two positions (1, 2, 4, 8, ...) store parity for the other bits they check. Bit 1 is parity for all the odd-numbered bits. Bit 2 is parity for bits 3, 6, 7, 10, 11, 14, 15; bit 4 is the correction bit for bits 5, 6, 7, 12, 13, 14, 15; bit 8 covers bits 9 through 15; and so on.
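The coverage rule above has a compact form: the parity bit at position p checks every other position whose binary index has bit p set. A short sketch that regenerates the lists on this slide (positions numbered from 1):

```python
def covered_positions(parity_pos: int, n_bits: int):
    """Positions checked by the parity bit at power-of-two position parity_pos.

    Position i is covered when the binary representation of i has the
    parity_pos bit set; the parity bit itself is excluded.
    """
    return [i for i in range(1, n_bits + 1) if i & parity_pos and i != parity_pos]

for p in (1, 2, 4, 8):
    print(p, covered_positions(p, 15))
```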
Hamming performance
- To send an 8-bit piece of data (an ASCII code, for example), we use correcting bits 1, 2, 4, and 8 (4 bits) plus the 8 data bits, so we package 12 bits. Notice this is a 33% overhead (4 of 12 bits).
- To send 16 bits of data we would use correction bits 1, 2, 4, 8, and 16 for a 21-bit package, where the overhead has dropped to less than 25% (5 of 21 bits).
- We can send up to 247 bits of data using parity bits 1, 2, 4, 8, 16, 32, 64, and 128 (8 correction bits), so the overhead drops to 8/255, about 3.1%.
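The parity-bit counts above follow from a single inequality. With m data bits, r parity bits can distinguish "no error" plus each of the m + r possible error positions only if 2^r >= m + r + 1. A sketch that reproduces the figures on this slide:

```python
def parity_bits_needed(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error-correcting Hamming)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r

def overhead(data_bits: int) -> float:
    """Fraction of the transmitted package taken up by parity bits."""
    r = parity_bits_needed(data_bits)
    return r / (data_bits + r)

for m in (8, 16, 247):
    print(m, parity_bits_needed(m), f"{overhead(m):.1%}")
```

One more data bit than 247 forces a ninth parity bit, which is why 247 is the limit for 8 correction bits.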
Hamming: a 12-bit example
- Compute the correcting bits needed to send 8 bits of data, such as the ASCII character A or 9.
- Assume even parity bits.
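The minimal sketch below works the exercise and shows single-bit correction. It assumes one particular bit ordering: the character's bits go most-significant first into the non-power-of-two positions 3, 5, 6, 7, 9, 10, 11, 12. The text may use a different order, which would change the code word but not the method. When a check fails, the sum of the failing parity positions (the syndrome) gives the position of the flipped bit.

```python
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]   # non-power-of-two positions
PARITY_POSITIONS = [1, 2, 4, 8]

def hamming_encode(byte):
    """12-bit even-parity Hamming code word for an 8-bit value (list index 0 = position 1)."""
    code = [0] * 13                            # code[i] is bit position i; index 0 unused
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (byte >> (7 - i)) & 1      # assumed ordering: MSB goes to position 3
    for p in PARITY_POSITIONS:
        # Even parity over every other position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 13) if i & p and i != p) % 2
    return code[1:]

def hamming_correct(bits):
    """Fix a single flipped bit; return (corrected bits, error position or 0)."""
    code = [0] + list(bits)
    syndrome = 0
    for p in PARITY_POSITIONS:
        if sum(code[i] for i in range(1, 13) if i & p) % 2:   # this check fails
            syndrome += p
    if syndrome > 12:
        raise ValueError("uncorrectable: more than one bit flipped")
    if syndrome:
        code[syndrome] ^= 1                    # the syndrome names the bad position
    return code[1:], syndrome

print("".join(map(str, hamming_encode(ord("A")))))
```

Under this ordering, A (41h) encodes as 100010010001 (positions 1 through 12).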
ECC example
- In bit-interleaved parity, disk 4 might hold parity bits for the data on the other three disks.
- Bits are read simultaneously off the 4 disks. If data is lost on one of the 3 data disks, it can be recovered from the parity disk.
- For example, suppose the data bits read are 1X1 (X marks the lost bit) and the parity bit is 1. The lost bit X must be 1, since with even parity the four bits must contain an even number of 1s.
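With even parity, the parity bit is the XOR of the data bits. Any single lost bit is therefore the XOR of the parity bit and the surviving bits. A minimal sketch, treating each disk's contribution as one bit:

```python
from functools import reduce
from operator import xor

def parity_bit(data_bits):
    """Even parity across the data disks: the XOR of all data bits."""
    return reduce(xor, data_bits, 0)

def recover_lost_bit(surviving_bits, parity):
    """The lost bit equals the XOR of the parity bit and all surviving data bits."""
    return reduce(xor, surviving_bits, parity)

# Slide example: the data disks read 1X1 and the parity disk holds 1.
print(recover_lost_bit([1, 1], 1))
```

The same XOR works byte by byte or block by block, which is how a parity disk can rebuild a whole failed data disk.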
MS-DOS boot record
- See text pg 471
- The root directory is the main directory for a disk volume.
- A directory entry for a file contains the filename, size, attributes, and starting cluster number.
Directory trees
- FAT and NTFS have root directories containing the primary list of files on the disk.
- A directory may contain subdirectories, forming a tree.
Directory trees
Figure: an example directory tree below the root directory, with folders named cpp, java, asm, etc, bin, jar, jdk, source, and lib
MS-DOS directory structure
- MS-DOS directory entries are 32 bytes long, with the fields shown in Table 14-5
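MS-DOS directory entry
Fields by hex offset:
- 00h: Filename (8 bytes, ASCII)
- 08h: Extension (3 bytes, ASCII)
- 0Bh: Attribute (8-bit binary)
- 0Ch: Reserved (10 bytes)
- 16h: Time (16-bit binary)
- 18h: Date (16-bit binary)
- 1Ah: Starting cluster (16-bit binary)
- 1Ch: File size (32-bit binary)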
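One 32-byte entry can be decoded with Python's standard `struct` module. The sketch assumes the raw bytes have already been read from the directory (reading the disk itself is outside its scope) and that multi-byte fields are little-endian, as on x86. `parse_dir_entry` is an illustrative name.

```python
import struct

# Offsets 00h-1Fh from the table: name, ext, attr, reserved, time, date, cluster, size.
DIR_ENTRY = struct.Struct("<8s3sB10sHHHI")   # little-endian, no padding, 32 bytes

def parse_dir_entry(raw: bytes) -> dict:
    """Split one raw 32-byte MS-DOS directory entry into its fields."""
    name, ext, attr, _reserved, time, date, start, size = DIR_ENTRY.unpack(raw)
    return {
        # Names are space-padded; errors="replace" tolerates E5h in erased entries.
        "name": name.decode("ascii", errors="replace").rstrip(),
        "ext": ext.decode("ascii", errors="replace").rstrip(),
        "attr": attr,
        "time": time,
        "date": date,
        "start_cluster": start,
        "size": size,
    }
```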
Filename status byte
- 00h: Entry has never been used
- 01h: With attribute = 0Fh and status byte 01h, this is the first entry of a long filename
- 05h
- E5h: The file for this entry has been erased
- 2Eh: (.) Entry is for a directory name
- 4nh: With attribute = 0Fh, this is the first long-filename entry read and marks the end of the name; n is the number of entries used by the filename
Attribute field is bit-mapped
- Bit 0: Read-only
- Bit 1: Hidden
- Bit 2: System
- Bit 3: Volume label
- Bit 4: Subdirectory
- Bit 5: Archive
- Bits 6-7: Reserved
- An attribute value of 0Fh indicates that the current directory entry is for an extended (long) filename
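The sketch below decodes an attribute byte using the standard MS-DOS bit assignments: bit 0 read-only, bit 1 hidden, bit 2 system, bit 3 volume label, bit 4 subdirectory, bit 5 archive. The value 0Fh (all four low bits set) is treated as a long-filename entry. `describe_attr` is an illustrative name.

```python
ATTR_BITS = {
    0x01: "read-only",
    0x02: "hidden",
    0x04: "system",
    0x08: "volume label",
    0x10: "subdirectory",
    0x20: "archive",
}

def describe_attr(attr: int):
    """Names of the attribute bits set in an MS-DOS directory entry."""
    if attr == 0x0F:                 # read-only | hidden | system | volume label
        return ["long filename entry"]
    return [name for bit, name in ATTR_BITS.items() if attr & bit]

print(describe_attr(0x21))
```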
Date stamp
- Bits 15-9: Year, 0..119, added to 1980
- Bits 8-5: Month, 1..12
- Bits 4-0: Day, 1..31
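Decoding the date word is a matter of shifting and masking each field out of the 16 bits (year in bits 15-9, month in bits 8-5, day in bits 4-0). A sketch:

```python
def decode_dos_date(value: int):
    """Split a 16-bit MS-DOS date into (year, month, day)."""
    day = value & 0x1F                    # bits 4-0
    month = (value >> 5) & 0x0F           # bits 8-5
    year = 1980 + ((value >> 9) & 0x7F)   # bits 15-9, offset from 1980
    return year, month, day

print(decode_dos_date((29 << 9) | (3 << 5) | 15))
```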
Time stamp
- Bits 15-11: Hour, 0..23
- Bits 10-5: Minute, 0..59
- Bits 4-0: Seconds, 0..29 in 2-second units (multiply by 2 to get 0..58)
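The time word decodes the same way. A 5-bit field can hold only 0..31, so the seconds are stored halved and must be doubled on decode. A sketch:

```python
def decode_dos_time(value: int):
    """Split a 16-bit MS-DOS time into (hours, minutes, seconds)."""
    seconds = (value & 0x1F) * 2          # bits 4-0 store seconds / 2
    minutes = (value >> 5) & 0x3F         # bits 10-5
    hours = (value >> 11) & 0x1F          # bits 15-11
    return hours, minutes, seconds

print(decode_dos_time((13 << 11) | (45 << 5) | 15))
```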
MS-DOS 32-bit date/time
- Same as the 16-bit formats, but the date is the high word of a doubleword
- Year bits 31-25
- Month 24-21
- Day 20-16
- Hour 15-11
- Min 10-5
- Sec 4-0
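Going the other way, the sketch below packs a Python `datetime` into the 32-bit layout: the date in the high word and the time in the low word. Seconds are truncated to 2-second units. The year check follows the 0..119 range given on the date stamp slide.

```python
from datetime import datetime

def pack_dos_datetime(dt: datetime) -> int:
    """Pack a datetime into the 32-bit MS-DOS format (date in the high word)."""
    if not 1980 <= dt.year <= 1980 + 119:
        raise ValueError("year outside the 1980..2099 range")
    date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day          # bits 31-16
    time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)      # bits 15-0
    return (date << 16) | time

print(hex(pack_dos_datetime(datetime(2009, 3, 15, 13, 45, 30))))
```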
Cluster chain example 1 (just links are shown)
- File starts in cluster 1, size 7 clusters
- Links: 1 -> 2 -> 3 -> 4 -> 8 -> 9 -> 10 -> EOC (end of chain)
Cluster chain example 2 (just links are shown)
- File starts in cluster 5, size 5 clusters
- Links: 5 -> 6 -> 7 -> 11 -> 12 -> EOC (end of chain)
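Following a chain means reading each cluster's FAT entry to find the next cluster until the end-of-chain marker. In the sketch below, the FAT is modelled as a Python dict and `None` stands in for the end-of-chain value. Both are simplifications: a real FAT is an array of 12-, 16-, or 32-bit entries with reserved end-of-chain codes. The two FATs encode the examples above, reading each slide's grid as "cluster -> next cluster".

```python
EOC = None  # stand-in for the real end-of-chain code in this sketch

def cluster_chain(fat: dict, start: int):
    """Follow FAT links from the starting cluster until end-of-chain."""
    chain = []
    cluster = start
    while cluster is not EOC:
        if cluster in chain:                   # a corrupted FAT could loop forever
            raise ValueError(f"loop detected at cluster {cluster}")
        chain.append(cluster)
        cluster = fat[cluster]                 # the entry holds the next cluster number
    return chain

fat1 = {1: 2, 2: 3, 3: 4, 4: 8, 8: 9, 9: 10, 10: EOC}
fat2 = {5: 6, 6: 7, 7: 11, 11: 12, 12: EOC}
print(cluster_chain(fat1, 1))
print(cluster_chain(fat2, 5))
```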
FAT
- When a file is created, the OS looks for an available cluster entry in the FAT. Gaps appear as files are deleted and new ones are added, so a new file may not find enough contiguous free entries.
- As files are modified and resaved, their chains become fragmented.
- As the r/w heads jump between cylinders to locate all of a file's clusters, performance degrades.
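A simple measure of fragmentation is the number of contiguous runs in a file's cluster chain. Each break between runs is a point where the heads may have to seek elsewhere. A sketch, taking the chain as a list of cluster numbers in file order:

```python
def fragments(chain):
    """Number of contiguous runs in a cluster chain (1 = not fragmented)."""
    if not chain:
        return 0
    # Count each place where the next cluster is not the physically adjacent one.
    return 1 + sum(1 for a, b in zip(chain, chain[1:]) if b != a + 1)

print(fragments([1, 2, 3, 4, 8, 9, 10]))
```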
Three programs
- The previous (5th) edition of the text contained 3 programs to read sectors, check free disk space, and look at clusters.
- But the first two do not run under XP.