Title: Data Representation
Chapter 3
Chapter goals
- Describe numbering systems and their use in data representation
- Compare and contrast various data representation methods
- Describe how nonnumeric data is represented
Data representation
- Humans have many symbolic forms to represent information
- Alphabet, numbers, pictograms
- Computers can only represent information with electrical signals
- Is a circuit on or off?
Computers, numbers, and binary data
- Computers use only on/off signals to represent information
- These signals can only represent numeric data
- Even character-based data is represented as a number
Why binary data?
- An electrical circuit has two states, on and off
- On = 1
- Off = 0
- Binary numbers only have 0s and 1s
- Data is stored as collections of binary numbers
Binary numbers are computer friendly
- Binary numbers are signals that can easily be transported
- Binary numbers can be easily processed (transformed) by two-state electrical devices that are easy to design and fabricate
- These devices (AND/OR gates, adders) are strung together like an assembly line to carry out a function
Logic gates
Boolean algebra
- System of logic developed by George Boole (19th-century mathematician), built on two values (true and false), that can be used to determine whether two values are equal, not equal, less than, greater than, etc.
- Boolean algebra allows the CPU to carry out binary arithmetic (see White p. 36-37)
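
The claim that simple two-state devices can be strung together to carry out arithmetic can be sketched in Python. This is only an illustration under stated assumptions: the function names `half_adder`, `full_adder`, and `add_bits` are hypothetical, and Python's bitwise operators stand in for physical gates.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits; return (sum_bit, carry_bit)."""
    return a ^ b, a & b  # XOR yields the sum bit, AND yields the carry


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add two bits plus an incoming carry using two half adders and an OR."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2


def add_bits(x: list[int], y: list[int]) -> list[int]:
    """Ripple-carry addition of two equal-length bit lists (high-order bit first)."""
    result, carry = [], 0
    for a, b in zip(reversed(x), reversed(y)):  # start at the low-order bit
        s, carry = full_adder(a, b, carry)
        result.append(s)
    result.append(carry)  # the final carry becomes the new high-order bit
    return result[::-1]


print(add_bits([0, 1, 1], [0, 1, 1]))  # 3 + 3 = 6 -> [0, 1, 1, 0]
```

Each full adder handles one bit position and passes its carry to the next, which mirrors the assembly-line picture of chained devices.
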
Binary numbers
- Can be combined into a positional numbering system
- Base for decimal numbers is 10, base for binary numbers is 2
- Each position to the left is worth the next higher power of 2 (twice the position to its right)
Terminology for number systems
- Base is also referred to as the radix
- Binary numbers have a radix of 2
- Decimal numbers have a radix of 10
- Radix point separates whole values from fractional values
- Decimal point is a kind of radix point
Base 2 positional example
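
A minimal sketch of positional evaluation in base 2, assuming an arbitrary example value of 1011 (chosen here for illustration) and a hypothetical helper named `binary_to_decimal`:

```python
def binary_to_decimal(bits: str) -> int:
    """Sum each bit multiplied by its positional weight (a power of 2)."""
    total = 0
    for position, bit in enumerate(reversed(bits)):  # position 0 is the rightmost bit
        total += int(bit) * 2 ** position
    return total


value = binary_to_decimal("1011")  # 1*8 + 0*4 + 1*2 + 1*1
print(value)  # 11
```
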
Numbering systems
- Higher base (radix) means fewer positions are needed to represent a number
- Base 2 needs many more positions than base 10
- Base 16 (hexadecimal) is often used to represent binary numbers
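
To make the relationship between base and number of positions concrete, the sketch below counts the positions needed for one value in bases 2, 10, and 16. The helper `digits_needed` and the sample value 1000 are assumptions made for illustration.

```python
def digits_needed(n: int, base: int) -> int:
    """Count the positions required to write a non-negative integer in a base."""
    count = 1
    while n >= base:
        n //= base  # drop the lowest-order position
        count += 1
    return count


for base in (2, 10, 16):
    print(base, digits_needed(1000, base))
# 2 10   (1111101000)
# 10 4   (1000)
# 16 3   (3E8)
```
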
Computers and binary numbers
- Each digit of a binary number is called a bit
- Bit string: group of digits that describes a single value
Bit strings
- Leftmost bit (most significant bit) is called the high-order bit
- Rightmost bit (least significant bit) is called the low-order bit
- 8 bits make a byte
- Programming languages, spreadsheets, etc. automatically translate from base 10 to base 2 and back again
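
As a small illustration of these terms, the following sketch translates a base-10 value into a one-byte bit string and back, using Python's built-in `format` and `int`. The value 150 is an arbitrary example.

```python
value = 150                    # a base-10 value, as a programmer would type it
bits = format(value, "08b")    # the same value as an 8-bit string (one byte)

high_order_bit = bits[0]       # leftmost, most significant bit
low_order_bit = bits[-1]       # rightmost, least significant bit
round_trip = int(bits, 2)      # translate the bit string back to base 10

print(bits, high_order_bit, low_order_bit, round_trip)  # 10010110 1 0 150
```
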
Hexadecimal notation
- Base or radix is 16
- More compact than binary
- Symbols used are 0-9, A-F
- One hexadecimal position corresponds to 4 bits
- Used to designate memory locations and colors (HTML, VB)
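
The 4-bits-per-position correspondence can be sketched as follows. The helper names `hex_to_bits` and `parse_html_color` are hypothetical, and the color value is an arbitrary example in the `#RRGGBB` form used by HTML.

```python
def hex_to_bits(hex_digits: str) -> str:
    """Expand each hexadecimal digit into its 4-bit group."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_digits)


def parse_html_color(color: str) -> tuple[int, int, int]:
    """Split a '#RRGGBB' color into red, green, and blue values (0-255)."""
    digits = color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_bits("FA"))            # 1111 1010
print(parse_html_color("#FF8000"))  # (255, 128, 0)
```
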
Goals of computer data representation
- Any representation format for numeric data represents a balance among several factors, including:
- Compactness
- Accuracy
- Range
- Ease of manipulation
- Standardization
Balancing objectives
- Compactness and range are inversely related: the more compact the format, the smaller the range
- Accuracy increases with the number of bits used, especially for real numbers; for example, 1/3 = 0.33333333... is a non-terminating fraction
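
Both tradeoffs can be put into numbers. The sketch below assumes unsigned integers and a simple truncated binary fraction; the helper names `unsigned_range` and `one_third_error` are hypothetical and not tied to any particular CPU format.

```python
from fractions import Fraction


def unsigned_range(bits: int) -> tuple[int, int]:
    """Smallest and largest unsigned value that fits in the given width."""
    return 0, 2 ** bits - 1


def one_third_error(fraction_bits: int) -> Fraction:
    """Error left when 1/3 is stored in the given number of binary fraction bits."""
    scale = 2 ** fraction_bits
    stored = Fraction(scale // 3, scale)  # truncate to the bits available
    return Fraction(1, 3) - stored


for bits in (8, 16, 32):
    print(bits, unsigned_range(bits), float(one_third_error(bits)))
```

More bits widen the range and shrink the error, but the error for 1/3 never reaches zero at any finite width.
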
Other objectives
- Does the information format make it easier for the processor to perform operations?
- Is the data in a standard format, allowing simple transfer between computers?
CPU standard data types
- Integer
- Real number
- Character
- Boolean
- Memory address
Integer data types
- Unsigned: assumed to be positive (or zero)
- Signed: uses one bit (usually the high-order bit) to indicate sign
- 0 is positive, 1 is negative
Representing negative integers
- Excess notation and two's complement
- Allow subtraction to be carried out as addition
- In two's complement, the number is converted to its complement (each bit is inverted)
- 1 is added to the result
- When added to another binary number, the carry bit is ignored
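
The two's complement steps above can be sketched as follows. The 8-bit width and the names `twos_complement` and `subtract` are assumptions chosen for illustration; excess notation is not covered by this sketch.

```python
WIDTH = 8                     # assumed fixed width, in bits
MASK = (1 << WIDTH) - 1       # 0b11111111, keeps results within WIDTH bits


def twos_complement(n: int) -> int:
    """Invert every bit of n, then add 1 (staying within WIDTH bits)."""
    return ((n ^ MASK) + 1) & MASK


def subtract(a: int, b: int) -> int:
    """Compute a - b as a + (two's complement of b), ignoring the carry bit."""
    return (a + twos_complement(b)) & MASK  # the mask drops the carry out


print(format(twos_complement(5), "08b"))  # 11111011
print(subtract(9, 5))                     # 4
```
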
Range and overflow
- Most CPUs use a fixed width of 32 or 64 bits to represent an integer
- For small numbers, the format is padded with leading zeros
- Machine processes fixed-width information more easily than variable-width
Integer overflow
- If a number is too big for the fixed-width integer format, the CPU signals an overflow error
- Integer format width is a tradeoff between overflow and wasted space (padded zeros)
- CPUs often use double-precision data types for arithmetic operations
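
Python integers grow as needed, so fixed-width behavior has to be simulated. The sketch below assumes a 32-bit signed format and uses a hypothetical helper `add_int32` that reports overflow by raising Python's built-in `OverflowError`; a real CPU would signal the condition in hardware instead.

```python
INT32_MIN, INT32_MAX = -2 ** 31, 2 ** 31 - 1


def add_int32(a: int, b: int) -> int:
    """Add two values, refusing results that do not fit in 32 signed bits."""
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{result} does not fit in 32 bits")
    return result


print(format(5, "032b"))        # a small number padded with leading zeros
print(add_int32(1, 2))          # 3
try:
    add_int32(INT32_MAX, 1)
except OverflowError as err:
    print("overflow:", err)
```
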
Representing real numbers
- More complicated problem than storing integers
- Real numbers contain whole and fractional components
- How to represent both parts together in one format?
Fixed format for real numbers
Floating point notation
- Any real number can be rewritten using floating point (scientific) notation
- 12.555 becomes 1.2555 × 10¹
- Format stores 12555 (mantissa), 1 (exponent), and sign (+)
- -143.99 becomes -1.4399 × 10²
- Format stores 14399 (mantissa), 2 (exponent), and sign (-)
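
The decomposition into sign, mantissa, and exponent can be reproduced with Python's standard `decimal` module. This is a base-10 sketch that mirrors the two examples above, not a binary storage format; the helper name `scientific_parts` is hypothetical.

```python
from decimal import Decimal


def scientific_parts(text: str) -> tuple[str, int, int]:
    """Return (sign, mantissa digits, exponent) for a decimal number."""
    number = Decimal(text)
    sign, digits, _ = number.as_tuple()
    mantissa = int("".join(str(d) for d in digits))
    exponent = number.adjusted()  # power of 10 once one digit leads the point
    return ("-" if sign else "+"), mantissa, exponent


print(scientific_parts("12.555"))   # ('+', 12555, 1)
print(scientific_parts("-143.99"))  # ('-', 14399, 2)
```
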
IEEE floating point format for real numbers
Floating point range
- Number of bits in the floating point format limits the range of the exponent and mantissa
- Overflow (too large a number) always occurs in the exponent
- Underflow (too small a number) occurs when a negative exponent does not fit
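
Overflow and underflow can be observed directly. The sketch assumes Python's `float` is a 64-bit IEEE 754 double, which holds on typical platforms, so the limits shown are those of the 64-bit format rather than the 32-bit one.

```python
import math
import sys

largest = sys.float_info.max    # largest finite 64-bit float
overflowed = largest * 10       # the exponent no longer fits

smallest = 5e-324               # smallest positive 64-bit float
underflowed = smallest / 2      # the negative exponent no longer fits

print(overflowed)   # inf
print(underflowed)  # 0.0
```
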
Range for mantissa
- Number of bits for the mantissa limits the number of significant digits stored for a real number
- 23 bits allow approx. 7 significant decimal digits of precision
- Digits that do not fit in the mantissa are discarded (truncated, or rounded as in default IEEE 754 arithmetic)
- This loss of precision does not raise an overflow condition
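
The effect of a 23-bit mantissa can be inspected with Python's standard `struct` module. The sketch assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits stored with a bias of 127, 23 mantissa bits); the helper names `float32_fields` and `as_float32` are hypothetical.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, mantissa bits) of x stored in 32 bits."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return raw >> 31, (raw >> 23) & 0xFF, raw & 0x7FFFFF


def as_float32(x: float) -> float:
    """Round-trip x through 32-bit storage to show what survives."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


print(float32_fields(0.15625))  # (0, 124, 2097152)
print(as_float32(0.123456789))  # matches the input to about 7 significant digits
```

No error is raised when the extra digits are lost; the stored value is simply the closest one the format can hold.
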
Processing complexity
- As a general rule, floating point operations (+, -, ×, etc.) take the CPU about twice as long as integer (binary) operations
- Floating Point Operations Per Second (FLOPS) is a measure of processor speed
Character data
- Alphabetic letters (upper and lower case), numerals, punctuation marks, and special symbols are called characters
- A variable of type character contains only one symbol
- A sequence of symbols forming words, sentences, etc. is called a string
How computers store characters
- Character data cannot be directly processed by a computer
- Must be translated into a number
- Characters are converted into numbers using a table of correspondences between a character and a bit string
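
Python exposes such a table through the built-in functions `ord` (character to number) and `chr` (number to character). A minimal sketch, using an arbitrary sample string:

```python
for ch in "Hi!":
    code = ord(ch)                      # look the character up in the table
    print(ch, code, format(code, "08b"))
# H 72 01001000
# i 105 01101001
# ! 33 00100001

print(chr(65))  # A, the reverse lookup from number to character
```
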
Design issues for character coding schemes
- Table must be publicly available and all users must use the same table
- Coding scheme is a tradeoff among compactness, ease of manipulation, accuracy, range, and standardization
Examples of character coding schemes
- BCD and EBCDIC: older IBM mainframe computers
- ASCII: PCs
- Unicode: larger format allows for expanded and international alphabets (Java and internet applications)
ASCII coding scheme
- 7-bit format leaves room for a parity bit (used to check for errors over transmission lines)
- Has unique codes for all uppercase and lowercase letters, numbers, and other printable characters
- Also includes codes for device control
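
One way the eighth bit can carry parity is sketched below. Even parity is assumed here (odd parity is equally possible), and the helper names `add_even_parity` and `parity_ok` are hypothetical.

```python
def add_even_parity(ch: str) -> int:
    """Return an 8-bit value: 7-bit ASCII code plus an even-parity high bit."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    parity = bin(code).count("1") % 2  # 1 when the count of 1 bits is odd
    return (parity << 7) | code


def parity_ok(byte: int) -> bool:
    """A received byte passes the check if its count of 1 bits is even."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity("C")      # 'C' is 1000011, three 1 bits
print(format(sent, "08b"))       # 11000011
print(parity_ok(sent))           # True
print(parity_ok(sent ^ 0b100))   # False, one bit was flipped in transit
```
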
Device control
- In many applications that handle text, formatting commands to a device are included in the same stream of data as the text
- Examples: word processors (reveal codes), HTML tags
- Examples: CR (carriage return), tab, form feed
Limitations to ASCII
- Not robust enough to represent multiple languages and symbols
- 7-bit format allows for 128 unique codes; some languages have thousands of symbols
- Unicode was originally a 16-bit code with 65,536 entries; later versions extend beyond 16 bits
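
The 128-code limit is easy to demonstrate. In this sketch the helper `fits_in_ascii` is hypothetical and the sample strings are arbitrary; it relies only on Python's built-in `str.encode` and `ord`.

```python
def fits_in_ascii(text: str) -> bool:
    """Report whether every character has a code in the 7-bit ASCII table."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(2 ** 7, 2 ** 16)  # 128 65536
for sample in ("data", "café", "日本"):
    print(sample, fits_in_ascii(sample), [ord(ch) for ch in sample])
```
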
Boolean data
- Data type has two values, true and false
- Can be stored with one bit
- The results of many CPU operations (comparisons)
generate a Boolean value stored in a register
Memory addresses
- Primary storage is a series of contiguous bytes
- CPU must be able to access sections of memory directly
- Sections of memory are accessed by their address (location)
Formats for memory addresses
- Flat memory model: memory starts at address 0 and goes to maximum capacity - 1
- Simple integers are used to store addresses
- Segmented memory model
- Memory is divided into equal-sized segments called pages
- Address has two parts (e.g. 00FA0034): a number for the page and a location within the page
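
A two-part address can be split with a shift and a mask. The sketch below assumes the example address divides evenly into a 16-bit page number and a 16-bit location; that split is an assumption made for illustration, as is the helper name `split_address`.

```python
OFFSET_BITS = 16                       # assumed width of the within-page part
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def split_address(address: int) -> tuple[int, int]:
    """Return (page number, location within page) for a two-part address."""
    return address >> OFFSET_BITS, address & OFFSET_MASK


page, offset = split_address(0x00FA0034)
print(f"page={page:04X} offset={offset:04X}")  # page=00FA offset=0034

flat = (page << OFFSET_BITS) | offset          # the same location as one integer
print(f"{flat:08X}")                           # 00FA0034
```
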
Data structures
- These five primitive types are quite limited for representing real-world data, such as:
- Words, sentences
- Dates
- Database tables
- More complex data structures are constructed from these five primitive types
Chapter summary
- To be processed by any device, data must be converted from its native format into a form suitable for the processing device.
- All data, including nonnumeric data, are represented within a modern computer system as strings of binary digits, or bits.
- Each bit string has a specific data format and coding method.
Summary (cont.)
- Numeric data is stored using integer, real number, and floating point formats.
- Characters are converted to numbers by means of a coding table.
- Boolean data can have only two values, true and false.
- Programs often need to define and manipulate data in larger and more complex units than primitive CPU data types.