Data Representation - PowerPoint PPT Presentation

Slides: 44
Provided by: catheri57
Learn more at: http://csis.pace.edu

Transcript and Presenter's Notes

Title: Data Representation


1
Chapter 3
  • Data Representation

2
Chapter goals
  • Describe numbering systems and their use in data
    representation
  • Compare and contrast various data representation
    methods
  • Describe how nonnumeric data is represented

3
Data representation
  • Humans have many symbolic forms to represent
    information
  • Alphabets, numerals, pictograms
  • Computers can represent information only with
    electrical signals
  • Is a circuit on or off?

4
Computers, numbers, and binary data
  • Computers only use on/off signals to represent
    information
  • These signals can only represent numeric data
  • Even character-based data is represented as a
    number

5
Why binary data?
  • An electrical circuit has two states, on and off
  • On = 1
  • Off = 0
  • Binary numbers only have 0s and 1s
  • Data is stored as collections of binary numbers

6
Binary numbers are computer friendly
  • Binary numbers map directly onto two-state
    signals that are easy to transmit
  • Binary numbers can be easily processed
    (transformed) by two-state electrical devices
    that are easy to design and fabricate
  • These devices (AND/OR gates, adders) are strung
    together like an assembly line to carry out a
    function

7
Logic gates
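The gate diagram for this slide is not included in the transcript. As a sketch of how AND/OR gates are "strung together" into an adder, the Python below models each gate as a function on single bits (0 or 1) and chains full adders into a ripple-carry adder. The function names, the use of an XOR gate, and the 4-bit inputs are illustrative choices, not taken from the slides.

```python
def AND(a: int, b: int) -> int:
    return a & b

def OR(a: int, b: int) -> int:
    return a | b

def XOR(a: int, b: int) -> int:
    return a ^ b

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three bits using only gates; return (sum_bit, carry_out)."""
    partial = XOR(a, b)
    sum_bit = XOR(partial, carry_in)
    carry_out = OR(AND(a, b), AND(partial, carry_in))
    return sum_bit, carry_out

def ripple_add(x: list[int], y: list[int]) -> tuple[list[int], int]:
    """Add two equal-length bit lists (low-order bit first); return (sum bits, final carry)."""
    result = []
    carry = 0
    for a, b in zip(x, y):
        # each full adder passes its carry to the next, like an assembly line
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result, carry

print(ripple_add([1, 0, 1, 0], [1, 1, 0, 0]))  # ([0, 0, 0, 1], 0): 5 + 3 = 8
```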
8
Boolean algebra
  • System developed by George Boole (19th-century
    mathematician) for operating on true/false
    values; its operations can determine whether two
    values are:
  • Equal, not equal, less than, greater than, etc.
  • Boolean algebra allows the CPU to carry out
    binary arithmetic (see White pp. 36-37)
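To illustrate that Boolean operations are enough to compare values, this sketch compares two unsigned bit strings using only AND, OR, and NOT on individual bits, scanning from the high-order bit. It is one possible construction chosen for readability (the function names are hypothetical); hardware comparators usually evaluate all bits in parallel rather than in a loop.

```python
def NOT(a: int) -> int:
    return 1 - a  # invert a single bit (0 <-> 1)

def compare_bits(x: str, y: str) -> str:
    """Compare two equal-length unsigned bit strings (MSB first) with AND/OR/NOT on bits."""
    assert len(x) == len(y)
    greater = 0  # becomes 1 once x is known to be larger
    less = 0     # becomes 1 once x is known to be smaller
    for xc, yc in zip(x, y):
        a, b = int(xc), int(yc)
        undecided = NOT(greater | less)  # no higher-order bit has decided yet
        greater = greater | (undecided & a & NOT(b))
        less = less | (undecided & NOT(a) & b)
    if greater:
        return "greater"
    if less:
        return "less"
    return "equal"

print(compare_bits("1010", "1001"))  # greater
```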

9
Binary numbers
  • Can be combined into a positional numbering
    system
  • Base for decimal numbers is 10, base for binary
    numbers is 2
  • Each position to the left is worth an increasing
    power of 2 (twice the position to its right)

10
Terminology for number systems
  • Base is also referred to as the radix
  • Binary numbers have a radix of 2
  • Decimal numbers have a radix of 10
  • Radix point separates whole values from
    fractional values
  • Decimal point is a kind of radix point

11
Base 2 positional example
12
Numbering systems
  • Higher base (radix) means fewer positions are
    needed to represent a number
  • Base 2 needs many more positions than base 10
  • Base 16 (hexadecimal) is often used to represent
    binary numbers
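A quick way to see that a higher radix needs fewer positions is to count how many times a value can be divided by the base before reaching zero. This sketch (hypothetical function name) does that for one million in bases 2, 10, and 16.

```python
def digits_needed(value: int, base: int) -> int:
    """Count the positions needed to write a non-negative integer in the given base."""
    count = 0
    while value > 0:
        value //= base
        count += 1
    return max(count, 1)  # zero still needs one position

for base in (2, 10, 16):
    print(base, digits_needed(1_000_000, base))
# base 2: 20 positions, base 10: 7, base 16: 5
```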

13
Computers and binary numbers
  • Each digit of a binary number is called a bit
  • Bit string: a group of digits that describes a
    single value

14
Bit strings
  • Leftmost bit (most significant bit) is called the
    high-order bit
  • Rightmost bit (least significant bit) is called
    the low-order bit
  • 8 bits make a byte
  • Programming languages/spreadsheets/etc.
    automatically translate from base 10 to base 2
    and back again
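A sketch of the base-10-to-base-2 translation such software performs: repeated division by 2 produces the bits from the low-order bit up to the high-order bit, and padding to 8 bits yields one byte. Python's built-in `format(n, "08b")` does the same job; the loop (hypothetical name `to_binary`) is written out only to show the method.

```python
def to_binary(value: int, width: int = 8) -> str:
    """Convert a non-negative integer to a bit string by repeated division by 2."""
    if value < 0:
        raise ValueError("only non-negative values are handled here")
    bits = []
    while value > 0:
        bits.append(str(value % 2))  # the remainder is the next low-order bit
        value //= 2
    bit_string = "".join(reversed(bits)) or "0"
    # pad with leading zeros; rjust never truncates, so larger values keep all bits
    return bit_string.rjust(width, "0")

byte = to_binary(45)
print(byte)                        # 00101101
print("high-order bit:", byte[0])  # 0
print("low-order bit:", byte[-1])  # 1
```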

15
Hexadecimal notation
  • Base or radix is 16
  • More compact than binary
  • Symbols used are 0-9, A-F
  • One hexadecimal position corresponds to 4 bits
  • Used to designate memory locations and colors
    (HTML, VB)
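Because one hexadecimal position corresponds to exactly 4 bits, converting binary to hex is just grouping. This sketch (hypothetical function name) pads the bit string on the left to a multiple of 4, splits it into 4-bit groups, and maps each group to one of the symbols 0-9, A-F.

```python
def binary_to_hex(bits: str) -> str:
    """Group a bit string into 4-bit chunks from the right; map each chunk to one hex digit."""
    hex_digits = "0123456789ABCDEF"
    padded = bits.zfill((len(bits) + 3) // 4 * 4)  # left-pad to a multiple of 4 bits
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(hex_digits[int(group, 2)] for group in groups)

print(binary_to_hex("1111010000100100"))  # F424 (1111 0100 0010 0100)
print(binary_to_hex("11111111"))          # FF
```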

16
Goals of computer data representation
  • Any representation format for numeric data
    represents a balance among several factors,
    including:
  • Compactness
  • Accuracy
  • Range
  • Ease of manipulation
  • Standardization

17
Balancing objectives
  • Compactness and range are inversely related: the
    more compact the format, the smaller the range
  • Accuracy increases with the number of bits used,
    especially with real numbers; for example, 1/3,
    or 0.33333333... (a non-terminating fraction)
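The compactness/range tradeoff follows from the bit count alone: n bits give 2^n distinct patterns, so an unsigned format covers 0 to 2^n - 1. This sketch prints those ranges for common widths, assuming two's complement for the signed case.

```python
def unsigned_range(bits: int) -> tuple[int, int]:
    return 0, 2 ** bits - 1

def signed_range(bits: int) -> tuple[int, int]:
    # assumes two's complement: one pattern more on the negative side
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for n in (8, 16, 32, 64):
    print(n, unsigned_range(n), signed_range(n))
```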

18
Other objectives
  • Does information format make it easier for
    processor to perform operations?
  • Is data in a standard format, allowing simple
    transfer between computers?

19
CPU standard data types
  • Integer
  • Real number
  • Character
  • Boolean
  • Memory address

20
Integer data types
  • Unsigned: assumed to be positive
  • Signed: uses one bit (usually the high-order bit)
    to indicate the sign
  • 0 is positive, 1 is negative

21
Representing negative integers
  • Excess notation and two's complement
  • Allow subtraction to be carried out as addition
  • In two's complement, the number is converted to
    its complement (every bit inverted)
  • 1 is added to the result
  • When added to another binary number, the carry
    out of the high-order bit is ignored
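A minimal sketch of the two's complement steps for an assumed 8-bit width: invert every bit, add 1, and when the result is added to another number, discard the carry out of the high-order bit (done here by masking to 8 bits). Excess notation is not shown; the helper names are hypothetical.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1  # 0b11111111 keeps only the low 8 bits

def twos_complement(value: int) -> int:
    """Return the 8-bit two's complement pattern that represents -value."""
    inverted = ~value & MASK       # complement: flip every bit
    return (inverted + 1) & MASK   # add 1

def subtract(a: int, b: int) -> int:
    """Compute a - b as a + (two's complement of b), ignoring the carry bit."""
    return (a + twos_complement(b)) & MASK  # masking discards the carry out of the top bit

def to_signed(pattern: int) -> int:
    """Interpret an 8-bit pattern as signed: high-order bit 1 means negative."""
    return pattern - (1 << WIDTH) if pattern & (1 << (WIDTH - 1)) else pattern

print(format(twos_complement(5), "08b"))  # 11111011, the pattern for -5
print(to_signed(subtract(12, 5)))         # 7
print(to_signed(subtract(5, 12)))         # -7
```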

22
Range and overflow
  • Most CPUs use a fixed width of 32 or 64 bits to
    represent an integer
  • For small numbers, the format is padded with
    leading zeros
  • The machine processes fixed-width information
    more easily than variable-width information

23
Integer overflow
  • If a number is too big for the fixed-width
    integer format, the CPU signals an overflow
    condition
  • Integer format width is a tradeoff between
    overflow and wasted space (padded zeros)
  • CPUs often use double-precision data types for
    arithmetic operations
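Python's own integers grow without limit, so this sketch imitates a 32-bit signed format (an assumption matching the 32-bit width mentioned above) by padding, wrapping results into that range, and flagging overflow when the true sum does not fit. The function name is hypothetical; real CPUs typically record the condition in a status flag, and whether an error is raised depends on the language.

```python
BITS = 32
MIN_INT = -(2 ** (BITS - 1))
MAX_INT = 2 ** (BITS - 1) - 1

def add_int32(a: int, b: int) -> tuple[int, bool]:
    """Add two 32-bit signed values; return (wrapped result, overflow flag)."""
    exact = a + b
    overflow = not (MIN_INT <= exact <= MAX_INT)
    # keep only the low 32 bits, then reinterpret the high-order bit as the sign
    wrapped = exact & (2 ** BITS - 1)
    if wrapped > MAX_INT:
        wrapped -= 2 ** BITS
    return wrapped, overflow

print(format(5, "032b"))                        # a small value padded with leading zeros
print(add_int32(2_000_000_000, 100))            # (2000000100, False)
print(add_int32(2_000_000_000, 2_000_000_000))  # (-294967296, True)
```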

24
Representing real numbers
  • More complicated problem than storing integers
  • Real numbers contain whole and fractional
    components
  • How to represent both parts together in one
    format?

25
Fixed format for real numbers
26
Floating point notation
  • Any real number can be rewritten in
    floating-point (scientific) notation
  • 12.555 becomes 1.2555 × 10¹
  • Format stores 12555 (mantissa), 1 (exponent), and
    sign (+)
  • -143.99 becomes -1.4399 × 10²
  • Format stores 14399 (mantissa), 2 (exponent), and
    sign (-)
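The decomposition above can be sketched with Python's standard `decimal` module, which splits a decimal literal into sign, digits, and exponent. Shifting that exponent so one digit sits before the point gives the stored mantissa, exponent, and sign. The helper name `decompose` is hypothetical, and zero is not handled.

```python
from decimal import Decimal

def decompose(text: str) -> tuple[str, str, int]:
    """Return (sign, mantissa digits, exponent) for a nonzero decimal in scientific notation."""
    sign, digits, exponent = Decimal(text).as_tuple()
    mantissa = "".join(str(d) for d in digits)
    # Decimal stores value = digits * 10**exponent; moving the point to sit
    # after the first digit raises the exponent by (number of digits - 1)
    sci_exponent = exponent + len(digits) - 1
    return ("-" if sign else "+"), mantissa, sci_exponent

print(decompose("12.555"))   # ('+', '12555', 1)
print(decompose("-143.99"))  # ('-', '14399', 2)
```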

27
IEEE floating point format for real numbers
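The format diagram for this slide is not included in the transcript. For single precision, IEEE 754 uses 1 sign bit, 8 exponent bits stored with a bias of 127, and 23 fraction (mantissa) bits, with an implied leading 1 for normal numbers. A sketch that unpacks those fields using the standard `struct` module (the function name is hypothetical):

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) of x stored as an IEEE 754 single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as an integer
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF  # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF      # 23 bits; the leading 1 is implied
    return sign, exponent, fraction

sign, exponent, fraction = float32_fields(-2.5)
# -2.5 = -1.01 (binary) x 2**1, so the unbiased exponent is 1
print(sign, exponent - 127, format(fraction, "023b"))
```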
28
Floating point range
  • The number of bits in the floating-point format
    limits the range of the exponent and mantissa
  • Overflow (too large a number) always occurs in
    the exponent
  • Underflow (too small a number) occurs when a
    negative exponent is too large to fit
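Python's `float` is an IEEE 754 double on mainstream platforms, so its limits can illustrate both failure modes: a result whose exponent is too large becomes infinity, and one whose negative exponent is too large becomes zero. Python raises `OverflowError` for some operations (such as `**`) instead; the multiplication and division below follow the plain IEEE behavior.

```python
import math
import sys

print(sys.float_info.max)      # largest finite double, about 1.8e308
print(sys.float_info.max_exp)  # 1024: 2.0 ** (max_exp - 1) is the largest finite power of 2

too_large = sys.float_info.max * 10  # exponent does not fit: overflow
too_small = 1e-320 / 1e10            # negative exponent does not fit: underflow

print(too_large, math.isinf(too_large))  # inf True
print(too_small)                          # 0.0
```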

29
Range for mantissa
  • The number of bits for the mantissa limits the
    number of significant digits stored for a real
    number
  • 23 bits (plus an implied leading 1 in IEEE
    format) allow approximately 7 significant decimal
    digits of precision
  • Mantissa bits that do not fit are discarded
    (truncated, or rounded to the nearest value as in
    the IEEE default)
  • This loss of precision does not raise an overflow
    condition
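Rounding a value through IEEE single precision with the standard `struct` module shows the roughly seven significant digits mentioned above. Python's own `float` is double precision, used here for comparison; the helper name is hypothetical.

```python
import struct

def as_float32(x: float) -> float:
    """Round x to single precision (23 stored mantissa bits) and back to a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]

third = 1 / 3
single = as_float32(third)
print(repr(third))      # double precision
print(repr(single))     # single precision: digits beyond about the 7th are wrong
print(f"{single:.7f}")  # 0.3333333
```

No error or overflow is reported when the extra mantissa bits are dropped; the stored value is silently less accurate.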

30
Processing complexity
  • General rule: floating-point operations (+, -,
    ×, ÷, etc.) take the CPU about twice as long as
    integer (binary) operations
  • Floating Point Operations Per Second (FLOPS) is a
    measure of processor speed

31
Character data
  • Alphabetic letters (upper- and lowercase),
    numerals, punctuation marks, and special symbols
    are called characters
  • A variable of type character contains only one
    symbol
  • A sequence of symbols forming words, sentences,
    etc. is called a string

32
How computers store characters
  • Character data cannot be directly processed by a
    computer
  • Must be translated into a number
  • Characters are converted into numbers using a
    table of correspondences between a character and
    a bit string
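In Python, `ord` and `chr` look up a character's entry in the Unicode code table (whose first 128 entries match ASCII), and formatting that number in base 2 gives the bit string actually stored for an ASCII character. A short sketch:

```python
for ch in "Hi!":
    code = ord(ch)                        # character -> number
    print(ch, code, format(code, "08b"))  # number -> 8-bit string

print(chr(72))  # number -> character: H
```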

33
Design issues for character coding schemes
  • Table must be publicly available and all users
    must use the same table
  • Coding scheme is a tradeoff among compactness,
    ease of manipulation, accuracy, range,
    standardization

34
Examples of character coding schemes
  • BCD and EBCDIC: older IBM mainframe computers
  • ASCII: PCs
  • Unicode: larger format allows for expanded and
    international alphabets (Java and Internet
    applications)

35
ASCII coding scheme
  • 7-bit format leaves the eighth bit of a byte free
    for a parity bit (used to check for errors over
    transmission lines)
  • Has unique codes for all uppercase and lowercase
    letters, numerals, and other printable characters
  • Also includes codes for device control

36
Device control
  • In many applications that handle text, formatting
    commands to a device are included in the same
    stream of data as the text
  • Examples: word processors (reveal codes), HTML
    tags
  • Examples of control codes: CR (carriage return),
    tab, form feed

37
Limitations to ASCII
  • Not robust enough to represent multiple languages
    and symbols
  • 7-bit format allows for 128 unique codes; some
    languages have thousands of symbols
  • Unicode's original 16-bit format has 65,536
    entries (it has since been extended to more than
    one million code points)
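Python's string encoders make the 128-code limit concrete: any character whose code is above 127 has no ASCII code, while a Unicode encoding such as UTF-8 handles it. A short sketch with an arbitrary sample string:

```python
text = "café €"
for ch in text:
    print(ch, ord(ch), "fits in ASCII" if ord(ch) < 128 else "needs Unicode")

try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII cannot encode:", err.object[err.start])  # first offending character

print(len(text.encode("utf-8")))  # bytes used by UTF-8 for the whole string
```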

38
Boolean data
  • Data type has two values, true and false
  • Can be stored with one bit
  • The results of many CPU operations (comparisons)
    generate a Boolean value stored in a register
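Since a Boolean needs only one bit, eight of them can share one byte. This sketch packs and unpacks flags with bit shifts (function names are hypothetical); languages often spend a whole byte or word on a single Boolean for speed, so packing like this is an optional space optimization.

```python
def pack_flags(flags: list[bool]) -> int:
    """Pack up to 8 Booleans into one byte; flags[0] goes in the low-order bit."""
    if len(flags) > 8:
        raise ValueError("at most 8 flags fit in one byte")
    byte = 0
    for position, flag in enumerate(flags):
        if flag:
            byte |= 1 << position
    return byte

def unpack_flags(byte: int, count: int = 8) -> list[bool]:
    return [bool((byte >> position) & 1) for position in range(count)]

packed = pack_flags([True, False, True, True])
print(format(packed, "08b"))    # 00001101
print(unpack_flags(packed, 4))  # [True, False, True, True]
```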

39
Memory addresses
  • Primary storage is a series of contiguous bytes
  • CPU must be able to access sections of memory
    directly
  • Sections of memory are accessed by their address
    (location)

40
Formats for memory addresses
  • Flat memory model: memory starts at address 0
    and goes to maximum capacity - 1
  • Simple integers are used to store addresses
  • Segmented memory model
  • Memory is divided into equal-sized segments
    called pages
  • Address has two parts (e.g., 00FA0034): a number
    for the page and a location within the page
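A sketch of splitting a two-part address into page number and offset. It assumes, for illustration only, a 16-bit offset field so that the 8-hex-digit example divides into a 4-digit page number and a 4-digit location; real segmented and paged architectures use various field widths. Names are hypothetical.

```python
OFFSET_BITS = 16                      # assumed width of the within-page location
OFFSET_MASK = (1 << OFFSET_BITS) - 1  # 0xFFFF

def split_address(address: int) -> tuple[int, int]:
    """Return (page number, offset within page) for a two-part address."""
    return address >> OFFSET_BITS, address & OFFSET_MASK

def join_address(page: int, offset: int) -> int:
    return (page << OFFSET_BITS) | offset

page, offset = split_address(0x00FA0034)
print(f"page {page:04X}, offset {offset:04X}")  # page 00FA, offset 0034
```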

41
Data structures
  • These five primitive types are quite limited for
    representing real world data
  • Words, sentences
  • Dates
  • Database tables
  • More complex data structures constructed from
    these five primitive types
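As an illustration of building larger structures from the primitive types, a date can be held as three integers, and a database-style record can combine it with character and Boolean data. The `Date` and `Employee` classes are hypothetical examples, not from the slides.

```python
from dataclasses import dataclass

@dataclass
class Date:
    year: int   # each field is a primitive integer
    month: int
    day: int

@dataclass
class Employee:      # one row of a hypothetical database table
    name: str        # string: a sequence of characters
    hired: Date      # nested structure built from integers
    full_time: bool  # Boolean

row = Employee("Ada", Date(2020, 3, 15), True)
print(row.hired.month)  # 3
```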

42
Chapter summary
  • To be processed by any device, data must be
    converted from its native format into a form
    suitable for the processing device.
  • All data, including nonnumeric data, are
    represented within a modern computer system as
    strings of binary digits, or bits.
  • Each bit string has a specific data format and
    coding method.

43
Summary (cont.)
  • Numeric data is stored using integer, real
    number, and floating point formats.
  • Characters are converted to numbers by means of a
    coding table.
  • Boolean data can have only two values, true and
    false.
  • Programs often need to define and manipulate data
    in larger and more complex units than primitive
    CPU data types.