Title: Chapter 2: How are data represented
1. Chapter 2: How are data represented?
2. Why do we care?
- The accuracy of our results
- The speed of processing
- The range of alphabets available to us
- The size of the files we must store
- The quality of graphics on screen and on paper
- The time it takes for Internet download
3. Why do computers work in binary?
- Cheapest and simplest in design and engineering
- Switch: on → 1, off → 0
- Circuit voltages
- 1.7 volts or higher → 1
- 0.0 volts to 1.3 volts → 0
- Voltages between 1.3 and 1.7 volts are avoided in design
- Mathematics: binary numbers
- Using digits 0 and 1 only.
4. Decimal vs. Binary
- Decimal system
- 10 symbols: 1, 2, 3, ..., 9, 0
- Base 10 (We have 10 fingers)
- Decimal number 2324 reads: 2 thousands, 3 hundreds, twenty-four.
- Binary system
- 2 symbols: 0 and 1
- Base 2
- Binary number 1101 = ?
5. Decimal vs. Binary
Decimal System: 2324
Digit                     2        3        2        4
Position values (base)    10^3     10^2     10^1     10^0
Position values           1000     100      10       1
Each digit represents     2×1000   3×100    2×10     4×1
Value in Decimal: 2×1000 + 3×100 + 2×10 + 4×1 = 2324D
Binary System: 1101
Digit                     1        1        0        1
Position values (base)    2^3      2^2      2^1      2^0
Position values           8        4        2        1
Each digit represents     1×8      1×4      0×2      1×1
Value in Decimal: 1×8 + 1×4 + 0×2 + 1×1 = 13D
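The position-value rule for binary 1101 can be sketched in C. This is a minimal illustration, not code from the slides: the function name binary_to_decimal is hypothetical, and stopping at the first character that is not 0 or 1 is an assumed convention.

```c
#include <stddef.h>

/* Convert a string of '0'/'1' characters to its decimal value.
   Hypothetical helper: scanning stops at the first character that is
   not '0' or '1' (including the terminating '\0'). */
unsigned long binary_to_decimal(const char *bits)
{
    unsigned long value = 0;
    for (size_t i = 0; bits[i] == '0' || bits[i] == '1'; i++) {
        /* Doubling the running total moves every earlier digit one
           position to the left; the new digit takes the 2^0 place. */
        value = value * 2 + (unsigned long)(bits[i] - '0');
    }
    return value;
}
```

Calling binary_to_decimal("1101") follows the same steps as the table: 1×8 + 1×4 + 0×2 + 1×1.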
6. Why do computers work in binary?
- Binary digits = bits
- 8 bits = 1 byte
- 2^10 bytes = 1024 bytes = 1 kilobyte = 1KB
- 2^20 bytes = 2^10 KB = 1 megabyte = 1MB
- 2^30 bytes = 2^10 MB = 1 gigabyte = 1GB
- 2^40 bytes = 2^10 GB = 1 terabyte = 1TB
7. Types of data
- Instructions
- Computer instructions are coded in sequences of 0s and 1s
- Numbers
- 2324, -34.35, 34567890123.12345
- Characters and symbols
- A, B, C, ..., Z, a, b, c, ..., z
- 0, 1, 2, 3, ..., 9, -, ), (, etc.
- Images
- Photos, charts, drawings
- Audio
- Sound, music, etc
- Video
- Video clips and movies
8. Representation of Numbers
- Fixed-size-storage approach
- Computers allocate a specified amount of space for a number
- Integers
- 1 bit: 0 to 1
- 2 bits: 00, 01, 10, 11 → 0 to 3
- 4 bits: 0000, 0001, 0010, ..., 1111 → 0 to 15
- 1 byte: 0 to 255
- 2 bytes: -32,768 to 32,767
- 4 bytes: -2,147,483,648 to 2,147,483,647
- Note: with 4 bytes for integers, any number smaller than -2,147,483,648 or larger than 2,147,483,647 would be incorrectly represented.
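The ranges listed above follow directly from the number of bits. The sketch below recomputes them; the function names are hypothetical, and the signed ranges assume the two's-complement layout used by most current machines, which the slides do not name.

```c
/* Largest unsigned value that fits in `bits` bits (1 <= bits <= 63). */
unsigned long long unsigned_max(int bits)
{
    return (1ULL << bits) - 1;
}

/* Largest signed value in `bits` bits, assuming two's complement
   (2 <= bits <= 63): one bit is spent on the sign. */
long long signed_max(int bits)
{
    return (long long)((1ULL << (bits - 1)) - 1);
}

/* Smallest signed value in `bits` bits, assuming two's complement. */
long long signed_min(int bits)
{
    return -signed_max(bits) - 1;
}
```

For example, signed_min(32) and signed_max(32) give the 4-byte limits quoted in the note.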
9. Representation of Numbers
Binary representation of real numbers
Binary System: 10.111
Digit                     1      0      .    1        1        1
Position values (base)    2^1    2^0         2^-1     2^-2     2^-3
Position values           2      1           1/2      1/4      1/8
Each digit represents     1×2    0×1         1×0.5    1×0.25   1×0.125
Value in Decimal: 2 + ½ + ¼ + 1/8 = 2.875D
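The same position values extend to the right of the binary point, where each step halves the weight. A minimal sketch, with a hypothetical function name and the assumption that the input contains only 0, 1, and at most one point:

```c
/* Convert a binary string such as "10.111" to its decimal value.
   Hypothetical helper; input is assumed to hold only '0', '1', and
   at most one '.'. */
double binary_real_to_decimal(const char *s)
{
    double value = 0.0;

    /* Digits left of the point: weights ..., 4, 2, 1. */
    while (*s == '0' || *s == '1') {
        value = value * 2.0 + (*s - '0');
        s++;
    }

    /* Digits right of the point: weights 1/2, 1/4, 1/8, ... */
    if (*s == '.') {
        double weight = 0.5;
        s++;
        while (*s == '0' || *s == '1') {
            if (*s == '1')
                value += weight;
            weight /= 2.0;
            s++;
        }
    }
    return value;
}
```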
10. Representation of Numbers
- Floating-point numbers for real numbers
- Three parts of representation
- Sign (always 1 bit: 0 for + and 1 for -)
- Significant digits (e.g., six bits)
- The power of 2 for the leftmost digit (e.g., 3 bits)
- Example for binary -1111.01B
- Sign: 1 (negative)
- Significant digits: 111101B
- Power of 2: 011B
- Example for binary 100.1101B
- Sign: 0 (positive)
- Significant digits: 100110B
- Note: the last digit is lost, which is 1/16 in decimal
- Power of 2: 010B
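To check the two examples, the three parts can be turned back into a value. The sketch below assumes the slide's illustrative layout (1 sign bit, six significant bits, 3 bits for the power of the leftmost digit); the function name toy_float_value is hypothetical, and this is not the format real hardware uses.

```c
/* Rebuild the value from the three parts of the toy format.
   sign:   0 for +, 1 for -
   digits: the six significant bits read as an integer (0..63)
   power:  power of 2 of the leftmost significant bit (0..7) */
double toy_float_value(int sign, unsigned digits, int power)
{
    /* The leftmost of six bits has weight 2^power, so the rightmost
       has weight 2^(power - 5). Scale the integer accordingly. */
    double value = (double)digits;
    int shift = power - 5;

    while (shift > 0) { value *= 2.0; shift--; }
    while (shift < 0) { value /= 2.0; shift++; }

    return sign ? -value : value;
}
```

With sign 1, digits 111101B (61), and power 011B (3), the result is -15.25, which is -1111.01B. For 100.1101B, the first six digits 100110B (38) with power 010B (2) give 4.75 rather than 4.8125: the difference is the lost 1/16.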
11. Representation of Numbers
- Single-precision floating-point numbers
- Sign (always 1 bit: 0 for + and 1 for -)
- Significant digits: 23 bits
- Exponent: 8 bits
- Double-precision floating-point numbers
- Sign (always 1 bit: 0 for + and 1 for -)
- Significant digits: 52 bits
- Exponent: 11 bits
- What should you know?
- Computers can represent numbers only with limited accuracy.
- E.g., when you enter a 20-digit decimal number into a program that uses single precision, only about 7 digits are actually stored; the rest are lost.
- Real examples
- Designing aircraft on p. 35
- The Vancouver Stock Exchange Index on pp. 38-39
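The loss of digits in single precision can be observed directly. This sketch assumes the compiler's float and double are the IEEE 754 single- and double-precision formats with the default rounding mode (true on most current systems, but an assumption here); through_single is a hypothetical name.

```c
/* Store a value in single precision and read it back.
   `volatile` asks the compiler to really round to a float instead of
   keeping extra precision in a register. */
double through_single(double x)
{
    volatile float f = (float)x;
    return (double)f;
}
```

Under those assumptions, 16,777,216 survives the round trip but 16,777,217 comes back as 16,777,216: the eighth significant decimal digit is already unreliable.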
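12. Representation of Numbers
    // file public_html/2005f-html/cil102/accuracy.c
    #include <stdio.h>

    int main()
    {
        // x, y, and result all use 32 bits to represent integers
        // (-2,147,483,648 to 2,147,483,647)
        int x, y, result;
        char op;
        int i;

        for (i = 0; i < 100; i++) {
            printf("please enter an expression\n");
            // read "number operator number"; stop on malformed input
            if (scanf("%d %c %d", &x, &op, &y) != 3)
                break;
            if (op == '+')
                result = x + y;   // wrong if the sum leaves the 32-bit range
            else if (op == '-')
                result = x - y;   // wrong if the difference leaves the range
            else {
                printf("Invalid operator!!");
                break;
            }
            // show what the computer actually stored
            printf("%d %c %d = %d\n", x, op, y, result);
        }
        return 0;
    }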
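A program like the calculator above cannot store a sum such as 2,000,000,000 + 2,000,000,000 in a 4-byte integer. One common way to detect this before it happens is to compare against the limits from limits.h. This is a sketch of that check, not part of accuracy.c, and the function name is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Return true if x + y cannot be represented in an int.
   The comparison is rearranged so that the test itself never
   leaves the representable range. */
bool add_would_overflow(int x, int y)
{
    if (y > 0)
        return x > INT_MAX - y;   /* sum would exceed the largest int */
    if (y < 0)
        return x < INT_MIN - y;   /* sum would fall below the smallest int */
    return false;                 /* adding 0 never overflows */
}
```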
13. Representation of Numbers
- Variable-size-storage approach
- Allows a wide range of numbers to be stored accurately
- Needs significantly more time to process
- The fixed-size approach is used more commonly than the variable-size approach.
14. Representation of characters
- There are no visual letters A, B, C, etc. stored in computers the way we picture them.
- Letters and symbols are encoded in 8 bits (one byte) of 0s and 1s.
- The keyboard converts keys A, B, C, etc. to their corresponding codes, and the monitor converts the codes into visual letters A, B, C, etc. on screen.
- Two commonly used coding schemes
- ASCII: American Standard Code for Information Interchange
- EBCDIC: Extended Binary Coded Decimal Interchange Code
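The byte behind a letter can be printed as 0s and 1s. The sketch below assumes the compiler uses ASCII codes for character constants (the usual case, but an EBCDIC machine would give different bits); char_to_bits is a hypothetical name, and the static buffer is a shortcut that is overwritten on each call.

```c
/* Return the 8-bit pattern of a character code as a string of
   '0' and '1'. The result lives in a static buffer, so copy it
   before calling the function again. */
const char *char_to_bits(unsigned char code)
{
    static char bits[9];
    for (int i = 0; i < 8; i++) {
        /* 0x80 selects the leftmost bit; shifting right walks
           toward the rightmost bit. */
        bits[i] = (code & (0x80u >> i)) ? '1' : '0';
    }
    bits[8] = '\0';
    return bits;
}
```

In ASCII, 'A' has code 65, so char_to_bits('A') yields 01000001.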
16. Representation of characters
- Foreign characters: two approaches
- Use one byte per character
- Ex.,
- ISO-8859-1 for Western (Roman)
- ISO-8859-7 for Greek
- ISO-2022-CN for simplified Chinese
- Web pages use the META charset to specify which encoding is used.
- Use two bytes per character/symbol
- 16 bits have 65,536 combinations (characters)
- Unicode coding system
17. Representation of Images
- A picture is treated as a matrix of dots, called
pixels.
18. Representation of Images
- The pixels are so small and close together that we cannot really see them as separate dots.
- Resolution: dots per inch (dpi)
- 72 dpi for Web images
- 600 or 1200 dpi for professional printers or home photo printers
19. Representation of Images
- The color of each pixel is represented using bits.
- Black/White: one bit per pixel
- 1 = white and 0 = black
- Gray scale: one byte per pixel
- 256 different degrees of gray (00000000 to 11111111)
- 00000000 black, 01111111 intermediate gray, 11111111 white
- Color: three bytes per pixel
- Red, green, blue color
- One byte for the intensity of each of the three colors
- 256 possible red, 256 green, 256 blue
- Pure red: 11111111 for red byte, 00000000 for green and blue
- White: 11111111 for all three bytes
- Black: 00000000 for all three bytes
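Three one-byte intensities are often handled as a single 24-bit value. The sketch below packs them in red, green, blue order; that ordering and the name pack_rgb are assumptions for illustration, since the slides only say that each color gets one byte.

```c
#include <stdint.h>

/* Pack three one-byte intensities into one 24-bit value,
   red in the highest byte, blue in the lowest (assumed order). */
uint32_t pack_rgb(uint8_t red, uint8_t green, uint8_t blue)
{
    return ((uint32_t)red << 16) | ((uint32_t)green << 8) | (uint32_t)blue;
}
```

Pure red becomes 0xFF0000, white 0xFFFFFF, and black 0x000000. With 256 levels per byte there are 256 × 256 × 256 = 16,777,216 possible colors.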
20. Representation of Images
- Image storage: size
- Gray scale: one byte per pixel
- E.g., a 3 × 5 inch picture with 300 dpi resolution
- 3 × 300 = 900 pixels per column
- 5 × 300 = 1500 pixels per row
- 900 × 1500 = 1,350,000 pixels/picture
- Needed storage: 1,350,000 bytes/picture, more than 1MB/picture
- Color: three bytes per pixel
- E.g., a 3 × 5 inch picture with 300 dpi resolution
- 3 × 300 = 900 pixels per column
- 5 × 300 = 1500 pixels per row
- 900 × 1500 = 1,350,000 pixels/picture
- Needed storage: 3 (bytes per pixel) × 1,350,000 = 4,050,000 bytes/picture ≈ 4MB/picture --- TOO BIG
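The storage arithmetic above generalizes to any picture size and resolution. A minimal sketch with a hypothetical function name, assuming whole-inch dimensions and uncompressed storage:

```c
/* Bytes needed to store an uncompressed picture.
   height_in, width_in: picture size in inches
   dpi:                 dots (pixels) per inch
   bytes_per_pixel:     1 for gray scale, 3 for color */
unsigned long image_bytes(unsigned long height_in, unsigned long width_in,
                          unsigned long dpi, unsigned long bytes_per_pixel)
{
    unsigned long pixels_per_column = height_in * dpi;
    unsigned long pixels_per_row = width_in * dpi;
    return pixels_per_column * pixels_per_row * bytes_per_pixel;
}
```

image_bytes(3, 5, 300, 1) reproduces the gray-scale figure and image_bytes(3, 5, 300, 3) the color figure.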
21. Representation of Images
- Image compression
- Color table
- Most pictures contain a small number of different colors
- Use a table to define colors that are actually used in the picture
- Each pixel has an index to the color table.
- Each image contains a color table and table indices
- Example
- For a picture with 100 different colors, the color table would contain 100 entries, three bytes per entry for each color. One byte can be used as the index into the table for each pixel.
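The saving from a color table can be estimated with one line of arithmetic. The sketch below assumes at most 256 colors, so that a one-byte index is enough; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* One color table entry: three bytes, one per color intensity. */
typedef struct {
    unsigned char red, green, blue;
} Color;

/* Bytes needed when each pixel stores a one-byte index into a
   color table (valid for up to 256 colors). */
unsigned long indexed_image_bytes(unsigned long pixels, unsigned long colors)
{
    return colors * sizeof(Color) + pixels;   /* table + one index per pixel */
}

/* Look up the real color of a pixel through its table index. */
Color pixel_color(const Color *table, const unsigned char *indices,
                  size_t pixel)
{
    return table[indices[pixel]];
}
```

For a picture of 1,350,000 pixels with 100 colors, this gives 300 bytes of table plus 1,350,000 bytes of indices, about a third of the 4,050,000 bytes needed at three bytes per pixel.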
22. Representation of Images
- Drawing commands
- Draw a picture using basic commands
- Just as an artist draws using a pencil or a brush and other basic movements
- Example:
- A house is drawn by sketching various elements (doors, windows, walls), adding color to them, and moving them to the desired position.
23. Representation of Images
- Data averaging or sampling
- Condense the size by selecting a smaller collection of information to store.
- Many different ways of sampling and data averaging
- An example: choose to store only every other pixel in an image (sampling), reducing the size to half. To display the full picture, the computer needs to fill in the missing data with, for example, the average of neighboring pixels (data averaging).
- The resulting picture cannot be as sharp as the original
- Lossy data compression
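The every-other-pixel example can be sketched on a single row of gray-scale pixels. The function names are hypothetical, and copying the last stored pixel when there is no right-hand neighbor is an assumed choice for the edge of the row.

```c
#include <stddef.h>

/* Sampling: keep pixels 0, 2, 4, ... of a row.
   Returns how many pixels were kept. */
size_t sample_every_other(const unsigned char *row, size_t n,
                          unsigned char *kept)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i += 2)
        kept[count++] = row[i];
    return count;
}

/* Data averaging: rebuild a row of n pixels from the kept ones.
   Even positions are copied back; odd positions get the average of
   their two stored neighbors (or the left one at the row's end). */
void fill_by_averaging(const unsigned char *kept, size_t count,
                       unsigned char *row, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char left = kept[i / 2];
        if (i % 2 == 0) {
            row[i] = left;
        } else {
            unsigned char right = (i / 2 + 1 < count) ? kept[i / 2 + 1] : left;
            row[i] = (unsigned char)((left + right) / 2);
        }
    }
}
```

A smooth row such as 10, 20, 30, 40, 50 is rebuilt exactly, but a sharp row such as 0, 200, 0, 200, 0 comes back as all zeros: the detail between the stored pixels is gone, which is why the technique is lossy.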
24. What are .gif, .ps, .jpg, .bmp formats?
- Commonly used image file formats - 1
- Bitmap (.bmp)
- Pixel-by-pixel storage of all color information for each pixel.
- Lossless representation
- Files are huge.
- Graphics Interchange Format (.gif)
- Uses one or more color tables (the color table technique)
- Each table contains 256 colors.
- Suitable for pictures with a small number (<256) of different colors (e.g., organization charts)
- Not suitable for pictures with shading (e.g., photos)
25. What are .gif, .ps, .jpg, .bmp formats?
- Commonly used image file formats - 2
- PostScript (.ps)
- Employs the drawing commands technique
- lineto draws a line from the current position to a new one, and arc draws an arc given its center, radius, etc.
- General shapes can be used in multiple places
- Fonts can be reused.
- Useful when the picture can be rendered as a drawing or it contains many of the same elements (e.g., text in the same fonts)
- Joint Photographic Experts Group (JPEG) (.jpg)
- Uses data averaging and sampling on 8×8 pixel blocks
- User determines the level of detail and clarity
- High-quality image: 8×8 blocks maintain their contents
- Low-quality image: info in 8×8 blocks is discarded → smaller files
26. Comparison between .jpg, .gif, and .ps
- Pictures in the textbook
- http://www.cs.grinnell.edu/walker/fluency-book/figures/chapter2/fig-2-overview.html
- Comparison of .jpg and .gif
- http://www.siriusweb.com/tutorials/gifvsjpg/
- More on .jpg and .gif
- http://www.wfu.edu/matthews/misc/jpg_vs_gif/JpgVsGif.html
27. Summary: chapter 2
- Computers work in binary
- Integers may be constrained in size
- Real numbers may have limited accuracy
- Computations may produce roundoff errors, affecting accuracy
- Characters and languages are encoded in binary
- Pictures are displayed pixel by pixel
- Color table, drawing commands, and data averaging and sampling compression techniques
- .bmp, .jpg, .gif, .ps formats
28. Terminology
- Binary vs. decimal
- Position value
- The base of a system
- Bit/byte/KB/MB/GB/TB
- Integers in binary
- Real numbers in binary
- Floating point numbers
- Representational error
- Roundoff errors
- ASCII/EBCDIC/Unicode
- Pixels
- Dots per inch (dpi)
- Bitmap
- Color table
- Data averaging
- Data sampling
- Data compression
- .jpg, .bmp, .gif, .ps