Title: Computer Architecture
Computer Architecture Related Topics
- Ben Schrooten
- Shawn Borchardt, Eddie Willett
- Vandana Chopra
Presentation Topics
- Computer Architecture History
- Single CPU Design
- GPU Design (Brief)
- Memory Architecture
- Communications Architecture
- Dual Processor Design
- Parallel Supercomputing Design
Part 1: History and Single CPU
HISTORY!!!
One of the first computing devices to come about was... the ABACUS!
The ENIAC (1946)
- Completed: 1946
- Programmed: plug board and switches
- Speed: 5,000 operations per second
- Input/output: cards, lights, switches, plugs
- Floor space: 1,000 square feet
The EDSAC (1949) and the UNIVAC I (1951)
UNIVAC I
- Speed: 1,905 operations per second
- Input/output: magnetic tape, unityper, printer
- Memory size: 1,000 12-digit words in delay lines
- Memory type: delay lines, magnetic tape
- Technology: serial vacuum tubes, delay lines, magnetic tape
- Floor space: 943 cubic feet
- Cost: F.O.B. factory $750,000, plus $185,000 for a high-speed printer
EDSAC
- Technology: vacuum tubes
- Memory: 1K words
- Speed: 714 operations per second
- First practical stored-program computer
Progression of the Architecture
Intel 4004 (1971)
- Vacuum tubes: 1940-1950
- Transistors: 1950-1964
- Integrated circuits: 1964-1971
- Microprocessor chips: 1971-present
Current CPU Architecture
Single Bus: Slow Performance
Example of Triple Bus Architecture
Motherboards / Chipsets / Sockets, OH MY!
- The chipset is in charge of:
  - Memory controller
  - EIDE controller
  - PCI bridge
  - Real-time clock
  - DMA controller
  - IrDA controller
  - Keyboard
  - Mouse
  - Secondary cache
  - Low-power CMOS SRAM
Sockets
- Socket 4/5
- Socket 7
- Socket 8
- Slot 1
- Slot A
GPUs
- Allow real-time graphics rendering on a small PC
- GPUs are true processing units
- The Pentium 4 contains 42 million transistors on a 0.18-micron process
- The GeForce3 contains 57 million transistors on a 0.15-micron manufacturing process
More GPU
Sources
- Memory Functionality: Dana Angluin, http://zoo.cs.yale.edu/classes/cs201/Fall_2001/handouts/lecture-13/node4.html
- Benchmark Graphics: Digital Life, http://www.digit-life.com/articles/pentium4/index3.html
- Chipset and Socket Information: Motherboards.org, http://www.motherboards.org/articlesd/tech-planations/17_2.html
- AMD Processor Pictures: Tom's Hardware, http://www6.tomshardware.com/search/search.html?categoryallwordsAthlon
- GPU Info: 4th Wave Inc., http://www.wave-report.com/tutorials/gpu.htm
- NV20 Design Pictures: Digital Life, http://www.digit-life.com/articles/nv20/
- Source for DX4100 Picture: Oneironaut, http://oneironaut.tripod.com/dx4100.jpg
- Source for Computer Architecture Overview Picture: http://www.eecs.tulane.edu/courses/cpen201/slides/201Intro.pdf
- Pictures of CPU Overview, Single Bus Architecture, Triple Bus Architecture: Roy M. Wnek, Virginia Tech, CS5515 Lecture 5, http://www.nvc.cs.vt.edu/wnek/cs5515/slide/Grad_Arch_5.PDF
- Historical Data and Pictures: The Computer Museum History Center, http://www.computerhistory.org/
- Intel Motherboard Diagram/Pentium 4 Picture: Intel Corporation, http://www.intel.com
- The Abacus: Abacus-Online-Museum, http://www.hh.schule.de/metalltechnik-didaktik/users/luetjens/abakus/china/china.htm
- Information also from Clint Fleri, http://www.geocities.com/cfleri/
Main Memory
Memory Hierarchy
DRAM vs. SRAM
- DRAM is short for Dynamic Random Access Memory
- SRAM is short for Static Random Access Memory
- DRAM is dynamic in that, unlike SRAM, its storage cells need to be refreshed, or given a new electronic charge, every few milliseconds, because the charge held in each cell leaks away. SRAM does not need refreshing because each bit is held in a bistable latch (a flip-flop) that stays in one of its two states for as long as power is applied, rather than in a storage cell that holds a charge in place.
Parity vs. Non-Parity
- Parity is an error-detection scheme developed to notify the user of data errors. A single extra bit is added to each byte of data, and this bit is used to check the integrity of the other 8 bits while the byte is moved or stored.
- Because memory errors are so rare, most of today's memory is non-parity.
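As a rough illustration of the parity scheme above, here is a minimal Python sketch. It assumes even parity (the extra bit makes the total number of 1 bits even), which is one common convention; the slide does not say which convention a given module uses, and the function names are hypothetical.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 if the byte holds an odd number of 1 bits."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return bin(byte).count("1") % 2


def parity_ok(byte: int, stored_parity: int) -> bool:
    """Recompute the parity when the byte is read back and compare it with the stored bit."""
    return parity_bit(byte) == stored_parity


data = 0b1011_0010                  # four 1 bits, so the parity bit is 0
stored = parity_bit(data)
one_bit_error = data ^ 0b0000_1000  # a single flipped bit is detected
two_bit_error = data ^ 0b0000_0011  # two flipped bits cancel out and go unnoticed
print(stored, parity_ok(data, stored), parity_ok(one_bit_error, stored), parity_ok(two_bit_error, stored))
# 0 True False True
```

Parity only detects an odd number of flipped bits and cannot tell which bit is wrong; correcting errors requires a stronger code such as ECC.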
SIMM vs. DIMM vs. RIMM?
- SIMM: Single In-line Memory Module
- DIMM: Dual In-line Memory Module
- RIMM: Rambus In-line Memory Module
- SIMMs offer a 32-bit data path while DIMMs offer a 64-bit data path. Because the Pentium and more recent processors have a 64-bit data bus, SIMMs have to be used in pairs on those machines.
- RIMM is one of the latest designs. Because of the fast data transfer rate of these modules, a heat spreader (aluminum plate covering) is used on each module.
Evolution of Memory
- 1970: RAM / DRAM, 4.77 MHz
- 1987: FPM, 20 MHz
- 1995: EDO, 20 MHz
- 1997: PC66 SDRAM, 66 MHz
- 1998: PC100 SDRAM, 100 MHz
- 1999: RDRAM, 800 MHz
- 1999/2000: PC133 SDRAM, 133 MHz
- 2000: DDR SDRAM, 266 MHz
- 2001: EDRAM, 450 MHz
- FPM: Fast Page Mode DRAM
  - Traditional DRAM
- EDO: Extended Data Output DRAM
  - Speeds up reads between memory and the CPU by keeping output data valid while the next access begins
- SDRAM: Synchronous DRAM
  - Synchronizes itself with the CPU bus and runs at higher clock speeds
- RDRAM: Rambus DRAM
  - DRAM with a very high bandwidth (1.6 GB/s)
- EDRAM: Enhanced DRAM
  - A dynamic (power-refreshed) RAM that includes a small amount of static RAM (SRAM) inside a larger amount of DRAM, so that many memory accesses go to the faster SRAM. EDRAM is sometimes used as L1 and L2 memory and, together with Enhanced Synchronous DRAM (ESDRAM), is known as cached DRAM.
Read Operation
- On a read, the CPU first tries to find the data in the cache. If it is not there, the cache is updated from main memory and the data is then returned to the CPU.
Write Operation
- On a write, the CPU writes the information into both the cache and main memory (a write-through policy).
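The read and write behavior on the two slides above can be sketched together in a few lines of Python. This is a minimal sketch under stated assumptions: a tiny direct-mapped cache with one word per line and a write-through policy. The class and its names are hypothetical and not taken from the cited sources.

```python
from __future__ import annotations


class WriteThroughCache:
    """Hypothetical direct-mapped cache: one word per line, placed at index addr % num_lines."""

    def __init__(self, memory: list[int], num_lines: int = 4) -> None:
        self.memory = memory                 # main memory, shared with the caller
        self.num_lines = num_lines
        self.lines: list[tuple[int, int] | None] = [None] * num_lines  # (address, value)
        self.hits = 0
        self.misses = 0

    def read(self, addr: int) -> int:
        index = addr % self.num_lines
        line = self.lines[index]
        if line is not None and line[0] == addr:
            self.hits += 1                   # found in the cache
            return line[1]
        self.misses += 1                     # not there: update the cache from main memory
        value = self.memory[addr]
        self.lines[index] = (addr, value)
        return value                         # then return the data to the CPU

    def write(self, addr: int, value: int) -> None:
        # Write-through: the value goes into the cache and into main memory.
        self.lines[addr % self.num_lines] = (addr, value)
        self.memory[addr] = value


memory = [10, 20, 30, 40, 50, 60, 70, 80]
cache = WriteThroughCache(memory)
cache.read(5)        # miss, loads 60 from memory
cache.read(5)        # hit
cache.write(2, 99)   # updates both the cache and memory[2]
print(cache.read(2), memory[2], cache.hits, cache.misses)  # 99 99 2 1
```

Because every write also reaches main memory, an evicted line never has to be copied back; the cost is one memory write per store.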
References
- http://www-ece.ucsd.edu/weathers/ece30/downloads/Ch7_memory(4x).pdf
- http://home.cfl.rr.com/bjp/eric/ComputerMemory.html
- http://aggregate.org/EE380/JEL/ch1.pdf
Defining a Bus
- A parallel circuit that connects the major
components of a computer, allowing the transfer
of electric impulses from one connected component
to any other
VESA - Video Electronics Standards Association
- 32-bit bus
- Found mostly on 486 machines
- Relied on the 486 processor to function
- People started to switch to the PCI bus because of this
- Otherwise known as VLB (VESA Local Bus)
ISA - Industry Standard Architecture
- Very old technology
- Bus speed of 8 MHz
- Maximum speed of 42.4 Mb/s
- Very few ISA slots are found in modern machines.
MCA - Micro Channel Architecture
- IBM's attempt to compete with the ISA bus
- 32-bit bus
- Automatically configured cards (like Plug and Play)
- Not compatible with ISA
EISA - Extended Industry Standard Architecture
- Attempt to compete with IBM's MCA bus
- Ran at an 8.33 MHz cycle rate
- 32-bit slots
- Backward compatible with ISA
- Went the way of MCA
PCI - Peripheral Component Interconnect
- Peak throughput of about 1,066 Mb/s (133 MB/s)
- Bus speed of 33 MHz
- 32-bit architecture (with an optional 64-bit extension)
- Developed by Intel in 1993
- Synchronous or asynchronous
- PCI popularized Plug and Play
- Runs at half of the system bus speed
PCI-X
- Up to 133 MHz bus speed
- 64-bit bus width
- 1 GB/s throughput
- Backward compatible with PCI
- Primarily developed for the increased I/O demands of technologies such as Fibre Channel, Gigabit Ethernet and Ultra3 SCSI.
AGP - Accelerated Graphics Port
- Essentially a high-speed PCI port
- Capable of running at 4 times the PCI bus speed (133 MHz)
- Used for high-speed 3D graphics cards
- Considered a port, not a bus
  - Only two devices involved
  - Not expandable
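The throughput figures on these bus slides follow from one rule: peak bandwidth is the bus width in bytes times the number of transfers per second. The sketch below applies that rule. It assumes AGP's effective 133 MHz comes from a 66 MHz clock with two transfers per cycle (AGP 2x mode), and the results are theoretical peaks, not sustained rates; the function name is hypothetical.

```python
def peak_mb_per_s(width_bits: int, clock_mhz: float, transfers_per_clock: int = 1) -> float:
    """Theoretical peak bandwidth in MB/s: bytes per transfer x millions of transfers per second."""
    return width_bits / 8 * clock_mhz * transfers_per_clock


buses = {
    "PCI (32-bit, 33 MHz)": peak_mb_per_s(32, 33.33),
    "PCI-X (64-bit, 133 MHz)": peak_mb_per_s(64, 133.33),
    "AGP (32-bit, 66 MHz, 2 transfers per clock)": peak_mb_per_s(32, 66.67, 2),
}
for name, mb_per_s in buses.items():
    print(f"{name}: about {mb_per_s:.0f} MB/s")
```

PCI-X's roughly 1,067 MB/s matches the 1 GB/s quoted for it, and the AGP figure comes out at about four times PCI's.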
IDE - Integrated Drive Electronics
- Tons of other names: ATA, ATA/ATAPI, EIDE, ATA-2, Fast ATA, ATA-3, Ultra ATA, Ultra DMA
- Good performance at a low cost
- Most widely used interface for hard disks
SCSI - Small Computer System Interface (pronounced "skuzzy")
- Capable of handling internal and external peripherals
- Speed anywhere from 80 to 640 Mb/s
- Many types of SCSI
Serial Port
- Uses a DB9 or DB25 connector
- Adheres to the RS-232C spec
- Capable of speeds up to 115 kb/s
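A serial port sends each character as a frame with extra start and stop bits, so its byte rate is lower than the raw bit rate suggests. A minimal sketch, assuming the common 8N1 framing (1 start bit, 8 data bits, no parity, 1 stop bit) and the standard 115,200 bit/s rate behind the top speed above; the function name is hypothetical.

```python
def uart_bytes_per_s(bits_per_s: int, data_bits: int = 8, parity_bits: int = 0, stop_bits: int = 1) -> float:
    """Characters (bytes) per second for asynchronous serial framing: 1 start bit + data + parity + stop."""
    bits_per_char = 1 + data_bits + parity_bits + stop_bits
    return bits_per_s / bits_per_char


print(uart_bytes_per_s(115_200))  # 8N1: 10 bits per byte -> 11520.0 bytes per second
```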
USB
- USB 1.0
  - Hot plug-and-play
  - Full-speed USB devices signal at 12 Mb/s
  - Low-speed devices use a 1.5 Mb/s subchannel
  - Up to 127 devices chained together
- USB 2.0
  - Data rate of 480 megabits per second
USB On-The-Go
- For portable devices
- Limited host capability to communicate with selected other USB peripherals
- A small USB connector to fit the mobile form factor
FireWire (IEEE 1394, also known as i.LINK)
- High-speed serial port
- 400 Mb/s transfer rate
- About 30 times faster than USB 1.0
- Hot plug-and-play
PS/2 Port
- 6-pin mini-DIN plug
- Mouse port and keyboard port
- Developed by IBM
Parallel Port (Printer Port)
- Old type
- Two new types
  - ECP (Extended Capabilities Port)
  - EPP (Enhanced Parallel Port)
  - Ten times faster than the old parallel port
  - Capable of bidirectional communication
Game Port
- Uses a DB15 connector
- Used to connect a joystick to the computer
Parallel Computer Architecture
Need for High Performance Computing
- There's a need for tremendous computational capabilities in science, engineering, and business.
- There are applications that require gigabytes of memory and gigaflops of performance.
What Is a High Performance Computer?
- Definition: an HPC computer can solve large problems in a reasonable amount of time.
- Characteristics:
  - Fast computation
  - Large memory
  - High-speed interconnect
  - High-speed input/output
How Is an HPC Computer Made to Go Fast?
- Make the sequential computation faster
- Do more things in parallel
Applications
- Weather prediction
- Aircraft and automobile design
- Artificial intelligence
- Entertainment industry
- Military applications
- Financial analysis
- Seismic exploration
- Automobile crash testing
Who Makes High Performance Computers?
- SGI/Cray
  - Power Challenge Array
  - Origin-2000
  - T3D/T3E
- HP/Convex
  - SPP-1200
  - SPP-2000
- IBM
  - SP2
- Tandem
Trends in Computer Design
- Performance of the fastest computers has grown exponentially from 1945 to the present, averaging a factor of 10 every five years.
- The growth flattened somewhat in the 1980s but is accelerating again as massively parallel computers have become available.
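To make the "factor of 10 every five years" trend concrete, the short sketch below converts it into a doubling time and a growth factor over a span of years. It only restates the slide's rate as arithmetic; it is not fitted to any real benchmark data, and the names are hypothetical.

```python
import math

FACTOR = 10          # growth factor quoted on the slide...
PERIOD_YEARS = 5     # ...per five-year period


def growth(years: float) -> float:
    """Performance multiplier after `years` at the quoted exponential rate."""
    return FACTOR ** (years / PERIOD_YEARS)


doubling_time = PERIOD_YEARS * math.log(2) / math.log(FACTOR)
print(f"doubling time: {doubling_time:.2f} years")  # 1.51 years
print(f"growth over 20 years: {growth(20):.0f}x")   # 10000x
```

In other words, the quoted rate means the performance of the fastest machines doubled roughly every year and a half.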
Increase in the Number of Processors
Real World Sequential Processes
- Sequential processes we find in the world:
- The passage of time is a classic example of a sequential process.
  - Day breaks as the sun rises in the morning.
  - Daytime has its sunlight and bright sky.
  - Dusk sees the sun setting on the horizon.
  - Nighttime descends with its moonlight, dark sky and stars.
Parallel Processes
- Music
- An orchestra performance, where every instrument
plays its own part, and playing together they
make beautiful music.
Parallel Features of Computers
- Various methods are available on computers for doing work in parallel:
  - Computing environment
  - Operating system
  - Memory
  - Disk
  - Arithmetic
Computing Environment - Parallel Features
- Using a timesharing environment
  - The computer's resources are shared among many users who are logged in simultaneously.
  - Your process uses the CPU for a time slice, and then is rolled out while another user's process is allowed to compute.
  - The opposite of this is to use dedicated mode, where yours is the only job running.
- The computer overlaps computation and I/O
  - While one process is writing to disk, the computer lets another process do some computation.
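Time slicing can be sketched as round-robin scheduling, one common policy in which each process runs for at most one time slice and then goes to the back of the queue. The slide does not name a specific scheduling policy, so treat this as an illustration; the job names and slice length are hypothetical.

```python
from __future__ import annotations

from collections import deque


def round_robin(jobs: dict[str, int], time_slice: int) -> list[tuple[str, int]]:
    """Return (job, units run) in execution order; `jobs` maps a job name to the CPU time it needs."""
    if time_slice <= 0:
        raise ValueError("time_slice must be positive")
    queue = deque(jobs.items())
    schedule: list[tuple[str, int]] = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(time_slice, remaining)
        schedule.append((name, ran))
        if remaining > ran:
            # Slice used up: roll the job out and let the next one compute.
            queue.append((name, remaining - ran))
    return schedule


print(round_robin({"alice": 3, "bob": 5}, time_slice=2))
# [('alice', 2), ('bob', 2), ('alice', 1), ('bob', 2), ('bob', 1)]
```

With a single job in the queue, the same function degenerates to dedicated mode: the job runs to completion without interruption.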
Operating System - Parallel Features
- Using the UNIX background processing facility
  - a.out > results
  - man etime
- Using the UNIX cron jobs feature
  - You submit a job that will run at a later time.
  - Then you can play tennis while the computer continues to work.
  - This overlaps your computer work with your personal time.
Memory - Parallel Features
- Memory interleaving
  - Memory is divided into multiple banks, and consecutive data elements are interleaved among them.
  - There are multiple ports to memory. When the data elements that are spread across the banks are needed, they can be accessed and fetched in parallel.
  - Memory interleaving increases the memory bandwidth.
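A minimal sketch of low-order interleaving, one common way to spread consecutive addresses across banks (the slide does not specify the mapping): the bank is the address modulo the number of banks, so consecutive words land in different banks and can be fetched in parallel. Four banks is an arbitrary assumption, and the function name is hypothetical.

```python
from __future__ import annotations


def bank_of(addr: int, num_banks: int) -> tuple[int, int]:
    """Low-order interleaving: return (bank number, offset within that bank)."""
    return addr % num_banks, addr // num_banks


NUM_BANKS = 4
for addr in range(8):
    bank, offset = bank_of(addr, NUM_BANKS)
    print(f"address {addr} -> bank {bank}, offset {offset}")
```

Addresses 0 through 3 fall in four different banks, so a sequential read of four words can keep all banks busy at once instead of waiting on one bank four times; that overlap is where the extra bandwidth comes from.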
Memory - Parallel Features (Cont.)
- Multiple levels of the memory hierarchy
  - Global memory, which any processor can access
  - Memory local to a partition of the processors
  - Memory local to a single processor
    - Cache memory
    - Memory elements held in registers
Disk - Parallel Features
- RAID disk
  - Redundant Array of Inexpensive Disks
- Striped disk
  - When a dataset is written to disk, it is broken into pieces which are written simultaneously to different disks in a RAID disk system.
  - When the same dataset is read back in, the pieces of the dataset are read in parallel, and the original dataset is reassembled in memory.
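The striping steps above can be sketched as follows, assuming fixed-size chunks dealt round-robin across the disks (as in RAID 0, which stripes without redundancy). The chunk size, disk count and function names are hypothetical, and this sketch performs the writes one after another, whereas a real RAID controller issues them to the disks simultaneously.

```python
from __future__ import annotations


def stripe(data: bytes, num_disks: int, chunk: int) -> list[list[bytes]]:
    """Break data into `chunk`-byte pieces and deal them round-robin onto the disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % num_disks].append(data[i:i + chunk])
    return disks


def reassemble(disks: list[list[bytes]]) -> bytes:
    """Read the pieces back in the same round-robin order and rebuild the original data."""
    pieces = []
    for row in range(max(len(d) for d in disks)):
        for disk in disks:
            if row < len(disk):      # the last row may not reach every disk
                pieces.append(disk[row])
    return b"".join(pieces)


disks = stripe(b"ABCDEFGHIJ", num_disks=3, chunk=2)
print(disks)              # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF']]
print(reassemble(disks))  # b'ABCDEFGHIJ'
```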
Arithmetic - Parallel Features
- We will examine the following features that lend themselves to parallel arithmetic:
  - Multiple functional units
  - Superscalar arithmetic
  - Instruction pipelining
Parallel Machine Models (Architectures)
Multicomputer
- A multicomputer comprises a number of von Neumann computers, or nodes, linked by an interconnection network.
- In an idealized network, the cost of sending a message between two nodes is independent of both node location and other network traffic, but does depend on message length.
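The idealized cost model described above is often written as a startup time plus a per-word transfer time, T = t_s + t_w * L, where L is the message length. The sketch below encodes that linear model; the constants are hypothetical placeholders, not measurements of any machine. Node positions never enter the formula, which is exactly the idealization.

```python
def message_time(length_words: int, t_startup: float = 50e-6, t_per_word: float = 0.1e-6) -> float:
    """Idealized multicomputer: time in seconds to send one message of `length_words` words.
    Depends only on message length, not on which nodes talk or on other traffic."""
    return t_startup + t_per_word * length_words


for n in (0, 1_000, 10_000):
    print(f"{n:>6} words: {message_time(n) * 1e6:.0f} microseconds")
```

Short messages are dominated by the startup term, so in this model it usually pays to send fewer, larger messages.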
- Locality
- Scalability
- Concurrency
Distributed Memory (MIMD)
- MIMD (multiple instruction, multiple data) means that each processor can execute a separate stream of instructions on its own local data. Distributed memory means that memory is distributed among the processors rather than placed in a central location.
- Difference between a multicomputer and a distributed-memory MIMD machine
  - In a distributed-memory MIMD machine, unlike the idealized multicomputer, the cost of sending a message between nodes is not independent of node location and other network traffic.
Examples of MIMD Machines
Multiprocessor or Shared-Memory MIMD
- All processors share access to a common memory via a bus or a hierarchy of buses
Example of Shared-Memory MIMD
- Silicon Graphics Challenge
SIMD Machines
- All processors execute the same instruction stream (single instruction, multiple data), each on a different piece of data
Example of a SIMD Machine
Use of Cache
- Why is cache used on parallel computers?
  - Advances in memory technology aren't keeping up with processor innovations.
  - Memory isn't speeding up as fast as the processors.
- One way to alleviate the performance gap between main memory and the processors is to have local cache.
  - The cache memory can be accessed faster than the main memory.
  - Cache keeps up with the fast processors, and keeps them busy with data.
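How much the cache helps can be quantified with the standard average memory access time (AMAT) formula: AMAT = hit time + miss rate x miss penalty. The timings and miss rates below are hypothetical round numbers chosen for illustration, not figures for any real machine.

```python
def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


HIT_NS, MISS_PENALTY_NS = 2.0, 60.0     # hypothetical cache and main-memory timings
for miss_rate in (0.01, 0.05, 0.20):
    print(f"miss rate {miss_rate:.0%}: {amat(HIT_NS, miss_rate, MISS_PENALTY_NS):.1f} ns per access")
```

Even at a 20% miss rate the average stays well below the roughly 60 ns that every access would cost if it had to go to main memory, which is why the cache keeps fast processors busy.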
Shared Memory
[Figure: processors 1, 2 and 3, each with its own cache, connected through a network to Memory 1, Memory 2 and Memory 3]
Cache Coherence
- What is cache coherence?
  - It keeps a data element found in several caches current with each other and with the value in main memory.
- Various cache coherence protocols are used:
  - Snoopy protocol
  - Directory-based protocol
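A minimal sketch of a snoopy protocol, assuming the simplest write-invalidate scheme on a single shared bus with write-through caches: every cache watches (snoops) the bus, and when another cache writes an address it drops its own copy. Real snoopy protocols such as MESI track more line states, and directory-based protocols are not shown; the class and method names are hypothetical.

```python
from __future__ import annotations


class Bus:
    """Shared bus that every cache snoops; main memory maps address -> value."""

    def __init__(self) -> None:
        self.memory: dict[int, int] = {}
        self.caches: list[Cache] = []

    def announce_write(self, writer: Cache, addr: int) -> None:
        # Every other cache sees the write on the bus and invalidates its copy.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)


class Cache:
    def __init__(self, bus: Bus) -> None:
        self.bus = bus
        self.lines: dict[int, int] = {}  # only valid lines are stored
        bus.caches.append(self)

    def read(self, addr: int) -> int:
        if addr not in self.lines:       # miss or invalidated: fetch from main memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr: int, value: int) -> None:
        self.lines[addr] = value
        self.bus.memory[addr] = value    # write-through keeps main memory current
        self.bus.announce_write(self, addr)


bus = Bus()
cpu1, cpu2 = Cache(bus), Cache(bus)
bus.memory[100] = 7
cpu1.read(100)
cpu2.read(100)                           # both caches now hold 7
cpu1.write(100, 42)                      # cpu2's stale copy is invalidated
print(cpu2.read(100))                    # 42, refetched from main memory
```

Without the invalidation step, cpu2 would keep returning the stale value 7 after cpu1's write, which is exactly the incoherence the protocol exists to prevent.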
Various Other Issues
- Data Locality Issue
- Distributed Memory Issue
- Shared Memory Issue
Thanks