Title: ???? ?????? 0368-2159 Lecture 1 ????? ??? ???????? ?????? ??? ???????: ???? ???? ???? ??-????
1???? ??????0368-2159Lecture 1????? ???
???????? ?????? ??? ??????? ????
???? ???? ??-????
2?? ?? ???? ???????
- ????? - ???????????
- ?????? ??????
- ?????????? ??????
3?? ?? ???? ????
- Introduction Computer Architecture
- Administrative Matters
- History
- ???????? ????? ??? ?????? ??????? ??????? ?????
- ??? ?????
- ???????
- ??????? ????? ?????
- ?????????
- ?????? ??????? ??????? ??????????
4Computing Devices Then
- EDSAC, University of Cambridge, UK, 1949
5Computing Devices Now
Sensor Nets
Cameras
Games
Set-top boxes
Media Players
Laptops
Servers
Robots
Routers
Smart phones
Automobiles
Supercomputers
6???? ??????, ?? ???
8Mother board
9 First Pacemaker, 1957
11The paradigm (Patterson)
- Every Computer Scientist should master the AAA
- Architecture
- Algorithms
- Applications
12Computer Architecture GOAL
Fast, Effective and Cheap
- The goal of Computer Architecture
- To build cost effective systems
- How do we calculate the cost of a system?
- How do we evaluate the effectiveness of the system?
- To optimize the system
- What are the optimization points?
- Fact: most computer systems still use the von Neumann principle of operation, even though internally they are very different from the computers of that era.
13Anatomy: 5 components of any Computer (since 1946)
Personal Computer
- Processor: Control (brain) and Datapath (brawn)
- Memory (where programs and data live when running)
- Devices: Input and Output
- Input: Keyboard, Mouse
- Output: Display, Printer
- Disk (where programs and data live when not running)
14Computer System Structure
15The Instruction Set: a Critical Interface
software
instruction set
hardware
16?? ?? Computer Architecture ?
- Computer Architecture
- Instruction Set Architecture
- Machine Organization
- ????? ??????????
17What are Machine Structures?
Layers, from top to bottom:
- Application (e.g., browser)
- Compiler, Assembler
- Operating System (Linux, Win, ..)
- Instruction Set Architecture (the boundary between Software above and Hardware below)
- Processor, Memory, I/O system
- Datapath and Control
- Digital Design
- Circuit Design
- Transistors
- Physics
Coordination of many levels (layers) of abstraction
18Levels of Representation
High Level Language Program:
temp = v[k]; v[k] = v[k+1]; v[k+1] = temp;
Compiler
Assembly Language Program:
lw $15, 0($2)
lw $16, 4($2)
sw $16, 0($2)
sw $15, 4($2)
Assembler
Machine Language Program:
0000 1001 1100 0110 1010 1111 0101 1000
1010 1111 0101 1000 0000 1001 1100 0110
1100 0110 1010 1111 0101 1000 0000 1001
0101 1000 0000 1001 1100 0110 1010 1111
Machine Interpretation
Control Signal Specification:
ALUOP[0:3] <= InstReg[9:11] & MASK
19Computer Architecture's Changing Definition
- 1950s to 1960s Computer Architecture Course
- Computer Arithmetic
- 1970s to mid 1980s Computer Architecture Course
- Instruction Set Design, especially ISA appropriate for compilers
- 1990s Computer Architecture Course
- Design of CPU, memory system, I/O system, Multi-processors, Networks
- 2000s Computer Architecture Course
- Special purpose architectures, Functionally reconfigurable, Special considerations for low power/mobile processing
- 2005 to future (?): Multi-processors, Parallelism
- Synchronization, Speed-up, How to Program ??? !!!
20Forces on Computer Architecture
Forces acting on Computer Architecture:
- Technology
- Programming Languages
- Applications
- Operating Systems
- History
- Cleverness
21Computers in the News: Sony Playstation 2000
"The Playstation 3 will deliver nearly 2 teraflops overall performance," said Ken Kutaragi, president and group CEO of Sony Computer Entertainment
- As reported in Microprocessor Report, Vol 13, No. 5
- Emotion Engine: 6.2 GFLOPS, 75 million polygons per second
- Graphics Synthesizer: 2.4 Billion pixels per second
- Claim: Toy Story realism brought to games!
23Ray Kurzweil: By 2029 reverse engineer the Human Brain
http://singularityhub.com/2010/01/25/kurzweil-discusses-the-future-of-brain-computer-interfac-x-prize-lab-video/
24Where are We Going??
???? ??????
?
25?????? ???? ??????? ????? ??? ??????
26Course Administration
- Instructors
-
- Nathan Intrator (nin_at_post.tau.ac.il)
- TA Kiril Solovey (kirilsolo_at_gmail.com )
- http://cs.tau.ac.il/nin/Courses/CompStruct/CompStruct.htm
- http://virtual.tau.ac.il
- Books
- V. C. Hamacher, Z. G. Vranesic, S. G. Zaky: Computer Organization. McGraw-Hill, 1982
- H. Taub: Digital Circuits and Microprocessors. McGraw-Hill, 1982
- ?????? ??????? ??????? ??????????? ??????
- Hennessy and Patterson: Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, 1998
27Grading
- ????
- ???? ???? 80
- ??????? 20
- 6 - 7 ???????
28Architecture and Microarchitecture Elements
- Architecture
- Registers data width (8/16/32/64)
- Instruction set
- Addressing modes
- Addressing methods (Segmentation, Paging, etc.)
- Microarchitecture (µArch)
- Physical memory size
- Caches size and structure
- Number of execution units, number of execution pipelines
- Branch prediction
- TLB
- Timing is considered µArch (though it is user visible!)
- Processors with the same arch may have different µArch
29Compatibility
- Backward compatibility
- New hardware can run existing software
- Example: Pentium® 4 can run software originally written for Pentium® III, Pentium® II, Pentium®, 486, 386, 286
- Forward compatibility
- New software can run on existing (old) hardware
- Example: new software written with MMX™ must still run on older Pentium processors which do not support MMX™
- Less important than backward compatibility
- New ideas: architecture independent
- JIT (just in time) compiler: Java and .NET
- Binary translation
30How to compare between different systems?
31Benchmarks: Programs for Evaluating Processor Performance
- Toy Benchmarks
- 10-100 line programs
- e.g. sieve, puzzle, quicksort
- Synthetic Benchmarks
- Attempt to match average frequencies of real workloads
- e.g., Winstone, Dhrystone
- Real programs
- e.g., gcc, spice
- SPEC: System Performance Evaluation Cooperative
- SPECint (8 integer programs)
- and SPECfp (10 floating point)
32CPI: to compare systems with same instruction set architecture (ISA)
- The CPU is synchronous: it works according to a clock signal.
- Clock cycle is measured in nsec (10^-9 of a second).
- Clock rate (= 1/clock cycle) is measured in MHz (10^6 cycles/second).
- CPI: cycles per instruction
- Average cycles per instruction (in a given program)
- CPI = (cycles required to execute the program) / (instructions executed in the program)
- IPC (= 1/CPI): instructions per cycle
- Clock rate is mainly affected by technology, CPI by the architecture
- CPI breakdown: how many cycles (on average) the program spends for different causes, e.g., executing, memory, I/O, etc.
33CPI (cont.)
- CPI_i: cycles to execute a given type of instruction
- e.g., CPI_add = 1, CPI_mul = 3
- Independent of a program
- Calculating the CPI of a program
- IC_i: number of times an instruction of type i was executed in the program
- IC: number of instructions executed in the program, IC = Σ_i IC_i
- F_i: relative frequency of instructions of type i, F_i = IC_i / IC
- Ncyc: number of cycles required to execute the program, Ncyc = Σ_i (CPI_i × IC_i)
- CPI = Ncyc / IC = Σ_i (F_i × CPI_i)
- This calculation does not take into account other delays such as memory and I/O
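The CPI calculation above can be sketched in a few lines of Python. This is only an illustration: the instruction mix is hypothetical (the add and mul cycle counts follow the slide's example, while the load entry and all execution counts are assumed), and the function names are not from the course material.

```python
# Hypothetical instruction mix: type -> (IC_i, CPI_i).
# CPI_add = 1 and CPI_mul = 3 follow the slide; "load" and all counts are assumed.
mix = {
    "add": (600, 1),
    "mul": (100, 3),
    "load": (300, 2),
}

def program_cpi(mix):
    """Return (IC, Ncyc, CPI), ignoring memory and I/O delays."""
    ic = sum(count for count, _ in mix.values())
    ncyc = sum(count * cpi for count, cpi in mix.values())
    return ic, ncyc, ncyc / ic

def weighted_cpi(mix):
    """Same CPI, computed as the sum of F_i * CPI_i with F_i = IC_i / IC."""
    ic = sum(count for count, _ in mix.values())
    return sum((count / ic) * cpi for count, cpi in mix.values())

ic, ncyc, cpi = program_cpi(mix)
print(ic, ncyc, cpi)        # total instructions, total cycles, average CPI
print(weighted_cpi(mix))    # should agree with the value above
```

Both functions give the same CPI, which is the point of the frequency-weighted form: the per-type CPI_i values are properties of the machine, while the F_i weights come from the program.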
34CPU Time
- CPU Time
- The time required by the CPU to execute a given program
- CPU Time = clock cycle × Ncyc = clock cycle × CPI × IC
- Our goal: minimize CPU Time
- Minimize clock cycle: more MHz (process, circuit, µArch)
- Minimize CPI: µArch (e.g., more execution units)
- Minimize IC: architecture (e.g., MMX™ technology)
- Speedup due to enhancement E: Speedup(E) = ExTime without E / ExTime with E
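A minimal sketch of the CPU Time relation and of the three ways to reduce it. All numbers (instruction count, CPI, clock rates) are made up for illustration, and speedup is taken here as old execution time divided by new execution time.

```python
def cpu_time(ic, cpi, clock_rate_hz):
    """CPU time in seconds: clock cycle * CPI * IC, with clock cycle = 1 / clock rate."""
    clock_cycle = 1.0 / clock_rate_hz
    return clock_cycle * cpi * ic

def speedup(time_without_e, time_with_e):
    """Speedup of an enhancement E, as execution time without E over time with E."""
    return time_without_e / time_with_e

# Hypothetical baseline: 1M instructions, CPI of 2, 500 MHz clock.
base = cpu_time(ic=1_000_000, cpi=2.0, clock_rate_hz=500e6)

variants = {
    "shorter clock cycle (1000 MHz)": cpu_time(1_000_000, 2.0, 1000e6),
    "lower CPI (1.0)": cpu_time(1_000_000, 1.0, 500e6),
    "lower IC (500k instructions)": cpu_time(500_000, 2.0, 500e6),
}

print(f"baseline: {base * 1e3:.3f} ms")
for name, t in variants.items():
    print(f"{name}: {t * 1e3:.3f} ms, speedup {speedup(base, t):.2f}")
```

Halving any one of the three factors halves the CPU time in this model. In practice the factors are not independent, for example a change that lowers IC may raise CPI.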
35Amdahl's Law
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then, with Fraction_enhanced = F and Speedup_enhanced = S:
Speedup_overall = ExTime_old / ExTime_new = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
36Amdahl's Law: Example
- Floating point instructions improved to run 2X faster, but only 10% of actual instructions are FP
ExTime_new = ExTime_old × (0.9 + 0.1/2) = 0.95 × ExTime_old
Corollary: Make The Common Case Fast
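The floating point example can be checked with a short Python sketch of Amdahl's law. The function name is an assumption made for this illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of the task is accelerated by a given factor."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# The slide's example: 10% of instructions are FP and they run 2X faster.
print(amdahl_speedup(0.10, 2.0))            # about 1.053

# Even an infinitely fast FP unit is bounded by the untouched 90%.
print(amdahl_speedup(0.10, float("inf")))   # about 1.111

# Accelerating the common case instead: 90% of the task, 2X faster.
print(amdahl_speedup(0.90, 2.0))            # about 1.818
```

The last two calls illustrate the corollary: the achievable speedup is limited by the fraction that is not enhanced, so effort is best spent on the common case.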
37Instruction Set Design
The ISA is what the user and the compiler see
The ISA is what the hardware needs to implement
38Why is the ISA important?
- Code size
- Long instructions may take more time to be fetched
- Larger code requires more memory (important in small devices, e.g., cell phones)
- Number of instructions (IC)
- Reducing IC reduces execution time (assuming same CPI and frequency)
- Code simplicity
- Simple HW implementation, which leads to higher frequency and lower power
- Code optimization can be applied more effectively to simple code
39The impact of the ISA
40CISC Processors
- CISC: Complex Instruction Set Computer
- The idea: a high level machine language
- Characteristics
- Many instruction types, with many addressing modes
- Some of the instructions are complex
- Perform complex tasks
- Require many cycles
- ALU operations directly on memory
- Usually uses limited number of registers
- Variable length instructions
- Common instructions get short codes → save code length
- Example: x86
41CISC Drawbacks
- Compilers do not take advantage of the complex instructions and the complex indexing methods
- Implementing complex instructions and complex addressing modes
- → complicates the processor
- → slows down the simple, common instructions
- → contradicts the Amdahl's law corollary
- Make The Common Case Fast
- Variable length instructions are a real pain in the neck
- It is difficult to decode several instructions in parallel
- As long as an instruction is not decoded, its length is unknown
- → It is unknown where the instruction ends
- → It is unknown where the next instruction starts
- An instruction may not fit into the right behavior of the memory hierarchy (will be discussed in the next lectures)
- Examples: VAX, x86 (!?!)
42RISC Processors
- RISC: Reduced Instruction Set Computer
- The idea: simple instructions enable fast hardware
- Characteristics
- A small instruction set, with only a few instruction formats
- Simple instructions
- Execute simple tasks
- Require a single cycle (with pipeline)
- A few indexing methods
- ALU operations on registers only
- Memory is accessed using Load and Store instructions only
- Many orthogonal registers
- Three address machine: Add dst, src1, src2
- Fixed length instructions
- Examples: MIPS™, Sparc™, Alpha™, PowerPC™
43RISC Processors (Cont.)
- Simple architecture → Simple micro-architecture
- Simple, small and fast control logic
- Simpler to design and validate
- Room for on-die caches: instruction cache + data cache
- Parallelize data and instruction access
- Shorten time-to-market
- Using a smart compiler
- Better pipeline usage
- Better register allocation
- Existing RISC processors are not pure RISC
- e.g., support division which takes many cycles
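44RISC and Amdahl's Law (Example)
- In comparison to the CISC architecture
- 10% of the static code, which executes 90% of the dynamic instructions, has the same CPI
- 90% of the static code, which is only 10% of the dynamic instructions, has its CPI increased by 60%
- The number of instructions being executed is increased by 50%
- The speed of the processor is doubled
- This was true for the time the RISC processors were invented
- We get: CPI_RISC = 0.9 × CPI_CISC + 0.1 × 1.6 × CPI_CISC = 1.06 × CPI_CISC
- And then: CPU Time_RISC = (clock cycle_CISC / 2) × 1.06 × CPI_CISC × 1.5 × IC_CISC ≈ 0.795 × CPU Time_CISC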
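The comparison can be explored with a small parametrized Python sketch. It rests on an interpretation of the slide that should be treated as an assumption: the "60" is read as a 60% CPI increase on the rarely executed code, the "50" as a 50% increase in instruction count, and "doubled" as a 2X clock rate. The function and parameter names are hypothetical.

```python
def risc_vs_cisc_time_ratio(hot_fraction=0.9, cold_cpi_factor=1.6,
                            ic_factor=1.5, clock_speedup=2.0):
    """Ratio of RISC CPU time to CISC CPU time under the stated assumptions.

    hot_fraction    -- share of dynamic instructions whose CPI is unchanged
    cold_cpi_factor -- CPI multiplier for the remaining dynamic instructions
    ic_factor       -- multiplier on the number of executed instructions
    clock_speedup   -- how many times faster the RISC clock runs
    """
    cold_fraction = 1.0 - hot_fraction
    cpi_factor = hot_fraction * 1.0 + cold_fraction * cold_cpi_factor
    # CPU time = clock cycle * CPI * IC, so the ratio is the product of the factors.
    return cpi_factor * ic_factor / clock_speedup

ratio = risc_vs_cisc_time_ratio()
print(f"time ratio = {ratio:.3f}, speedup = {1 / ratio:.2f}")

# Sensitivity check: without the clock advantage, RISC would lose in this model.
print(f"same clock: time ratio = {risc_vs_cisc_time_ratio(clock_speedup=1.0):.3f}")
```

Under these assumptions the RISC machine needs roughly 0.8 of the CISC execution time, and the whole gain comes from the faster clock that the simpler design allows.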
45So, what is better, RISC or CISC?
- Today CISC architectures (x86) run as fast as RISC (or even faster)
- The main reasons are
- CISC instructions are translated into RISC-like instructions (ucode)
- CISC architectures use a RISC-like engine
- We will discuss this kind of solution later in this course.
46Technology Trends: Microprocessor Complexity
Transistors per chip:
- Itanium 2: 410 million
- Athlon (K7): 22 million
- Alpha 21264: 15 million
- Pentium Pro: 5.5 million
- PowerPC 620: 6.9 million
- Alpha 21164: 9.3 million
- Sparc Ultra: 5.2 million
2X transistors/chip every 1.5 years: called "Moore's Law"
49Technology Trends: Processor Performance
[Plot: performance measure vs. year, growing about 1.54X/yr; marked point: Intel P4, 2000 MHz (Fall 2001)]
50Technology Trends: Memory Capacity (Single-Chip DRAM)
year   size (Mbit)
1980   0.0625
1983   0.25
1986   1
1989   4
1992   16
1996   64
1998   128
2000   256
2002   512
- Now 1.4X/yr, or 2X every 2 years.
- 8000X since 1980!
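The growth figures quoted for this table can be recomputed from its entries. The sketch below is illustrative only; the helper name is an assumption, and the data is copied from the table.

```python
# Single-chip DRAM capacity in Mbit, by year (from the table above).
dram_mbit = {1980: 0.0625, 1983: 0.25, 1986: 1, 1989: 4, 1992: 16,
             1996: 64, 1998: 128, 2000: 256, 2002: 512}

def annual_growth(table, start, end):
    """Average yearly growth factor between two years in the table."""
    return (table[end] / table[start]) ** (1.0 / (end - start))

total = dram_mbit[2002] / dram_mbit[1980]
print(f"total growth 1980-2002: {total:.0f}X")
print(f"average 1980-2002: {annual_growth(dram_mbit, 1980, 2002):.2f}X per year")
print(f"recent  1996-2002: {annual_growth(dram_mbit, 1996, 2002):.2f}X per year")
```

The recent entries (1996 to 2002) correspond to doubling every two years, which matches the "1.4X/yr" figure, while the average over the whole period is somewhat higher.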
51Technology Trends Imply Dramatic Change
- Processor
- Logic capacity: about 30% per year
- Clock rate: about 20% per year
- Memory
- DRAM capacity: about 60% per year (4x every 3 years)
- Memory speed: about 10% per year
- Cost per bit: improves about 25% per year
- Disk
- Capacity: about 60% per year
- Total data use: 100% per 9 months!
- Network Bandwidth
- Bandwidth increasing more than 100% per year!
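Yearly percentage rates like these compound, which is easy to misjudge. The following Python sketch converts a rate into a multi-year factor and a doubling time, assuming the figures above are percentages per year. The function names are hypothetical.

```python
import math

def growth_factor(percent_per_year, years):
    """Total growth after the given number of years at a compound yearly rate."""
    return (1.0 + percent_per_year / 100.0) ** years

def doubling_time_years(percent_per_year):
    """Years needed to double at a compound yearly rate."""
    return math.log(2.0) / math.log(1.0 + percent_per_year / 100.0)

# DRAM capacity at 60% per year gives roughly 4x every 3 years.
print(f"{growth_factor(60, 3):.3f}")

# Doubling times for some of the rates listed above.
for name, rate in [("logic capacity", 30), ("clock rate", 20),
                   ("DRAM capacity", 60), ("memory speed", 10)]:
    print(f"{name}: doubles every {doubling_time_years(rate):.1f} years")
```

The gap between the doubling time of processor clock rate and that of memory speed is what produces the widening processor-memory gap.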
52 1980-2003, CPU-DRAM Speed gap
Q. How do architects address this gap?
A. Put smaller, faster cache memories between
CPU and DRAM.
[Plot: performance (1/latency), axis ticks from 10 to 10000, vs. year (1980 to 2005), with one curve for CPU and one for DRAM]
53Dimensions
2001 devices (0.18 µm)
Chip size (1 cm)
Diameter of Human Hair (25 µm)
1996 devices (0.35 µm)
2007 devices (0.01 µm)
Silicon atom radius (1.17 Å)
Deep UV Wavelength (0.248 µm)
X-ray Wavelength (0.6 nm)
Demo
54?????????? ?????? ????? ?????
- ???? ?????? / ????? ???? non issue.
- ???? Power Wall ???? ???. ??????????? ??
?????. - ???? ??????? ??????? ?"? ?????? ???? ??????
??????, ?????????? ?????, ???????????? CPU ????
(pipelining, superscalar, out-of-order execution,
speculations) - ???? ILP Wall ?????? ????? ?????? ??????? ??
?????. - ???? ??? ????, ???? ??????? ?????.
- ???? Memory Wall ??? ???? ????? ???????
??????. - (200 ?????? ???? ?DRAM 4 ???????
????) - ???? ?????? ???? ???? X 2 ?? 1.5 ????.
- ???? ?? ??"? ???? X 2 ?? 5 ??????
- ??? X 2 ?????? (????? Cores) ?? ??????.
???? 4 ?? 40 ????? ?????
55Physics / Transistors History
- 1906: Audion (Triode), Lee De Forest
- 1947: First point contact transistor (germanium), John Bardeen and Walter Brattain, Bell Laboratories
56History
- 1958: First integrated circuit (germanium), Jack S. Kilby, Texas Instruments. Contained five components of three types: transistors, resistors and capacitors
- 1997: Intel Pentium II. Clock: 233 MHz. Number of transistors: 7.5 M. Gate length: 0.35 µm
57Annual Sales
- 10^18 transistors manufactured in 2003 alone
- 100 million for every human on the planet
61Integrated Circuits (2003 state-of-the-art)
- Primarily Crystalline Silicon
- 1mm - 25mm on a side
- 2003 feature size: 0.13 µm = 0.13 x 10^-6 m
- 100 - 400M transistors
- (25 - 100M "logic gates")
- 3 - 10 conductive layers
- CMOS (complementary metal oxide semiconductor): most common.
Bare Die
Chip in Package
- Package provides
- spreading of chip-level signal paths to board-level
- heat dissipation.
- Ceramic or plastic with gold wires.
62Printed Circuit Boards
- fiberglass or ceramic
- 1-20 conductive layers
- 1-20in on a side
- IC packages are soldered down.
63nMOS Transistor
- Four terminals: gate, source, drain, body
- Gate - oxide - body stack looks like a capacitor
- Gate and body are conductors
- SiO2 (oxide) is a very good insulator
- Called metal - oxide - semiconductor (MOS) capacitor
- Even though gate is no longer made of metal
Off
On
64nMOS Operation
- Body is commonly tied to ground (0 V)
- When the gate is at a low voltage
- P-type body is at low voltage
- Source-body and drain-body diodes are OFF
- No current flows, transistor is OFF
Off
65nMOS Operation Cont.
- When the gate is at a high voltage
- Positive charge on gate of MOS capacitor
- Negative charge attracted to body
- Inverts a channel under gate to n-type
- Now current can flow through n-type silicon from
source through channel to drain, transistor is ON
On
66pMOS Transistor
- Similar, but doping and voltages reversed
- Body tied to high voltage (VDD)
- Gate low: transistor ON
- Gate high: transistor OFF
- Bubble indicates inverted behavior
68Example: Inverter
69Example: NAND3
- Horizontal n-diffusion and p-diffusion strips
- Vertical polysilicon gates
- Metal1 VDD rail at top
- Metal1 GND rail at bottom
- 32 λ by 40 λ
74CMOS Inverter
A Y
0 1
1 0
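The inverter truth table follows directly from treating each transistor as a switch, as described in the nMOS and pMOS slides. The Python sketch below is a simplified switch-level model (no voltages, delays or drive strength), and all names in it are assumptions made for this illustration. It also models a 3-input NAND with the usual CMOS structure: three nMOS in series pulling down and three pMOS in parallel pulling up.

```python
from itertools import product

def nmos_conducts(gate):
    """nMOS switch: ON when the gate is high."""
    return gate == 1

def pmos_conducts(gate):
    """pMOS switch: ON when the gate is low."""
    return gate == 0

def cmos_inverter(a):
    pull_up = pmos_conducts(a)      # pMOS between VDD and Y
    pull_down = nmos_conducts(a)    # nMOS between Y and GND
    assert pull_up != pull_down     # exactly one network drives the output
    return 1 if pull_up else 0

def cmos_nand3(a, b, c):
    pull_up = any(pmos_conducts(x) for x in (a, b, c))    # parallel pMOS
    pull_down = all(nmos_conducts(x) for x in (a, b, c))  # series nMOS
    assert pull_up != pull_down
    return 1 if pull_up else 0

print("A Y")
for a in (0, 1):
    print(a, cmos_inverter(a))

print("A B C Y")
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, cmos_nand3(a, b, c))
```

In both gates the pull-up and pull-down networks are complementary, so for every input exactly one of them conducts and the output is always driven to either VDD or GND.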
78Multiplexers
- 2:1 multiplexer chooses between two inputs
S D1 D0 Y
0 X 0 0
0 X 1 1
1 0 X 0
1 1 X 1
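The truth table above says that Y follows D0 when S = 0 and D1 when S = 1. A short Python sketch can express this both behaviorally and as a gate-level expression, and check that the two agree on every input. The function names are assumptions made for this illustration.

```python
from itertools import product

def mux2(s, d1, d0):
    """Behavioral 2:1 multiplexer: select D1 when S is 1, else D0."""
    return d1 if s else d0

def mux2_gates(s, d1, d0):
    """Gate-level form: Y = (S AND D1) OR (NOT S AND D0), on 0/1 values."""
    return (s & d1) | ((1 - s) & d0)

print("S D1 D0 Y")
for s, d1, d0 in product((0, 1), repeat=3):
    y = mux2(s, d1, d0)
    assert y == mux2_gates(s, d1, d0)
    print(s, d1, d0, y)
```

The full enumeration has eight rows. The table on the slide needs only four because the unselected input is a don't-care (X).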
79Transmission Gate Mux
- Nonrestoring mux uses two transmission gates
- Only 4 transistors
80out
81?? ????? ????
- Computer Architecture integrates several levels, from programming languages to logic design.
- Instruction Set Architecture (ISA)
- Amdahl's law
- Moore's law
- Processor (CPU) --- Memory speed gap
- History
- Transistors. What, and how.
- From transistors to logic design