Title: ECE472 Computer Architecture Patrick Chiang TA: Kang-Min Hu
1 ECE472 Computer Architecture, Patrick Chiang, TA: Kang-Min Hu
2 Is this class for you?
- This class will not be easy
- My first quarter of teaching computer architecture at Oregon State
- Assumes good mastery of basic assembly language programming
- What is the class makeup?
- ECE 1/2
- CS 1/2
- This is ECE472, and emphasizes the hardware side of Comp. Arch.
- There is CS472 in Spring 2008 quarter
- Class Breakdown
- 5 Homeworks: 10%
- 1 Midterm: 20%
- 1 Project: 30%
- 1 Final: 40%
- Average grade around B/B, with some flexibility
3 Today: What's the big picture?
- Syllabus: Given this Thursday
- Start with the C-code
- Do the assembly language
- FIRST: How to evaluate whether a computer is fast, or good?
- Execution Time (time to run process(s))
- Power
- Cost
- Flexibility (complexity, programmability)
4 What do Computer Architects Do?
ECE471 Digital VLSI
5 What is Computer Architecture?
- Understanding every level of the complete system
- Software
- Compiler
- Computer Architecture
- VLSI digital circuit design
- For SOC, even analog/mixed-signal design
- Devices
- For an engineer, you must understand depth and breadth
- Everything is related
- Must understand every level of the problem to make the right choices
- Cannot just black-box and say "Not my problem. Someone else will solve it."
- Choice of where you want to go next depends on understanding changes along the entire vertical structure
- How is the technology changing? Are there fundamental shifts?
- e.g. multi-core, parallel processing
- Execution Time ?
6 Write Some C Code for Me
- C code
- What does the compiler do?
- Assembly language
7 Now that we have assembly code, how do we evaluate performance?
- Is execution time the only metric for performance?
- What about usability/programmability?
8 Notice one thing about your C Code: Application Specific
- Where are you running this code?
- Laptop
- Desktop
- Cellphone
- Google Server Farm
- Digital Signal Processor
- Each application has completely different
fundamentals and constraints
9 Do a DSP Calculation now
- Write C-code for DSP
- e.g. Polygon Rendering for X-box Halo 3
- MP3 Decode
- Write assembly code for this
10 Do a Transaction Processing Code Now
11 Processor-based Digital Systems
- Systems with a programmable, general-purpose processor
- Advantages ??
- Computers are the canonical example
- PCs, laptops, workstations,
- However, most processors are embedded or in servers
- Game consoles, PDAs, cell phones,
- Printers, car electronics system,
- Web servers, database servers,
12 FUTURE: Why are we going here?
13 Overall System Architecture
- Multiple interacting layers
- Term "architecture" used with all of them
- This class focuses on
- Hardware architecture
- Memory, interconnect, IO
- Clusters
- Reliability & low power systems
- Hardware-software interaction
- Programming for performance
- OS support
- Cluster programming
- Virtual machines & security
14 Application Constraints & Opportunities
- Applications drive machine balance
- Scientific computations
- Floating-point performance
- Main memory bandwidth
- Transaction/web processing
- ??
- Multimedia processing
- ??
- Embedded control
- ??
Architecture concepts typically exploit application behavior
15 Applications Change over Time
- Data-sets & memory requirements -> larger
- Cache & memory architecture become more critical
- Standalone -> networked
- IO integration & system software become more critical
- Single task -> multiple tasks
- Parallel architectures become critical
- Limited IO requirements -> rich IO requirements
- 60s: tapes & punch cards
- 70s: character oriented displays
- 80s: video displays, audio, hard disks
- 90s: 3D graphics, networking, high-quality audio
- 00s: real-time video, immersion,
16 Application Properties to Exploit in Computer Design
- Locality in memory/IO references
- Programs work on subset of instructions/data at any point in time
- Both spatial and temporal locality
- Parallelism
- Data-level (DLP): same operation on every element of a data sequence
- Instruction-level (ILP): independent instructions within sequential program
- Thread-level (TLP): parallel tasks within one program
- Multi-programming: independent programs
- Pipelining
- Predictability
- Control-flow direction, memory references, data values
17 Technology Trends & Constraints: Yearly Improvement
- Integrated circuits: logic
- 60% more devices per chip
- 15% faster devices
- Long wires don't improve
- Integrated circuits: DRAM
- 60% more devices per chip
- 7% reduction in latency
- 14% increase in bandwidth
- Magnetic Disks
- 60% to 100% increase in density
- IO/networking
- Little improvement in latency
- Large improvements in bandwidth through fast/wide signaling
[Figure: chips from 1992, 1995, 1998, and 2001. 64x more devices since 1992, 4x faster devices]
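A quick sanity check of the yearly rates against the 1992 to 2001 figure. This is a minimal sketch that assumes the rates compound once per year; the function name compound_growth is made up for illustration.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Total improvement factor after `years` of growth at `rate_per_year`."""
    return (1.0 + rate_per_year) ** years


# Assumption: the figure spans 1992 to 2001, i.e. nine yearly steps.
years = 2001 - 1992

devices = compound_growth(0.60, years)  # 60% more devices per chip each year
speed = compound_growth(0.15, years)    # 15% faster devices each year

print(f"devices: about {devices:.0f}x, speed: about {speed:.1f}x")
```

Under these assumptions the results land near the 64x and 4x quoted on the slide, which is as close as rounded yearly rates can be expected to get.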
18 Changes in Technology & Applications lead to Changes in Architecture
- 1970s
- Multi-chip CPUs
- Semiconductor memory very expensive
- Complex instruction sets (good code density)
- Microcoded control
- 1980s
- 5K - 500K transistors
- Single-chip, pipelined CPUs
- On-chip memory possible
- Simple, hard-wired control
- Simple instruction sets
- Small on-chip caches
- 1990s
- 1M - 64M transistors, 64b CPUs
- Complex control to exploit instruction-level parallelism
- Deep pipelines
- Multi-level caches
- 2000s
- 100M - 5B transistors
- Slow wires, power consumption, design, complexity, memory latency, IO bottlenecks,
- Multiprocessors & parallel systems
- Support programming for parallelism?
- <<your Ph.D. thesis goes here>>
Keeps computer architecture interesting and challenging
19 Rules of Thumb in Data Engineering, by J. Gray and Prashant Shenoy
- Storage
- Moore's Law: Things get 4x denser every three years.
- You need an extra bit of addressing every 18 months.
- Storage capacities increase 100x per decade.
- Storage device throughput increases 10x per decade.
- Disk data cools 10x per decade.
- Disk page sizes increase 5x per decade.
- NearlineTape:OnlineDisk:RAM storage cost ratios are approximately 1:3:300.
- In ten years RAM will cost what disk costs today.
- A person can administer a million dollars of disk storage
- Disks are replacing tapes as backup devices.
- On random workloads, disk mirroring is preferable to RAID5 parity because it spends disk space (which is plentiful) to save disk accesses (which are precious).
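The first two rules are the same statement in different units: 4x density is two doublings, and each doubling needs one more address bit. A minimal sketch of that arithmetic follows; the helper name extra_address_bits is an assumption for illustration.

```python
import math


def extra_address_bits(growth_factor: float, period_months: float,
                       elapsed_months: float) -> float:
    """Address bits needed to cover capacity growth over `elapsed_months`,
    given growth of `growth_factor` every `period_months`."""
    bits_per_period = math.log2(growth_factor)
    return bits_per_period * elapsed_months / period_months


# 4x denser every three years (36 months) -> bits needed after 18 months
print(extra_address_bits(4, 36, 18))

# 100x capacity per decade (120 months) -> bits needed per decade
print(round(extra_address_bits(100, 120, 120), 2))
```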
20 Metrics of Efficiency
- Desktop computing ($500 - $3K)
- Metrics ??
- Prominent processors: Intel Pentium, AMD Athlon, PowerPC G5
- Server computing ($3K - $1M)
- Metrics ??
- Prominent processors: IBM Power5, Sun UltraSparc, AMD Opteron
- Embedded computing ($10 - $500)
- Metrics ??
- Prominent processors: ARM, MIPS, Motorola 68K, many others
- Diversity in requirements leads to diversity in architectures
21 Performance Metrics
Plane              Speed      DC to Paris   Passengers   Throughput (pmph)
Boeing 747         610 mph    6.5 hours     470          286,700
BAC/Sud Concorde   1350 mph   3 hours       132          178,200
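The throughput figures above are passengers times speed (passenger-miles per hour). A small sketch that recomputes them from the speed and passenger numbers on the slide; the dictionary layout and function name are illustrative choices.

```python
# Speed (mph) and passenger count, taken from the slide.
planes = {
    "Boeing 747": {"speed_mph": 610, "passengers": 470},
    "Concorde": {"speed_mph": 1350, "passengers": 132},
}


def throughput_pmph(speed_mph: int, passengers: int) -> int:
    """Passenger-miles per hour: work delivered per unit of time."""
    return speed_mph * passengers


for name, p in planes.items():
    print(name, throughput_pmph(p["speed_mph"], p["passengers"]))
```

The Concorde wins on latency (3 hours vs. 6.5 hours) while the 747 wins on throughput, which is why the two metrics have to be kept apart.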
- Latency or execution time or response time
- Wall-clock time to complete a task
- Important if all we have to run is a single or a time-critical task
- Bandwidth or throughput or execution rate
- Number of tasks completed per unit of time
- Bandwidth = total amount of work / total execution time
- Metric is independent of exact number of tasks executed
- Important when we have many tasks to run
- What about Power? What about Cost? What about Reliability?
22 Examples
- Latency metric: program execution time in seconds
- CPUtime = IC x CPI x CCT (instruction count x cycles per instruction x clock cycle time)
- Your system architecture can affect all of them
- CPI: memory latency, IO latency,
- CCT: cache organization,
- IC: OS overhead,
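The three terms named on this slide multiply into execution time. A minimal sketch, assuming the standard relation CPUtime = IC x CPI x CCT; the workload numbers below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """Execution time in seconds = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 1e9 instructions, CPI of 1.5, 2 GHz clock (0.5 ns cycle).
base = cpu_time(1e9, 1.5, 0.5e-9)

# Hypothetical change: slower memory raises CPI from 1.5 to 2.0.
slow_memory = cpu_time(1e9, 2.0, 0.5e-9)

print(f"base: {base:.2f} s, with higher CPI: {slow_memory:.2f} s")
```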
23 A is Faster than B?
- Given the CPUtime for machines A and B, "A is X times faster than B" means X = CPUtimeB / CPUtimeA
- Example: CPUtimeA = 3.4 sec, CPUtimeB = 5.3 sec, then
- A is 5.3/3.4 = 1.56 times faster than B, or 56% faster
- If you start with bandwidth metrics of performance, use inverse ratio
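A small sketch of the comparison, using the CPU times from the example. It assumes "X times faster" means the ratio of execution times; the function names are illustrative.

```python
def times_faster_from_latency(time_a: float, time_b: float) -> float:
    """How many times faster A is than B, given execution times (lower is better)."""
    return time_b / time_a


def times_faster_from_bandwidth(rate_a: float, rate_b: float) -> float:
    """Same question from throughput numbers (higher is better): inverse ratio."""
    return rate_a / rate_b


x = times_faster_from_latency(3.4, 5.3)
print(f"A is {x:.2f} times faster than B")

# The same machines expressed as tasks per second give the same answer.
print(f"{times_faster_from_bandwidth(1 / 3.4, 1 / 5.3):.2f}")
```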
24 Speedup and Amdahl's Law
- Speedup = CPUtime_old / CPUtime_new
- Given an optimization x that accelerates fraction fx of program by a factor of Sx, how much is the overall speedup?
- Speedup = 1 / ((1 - fx) + fx / Sx)
- Lessons from Amdahl's law
- Make common cases fast: as fx -> 1, speedup -> Sx
- But don't overoptimize common case: as Sx -> infinity, speedup -> 1 / (1 - fx)
- Speedup is limited by the fraction of the code that can be accelerated
- Uncommon case will eventually become the common one
25 Amdahl's Law Example
- If Sx = 100, what is the overall speedup as a function of fx?
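One way to explore this question is to tabulate the speedup for a few values of fx. The sketch assumes the usual form of Amdahl's law, speedup = 1 / ((1 - fx) + fx / Sx); the sample fx values are arbitrary.

```python
def amdahl_speedup(fx: float, sx: float) -> float:
    """Overall speedup when fraction `fx` of the program is accelerated by `sx`."""
    if not 0.0 <= fx <= 1.0:
        raise ValueError("fx must be between 0 and 1")
    if sx <= 0.0:
        raise ValueError("sx must be positive")
    return 1.0 / ((1.0 - fx) + fx / sx)


# Sx = 100, as in the example; fx values picked for illustration.
for fx in (0.1, 0.5, 0.9, 0.99, 1.0):
    print(f"fx = {fx:4.2f} -> speedup = {amdahl_speedup(fx, 100.0):6.2f}")
```

Even with Sx = 100, accelerating half of the program gives less than 2x overall, which is the point of the "speedup is limited by the fraction" lesson.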
26 Historical Trend for Computer Performance
[Figure: integer performance over time, 55% faster per year]
27 To Put it Into Perspective
- 1982-2000: computers getting 55% faster per year
- Total of 4,000x
- Significant cost improvements as well
- What if other areas showed similar improvement rates?
- Cars: 176,000 mph or 64,000 miles/gal
- Airplanes: LA to NY in 5.5 sec (MACH 3200)
- Wheat: 320,000 bushels per acre
28 Digital System Cost
- Cost is a very important design constraint
- Most digital systems are consumer electronic products
- Cost distribution for $1K PC
- Processor board: 37%
- Processor, memory,
- IO devices: 37%
- Hard disk, DVD, monitor, keyboard,
- Software: 20%
- Cabinet: 6%
- Integrated circuits represent significant part of the system cost
- Processor, memory, hard disk controller, graphics chips, networking chip
29 Cost of Integrated Circuits
30 Chip Cost is a Function of Size
Chip cost increases roughly with die area^4
31 Cost Performance Tradeoff
- The trade-off
- Chip cost is primarily a function of die area^4
- But bigger dies provide more resources for higher performance
- The goal of a good architect
- Find the knee of the performance-cost curve OR
- Get maximum performance for a fixed cost target
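To get a feel for how steep the trade-off is, here is a rough sketch that takes the fourth-power relation on the slide at face value. Real die cost also depends on wafer cost, defect density, and yield, none of which are modeled here; the function name is illustrative.

```python
def relative_die_cost(area_ratio: float, exponent: float = 4.0) -> float:
    """Cost multiplier when die area grows by `area_ratio`,
    assuming cost scales with area raised to `exponent`."""
    return area_ratio ** exponent


print(relative_die_cost(2.0))              # doubling the die area
print(round(relative_die_cost(1.10), 2))   # a 10% larger die
```

Under this assumption a 2x larger die costs 16x as much, so extra area has to buy a lot of performance to be worth it.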
32 Other Cost Contributors
- Testing cost
- Cost/die = (cost/hour x test time) / yield
- Could be 10-20 or more for complex chips
- IC Packaging
- Depends on die size, number of pins, and power dissipation
- Cost of cooling system
- <2W: no heat-sink, <10W: no fan, >100W: liquid/spray cooling
- And most of all, do not forget VOLUME
- Cost of a modern IC fabrication facility: >$2B
- Cost of a set of masks for a wafer: $0.5M - $1M
- Design NRE cost: often $10M
- Need volume to amortize all this cost
33 Cost Vs Price
- Price is really what your customer cares about
- Price components for a system vendor
- Component cost: buying the parts
- 47% of list price for $1K PC
- Direct costs: labor, warranties, dealing with scrap,
- 10% of list price for $1K PC
- Gross margin: company overhead
- R&D, marketing, sales, buildings, maintenance, taxes,
- 19% of list price for $1K PC
- Average discount: plan for volume discounts
- 25% of list price for $1K PC
- As computers become commodity components, price matters a lot!
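The percentages above can be turned into dollar amounts for the $1K PC. A minimal sketch; the dictionary keys are shorthand for the slide's categories, and the shares are the slide's rounded figures, so they sum to slightly more than 100%.

```python
# Shares of list price from the slide (rounded, so they total 101%).
price_shares = {
    "component cost": 0.47,
    "direct costs": 0.10,
    "gross margin": 0.19,
    "average discount": 0.25,
}


def price_breakdown(list_price: float, shares: dict) -> dict:
    """Dollar amount of each price component for a given list price."""
    return {name: list_price * share for name, share in shares.items()}


breakdown = price_breakdown(1000.0, price_shares)
for name, dollars in breakdown.items():
    print(f"{name:>16}: ${dollars:,.0f}")
```

Less than half of what the customer pays goes to parts, which is the gap between cost and price that the slide is pointing at.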
34 Historical Trend for Processor Price
35 Summary
- Computer architecture
- Design of efficient systems given the requirements of applications and the capabilities/constraints of technology
- Need to look a few years ahead with both applications & technology
- Applications
- Look for locality, parallelism, and predictability
- Technology
- Dealing with latency, power, and reliability are the upcoming challenges
- Performance & cost
- Two important efficiency metrics for most systems
- Latency Vs. bandwidth performance metrics
- Cost Vs. price
43 Multiple Processors on Single Chip
- Two processors on single-chip
- Two chips (w/ two processors) in single package
- 16, 64, 256 processors on single die
- Stream Processors
- Sun Niagara
- http://www.ece.ucdavis.edu/ocin06/talks/ho.pdf
47 What does Moore's Law buy you?