ECE472 Computer Architecture Patrick Chiang TA: Kang-Min Hu

1
ECE472 Computer Architecture
Patrick Chiang
TA: Kang-Min Hu
2
Is this class for you?
  • This class will not be easy
  • My first quarter of teaching computer
    architecture at Oregon State
  • Assumes good mastery of basic assembly language
    programming
  • What is the class makeup?
  • ECE 1/2
  • CS 1/2
  • This is ECE472, and emphasizes the hardware
    side of Comp. Arch.
  • There is CS472 in Spring 2008 quarter
  • Class Breakdown
  • 5 Homeworks: 10%
  • 1 Midterm: 20%
  • 1 Project: 30%
  • 1 Final: 40%
  • Average grade around B/B, with some flexibility

3
Today: What's the big picture?
  • Syllabus: given this Thursday
  • Start with the C-code
  • Do the assembly language
  • FIRST: How to evaluate whether a computer is
    fast, or good?
  • Execution Time (time to run process(es))
  • Power
  • Cost
  • Flexibility (complexity, programmability)

4
What do Computer Architects Do?
ECE471 Digital VLSI
5
What is Computer Architecture?
  • Understanding every level of the complete system
  • Software
  • Compiler
  • Computer Architecture
  • VLSI digital circuit design
  • For SOC, even analog/mixed-signal design
  • Devices
  • For an engineer, you must understand depth and
    breadth
  • Everything is related
  • Must understand every level of the problem to
    make the right choices
  • Cannot just black-box and say "Not my problem.
    Someone else will solve it."
  • Choice of where you want to go next depends on
    understanding changes along the entire vertical
    structure
  • How is the technology changing? Are there
    fundamental shifts?
  • e.g., multi-core, parallel processing
  • Execution Time ?

6
Write Some C Code for Me
  • C code
  • What does the compiler do?
  • Assembly language

7
Now that we have assembly code, how do we
evaluate performance?
  • Execution time
  • Is execution time the only metric for performance?
  • What about power?
  • What about cost?
  • What about usability/programmability?

8
Notice one thing about your C Code: Application
Specific
  • Where are you running this code?
  • Laptop
  • Desktop
  • Cellphone
  • Google Server Farm
  • Digital Signal Processor
  • Each application has completely different
    fundamentals and constraints

9
Do a DSP Calculation now--
  • Write C-code for DSP
  • e.g., polygon rendering for Xbox Halo 3
  • MP3 Decode
  • Write assembly code for this

10
Do a Transaction Processing Code Now--
  • Google query--?

11
Processor-based Digital Systems
  • Systems with a programmable, general-purpose
    processor
  • Advantages ??
  • Computers are the canonical example
  • PCs, laptops, workstations, ...
  • However, most processors are embedded or in
    servers
  • Game consoles, PDAs, cell phones, ...
  • Printers, car electronics systems, ...
  • Web servers, database servers, ...

12
FUTURE: Why are we going here?
13
Overall System Architecture
  • Multiple interacting layers
  • Term "architecture" used with all of them
  • This class focuses on
  • Hardware architecture
  • Memory, interconnect, IO
  • Clusters
  • Reliability & low power systems
  • Hardware-software interaction
  • Programming for performance
  • OS support
  • Cluster programming
  • Virtual machines & security

14
Application Constraints & Opportunities
  • Applications drive machine balance
  • Scientific computations
  • Floating-point performance
  • Main memory bandwidth
  • Transaction/web processing
  • ??
  • Multimedia processing
  • ??
  • Embedded control
  • ??

Architecture concepts typically exploit
application behavior
15
Applications Change over Time
  • Data-sets & memory requirements → larger
  • Cache & memory architecture become more critical
  • Standalone → networked
  • IO integration & system software become more
    critical
  • Single task → multiple tasks
  • Parallel architectures become critical
  • Limited IO requirements → rich IO requirements
  • 60s: tapes & punch cards
  • 70s: character oriented displays
  • 80s: video displays, audio, hard disks
  • 90s: 3D graphics, networking, high-quality audio
  • 00s: real-time video, immersion, ...

16
Application Properties to Exploit in Computer
Design
  • Locality in memory/IO references
  • Programs work on subset of instructions/data at
    any point in time
  • Both spatial and temporal locality
  • Parallelism
  • Data-level (DLP): same operation on every element
    of a data sequence
  • Instruction-level (ILP): independent instructions
    within sequential program
  • Thread-level (TLP): parallel tasks within one
    program
  • Multi-programming: independent programs
  • Pipelining
  • Predictability
  • Control-flow direction, memory references, data
    values

17
Technology Trends & Constraints: Yearly
Improvement
  • Integrated circuits: logic
  • 60% more devices per chip
  • 15% faster devices
  • Long wires don't improve
  • Integrated circuits: DRAM
  • 60% more devices per chip
  • 7% reduction in latency
  • 14% increase in bandwidth
  • Magnetic Disks
  • 60% to 100% increase in density
  • IO/networking
  • Little improvement in latency
  • Large improvements in bandwidth through fast/wide
    signaling

[Figure: chips from 1992, 1995, 1998, and 2001]
Since 1992: 64x more devices, 4x faster devices
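
The yearly rates on this slide compound, which is how 60% and 15% per year turn into the large multi-year factors quoted with the figure. A minimal sketch, assuming the 1992 to 2001 span counts as nine years of improvement; the function name is illustrative and not from the lecture:

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Total improvement factor after `years` of growth at `rate_per_year`."""
    return (1.0 + rate_per_year) ** years


# Assumption: 1992 -> 2001 is treated as nine yearly steps.
years = 2001 - 1992

devices = compound_growth(0.60, years)  # 60% more devices per chip per year
speed = compound_growth(0.15, years)    # 15% faster devices per year

print(f"devices per chip: about {devices:.0f}x")
print(f"device speed:     about {speed:.1f}x")
```

Under that assumption the results come out near 69x and 3.5x, the same order as the rounded 64x and 4x quoted on the slide.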
18
Changes in Technology & Applications lead to
Changes in Architecture
  • 1970s
  • Multi-chip CPUs
  • Semiconductor memory very expensive
  • Complex instruction sets (good code density)
  • Microcoded control
  • 1980s
  • 5K - 500K transistors
  • Single-chip, pipelined CPUs
  • On-chip memory possible
  • Simple, hard-wired control
  • Simple instruction sets
  • Small on-chip caches
  • 1990s
  • 1 M - 64M transistors, 64b CPUs
  • Complex control to exploit instruction-level
    parallelism
  • Deep pipelines
  • Multi-level caches
  • 2000s
  • 100 M - 5 B transistors
  • Slow wires, power consumption, design
    complexity, memory latency, IO bottlenecks, ...
  • Multiprocessors & parallel systems
  • Support programming for parallelism?
  • <<your Ph.D. thesis goes here>>

Keeps computer architecture interesting and
challenging
19
Rules of Thumb in Data Engineering, by J. Gray and
Prashant Shenoy
  • Storage
  • Moore's Law: Things get 4x denser every three
    years.
  • You need an extra bit of addressing every 18
    months.
  • Storage capacities increase 100x per decade.
  • Storage device throughput increases 10x per
    decade.
  • Disk data cools 10x per decade.
  • Disk page sizes increase 5x per decade.
  • NearlineTape : OnlineDisk : RAM storage cost ratios
    are approximately 1:3:300.
  • In ten years RAM will cost what disk costs today.
  • A person can administer a million dollars of disk
    storage
  • Disks are replacing tapes as backup devices.
  • On random workloads, disk mirroring is preferable
    to RAID5 parity because it spends disk space
    (which is plentiful) to save disk accesses (which
    are precious).

20
Metrics of Efficiency
  • Desktop computing ($500 - $3K)
  • Metrics ??
  • Prominent processors: Intel Pentium, AMD Athlon,
    PowerPC G5
  • Server computing ($3K - $1M)
  • Metrics ??
  • Prominent processors: IBM Power5, Sun UltraSparc,
    AMD Opteron
  • Embedded computing ($10 - $500)
  • Metrics ??
  • Prominent processors: ARM, MIPS, Motorola 68K,
    many others
  • Diversity in requirements leads to diversity in
    architectures

21
Performance Metrics
Plane              Speed      DC to Paris   Passengers   Throughput (pmph)
Boeing 747         610 mph    6.5 hours     470          286,700
BAC/Sud Concorde   1350 mph   3 hours       132          178,200
  • Latency or execution time or response time
  • Wall-clock time to complete a task
  • Important if all we have to run is a single or a
    time-critical task
  • Bandwidth or throughput or execution rate
  • Number of tasks completed per unit of time
  • Bandwidth = total amount of work / total
    execution time
  • Metric is independent of exact number of tasks
    executed
  • Important when we have many tasks to run
  • What about Power? What about Cost? What about
    Reliability?
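
The plane comparison can be reproduced in a few lines. This is a small sketch using only the speed, trip time, and passenger figures from the table, with throughput taken as passengers times speed (passenger-miles per hour); the class and field names are illustrative:

```python
from dataclasses import dataclass


@dataclass
class Plane:
    name: str
    speed_mph: int
    trip_hours: float  # DC to Paris: the latency metric
    passengers: int

    def throughput_pmph(self) -> int:
        """Passenger-miles per hour: the bandwidth metric."""
        return self.passengers * self.speed_mph


planes = [
    Plane("Boeing 747", 610, 6.5, 470),
    Plane("Concorde", 1350, 3.0, 132),
]

# The winner depends on which metric matters for the job at hand.
lowest_latency = min(planes, key=lambda p: p.trip_hours)
highest_throughput = max(planes, key=lambda p: p.throughput_pmph())

print("lowest latency:    ", lowest_latency.name)
print("highest throughput:", highest_throughput.name)
```

The Concorde wins on latency while the 747 wins on throughput, which is the point of the slide: "faster" is only meaningful once the metric is fixed.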

22
Examples
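  • Latency metric: program execution time in seconds
  • CPUtime = IC x CPI x CCT (instruction count x
    cycles per instruction x clock cycle time)
  • Your system architecture can affect all of them
  • CPI: memory latency, IO latency, ...
  • CCT: cache organization, ...
  • IC: OS overhead, ...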
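
The three terms on this slide multiply together into execution time: instruction count (IC) times cycles per instruction (CPI) times clock cycle time (CCT). A minimal sketch of that product; the workload numbers are hypothetical, chosen only to make the arithmetic easy to follow:

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """Execution time in seconds = IC x CPI x CCT."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical workload: 1 billion instructions on a 1 GHz clock (1 ns cycle).
baseline = cpu_time(1e9, cpi=1.5, cycle_time_s=1e-9)

# Hypothetical improvement: a better memory system lowers the average CPI.
better_memory = cpu_time(1e9, cpi=1.2, cycle_time_s=1e-9)

print(f"baseline:      {baseline:.2f} s")
print(f"better memory: {better_memory:.2f} s")
```

Because the terms multiply, a change that helps one term only pays off if it does not make another term worse by a larger factor.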

23
A is Faster than B?
  • Given the CPUtime for machines A and B, "A is X
    times faster than B" means X = CPUtimeB / CPUtimeA
  • Example: CPUtimeA = 3.4 sec, CPUtimeB = 5.3 sec; then
  • A is 5.3/3.4 = 1.56 times faster than B, or 56%
    faster
  • If you start with bandwidth metrics of
    performance, use the inverse ratio
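
The comparison rule can be written down directly. This sketch uses the slide's own example times; the two function names are illustrative. The second function covers the bandwidth case, where higher is better and the ratio flips:

```python
def times_faster(cpu_time_a: float, cpu_time_b: float) -> float:
    """How many times faster A is than B, given execution times (lower is better)."""
    return cpu_time_b / cpu_time_a


def times_faster_from_rates(rate_a: float, rate_b: float) -> float:
    """Same comparison given throughput numbers (higher is better)."""
    return rate_a / rate_b


x = times_faster(3.4, 5.3)
print(f"A is {x:.2f} times faster than B, or {(x - 1) * 100:.0f}% faster")

# A rate is work per unit time, so 1/time gives the same answer.
assert abs(times_faster_from_rates(1 / 3.4, 1 / 5.3) - x) < 1e-12
```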

24
Speedup and Amdahl's Law
  • Speedup = CPUtimeold / CPUtimenew
  • Given an optimization x that accelerates fraction
    fx of program by a factor of Sx, how much is the
    overall speedup?
  • Speedup = 1 / ((1 - fx) + fx/Sx)
  • Lessons from Amdahl's law
  • Make common cases fast: as fx→1, speedup→Sx
  • But don't overoptimize common case: as Sx→∞,
    speedup→1 / (1-fx)
  • Speedup is limited by the fraction of the code
    that can be accelerated
  • Uncommon case will eventually become the common
    one

25
Amdahl's Law Example
  • If Sx = 100, what is the overall speedup as a
    function of fx?
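
One way to explore this question is to tabulate Amdahl's law, speedup = 1 / ((1 - fx) + fx/Sx), for a few values of fx. A minimal sketch, with fx values picked arbitrarily for illustration:

```python
def amdahl_speedup(fx: float, sx: float) -> float:
    """Overall speedup when a fraction fx of the program is accelerated by sx."""
    if not 0.0 <= fx <= 1.0:
        raise ValueError("fx must be between 0 and 1")
    if sx <= 0.0:
        raise ValueError("sx must be positive")
    return 1.0 / ((1.0 - fx) + fx / sx)


# Sx = 100, as on the slide; the fx sample points are arbitrary.
for fx in (0.10, 0.50, 0.90, 0.99, 1.00):
    print(f"fx = {fx:.2f}  ->  overall speedup = {amdahl_speedup(fx, 100):.2f}")
```

Even with a 100x accelerator, the overall speedup stays below 2 until more than half the program benefits, and it only approaches 100 as fx approaches 1.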

26
Historical Trend for Computer Performance
[Figure: integer performance over time, about 55% faster per year]
27
To Put it Into Perspective
  • 1982-2000: computers getting 55% faster per year
  • Total of 4,000x
  • Significant cost improvements as well
  • What if other areas showed similar improvement
    rates?
  • Cars: 176,000 mph or 64,000 miles/gal
  • Airplanes: LA to NY in 5.5 sec (Mach 3200)
  • Wheat: 320,000 bushels per acre

28
Digital System Cost
  • Cost is a very important design constraint
  • Most digital systems are consumer electronic
    products
  • Cost distribution for $1K PC
  • Processor board: 37%
  • Processor, memory, ...
  • IO devices: 37%
  • Hard disk, DVD, monitor, keyboard, ...
  • Software: 20%
  • Cabinet: 6%
  • Integrated circuits represent significant part of
    the system cost
  • Processor, memory, hard disk controller, graphics
    chips, networking chip

29
Cost of Integrated Circuits
30
Chip Cost is a Function of Size
Chip cost increases roughly with (die area)^4
31
Cost Performance Tradeoff
  • The trade-off
  • Chip cost is primarily a function of (die area)^4
  • But bigger dies provide more resources for higher
    performance
  • The goal of a good architect
  • Find the knee of the performance-cost curve OR
  • Get maximum performance for a fixed cost target

32
Other Cost Contributors
  • Testing cost
  • Cost/die = (cost/hour x test time) / yield
  • Could be 10-20 or more for complex chips
  • IC Packaging
  • Depends on die size, number of pins, and power
    dissipation
  • Cost of cooling system
  • <2W: no heat-sink, <10W: no fan, >100W:
    liquid/spray cooling
  • And most of all, do not forget VOLUME
  • Cost of a modern IC fabrication facility: >$2B
  • Cost of a set of masks for a wafer: $0.5M - $1M
  • Design NRE cost: often $10M
  • Need volume to amortize all this cost

33
Cost Vs Price
  • Price is really what your customer cares about
  • Price components for a system vendor
  • Component cost: buying the parts
  • 47% of list price for $1K PC
  • Direct costs: labor, warranties, dealing with
    scrap, ...
  • 10% of list price for $1K PC
  • Gross margin: company overhead
  • R&D, marketing, sales, buildings, maintenance,
    taxes, ...
  • 19% of list price for $1K PC
  • Average discount: plan for volume discounts
  • 25% of list price for $1K PC
  • As computers become commodity components, price
    matters a lot!

34
Historical Trend for Processor Price
35
Summary
  • Computer architecture
  • Design of efficient systems given the
    requirements of applications and the
    capabilities/constraints of technology
  • Need to look a few years ahead with both
    applications & technology
  • Applications
  • Look for locality, parallelism, and
    predictability
  • Technology
  • Dealing with latency, power, and reliability are
    the upcoming challenges
  • Performance & cost
  • Two important efficiency metrics for most systems
  • Latency Vs. bandwidth performance metrics
  • Cost Vs. price

43
Multiple Processors on Single Chip
  • Two processors on single-chip
  • Two chips(w/ two processors) in single package
  • 16, 64, 256 processors on single die
  • Stream Processors
  • Sun Niagara
  • http://www.ece.ucdavis.edu/ocin06/talks/ho.pdf

47
What does Moore's Law buy you?