Title: Cost
1Cost Performance
- Dr. Doug L. Hoffman
- Computer Science 330
- Spring 2000
2Performance
- Purchasing perspective
- given a collection of machines, which has the
- best performance ?
- least cost ?
- best performance / cost ?
- Design perspective
- faced with design options, which has the
- best performance improvement ?
- least cost ?
- best performance / cost ?
- Both require
- basis for comparison
- metric for evaluation
- Our goal is to understand cost performance
implications of architectural choices
3Integrated Circuits Costs
- $\text{IC cost} = \dfrac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}}$
- $\text{Die cost} = \dfrac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$
4Real World Examples
Chip         Metal   Line width  Wafer     Defects  Area   Dies per  Yield  Die cost
             layers  (microns)   cost ($)  per cm2  (mm2)  wafer     (%)    ($)
386DX        2       0.90        900       1.0      43     360       71     4
486DX2       3       0.80        1200      1.0      81     181       54     12
PowerPC 601  4       0.80        1700      1.3      121    115       28     53
HP PA 7100   3       0.80        1300      1.0      196    66        27     73
DEC Alpha    3       0.70        1500      1.2      234    53        19     149
SuperSPARC   3       0.70        1700      1.6      256    48        13     272
Pentium      3       0.80        1500      1.5      296    40        9      417
- From "Estimating IC Manufacturing Costs," by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15
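The die cost column can be checked against the other columns with the die cost formula from the Integrated Circuits Costs slide (wafer cost divided by dies per wafer times die yield). The following is a minimal sketch, assuming the yield column is a percentage and the cost columns are in dollars; the names `CHIPS` and `die_cost` are illustrative only.

```python
# Sketch: reproduce the die cost column from the other table columns.
# Formula: die cost = wafer cost / (dies per wafer * die yield)
# Assumption: yield is a percentage (written here as a fraction), costs are in $.

# (chip, wafer cost, dies per wafer, die yield, die cost quoted in the table)
CHIPS = [
    ("386DX", 900, 360, 0.71, 4),
    ("486DX2", 1200, 181, 0.54, 12),
    ("PowerPC 601", 1700, 115, 0.28, 53),
    ("HP PA 7100", 1300, 66, 0.27, 73),
    ("DEC Alpha", 1500, 53, 0.19, 149),
    ("SuperSPARC", 1700, 48, 0.13, 272),
    ("Pentium", 1500, 40, 0.09, 417),
]


def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost of one good die: the wafer cost spread over the dies that work."""
    return wafer_cost / (dies_per_wafer * die_yield)


for name, wafer, dies, yld, quoted in CHIPS:
    computed = die_cost(wafer, dies, yld)
    print(f"{name:12s} computed ${computed:6.1f}   quoted ${quoted}")
```

Each computed value rounds to the quoted figure, which suggests the table was built from this formula. Note how the falling yield, more than the wafer cost, drives the cost of the larger dies.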
5Cost/Performance: What is the Relationship of Cost to Price?
- Component Costs
- Direct Costs (add 25% to 40%): recurring costs such as labor, purchasing, scrap, warranty
- Gross Margin (add 82% to 186%): nonrecurring costs such as R&D, marketing, sales, equipment maintenance, rental, financing cost, pretax profits, taxes
- Average Discount to get List Price (add 33% to 66%): volume discounts and/or retailer markup
[Figure: shares of List Price, top to bottom: Average Discount 25% to 40%; (Avg. Selling Price); Gross Margin 34% to 39%; Direct Cost 6% to 8%; Component Cost 15% to 33%]
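The markups on this slide can be chained to see how component cost grows into list price. This is a rough sketch, assuming each "add" percentage applies to the running total so far; the function name `price_chain` and the $100 component cost are hypothetical, chosen only for illustration.

```python
def price_chain(component_cost, direct=0.25, gross_margin=0.82, discount=0.33):
    """Walk a component cost up to list price.

    Assumption: each "add X%" step is applied to the running total so far.
    Returns (cost after direct costs, average selling price, list price).
    """
    with_direct = component_cost * (1 + direct)
    avg_selling_price = with_direct * (1 + gross_margin)
    list_price = avg_selling_price * (1 + discount)
    return with_direct, avg_selling_price, list_price


# Hypothetical $100 of components at the low and high end of each range.
low = price_chain(100.0, direct=0.25, gross_margin=0.82, discount=0.33)
high = price_chain(100.0, direct=0.40, gross_margin=1.86, discount=0.66)

for label, (direct_total, asp, lst) in [("low", low), ("high", high)]:
    print(f"{label:4s} direct ${direct_total:7.2f}  ASP ${asp:7.2f}  list ${lst:7.2f}")
```

Under that assumption the list price comes out at roughly 3 to 6.6 times the component cost, which agrees with component cost being 15% to 33% of list price.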
6Chip Prices (August 1993)
- Assume purchase of 10,000 units
Chip         Area   Mfg.      Price  Multiplier  Comment
             (mm2)  cost ($)  ($)
386DX        43     9         31     3.4         Intense competition
486DX2       81     35        245    7.0         No competition
PowerPC 601  121    77        280    3.6
DEC Alpha    234    202       1231   6.1         Recoup R&D?
Pentium      296    473       965    2.0         Early in shipments
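The multiplier column appears to be simply price divided by manufacturing cost. A short sketch, using the figures from the slide and the illustrative names `CHIP_PRICES` and `price_multiplier`, checks that reading:

```python
# (chip, manufacturing cost in $, price in $, multiplier quoted on the slide)
CHIP_PRICES = [
    ("386DX", 9, 31, 3.4),
    ("486DX2", 35, 245, 7.0),
    ("PowerPC 601", 77, 280, 3.6),
    ("DEC Alpha", 202, 1231, 6.1),
    ("Pentium", 473, 965, 2.0),
]


def price_multiplier(mfg_cost, price):
    """Ratio of selling price to manufacturing cost."""
    return price / mfg_cost


for name, cost, price, quoted in CHIP_PRICES:
    print(f"{name:12s} {price_multiplier(cost, price):.1f} (quoted {quoted})")
```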
7Summary Price vs. Cost
8Cost Performance Trends (Summary)
- Workstation performance (measured in SPECmarks) improves roughly 50% per year (2X every 18 months)
- Improvement in cost performance estimated at 70% per year
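The relation between an annual improvement rate and a doubling time can be made concrete. The sketch below assumes plain compound growth, which the slide does not state; `doubling_time_years` and `growth_over` are illustrative names.

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a compound annual growth rate (0.5 = 50%)."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_over(annual_growth, years):
    """Total improvement factor after the given number of years."""
    return (1 + annual_growth) ** years


for rate in (0.50, 0.70):
    months = 12 * doubling_time_years(rate)
    print(f"{rate:.0%} per year doubles in about {months:.1f} months")

print(f"50% per year over 18 months gives {growth_over(0.50, 1.5):.2f}X")
```

With compounding, 50% per year gives a bit under 2X in 18 months, so the slide's "2X every 18 months" is best read as a round figure.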
9Two notions of performance
Planes compared: Boeing 747 and BAC/Sud Concorde
Which has higher performance?
- Time to do the task (Execution Time): execution time, response time, latency
- Tasks per day, hour, week, sec, ns, ... (Performance): throughput, bandwidth
- Response time and throughput often are in opposition
10Some Definitions
- Performance is in units of things-per-second
- bigger is better
- If we are primarily concerned with response time
- $\text{performance}(x) = \dfrac{1}{\text{execution\_time}(x)}$
- "X is n times faster than Y" means
- $n = \dfrac{\text{Performance}(X)}{\text{Performance}(Y)}$
11Example
- Time of Concorde vs. Boeing 747?
- Concorde is 1350 mph / 610 mph = 2.2 times faster (= 6.5 hours / 3 hours)
- Throughput of Concorde vs. Boeing 747?
- Concorde is 178,200 pmph / 286,700 pmph = 0.62 "times faster"
- Boeing is 286,700 pmph / 178,200 pmph = 1.6 times faster
- Boeing is 1.6 times (60%) faster in terms of throughput
- Concorde is 2.2 times (120%) faster in terms of flying time
- We will focus primarily on execution time for a single job
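The ratios in this example follow directly from the "n times faster" definition. A minimal sketch using only the numbers on the slide (pmph is read here as passenger-miles per hour; the helper names are illustrative):

```python
def times_faster(perf_x, perf_y):
    """n such that X is n times faster than Y, given their performances."""
    return perf_x / perf_y


def percent_faster(n):
    """Express 'n times faster' as a percentage improvement."""
    return (n - 1) * 100


# Response time view: speed in mph, or equivalently the ratio of flight times.
speed_ratio = times_faster(1350, 610)           # Concorde vs. Boeing 747
time_ratio = 6.5 / 3                            # ExTime(747) / ExTime(Concorde)

# Throughput view: passenger-miles per hour.
throughput_ratio = times_faster(286_700, 178_200)   # Boeing 747 vs. Concorde

print(f"Concorde, flying time: {speed_ratio:.1f}X ({percent_faster(speed_ratio):.0f}% faster)")
print(f"Boeing 747, throughput: {throughput_ratio:.1f}X ({percent_faster(throughput_ratio):.0f}% faster)")
```

The slide rounds the ratios to 2.2 and 1.6 before converting to percentages, which is where its 120% and 60% come from.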
12Performance Evaluation
- For better or worse, benchmarks shape the field
- Good products created when we have
- Good benchmarks
- Good ways to summarize performance
- Given that sales are in part a function of performance relative to the competition, companies invest in improving the product as reported by the performance summary
- If the benchmarks/summary are inadequate, then the choice is between improving the product for real programs vs. improving the product to get more sales; sales almost always wins!
- Execution time is the measure of computer performance!
13Performance Measures
- MIPS
- Millions of Instructions Per Second (what is an instruction?)
- Meaningless Indicator of Performance
- Often quoted in terms of VAX MIPS.
- MFLOPS
- Millions of Floating-point Operations Per Second.
- System Performance Evaluation Cooperative (SPEC)
- Started in 1988
- Split into SPECint and SPECfp (Updated in 1995)
- Benchmarks are the only way!
14Basis of Evaluation
Actual Target Workload
- Cons: very specific; non-portable; difficult to run, or measure; hard to identify cause
Full Application Benchmarks
- Pros: portable; widely used; improvements useful in reality
Small Kernel Benchmarks
- Pros: easy to run, early in design cycle
Microbenchmarks
- Pros: identify peak capability and potential bottlenecks
- Cons: peak may be a long way from application performance
15SPEC First Round
- One program spent 99% of its time in a single line of code
- A new front-end compiler could improve its result dramatically
16Performance is...
- Time to run the task
- Execution time, response time, latency
- Tasks per day, hour, week, sec, ns, ...
- Throughput vs. bandwidth
- "X is n times faster than Y" means
- $n = \dfrac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \dfrac{\text{Performance}(X)}{\text{Performance}(Y)}$
17CPI
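Average cycles per instruction
$\text{CPI} = \dfrac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \dfrac{\text{Clock Cycles}}{\text{Instruction Count}}$
$\text{CPU time} = \text{ClockCycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$
$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where } F_i = \dfrac{I_i}{\text{Instruction Count}}$ ("instruction frequency")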
- Invest Resources where time is Spent!
18Example (RISC processor)
Base Machine (Reg / Reg), Typical Mix
Op      Freq  Cycles  CPI(i)  % Time
ALU     50%   1       0.5     23%
Load    20%   5       1.0     45%
Store   10%   3       0.3     14%
Branch  20%   2       0.4     18%
Total                 2.2
- How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
- How does this compare with using branch prediction to shave a cycle off the branch time?
- What if two ALU instructions could be executed at once?
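One way to work through these three questions is to recompute the average CPI for each variant. The sketch below assumes instruction count and clock rate stay unchanged, and models "two ALU instructions at once" as an effective 0.5 cycles per ALU instruction (perfect pairing, an idealization). All function and variable names are illustrative.

```python
# Instruction mix from the slide: op -> (frequency, cycles per instruction)
BASE_MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def average_cpi(mix):
    """CPI = sum over instruction classes of frequency * cycles."""
    return sum(freq * cycles for freq, cycles in mix.values())


def with_cycles(mix, op, cycles):
    """Return a copy of the mix with one class given a new cycle count."""
    changed = dict(mix)
    changed[op] = (mix[op][0], cycles)
    return changed


def speedup(old_mix, new_mix):
    """Speedup, assuming instruction count and clock rate are unchanged."""
    return average_cpi(old_mix) / average_cpi(new_mix)


better_cache = with_cycles(BASE_MIX, "Load", 2)
branch_prediction = with_cycles(BASE_MIX, "Branch", 1)
# Assumption: perfect pairing, so each ALU instruction costs half a cycle.
dual_alu = with_cycles(BASE_MIX, "ALU", 0.5)

print(f"base CPI = {average_cpi(BASE_MIX):.2f}")
for label, mix in [
    ("load time of 2 cycles", better_cache),
    ("branch time of 1 cycle", branch_prediction),
    ("two ALU ops at once", dual_alu),
]:
    print(f"{label:24s} CPI = {average_cpi(mix):.2f}  speedup = {speedup(BASE_MIX, mix):.3f}")
```

Under these assumptions the better data cache helps most, because loads account for the largest share of execution time in the base machine.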
19Amdahl's Law
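- Speedup due to enhancement E:
- $\text{Speedup}(E) = \dfrac{\text{ExTime(without E)}}{\text{ExTime(with E)}}$
- Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected; then
- $\text{ExTime(with E)} = \left((1-F) + \dfrac{F}{S}\right) \times \text{ExTime(without E)}$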
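Dividing the old execution time by the new one gives the overall speedup, 1 / ((1 - F) + F/S). A small sketch of that calculation follows; the function name `amdahl_speedup` is illustrative. It is applied to the load improvement from the RISC example, where loads take 1.0 of the 2.2 cycles per instruction and go from 5 cycles to 2.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the time is sped up by `factor`.

    Follows ExTime(with E) = ((1 - F) + F / S) * ExTime(without E).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Loads: 1.0 / 2.2 of execution time, accelerated by 5 / 2 = 2.5.
load_fraction = 1.0 / 2.2
load_speedup = amdahl_speedup(load_fraction, 5 / 2)
print(f"overall speedup from faster loads: {load_speedup:.3f}")

# Even an enormous factor cannot beat 1 / (1 - F).
print(f"upper bound for this fraction:     {1 / (1 - load_fraction):.3f}")
```

The bound in the last line is the practical message of the law: the unaffected part of the task limits the overall gain, however large S becomes.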
20Evaluating Instruction Sets
- Design-time metrics
- Can it be implemented, in how long, at what cost?
- Can it be programmed? Ease of compilation?
- Static Metrics
- How many bytes does the program occupy in memory?
- Dynamic Metrics
- How many instructions are executed?
- How many bytes does the processor fetch to execute the program?
- How many clocks are required per instruction?
- How "lean" a clock is practical?
- Best Metric: Time to execute the program!
- NOTE: this depends on instruction set, processor organization, and compilation techniques.
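Three of the dynamic metrics above (instructions executed, clocks per instruction, clock speed) multiply into execution time, which is why no single one of them is enough. The sketch below compares two designs; every number in it is hypothetical and chosen only to illustrate the point, and the function name `execution_time` is illustrative.

```python
def execution_time(instruction_count, cpi, clock_rate_hz):
    """Execution time in seconds = instructions * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical design A: more instructions, but each one is cheap.
time_a = execution_time(instruction_count=1.0e9, cpi=2.2, clock_rate_hz=100e6)

# Hypothetical design B: 20% fewer instructions, but a higher CPI.
time_b = execution_time(instruction_count=0.8e9, cpi=3.0, clock_rate_hz=100e6)

print(f"design A: {time_a:.1f} s")
print(f"design B: {time_b:.1f} s")
```

In this made-up case design B executes fewer instructions yet takes longer, so judging by instruction count alone would pick the slower machine.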
21Summary
- Amdahl's Law
- CPI Law
- Execution time is the REAL measure of computer performance!
- Good products created when we have
- Good benchmarks, good ways to summarize performance