Title: NEW TRENDS IN COMPUTER ARCHITECTURE DESIGN
1NEW TRENDS IN COMPUTER ARCHITECTURE DESIGN
Saeid Nooshabadi Arthur Sale University of
Tasmania
2Outline
- Desktop/Server Microprocessor State of the Art
- Current Processor Limits
- Embedded Processors Market
- Mobile Multimedia Computing as New Direction
- Conclusion
3Computer in the News: Technology Marches on (1)
- SANTA CLARA, Calif., March 8, 2000 --
- Intel Corporation today introduced the Intel
Pentium III processor 1.0 GHz (GigaHertz or
1,000 MegaHertz), the world's highest performance
microprocessor for PCs. The Pentium III processor
at 1 GHz delivers a 15 percent performance gain
over the fastest processors on the market today.
- Source: http://www.intel.com
4Computer in the News: Technology Marches on (2)
- INTEL DEVELOPER FORUM, Calif., Feb. 15, 2000 --
- Intel Corporation Chairman Andrew S. Grove
today kicked off the semi-annual Intel Developer
Forum by demonstrating the company's fastest
microprocessor: a chip running at 1.5 GHz, or 1.5
billion clock cycles per second, at room
temperature. Based on a new microarchitecture
from Intel, the chip is code-named "Willamette."
(To be marketed towards the end of the year)
- Source: http://www.intel.com
- Who needs a 1.5 GHz processor?
5State of the Art: Alpha 21264
- 15M transistors
- 2 x 64KB caches on chip; 16MB L2 cache off chip
- Clock <1.7 nsec, or >600 MHz (Fastest Cray
Supercomputer: T90, 2.2 nsec)
- 90 watts
- Superscalar: fetches up to 6 instructions/clock
cycle, retires up to 4 instructions/clock cycle
- Execution: out-of-order
6Processor Limit: DRAM Gap
7Processor-Memory Performance Gap Tax (1)
- Processor            % Area (cost)    % Transistors (power)
- Alpha 21164          37%              77%
- StrongArm SA110      61%              94%
- Pentium Pro          64%              88%
- (Pentium Pro: 2 dies per package, Proc/I-cache/D-cache + L2 cache)
- Caches have no inherent value, only try to close
performance gap
- COST = F(Area^4)
8Processor-Memory Performance Gap Tax (2)
- Microprocessor-DRAM performance gap
- time of a full cache miss, in instructions
executed
- 1st Alpha (7000): 340 ns / 5.0 ns = 68 clks x 2, or 136
- 2nd Alpha (8400): 266 ns / 3.3 ns = 80 clks x 4, or 320
- 3rd Alpha (21264): 180 ns / 1.7 ns = 108 clks x 6, or 648
- 1/2X latency x 3X clock rate x 3X Instr/clock => ~5X
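The arithmetic on this slide can be sketched in a few lines of Python. This is an illustrative calculation only: the function name `miss_cost` and its parameters are assumptions, not part of the original talk. The slide rounds its clock counts a little differently, so the computed values for the second and third Alpha come out close to, but not exactly, 320 and 648.

```python
def miss_cost(dram_latency_ns, clock_period_ns, issue_width):
    """Return (clocks lost, instruction slots lost) for one full cache miss."""
    clocks = dram_latency_ns / clock_period_ns
    return clocks, clocks * issue_width

# (label, DRAM latency in ns, clock period in ns, instructions per clock),
# all taken from the slide
ALPHAS = [
    ("1st Alpha (7000)", 340.0, 5.0, 2),
    ("2nd Alpha (8400)", 266.0, 3.3, 4),
    ("3rd Alpha (21264)", 180.0, 1.7, 6),
]

for label, latency, period, width in ALPHAS:
    clocks, slots = miss_cost(latency, period, width)
    print(f"{label}: {clocks:.0f} clocks, {slots:.0f} instruction slots per miss")
```

Dividing the last result by the first gives the roughly 5X growth quoted in the final bullet: memory latency halved, but clock rate and issue width each tripled.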
9Today's Situation: Microprocessor
- MIPS MPUs               R5000       R10000         10k/5k
- Clock Rate              200 MHz     195 MHz        1.0x
- On-Chip Caches          32K/32K     32K/32K        1.0x
- Instructions/Cycle      1 (+ FP)    4              4.0x
- Pipe stages             5           5-7            1.2x
- Model                   In-order    Out-of-order   ---
- Die Size (mm^2)         84          298            3.5x
- without cache, TLB      32          205            6.3x
- Development (man yr.)   60          300            5.0x
- SPECint_base95          5.7         8.8            1.6x
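One way to read this table is to divide performance by the resources spent. The sketch below does that with the slide's own figures. The helper names `ratio` and `per_unit` are illustrative assumptions, and the derived metrics (SPECint per mm^2, SPECint per man-year) are not in the original talk.

```python
# Figures from the slide, as (R5000, R10000) pairs
SPECINT_BASE95 = (5.7, 8.8)
DIE_AREA_MM2 = (84.0, 298.0)
DEV_MAN_YEARS = (60.0, 300.0)

def ratio(pair):
    """R10000 value divided by R5000 value."""
    r5000, r10000 = pair
    return r10000 / r5000

def per_unit(perf, resource):
    """Performance divided by resource, for each processor."""
    return tuple(p / r for p, r in zip(perf, resource))

perf_gain = ratio(SPECINT_BASE95)
area_cost = ratio(DIE_AREA_MM2)
effort_cost = ratio(DEV_MAN_YEARS)

perf_per_mm2 = per_unit(SPECINT_BASE95, DIE_AREA_MM2)
perf_per_man_year = per_unit(SPECINT_BASE95, DEV_MAN_YEARS)

print(f"performance gain: {perf_gain:.2f}x for {area_cost:.2f}x area "
      f"and {effort_cost:.1f}x design effort")
print(f"SPECint per mm^2 (R5000, R10000): {perf_per_mm2}")
print(f"SPECint per man-year (R5000, R10000): {perf_per_man_year}")
```

On these numbers the simpler in-order R5000 delivers more than twice the integer performance per unit of die area.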
10Processor Evaluation Metrics
- SPECint95: Suite of Integer Programs
- SPECfp95: Suite of Floating Point Programs
- TPC-C: On-Line Transaction Processing Programs
(OLTP)
- All state-of-the-art processors perform well for
SPECint95 and SPECfp95 (scientific and technical
applications)
- TPC-C?
11Processor Limits for TPC-C
- Pentium Pro                                       SPECint95   TPC-C
- Multilevel Caches: miss rate, 1MB L2 cache        0.5%        5%
- Superscalar (2-3 instr. retired/clock): % clks    40%         10%
- Out-of-Order Execution speedup                    2.0X        1.4X
- Clocks per Instruction                            0.8         3.4
- % Peak performance                                40%         10%
Source: Bhandarkar, D.; Ding, J. "Performance
characterization of the Pentium Pro processor."
Proc. 3rd Int'l. Symp. on High-Performance
Computer Architecture, Feb 1997, p. 288-97.
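The clocks-per-instruction (CPI) figures translate directly into a fraction of peak throughput and into native MIPS. The sketch below is a rough illustration under stated assumptions: the peak retire rate of 3 instructions/clock is taken as the upper end of the slide's "2-3", and the 200 MHz clock is a hypothetical value, since the slide gives no clock rate.

```python
PEAK_RETIRE_PER_CLOCK = 3   # assumption: upper end of "2-3 instr. retired/clock"
CLOCK_MHZ = 200.0           # hypothetical clock rate, not stated on the slide

CPI = {"SPECint95": 0.8, "TPC-C": 3.4}   # clocks per instruction, from the slide

def fraction_of_peak(cpi, peak_ipc=PEAK_RETIRE_PER_CLOCK):
    """Achieved instructions per clock divided by the peak retire rate."""
    return (1.0 / cpi) / peak_ipc

def mips(clock_mhz, cpi):
    """Native MIPS: clock rate in MHz divided by clocks per instruction."""
    return clock_mhz / cpi

for workload, cpi in CPI.items():
    print(f"{workload}: {fraction_of_peak(cpi):.0%} of peak, "
          f"{mips(CLOCK_MHZ, cpi):.0f} MIPS at {CLOCK_MHZ:.0f} MHz")
```

Under these assumptions the results land near the 40% and 10% of peak shown in the table, which suggests the last row follows from the CPI row.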
12Embedded Processor Market
- Over 97% of the processors fabricated
- 50% of the revenues from processor sales
- Embedded devices cover a wide range of products
- simple devices such as thermostats and toasters
- complex and mission-critical applications such as
avionics systems.
- In between are phones, facsimile machines, ATM
switches, digital cameras, automotive
applications, set-top boxes, ...
13Embedded Processor Design
- Drives the technology in the Post-PC era
- Embedded processors incorporate capabilities
traditionally associated with conventional
CPUs.
- They are subject to challenging
- cost,
- power consumption,
- and application-imposed constraints.
14Intel Embedded Mobile Celeron Processor
- Available at 600, 566, 533, 500 and 466 MHz.
- Dynamic Execution technology.
- Includes Intel MMX media enhancement technology.
- Intel Streaming SIMD Extensions (available on the
Intel Celeron Processor at 566 and 600 MHz).
- 32 Kbyte (16 Kbyte/16 Kbyte) Level 1 cache.
- 128 Kbyte integrated Level 2 cache.
- 66 MHz Intel P6 micro-architecture's
multitransaction system bus.
- Intel Chipset support: Intel 810 chipset, Intel
810E chipset, Intel 440BX, Intel 440EX and the
Intel 440ZX-66 AGPset.
- Power: 17 - 30 Watts
- Source: http://www.intel.com
15Desktop/Server Processors Summary (1)
- SPEC performance doubling / 18 months
- Growing CPU-DRAM performance gap tax
- Running out of ideas, competition? Back to 2X /
2.3 yrs?
- Benchmarks: SPECint, SPECfp, TPC (for OLTP)
- Benchmark highest optimization, ship lowest
optimization?
- Processor tricks not as useful for transactions?
- Clock rate increase compensated by CPI increase?
- When > 100 MIPS on TPC-C?
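The difference between doubling every 18 months and doubling every 2.3 years is easier to see as an annual growth rate. The following sketch is illustrative only; the function names are assumptions and the ten-year horizon is an arbitrary choice for comparison.

```python
def annual_growth(doubling_time_years):
    """Yearly growth rate implied by a given doubling time."""
    return 2.0 ** (1.0 / doubling_time_years) - 1.0

def speedup_after(years, doubling_time_years):
    """Total speedup accumulated after the given number of years."""
    return 2.0 ** (years / doubling_time_years)

for doubling in (1.5, 2.3):   # 18 months and 2.3 years, from the slide
    print(f"doubling every {doubling} yrs: "
          f"{annual_growth(doubling):.0%} per year, "
          f"{speedup_after(10, doubling):.0f}x after 10 years")
```

Compounded over a decade, the slower rate gives roughly a fifth of the speedup of the faster one.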
16Desktop/Server Processors Summary (2)
- Embedded processors promising
- StrongARM 110: 233 MHz, 268 MIPS, 0.36W typ.,
$49
- 1/10 cost, 1/100 power, 1/2 integer performance?
- Consolidation of desktop industry? Innovation?
- Time to look for the computing trends and
applications of tomorrow?
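The StrongARM figures quoted above can be turned into energy-efficiency metrics. This is a minimal sketch using only the slide's 268 MIPS and 0.36 W; the function names are illustrative assumptions, and the MIPS rating is taken at face value.

```python
STRONGARM_MIPS = 268.0    # from the slide
STRONGARM_WATTS = 0.36    # typical power, from the slide

def mips_per_watt(mips, watts):
    """Millions of instructions per second delivered per watt."""
    return mips / watts

def nanojoules_per_instruction(mips, watts):
    """Energy per instruction: watts / (instructions per second), in nJ."""
    instructions_per_second = mips * 1e6
    return watts / instructions_per_second * 1e9

print(f"{mips_per_watt(STRONGARM_MIPS, STRONGARM_WATTS):.0f} MIPS/W")
print(f"{nanojoules_per_instruction(STRONGARM_MIPS, STRONGARM_WATTS):.2f} nJ/instruction")
```

Metrics of this kind, rather than peak SPEC scores, are what matter for battery-powered devices.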
17Billion Transistor Architectures and Stationary
Computer Metrics
- SS Trace SMT CMP IA-64 RAW
- SPEC Int
- SPEC FP
- TPC (Database)
- SW Effort
- Design Scal.
- Physical Design Complexity
- (See IEEE Computer (9/97), Special Issue on
Billion Transistor Microprocessors)
- Very Long Instruction Word (Intel, HP
IA-64/Merced)
- multiple ops/instruction, compiler controls
parallelism
- Billed as the next-generation Intel/HP processor
- Renamed Itanium (October 99)
18Current Computer Design with the Bias for the Past
- Most Billion Transistor Architectures show high
physical design complexity
- Most show impressive performance for SPEC suites
of programs
- Suitability
- suitable for high-end traditional applications
- unsuitable for pervasive computing environment of
the future
- high power budget (>180 Watts),
- expensive (>$500)
- Applications of past to design computers of future
19Challenge for Future Microprocessors
- "...wires are not keeping pace with scaling of
other features. In fact, for CMOS processes
below 0.25 micron ... an unacceptably small
percentage of the die will be reachable during a
single clock cycle."
- "Architectures that require long-distance, rapid
interaction will not scale well ..."
- "Will Physical Scalability Sabotage Performance
Gains?" Matzke, IEEE Computer (9/97)
20Computer in the News: Expert Talking
- "Intel specializes in designing
microprocessors for the desktop PC, which in five
years may no longer be the most important type of
computer. Its successor may be a personal mobile
computer that integrates the portable computer
with a cellular phone, digital camera, and video
game player ... Such devices require low-cost,
energy-efficient microprocessors, and Intel is
far from a leader in that area."
- David Patterson, NY Times, June 9, 1998
- David Patterson led the design of the Berkeley
RISC machine, one of the first RISC computers. He is also
the author/co-author of two of the most popular
textbooks on Computer Architecture.
21Post-PC Motivation
- Next generation fixes problems of last gen.
- 1960s: batch processing, slow turnaround =>
Timesharing
- 15-20 years of performance improvement, cost
reduction (minicomputers, semiconductor memory)
- 1980s: Time sharing, inconsistent response
times => Workstations/Personal Computers
- 15-20 years of performance improvement, cost
reduction (microprocessors, DRAM memory, disk)
- 2000s: PCs, difficulty of use/high cost of
ownership => ???
22Computing Trends: Post-PC Era
- Multimedia Applications
- real-time data types: video, speech, animation,
music
- 90% of desktop cycles will be spent on media
applications by end of 2000.
- Multimedia workloads will continue to grow in importance
- Image, handwriting, and speech recognition will
pose other major challenges.
- Pervasive Mobile Computing Devices
- support an expanding range of functions
- challenge is in converging them into a single
device
- while keeping the size, weight, and power consumption
constant.
23Sony Playstation 2000
- Emotion Engine: 6.2 GFLOPS, 75 million polygons
per second (Microprocessor Report, 13:5)
- Superscalar MIPS core + vector coprocessor +
graphics/DRAM
- Claim: Toy Story realism brought to games!
24Intelligent PDA (2005?)
- Pilot PDA
- gameboy, cell phone, radio, timer, camera, TV
remote, am/fm radio, garage door opener, ...
- Wireless data (WWW)
- Speech, vision recog.
- Voice output for conversations
- Speech control of all devices
- Vision to see, scan documents, read bar code, ...,
measure room
25Billion Transistor Architectures and Mobile
Multimedia Metrics
- SS Trace SMT CMP IA-64 RAW
- Design Scal.
- Energy/power
- Code Size
- Real-time
- Cont. Data
- Memory BW
- Fine-grain Par.
- Coarse-gr. Par.
- "A New Direction for Computer Architecture Research,"
Kozyrakis, Patterson, IEEE Computer (11/98)
26New Architecture Directions
- "media processing will become the dominant force
in computer arch. & microprocessor design."
- "... new media-rich applications... involve
significant real-time processing of continuous
media streams, and make heavy use of vectors of
packed 8-, 16-, and 32-bit integer and Fl. Pt."
- Needs include high memory BW, high network BW,
continuous media data types, real-time response,
fine grain parallelism
- "How Multimedia Workloads Will Change Processor
Design," Diefendorff, Dubey, IEEE Computer (9/97)
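To make "vectors of packed 8-bit integers" concrete, the sketch below adds four 8-bit lanes held in one 32-bit word using plain integer operations (often called SWAR, SIMD within a register). It is an illustration of the idea only, not the instruction set of any processor named in this talk; the helper names `pack8`, `unpack8` and `packed_add8` are assumptions.

```python
LOW7 = 0x7F7F7F7F   # low 7 bits of every 8-bit lane
HIGH = 0x80808080   # top bit of every 8-bit lane

def pack8(values):
    """Pack four 0..255 integers into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for lane, value in enumerate(values):
        word |= (value & 0xFF) << (8 * lane)
    return word

def unpack8(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(4)]

def packed_add8(a, b):
    """Add four 8-bit lanes at once, wrapping modulo 256 within each lane."""
    # Add the low 7 bits of each lane; the carry lands in the lane's own top bit.
    low = (a & LOW7) + (b & LOW7)
    # Fold in the original top bits with XOR so no carry crosses into the next lane.
    return (low ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF

pixels = pack8([1, 2, 3, 250])
offsets = pack8([10, 20, 30, 10])
print(unpack8(packed_add8(pixels, offsets)))   # lane 3 wraps: 260 mod 256 = 4
```

A media instruction set performs the same lane-wise operation in a single hardware instruction, which is where the fine-grain parallelism mentioned above comes from.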
27Some Media-Processing Functions
- Kernel                                  Vector length
- Matrix transpose/multiply (3D Gr.)      # vertices at once
- DCT (video, comm.)                      image width
- FFT (audio)                             256-1024
- Motion estimation (video)               image width, i.w./16
- Gamma correction (video)                image width
- Haar transform (media mining)           image width
- Median filter (image process.)          image width
(from http://www.research.ibm.com/people/p/pradeep/tutor.html)
28Challenges for Mobile Multimedia
- High performance for multimedia functions
- Energy and power efficiency (<1 Watt)
- Small size (fit in pocket)
- Low design complexity and high degree of
scalability (costs few tens of $)
29A Better Mobile Multimedia MPU: Logic + DRAM
- Embedded DRAM processors: one possibility
- Faster logic in DRAM process
- DRAM vendors offer faster transistors + same
number of metal layers as good logic process? @
20% higher cost per wafer?
- Called Intelligent RAM (IRAM) since most of
transistors will be DRAM
- Leave for another presentation
- "A Case for Intelligent RAM," Patterson, Anderson,
et al., IEEE Micro (3/97)
30Mobile Multimedia Conclusion
- 10000X cost-performance increase in stationary
computers, consolidation of industry => time for
architecture/OS/compiler researchers to declare
victory, search for new horizons?
- Mobile Multimedia offers many new challenges:
energy efficiency, size, real-time performance,
...
- Apps/metrics of future to design computer of
future!
- Suppose PDA replaces desktop as primary computer?
- Work on FPPPP on PC vs. Speech on PDA?
31From the Horse's Mouth
- "Personal mobile computing offers a vision of the
future with a much richer and more exciting set
of architecture research challenges than
extrapolations of the current desktop
architectures and benchmarks."
- "Put another way, which problem would you rather
work on: improving performance of PCs running
FPPPP -- a 1982 Fortran benchmark used in
SPECfp95 -- or making speech input practical for
PDAs?"
- "A New Direction for Computer Architecture Research,"
Kozyrakis, Patterson, IEEE Computer (11/98)
32References
- IEEE Computer: Sept. 97, Jan. 98, Aug. 98, Nov.
98
- IEEE Micro: Dec. 96, Mar. 97, Sept. 97
33Acknowledgement
- Thanks to Dr. Vishv Malhotra for lending me some
of his IEEE Computer issues.
- Thanks to Prof. Sale for going through the slides
and making useful suggestions.
- WAIT FOR THE NEXT TWO SLIDES
34Purpose of This Talk
- To get Staff and Students excited about the new
opportunities for research.
- What would you be doing as a graduate?
- Service Windows NT, and if lucky perhaps UNIX?
- Develop web pages?
- Do more of the same?
- Or rather do something really exciting?
- We need you if you choose the LATTER!
- 50 Postgraduate Scholarships for IT up for grabs
35Our Vision and Aim
- Achieve Critical Mass in Research
- Create a Group of Staff and Students Working on the
Problems of the Future.
- Pulling Australian IT Research Community Together
- Identifying Niches Where We Can Make
an International Contribution.