Title: Compilers and Multi-Core Computing Systems
1. Compilers and Multi-Core Computing Systems
- FRAN ALLEN
- allen_at_watson.ibm.com
- Triangle Computer Science Lecture UNC
- February 18, 2008
2. Topics
- New Technical Challenge: Performance
- Solution?? Multicores and Parallelism
- A personal tour of some languages, compilers, and computers for high performance systems
- Meeting the Challenge
3. What Is the Challenge and Why Does It Matter?
- Computers are hitting a performance limit
- "The biggest problem Computer Science has ever faced." (John Hennessy)
- "The best opportunity Computer Science has to improve user productivity, application performance, and system integrity." (Fran Allen)
4. The Performance Problem
- Transistors continue to shrink
- More and more transistors fit on a chip
- The chips run faster and faster
- Resulting in HOT CHIPS!
5. Performance Problem Solution: Multicores
- Two or more processors (multicores) on a chip
- Simpler, slower, cooler processors
- Processors can work on independent parts of the same task
- Users and software will organize tasks to maximize PARALLELISM
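The idea of organizing one task into independent parts can be sketched in Python. The names here (`partial_sum`, `parallel_sum_of_squares`) are illustrative, not from the talk, and a thread pool stands in for the multicore hardware (in CPython, true CPU parallelism would need processes, but the decomposition is the same):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker handles an independent part of the same task.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Organize the task to maximize parallelism: split the data into
    # disjoint chunks, one per worker, then combine the partial results.
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))
```

The combining step (the final `sum`) is sequential; keeping that merge cheap relative to the per-chunk work is what makes the decomposition pay off.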
6. Parallelism Solves the Performance Problem! (or does it?)
7. The (Very Conservative) Future of Multi-cores
- 2007 - 8 cores on a chip
- 2009 - 16 cores
- 2013 - 64 cores
- 2015 - 128 cores
- 2021 - 1k cores
- LUNATIC LEVELS OF PARALLELISM!!
8. Languages, Compilers, and Computers: A Personal History
- Fortran, 1954-1957
- Sequential Programs and Hardware Concurrency
  - 1955-1962: Stretch and Harvest
  - 1962-1968: Advanced Computing System (ACS)
- 1970s: Consolidation
- Sequential Programs and Parallel Computers
  - 1983-1995: PTRAN
"In the beginning there was Fortran." (Jim Gray)
9. Fortran Project (1954-1957) Goals
- Increase user productivity
- "...produce programs almost as efficient as hand coded ones and do so on virtually every job." (John Backus)
THE FORTRAN GOALS BECAME MY GOALS
10. The Fortran Language and Compiler
- Available April 15, 1957
- Some features:
  - Beginnings of formal parsing techniques
  - Intermediate language form for optimization
  - Control flow graphs
  - Common sub-expression elimination
  - Generalized register allocation - for only 3 registers!
- Spectacular object code!!
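Common sub-expression elimination can be illustrated with a toy pass over three-address code. The representation and names are my own sketch, not the 1957 compiler's, and it assumes each destination is assigned only once, so available expressions are never invalidated:

```python
def eliminate_common_subexpressions(instrs):
    # instrs: list of (dest, op, a, b) tuples in three-address form.
    available = {}  # (op, a, b) -> variable already holding that value
    out = []
    for dest, op, a, b in instrs:
        key = (op, a, b)
        if key in available:
            # The expression was already computed: reuse it via a copy.
            out.append((dest, "copy", available[key], None))
        else:
            available[key] = dest
            out.append((dest, op, a, b))
    return out

code = [
    ("t1", "+", "x", "y"),
    ("t2", "*", "t1", "z"),
    ("t3", "+", "x", "y"),  # recomputes x + y
]
```

Running the pass on `code` turns the third instruction into a copy of `t1`, so `x + y` is computed once.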
11. Stretch (1956-1961)
- Goal: 100 times faster than any existing machine
- Main Performance Limitation: Memory Access Time
- Extraordinarily ambitious hardware
- Equally ambitious compiler
Fred Brooks
12. Stretch Concurrency
- Overlapped storage references, up to 6 at a time
- Instruction lookahead unit
- Up to 11 instructions executing in the CPU at the same time
- Hardware gave the appearance of a sequential machine
- Superscalar??
- Multiprogramming
13. HARVEST (1958-1962)
- Built for NSA for code breaking
- Hosted by Stretch
- Streaming data computation model
- Eight instructions and unbounded execution times
- Only system with balanced I/O, memory, and computational speeds (per conversation with Jim Pomerene, 11/2000)
- ALPHA: a language designed to fit the problem and the machine
14. Stretch-Harvest Compiler Organization

    Autocoder II      ALPHA        Fortran
         |              |             |
    Translation    Translation   Translation
         \______________|_____________/
                        |
                        IL
                        |
                    OPTIMIZER
                        |
                        IL
                        |
               REGISTER ALLOCATOR
                        |
                        IL
                        |
                    ASSEMBLER
                        |
                   OBJECT CODE
         (for STRETCH and STRETCH-HARVEST)
15. Stretch-Harvest Outcomes
- April 1961: Stretch delivered to Los Alamos, but
  - Stretch performance off by 50%
  - Considered a failure by IBM
- Feb 1962: Harvest accepted by the National Security Agency and used for 14 years
- Stretch had a huge influence on future IBM systems!
16. The IBM 360 (1959?-1964)
- Goal: Unify existing product lines
- One instruction set for scientific and business applications
- Multiple hardware models ranging from small and cheap to powerful and expensive
- One software product line
- IBM bet the company and won!
Fred Brooks
17. Advanced Computing System (ACS), 1962-1968
- Goal: Fastest Machine in the World
  - Pipelined and superscalar
  - Branch prediction
  - Out of order instruction execution
  - Instruction and data caches
- Experimental Compiler
  - Built early to drive hardware design
  - Compiler code often faster than the best hand code
John Cocke
18. ACS Compiler Optimization Results
- Language-independent, machine-independent optimization
- A theoretical basis for program analysis and optimization
- A Catalogue of Optimizations, which included:
  - Procedure integration
  - Loop transformations: unrolling, jamming, unswitching
  - Redundant subexpression elimination, code motion, constant folding, dead code elimination, strength reduction, linear function test replacement, carry optimization, anchor pointing
  - Instruction scheduling
  - Register allocation
- IBM CANCELLED ACS PROJECT IN 1968!
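One entry from that catalogue, strength reduction, can be sketched by hand. The example and function names are my own: a multiplication by the loop index is replaced with a running addition carried by an induction variable, a classic win when multiplies are costlier than adds:

```python
def addresses_naive(n, base, stride=4):
    # Before: every iteration pays for a multiply (i * stride).
    return [base + i * stride for i in range(n)]

def addresses_reduced(n, base, stride=4):
    # After strength reduction: the multiply becomes a repeated add,
    # carried by the induction variable addr.
    out, addr = [], base
    for _ in range(n):
        out.append(addr)
        addr += stride
    return out
```

Both functions compute the same address sequence; only the cost per iteration changes.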
19. The 1970s: Consolidation and Simplification
- Mainstreaming new optimization techniques
- Lots of research on optimization algorithms
- Whole program analysis
- Experimental Compiling System
John Cocke gave up his goal of building the world's fastest computer to build the best cost/performance machine. The result: THE POWER PC!!
20. PTRAN for Automatic Parallelization (1980s to 1995)
- Research
  - Program Dependence Graphs
  - Constructing Useful Parallelism
  - Static Single Assignment (SSA)
  - Whole Program Analysis Framework
- Compiler development
  - IBM's XL Family of Compilers
  - Fortran 90
- Run-time technologies
  - Dynamic Process Scheduling
  - Debugging
  - Visualization
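Static Single Assignment can be shown for straight-line three-address code: every assignment gets a fresh version of its variable, so each name is defined exactly once and def-use chains become explicit. This is a minimal sketch of my own, not PTRAN's representation, and it ignores control flow (hence no phi nodes):

```python
def to_ssa(instrs):
    # instrs: list of (dest, op, a, b) tuples; b may be None.
    version = {}

    def use(name):
        # Rewrite a use to the latest version of that variable, if any.
        return f"{name}{version[name]}" if name in version else name

    out = []
    for dest, op, a, b in instrs:
        a, b = use(a), use(b)
        version[dest] = version.get(dest, 0) + 1  # fresh definition
        out.append((f"{dest}{version[dest]}", op, a, b))
    return out
```

For example, `x = a + b; x = x * c` becomes `x1 = a + b; x2 = x1 * c`: the second definition no longer hides the first.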
21. Automatic Parallelization Is Hard
- Identifying potential parallelism is hard
  - Pointers
  - Storage reuse
  - Procedure boundaries
- Forming useful parallelism is hard
  - Caches
  - Data management
  - Multiple models of parallelism
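Why identifying parallelism is hard can be seen in two loops (the example is mine, not from the talk). The first has fully independent iterations; the second carries a dependence from each iteration to the next, so it cannot be naively split across cores, and a compiler must prove which case it is looking at:

```python
def scale(dst, src):
    # Independent iterations: each writes only its own element,
    # so the loop could safely run across cores (assuming dst and
    # src do not alias).
    for i in range(len(dst)):
        dst[i] = src[i] * 2

def running_sum(a):
    # Loop-carried dependence: a[i] needs a[i-1] produced by the
    # PREVIOUS iteration, which blocks naive parallelization.
    for i in range(1, len(a)):
        a[i] = a[i] + a[i - 1]
```

The aliasing caveat in `scale` is exactly the "Pointers" problem above: if `dst` and `src` could overlap, even that loop is unsafe to parallelize.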
22. Components of the Performance Solution
- Very high level domain specific languages
- Automatic parallelism
- Data management optimization: locality, integrity, ownership, ...
- Influence the architects before it is too late
- Remember the goals:
  - User Productivity
  - Application Performance
- Bold thinkers and high risk projects
23. Peak Performance Computers by Year
24. END OF TALK. START OF A NEW ERA IN COMPUTING AND COMPUTER SCIENCE!