Title: The case for simultaneous multithreading (SMT) and chip multiprocessor (CMP)
1. The case for simultaneous multithreading (SMT) and chip multiprocessor (CMP)
- Dean Tullsen, Susan J. Eggers, and Henry M. Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism," ISCA 1995.
- Kunle Olukotun et al., "The Case for a Single-Chip Multiprocessor," ASPLOS VII, 1996.
2. State-of-the-art microprocessor architecture in 1995
- Superscalar (SS)
- Multiple instruction issue
- Dynamic scheduling, with hardware tracking dependencies
- Speculative execution: look past predicted branches
- Non-blocking caches: multiple outstanding memory operations
- Coarse-grain threading: instructions from the same thread are packed in each cycle
- Moore's law is still in action
- More logic available for the processor
- What is the best path to higher performance?
3. Options for higher performance
- Continue with superscalar
- Wider instruction issue
- Support more speculation
- Considerations
- Technology limits.
- Need to be able to pack instructions from one thread (how much ILP is there?)
4. Superscalar designs
- 3 phases: instruction fetch, issue and retirement, execution
- Performance limiter: issue and retirement
- 20% of the die area in the PA-8000 goes to its 56-instruction queue
- Wider issue width requires a deeper issue queue
- Quadratic increase in the size of the queue
- Long wires to broadcast result tags
- Execution phase: quadratic increase in the size of the register file and the bypass logic
5. Superscalar designs
- Delays increase as the issue queue grows and as the multi-ported register file grows.
- The performance return from wider issue is therefore limited (see the sketch below).
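A back-of-the-envelope sketch of the quadratic trend (Python; the queue-depth scaling factor and the two-source-operand count are illustrative assumptions, not numbers from either paper):

```python
# Toy model of issue-queue wakeup cost. Each queue entry compares its
# source tags against every result tag broadcast per cycle (one per
# issue slot). Constants are illustrative assumptions.
def wakeup_comparators(issue_width, queue_depth, src_operands=2):
    return queue_depth * src_operands * issue_width

for width in (2, 4, 6, 8):
    depth = 14 * width  # assume queue depth scales roughly with issue width
    print(f"{width}-issue, {depth}-entry queue: "
          f"{wakeup_comparators(width, depth)} tag comparators")
```

The comparator count, and the wires that broadcast result tags to them, grow roughly with the square of the issue width, which is the delay argument made on these two slides.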
6. How much ILP is there?
- Programs from SPEC92 on an 8-issue superscalar processor
- < 1.5 IPC
- The dominant source of lost cycles differs by application
- The ILP within one application cannot fully utilize the issue slots
7. Some conclusions from the 8-issue simulation
- No dominant cause of wasted cycles.
- No dominant solution
- What is next?
- A single thread does not have enough ILP for an 8-issue superscalar
- Exploit ILP from multiple threads
- Fine-grain multithreading: context switch every cycle, so instructions from only one thread issue in each cycle
8. Types of cycle waste in superscalar processors
The SPEC92 study reports 61% vertical waste (whole issue cycles lost) and 39% horizontal waste (unused slots within busy cycles), so fine-grain multithreading, which only attacks vertical waste, also has a limit; see the sketch below.
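A minimal sketch of how issue slots get classified (Python; the per-cycle issue counts are made up, only the 8-wide issue width comes from the study):

```python
# Classify wasted issue slots: vertical waste = whole cycle idle,
# horizontal waste = leftover slots in a cycle that issued something.
def waste_breakdown(issued_per_cycle, issue_width=8):
    vertical = horizontal = 0
    for n in issued_per_cycle:
        if n == 0:
            vertical += issue_width        # entire cycle lost
        else:
            horizontal += issue_width - n  # unused slots in a busy cycle
    total = vertical + horizontal
    return vertical / total, horizontal / total

v, h = waste_breakdown([0, 3, 0, 1, 2, 0, 4, 0])  # hypothetical trace
print(f"vertical waste {v:.0%}, horizontal waste {h:.0%}")
```

Fine-grain multithreading can fill a fully idle cycle with another thread, but it cannot fill the leftover slots of a cycle that already issued, which is why it has a limit.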
9. Another alternative: simultaneous multithreading
- Permit several independent threads to issue to multiple functional units in the same cycle
10. Simultaneous multithreading
- Used in the Pentium 4 (Hyper-Threading Technology) and POWER5
- Objective: increase processor utilization despite
- Long memory latencies
- Limited available parallelism per thread
- Advantages: combines
- Multiple instruction issue from superscalar architectures
- The latency-hiding ability of multithreaded architectures
11. Simultaneous multithreading
- Design parameters
- Number of threads
- Number of functional units
- Issue width per thread
- Binding between threads and units (see the issue-cycle sketch below)
- Static
- Completely dynamic
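A minimal sketch of one issue cycle under fully dynamic binding, contrasted with fine-grain multithreading (Python; the thread count, ready-instruction counts, and round-robin policy are assumptions for illustration):

```python
from collections import deque
from itertools import cycle

ISSUE_WIDTH = 8  # total issue slots per cycle, as in the 8-issue study

def make_threads(n_threads=4, ready=3):
    """Hypothetical queues of ready instructions, one per hardware thread."""
    return {t: deque(f"T{t}.i{k}" for k in range(ready)) for t in range(n_threads)}

def smt_issue_cycle(threads, width=ISSUE_WIDTH):
    """Fully dynamic binding: slots are filled from any ready thread,
    round-robin, until the slots or the ready instructions run out."""
    issued = []
    for t in cycle(sorted(threads)):
        if len(issued) == width or not any(threads.values()):
            break
        if threads[t]:
            issued.append(threads[t].popleft())
    return issued

def fgmt_issue_cycle(threads, turn, width=ISSUE_WIDTH):
    """Fine-grain multithreading: one thread owns the whole cycle, so its
    leftover slots become horizontal waste."""
    q = threads[turn]
    return [q.popleft() for _ in range(min(width, len(q)))]

print("SMT cycle :", smt_issue_cycle(make_threads()))           # fills all 8 slots
print("FGMT cycle:", fgmt_issue_cycle(make_threads(), turn=0))  # fills only 3
```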
12. Evaluation of SMT
- Compare alternatives
- Wide superscalars
- Traditional multi-threaded processors
- Small-scale multiple issue multiprocessors
13. Simulation environment
- What do they model?
- Execution pipelines
- Memory hierarchy
- TLBs
- Branch prediction logic
- Base processor: Alpha 21164 (functional-unit mix summarized in the sketch below)
- 10 functional units
- 4 integer
- 2 floating point
- 3 load/store
- 1 branch
- Latencies from 21164
- Assumes larger caches than the Alpha
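The modeled functional-unit mix from this slide, written out as a configuration table (Python; the unit names are shorthand, and the 21164 latencies are omitted because they are not listed here):

```python
# Functional-unit mix used in the SMT simulations (base: Alpha 21164).
FUNCTIONAL_UNITS = {
    "integer": 4,
    "floating_point": 2,
    "load_store": 3,
    "branch": 1,
}
assert sum(FUNCTIONAL_UNITS.values()) == 10  # 10 functional units in total
```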
14. Alternatives
- Baseline: fine-grain multithreading
- SMT
- SM: Full Simultaneous Issue
- SM: Single Issue, SM: Dual Issue, SM: Four Issue (per-thread issue caps; see the sketch below)
- SM: Limited Connection
- Limits how functional units are partitioned among threads, so it is less dynamic
- All with state-of-the-art instruction scheduling.
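A sketch of what the per-thread issue caps mean for a single cycle (Python; the ready-instruction counts are hypothetical, and SM: Limited Connection is omitted because a simple cap does not capture its static thread-to-unit partitioning):

```python
# Per-thread issue caps implied by the SM model names.
PER_THREAD_CAP = {
    "SM: Full Simultaneous Issue": 8,  # any one thread may fill all 8 slots
    "SM: Four Issue": 4,
    "SM: Dual Issue": 2,
    "SM: Single Issue": 1,
}

def slots_filled(ready_per_thread, cap, width=8):
    """Fill up to `width` slots, taking at most `cap` from each thread."""
    filled = 0
    for ready in ready_per_thread:
        filled = min(width, filled + min(ready, cap))
    return filled

ready = [6, 1, 3, 2]  # hypothetical ready instructions in four threads
for model, cap in PER_THREAD_CAP.items():
    print(f"{model:>28}: {slots_filled(ready, cap)} of 8 slots filled")
```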
15. Results
16. Results
17. Cache issues in SMT
- Decreased locality compared with the single-threaded model
- Design choice: hybrid of shared and private caches
- (Skipped.)
18. Another option: chip multiprocessor
- One more powerful superscalar core, or multiple less powerful cores?
- Same problems: the delay issues of a wide superscalar and the limited ILP in applications
- Compare two microarchitectures
- 6-way superscalar architecture
- 4 x 2-way superscalar architecture (four 2-issue cores)
- Roughly equivalent in terms of die size (see the sketch below)
- Floorplans given in the paper.
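A rough sketch of why the two designs can land at a similar die budget (Python; the linear-datapath plus quadratic-issue-logic area model and its constants are assumptions chosen for illustration, not the paper's floorplan data):

```python
# Toy area model: datapath area scales linearly with issue width,
# issue/bypass logic scales quadratically. Constants chosen only to
# illustrate how 1 x 6-way and 4 x 2-way can come out comparable.
def core_area(width, datapath_per_slot=10.0, issue_coeff=1.0):
    return datapath_per_slot * width + issue_coeff * width ** 2

print("1 x 6-way superscalar:", core_area(6))      # 60 + 36 = 96 units
print("4 x 2-way CMP cores  :", 4 * core_area(2))  # 4 * (20 + 4) = 96 units
```

Under these assumed constants the two come out equal; the paper makes the comparison with an actual floorplan rather than a formula.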
19. Simulated benchmarks
- Integer: eqntott, compress, m88ksim, MPsim
- Floating point: applu, apsi, swim, tomcatv
- Multiprogramming: pmake
20. IPC breakdown
- 6-issue: 1.6x for tomcatv, 2.4x for swim
- Pipeline stalls increase, limiting IPC
- FP applications have significant ILP, but d-cache stalls consume > 50% of the potential IPC
21. 4 x 2 CMP vs. 6-issue SS
- Non-parallelizable codes (compress)
- The wide superscalar architecture is better
- Fine-grain thread-level parallelism (eqntott, m88ksim, apsi)
- The wide superscalar architecture is slightly better
- But the simpler cores are expected to allow a higher clock rate
- Codes with coarse-grain thread-level parallelism
- The CMP is much better
22. SMT vs. CMP (from the SMT study)
- The difference is in the way resources are partitioned
- CMP statically partitions resources
- SMT allows the partitioning to change every cycle (see the sketch below)
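A minimal sketch of the partitioning difference (Python; the per-cycle issue demand of the four threads is made up; the 2-issue cores and the 8-wide shared SMT window are taken from the earlier slides purely for illustration):

```python
# Hypothetical issue demand (ready instructions) from 4 threads per cycle.
demand = [
    [6, 1, 0, 1],  # cycle 0: one thread has a burst of ILP
    [0, 2, 5, 1],  # cycle 1: a different thread has the burst
    [2, 2, 2, 2],  # cycle 2: evenly balanced demand
]

def cmp_issue(d, per_core=2):
    """CMP: static partition -- each thread is stuck with its own 2-wide core."""
    return sum(min(x, per_core) for x in d)

def smt_issue(d, width=8):
    """SMT: one shared 8-wide issue window, repartitioned every cycle."""
    return min(sum(d), width)

for c, d in enumerate(demand):
    print(f"cycle {c}: CMP issues {cmp_issue(d)}, SMT issues {smt_issue(d)}")
```

The shared window wins when per-thread demand is uneven and merely matches the static partition when demand is balanced, which is the heart of this comparison.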
23. SMT claims
- SMT is faster by 24%.
- Not very realistic due to the assumptions: die area is not compared, and a CMP with narrower issue is much smaller.
24. Discussion
- Why is SMT not dominant?
- Inter-thread contention
- Latency
- Hardware overhead (die size)
- Benchmarks
- How should the techniques be combined?
- Synergies and interference
- Do we need SMT per core?
- Where is the right balance?
- How should this be determined?