Title: A Study on Hyper-Threading
1. A Study on Hyper-Threading
- Vimal Reddy
- Ambarish Sule
- Aravindh Anantaraman
2. Microarchitectural trends
- Higher degrees of instruction-level parallelism
- Different generations:
  - I. Serial processors: fetch and execute each instruction back to back
  - II. Pipelined processors: overlap different phases of instruction processing for higher throughput
  - III. Superscalar processors: overlap different phases of instruction processing, and issue and execute multiple instructions in parallel for IPC > 1
  - IV. ???
3. Superscalar limits
- Limitations of the superscalar approach:
  - The amount of ILP in most programs is limited
  - The nature of ILP in programs can be bursty
  - Bottom line: resources can be utilized better
4. Simultaneous Multithreading
- Finds parallelism at the thread level
- Executes multiple instructions from multiple threads each cycle
- No significant increase in chip area over a superscalar processor
5. SMT pipeline changes
- Multiple PCs
- Replicate architectural state at fetch: thread selection, replicated RAS, thread ids in the BTB
- Replicate architectural state at rename: multiple rename map tables, multiple architectural map tables, multiple active lists
- Selective squash
- Per-thread disambiguation in the load/store units
[Diagram: superscalar pipeline (fetch unit, instruction cache, decode, register renaming, integer and FP queues, integer and FP registers, integer load/store units, data cache) annotated with the changes above; from ECE 721 notes, Prof. Eric Rotenberg, NCSU]
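To make the replicated state concrete, here is a minimal C sketch of per-thread PCs with a simple thread-selection policy at fetch. It is purely illustrative and not taken from the course notes; round-robin stands in for the smarter fetch policies (such as ICOUNT [4]) used in real SMT designs.

```c
/* Minimal sketch: replicated per-thread PCs and a trivial round-robin
 * thread-selection policy at fetch. Illustrative only. */
#define NUM_THREADS 2

unsigned long pc[NUM_THREADS];        /* replicated architectural PCs */

/* Pick which thread fetches this cycle (real designs use policies
 * such as ICOUNT rather than plain round-robin). */
int select_fetch_thread(unsigned long cycle) {
    return (int)(cycle % NUM_THREADS);
}

unsigned long next_fetch_pc(unsigned long cycle) {
    return pc[select_fetch_thread(cycle)];
}
```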
6. Hyper-Threading
- Brings the goodness of Simultaneous Multi-Threading (SMT) to the Intel Architecture
- Motivation (same as that for SMT):
  - High processor utilization
  - Better throughput (by exploiting thread-level parallelism - TLP)
  - Power efficient due to smaller processor cores compared to CMP
7. Hyper-Threading (contd.)
- 2 logical processors (2 threads in SMT terminology)
- Shared instruction trace cache and L1 D-cache
- 2 PCs and 2 register renamers
- Other resources partitioned equally between the 2 threads
- Recombines shared resources when single-threaded (no degradation of single-thread performance); see the sketch below
[Figure: Intel NetBurst microarchitecture pipeline with Hyper-Threading Technology]
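A minimal sketch of the partition/recombine behavior described above, assuming a hypothetical 64-entry queue; the size and function name are illustrative, not actual NetBurst parameters.

```c
/* Sketch of equal partitioning vs. recombining of a partitioned resource
 * (e.g. a queue). QUEUE_ENTRIES is an arbitrary illustrative size. */
#define QUEUE_ENTRIES 64

int entries_per_thread(int active_logical_processors) {
    if (active_logical_processors <= 1)
        return QUEUE_ENTRIES;      /* recombined: a single thread gets everything */
    return QUEUE_ENTRIES / 2;      /* partitioned equally between the 2 threads   */
}
```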
8. Project Goal
- Measure performance of micro-benchmarks (kernels) on the Pentium 4. Form workloads that utilize different processor resources and study their behavior.
9. Pentium 4 Functional Units
- 3 integer ALU units (2 double speed)
- 1 unit for floating-point computation
- Separate address generator units for loads and stores
10. Micro-benchmarks
- Created 3 types of kernels:
- Floating-point-intensive kernel (flt)
  - Performs FP add, subtract, multiply, and divide operations a large number of times
  - Targets the single FP unit
- Integer-intensive kernel (int)
  - Performs integer add, subtract, and shift a large number of times
  - Targets the integer units (2 double speed and 1 slow)
- Memory-intensive kernel (mem, mem_s)
  - Dynamically allocates a linked list larger than the L1 D-cache and traverses it
  - Targets the shared data cache and the memory hierarchy as such
11. Micro-benchmarks (contd.)
- Integer kernel, floating-point kernel, and memory-intensive kernel (listings shown on the slide; an illustrative sketch follows)
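The original slide showed the kernel listings themselves; since those are not reproduced here, the C sketch below reconstructs each kernel from the descriptions on the previous slide. Loop counts, node layout, and list length are arbitrary placeholders (the Northwood L1 D-cache is 8 KB, so the list only needs to exceed that).

```c
/* Illustrative reconstruction of the three kernels from their descriptions;
 * ITERS and LIST_NODES are arbitrary placeholders, not the project's values. */
#include <stdlib.h>

#define ITERS      100000000L
#define LIST_NODES (1 << 16)    /* enough nodes to exceed the 8 KB L1 D-cache */

/* flt: dependent FP add, subtract, multiply, divide; targets the single FP unit */
double flt_kernel(void) {
    double a = 1.0;
    for (long i = 0; i < ITERS; i++) {
        a = a + 3.0;
        a = a - 0.5;
        a = a * 1.0001;
        a = a / 1.0002;
    }
    return a;
}

/* int: integer add, subtract, shift; targets the integer ALUs */
long int_kernel(void) {
    long a = 1;
    for (long i = 0; i < ITERS; i++) {
        a = a + 3;
        a = a - 1;
        a = a << 1;
        a = a >> 1;
    }
    return a;
}

/* mem: build a linked list larger than the L1 D-cache, then traverse it */
struct node { struct node *next; long pad[7]; };

long mem_kernel(void) {
    struct node *head = NULL;
    long sum = 0;
    for (long i = 0; i < LIST_NODES; i++) {       /* allocate the list */
        struct node *p = malloc(sizeof *p);
        p->pad[0] = i;
        p->next = head;
        head = p;
    }
    for (int rep = 0; rep < 1000; rep++)          /* repeated pointer-chasing traversal */
        for (struct node *p = head; p != NULL; p = p->next)
            sum += p->pad[0];
    return sum;
}
```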
12. Workbench
- Machine: Pentium 4 Northwood, 2.53-2.66 GHz, with Hyper-Threading
- Operating system: Linux 2.4.18-SMP kernel; the OS views each thread as a processor
- BIOS setting to turn HT on/off
- PERL script to fork processes at the same time (a rough C equivalent is sketched below)
- top (Linux utility) to monitor processes (processor and memory utilization)
- time utility to get timing statistics for each program
- Ran each experiment 10 times and took the average execution time
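The actual launcher was a PERL script combined with the time utility; as a rough, hypothetical C equivalent, the sketch below forks the kernels of one workload at the same time and reports the combined wall-clock time. The program paths are placeholders.

```c
/* Rough C stand-in for the PERL launcher: fork both kernels of a workload
 * simultaneously, wait for them, and report wall-clock time.
 * The benchmark paths are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    const char *progs[] = { "./flt", "./int" };   /* one workload, e.g. int+flt */
    struct timeval start, end;

    gettimeofday(&start, NULL);
    for (int i = 0; i < 2; i++) {
        pid_t pid = fork();
        if (pid == 0) {                           /* child: run one kernel */
            execl(progs[i], progs[i], (char *)NULL);
            perror("execl");
            _exit(1);
        }
    }
    while (wait(NULL) > 0)                        /* parent: wait for both kernels */
        ;
    gettimeofday(&end, NULL);

    double secs = (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6;
    printf("workload wall-clock time: %.2f s\n", secs);
    return 0;
}
```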
13. Methodology
- Run different workload combinations:
  - flt+flt: 2 floating-point kernels
  - mem_s+mem_s: 2 small memory-intensive kernels
  - int+flt: 1 integer and 1 floating-point kernel, and so on...
- Run in 3 modes:
  - 1. back-to-back: run each program individually
  - 2. HT off: no Hyper-Threading, but OS context switching
  - 3. HT on: Hyper-Threading on and OS context switching
- Find contending workloads: compete for resources and degrade performance (increased execution time with HT on)
- Find complementary workloads: utilize idle resources and increase performance (decreased execution time with HT on)
14. Experiments: Single-thread performance
- Hyper-Threading does not degrade single-thread performance
15. Experiments (contd.)
- Contention for the single FP unit increases execution time
- Contention for the data cache can lead to thrashing
16. Experiments (contd.)
- Integer workloads perform well: the 3 integer units (2 double speed) are well utilized
- Workloads with complementary resource requirements perform well (int+flt, mem+int)
- The OS plays an important role when the number of programs > the number of hardware contexts available
17. Experiments (contd.)
18. Experiments (contd.)
- Execution time with the 3-kernel workload is less than that for 2!
- Scheduling is important!
  - int+flt+flt: the int kernel gets 100% of one thread; the two flt kernels split the other 50/50
  - flt+flt+int: one flt kernel gets 100% of one thread; the int and flt kernels split the other 50/50. Has higher execution time!
19. Project Goal
- Model Hyper-Threading on a simulator. Vary key parameters and study first-order effects.
20. Simulator details
- Execution-driven, cycle-accurate simulator based on the SimpleScalar toolset
- Extended the simulator to model SMT and Hyper-Threading (see the sketch below):
  - Resource sharing by tagging entries with a thread id (I-cache, D-cache)
  - Resource replication through multiple instantiation (PC, map tables, branch history, RAS)
  - Resource partitioning by keeping separate instances but imposing a global limit on entries (active list, load/store buffers, issue queues)
- Stop simulation after completion of all threads
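A minimal C sketch of the three resource-management strategies listed above. The structure and field names are invented for exposition and are not the actual simulator code.

```c
/* Illustrative sketch of the three strategies: sharing via thread-id tags,
 * replication of architectural state, and partitioning under a global limit.
 * Names and sizes are invented, not taken from the simulator. */
#define NUM_THREADS 2

/* 1. Sharing: cache entries live in one shared array, tagged with the
 *    owning thread id. */
struct cache_entry {
    unsigned long tag;
    int           valid;
    int           thread_id;
};

/* 2. Replication: one full copy of architectural state per thread. */
struct thread_state {
    unsigned long pc;
    int           rename_map[64];   /* per-thread rename map table      */
    unsigned long ras[16];          /* per-thread return address stack  */
    unsigned long branch_history;   /* per-thread branch history        */
};
struct thread_state threads[NUM_THREADS];

/* 3. Partitioning: separate per-thread occupancy counts, but one global
 *    limit on total entries (active list, load/store buffers, issue queues). */
struct partitioned_queue {
    int occupied[NUM_THREADS];
    int global_limit;
};

/* A thread may insert only while the global limit is not exhausted;
 * a per-thread cap (as on slide 7) could also be enforced here. */
int pq_can_insert(const struct partitioned_queue *q, int tid) {
    (void)tid;
    return (q->occupied[0] + q->occupied[1]) < q->global_limit;
}
```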
21. Simulator details
22. Simulator SMT/HT validation
23. Experiment: Modeling L1 data cache interference
24. Experiment: Modeling issue queue partitioning
25. Experiment: Modeling total issue queue size with partitioning
26. Experiment: Varying load/store buffer sizes (Pentium 4: 48 load, 24 store)
27. Experiment: Comparison of fetch policies
28. References
- [1] Prof. Eric Rotenberg, Course Notes, ECE 792E Advanced Microarchitecture, Fall 2002, NC State University.
- [2] Deborah T. Marr et al., "Hyper-Threading Technology Architecture and Microarchitecture," Intel Technology Journal, Vol. 6, Issue 1, 1st Qtr 2002.
- [3] Vimal Reddy, Ambarish Sule, Aravindh Anantaraman, "Hyperthreading on the Pentium 4," ECE 792E Project, Fall 2002. http://www.tinker.ncsu.edu/ericro/ece721/student_projects/avananta.pdf
- [4] D. M. Tullsen et al., "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor," 23rd Annual ISCA, pp. 191-202, May 1996.
29. Questions