Title: Closely-Coupled Timing-Directed Partitioning in HAsim
1 Closely-Coupled Timing-Directed Partitioning in HAsim
- Michael Pellauer
- pellauer_at_csail.mit.edu
Murali Vijayaraghavan, Michael Adler, Arvind, Joel Emer
MIT CS and AI Lab Computation Structures Group
Intel Corporation VSSAD Group
To Appear In ISPASS 2008
2 Motivation
- We want to simulate target platforms quickly
- We also want to construct simulators quickly
- Partitioned simulators are a known technique from
traditional performance models
- Micro-architecture
- Resource contention
- Dependencies
- ISA
- Off-chip communication
[Diagram: Functional Partition and Timing Partition, with the Interaction between them]
- Simplifies timing model
- Amortize functional model design effort over many models
- Functional Partition can be extremely FPGA-optimized
3 Different Partitioning Schemes
- As categorized by Mauer, Hill and Wood
- Source: MAUER 2002, ACM SIGMETRICS
- We believe that a timing-directed solution will ultimately lead to the best performance
- Both partitions on the FPGA
4 Functional Partition in Software Asim
- Get Instruction (at a given Address)
- Get Dependencies
- Get Instruction Results
- Read Memory
- Speculatively Write Memory (locally visible)
- Commit or Abort instruction
- Write Memory (globally visible)
- Optional depending on instruction type
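The operations listed above can be read as an interface that the timing model calls once per instruction. The following is a minimal Python sketch of that idea, not the Asim or HAsim implementation. The toy ISA (`addi`, `load`, `store`), the class name `FunctionalPartition`, the method names, and the driver `run_unpipelined` are all assumptions made for illustration. The sketch does no register renaming or bypassing, so it is only correct when each instruction finishes before the next one starts.

```python
class FunctionalPartition:
    """Toy functional partition: one method per operation on the slide.

    Hypothetical toy ISA, instructions are (op, rd, rs, imm):
      addi : regs[rd] = regs[rs] + imm
      load : regs[rd] = memory[regs[rs] + imm]
      store: memory[regs[rs] + imm] = regs[rd]
    """

    def __init__(self, program, memory=None):
        self.program = program             # address -> instruction tuple
        self.regs = [0] * 4                # committed register state
        self.memory = dict(memory or {})   # globally visible memory
        self.local_stores = {}             # token -> (addr, value), not yet global
        self.inflight = {}                 # token -> per-instruction record
        self.next_token = 0

    def get_instruction(self, address):
        token = self.next_token
        self.next_token += 1
        self.inflight[token] = {"inst": self.program[address],
                                "value": None, "addr": None}
        return token

    def get_dependencies(self, token):
        op, rd, rs, _ = self.inflight[token]["inst"]
        sources = [rs, rd] if op == "store" else [rs]
        dest = None if op == "store" else rd
        return sources, dest

    def get_results(self, token):
        rec = self.inflight[token]
        op, rd, rs, imm = rec["inst"]
        if op == "addi":
            rec["value"] = self.regs[rs] + imm
        else:                              # load/store: compute effective address
            rec["addr"] = self.regs[rs] + imm
            if op == "store":
                rec["value"] = self.regs[rd]

    def read_memory(self, token):          # only does work for loads
        rec = self.inflight[token]
        if rec["inst"][0] == "load":
            rec["value"] = self.memory.get(rec["addr"], 0)

    def speculative_write(self, token):    # only does work for stores
        rec = self.inflight[token]
        if rec["inst"][0] == "store":
            self.local_stores[token] = (rec["addr"], rec["value"])

    def commit(self, token):
        op, rd, _, _ = self.inflight[token]["inst"]
        if op != "store":
            self.regs[rd] = self.inflight[token]["value"]

    def abort(self, token):                # discard all effects of the instruction
        self.local_stores.pop(token, None)
        del self.inflight[token]

    def write_memory(self, token):         # make a committed store globally visible
        if token in self.local_stores:
            addr, value = self.local_stores.pop(token)
            self.memory[addr] = value
        self.inflight.pop(token, None)


def run_unpipelined(fp, addresses):
    """Hypothetical timing model: one instruction at a time, all phases in order."""
    for address in addresses:
        t = fp.get_instruction(address)
        fp.get_dependencies(t)
        fp.get_results(t)
        fp.read_memory(t)
        fp.speculative_write(t)
        fp.commit(t)
        fp.write_memory(t)


program = {0: ("addi", 1, 0, 5), 1: ("store", 1, 0, 16), 2: ("load", 2, 0, 16)}
fp = FunctionalPartition(program)
run_unpipelined(fp, [0, 1, 2])
print(fp.regs, fp.memory)   # [0, 5, 5, 0] {16: 5}
```

A pipelined or out-of-order timing model would call the same methods, but spread over different model cycles and for several tokens at once.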
5 Execution in Phases
The Emer Assertion: All data dependencies can be represented via these phases
6 Detailed Example: 3 Different Timing Models
- Executing the same instruction sequence
7 Functional Partition in Hardware?
- Requirements
- Support these operations in hardware
- Allow for out-of-order execution, speculation, rollback
- Challenges
- Minimize operation execution times
- Pipeline wherever possible
- Tradeoff between BRAM/multiport RAMs
- Race conditions due to extreme parallelism
8 Functional Partition As Pipeline
- Conveys concept well, but poor performance
[Diagram: Timing Model connected to the Functional Partition, drawn as a pipeline of Token Gen, Fet, Dec, Exe, Mem, LCom, GCom stages, with Register State (RegFile) and Memory State]
9 Implementation: Large Scoreboards in BRAM
- Series of tables in BRAM
- Store information about each in-flight instruction
- Tables are indexed by token
- Also used by the timing partition to refer to each instruction
- New operation getToken to allocate a space in the tables
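The token-indexed tables described above can be sketched in software as parallel arrays that share one index. This is only an illustration of the indexing idea, not the BRAM implementation from the paper. The class name `Scoreboard`, the particular table names, the round-robin allocation policy, and `free_token` are assumptions; only the `getToken` operation itself is named on the slide.

```python
class Scoreboard:
    """Parallel tables, all indexed by the same token (a small integer)."""

    def __init__(self, num_tokens):
        self.num_tokens = num_tokens
        self.in_use = [False] * num_tokens
        # One table per kind of information (table names are hypothetical).
        self.address = [None] * num_tokens
        self.instruction = [None] * num_tokens
        self.result = [None] * num_tokens
        self.next_free = 0

    def get_token(self):
        """Allocate one slot across all tables; None if every slot is in use."""
        for offset in range(self.num_tokens):
            token = (self.next_free + offset) % self.num_tokens
            if not self.in_use[token]:
                self.in_use[token] = True
                self.next_free = (token + 1) % self.num_tokens
                return token
        return None

    def free_token(self, token):
        """Release a slot once the instruction has committed or aborted."""
        self.in_use[token] = False
        self.address[token] = None
        self.instruction[token] = None
        self.result[token] = None


sb = Scoreboard(num_tokens=4)
t = sb.get_token()
sb.address[t] = 0x100                  # any operation can look up state by token
sb.instruction[t] = "addi r1, r0, 5"
```

Because the timing partition holds the same token, it can name an in-flight instruction in every later request without passing the instruction's state back and forth.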
10 Implementing the Operations
- See paper for details (also extra slides)
11 Assessment: Three Timing Models
- Unpipelined Target
- 5-stage pipeline
- MIPS R10K-like out-of-order superscalar
12 Assessment: Target Performance
- Targets have idealized memory hierarchy
13 Assessment: Simulator Performance
- Some correspondence between target and functional
partition is very helpful
14 Assessment: Reuse and Physical Stats
- Where is functionality implemented
- FPGA usage
Virtex-II Pro 70, using ISE 8.1i
15 Future Work: Simulating Multicores
- Scheme 1: Duplicate both partitions
[Diagram: Timing Model A and Timing Model B, each with a Func Reg Datapath; Functional Memory State. Label: "Interaction occurs here"]
- Scheme 2: Cluster Timing Partitions
Use a context ID to reference all state lookups
[Diagram: Timing Models A, B, C, D; Functional Reg State Datapath; Functional Memory State. Label: "Interaction still occurs here"]
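The note "Use a context ID to reference all state lookups" can be illustrated with a register file that several timing models share. The sketch below is an assumption about how such a lookup might be organized, not the planned HAsim design: the class name `ContextRegisterState`, the flat storage layout, and the method names are hypothetical.

```python
class ContextRegisterState:
    """One flat register table shared by several contexts.

    Every lookup carries a context ID, so each timing model sees its own
    registers even though the storage and datapath are shared.
    """

    def __init__(self, num_contexts, num_regs):
        self.num_contexts = num_contexts
        self.num_regs = num_regs
        self.storage = [0] * (num_contexts * num_regs)

    def _index(self, context_id, reg):
        if not (0 <= context_id < self.num_contexts and 0 <= reg < self.num_regs):
            raise IndexError("context ID or register number out of range")
        # The context ID selects a block; the register number selects a slot in it.
        return context_id * self.num_regs + reg

    def read(self, context_id, reg):
        return self.storage[self._index(context_id, reg)]

    def write(self, context_id, reg, value):
        self.storage[self._index(context_id, reg)] = value


state = ContextRegisterState(num_contexts=2, num_regs=8)
state.write(0, 1, 7)    # timing model A (context 0) writes r1
state.write(1, 1, 9)    # timing model B (context 1) writes its own r1
print(state.read(0, 1), state.read(1, 1))   # 7 9
```

In this sketch the two contexts never observe each other's register values.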
16 Future Work: Simulating Multicores
- Scheme 3: Perform multiplexing of timing models themselves
- Leverage HAsim A-Ports in Timing Model
- Out of scope of today's talk
Use a context ID to reference all state lookups
[Diagram: Timing Models A, B, C, D; Functional Reg State Datapath; Functional Memory State. Label: "Interaction still occurs here"]
17 Future Work: Unifying with the UT-FAST model
- UT-FAST is Functional-First
- This can be unified into Timing-Directed
- Just do execute-at-fetch
[Diagram labels: Func Partition, Timing Partition, Emulator (functional emulator running in software), FPGA, execution stream, resteer]
18 Summary
- Described a scheme for closely-coupled timing-directed partitioning
- Both partitions are suitable for on-FPGA implementation
- Demonstrated such a scheme's benefits
- Very Good Reuse, Very Good Area/Clock Speed
- Good FPGA-to-Model Cycle Ratio
- Caveat: Assuming some correspondence between timing model and functional partitions (recall the unpipelined target)
- We plan to extend this using contexts for hardware multiplexing [Chung 07]
- Future: rare complex operations (such as syscalls) could be done in software using virtual channels
19 Questions?
pellauer_at_csail.mit.edu
20 Extra Slides
21 Functional Partition: Fetch
22 Functional Partition: Decode
23 Functional Partition: Execute
24 Functional Partition: Back End
25 Timing Model: Unpipelined
26 5-Stage Pipeline Timing Model
27 Out-Of-Order Superscalar Timing Model