Closely-Coupled Timing-Directed Partitioning in HAsim

1
Closely-Coupled Timing-Directed Partitioning in HAsim
  • Michael Pellauer
  • pellauer_at_csail.mit.edu

Murali Vijayaraghavan, Michael Adler, Arvind,
Joel Emer
MIT CS and AI Lab Computation Structures Group
Intel Corporation VSSAD Group
To Appear In ISPASS 2008
2
Motivation
  • We want to simulate target platforms quickly
  • We also want to construct simulators quickly
  • Partitioned simulators are a known technique from
    traditional performance models
  • Micro-architecture
  • Resource contention
  • Dependencies
  • ISA
  • Off-chip communication

Functional Partition
Timing Partition
Interaction
  • Simplifies timing model
  • Amortize functional model design effort over
    many models
  • Functional Partition can be extremely
    FPGA-optimized

3
Different Partitioning Schemes
  • As categorized by Mauer, Hill and Wood
  • Source: Mauer 2002, ACM SIGMETRICS
  • We believe that a timing-directed solution will
    ultimately lead to the best performance
  • Both partitions on the FPGA

4
Functional Partition in Software Asim
  • Get Instruction (at a given Address)
  • Get Dependencies
  • Get Instruction Results
  • Read Memory
  • Speculatively Write Memory (locally visible)
  • Commit or Abort instruction
  • Write Memory (globally visible)
  • Optional depending on instruction type
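
As a rough illustration only, the operations listed above could be modeled in software as methods that a timing model calls in order. The class name FunctionalPartition, every method name, the helper run_sequentially, and the three-instruction toy ISA (addi, load, store) are assumptions made for this sketch; they are not the Asim or HAsim interface. The sketch also assumes each instruction commits before a dependent one executes.

```python
class FunctionalPartition:
    """Toy functional partition. Names and ISA are hypothetical, not the Asim/HAsim API."""

    def __init__(self, program):
        self.program = program      # address -> (op, rd, rs, imm)
        self.regs = [0] * 8         # architectural registers
        self.memory = {}            # globally visible memory
        self.store_buffer = {}      # token -> (addr, value): locally visible stores
        self.inflight = {}          # token -> per-instruction state
        self.next_token = 0

    def get_instruction(self, addr):
        """Fetch the instruction at addr and return a token naming it."""
        token = self.next_token
        self.next_token += 1
        self.inflight[token] = {"inst": self.program[addr]}
        return token

    def get_dependencies(self, token):
        """Return the source registers the instruction reads."""
        op, rd, rs, _ = self.inflight[token]["inst"]
        return [rd, rs] if op == "store" else [rs]

    def get_results(self, token):
        """Compute the ALU result or the effective address."""
        state = self.inflight[token]
        op, _, rs, imm = state["inst"]
        value = self.regs[rs] + imm
        if op == "addi":
            state["result"] = value
        else:                       # load/store: value is the effective address
            state["addr"] = value
        return value

    def read_memory(self, token):
        """Loads only: prefer an uncommitted (locally visible) store, else memory."""
        state = self.inflight[token]
        addr = state["addr"]
        pending = [v for (a, v) in self.store_buffer.values() if a == addr]
        state["result"] = pending[-1] if pending else self.memory.get(addr, 0)
        return state["result"]

    def speculative_write(self, token):
        """Stores only: make the value visible locally, not yet globally."""
        state = self.inflight[token]
        _, rd, _, _ = state["inst"]
        self.store_buffer[token] = (state["addr"], self.regs[rd])

    def commit(self, token):
        """Local commit: update the architectural registers."""
        state = self.inflight[token]
        op, rd, _, _ = state["inst"]
        if op != "store":
            self.regs[rd] = state["result"]

    def abort(self, token):
        """Discard all effects of an uncommitted instruction."""
        self.inflight.pop(token, None)
        self.store_buffer.pop(token, None)

    def write_memory(self, token):
        """Global commit: a buffered store becomes visible to everyone."""
        if token in self.store_buffer:
            addr, value = self.store_buffer.pop(token)
            self.memory[addr] = value
        self.inflight.pop(token, None)


def run_sequentially(fp, addresses):
    """Simplest possible driver: one instruction at a time, all operations in order."""
    for addr in addresses:
        token = fp.get_instruction(addr)
        op = fp.program[addr][0]
        fp.get_dependencies(token)
        fp.get_results(token)
        if op == "load":
            fp.read_memory(token)       # optional: loads only
        if op == "store":
            fp.speculative_write(token)  # optional: stores only
        fp.commit(token)
        fp.write_memory(token)
```

The split between speculative_write and write_memory mirrors the locally visible and globally visible writes on the slide: abort can discard a buffered store before any other observer could have seen it.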

5
Execution in Phases
The Emer Assertion: All data dependencies can be
represented via these phases
6
Detailed Example: 3 Different Timing Models
  • Executing the same instruction sequence
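
The slide's figure is not preserved in this transcript, so the following is only a hedged sketch of the general idea: different timing models invoke the same per-instruction phases at different model cycles. The phase names, the two toy schedules (a strictly serial model and a stall-free pipeline with one phase per cycle), and all function names are assumptions for illustration; they do not reproduce the three timing models from the talk.

```python
PHASES = ["fetch", "decode", "execute", "memory", "local_commit", "global_commit"]


def unpipelined(num_instructions):
    """Toy model: an instruction finishes every phase before the next one starts."""
    schedule = []
    cycle = 0
    for inst in range(num_instructions):
        for phase in PHASES:
            schedule.append((cycle, inst, phase))
            cycle += 1
    return schedule


def ideal_pipeline(num_instructions):
    """Toy model: a new instruction enters every cycle and never stalls."""
    schedule = []
    for inst in range(num_instructions):
        for stage, phase in enumerate(PHASES):
            schedule.append((inst + stage, inst, phase))
    return sorted(schedule)


def model_cycles(schedule):
    """Number of model cycles the schedule spans."""
    return max(cycle for cycle, _, _ in schedule) + 1


def phase_order(schedule, inst):
    """Order in which one instruction's phases are invoked."""
    return [phase for _, i, phase in sorted(schedule) if i == inst]
```

In both toy schedules every instruction sees its phases in the same order, so the functional outcome is unchanged; only the model cycle at which each phase is requested differs.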

7
Functional Partition in Hardware?
  • Requirements
  • Support these operations in hardware
  • Allow for out-of-order execution, speculation,
    rollback
  • Challenges
  • Minimize operation execution times
  • Pipeline wherever possible
  • Tradeoff between BRAM/multiport RAMs
  • Race conditions due to extreme parallelism

8
Functional Partition As Pipeline
  • Conveys concept well, but poor performance

[Figure: the Timing Model drives a Functional Partition drawn as a pipeline of stages (Token Gen, Fet, Dec, Exe, Mem, LCom, GCom) over Memory State and Register State (RegFile)]
9
Implementation: Large Scoreboards in BRAM
  • Series of tables in BRAM
  • Store information about each in-flight
    instruction
  • Tables are indexed by token
  • Also used by the timing partition to refer to
    each instruction
  • New operation: getToken, which allocates a space in
    the tables
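
A minimal software sketch of token-indexed tables follows, assuming a small circular allocation scheme. The class Scoreboard, its fields, and the methods get_token, free_oldest, and rollback_after are hypothetical names for illustration; the actual BRAM organization is described in the paper and is not reproduced here.

```python
class Scoreboard:
    """Toy token-indexed tables (hypothetical layout, not the HAsim design)."""

    def __init__(self, num_tokens=8):
        self.num_tokens = num_tokens
        # One list per field; every list is indexed by the same token.
        self.busy = [False] * num_tokens
        self.inst_addr = [None] * num_tokens
        self.dest_reg = [None] * num_tokens
        self.oldest = 0    # token of the oldest in-flight instruction
        self.next = 0      # token that will be handed out next
        self.count = 0     # number of in-flight instructions

    def get_token(self):
        """Allocate a slot; return None when full so the caller can stall."""
        if self.count == self.num_tokens:
            return None
        token = self.next
        self.busy[token] = True
        self.next = (self.next + 1) % self.num_tokens
        self.count += 1
        return token

    def _clear(self, token):
        self.busy[token] = False
        self.inst_addr[token] = None
        self.dest_reg[token] = None
        self.count -= 1

    def free_oldest(self):
        """Release the oldest in-flight token, for example at global commit."""
        token = self.oldest
        self._clear(token)
        self.oldest = (self.oldest + 1) % self.num_tokens
        return token

    def rollback_after(self, token):
        """Discard every in-flight token younger than the given one."""
        keep_until = (token + 1) % self.num_tokens
        while self.next != keep_until:
            self.next = (self.next - 1) % self.num_tokens
            self._clear(self.next)
```

Because the timing partition refers to an instruction by the same token, either side can look up per-instruction state with a single index rather than a search.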

10
Implementing the Operations
  • See paper for details (also extra slides)

11
Assessment: Three Timing Models
  • Unpipelined Target
  • MIPS R10K-like out-of-order superscalar
  • 5-Stage Pipeline

12
Assessment: Target Performance
  • Targets have idealized memory hierarchy

13
Assessment: Simulator Performance
  • Some correspondence between target and functional
    partition is very helpful

14
Assessment: Reuse and Physical Stats
  • Where is functionality implemented
  • FPGA usage

Virtex-II Pro 70, using ISE 8.1i


15
Future Work: Simulating Multicores
  • Scheme 1: Duplicate both partitions
  • Scheme 2: Cluster timing partitions

[Figure, Scheme 1: Timing Model A and Timing Model B, each with its own Func Reg + Datapath, above a Functional Memory State; annotation: "Interaction occurs here"]
[Figure, Scheme 2: Timing Models A, B, C and D, a Functional Reg State + Datapath, and a Functional Memory State; annotations: "Use a context ID to reference all state lookups", "Interaction still occurs here"]
16
Future Work: Simulating Multicores
  • Scheme 3: Perform multiplexing of the timing models
    themselves
  • Leverage HAsim A-Ports in the Timing Model
  • Out of scope of today's talk

[Figure, Scheme 3: Timing Models A, B, C and D, a Functional Reg State + Datapath, and a Functional Memory State; annotations: "Use a context ID to reference all state lookups", "Interaction still occurs here"]
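
One way to picture "use a context ID to reference all state lookups" is a single shared storage array in which the context ID selects a region and the register number selects an entry within it. This is a sketch under that assumption only; the class SharedRegisterState and its methods are hypothetical names, and the talk does not specify the actual indexing scheme.

```python
class SharedRegisterState:
    """Toy shared register state addressed by (context ID, register number)."""

    def __init__(self, num_contexts, num_regs=32):
        self.num_contexts = num_contexts
        self.num_regs = num_regs
        # One flat array stands in for a single shared RAM.
        self.storage = [0] * (num_contexts * num_regs)

    def _index(self, context_id, reg):
        if not 0 <= context_id < self.num_contexts:
            raise IndexError("unknown context ID")
        if not 0 <= reg < self.num_regs:
            raise IndexError("unknown register")
        return context_id * self.num_regs + reg

    def read(self, context_id, reg):
        return self.storage[self._index(context_id, reg)]

    def write(self, context_id, reg, value):
        self.storage[self._index(context_id, reg)] = value
```

With this layout, several simulated cores can share one datapath: each request carries its context ID, and the register state of different cores never aliases.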
17
Future Work: Unifying with the UT-FAST model
  • UT-FAST is Functional-First
  • This can be unified into Timing-Directed
  • Just do execute-at-fetch

[Figure: Func Partition and Timing Partition on the FPGA, alongside a functional emulator running in software; the connections are labeled "execution stream" and "resteer"]
18
Summary
  • Described a scheme for closely-coupled
    timing-directed partitioning
  • Both partitions are suitable for on-FPGA
    implementation
  • Demonstrated such a scheme's benefits
  • Very Good Reuse, Very Good Area/Clock Speed
  • Good FPGA-to-Model Cycle Ratio
  • Caveat: Assuming some correspondence between
    timing model and functional partitions (recall
    the unpipelined target)
  • We plan to extend this using contexts for
    hardware multiplexing [Chung 07]
  • Future: rare, complex operations (such as
    syscalls) could be done in software using virtual
    channels

19
Questions?
pellauer_at_csail.mit.edu
20
Extra Slides
pellauer_at_csail.mit.edu
21
Functional Partition: Fetch
22
Functional Partition: Decode
23
Functional Partition: Execute
24
Functional Partition: Back End
25
Timing Model: Unpipelined
26
5-Stage Pipeline Timing Model
27
Out-Of-Order Superscalar Timing Model