Title: HAsim
1. HAsim
Michael Adler, Artur Klauser, Angshuman Parashar, Michael Pellauer, Murali Vijayaraghavan
2. Overview
- Goal
- Produce compelling evidence for architecture ideas
- Requirements
- Cycle accurate simulation
- Representative simulation length
- Software development (often)
- Current approach
- Mostly software simulation (10 kHz to 1 kHz)
- New approach
- Build a performance model in an FPGA
3. FPGA-based approaches
- Prototyping
- Build a logically isomorphic representation of the design
- Modeling
- Build a performance simulation in gates
- Hybrids
- Build something that is partially a prototype and partially a model
4. Recreate Asim in hardware
- Modularity
- Inter-module communication
- Functional/Timing Partitioning
- Modeling Utilities
5. Why modularity?
- Speed of model development
- Shared components between products
- Reuse across generations
- Encourages isomorphism to design
- Improved fidelity
- Facilitates speed/fidelity trade-offs
- Architectural experimentation
- Factorial development and evaluations
- Sharing
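6. ASIM Module Hierarchy
[Figure: module hierarchy diagram]
7. ASIM Module Selection
[Figure: module selection diagram]
8. Module Selection
[Figure: module selection diagram]
9. Module Replacement
[Figure: module replacement diagram]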
10. (H)ASIM Module Hierarchy
11. Communication
[Figure: communication diagram]
12. Named connections
[Figure: named connection diagram with endpoints A-out and A-in]
13. Model and FPGA Cycles
[Figure: Module A and Module B, each with ports]
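The relationship between model cycles and FPGA cycles can be sketched in software. The following is a minimal Python sketch, not HAsim code: it assumes that a port behaves as a fixed-latency FIFO pre-filled with empty messages, and that a module simulates one model cycle only when every input port holds a message. The names `Port`, `Module`, `cost`, and `simulate` are hypothetical and do not appear on the slides.

```python
from collections import deque


class Port:
    """Fixed-latency channel between two modules (hypothetical model)."""

    def __init__(self, latency):
        # Pre-fill with `latency` empty messages so the receiver can run
        # its first model cycles before the sender has produced anything.
        self.queue = deque([None] * latency)

    def send(self, msg):
        self.queue.append(msg)

    def can_receive(self):
        return len(self.queue) > 0

    def receive(self):
        return self.queue.popleft()


class Module:
    """Simulates one model cycle whenever all input ports hold a message."""

    def __init__(self, name, inputs, outputs, cost):
        self.name, self.inputs, self.outputs = name, inputs, outputs
        self.cost = cost          # FPGA cycles needed per model cycle
        self.busy = 0             # FPGA cycles left in the current step
        self.model_cycle = 0
        self.seen = []            # messages consumed in each model cycle

    def tick(self):
        """Called once per FPGA cycle."""
        if self.busy > 0:
            self.busy -= 1
            return
        if all(p.can_receive() for p in self.inputs):
            self.seen.append([p.receive() for p in self.inputs])
            for p in self.outputs:
                p.send((self.name, self.model_cycle))
            self.model_cycle += 1
            self.busy = self.cost - 1


def simulate(fpga_cycles):
    a_to_b, b_to_a = Port(latency=1), Port(latency=1)
    a = Module("A", inputs=[b_to_a], outputs=[a_to_b], cost=1)
    b = Module("B", inputs=[a_to_b], outputs=[b_to_a], cost=3)
    for _ in range(fpga_cycles):
        a.tick()
        b.tick()
    return a, b
```

In this sketch Module B needs three FPGA cycles per model cycle, so Module A stalls until B catches up. The message a module sends in model cycle k is still consumed by its neighbour in model cycle k+1, regardless of how many FPGA cycles elapsed.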
14. Functional/Timing Decomposition
- ISA semantics
- Platform semantics
[Figure: Timing Partition and Functional Partition, with labels Fetch(PC) and Instruction]
- Simplifies timing model
- Amortize functional model design effort over many models
- Can be pipelined for performance
- Can be FPGA-friendly design
- Can be split across hardware and software
15. Execute_at_execute phases
- Fetch instruction
- Speculatively execute instruction
- Read memory
- Speculatively write memory (locally visible)
- Commit or Abort instruction
- Write memory (globally visible)
Optional depending on instruction type
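The phases above can be sketched as methods that a timing model calls in order. This is a minimal Python sketch under stated assumptions, not the HAsim functional partition: the three-instruction ISA (`li`, `store`, `load`), the token scheme, and all method names are hypothetical. Register results become visible only at commit, and an abort drops a single instruction without rewinding younger ones.

```python
class FunctionalPartition:
    """Toy functional partition; the ISA and method names are hypothetical."""

    def __init__(self, program, memory):
        self.program = program        # pc -> instruction tuple
        self.memory = dict(memory)    # globally visible memory
        self.regs = {}                # committed register state
        self.inflight = {}            # token -> in-flight instruction state
        self.next_token = 0

    def fetch(self, pc):
        token = self.next_token
        self.next_token += 1
        self.inflight[token] = {"inst": self.program[pc], "result": None,
                                "store": None, "committed": False}
        return token

    def execute(self, token):
        st = self.inflight[token]
        op = st["inst"][0]
        if op == "li":                        # ("li", rd, value)
            st["result"] = st["inst"][2]
        elif op == "store":                   # ("store", rs, addr)
            st["result"] = self.regs[st["inst"][1]]

    def read_memory(self, token):
        st = self.inflight[token]
        if st["inst"][0] != "load":           # ("load", rd, addr)
            return
        addr = st["inst"][2]
        # The youngest older in-flight store to the same address wins.
        older = sorted((t for t in self.inflight if t < token), reverse=True)
        for t in older:
            pending = self.inflight[t]["store"]
            if pending is not None and pending[0] == addr:
                st["result"] = pending[1]
                return
        st["result"] = self.memory.get(addr, 0)

    def write_memory_local(self, token):
        st = self.inflight[token]
        if st["inst"][0] == "store":
            st["store"] = (st["inst"][2], st["result"])

    def commit(self, token):
        st = self.inflight[token]
        if st["inst"][0] in ("li", "load"):
            self.regs[st["inst"][1]] = st["result"]
        st["committed"] = True
        if st["inst"][0] != "store":
            del self.inflight[token]          # no global write needed

    def abort(self, token):
        del self.inflight[token]              # speculative state vanishes

    def write_memory_global(self, token):
        st = self.inflight.pop(token)
        if not st["committed"]:
            raise RuntimeError("global write before commit")
        addr, data = st["store"]
        self.memory[addr] = data
```

A speculative store is visible to a younger load through `read_memory`, but global memory changes only after `commit` followed by `write_memory_global`. Aborting the store leaves global memory untouched.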
16. Execution in phases
Assertion: All data dependencies can be represented in these phases
17. HAsim Partitioning Overview
[Figure labels: Timing Partition, Functional Partition, Memory State, Register State, RegFile]
18. Common Infrastructure
- Modules
- Inter-module communication
- Statistics gathering
- Event logging
- Debug Tracing
- Simulation control
19. Bluespec (Asim-style) module
module [HAsim_module] mkCache (Empty);
   Port#(Addr) req_port  <- mkSendPort("a2cache");
   Port#(Bool) resp_port <- mkRecvPort("cache2a");
   TagArray    tagarray  <- mkTagArray();

   rule cycle (True);
      Maybe#(Addr) mx = req_port.get();
      if (isValid(mx))
         resp_port.put(tagarray.lookup(validValue(mx)));
   endrule
endmodule
20. Bluespec (Asim-style) submodule
module mkTagArray (TagArray);
   RegFile#(Bit#(12), Bit#(4)) tagArray <- mkRegFileFull(...);

   function Bit#(4) getTag(Address x);
      return x[15:12];
   endfunction

   function Bit#(12) getIndex(Address x);
      return x[11:0];
   endfunction

   method Bool lookup(Bit#(16) a);
      return (tagArray.sub(getIndex(a)) == getTag(a));
   endmethod
endmodule
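The tag check performed by the submodule can be mirrored in software. The following is a minimal Python sketch of the same lookup: a 16-bit address is split into a 4-bit tag (bits 15:12) and a 12-bit index (bits 11:0), and a lookup hits when the stored tag at that index equals the address tag. The `fill` method and the zero-initialised array are assumptions added for illustration; the slide shows only the lookup path.

```python
INDEX_BITS = 12
TAG_BITS = 4


def get_index(addr):
    """Bits [11:0] of a 16-bit address."""
    return addr & ((1 << INDEX_BITS) - 1)


def get_tag(addr):
    """Bits [15:12] of a 16-bit address."""
    return (addr >> INDEX_BITS) & ((1 << TAG_BITS) - 1)


class TagArray:
    def __init__(self):
        # One 4-bit tag per index. Zero-initialised here by assumption.
        self.tags = [0] * (1 << INDEX_BITS)

    def lookup(self, addr):
        return self.tags[get_index(addr)] == get_tag(addr)

    def fill(self, addr):
        # Hypothetical helper (not on the slide): record the tag for addr.
        self.tags[get_index(addr)] = get_tag(addr)
```

As on the slide, there are no valid bits, so in this sketch any address whose tag is 0 hits in a freshly created array.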
21. Support functions - stats
module mkCache (...) (Empty);
   ...
   cache_hits <- mkStat(...);
   ...
   hit = tagarray.lookup(...);
   if (hit)
      cache_hits.increment();
   ...
endmodule
[Figure: three Modules, each with a Stat Counter, and one Stat Dumper]
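The pattern of per-module stat counters feeding one dumper can be sketched in software. This is a minimal Python sketch, assuming each counter registers itself with a shared dumper when it is created. The classes `StatCounter`, `StatDumper`, and `Cache`, and the `resident` set standing in for the tag lookup, are hypothetical and do not reflect how HAsim wires its counters in hardware.

```python
class StatDumper:
    """Collects every registered counter and reports them together."""

    def __init__(self):
        self.counters = []

    def register(self, counter):
        self.counters.append(counter)

    def dump(self):
        return {c.name: c.value for c in self.counters}


class StatCounter:
    """A named counter owned by one module."""

    def __init__(self, name, dumper):
        self.name = name
        self.value = 0
        dumper.register(self)

    def increment(self):
        self.value += 1


class Cache:
    """Stand-in module: `resident` replaces the real tag lookup."""

    def __init__(self, dumper, resident):
        self.cache_hits = StatCounter("cache_hits", dumper)
        self.resident = set(resident)

    def access(self, addr):
        hit = addr in self.resident
        if hit:
            self.cache_hits.increment()
        return hit
```

The module code only touches its own counter. Collecting and printing the values is left to the dumper, which keeps statistics out of the modelling logic.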
22. 2Dreams
23. Support functions - events
module mkCache (...) (Empty);
   ...
   cache_event <- mkEvent(...);
   ...
   hit = tagarray.lookup(...);
   cache_event.report(hit);
   ...
endmodule
[Figure: three Modules, each with an Event Reg, and one Event Dumper]
24. Support functions - global controller
module mkCache (...) (Empty);
   ...
   ctrl <- mkCntrlr(...);
   ...
   rule (ctrl.run())
      ...
   endrule
endmodule
[Figure: three Modules, each with a Controller, and one GlobalController]
25. (No Transcript)
26. FPGA-based prototype
Prototyping Catch-22
27. Module Instantiation
[Figure: module instantiation diagram]
28. Factorial Coding/Experiments
29. HAsim Current status - models
- Simple RISC functional model operating
- Simple RISC ISA
- Pipelined multi-phase instruction execution
- Supports speculative OOO design
- Physical Reg File and ROB
- Small physically addressed memory
- Fast speculative rewinds
- Instruction-per-cycle (APE) model
- Runs simple benchmarks on FPGA
- Five stage pipeline
- Supports branch mis-speculation
- Runs simple benchmarks (in software simulation)
- X86 functional model architecture under development
30. Connections Implement Ports
PM (Module Tree w. Connections)
PM (Hardware Modules w. Wrappers)
Implemented via connections.
31. Timing Model Resources (Fast)
- OOO, branch prediction, three functional units, 32KB 2-way set associative ICache and DCache, iTLB, dTLB
- 2142 slices (15% of a 2VP30)
- 21 block RAMs (15% of a 2VP30)
- Configurable cache model
- 32KB 4-way set associative cache with 16B cache-lines
- 165 slices (1% of a 2VP30)
- 17 block RAMs (12% of a 2VP30)
- 2MB 4-way set-associative cache with 64B cache-lines
- 140 slices (1% of a 2VP30)
- 40 block RAMs (29% of a 2VP30)
- Current FPGAs (4VFX140)
- 142,128 logic cells (63,168 slices)
- 552 block RAMs
- 2 PowerPCs
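The utilization figures quoted for the 2VP30 can be checked with a few lines of arithmetic. This is a minimal Python sketch. The device totals for the 2VP30 (13,696 slices and 136 block RAMs) are an assumption taken from the published Virtex-II Pro device tables, not from the slides. The resource counts per model and the 552 block RAMs of the 4VFX140 are the slide figures. Percentages are truncated rather than rounded, which is what reproduces the numbers on the slide.

```python
# Device totals are assumptions (Virtex-II Pro XC2VP30), not slide data.
XC2VP30 = {"slices": 13696, "block_rams": 136}

# Resource counts as quoted on the slide.
MODELS = {
    "OOO timing model": {"slices": 2142, "block_rams": 21},
    "32KB 4-way cache": {"slices": 165, "block_rams": 17},
    "2MB 4-way cache": {"slices": 140, "block_rams": 40},
}

# Block RAM count of the 4VFX140 as quoted on the slide.
BLOCK_RAMS_4VFX140 = 552


def utilization(model, device):
    """Truncated percentage of each device resource used by a model."""
    return {res: 100 * used // device[res] for res, used in model.items()}


def copies_by_block_ram(model, device_block_rams):
    """How many copies of a model fit if block RAM is the only limit."""
    return device_block_rams // model["block_rams"]


if __name__ == "__main__":
    for name, model in MODELS.items():
        print(name, utilization(model, XC2VP30))
```

Block RAM, not logic, is the scarce resource for the cache models: the 2MB cache uses about 1% of the slices but 29% of the block RAMs.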