RAMP Models and Platforms

Slides: 28

Provided by: georgep6

Transcript and Presenter's Notes

1
RAMP Models and Platforms

Krste Asanovic
UC Berkeley
RAMP Retreat, Berkeley, CA
January 15, 2009

2
Much confusion about RAMP

Frequently asked questions
When will RAMP be finished/usable?
What ISA does RAMP use?
Can RAMP model my new feature X?
How accurate is RAMP?
Why so many different RAMP projects?
Why is there not more sharing among projects?

3
Not much confusion about software simulators

Rarely asked questions
When will software simulation be finished/usable?
What ISA do software simulators use?
Can a software simulator model my new feature
X?
How accurate is software simulation?
Why so many software simulators?
Why is there not more sharing among software
simulators?

4
RAMP is a consortium, not a project

Many projects with different goals
sometimes multiple per site
So far, much sharing of ideas and techniques
Very healthy and active community
Some sharing of low-level infrastructure
Boards and platform-level interfaces to DRAM,
Ethernet, etc.
Not a single complete infrastructure that
everyone uses
and that's been OK, and might continue to be OK

5
Run Model of Target on Host Platform
Hard Work
6
RAMP Projects Goals

Model some target machine, trading off:
Fidelity
Model design effort
Emulation speed (and capacity)

7
Space of Target Machines

Which ISA?
x86, SPARC, PowerPC, Alpha, ARM, MIPS?
In-order or out-of-order cores?
How many cores?
1, 16, 256, 1M?
Processor + memory of general-purpose machine, or
whole SoC including I/O devices?
Accelerators, GPUs?
Which operating system? Hypervisor?

8
ISA Wars

Original pick to standardize around was SPARC
Open standard
Available verification suite
Simplest ISA with extensive general-purpose
software support (i.e., desktop/server
development environment available)
SGI/MIPS sorely missed
Leon implementation for FPGA
Simics
But the intent was always to support multiple
ISAs

9
ISA usage in RAMP models

UCB RAMP Blue: Microblaze
Xilinx soft core modified to add 64-bit FPU
Stanford RAMP Red: PowerPC
Used Virtex-II Pro hard cores
UT FAST: x86
Functional simulation in software on front-end
machine (or on PowerPC hard core)
UT RAMP White: PowerPC -> SPARC
Initial version used hard PowerPC cores, moving to
Leon soft cores
MIT/Intel HASIM: Alpha -> x86?
Initially Alpha ISA, eventually to form basis of
x86/uOP machine
CMU ProtoFLEX: SPARC
SPARC three ways (own core, emulation on hard
PowerPC core, emulation on front-end machine)
UCB RAMP Gold and Internet-in-a-Box: SPARC
Own core design
UCB/LBNL Green Flash: Tensilica
RTL generated from Tensilica tools

10
Supporting new ISAs

x86 still very desirable, but difficult
FAST software functional model is probably
current best approach if want to play with
different timings
Microcoded functional model would be good way to
go if had resources (HASIM?)
Even with working functional model, timing model
is difficult? Adding new features difficult?
ARM also desirable for mobile device modeling
Renewed interest in engaging here
MIT/IBM PowerPC work in progress, could form
functional model
But nobody does this for fun - only to advance
their own research goals

11
Commercial/Existing RTL Cores

Originally seen as big benefit of RAMP
But didn't turn out that way in practice (except
for prototyping usage model - see later)
Cores don't provide features we need, too big,
too difficult to modify
For simple ISAs (i.e. non-x86), biggest help is
ISA verification suites, and/or really simple
synthesizable ISA pipeline to form basis of
functional model

12
Operating System Support

Currently only ProtoFLEX, FAST, RAMP-White
support OS
Others can run one application with proxy
mechanism for I/O
Reflects interests of groups. OS is not primary
subject of research for groups building models so
far.
RAMP Gold to add support for ParLab OS work
(Tessellation)
Green Flash to add support for HPC-style
microkernel

13
Target systems

From a few, to millions of cores
Scaling simulation to 100s of cores was a shared
goal
But smaller core counts (16-128) very interesting
also
Huge core counts (>1E6) also of interest
Single node versus clusters
RAMP Blue and Internet-in-a-Box are message-passing
clusters
Rest are shared-memory systems
Memory hierarchy and cache coherence protocols
Wide variety of possibilities
Desktop/Laptop/Server versus Handheld or SoC
What is important to model for given research
topic?
Accelerators/GPUs
Even wider variety than CPU ISAs/microarchitectures

14
Wide variety, how to reuse?

Proposal
ISA functional models
also FPU across ISAs
Perhaps even common uOP engine across all ISAs?
CPU Microarchitecture timing model
E.g., in-order superscalar, out-of-order with
unified physical register file
Memory functional model
Host-level caches, memory interleaving
Memory hierarchy timing models
On-chip network types as subset
I/O bus shims
To allow random RTL to be attached for I/O
devices and non-GPU accelerators
This won't be easy, as have to agree on
interfaces between these components, might need
further specialization
Definitely need more experience doing all of the
above

15
Simulator Types

Functional model only (no timing)
RTL models (functional includes timing)
Also used for chip prototyping
Split functional and timing models
Hybrids of above

16
Simulator Mapping Styles

Gate-level emulator (Quickturn, Palladium)
1MHz
Direct RTL emulator
5-20MHz
FPGA-tuned RTL emulator
20-50MHz
Virtualized RTL emulator
50-100MHz
Host-multithreaded models
>100MHz

17
(No Transcript)
18

RAMP Blue Release 2/25/2008
design available from RAMP website
ramp.eecs.berkeley.edu

19
Climate System Design Concept: Strawman Design
Study
10PF sustained, 120 m2, <3 MWatts, <75M
20
Virtualized RTL Improves FPGA Resource Usage

RAMP allows units to run at varying target-host
clock ratios to optimize area and overall
performance
Example 1: Multiported register file
E.g., Sun Niagara has 3 read ports and 2 write
ports to 6KB of register storage
If RTL mapped directly, requires 48K flip-flops
Slow cycle time, large area
If mapping into block RAMs (one read + one write
per cycle), takes 3 host cycles and 3x2KB block
RAMs
Faster cycle time (3X) and far fewer resources
Example 2: Large L2/L3 caches
Current FPGAs only have 1MB of on-chip SRAM
Use on-chip SRAM to build cache of active piece
of L2/L3 cache, stall target cycle if access
misses and fetch data from off-chip DRAM
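
The arithmetic in Example 1 can be checked with a short sketch. The function names are illustrative, not from the slides. It assumes the 3 host cycles come from serializing the target's ports onto a block RAM with one read and one write per host cycle, and that each block RAM holds 2KB, which is one reading of "3x2KB block RAMs".

```python
import math

def flip_flops_for_direct_mapping(storage_bytes):
    # Direct RTL mapping stores every bit in its own flip-flop.
    return storage_bytes * 8

def block_ram_mapping(storage_bytes, read_ports, write_ports, bram_bytes=2048):
    # Assumption: one read and one write per host cycle, so the target's
    # ports are served over max(reads, writes) host cycles.
    host_cycles = max(read_ports, write_ports)
    # Assumption: storage is simply split across 2KB block RAMs.
    brams = math.ceil(storage_bytes / bram_bytes)
    return host_cycles, brams

niagara_bytes = 6 * 1024  # 6KB of register storage
print(flip_flops_for_direct_mapping(niagara_bytes))  # 49152, i.e. 48K
print(block_ram_mapping(niagara_bytes, read_ports=3, write_ports=2))  # (3, 3)
```

Under these assumptions the sketch reproduces the slide's figures: 48K flip-flops for the direct mapping, versus 3 host cycles and 3 block RAMs for the virtualized one.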

21
Host Multithreading (Zhangxi Tan (UCB), Chung
(CMU))

Multithreading emulation engine reduces FPGA
resource use and improves emulator throughput
Hides emulation latencies (e.g., communicating
across FPGAs)

22
Split Functional/Timing Models (HASIM: Emer
(MIT/Intel), FAST: Chiou (UT Austin))
Functional Model
Timing Model

Functional model executes CPU ISA correctly, no
timing information
Only need to develop functional model once for
each ISA
Timing model captures pipeline timing details,
does not need to execute code
Much easier to change timing model for
architectural experimentation
Without RTL design, cannot be 100% certain that
timing is accurate
Many possible splits between timing and
functional model
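
A minimal sketch of one possible split, assuming a made-up three-opcode toy ISA and a trace handed from the functional model to the timing model. The names functional_model and timing_model, the instruction format, and the latency tables are all hypothetical. HASIM and FAST partition the work differently, and this is not their interface.

```python
def functional_model(program):
    """Execute toy instructions correctly; knows nothing about timing."""
    regs = {}
    trace = []
    for op, dst, a, b in program:
        if op == "li":      # load immediate: dst = a
            regs[dst] = a
        elif op == "add":   # dst = regs[a] + regs[b]
            regs[dst] = regs[a] + regs[b]
        elif op == "mul":   # dst = regs[a] * regs[b]
            regs[dst] = regs[a] * regs[b]
        else:
            raise ValueError(f"unknown opcode: {op}")
        trace.append(op)    # only what the timing model needs
    return regs, trace

def timing_model(trace, latencies):
    """Charge cycles per retired instruction; never executes code."""
    return sum(latencies[op] for op in trace)

program = [
    ("li", "r1", 3, None),
    ("li", "r2", 4, None),
    ("mul", "r3", "r1", "r2"),
    ("add", "r4", "r3", "r1"),
]
regs, trace = functional_model(program)
single_cycle = timing_model(trace, {"li": 1, "add": 1, "mul": 1})
slow_multiplier = timing_model(trace, {"li": 1, "add": 1, "mul": 4})
print(regs["r4"], single_cycle, slow_multiplier)  # 15 4 7
```

Swapping the latency table changes the cycle count but not the architectural result, which is the sense in which the timing model is easier to change for architectural experiments.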

23
RAMP White (Hari Angepat, Derek Chiou (UT Austin))

Scalable Coherent Shared Memory Multiprocessor
Support standard shared memory programming models

[Figure: RAMP-White block diagram. Labels: Leon3 shim, Intersection Unit, NIU, Router, AHB shim, AHB bus, MP IntCntrl, DSU, Eth, DDR2]
24
Multithreaded Func. and Timing Models (RAMP Gold,
UCB)
Timing Model Pipeline
MT-Channels
MT-Unit

MT-Unit multiplexes multiple target units on a
single host engine
MT-Channel multiplexes multiple target channels
over a single host link
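
The MT-Unit idea can be sketched as a single host engine that holds the state of several target units and advances them round-robin, one per host cycle. The class name, the step callback, and the counter workload are illustrative assumptions, not RAMP Gold's actual design.

```python
class MTUnit:
    """One host engine time-multiplexed across several target units."""

    def __init__(self, num_targets, step):
        self.states = [0] * num_targets  # per-target state kept on the host
        self.step = step                 # shared logic: (target_id, state) -> state
        self.next_target = 0
        self.host_cycles = 0

    def host_tick(self):
        # Each host cycle advances exactly one target unit by one target cycle.
        t = self.next_target
        self.states[t] = self.step(t, self.states[t])
        self.next_target = (t + 1) % len(self.states)
        self.host_cycles += 1

    def run_target_cycles(self, n):
        # One target cycle for every unit costs len(states) host cycles.
        for _ in range(n * len(self.states)):
            self.host_tick()

# Hypothetical workload: target t adds (t + 1) to its counter each target cycle.
unit = MTUnit(num_targets=4, step=lambda t, s: s + t + 1)
unit.run_target_cycles(3)
print(unit.states, unit.host_cycles)  # [3, 6, 9, 12] 12
```

The step logic exists once on the host while only the per-target state is replicated, which is where the resource saving comes from. The cost is that each target cycle takes as many host cycles as there are multiplexed targets.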

25
CMU Simics/RAMP Simulator
16-CPU Shared-memory UltraSPARC III Server
(SunFire 3800)
BEE2 Platform
26
What Hardware Platforms?

RTL mapping approaches
Need large amounts of logic
Selected BEE2, and then designed BEE3 for this
emulation style
Observed that don't need much interconnect
bandwidth (memory and inter-board links) because
RTL cores are slow and latency sensitive
Host-multithreading allows large systems to be
mapped to small (one?) FPGA (e.g., 64-128 cores
on ML505)
Logic gate count not as critical, need to focus
on on-chip capacity, off-chip memory bandwidth
and total memory capacity per FPGA (conventional
processor memory hierarchy issues multiplied by
multithreading factor)
One big FPGA with lots of fast memory channels
would be ideal
Software functional emulation (FAST) or
transplant (ProtoFLEX)
Focus on fast coherent connection to front-end
x86 CPU
Hypertransport, FSB, QPI interfaces better than
PCI I/O connections

27
Summary

Many reasons for great divergence in RAMP
projects
Different ISAs, different target machines,
different research topics, different emulation
styles
Sharing possible, but hard work and more
experience needed
Questions?
