RAMP Models and Platforms - PowerPoint PPT Presentation

About This Presentation
Title:

RAMP Models and Platforms

Description:

Why so many different RAMP projects? Why is there not more sharing among projects? ... RAMP Gold to add support for ParLab OS work (Tessellation) ... – PowerPoint PPT presentation

Slides: 28
Provided by: georgep6

Transcript and Presenter's Notes

Title: RAMP Models and Platforms


1
RAMP Models and Platforms
  • Krste Asanovic
  • UC Berkeley
  • RAMP Retreat, Berkeley, CA
  • January 15, 2009

2
Much confusion about RAMP
  • Frequently asked questions
  • When will RAMP be finished/usable?
  • What ISA does RAMP use?
  • Can RAMP model my new feature X?
  • How accurate is RAMP?
  • Why so many different RAMP projects?
  • Why is there not more sharing among projects?

3
Not much confusion about software simulators
  • Rarely asked questions
  • When will software simulation be finished/usable?
  • What ISA do software simulators use?
  • Can a software simulator model my new feature
    X?
  • How accurate is software simulation?
  • Why so many software simulators?
  • Why is there not more sharing among software
    simulators?

4
RAMP is a consortium, not a project
  • Many projects with different goals
  • sometimes multiple per site
  • So far, much sharing of ideas and techniques
  • Very healthy and active community
  • Some sharing of low-level infrastructure
  • Boards, platform-level interfaces to DRAM,
    Ethernet, etc.
  • Not a single complete infrastructure that
    everyone uses
  • and that's been OK, and might continue to be OK

5
Run Model of Target on Host Platform
Hard Work
6
RAMP Projects Goals
  • Model some target machine trading off
  • Fidelity
  • Model design effort
  • Emulation speed (and capacity)

7
Space of Target Machines
  • Which ISA?
  • x86, SPARC, PowerPC, Alpha, ARM, MIPS?
  • In-order or out-of-order cores?
  • How many cores?
  • 1, 16, 256, 1M?
  • Processor+memory of general-purpose machine, or
    whole SoC including I/O devices?
  • Accelerators, GPUs?
  • Which operating system? Hypervisor?

8
ISA Wars
  • Original pick to standardize around was SPARC
  • Open standard
  • Available verification suite
  • Simplest ISA with extensive general-purpose
    software support (i.e., desktop/server
    development environment available)
  • SGI/MIPS sorely missed
  • Leon implementation for FPGA
  • Simics
  • But the intent was always to support multiple
    ISAs

9
ISA usage in RAMP models
  • UCB RAMP Blue: Microblaze
  • Xilinx soft core modified to add 64-bit FPU
  • Stanford RAMP Red: PowerPC
  • Used Virtex-II Pro hard cores
  • UT FAST: x86
  • Functional simulation in software on front-end
    machine (or on PowerPC hard core)
  • UT RAMP White: PowerPC -> SPARC
  • Initial version used hard PowerPC cores, moving to
    Leon soft cores
  • MIT/Intel HASIM: Alpha -> x86?
  • Initially Alpha ISA, eventually to form basis of
    x86/uOP machine
  • CMU ProtoFLEX: SPARC
  • SPARC three ways (own core, emulation on hard
    PowerPC core, emulation on front-end machine)
  • UCB RAMP Gold + Internet-in-a-Box: SPARC
  • Own core design
  • UCB/LBNL Green Flash: Tensilica
  • RTL generated from Tensilica tools

10
Supporting new ISAs
  • x86 still very desirable, but difficult
  • FAST software functional model is probably
    current best approach if want to play with
    different timings
  • Microcoded functional model would be good way to
    go if had resources (HASIM?)
  • Even with working functional model, timing model
    is difficult? Adding new features difficult?
  • ARM also desirable for mobile device modeling
  • Renewed interest in engaging here
  • MIT/IBM PowerPC work in progress, could form
    functional model
  • But nobody does this for fun - only to advance
    their own research goals

11
Commercial/Existing RTL Cores
  • Originally seen as big benefit of RAMP
  • But didn't turn out that way in practice (except
    for prototyping usage model - see later)
  • Cores don't provide features we need, too big,
    too difficult to modify
  • For simple ISAs (i.e. non-x86), biggest help is
    ISA verification suites, and/or really simple
    synthesizable ISA pipeline to form basis of
    functional model

12
Operating System Support
  • Currently only ProtoFLEX, FAST, RAMP-White
    support OS
  • Others can run one application with proxy
    mechanism for I/O
  • Reflects interests of groups. OS is not primary
    subject of research for groups building models so
    far.
  • RAMP Gold to add support for ParLab OS work
    (Tessellation)
  • Green Flash to add support for HPC-style
    microkernel

13
Target systems
  • From a few, to millions of cores
  • Scaling simulation to 100s of cores was a shared
    goal
  • But smaller core counts (16-128) very interesting
    also
  • Huge core counts (>1E6) also of interest
  • Single node versus clusters
  • RAMP Blue + Internet-in-a-box are message-passing
    clusters
  • Rest are shared-memory systems
  • Memory hierarchy and cache coherence protocols
  • Wide variety of possibilities
  • Desktop/Laptop/Server versus Handheld or SoC
  • What is important to model for given research
    topic?
  • Accelerators/GPUs
  • Even wider variety than CPU
    ISAs/microarchitectures

14
Wide variety, how to reuse?
  • Proposal
  • ISA functional models
  • also FPU across ISAs
  • Perhaps even common uOP engine across all ISAs?
  • CPU Microarchitecture timing model
  • E.g., in-order superscalar, out-of-order with
    unified physical register file
  • Memory functional model
  • Host-level caches + memory interleaving
  • Memory hierarchy timing models
  • On-chip network types as subset
  • I/O bus shims
  • To allow random RTL to be attached for I/O
    devices and non-GPU accelerators
  • This won't be easy, as have to agree on
    interfaces between these components, might need
    further specialization
  • Definitely need more experience doing all of the
    above

15
Simulator Types
  • Functional model only (no timing)
  • RTL models (functional includes timing)
  • Also used for chip prototyping
  • Split functional and timing models
  • Hybrids of above

16
Simulator Mapping Styles
  • Gate-level emulator (Quickturn, Palladium)
  • 1MHz
  • Direct RTL emulator
  • 5-20MHz
  • FPGA-tuned RTL emulator
  • 20-50MHz
  • Virtualized RTL emulator
  • 50-100MHz
  • Host-multithreaded models
  • >100MHz

17
(No Transcript)
18
  • RAMP Blue Release 2/25/2008
  • design available from RAMP website
  • ramp.eecs.berkeley.edu

19
Climate System Design Concept: Strawman Design
Study
10PF sustained, 120 m2, <3 MWatts, < 75M
20
Virtualized RTL Improves FPGA Resource Usage
  • RAMP allows units to run at varying target-host
    clock ratios to optimize area and overall
    performance
  • Example 1: Multiported register file
  • Example, Sun Niagara has 3 read ports and 2 write
    ports to 6KB of register storage
  • If RTL mapped directly, requires 48K flip-flops
  • Slow cycle time, large area
  • If mapping into block RAMs (one read + one write
    per cycle), takes 3 host cycles and 3x2KB block
    RAMs
  • Faster cycle time (3X) and far fewer resources
  • Example 2: Large L2/L3 caches
  • Current FPGAs only have 1MB of on-chip SRAM
  • Use on-chip SRAM to build cache of active piece
    of L2/L3 cache, stall target cycle if access
    misses and fetch data from off-chip DRAM
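
The register-file arithmetic in Example 1 can be checked with a short sketch. It assumes that each block RAM gives one read and one write per host cycle, so the host needs max(read ports, write ports) cycles per target cycle, and that a block RAM holds 2KB. All function and class names are hypothetical and are not part of any RAMP code base. The behavioral model also simplifies by committing writes after all reads of a target cycle have been served.

```python
import math


def flip_flops_for_direct_mapping(storage_bytes):
    """One flip-flop per bit when the register file is mapped as plain RTL."""
    return storage_bytes * 8


def block_ram_mapping(storage_bytes, read_ports, write_ports, bram_bytes=2048):
    """Return (host cycles per target cycle, number of block RAMs).

    Assumption: each block RAM serves one read and one write per host cycle,
    so the wider of the two port counts sets the host-cycle count.
    """
    host_cycles = max(read_ports, write_ports)
    brams = math.ceil(storage_bytes / bram_bytes)
    return host_cycles, brams


class VirtualizedRegFile:
    """Behavioral model of a multiported target register file that is
    time-multiplexed onto a one-read/one-write host memory."""

    def __init__(self, num_regs):
        self.mem = [0] * num_regs

    def target_cycle(self, read_addrs, writes):
        """Serve one target cycle; returns (read values, host cycles used)."""
        # Reads observe the state from before this target cycle.
        values = [self.mem[addr] for addr in read_addrs]
        # Writes are committed only after every read has been served.
        for addr, value in writes:
            self.mem[addr] = value
        host_cycles = max(len(read_addrs), len(writes), 1)
        return values, host_cycles


# Numbers from the slide: 6KB of storage, 3 read ports, 2 write ports.
print(flip_flops_for_direct_mapping(6 * 1024))  # 49152, i.e. 48K flip-flops
print(block_ram_mapping(6 * 1024, 3, 2))        # (3, 3): 3 host cycles, 3 BRAMs
```

The target sees a single cycle with three reads and two writes, while the host spends three cycles on one shared memory.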

21
Host Multithreading (Zhangxi Tan (UCB), Chung
(CMU))
  • Multithreading emulation engine reduces FPGA
    resource use and improves emulator throughput
  • Hides emulation latencies (e.g., communicating
    across FPGAs)
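
A toy model, under stated assumptions, of why host multithreading hides emulation latency. It assumes one host pipeline that can issue one target step per host cycle, and that a target must wait `latency` host cycles (for example, on an inter-FPGA message) before its next step. The function name and parameters are hypothetical, and this is not the RAMP Gold or ProtoFLEX design.

```python
def host_cycles_needed(num_targets, target_cycles, latency):
    """Host cycles to advance every target by `target_cycles` steps.

    One step may issue per host cycle; a target that has just issued
    becomes ready again only `latency` host cycles later.
    """
    ready_at = [0] * num_targets  # host cycle when each target may issue
    done = [0] * num_targets      # target cycles completed per target
    host_cycle = 0
    next_target = 0               # round-robin pointer
    while min(done) < target_cycles:
        for i in range(num_targets):
            t = (next_target + i) % num_targets
            if done[t] < target_cycles and ready_at[t] <= host_cycle:
                done[t] += 1
                ready_at[t] = host_cycle + latency
                next_target = (t + 1) % num_targets
                break
        host_cycle += 1
    return host_cycle


# One target: the pipeline idles while waiting out the latency.
print(host_cycles_needed(1, 10, 4))  # 37
# Four targets: other targets fill the waiting slots.
print(host_cycles_needed(4, 10, 4))  # 40
```

In this model, four targets finish in about the same number of host cycles as one, because the engine always has a ready target to issue.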

22
Split Functional/Timing Models (HASIM: Emer
(MIT/Intel), FAST: Chiou (UT Austin))
Functional Model
Timing Model
  • Functional model executes CPU ISA correctly, no
    timing information
  • Only need to develop functional model once for
    each ISA
  • Timing model captures pipeline timing details,
    does not need to execute code
  • Much easier to change timing model for
    architectural experimentation
  • Without RTL design, cannot be 100% certain that
    timing is accurate
  • Many possible splits between timing and
    functional model
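
A minimal sketch of the split, assuming a made-up two-instruction ISA (`addi`, `bnez`) and a trace interface between the two models. Neither is taken from HASIM or FAST, and real systems split the work in many other ways. The functional model executes the program and emits a trace. The timing model only counts cycles from that trace, so the pipeline parameters can change without touching the functional model.

```python
def functional_model(program, num_regs=4, max_steps=1000):
    """Execute the program correctly, with no notion of time.

    Instructions are tuples (hypothetical ISA):
      ("addi", rd, rs, imm)     regs[rd] = regs[rs] + imm
      ("bnez", rs, None, off)   if regs[rs] != 0: pc += off
    Returns (final registers, trace of (opcode, branch_taken)).
    """
    regs = [0] * num_regs
    pc = 0
    trace = []
    while pc < len(program) and len(trace) < max_steps:
        op, a, b, c = program[pc]
        taken = False
        if op == "addi":
            regs[a] = regs[b] + c
        elif op == "bnez":
            taken = regs[a] != 0
        else:
            raise ValueError(f"unknown opcode {op!r}")
        trace.append((op, taken))
        pc = pc + c if taken else pc + 1
    return regs, trace


def timing_model(trace, branch_penalty):
    """Count cycles for an in-order pipeline without executing any code."""
    cycles = 0
    for _op, taken in trace:
        cycles += 1
        if taken:
            cycles += branch_penalty  # refill the pipeline after a taken branch
    return cycles


# A loop that adds 10 to r2 three times.
program = [
    ("addi", 1, 0, 3),       # r1 = 3 (loop counter)
    ("addi", 2, 2, 10),      # r2 += 10
    ("addi", 1, 1, -1),      # r1 -= 1
    ("bnez", 1, None, -2),   # loop back while r1 != 0
]
regs, trace = functional_model(program)
print(regs[2])                                # 30
print(timing_model(trace, branch_penalty=2))  # 14
print(timing_model(trace, branch_penalty=0))  # 10
```

Both timing runs reuse the same trace, which is the point of the split: the functional model is written once, and the timing experiment varies.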

23
RAMP White (Hari Angepat, Derek Chiou (UT Austin))
  • Scalable Coherent Shared Memory Multiprocessor
  • Support standard shared memory programming models

[Figure: RAMP-White block diagram. Labels: Leon3 shim (x2), Intersection Unit (x2), NIU (x2), Router (x2), AHB shim (x2), AHB bus (x2), MP IntCntrl, DSU, Eth, DDR2 (x2)]

24
Multithreaded Func. + Timing Models (RAMP Gold,
UCB)
Timing Model Pipeline
MT-Channels
MT-Unit
  • MT-Unit multiplexes multiple target units on a
    single host engine
  • MT-Channel multiplexes multiple target channels
    over a single host link
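
The two multiplexing ideas above can be sketched in software. This assumes an MT-Unit keeps one state per target unit and steps them in turn on one engine, and that an MT-Channel tags each message with its target channel id so many channels share one host link. The class layout and the `counter_step` example are hypothetical and do not reflect RAMP Gold's actual interfaces.

```python
from collections import deque


class MTChannel:
    """Many target channels carried over one host link (a single queue)."""

    def __init__(self, num_target_channels):
        self.num_target_channels = num_target_channels
        self.host_link = deque()

    def send(self, channel_id, payload):
        # Tag the payload so the receiver can demultiplex it.
        self.host_link.append((channel_id, payload))

    def drain(self):
        """Empty the host link into one list per target channel."""
        per_channel = [[] for _ in range(self.num_target_channels)]
        while self.host_link:
            channel_id, payload = self.host_link.popleft()
            per_channel[channel_id].append(payload)
        return per_channel


class MTUnit:
    """Many target units stepped by one host engine."""

    def __init__(self, initial_states, step_fn):
        self.states = list(initial_states)  # one state per target unit
        self.step_fn = step_fn              # shared logic: state -> (state, msg)

    def run_target_cycle(self, channel):
        # The single host engine visits each target unit in turn.
        for target_id, state in enumerate(self.states):
            new_state, message = self.step_fn(state)
            self.states[target_id] = new_state
            channel.send(target_id, message)


def counter_step(state):
    """Toy target unit: emit the current count, then increment it."""
    return state + 1, state


unit = MTUnit([0, 10, 20], counter_step)
channel = MTChannel(3)
unit.run_target_cycle(channel)
unit.run_target_cycle(channel)
print(channel.drain())  # [[0, 1], [10, 11], [20, 21]]
```

One copy of the step logic serves all three targets, and only the per-target state is replicated.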

25
CMU Simics/RAMP Simulator
16-CPU Shared-memory UltraSPARC III Server
(SunFire 3800)
BEE2 Platform
26
What Hardware Platforms?
  • RTL mapping approaches
  • Need large amounts of logic
  • Selected BEE2, and then designed BEE3 for this
    emulation style
  • Observed that don't need much interconnect
    bandwidth (memory + inter-board links) because
    RTL cores are slow and latency sensitive
  • Host-multithreading allows large systems to be
    mapped to small (one?) FPGA (e.g., 64-128 cores
    on ML505)
  • Logic gate count not as critical, need to focus
    on on-chip capacity, off-chip memory bandwidth
    and total memory capacity per FPGA (conventional
    processor memory hierarchy issues multiplied by
    multithreading factor)
  • One big FPGA with lots of fast memory channels
    would be ideal
  • Software functional emulation (FAST) or
    transplant (ProtoFLEX)
  • Focus on fast coherent connection to front-end
    x86 CPU
  • HyperTransport, FSB, QPI interfaces better than
    PCI I/O connections

27
Summary
  • Many reasons for great divergence in RAMP
    projects
  • Different ISAs, different target machines,
    different research topics, different emulation
    styles
  • Sharing possible, but hard work and more
    experience needed
  • Questions?