Climate Machine Update

Provided by: nersc4

1
Climate Machine Update
  • David Donofrio
  • RAMP Retreat
  • 8/20/2008

2
Agenda
  • Project Overview
  • Tensilica Architecture and Design Flow
  • Tensilica Tools Demo
  • Why we need RAMP
  • Current Progress
  • Next Steps

3
A New Approach to HPC
  • Current HPC Design approach
  • Leverage commodity processors from Intel, AMD,
    etc
  • Once machine is built, optimize problems to run
    on it
  • Power wall prevents scaling to exaflop
    performance
  • Power is the new design point

Olukotun and Sutter
Moore's Law still in effect, but the number of
processors doubles every 18 months rather than
the clock rate
4
A New Approach to HPC
  • Our approach
  • Identify application, then tailor machine using
    semi-custom design
  • Optimize CPU architecture and further extend with
    semi-custom ISA
  • Leverage auto-tuning to access architecture
    specific optimizations
  • Even if each simple core is 1/4 as
    computationally efficient as a complex core, you
    can fit hundreds on a single die and be 100x more
    power efficient
  • Learn from embedded market where Flops / Watt and
    rapid design cycles are crucial
  • Start with building blocks from embedded designs
    rather than full custom ASIC
  • Preserve ability to run general purpose C code
  • Application target: 1 km scale climate model
  • Tailor machine architecture to application to
    reduce waste
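
The 1/4 and 100x figures above imply a per-core power budget, which a few lines of arithmetic make explicit. This is only a sketch: it assumes "computationally efficient" means per-core throughput and "power efficient" means throughput per watt, and the variable names are illustrative rather than taken from the slides.

```python
# Figures quoted on the slide
PERF_RATIO = 1 / 4        # simple-core throughput relative to a complex core
EFFICIENCY_GAIN = 100     # simple core is 100x more power efficient

# Assumption: efficiency = throughput / power, so
#   EFFICIENCY_GAIN = PERF_RATIO / power_ratio
power_ratio = PERF_RATIO / EFFICIENCY_GAIN   # simple-core power / complex-core power

# Simple cores needed to match one complex core, and their combined power
cores_to_match = 1 / PERF_RATIO
power_to_match = cores_to_match * power_ratio

print(f"Each simple core draws 1/{1 / power_ratio:.0f} of a complex core's power")
print(f"{cores_to_match:.0f} simple cores match one complex core "
      f"at {power_to_match:.0%} of its power")
```

Under these assumptions each simple core has to draw about 1/400 of the power of a complex core for the 100x claim to hold.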

5
Climate Model Resource Requirements
  • DOE has identified high-resolution climate
    modeling as a leading justification for exascale
    computing
  • Must express 20M-way parallelism
  • Requires performance of 200 Pflops peak
  • Simulation must run 1000x faster than real time
  • Amenable to massively concurrent architectures
    composed of power efficient embedded cores.
  • Actively working with the climate science
    community to enable new Icosahedral model
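
The requirements above can be turned into per-stream numbers with simple arithmetic. The sketch below assumes one execution stream per unit of parallelism and a 365-day year; both are illustrative assumptions, not statements from the slides.

```python
# Figures quoted on the slide
PEAK_FLOPS = 200e15            # 200 Pflops peak
PARALLELISM = 20e6             # 20M-way parallelism
SPEEDUP_VS_REAL_TIME = 1000    # simulation runs 1000x faster than real time

# Assumption: one execution stream per unit of parallelism
flops_per_stream = PEAK_FLOPS / PARALLELISM

# Assumption: a 365-day year
HOURS_PER_YEAR = 365 * 24
wall_hours_per_sim_year = HOURS_PER_YEAR / SPEEDUP_VS_REAL_TIME

print(f"{flops_per_stream / 1e9:.0f} Gflops peak per stream")
print(f"{wall_hours_per_sim_year:.2f} wall-clock hours per simulated year")
```

Under these assumptions each stream needs roughly 10 Gflops of peak performance, and a simulated year takes a little under nine hours of wall-clock time.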

NASA
Randall / CSU
6
Tensilica Processor Design Flow
  • Complete solution: hardware, software and
    verification
  • Fully customizable
  • Required base ISA ensures general-purpose
    applications can still run
  • Processor configuration submitted to Tensilica's
    servers, where synthesis is performed
  • Returned design can be spun for ASIC or FPGA
  • Bit file available for Avnet boards
  • Building block approach drastically reduces
    design cycle time compared to full-custom design

Tensilica Inc.
7
Tensilica Architecture Features
  • Verilog-like TIE language allows for custom ISA
    extensions
  • Functional and performance verification built in
  • Auto generated compiler intrinsics
  • 64-bit IEEE-DP floating point coded up in TIE and
    available
  • Custom VLIW support
  • Inter-processor communication easily enabled
    through
  • TIE Ports
  • TIE Queues
  • Access to direct HW support for interprocessor
    communication
  • TIE Lookups
  • Allows interface to external ROMs or other RTL
    block

8
Tensilica Architecture Overview
Tensilica Inc.
9
Tensilica Performance Debug
  • Processor viewed as black box
  • State can be compressed (via HW) and pushed out
    JTAG port
  • Intended for program replay
  • Xtensa trace port gives real-time visibility
    into internal pipeline state with unprecedented
    detail
  • Hit / miss with virtual address
  • Branch taken / not taken
  • Call / return
  • Resource dependency
  • Etc
  • Opportunity for hundreds of performance counters
    to be made available

Tensilica Inc.
10
Tensilica Tools Demo
11
Why we need RAMP
  • Fast, accurate emulation enables
  • Dual nested loop of HW / SW co-design
  • Preliminary work using Stanford SM sim shows
    significant improvement in power efficiency using
    automated HW/SW co-tuning
  • RAMP critical to accelerate
  • Rapid prototyping and analysis of Tensilica
    architectural options
  • Inter-processor communication architecture
    exploration
  • Running FULL climate code providing a more
    complete performance picture
  • Cycle-accurate simulator currently running at
    100 kHz vs. 50 MHz on Virtex-5
  • Extensive HW performance counter data enables an
    emulation environment with similar resolution but
    much greater speed
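
The "dual nested loop" of HW / SW co-design can be pictured as an outer search over hardware configurations with an inner software auto-tuning pass for each one. The sketch below is a toy model only: the cache sizes, block sizes, and the performance and power formulas are all made up for illustration and do not come from the slides or from any Tensilica tool.

```python
# Hypothetical design space (illustrative values only)
HW_CACHE_KB = (16, 32, 64)        # candidate hardware configurations
SW_BLOCK_SIZES = (8, 16, 32, 64)  # candidate software tile sizes

def toy_performance(cache_kb, block):
    """Made-up model: bigger tiles are faster until they overflow the cache."""
    working_set_kb = block * block * 8 / 1024   # block x block tile of doubles
    return block if working_set_kb <= cache_kb else block / 4

def toy_power(cache_kb):
    """Made-up model: power grows with cache size."""
    return 1.0 + cache_kb / 64

def co_tune():
    best = None
    for cache_kb in HW_CACHE_KB:                       # outer loop: hardware
        # inner loop: auto-tune the software for this hardware
        block = max(SW_BLOCK_SIZES,
                    key=lambda b: toy_performance(cache_kb, b))
        efficiency = toy_performance(cache_kb, block) / toy_power(cache_kb)
        if best is None or efficiency > best[0]:
            best = (efficiency, cache_kb, block)
    return best

efficiency, cache_kb, block = co_tune()
print(f"best: {cache_kb} KB cache, block size {block}, "
      f"{efficiency:.1f} perf/power (arbitrary units)")
```

In a real flow the two toy functions would be replaced by measurements, which is where fast emulation matters: every point in the outer loop needs a full inner tuning run.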

Tensilica provided emulation environment
kick-starts this effort
12
Current Status
  • ML505 used for initial design exploration
  • Basic Xtensa processor, JTAG, and memory
    controller take up 50% of a Virtex-5 50T
  • Runs at 50MHz
  • ASIC in 65G process runs at 650MHz
  • OnChip Debug working
  • Can load / run programs using main memory
    synthesized from BRAM
  • DRAM interface coded - currently being debugged
  • RTL license recently obtained - full simulation
    environment (in ModelSim) being brought up
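
To put the clock rates in perspective, the sketch below compares the 50 MHz FPGA and 650 MHz ASIC figures above with the 100 kHz cycle-accurate simulator rate quoted earlier in the deck. The one-billion-cycle workload is an arbitrary assumption chosen for illustration.

```python
# Clock rates quoted in the deck
CLOCK_HZ = {
    "cycle-accurate simulator": 100e3,
    "FPGA emulation (Virtex-5)": 50e6,
    "ASIC (65G process)": 650e6,
}

CYCLES = 1e9  # assumed workload size, for illustration only

# Wall-clock seconds needed to execute the workload on each platform
seconds = {name: CYCLES / hz for name, hz in CLOCK_HZ.items()}

fpga_vs_sim = CLOCK_HZ["FPGA emulation (Virtex-5)"] / CLOCK_HZ["cycle-accurate simulator"]
asic_vs_fpga = CLOCK_HZ["ASIC (65G process)"] / CLOCK_HZ["FPGA emulation (Virtex-5)"]

for name, s in seconds.items():
    print(f"{name:28s} {s:10.1f} s")
print(f"FPGA is {fpga_vs_sim:.0f}x faster than the simulator")
print(f"ASIC is {asic_vs_fpga:.0f}x faster than the FPGA")
```

A run that would take close to three hours in the simulator finishes in about 20 seconds on the FPGA, under the assumed workload.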

13
Next Steps
  • Transition to BEE3 from ML505
  • Bring up XTOS environment on single Xtensa
    processor on BEE3
  • Run single column of climate code on single
    processor
  • Demo at SC08 in November
  • Continue HW / SW co-tuning optimization
  • Begin multi-processor emulation
  • Emulation of single socket, 32 core, using
    networked BEE3s
  • Running full 2-million-line climate model

14
Backup
15
The Need for Exascale Computing
  • DOE has identified high-resolution climate
    modeling as leading justification for exascale
    computing
  • 1 km resolution targeted for accurate cloud
    resolving model
  • Difficult to scale existing systems
  • HPC design using commodity processors estimated
    to draw 179MW
  • BlueGene design estimated to draw 20MW
  • Leveraging embedded cores and more application-
    specific design, a power envelope of 3-5MW is
    projected
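
The power estimates above are easier to compare as ratios. The sketch below only divides the figures from the slide; the dictionary keys are illustrative labels.

```python
# Power estimates quoted on the slide, in megawatts
POWER_MW = {
    "commodity": 179,
    "bluegene": 20,
    "embedded_low": 3,
    "embedded_high": 5,
}

def reduction(baseline):
    """Range of power reduction of the embedded approach vs. a baseline."""
    base = POWER_MW[baseline]
    return base / POWER_MW["embedded_high"], base / POWER_MW["embedded_low"]

for name in ("commodity", "bluegene"):
    low, high = reduction(name)
    print(f"vs {name}: {low:.1f}x to {high:.1f}x less power")
```

By these figures, the projected embedded design would use roughly 36 to 60 times less power than the commodity estimate and 4 to 7 times less than the BlueGene estimate.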

Randall / CSU
LBNL will seek an external vendor to build the
machine if our approach is proven valid - LBNL is
not entering the commercial HPC market.