Joe Gebis

About This Presentation

Title:

Joe Gebis

Description:

Hercules DRC and LVS. Avant! StarRC parasitic extraction ... Hercules DRC. Synopsys PrimeTime static timing analysis. Joe Gebis, IRAM Retreat, Summer 2000 ...

Slides: 27

Provided by: csBer

Transcript and Presenter's Notes

1
IRAM Chip Status

Joe Gebis
Computer Science Division
University of California, Berkeley
gebis_at_cs.berkeley.edu
http://iram.cs.berkeley.edu

2
Outline

Overview of VIRAM-1 organization
Hardware status
CAD plan

3
VIRAM1 Block Diagram
[Block diagram. Labeled components: Flag Unit 0, Flag Unit 1, Flag Register File (512B), Arithmetic Unit 0, Arithmetic Unit 1, Vector Register File (8KB), Memory Unit, TLB, DMA, SysAD IF, JTAG IF, Memory Crossbar, DRAM0 (2MB), DRAM1 (2MB), ..., DRAM7 (2MB). Labeled datapath widths: 32B and 8B.]

4
VIRAM1 Vector Units
[Figure labels: Datapath, Lane, Vector Registers, Functional Unit, Memory Unit]

4 partitionable 64-bit lanes
2 arithmetic functional units (one FP), 2 flag
processing units
Lane provides basic unit of design, replication

5
VIRAM1 Layout

IBM SA-27E process
0.18 µm, 6 copper layers
290 mm² area
150M transistors
1.2V logic, 1.8V DRAM
2W power consumption

6
Scalar Core Status

Have synthesizable MIPS64 5Kc core
Will run at 200 MHz
Has 8KB instruction and data caches
Caches will be compiled by IBM

7
Vector Integer Unit Status

Complete
Design of all blocks
Layout of subblocks
Partially done
Assembling block components
Remaining
Final assembly

[Datapath figure. Blocks: Logical Unit, Multiplier, Shifter, Rounder, Adder, Saturate. Signals: shamt, zero detect. Status callouts: "Design complete, basic subblocks layout done" and "Design complete, components ready for assembly".]

8
Vector Register File

Have a register file from Transmeta which was
successfully fabbed in the same process
Using the complete Transmeta register file?
Contains shadow registers we couldn't use
Has more ports than we need
Would require combining 8 duplicates
Use the Transmeta bit cell?
It is larger than it needs to be for our purposes
Build our own bit cell?
Possibly a significant amount of work

9
Control

Small changes to work with new MIPS core
Working model of the vector unit complete
Some small glue logic remains to be able to do
complete simulation with core and on-chip DRAM

10
Floating Point Vector Unit

Synthesizable Verilog received from MIT RAW
architecture group
FPU as received not fully IEEE compliant
Required some changes to work with core

11
Crossbar Design
[Figure labels: DRAM 0 Port, DRAM 1 Port, DRAM 2 Port, DRAM 3 Port; 256-bit load crossbar; 256-bit store crossbar; Scalar / DMA Port; VL0 Port, VL1 Port, VL2 Port, VL3 Port]

12
Memory and Crossbar

Model for DRAM controllers complete
Crossbar design is complete, layout progressing
Crossbar issues
Switches only 64-bit words
Operates at 1.2V, contains level shifters to
interface to 1.8V DRAM
Segmented with repeaters at approximately 2mm
intervals
5 ns cycle time, interfaces to DRAM without
additional subclocks
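
As a rough worked figure, the 5 ns cycle time and the 256-bit crossbar width from the crossbar design slide imply a peak transfer rate per crossbar. The sketch below assumes one full-width transfer per cycle in each direction, which the slides do not state; the constant names are illustrative only.

```python
# Values taken from the slides.
CROSSBAR_WIDTH_BITS = 256   # width of the load crossbar and of the store crossbar
CYCLE_TIME_NS = 5           # crossbar cycle time

# Assumption (not stated in the slides): one full-width transfer per cycle.
bytes_per_cycle = CROSSBAR_WIDTH_BITS // 8
cycles_per_second = 1_000_000_000 // CYCLE_TIME_NS
peak_bytes_per_second = bytes_per_cycle * cycles_per_second

print(f"{bytes_per_cycle} B per cycle at {cycles_per_second // 1_000_000} MHz")
print(f"peak: {peak_bytes_per_second / 1e9} GB/s per crossbar")
```

Under that assumption each crossbar moves 32 bytes per cycle, which matches the 32B datapath width labeled in the block diagram.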

13
CAD Plan - Synthesized Blocks

Synopsys VCS Verilog compiler and environment
Synopsys Design Compiler
Synopsys Module Compiler
Avant! Apollo place and route
Synopsys PrimeTime static timing analysis

14
CAD Plan - Custom Blocks

Cadence layout editor
Cadence schematic editor
Avant! Hercules DRC and LVS
Avant! StarRC parasitic extraction
Avant! Hspice
Synopsys TimeMill dynamic timing
Synopsys PowerMill power consumption simulation

15
CAD Plan - Integrated Blocks

Avant! Apollo place and route
Avant! Hercules DRC
Synopsys PrimeTime static timing analysis

16
CAD Plan - Other Blocks

Cache blocks
IBM SRAM compilers
Functional verification
Synopsys VCS

17
Remaining Work

Some design, layout work remains
Synthesizing blocks
Verification
Tapeout planned for late fall

19
Vector Execution Model
Scalar Execution
Vector Execution
r1
vector length
add r3, r1, r2
add.vv v3, v1, v2
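
The contrast between the two instructions above can be sketched in Python. This is a behavioral model only, not the hardware implementation: the function names are hypothetical, and the model simply returns the first `vector_length` results rather than modeling what happens to the remaining elements of the destination register.

```python
def scalar_add(r1, r2):
    """Models `add r3, r1, r2`: one operation on one pair of register values."""
    return r1 + r2


def vector_add(v1, v2, vector_length):
    """Models `add.vv v3, v1, v2`: one instruction, element-wise add over
    the first `vector_length` elements of the two source vectors."""
    return [v1[i] + v2[i] for i in range(vector_length)]


r3 = scalar_add(3, 4)
v3 = vector_add([1, 2, 3, 4], [10, 20, 30, 40], vector_length=4)
print(r3, v3)
```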
20
Vector Architectural State
21
VIRAM ISA Extensions
Scalar
MIPS64 scalar instruction set
Vector ALU
All ALU / mem operations under mask
Vector Memory
Vector Register
Plus flag, convert, fixed-point, and transfer
operations
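
Execution "under mask" can be illustrated with a small behavioral sketch. It assumes that masked-off elements keep their previous destination value, which the slides do not specify; the function name and the list-based mask are hypothetical stand-ins for a flag register.

```python
def masked_add_vv(v_dest, v1, v2, mask, vector_length):
    """Element-wise add that only writes elements whose mask bit is set.

    Assumption: elements with a clear mask bit keep their old value.
    """
    result = list(v_dest)
    for i in range(vector_length):
        if mask[i]:
            result[i] = v1[i] + v2[i]
    return result


out = masked_add_vv([0, 0, 0, 0], [1, 2, 3, 4], [10, 20, 30, 40],
                    mask=[1, 0, 1, 0], vector_length=4)
print(out)
```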
22
Fixed-point Arithmetic

Multiply upper or lower halves, shift and round
Add/Sub and saturate
Shift right and round, shift left and saturate
All combinations of multiply and add/sub
instructions
Saturate to narrower width
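
The operations listed above can be sketched on signed integers as follows. This is an illustrative model, not the VIRAM instruction semantics: the helper names are hypothetical, the 16-bit default width is arbitrary, and the rounding rule (round to nearest, ties toward positive infinity) is an assumption, since the slide does not name the rounding modes.

```python
def saturate(value, bits):
    """Clamp a signed integer to the range of a `bits`-wide two's-complement word."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def add_saturate(a, b, bits=16):
    """Add, then saturate instead of wrapping around on overflow."""
    return saturate(a + b, bits)


def shift_right_round(value, shamt):
    """Arithmetic shift right with rounding (assumed: ties toward +infinity)."""
    if shamt == 0:
        return value
    return (value + (1 << (shamt - 1))) >> shamt


def shift_left_saturate(value, shamt, bits=16):
    """Shift left, then saturate to the destination width."""
    return saturate(value << shamt, bits)


def mul_shift_round(a, b, shamt, bits=16):
    """Multiply, shift right with rounding, then saturate to a narrower width."""
    return saturate(shift_right_round(a * b, shamt), bits)


# Q15 example: 0.5 * 0.5 = 0.25, i.e. 16384 * 16384 >> 15 = 8192.
print(mul_shift_round(16384, 16384, 15))
```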

23
Multiplier Partitioning
16-bit Multiplier Block
result[15:0]
result[31:16]
16-bit adder
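
One common way to make a multiplier partitionable is to build the wide product out of 16-bit multiplier blocks and adders, and to use the same blocks independently in narrow mode. The sketch below shows that general idea for unsigned operands only; it is an assumption about the scheme, not a description of the VIRAM-1 circuit, and all function names are hypothetical.

```python
MASK16 = 0xFFFF


def mul16(a, b):
    """One 16-bit multiplier block: unsigned 16 x 16 -> 32-bit product."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b


def mul32_from_blocks(a, b):
    """Unsigned 32 x 32 -> 64-bit product from four 16-bit blocks plus adders."""
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16
    low = mul16(a_lo, b_lo)
    cross = mul16(a_lo, b_hi) + mul16(a_hi, b_lo)
    high = mul16(a_hi, b_hi)
    return low + (cross << 16) + (high << 32)


def mul16_pair(a, b):
    """Partitioned mode: two independent 16-bit multiplies on packed 32-bit operands.

    Returns (product of the low halves, product of the high halves).
    """
    return mul16(a & MASK16, b & MASK16), mul16(a >> 16, b >> 16)


print(hex(mul32_from_blocks(0x12345678, 0x9ABCDEF0)))
print(mul16_pair(0x00030002, 0x00050004))
```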
24
Scaling

Scaled-down version from the original
Vector unit with same control
Or scale up for future versions

25
Scalar Core

Synthesizable core from MIPS
64 bit (MIPS64 ISA)
6 stage
Single instruction issue
8 kB direct-mapped D/I cache
Has coprocessor interface used for vector unit
access and FPU

26
Floating-point datapath

Single precision
Contains add, sub, mul, div, compare, convert,
truncate
Does not contain mul-add, sqrt
Only supports round to nearest even mode
Fully pipelined
3 cycle latency for add/sub/mul/compare/convert
10 cycle latency for divide, 8 cycle repeat rate
Fast execution mode
Exceptions for each element noted in flag
register
Exception raised at the end of instruction
execution
Precise execution mode
Following FP instructions are stalled early in
the pipeline until execution of previous
instruction is complete and any exceptions are
raised
Operates at half performance (0.8 GFLOPS)
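
The 0.8 GFLOPS figure is consistent with a simple peak-rate estimate built from numbers elsewhere in the deck. The sketch assumes the vector unit is clocked at 200 MHz (the slides give 200 MHz for the scalar core and a 5 ns crossbar cycle), that each 64-bit lane splits into two 32-bit single-precision elements, and that the one FP arithmetic unit completes one operation per element per cycle. These are assumptions for illustration, not statements from the slides.

```python
# Values from the slides.
LANES = 4                 # 4 partitionable 64-bit lanes
LANE_WIDTH_BITS = 64
FP_WIDTH_BITS = 32        # single precision
FP_UNITS = 1              # 2 arithmetic functional units, one FP

# Assumption: the vector unit runs at the same 200 MHz as the scalar core.
CLOCK_HZ = 200_000_000

elements_per_cycle = LANES * (LANE_WIDTH_BITS // FP_WIDTH_BITS) * FP_UNITS
peak_flops = CLOCK_HZ * elements_per_cycle
precise_mode_flops = peak_flops // 2   # precise mode runs at half performance

print(f"peak: {peak_flops / 1e9} GFLOPS")
print(f"precise mode: {precise_mode_flops / 1e9} GFLOPS")
```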