CS184a: Computer Architecture (Structures and Organization)


Provided by: andre576

Learn more at: https://www.seas.upenn.edu


1
CS184a: Computer Architecture (Structures and Organization)

Day 16: November 15, 2000
Retiming Structures

2
Last Time

Saw how to formulate and automate retiming
start with network
calculate minimum achievable c
c = cycle delay (clock cycle)
make C-slow if want/need to make c = 1
calculate new register placements and move

3
Today

Systematic transformation for retiming
justify mandatory registers in design
Retiming in the Large
Retiming Requirements
Retiming Structures

4
HSRA Retiming

HSRA
adds mandatory pipelining to interconnect
One additional twist
long, pipelined interconnect
⇒ need more than one register on paths

5
Accommodating HSRA Interconnect Delays

Add buffers to LUT→LUT path to match interconnect register requirements
Retime to C=1 as before
Buffer chains force enough registers to cover interconnect delays

6
Accommodating HSRA Interconnect Delays
7
Retiming in the Large
8
Align Data / Balance Paths
Day 3: registers to align data
9
Systolic Data Alignment

Bit-level max

10
Serialization

Serialization
greater serialization ⇒ deeper retiming
total: same; per compute: larger

11
Data Alignment

For video (2D) processing
often work on local windows
retime scan lines
E.g.
edge detect
smoothing
motion est.

12
Image Processing

See Data in raster scan order
adjacent, horizontal bits easy
adjacent, vertical bits
scan line apart
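
The "scan line apart" point can be made concrete with a small sketch. It assumes a flat raster-scan stream and a scan-line length N; the function name `vertical_pairs` and the toy image are hypothetical, chosen only for illustration. A delay of N samples lines up each pixel with the one directly above it.

```python
from collections import deque

def vertical_pairs(pixels, width):
    """Yield (above, current) for every pixel that has a vertical neighbor.

    pixels: flat raster-scan stream; width: scan-line length N.
    """
    line_buffer = deque(maxlen=width)  # retiming store: the most recent N pixels
    for p in pixels:
        if len(line_buffer) == width:
            # The oldest entry arrived exactly N samples ago: the pixel above.
            yield line_buffer[0], p
        line_buffer.append(p)

# Toy 3x4 image; each value encodes 10*row + col.
image = [0, 1, 2, 3,
         10, 11, 12, 13,
         20, 21, 22, 23]
pairs = list(vertical_pairs(image, width=4))
```

Horizontal neighbors need a delay of one sample, while vertical neighbors need a delay of N, so the depth of the retiming store grows with image width.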

13
Wavelet

Data stream for horizontal transform
Data stream for vertical transform
N = image width

14
Retiming in the Large

Aside from the local retiming for cycle optimization (last time)
Many intrinsic needs to retime data for correct use of compute engine
some very deep
often arise from serialization

15
Reminder: Temporal Interconnect

Retiming ≡ Temporal Interconnect
Function of data memory
perform retiming

16
Requirements not Unique

Retiming requirements are not unique to the problem
Depends on algorithm/implementation
Behavioral transformations can alter significantly

17
Requirements Example
Q = A*B + C*D + E*F

Left:
For I ← 1 to N
  t1[I] ← A[I]*B[I]
For I ← 1 to N
  t2[I] ← C[I]*D[I]
For I ← 1 to N
  t3[I] ← E[I]*F[I]
For I ← 1 to N
  t2[I] ← t1[I]+t2[I]
For I ← 1 to N
  Q[I] ← t2[I]+t3[I]

Right:
For I ← 1 to N
  t1 ← A[I]*B[I]
  t2 ← C[I]*D[I]
  t1 ← t1+t2
  t2 ← E[I]*F[I]
  Q[I] ← t1+t2

left ⇒ 3N regs
right ⇒ 2 regs
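
A minimal sketch of the two formulations, assuming the operators lost in the transcript are multiply and add (Q = A*B + C*D + E*F). The function names and the sample vectors are hypothetical. Each function returns the result along with the number of intermediate words it keeps live.

```python
def q_loop_per_op(A, B, C, D, E, F):
    """One loop per operation: every intermediate is a full N-element vector."""
    n = len(A)
    t1 = [A[i] * B[i] for i in range(n)]
    t2 = [C[i] * D[i] for i in range(n)]
    t3 = [E[i] * F[i] for i in range(n)]
    t2 = [t1[i] + t2[i] for i in range(n)]
    Q = [t2[i] + t3[i] for i in range(n)]
    return Q, 3 * n  # t1, t2, t3 each hold N words

def q_fused(A, B, C, D, E, F):
    """Single fused loop: only two scalar temporaries are ever live."""
    Q = []
    for i in range(len(A)):
        t1 = A[i] * B[i]
        t2 = C[i] * D[i]
        t1 = t1 + t2
        t2 = E[i] * F[i]
        Q.append(t1 + t2)
    return Q, 2  # t1 and t2

A, B = [1, 2, 3], [4, 5, 6]
C, D = [1, 1, 1], [2, 2, 2]
E, F = [0, 1, 2], [3, 3, 3]
q_a, regs_a = q_loop_per_op(A, B, C, D, E, F)
q_b, regs_b = q_fused(A, B, C, D, E, F)
```

Both versions compute the same Q; only the order of evaluation, and therefore the retiming storage, differs.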

18
Retiming Structure and Requirements
19
Structures

How do we implement programmable retiming?
Concerns
Area: λ²/bit
Throughput: bandwidth (bits/time)
Latency: important when we do not know when we will need a data item again

20
Just Logic Blocks

Most primitive
build flip-flop out of logic blocks
I ← D*/Clk + I*Clk
Q ← Q*/Clk + I*Clk
Area: 2 LUTs (800K-1Mλ²/LUT each)
Bandwidth: 1b/cycle
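
The two LUT equations can be checked with a small simulation. This is a sketch under the assumption that the garbled equations read I = D·/Clk + I·Clk and Q = Q·/Clk + I·Clk, that is, a master latch transparent while Clk is low feeding a slave latch transparent while Clk is high. The helper names `step` and `run` are hypothetical, and the slave is evaluated on the settled master value as a modeling simplification.

```python
def step(state, d, clk):
    """Evaluate both LUTs once for inputs d, clk (all values are 0 or 1)."""
    i, q = state
    nclk = 1 - clk
    i_next = (d & nclk) | (i & clk)       # master: follows D while Clk=0, holds while Clk=1
    q_next = (q & nclk) | (i_next & clk)  # slave: holds while Clk=0, follows I while Clk=1
    return i_next, q_next

def run(bits):
    """Clock each data bit through a low phase then a high phase; sample Q."""
    state = (0, 0)
    sampled = []
    for d in bits:
        state = step(state, d, clk=0)
        state = step(state, d, clk=1)
        sampled.append(state[1])
    return sampled

outputs = run([1, 0, 1, 1])
held = step((1, 1), d=0, clk=1)  # D changes while Clk is high
```

Under this reading the pair behaves as an edge-triggered flip-flop: Q takes the value D had while the clock was low, and changes on D during the high phase do not reach Q.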

21
Optional Output

Real flip-flop (optionally) on output
flip-flop: 4-5Kλ²
Switch to select: 5Kλ²
Area: 1 LUT (800K-1Mλ²/LUT)
Bandwidth: 1b/cycle

22
Output Flip-Flop Needs

Pipeline and C-slow to LUT cycle
Always need an output register

Average Regs/LUT: 1.7; some designs need 2-7x
23
Separate Flip-Flops

Network flip flop w/ own interconnect
can deploy where needed
requires more interconnect
Assume routing goes as inputs
1/4 size of LUT
Area: 200Kλ² each
Bandwidth: 1b/cycle

24
Deeper Options

Interconnect / Flip-Flop is expensive
How do we avoid?

25
Deeper

Implication
don't need result on every cycle
number of regs > bits need to see each cycle
⇒ lower bandwidth acceptable
⇒ less interconnect

26
Deeper Retiming
27
Output

Single Output
OK if we don't need other timings of signal
Multiple Output
more routing

28
Input

More registers (K×)
7-10Kλ²/register
4-LUT ⇒ 30-40Kλ²/depth
No more interconnect than unretimed
open: compare savings to additional reg. cost
Area: 1 LUT (1M + d×40Kλ²) gets K×d regs
d=4: 1.2Mλ²
Bandwidth: 1b/cycle
1/d-th capacity
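
The area figures above can be reproduced with a few lines of arithmetic. This sketch assumes the area expression reads 1Mλ² + d×40Kλ² for a K=4 LUT and takes the 200Kλ² separate flip-flop figure from the earlier slide; the constant and function names are hypothetical.

```python
LUT_AREA = 1_000_000      # λ², base 4-LUT block (slide: roughly 800K-1Mλ²)
AREA_PER_DEPTH = 40_000   # λ² per unit of depth across all K=4 inputs (upper end of 30-40K)
SEPARATE_FF_AREA = 200_000  # λ² per networked flip-flop (Separate Flip-Flops slide)

def input_retiming_area(depth):
    """Area of one LUT with a depth-d register chain on each input."""
    return LUT_AREA + depth * AREA_PER_DEPTH

def registers(depth, k=4):
    """Registers provided: K inputs, d deep."""
    return k * depth

area_d4 = input_retiming_area(4)                     # 1.16Mλ², the slide rounds to 1.2Mλ²
regs_d4 = registers(4)                               # 16 registers
separate_equivalent = regs_d4 * SEPARATE_FF_AREA     # same register count as separate flip-flops
```

With these numbers, 16 registers cost about 160Kλ² on top of the LUT as an input chain, versus 3.2Mλ² as 16 separately routed flip-flops. The trade is that the chain delivers only one bit per cycle per input.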

29
HSRA Input
30
Input Retiming
31
HSRA Interconnect
32
Flop Experiment 1

Pipeline and retime to single LUT delay per cycle
MCNC benchmarks to 256 4-LUTs
no interconnect accounting
average 1.7 registers/LUT (some circuits 2-7)

33
Flop Experiment 2

Pipeline and retime to HSRA cycle
place on HSRA
single LUT or interconnect timing domain
same MCNC benchmarks
average 4.7 registers/LUT

34
Input Depth Optimization

Real design, fixed input retiming depth
truncate deeper and allocate additional logic blocks

35
Extra Blocks (limited input depth)
[Plots: Average; Worst Case Benchmark]
36
With Chained Dual Output
can use one BLB as 2 retiming-only chains
[Plots: Average; Worst Case Benchmark]
37
HSRA Architecture
38
Register File

From MIPS-X
1Kλ²/bit + 500λ²/port
Area(RF) = (d+6)(W+6)(1Kλ² + ports×500λ²)
w>>6, d>>6, I+o=2 ⇒ 2Kλ²/bit
w=1, d>>6, I=o=4 ⇒ 35Kλ²/bit
comparable to input chain
More efficient for wide-word cases
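
A rough sketch of the register-file area model, assuming the garbled formula reads Area(RF) = (d+6)(W+6)(1Kλ² + ports×500λ²). The port counts used below (2 and 8) are assumptions: they are the values that reproduce the slide's 2Kλ²/bit and 35Kλ²/bit figures under this reading. The function name is hypothetical.

```python
def rf_area_per_bit(depth, width, ports, cell=1000, per_port=500, overhead=6):
    """Register-file area per stored bit, in λ², under the assumed model."""
    total = (depth + overhead) * (width + overhead) * (cell + ports * per_port)
    return total / (depth * width)

# Wide and deep, 2 ports: the fixed overhead of 6 rows/columns is amortized away.
wide_deep = rf_area_per_bit(depth=10**6, width=10**6, ports=2)

# 1 bit wide, deep, 8 ports: the 6-column overhead is paid on every bit.
narrow_deep = rf_area_per_bit(depth=10**6, width=1, ports=8)
```

In this model the narrow case costs (1+6) = 7 columns of area for every useful one, which is why the per-bit cost only becomes attractive for wide words.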

39
Xilinx CLB

Xilinx 4K CLB
as memory
works like RF
Area: 1/2 CLB (640Kλ²)/16 ≈ 40Kλ²/bit
but need 4 CLBs to control
Bandwidth: 1b/2 cycle (1/2 CLB)
1/16-th capacity

40
Memory Blocks

SRAM bit ≈ 1200λ² (large arrays)
DRAM bit ≈ 100λ² (large arrays)
Bandwidth: W bits / 2 cycles
usually single read/write
1/2A th capacity

41
Disk Drive

Cheaper per bit than DRAM/Flash
(not MOS, no λ²)
Bandwidth: 10-20Mb/s
For 4ns array cycle
1b/12.5 cycles @ 20Mb/s
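
The cycles-per-bit figure follows directly from the two numbers on the slide (4ns array cycle, 20Mb/s). A short sketch of the arithmetic, with a hypothetical function name:

```python
def cycles_per_bit(bandwidth_bits_per_s, cycle_time_s):
    """Array cycles that elapse for each bit delivered at the given bandwidth."""
    bits_per_cycle = bandwidth_bits_per_s * cycle_time_s
    return 1.0 / bits_per_cycle

at_20mb = cycles_per_bit(20e6, 4e-9)  # about 12.5 cycles per bit
at_10mb = cycles_per_bit(10e6, 4e-9)  # about 25 cycles per bit
```

At the low end of the quoted 10-20Mb/s range, the array waits about twice as long per bit.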

42
Hierarchy/Structure Summary

Memory Hierarchy arises from area/bandwidth tradeoffs
Smaller/cheaper to store words/blocks (saves routing and control)
Smaller/cheaper to handle long retiming in larger arrays (reduce interconnect)
High bandwidth out of registers/shallow memories

43
Big Ideas [MSB Ideas]

Can systematically justify registers in architecture (interconnect, FU pipeline)

44
Big Ideas [MSB Ideas]

Tasks have a wide variety of retiming distances
Retiming requirements affected by high-level decisions/strategy in solving task
Wide variety of retiming costs
100λ² to 1Mλ²
Routing and I/O bandwidth are big factors in costs
Gives rise to memory (retiming) hierarchy
