HDR Design Presentation - PowerPoint PPT Presentation

Description:

Each ROP on nVidia's g80 is capable of processing 4 ... Bitwise inv 2. 7. B. 7. R. 7. G. Transistor count. 30,162. Total. 1,224. 434. 4,224. 23,032 (5758) ... – PowerPoint PPT presentation

Slides: 14
Provided by: ece64


1
HDR Design Presentation
  • Team M1
  • Emeka Ezekwe (M11)
  • Chris Thayer (M12)
  • Shabnam Aggarwal (M13)
  • Charles Fan (M14)

Team M1 Manager: Matthew Russo
2
Status
  • Complete
    • Specification definition
    • Block diagram
    • C implementation
    • Verilog (structural almost done, not yet tested)
  • Incomplete
    • Schematic
    • Layout
    • Testing

3
HDR in the G80 GPU
  • Our decoder is designed to interface between
    specially encoded textures stored in the GPU's
    memory and one of the GPU's ROPs (Render Output
    Units).
  • Each ROP on NVIDIA's G80 can process 4 pixels per
    clock cycle. We plan for our hardware to decode
    the texture information for 4 pixels during each
    clock cycle.
  • This decoder will allow smaller textures to be
    stored in the GPU's memory, so graphics cards can
    provide the same functions with less memory.
  • Ultimately, this decoder can provide savings in
    cost, power consumption, heat dissipation, and
    size in current graphics cards.

4
Design objectives and constraints
  • Shooting for 400 MHz (2 or 3 pipeline stages)
  • Speed is clearly our goal, but power and size are
    also important.
    • Minimize these after maximizing speed
  • 4 pixels per cycle, 4 cycles per block
    • No wasted cycles like before when storing special
      luminance values
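
The targets above imply a simple throughput budget. The following is a rough sketch of that arithmetic only; the function and key names are illustrative and not taken from the team's C implementation.

```python
def decoder_throughput(clock_hz, pixels_per_cycle, cycles_per_block):
    """Derive per-block and per-second figures from the slide's targets."""
    return {
        # Pixels decoded for one texture block
        "pixels_per_block": pixels_per_cycle * cycles_per_block,
        # Sustained pixel rate if no cycles are wasted
        "pixels_per_second": clock_hz * pixels_per_cycle,
        # Blocks fully decoded each second
        "blocks_per_second": clock_hz // cycles_per_block,
        # Time budget for one pipeline stage, in nanoseconds
        "cycle_time_ns": 1e9 / clock_hz,
    }


# Targets from the slide: 400 MHz, 4 pixels per cycle, 4 cycles per block
budget = decoder_throughput(400_000_000, 4, 4)
for name, value in budget.items():
    print(name, value)
```

At the 400 MHz target each pipeline stage has a 2.5 ns budget, which is the figure the 2-or-3-stage pipelining decision has to fit within.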

5
Design Decisions
  • Removed the module that stored Nzeros and Lbias
    • This has increased our input count from 97 to 104
  • Removed denormal support in the floating-point
    multipliers.
  • Integer multiplication is done by Wallace trees
    and Booth recoding.
    • Still a maybe; need to see how they lay out
  • Critical adders are going to be carry-select.
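
To illustrate the carry-select idea mentioned above, here is a minimal bit-level model in Python. It is a behavioral sketch, not the team's Verilog: the function names, the 7-bit default width, and the 3-bit block size are assumptions chosen for illustration.

```python
def ripple_add(a, b, carry_in, width):
    """Ripple-carry add of two width-bit values; returns (sum, carry_out)."""
    total = 0
    carry = carry_in
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def carry_select_add(a, b, width=7, block=3):
    """Carry-select add: each block computes both possible results
    (carry-in 0 and carry-in 1); the real carry only drives a mux."""
    result = 0
    carry = 0
    for lo in range(0, width, block):
        w = min(block, width - lo)
        mask = (1 << w) - 1
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        # In hardware these two adders run in parallel.
        sum0, carry0 = ripple_add(a_blk, b_blk, 0, w)
        sum1, carry1 = ripple_add(a_blk, b_blk, 1, w)
        # Mux: select the precomputed result once the carry-in is known.
        block_sum, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= block_sum << lo
    return result, carry


print(carry_select_add(100, 27))
```

The critical path becomes one block's ripple delay plus one mux per block, at the cost of roughly doubling the adder hardware in each block.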

6
Updated Block Diagram
  • (Block diagram not reproduced in this transcript.
    Labeled blocks: Reg, Compute 1 pixel (x4),
    Serialize output (x4), Find G, Int to FP.
    Numeric labels, likely bus widths: 4, 7, 8, 16.)
7
Compute 1 pixel
  • (Diagram not reproduced in this transcript.
    Labeled blocks and signals: Lbias, Nzeros (Int),
    Lump, Shift >>, 11-bit FP MULT (x3), R FP, G FP,
    B FP, Rp, Gp, Bp. Numeric labels, likely bus
    widths: 4, 5, 6, 7, 11.)
8
FP multiplier
  • (Diagram not reproduced in this transcript.
    Labeled elements: nor, or, mux, >>, Vdd, I-,
    ovrflw, 0s, 1s, out. Numeric labels: 1, 4, 5, 6,
    7, 9, 10, 11, 14.)
9
I-FP
  • (Diagram not reproduced in this transcript.
    Labeled elements: 0 Counter, I, 11 bit FP 2CE, m,
    <<, 3 bit 2s Comp Exp. Numeric labels: 3, 5, 6,
    7, 20, 43, 50, 106.)
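
The "0 Counter" and "<<" labels on this slide suggest the usual normalize-by-leading-zeros approach to integer-to-floating-point conversion. The sketch below shows that general technique and how the same mapping could be precomputed into a ROM-style lookup table. The 7-bit input width, the field layout, and all names are assumptions for illustration; the deck's actual 11-bit format is not recoverable from this transcript.

```python
INT_BITS = 7               # assumed integer input width (hypothetical)
FRAC_BITS = INT_BITS - 1   # fraction bits left after dropping the leading one


def count_leading_zeros(value, width):
    """Number of zero bits above the most significant set bit."""
    for i in range(width):
        if value & (1 << (width - 1 - i)):
            return i
    return width


def int_to_fp(value):
    """Return (exponent, fraction) such that
    value == (1 + fraction / 2**FRAC_BITS) * 2**exponent.
    Zero has no leading one, so it is returned as None."""
    if value == 0:
        return None
    lz = count_leading_zeros(value, INT_BITS)
    # Shift left so the leading one sits in the top bit.
    normalized = (value << lz) & ((1 << INT_BITS) - 1)
    exponent = INT_BITS - 1 - lz
    # Drop the implicit leading one to get the stored fraction.
    fraction = normalized & ((1 << FRAC_BITS) - 1)
    return exponent, fraction


# ROM alternative: precompute the result for every possible input.
INT_TO_FP_ROM = [int_to_fp(i) for i in range(1 << INT_BITS)]

print(INT_TO_FP_ROM[5])
```

With only 128 entries under these assumptions, the table replaces the counter and shifter with a single lookup, which is the trade-off the ROM option relies on.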
10
Find G
  • (Diagram not reproduced in this transcript.
    Labeled elements: R, G, B, 7-bit Adder, Bitwise
    inv. Numeric labels: 2, 7.)
11
Transistor count
  • This assumes 49 full adders (FA), which is an upper
    bound.
12
Initial Floorplan
  • (Floorplan figure not reproduced in this
    transcript. Labeled blocks: Reg, Compute 1 pixel
    (x4), Serial output (x4), Find G, Int to FP.
    Numeric labels: 4, 7, 8, 16.)
13
Problems and Questions
  • Pipelining
    • Looks like one pipeline stage inside the FP
      multiplier and another just before it will do
      well. Need to make sure.
  • Alternate designs for I-FP
    • Looks like ROM is the way. Faster, and we only
      need 3 of them (or one triple-ported ROM) instead
      of 12 like we thought.
  • How well do Wallace trees lay out?
    • Carry-save multipliers are known to lay out very
      well and may simplify pipelining.
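
For context on the Wallace tree versus carry-save question, both rest on the same building block: a 3:2 compressor (a row of full adders) that turns three addends into a sum word and a carry word without propagating carries. The sketch below is a word-level Python model of that reduction, with layers of compressors followed by one final carry-propagate add. It is an illustration under assumed names and widths, and it leaves out Booth recoding.

```python
def carry_save_3to2(x, y, z):
    """3:2 compressor over whole words: x + y + z == s + c."""
    s = x ^ y ^ z                               # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1      # per-bit carry, shifted up
    return s, c


def multiply_carry_save(a, b, width=7):
    """Unsigned multiply by reducing partial products in layers."""
    # One shifted copy of a for every set bit of b.
    rows = [a << i for i in range(width) if (b >> i) & 1]
    # Each layer compresses every group of three rows into two.
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        next_rows = []
        for i in range(0, full, 3):
            s, c = carry_save_3to2(rows[i], rows[i + 1], rows[i + 2])
            next_rows += [s, c]
        next_rows += rows[full:]    # leftover rows pass to the next layer
        rows = next_rows
    # Final carry-propagate add of the last two rows.
    return sum(rows)


print(multiply_carry_save(100, 27))
```

In this model every layer has no carry chain, so a pipeline register can be placed between any two layers; the layout question is how regularly those layers can be wired.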