Title: More Quality


1
More Quality
  • Anselmo Lastra

2
Topics
  • SAGE (SIGGRAPH 2002)
  • WarpEngine (SIGGRAPH 2000)

3
SAGE (2002)
  • Deering and Nagle, Sun
  • 80M tris/sec, sort-last
  • High-quality antialiasing a key feature
  • Topics
  • Motivation (market issues)
  • Architecture
  • Antialiasing
  • Note: an unrelated SAGE architecture (from IBM
    Research) was presented at SIGGRAPH 88

4
Motivation
  • They say display resolution grows < 10%/yr
  • Antialiasing will increase the effective
    resolution of current display devices
  • Out of reach of single-chip systems
  • Pin-count × pin-data-rate = bandwidth-per-chip,
    which lags well below Moore's rate
  • They say that while demand for triangle rate is
    increasing, the decrease in size of tris is making
    fill-rate demand grow more slowly

5
Block Diagram
6
Command Distribution
  • Master chip
  • DMA to/from host
  • Contains MMU
  • So graphics data can be in virtual mem
  • Primitives distributed to 4 units
  • Load balanced

7
Geometry
  • MAJC
  • Microprocessor Architecture for Java Computing
  • Meant for media processing
  • 2 VLIW CPUs/chip
  • 4 Functional-Units/Processor

8
Rasterizer
  • Programmable non-uniform sample locations
  • 8-16x extra fill
  • 256MB texture mem
  • Replicated for polygon rend (not for volrend)
  • Nothing said about programmable shading
  • 2 output buses

9
3DRAM or FBRAM
  • Deering, S94
  • Render pins write only (no R/W)
  • Z buffering on chip
  • Multiple chips communicate to agree on z value,
    and z-color write
  • Blending ops for color
  • Video out pins
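
The write-only render port can be pictured with a small sketch: the renderer sends a (z, color) pair and the depth compare happens inside the memory, so no read-modify-write crosses the render pins. The class and method names below are hypothetical, and the model covers only the behavior listed on this slide, not the actual chip interface.

```python
class ZBufferedMemory:
    """Toy model of memory with an on-chip depth compare (hypothetical API)."""

    def __init__(self, num_samples, far=float("inf")):
        self.z = [far] * num_samples
        self.color = [(0, 0, 0)] * num_samples

    def write(self, index, z, color):
        """Write-only operation: nothing is returned to the renderer.

        The compare against the stored depth is done here, inside the
        memory model, and the sample is kept only if it is nearer.
        """
        if z < self.z[index]:
            self.z[index] = z
            self.color[index] = color

    def scan_out(self):
        """Stand-in for the separate video-out pins: colors in address order."""
        return list(self.color)
```

Writing depths 5.0, 9.0 and 2.0 to the same address leaves the color that came with 2.0, regardless of the order in which the writes arrive.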

10
Sched Chips
  • Sort-Last
  • Sample buffer interleaved
  • Chips interleaved by sub-samples
  • Green circles are programmable switches
  • Allow data to flow from a single input for a time
    to ensure cache coherence
  • Fixed # of samples
  • So list is implicit
  • Table in Rasterizer & Convolve chips
  • Tokens used to enforce ordering

11
Route
  • Scans out subsamples
  • 640 pins
  • 20 per 3DRAM
  • 10 Route chips as 2-bit slices
  • 40 bits/sample

12
Convolve
  • Each supports vertical swath of screen
  • Computes 5x5 filter
  • Overlaps neighbors by 2 pixels
  • Aggregate data rate 8GB/s
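
The 2-pixel overlap follows from the filter size: a 5x5 filter reaches 2 pixels to either side of the pixel being computed. A minimal sketch of the swath split, assuming equal-width vertical swaths; the function name and the chip count used in the example are illustrative, not taken from the slides.

```python
def swath_ranges(screen_width, num_chips, filter_size=5):
    """Split the screen into vertical swaths, one per Convolve chip.

    Returns a list of ((out_start, out_end), (read_start, read_end)) column
    ranges with exclusive ends. The read range is the output range widened
    by the filter radius and clamped to the screen.
    """
    radius = filter_size // 2                 # 2 pixels for a 5x5 filter
    base, extra = divmod(screen_width, num_chips)
    ranges = []
    start = 0
    for chip in range(num_chips):
        width = base + (1 if chip < extra else 0)
        end = start + width
        read_start = max(0, start - radius)
        read_end = min(screen_width, end + radius)
        ranges.append(((start, end), (read_start, read_end)))
        start = end
    return ranges
```

For example, `swath_ranges(1280, 4)` gives the second chip output columns 320 to 639 while it reads columns 318 to 641.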

13
Convolve (2)
  • Buffers 6 swaths
  • 64 sample locations
  • 6-bit offsets
  • 2D hash to permute access
  • Sample location table kept on chip
  • May be different than one in Rasterizers because
    those chips are working on next frame
  • Sample locations can change per frame
  • Filter coefficients computed dynamically
  • Filters programmable but must be radially
    symmetric
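
Because the filter must be radially symmetric, a sample's weight depends only on its distance from the pixel center, which is why coefficients can be computed on the fly from the sample locations. A minimal sketch under that assumption; the Gaussian profile, its sigma, and the function names are placeholders rather than the filter used by the hardware.

```python
import math

def gaussian(r, sigma=0.7):
    """Example radial profile; any function of distance alone would do."""
    return math.exp(-(r * r) / (2.0 * sigma * sigma))

def filter_pixel(samples, px, py, radial_fn=gaussian, radius=2.5):
    """Filter the samples around pixel (px, py).

    samples: iterable of (x, y, (r, g, b)) with positions in pixel units.
    radius 2.5 keeps the support inside a 5x5 pixel neighborhood.
    """
    cx, cy = px + 0.5, py + 0.5               # pixel center
    total_w = 0.0
    total = [0.0, 0.0, 0.0]
    for sx, sy, rgb in samples:
        r = math.hypot(sx - cx, sy - cy)
        if r > radius:
            continue                          # outside the filter support
        w = radial_fn(r)                      # coefficient computed dynamically
        total_w += w
        for i in range(3):
            total[i] += w * rgb[i]
    if total_w == 0.0:
        return (0.0, 0.0, 0.0)
    # Normalize so that a constant-color region stays constant.
    return tuple(c / total_w for c in total)
```

Two samples at equal distance from the pixel center get equal weight, so a red and a blue sample placed symmetrically blend to an even mix.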

14
Subsamples
  • 16 samples/pixel
  • Faint red lines are pixels
  • 11 triangles
  • Green lines

15
Example
16
Closeup
17
Reference
  • Michael F. Deering, Stephen A. Schlapp, Michael
    G. Lavelle, "FBRAM: A New Form of Memory Optimized
    for 3D Graphics," SIGGRAPH 94

18
WarpEngine
  • Hardware architecture for rendering from depth
    images
  • We didn't build it
  • Voicu Popescu's PhD
  • Popescu, Voicu, John Eyles, Anselmo Lastra, Josh
    Steinhurst, Nick England and Lars Nyland, "The
    WarpEngine: An Architecture for the
    Post-Polygonal Age," Proceedings of SIGGRAPH
    2000, New Orleans, July 2000, 433-442

19
Rendering Algorithm
  • WarpEngine algorithm
  • Interpolate between reference image samples
  • Warp (transform) them forward to image space
  • Z-composite into sub-pixel (2x2) warp buffer
  • No interpolation across skins
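
The compositing step can be sketched as follows, assuming the samples have already been interpolated and warped into image space. The function name and the tuple layout are illustrative, and the warp itself is left out.

```python
import math

def z_composite(warped, width, height, sub=2):
    """Z-composite warped samples into a sub-pixel warp buffer.

    warped: iterable of (x, y, z, color) with x, y in image-space pixel units.
    sub:    warp-buffer locations per pixel along each axis (2 gives 2x2).
    Returns (depth, color) as lists of rows; empty locations hold None.
    """
    w, h = width * sub, height * sub
    depth = [[float("inf")] * w for _ in range(h)]
    color = [[None] * w for _ in range(h)]
    for x, y, z, c in warped:
        ix, iy = math.floor(x * sub), math.floor(y * sub)
        if 0 <= ix < w and 0 <= iy < h and z < depth[iy][ix]:
            depth[iy][ix] = z                 # nearer sample wins
            color[iy][ix] = c
    return depth, color
```

Each sample is written to exactly one warp-buffer location, which is the forward-mapping behavior: nothing is computed per output pixel until reconstruction.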

20
Forward vs. Backward Map
  • Conventional scan conversion
  • For each pixel, compute color
  • Basically backward map
  • WarpEngine
  • Warp sample forward

21
Offsets Make it Work
  • 2-bit offset
  • More precise sample location
  • 2-pixel wide filter kernel
  • Similar to sparse buffer

Figure legend: Blue: pixel, Green: warp buffer, Black: offset
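
A rough illustration of why the offset helps: storing a small offset with each warp-buffer location records where inside the location the sample landed, so reconstruction sees a finer position than the 2x2 grid alone provides. The sketch assumes a 2-bit offset along each axis; that layout and the function names are assumptions, not details from the slides.

```python
import math

def encode(x, y, sub=2, offset_bits=2):
    """Map a warped position (pixel units) to a warp-buffer cell plus offset."""
    levels = 1 << offset_bits                 # 4 positions per axis for 2 bits
    fx, fy = x * sub, y * sub
    ix, iy = math.floor(fx), math.floor(fy)
    ox = min(levels - 1, int((fx - ix) * levels))
    oy = min(levels - 1, int((fy - iy) * levels))
    return (ix, iy), (ox, oy)

def decode(cell, offset, sub=2, offset_bits=2):
    """Recover the approximate sample position from cell and offset."""
    levels = 1 << offset_bits
    x = (cell[0] + (offset[0] + 0.5) / levels) / sub
    y = (cell[1] + (offset[1] + 0.5) / levels) / sub
    return x, y

def decode_no_offset(cell, sub=2):
    """Without offsets, the best guess is the center of the cell."""
    return (cell[0] + 0.5) / sub, (cell[1] + 0.5) / sub
```

With these settings the worst-case position error per axis drops from 1/4 pixel (cell center only) to 1/16 pixel.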
22
Inexpensive Antialiasing
Figure panels: 2 x 2 Offset, No Offset, Zoomed
23
Why Forward Map?
  • Low setup cost!
  • No edge-expression computation
  • Exploits coherence
  • IBR tile (16x16 image) tends to need same
    interpolation factor
  • Can use efficient SIMD warper

24
Architecture
25
WarpArray
  • Nearest neighbor connectivity
  • In/Out/Warp pipelined
  • Similar to PixelFlow design

26
Region Accumulator
  • Pixel interleaved
  • 128 x 128
  • Soft z?
  • Reconstruction pipelined with next region
    rendering

27
Sort First for Parallelism
  • How to distribute work across chips?
  • Sort by screen space regions
  • 128x128 pixel region
  • "Sort first" (Mueller) refers to sorting primitives
    as early as possible
  • Tile coherence lowers overlap factor
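
Sorting by screen-space region can be sketched as a binning pass over projected bounding boxes. The overlap factor is then the average number of regions each primitive is sent to. The helper names and the integer bounding-box format are illustrative assumptions.

```python
from collections import defaultdict

def bin_by_region(bboxes, region=128):
    """Assign each primitive to every screen region its bounding box touches.

    bboxes: list of (xmin, ymin, xmax, ymax) in integer pixels, inclusive.
    Returns a dict mapping (region_x, region_y) to a list of primitive indices.
    """
    bins = defaultdict(list)
    for idx, (x0, y0, x1, y1) in enumerate(bboxes):
        for ry in range(y0 // region, y1 // region + 1):
            for rx in range(x0 // region, x1 // region + 1):
                bins[(rx, ry)].append(idx)
    return bins

def overlap_factor(bins, num_primitives):
    """Average number of regions that each primitive is assigned to."""
    return sum(len(v) for v in bins.values()) / num_primitives
```

A primitive that sits inside one region counts once, one that straddles a region corner counts four times, so small, coherent tiles keep the factor close to 1.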

28
Expected Chip Specs (2000)
  • ASIC 12x16 mm
  • 0.18 micron
  • ? 300 MHz
  • 4-node VGA
  • 32-node HDTV
  • Each chip
  • 100M Samples/sec
  • 4.8G Bytes/sec bandwidth

Simulation on video
29
WarpEngine References
  • Voicu Popescu, John Eyles, Anselmo Lastra, Josh
    Steinhurst, Nick England and Lars Nyland,
    The WarpEngine: An Architecture for the
    Post-Polygonal Age, Proceedings of SIGGRAPH 2000,
    New Orleans, July 2000, 433-442.
  • Voicu Popescu, Anselmo Lastra, John Eyles,
    Sort-First Parallelism for Image-Based Rendering,
    Proceedings of Eurographics Workshop on Parallel
    Graphics and Visualization, Girona, Spain,
    September 2000.
  • Voicu Popescu, Anselmo Lastra, The Vacuum
    Buffer, Proceedings of the 2001 ACM Symposium on
    Interactive 3D Graphics, Research Triangle Park,
    NC, March 19-21, 2001, 73-76.