Title: A View-Independent Graphics Rendering Architecture
1. A View-Independent Graphics Rendering Architecture
Graphics Hardware 2004, Grenoble, France
Jason Stewart, Eric P. Bennett, and Leonard McMillan
Presented by Anselmo Lastra, University of North Carolina at Chapel Hill
2. Why View-Independence?
- Decouples Rendering from Viewing
- Eliminates latency
- Provides uniform frame rates
- Allows increased shading complexity
- Needed for future applications
- Shared multi-user virtual environments
- True three-dimensional (Autostereo) displays
3. True 3-Dimensional Displays
- Promising 3-D Display Technologies
- Lenticular and fly's-eye optics
- Barrier-based methods
- Reflective optics
- Holographic optics
- Technology is Maturing
- Problem
- How to generate the content?
- Requires 1000s of simultaneous views
4. Using Today's Architecture
- I guess you could buy 1024 GPUs
- 10 years of Moore's law would yield
- 4 doublings in performance (only need 64 GPUs)
- At least 2 doublings in power (only needs 10 kW)
- There has to be a better way
- Traditional graphics architectures are inefficient for view-independent graphics
5. Previous Work
- Low-Latency Rendering
- 3D Light-Field Viewing H/W [Regan 99]
- Frameless Rendering [Bishop 94]
- Just-in-Time Pixels [Mine & Bishop 93]
- View-Independent Rendering
- Multiple Viewpoint Rendering [Halle 98]
- 4D Parameterization
- Light Field [Levoy & Hanrahan 96], Lumigraph [Gortler 96]
- Micropolygon Rasterization
- Reyes [Cook 87]
- Reyes streaming H/W pipeline [Owens 02]
6. Rearchitecting the Pipeline
- Classic view-dependent pipeline:
Geometry → Vertex Processing → Rasterize → Fragment Processing → Visibility → 2D Framebuffer → Scanout
7. Rearchitecting the Pipeline
- Proposed view-independent pipeline:
Host PC: Geometry → Fragment Processing → Subdivide into points
Hardware Prototype: Point-Casting → Scatter → 4D Framebuffer → Scanout
8. PixelView Prototype
9. PixelView Prototype
- 100 MHz Xilinx Spartan-IIE FPGA
- 300k gates
- 16 MB of 100 MHz SDRAM
- 5000 lines of Verilog
- 6000 lines of C on the host PC
10. PixelView Demo
11. System Partitioning
- The prototype pipeline implementation:
Host PC: Geometry → Fragment Processing → Subdivide into points
Hardware Prototype: Point-Casting → Scatter → 4D Framebuffer → Scanout
12. PixelView 4D Framebuffer
- 16 MBytes of framebuffer memory
- Reconfigurable (e.g., 8 × 8 × 256 × 256 × (RGB + Z)); addressing for this layout is sketched below
- 16-bit (5/6/5) RGB
- 16-bit Z
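A minimal sketch, in C, of how a ray sample (s, t, u, v) could map into such a framebuffer, using the 8×8×256×256 example layout and the 16-bit color / 16-bit Z format above. The packing order, struct, and function names are assumptions for illustration, not the prototype's actual memory map.

```c
#include <stdint.h>

/* Example configuration from the slide: 8 x 8 x 256 x 256 ray samples,
 * each holding 16-bit (5/6/5) RGB plus a 16-bit Z value.               */
#define S_RES   8
#define T_RES   8
#define U_RES 256
#define V_RES 256

typedef struct {
    uint16_t rgb565;  /* 5/6/5 packed color */
    uint16_t z;       /* 16-bit depth       */
} Sample4D;           /* 4 bytes per ray sample */

/* Hypothetical linear addressing: v varies fastest, s slowest.
 * 8*8*256*256 samples * 4 bytes = 16 MB, matching the SDRAM size. */
static inline uint32_t fb_index(uint32_t s, uint32_t t,
                                uint32_t u, uint32_t v)
{
    return ((s * T_RES + t) * U_RES + u) * V_RES + v;
}

/* Depth-tested write, per ray sample rather than per screen pixel
 * (assumes the framebuffer is cleared to maximum depth).           */
static inline void fb_write(Sample4D *fb, uint32_t s, uint32_t t,
                            uint32_t u, uint32_t v,
                            uint16_t rgb565, uint16_t z)
{
    Sample4D *p = &fb[fb_index(s, t, u, v)];
    if (z < p->z) {          /* keep the nearest sample */
        p->rgb565 = rgb565;
        p->z      = z;
    }
}
```

With this packing, the 4M samples at 4 bytes each fill exactly the 16 MB of SDRAM on the prototype board.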
13. View Selection
- Every view is just a 2D planar slice through the 4D framebuffer
- Which, after some simplifying assumptions, reduces to an affine mapping from pixel coordinates to ray coordinates (a sketch of this mapping follows)
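Concretely, each ray-space coordinate becomes an affine function of the output pixel coordinates (i, j). The form below is a sketch inferred from the Linear Expression Evaluators on the next slide; the coefficient names are illustrative per-view constants, not taken from the slide.

```latex
\begin{aligned}
s(i,j) &= a_s\, i + b_s\, j + c_s, &\quad t(i,j) &= a_t\, i + b_t\, j + c_t,\\
u(i,j) &= a_u\, i + b_u\, j + c_u, &\quad v(i,j) &= a_v\, i + b_v\, j + c_v.
\end{aligned}
```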
14. Linear Expression Evaluators
- Simple datapath, replicated for each of s, t, u, and v
- Pixel rate
- Trivial H/W cost
- Easy to parallelize
- Drop-in replacement for traditional scanout
15. Memory Access Patterns
- For each view, the LEE generates s(i,j), t(i,j), u(i,j), and v(i,j) (see the scanout sketch below)
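A software sketch of what the four evaluators do during scanout, assuming the affine mapping above: each coordinate needs only one add per pixel and one add per scanline, which is why the hardware cost is trivial. The struct and the fetch_sample() stand-in are illustrative, not taken from the prototype's Verilog.

```c
/* Linear Expression Evaluator: evaluates c + a*i + b*j for one of the
 * four ray-space coordinates using only additions, stepped per pixel. */
typedef struct {
    float a, b, c;   /* per-view coefficients, loaded before each frame */
    float row;       /* value at the start of the current scanline      */
    float cur;       /* value at the current pixel                      */
} LEE;

static void lee_start_frame(LEE *e) { e->row = e->cur = e->c; }
static void lee_next_line(LEE *e)   { e->row += e->b; e->cur = e->row; }
static void lee_next_pixel(LEE *e)  { e->cur += e->a; }

/* Stand-in for the 4D framebuffer read and pixel output; hypothetical. */
static void fetch_sample(float s, float t, float u, float v)
{
    (void)s; (void)t; (void)u; (void)v;
}

/* Scanout loop for a W x H view: four LEEs run in lockstep and their
 * outputs index the 4D framebuffer in place of a 2D raster address.   */
void scan_out_view(LEE *s, LEE *t, LEE *u, LEE *v, int W, int H)
{
    lee_start_frame(s); lee_start_frame(t);
    lee_start_frame(u); lee_start_frame(v);
    for (int j = 0; j < H; ++j) {
        for (int i = 0; i < W; ++i) {
            fetch_sample(s->cur, t->cur, u->cur, v->cur);
            lee_next_pixel(s); lee_next_pixel(t);
            lee_next_pixel(u); lee_next_pixel(v);
        }
        lee_next_line(s); lee_next_line(t);
        lee_next_line(u); lee_next_line(v);
    }
}
```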
16. Scan-out Performance
- Typically < 10% of memory bandwidth is required for scan out
- 640×480 VGA
- 100 MHz SDRAM, an order of magnitude behind the state of the art (DDR @ 500 MHz)
- Can easily support multiple simultaneous views (see the estimate below)
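A back-of-the-envelope version of that estimate, under assumed parameters that are not on the slide (60 Hz refresh, one 16-bit color fetch per displayed pixel, a 32-bit-wide SDRAM data bus):

```c
#include <stdio.h>

int main(void)
{
    /* Assumed display and memory parameters (not from the slide). */
    const double pixels_per_frame = 640.0 * 480.0;      /* VGA          */
    const double refresh_hz       = 60.0;
    const double bytes_per_fetch  = 2.0;                 /* 16-bit color */
    const double sdram_clock_hz   = 100e6;
    const double sdram_bus_bytes  = 4.0;                 /* 32-bit bus   */

    double scanout_bw = pixels_per_frame * refresh_hz * bytes_per_fetch;
    double sdram_bw   = sdram_clock_hz * sdram_bus_bytes;

    /* ~36.9 MB/s out of ~400 MB/s, i.e. under 10% of peak bandwidth. */
    printf("scanout uses %.1f%% of SDRAM bandwidth\n",
           100.0 * scanout_bw / sdram_bw);
    return 0;
}
```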
17. Filling the Framebuffer
- Elemental Rendering Primitive is the Outgoing Radiance from a Point
18. Point Casting
- Instead of the natural planar radiance parameterization about each point
- We align with the parameterization planes
- Simplifies the mapping
19. In Other Words
- Parameterize outgoing radiance on fixed planes and resample it (see the point-cast sketch below)
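A minimal sketch of the point-cast idea under a standard two-plane (s,t)-(u,v) parameterization: for each (s,t) grid position, the ray from that position through the point is intersected with the (u,v) plane, and the point's radiance (uniform color here) is scattered into the 4D framebuffer with a depth test. The plane placement, grid spacing, depth range, and reuse of the Sample4D / fb_write() helpers from the framebuffer sketch above are all assumptions for illustration.

```c
#define Z_FAR 16.0f   /* assumed depth range for quantization; illustrative */

/* Point-cast a uniform-color point into the 4D framebuffer.
 * Assumes the (s,t) plane at z = 0 and the (u,v) plane at z = 1, both
 * sampled on unit-spaced grids; S_RES/T_RES/U_RES/V_RES, Sample4D, and
 * the depth-tested fb_write() come from the earlier framebuffer sketch. */
void point_cast_uniform(Sample4D *fb,
                        float px, float py, float pz,  /* world-space point */
                        uint16_t rgb565)
{
    if (pz <= 0.0f)
        return;                              /* behind the (s,t) plane */

    float zn = pz / Z_FAR;                   /* normalized, clamped depth */
    if (zn > 1.0f) zn = 1.0f;
    uint16_t z = (uint16_t)(zn * 65535.0f);

    for (int s = 0; s < S_RES; ++s) {
        for (int t = 0; t < T_RES; ++t) {
            /* The ray from (s, t, 0) through the point crosses the
             * z = 1 plane at lambda = 1 / pz along the ray.          */
            float lambda = 1.0f / pz;
            float u = s + lambda * (px - (float)s);
            float v = t + lambda * (py - (float)t);

            int ui = (int)(u + 0.5f);        /* nearest (u,v) sample */
            int vi = (int)(v + 0.5f);
            if (ui < 0 || ui >= U_RES || vi < 0 || vi >= V_RES)
                continue;                    /* ray misses this slab */

            fb_write(fb, (uint32_t)s, (uint32_t)t,
                     (uint32_t)ui, (uint32_t)vi, rgb565, z);
        }
    }
}
```

With the 8×8 (s,t) resolution of the example configuration, one such cast touches 64 framebuffer locations, which lines up with the 64 reads per point-cast quoted on the limitations slide.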
20. Unexplored View Coherence
- Outgoing radiance from a point is smoother than spatial variations
- Today's architectures do not exploit this
- Still ample spatial coherence
- We support 2 formats
- Uniform color
- Spatially varying
21. Unexplored Coherence
- Outgoing radiance from a point is smoother than spatial variations
- Today's architectures do not exploit this
- View-dependency
22. Geometry Subdivision
- Subdivide until the primitive is point-sized
- Backward compatibility with polygons
- Reminiscent of the Reyes rendering pipeline
- Every primitive requires a world-space subdivision method (one possibility is sketched below)
- However, Reyes subdivision is view-dependent (the stopping criterion is based on the pixel grid)
- Probably better methods exist
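One conservative way to realize such a world-space subdivision, sketched for triangles: split at edge midpoints until the longest edge falls below a fixed world-space point size, then emit the centroid as a point. The threshold, the Vec3 type, and the emit_point() hook are illustrative choices, not the method used in the paper.

```c
#include <math.h>

typedef struct { float x, y, z; } Vec3;

/* Stand-in for handing a point-sized primitive to the point-caster;
 * in the prototype, points are streamed to the board over USB.      */
static void emit_point(Vec3 p) { (void)p; }

static Vec3 midpoint(Vec3 a, Vec3 b)
{
    Vec3 m = { 0.5f * (a.x + b.x), 0.5f * (a.y + b.y), 0.5f * (a.z + b.z) };
    return m;
}

static float edge_len2(Vec3 a, Vec3 b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

/* View-independent stopping criterion: a fixed world-space size,
 * rather than the Reyes pixel-grid test.                           */
void subdivide_triangle(Vec3 a, Vec3 b, Vec3 c, float point_size)
{
    float longest2 = fmaxf(edge_len2(a, b),
                     fmaxf(edge_len2(b, c), edge_len2(c, a)));
    if (longest2 <= point_size * point_size) {
        /* Small enough: represent the triangle by its centroid. */
        Vec3 p = { (a.x + b.x + c.x) / 3.0f,
                   (a.y + b.y + c.y) / 3.0f,
                   (a.z + b.z + c.z) / 3.0f };
        emit_point(p);
        return;
    }
    /* 1-to-4 split at the edge midpoints, halving every edge. */
    Vec3 ab = midpoint(a, b), bc = midpoint(b, c), ca = midpoint(c, a);
    subdivide_triangle(a,  ab, ca, point_size);
    subdivide_triangle(ab, b,  bc, point_size);
    subdivide_triangle(ca, bc, c,  point_size);
    subdivide_triangle(ab, bc, ca, point_size);
}
```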
23. Prototype Limitations
- Points are transferred via USB 1.1
- Achieved 80,000 points/second (which means 5,120,000 rays/second)
- Each point-cast requires at least 64 reads
- Requires ≈ 17% of memory B/W
- Could easily include 4 or more parallel point-casting units
- Entire design uses ≈ 23% of the chip
24. A Practical PixelView System
- The prototype demonstrates feasibility, but what would a real system entail?
- Improve:
- Scalability
- Field of View
- Subdivision Techniques
- Output Bandwidth
25. Distributing the Frame Buffer
26. Expanding Field of View
- 6 slabs @ 64 × 64 × 1024 × 1024 × (8 + 8) ≈ 64 GBytes
- But 1 MB was huge for a 64K PDP-11
27. Addressing Output Bandwidth
- Currently we can support only a handful of dynamic views out of the framebuffer
- An autostereoscopic display would require every pixel on every frame
- High-speed interconnects are available (> 5 GHz per pin) without compression
28. Improving Subdivision
- Our conservative subdivision methods oversample by a factor of 4 or more after factoring out depth complexity
- Fast, on-the-fly, hardware-friendly, uniform subdivision would be great
29. Conclusions
- PixelView simultaneously supports low latency and complex shading
- PixelView supports a wide range of primitives and IBR data structures
- PixelView is scalable to:
- A full field of view
- High resolutions
- Multi-user environments
- PixelView can power the next generation of display technologies
30. Thank You! Questions?
A View-Independent Graphics Rendering Architecture