GPU - PowerPoint PPT Presentation

Title: NSF CARGO: Multi-scale Topological Analysis of Deforming Shapes APES (Analysis and Parameterization of Evolving Shapes)
Author: george burdell

Slides: 13
Provided by: george720
Title: GPU


1
GPU
  • Precision, Power, Programmability
  • CPU: x60/decade, 6 GFLOPS, 6 GB/sec
  • GPU: x1000/decade, 20 GFLOPS, 25 GB/sec
  • Arithmetic-heavy (read OR write), faster hardware
  • Parallelization
  • Multi-billion entertainment market drives
    innovation
  • 32-bit Floating point
  • Programmable (graphics, physics, general purpose
    data-flow)
  • Can't simply port CPU code to GPU
  • David Luebke et al. GPGPU, SIGGRAPH 2004
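
The two per-decade growth figures are easier to compare as per-year factors. A minimal sketch of that arithmetic follows; the function name `annual_factor` is hypothetical, and only the x60 and x1000 figures come from the slide.

```python
def annual_factor(per_decade: float) -> float:
    """Convert a growth factor per decade into the equivalent factor per year."""
    return per_decade ** (1.0 / 10.0)

cpu = annual_factor(60.0)    # CPU: x60 per decade (figure from the slide)
gpu = annual_factor(1000.0)  # GPU: x1000 per decade (figure from the slide)
print(f"CPU: x{cpu:.2f} per year, GPU: x{gpu:.2f} per year")
# prints: CPU: x1.51 per year, GPU: x2.00 per year
```

Read this way, the slide's claim is roughly that GPU throughput doubled every year while CPU throughput grew by about half each year.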

2
History of the 3D graphics industry
  • 60s
  • Line drawings, hidden lines, parametric surfaces
    (B-splines)
  • Automated drafting & machining for car,
    airplane, and ship manufacturers
  • 70s
  • Mainframes, vector tubes (HP)
  • Software: solids (CSG), ray tracing, Z-buffer
    for hidden lines
  • 80s
  • Graphics workstations ($50K-$1M): frame buffers,
    rasterizers, GL, PHIGS
  • VR: CAVEs and head-mounted displays
  • CAD/CAM & GIS: CATIA, SDRC, PTC
  • Sun, HP, IBM, SGI, E&S, DEC
  • 90s
  • PCs ($2K): graphics boards, OpenGL, Java3D
  • CAD, videogames, animations: AutoCAD, SolidWorks,
    Alias-Wavefront
  • Intel, many board vendors
  • 00s
  • Laptops, PDAs, cell phones: parallel graphics
    chips
  • Everything will be graphics, 3D, animated,
    interactive
  • Nvidia, Sony, Nokia

3
History of GPU
  • Pre-GPU Graphics Acceleration
  • SGI, Evans & Sutherland. Introduced concepts like
    vertex transformation and texture mapping. Very
    expensive!
  • First-Generation GPU (up to 1998)
  • Nvidia TNT2, ATI Rage, Voodoo3. Vertex
    transformation on CPU, limited set of math
    operations.
  • Second-Generation GPU (1999-2000)
  • GeForce 256, GeForce2, Radeon 7500, Savage3D.
    Transformation & Lighting (T&L) in hardware. More
    configurable, still not programmable.
  • Third-Generation GPU (2001)
  • GeForce3, GeForce4 Ti, Xbox, Radeon 8500. Vertex
    programmability, pixel-level configurability.
  • Fourth-Generation GPU (2002-)
  • GeForce FX series, Radeon 9700 and on.
    Vertex-level and pixel-level programmability.

4
Architecture
  • Application
  • Vertex Shader (outputs transformed vertices,
    normals, colors)
  • Geometry Shader
  • Rasterizer (outputs fragments: surfels per pixel)
  • Fragment Shader (reads texture; outputs pixel
    color, depth, stencil)
  • Compositor
  • Display

5
Buffers
  • Color: 8-bit index to color table, float/16-bit
    true color
  • Depth: 24-bit or float (0 at back plane)
  • Back and front: display front, update back, swap
  • Stereo: shutter glasses, HMD. Alternate frames
  • Auxiliary: off-screen working space. Helps reduce
    passes.
  • Stencil: 8 bits (left-over of depth buffer). <, >,
    mask
  • Accumulation: sum, scale (supersampling, blur)
  • P-buffer, superbuffers: render to texture
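
The "display front, update back, swap" cycle of the back and front buffers can be sketched as below. This is an illustration only: the class `DoubleBuffer` and its methods are hypothetical names, and real hardware swaps buffer pointers (or copies) at vertical retrace rather than Python lists.

```python
class DoubleBuffer:
    """Two color buffers: the front one is displayed, the back one is drawn into."""

    def __init__(self, width, height, clear=0):
        self.front = [[clear] * width for _ in range(height)]
        self.back = [[clear] * width for _ in range(height)]

    def draw(self, x, y, color):
        # All rendering goes to the back buffer, never to the displayed one.
        self.back[y][x] = color

    def swap(self):
        # Exchange the roles of the two buffers once the frame is complete.
        self.front, self.back = self.back, self.front

fb = DoubleBuffer(4, 2)
fb.draw(1, 0, 255)
before = fb.front[0][1]  # displayed image is still untouched
fb.swap()
after = fb.front[0][1]   # the new frame becomes visible only after the swap
```

Because drawing never touches the displayed buffer, the viewer never sees a partially rendered frame.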

6
Fragment operations
  • Depth tests: <, <=, >, >=, =, Z in depth interval
  • Stencil test: mask, counter, parity
  • Alpha test: compare to reference alpha
  • Alpha blending: max, min, replace, blend
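
How these tests chain together for a single fragment can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the slides: the function name `process_fragment`, a single-channel color, a "less than" depth comparison where a smaller depth means closer, an alpha test that rejects fragments at or below the reference, and the ordering alpha test, then depth test, then blend. The stencil test is omitted for brevity.

```python
def process_fragment(color_buf, depth_buf, x, y,
                     frag_color, frag_alpha, frag_depth, alpha_ref=0.0):
    """Run one fragment through alpha test, depth test, and blending.

    Returns True if the fragment was written to the buffers.
    """
    # Alpha test: compare the fragment's alpha to a reference value.
    if frag_alpha <= alpha_ref:
        return False
    # Depth test ("less than"): keep only fragments closer than what is stored.
    if not (frag_depth < depth_buf[y][x]):
        return False
    # Alpha blending: weighted mix of the incoming and the stored color.
    dst = color_buf[y][x]
    color_buf[y][x] = frag_alpha * frag_color + (1.0 - frag_alpha) * dst
    depth_buf[y][x] = frag_depth
    return True

color = [[0.0]]
depth = [[1.0]]
first = process_fragment(color, depth, 0, 0, 1.0, 0.5, 0.4)   # passes, blends to 0.5
second = process_fragment(color, depth, 0, 0, 1.0, 1.0, 0.8)  # fails the depth test
```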

7
Data Parallelism in GPUs
  • Data flow: vertices -> fragments -> pixels
  • Parallelism at each stage
  • No shared or static data (except textures)
  • ALU-heavy (multiple ALUs per stage in pipe)
  • Fight memory latency with more computation

8
GPGPU
  • Stream: collection of records (pixels, vertices)
  • Stored in textures (a computational grid)
  • Kernel: function applied to each element in the
    stream
  • Transform, evolve (no dependency between records)
  • Matrix algebra
  • Image/volume processing
  • Physical simulation
  • Global illumination
  • Ray tracing
  • Photon mapping
  • Radiosity
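
The stream and kernel vocabulary can be illustrated on the CPU with a small sketch. The names `run_kernel` and `saxpy_kernel` are hypothetical, and a nested Python list stands in for the texture that holds the stream; on a GPU the kernel invocations would run in parallel, which is possible precisely because no record depends on another.

```python
def run_kernel(kernel, stream):
    """Apply `kernel` independently to every record of a 2D grid (the 'texture')."""
    return [[kernel(record) for record in row] for row in stream]

def saxpy_kernel(record, a=2.0):
    # Each record carries its own inputs; the kernel reads nothing else.
    x, y = record
    return a * x + y

grid = [[(1.0, 0.5), (2.0, 0.0)],
        [(0.0, 3.0), (4.0, 1.0)]]
result = run_kernel(saxpy_kernel, grid)
# result == [[2.5, 4.0], [3.0, 9.0]]
```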

9
Computational Resources
  • Programmable parallel processors
  • Vertex & Fragment pipelines
  • Rasterizer
  • Mostly useful for interpolating addresses
    (texture coordinates) and per-vertex constants
  • Texture unit
  • Read-only memory interface
  • Render to texture (or Copy to texture)
  • Write-only memory interface

10
Vertex Processor
  • Fully programmable (SIMD / MIMD)
  • Processes 4-vectors (RGBA / XYZW)
  • Capable of scatter but not gather (A[i,j] = x)
  • Can change the location of current vertex
  • Cannot read info from other vertices
  • Can only read a small constant memory
  • Vertex Texture Fetch
  • Random access memory for vertices
  • Arguably still not gather
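
Scatter means that the write address is computed from the data, which is what a vertex program does when it moves its vertex. A one-dimensional sketch, with the hypothetical helper `scatter` standing in for the hardware:

```python
def scatter(values, targets, size, fill=0):
    """Write each value to a data-dependent address: out[targets[k]] = values[k]."""
    out = [fill] * size
    for value, target in zip(values, targets):
        out[target] = value  # the element itself decides where its result lands
    return out

scattered = scatter([10, 20, 30], [2, 0, 1], size=4)
# scattered == [20, 30, 10, 0]
```

Note that each element only writes; it never reads what another element produced, matching the "cannot read info from other vertices" restriction.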

11
Fragment Processor
  • May be invoked at each pixel by drawing a full
    screen quad
  • Fully programmable (SIMD)
  • Processes 4-vectors (RGBA / XYZW)
  • Random access memory read (textures)
  • Capable of gather (x = A[i+1,j]) and some scatter
  • RAM read (texture), but no RAM write
  • Output address fixed to a specific pixel
  • But can change that address
  • Typically more useful than vertex processor
  • More fragment pipelines than vertex pipelines
  • Gather
  • Direct output (fragment processor is at end of
    pipeline)
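
Gather is the opposite access pattern: the output address is fixed to the current pixel, while the read addresses are computed freely. The sketch below mimics a full-screen pass in which every output pixel reads itself and its (i+1, j) neighbor. The function names and the clamping at the border are assumptions chosen for illustration.

```python
def gather_kernel(texture, i, j):
    """Value for output pixel (i, j): average of the pixel and its (i+1, j) neighbor."""
    rows = len(texture)
    i_next = min(i + 1, rows - 1)  # clamp at the border (an assumed addressing mode)
    return 0.5 * (texture[i][j] + texture[i_next][j])

def full_screen_pass(texture):
    # One kernel invocation per output pixel; each writes only to its own (i, j).
    return [[gather_kernel(texture, i, j) for j in range(len(texture[0]))]
            for i in range(len(texture))]

src = [[0.0, 2.0],
       [4.0, 6.0]]
dst = full_screen_pass(src)
# dst == [[2.0, 4.0], [4.0, 6.0]]
```

The input is read-only and the result goes to a separate output grid, mirroring "RAM read (texture), but no RAM write".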

12
Branching
  • Not supported or expensive
  • Avoid, replace by math
  • Depth test
  • Stencil test
  • Occlusion query (conditional execution)
  • Pre-computation (region of interest, use to set
    stencil mask)
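
"Avoid, replace by math" usually means turning a condition into a 0/1 weight and blending the two candidate results. A minimal sketch, assuming helpers modeled on the `step` and `lerp` (or `mix`) built-ins of shading languages; the Python versions below are illustrative stand-ins, not a GPU API.

```python
def step(edge, x):
    """Return 1.0 where x >= edge, else 0.0."""
    return float(x >= edge)

def lerp(a, b, t):
    """Linear blend: a when t == 0, b when t == 1."""
    return a * (1.0 - t) + b * t

def select_branchy(x, threshold, low, high):
    # The form to avoid: a data-dependent branch per element.
    if x >= threshold:
        return high
    return low

def select_math(x, threshold, low, high):
    # Same result with no branch: compute a 0/1 weight and blend.
    return lerp(low, high, step(threshold, x))

samples = [-1.0, 0.0, 0.5, 2.0]
branchy = [select_branchy(x, 0.5, 10.0, 20.0) for x in samples]
mathy = [select_math(x, 0.5, 10.0, 20.0) for x in samples]
# branchy == mathy == [10.0, 10.0, 20.0, 20.0]
```

The trade-off is that both candidate values are always computed, which pays off only when they are cheap relative to the cost of a branch.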