Title: GP2: General Purpose Computation using Graphics Processors
1GP2: General Purpose Computation using Graphics Processors
Dinesh Manocha Avneesh Sud
- http://gamma.cs.unc.edu/GPGP
- Spring 2007
- Department of Computer Science
- UNC Chapel Hill
2Instructors
- Dinesh Manocha dm_at_cs.unc.edu 962-1749
- Avneesh Sud sud_at_cs.unc.edu 962-1849
3Class Schedule
- Current time slot: 2:00 - 3:15pm, Mon/Wed, SN011
- Office hours: TBD
- Class mailing list: gpgp_at_cs.unc.edu (??)
7GPGP What kind of course is it?
- Is it a graphics course?
- Is it a system course?
- Is it an application course?
- It is all of them!!
8Is this the right course for me?
- No strict prerequisites
- The course borrows concepts from
  - Computer graphics
  - Linear algebra
  - Numerical computations
  - Architectures: CPUs and GPUs
  - Parallel programming (data parallel programming)
- Applications
  - Geometric computations
  - Database computations
  - Scientific computing and physical simulation
  - Computer vision
9Modern Commodity Processors
[Figure: system block diagram. Labels: GPU (1.3 GHz); Video Memory (768 MB); 2 x 4 MB Cache; System Memory (4 GB); PCI-E Bus (4 GB/s); HyperTransport (20 GB/s)]
10GPUs of Today!
- The GPU on commodity video cards has evolved into an extremely flexible and powerful processor
  - Programmability
  - Precision
  - Power
11GPGP
- The GPU on commodity video cards has evolved into an extremely flexible and powerful processor
  - Programmability
  - Precision
  - Power
- This course will address how to harness that power for general-purpose (non-rasterization) computation
  - Algorithmic issues
  - Programming and systems
  - Applications
12GeForce 7900 302M Transistors (2005)
13GeForce 7900 302M Transistors (OUT OF DATE)
14GeForce 8800 600M Transistors (2006)
15Graphics Processing Units (GPUs)
- Commodity processor for graphics applications
- Massively parallel vector processors
- High memory bandwidth
- Pipeline hides memory latency
- Programmable
- High growth rate
- Power-efficient
16GPU Commodity Processor
[Figure labels: Laptops; Consoles; Cell phones; PSP; Desktops]
17GPU Commodity Processor
[Figure labels: Laptops; Consoles; Cell phones; PSP; Desktops; ???? Supercomputers]
18GPU Commodity Processor
[Figure labels: Laptops; Consoles; Cell phones; PSP; Desktops; ???? iPhone]
19Graphics Processing Units (GPUs)
- Commodity processor for graphics applications
- Massively parallel vector processors
- 10-20x more operations per sec than CPUs
- High memory bandwidth
- Pipeline better hides memory latency
- Programmable
- High growth rate
- Power-efficient
20Parallelism on GPUs
- Graphics FLOPS: GPU 1.3 TFLOPS vs. CPU 25.6 GFLOPS
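To put the two peak figures on this slide side by side, the ratio can be worked out directly. This is only arithmetic on the quoted numbers, not a benchmark; the function name `peak_ratio` is made up for this illustration.

```python
# Peak rates as quoted on the slide (not measured here).
GPU_PEAK_FLOPS = 1.3e12   # 1.3 TFLOPS
CPU_PEAK_FLOPS = 25.6e9   # 25.6 GFLOPS


def peak_ratio(gpu_flops: float, cpu_flops: float) -> float:
    """Return how many times larger the GPU peak rate is than the CPU peak rate."""
    return gpu_flops / cpu_flops


ratio = peak_ratio(GPU_PEAK_FLOPS, CPU_PEAK_FLOPS)
print(f"GPU peak is about {ratio:.1f}x the CPU peak")
```

Peak rates say nothing about sustained performance on a particular algorithm, which also depends on memory traffic.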
21Quad SLI: 1.3 Billion transistors (Jan 2006)
22Graphics Processing Units (GPUs)
- Commodity processor for graphics applications
- Massively parallel vector processors
- High memory bandwidth
- Pipeline better hides latency
- Programmable
- 10x more memory bandwidth than CPUs
- High growth rate
- Power-efficient
23CPU vs. GPU Memory Hierarchy
[Figure: CPU and GPU memory hierarchies side by side. Labels: Core 1, Core 2; FP units; Registers; L1 Dcache; L1 cache; L2 cache; DDR2 RAM; GDDR4 RAM]
24CPU vs. GPU Memory Hierarchy: Broad Level Comparison
[Figure: CPU and GPU memory hierarchies side by side. Labels: Core 1, Core 2; FP units; Registers; L1 Dcache; L1 cache; L2 cache; DDR2 RAM; GDDR4 RAM. Cache annotations: Write through; Write back]
25CPU vs. GPU Memory Hierarchy
[Figure: CPU and GPU memory hierarchies side by side. Labels: Core 1, Core 2; FP units; Registers; L1 Dcache; L1 cache; L2 cache; DDR2 RAM; GDDR4 RAM. Cache size annotations: Very small; Small, 4 MB]
26CPU vs. GPU Memory Hierarchy
[Figure: CPU and GPU memory hierarchies side by side. Labels: Core 1, Core 2; FP units; Registers; L1 Dcache; L1 cache; L2 cache; DDR2 RAM; GDDR4 RAM. Memory bandwidth annotations: High B/W, 86 GB/s; Low B/W, 8 GB/s]
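A rough way to read the two bandwidth figures on this slide (86 GB/s and 8 GB/s) is to ask how long one pass over a buffer takes at each rate. The sketch below assumes 1 GB = 10^9 bytes, takes the higher figure as the GPU side (the earlier slide states that GPUs have about 10x more memory bandwidth than CPUs), and uses a hypothetical 256 MiB buffer. It ignores latency, caching, and bus transfers.

```python
# Bandwidth figures quoted on the slide, assuming 1 GB = 1e9 bytes.
GPU_BANDWIDTH = 86e9  # bytes per second (high-bandwidth side)
CPU_BANDWIDTH = 8e9   # bytes per second (low-bandwidth side)


def stream_time_ms(num_bytes: float, bandwidth: float) -> float:
    """Idealized time, in milliseconds, to read num_bytes once at the given bandwidth."""
    return num_bytes / bandwidth * 1e3


buffer_bytes = 256 * 1024 * 1024  # hypothetical 256 MiB working set

gpu_ms = stream_time_ms(buffer_bytes, GPU_BANDWIDTH)
cpu_ms = stream_time_ms(buffer_bytes, CPU_BANDWIDTH)
print(f"one pass at 86 GB/s: {gpu_ms:.2f} ms")
print(f"one pass at  8 GB/s: {cpu_ms:.2f} ms")
```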
28GFLOPS for GPUs and CPUs
[Figure labels: Graphics-Flops; Giga-Flops]
29Graphics Processing Units (GPUs)
- Commodity processor for graphics applications
- Massively parallel vector processors
- High memory bandwidth
- Pipeline better hides latency
- Programmable
- High growth rate
- Power-efficient (high throughput per watt)
30Computational Power of GPUs
- Why are GPUs getting faster so fast?
  - Arithmetic intensity: the specialized nature of GPUs makes it easier to use additional transistors for computation, not cache
  - Economics: the multi-billion dollar video game market is the killer application that pays for innovation
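Arithmetic intensity can also be read as a property of a kernel: operations performed per byte of memory traffic. The sketch below is a roofline-style bound, a technique brought in here for illustration rather than something the slides prescribe. The peak rate, both example kernels, and the function names are assumptions; only the 86 GB/s bandwidth comes from the slides.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Floating-point operations performed per byte of memory traffic."""
    return flops / bytes_moved


def attainable_flops(intensity: float, peak_flops: float, bandwidth: float) -> float:
    """Upper bound on throughput: limited either by compute or by memory traffic."""
    return min(peak_flops, intensity * bandwidth)


PEAK_FLOPS = 300e9  # assumed peak rate in FLOP/s, for illustration only
BANDWIDTH = 86e9    # bytes/s, the high-bandwidth figure quoted in the slides

# y[i] = a * x[i] + y[i] on 32-bit floats: 2 FLOPs and 12 bytes (read x, read y, write y).
saxpy = arithmetic_intensity(flops=2, bytes_moved=12)
# Hypothetical kernel that performs 64 FLOPs for every 4-byte value it reads.
dense = arithmetic_intensity(flops=64, bytes_moved=4)

saxpy_bound = attainable_flops(saxpy, PEAK_FLOPS, BANDWIDTH)
dense_bound = attainable_flops(dense, PEAK_FLOPS, BANDWIDTH)
print(f"saxpy: {saxpy:.3f} FLOP/byte, bound {saxpy_bound / 1e9:.1f} GFLOP/s")
print(f"dense: {dense:.1f} FLOP/byte, bound {dense_bound / 1e9:.1f} GFLOP/s")
```

Under these assumptions the low-intensity kernel is limited by memory bandwidth and the high-intensity one by the compute peak, which is why ALU-heavy hardware pays off mainly for computation-heavy kernels.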
31GPUs and Computer Architecture
- Current research in computer architecture is looking at
  - Streaming computation
  - Flexible polymorphous computing systems
  - Multi-core architecture
  - Heterogeneous architecture
- More on these topics in the future
33GPUs and Computer Architecture
- Current research in computer architecture is looking at
  - Streaming computation
  - Flexible polymorphous computing systems
  - Multi-core architecture
  - Heterogeneous architecture
- GPU-like architectures have a lot in common with all these research trends!
- We plan to touch on many of these topics as part of the course!
34Is There a Future of GPGPU?
- http://www.informationweek.com/news/showArticle.jhtml?articleID=196800208
  - One of the Five Disruptive Technologies for 2007
- http://www.wired.com/news/technology/computers/0,72090-0.html?tw=wn_index_9
  - Supercomputing's Next Revolution
35Capabilities of Current GPUs
- Modern GPUs are deeply programmable
  - Programmable pixel, vertex, video engines
  - Solidifying high-level language support
- Modern GPUs support 32-bit floating point precision
  - Great development in the last few years
  - 64-bit arithmetic may be coming soon
  - Almost IEEE FP compliant
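What 32-bit precision means in practice can be seen on the CPU by rounding a 64-bit value to 32 bits and back. The sketch below uses Python's standard `struct` module; the helper name `to_float32` is made up for this illustration, and it models IEEE single-precision rounding on the host, not the behavior of any particular GPU.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


big = 16_777_216.0  # 2**24: above this, consecutive integers are no longer all representable

# Adding 1 is lost after rounding to 32 bits, but kept in 64 bits.
lost_in_32_bits = to_float32(big + 1.0) == big
kept_in_64_bits = (big + 1.0) != big

# 0.1 has a different nearest representable value at each precision.
rounding_error = abs(to_float32(0.1) - 0.1)

print(lost_in_32_bits, kept_in_64_bits, rounding_error)
```

This is one reason long accumulations and large coordinate ranges need care when an algorithm is moved to 32-bit hardware.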
36The Potential of GPGP
- The power and flexibility of GPUs make them an attractive platform for general-purpose computation
- Example applications range from in-game physics simulation and geometric applications to conventional computational science
- Goal: make the inexpensive power of the GPU available to developers as a sort of computational coprocessor
- Check out http://www.gpgpu.org
37GPGP Challenges
- GPUs are designed for and driven by video games
  - Programming model is unusual and tied to computer graphics
  - Programming environment is tightly constrained
- Underlying architectures are
  - Inherently parallel
  - Rapidly evolving (even in basic feature set!)
  - Largely secret
  - No clear standards (besides DirectX imposed by MSFT)
- Can't simply port code written for the CPU!
- Is there a formal class of problems that can be solved using current GPUs?
38Importance of Data Parallelism
- GPUs are designed for the graphics and gaming industry
  - Highly parallel tasks
- GPUs process independent vertices and fragments
  - Temporary registers are zeroed
  - No shared or static data
  - No read-modify-write buffers
- Data-parallel processing
  - GPU architecture is ALU-heavy
  - Multiple vertex and pixel pipelines, multiple ALUs per pipe
  - Hide memory latency (with more computation)
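The constraints on this slide (independent elements, no shared or static data, no read-modify-write) can be mimicked on the CPU as a pure per-element function mapped over an input buffer. This is a minimal sketch; `brighten` and `run_kernel` are hypothetical names, and the sequential loop stands in for what a GPU would evaluate in parallel.

```python
def brighten(pixel: float) -> float:
    """Per-element kernel: the result depends only on this element's input."""
    return min(1.0, pixel * 1.5)


def run_kernel(kernel, input_buffer):
    """Apply the kernel to every element and write the results to a new buffer.

    No element reads another element's output and the input is never modified,
    so the elements could be processed in any order, or all at once.
    """
    return [kernel(value) for value in input_buffer]


pixels = [0.2, 0.5, 0.8]
result = run_kernel(brighten, pixels)
print(result)
```

Because the kernel has no side effects, processing the elements in reverse order produces the same values, which is the property that lets the hardware schedule them freely.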
39GPGPU Applications
- Geometric computations
- Database computations
- Scientific computing and physical simulation
- Signal processing
- Computer vision
- Efficient when computation domain is a uniform grid
40Geometric Computations
- Distance computations: data-parallel computation
- Demo (2D)
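Why distance computation is data-parallel can be seen in a brute-force 2D distance field: every grid cell computes its own nearest-site distance without consulting any other cell. The sketch below is a plain CPU version under that assumption, not the method used in the demo; the function name and the example sites are made up.

```python
import math


def distance_field(width: int, height: int, sites):
    """Return field[y][x] = Euclidean distance from cell (x, y) to the nearest site."""
    field = []
    for y in range(height):
        row = []
        for x in range(width):
            # Each cell is independent: on a GPU this would be one fragment.
            row.append(min(math.hypot(x - sx, y - sy) for sx, sy in sites))
        field.append(row)
    return field


sites = [(0, 0), (3, 2)]  # hypothetical point sites on a 4 x 3 grid
field = distance_field(4, 3, sites)
for row in field:
    print(" ".join(f"{d:.2f}" for d in row))
```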
41Geometric Computations
42Geometric Computations
- Collision Detection and Proximity Computations
- GPU: a culling co-processor
[Figure: culling pipeline. Labels: N-Objects; Stage 1 Culling; GPU-Based Culling; Overlap Tests; Distance-Based Culling; Potential Colliding Set; Potential Neighbor Set; Exact Tests; Collision; Distance; CPU; GPU]
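The idea of culling before exact tests can be sketched on the CPU with circles: a cheap, conservative bounding-box overlap test produces a potentially colliding set, and only those pairs go on to the exact test. The bounding-box test here is a stand-in chosen for illustration, not the GPU-based culling from the slide, and all names and example objects are assumptions.

```python
from itertools import combinations


def boxes_overlap(a, b) -> bool:
    """Conservative test: do the axis-aligned bounding boxes of two circles overlap?"""
    (ax, ay, ar), (bx, by, br) = a, b
    return abs(ax - bx) <= ar + br and abs(ay - by) <= ar + br


def circles_collide(a, b) -> bool:
    """Exact test: do two circles (x, y, radius) intersect?"""
    (ax, ay, ar), (bx, by, br) = a, b
    return (ax - bx) ** 2 + (ay - by) ** 2 <= (ar + br) ** 2


def find_collisions(objects):
    """Return (potentially colliding set, exact collisions) as lists of index pairs."""
    pairs = combinations(range(len(objects)), 2)
    potential = [(i, j) for i, j in pairs if boxes_overlap(objects[i], objects[j])]
    exact = [(i, j) for i, j in potential if circles_collide(objects[i], objects[j])]
    return potential, exact


# Hypothetical scene: three nearby circles and one far away.
objects = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0), (10.0, 10.0, 1.0)]
potential, exact = find_collisions(objects)
print("potentially colliding set:", potential)
print("exact collisions:", exact)
```

The culling stage may keep pairs that do not collide, as with the first and third circles here, but it must never discard a pair that does.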
43Geometric Computations
44Geometric Computations
45Database Computations
46Physical Simulation
- Solving PDEs
- Numerical methods
- Linear Algebra
- Reaction-Diffusion Demo
- Fluid Demo
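A grid-based PDE solver maps well to this style of computation because each cell is updated from a small fixed neighborhood of the previous state. The sketch below is one explicit diffusion step on a 2D grid with fixed boundary values; it is a generic textbook stencil, not the solver behind the demos, and the `rate` parameter is an assumed stand-in for the diffusion coefficient times the time step.

```python
def diffuse_step(grid, rate: float = 0.1):
    """Return a new grid after one explicit diffusion step; boundary cells are held fixed."""
    height, width = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # separate output buffer, input is never modified
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            laplacian = (grid[y - 1][x] + grid[y + 1][x] +
                         grid[y][x - 1] + grid[y][x + 1] - 4.0 * grid[y][x])
            out[y][x] = grid[y][x] + rate * laplacian
    return out


# Hypothetical initial state: a single hot cell in the middle of a 5 x 5 grid.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
next_grid = diffuse_step(grid)
for row in next_grid:
    print(" ".join(f"{v:.2f}" for v in row))
```

Reading from one buffer and writing to another matches the no read-modify-write restriction: repeated steps simply swap the roles of the two buffers.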
47Signal Processing
- FFT, DCT, Video Processing
- DCT demo
- Video filtering demo
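For reference, the DCT itself is a sum in which every output coefficient can be computed independently of the others. The sketch below is a direct, unnormalized DCT-II in O(n^2) time, written for clarity only; it is an assumption about which DCT variant is meant and is not the implementation used in the demo.

```python
import math


def dct2(signal):
    """Unnormalized DCT-II: X[k] = sum_i x[i] * cos(pi / n * (i + 0.5) * k)."""
    n = len(signal)
    return [
        sum(signal[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(n)
    ]


constant = dct2([1.0, 1.0, 1.0, 1.0])  # all energy should land in coefficient 0
ramp = dct2([0.0, 1.0, 2.0, 3.0])
print([round(c, 6) for c in constant])
print([round(c, 6) for c in ramp])
```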
48Computer Vision
- Realtime feature tracker (KLT)
50Goals of this Course
- A detailed introduction to general-purpose computing on graphics hardware
- Emphasis includes
  - Core computational building blocks
  - Strategies and tools for programming GPUs
  - Cover many applications and explore new applications
  - Highlight major research issues
51Course Organization
- Survey lectures
  - Instructors, other faculty, senior graduate students
  - Breadth and depth coverage
- Student presentations
52Course Contents
- Overview of GPUs: architecture and features
- Models of computation for GPU-based algorithms
- System issues: cache and data management; languages and compilers
- Numerical and scientific computations: linear algebra computations; optimization; FFT; rigid body simulation; fluid dynamics
- Geometric computations: proximity computations; distance fields; motion planning and navigation
- Database computations: database queries (predicates, booleans, aggregates); streaming databases and data mining; sorting and searching
- GPU clusters: parallel computing environments for GPUs
- Rendering: ray-tracing, photon mapping; shadows
54Student Load
- Stay awake in classes!
- One class lecture
- Read a lot of papers
- 2-3 small assignments
- A MAJOR COURSE PROJECT WITH RESEARCH COMPONENT
55Course Projects
- Work by yourself or as part of a small team
- Develop new algorithms for simulation, geometric problems, database computations
- Formal model for GPU algorithms or GPU hacking
- Issues in developing GPU clusters for scientific computation
- Look into new architecture and parallel programming trends
56Possible Course Projects
- Check the WWW site
- http://gamma.cs.unc.edu/GPGP/projects