Title: Afrigraph Tutorial B: Interactive RayTracing
1 Afrigraph Tutorial B: Interactive Ray-Tracing
- Ingo Wald
- Philipp Slusallek
- Saarland University
- Computer Graphics Group
- http://graphics.cs.uni-sb.de
3
- For almost 20 years, researchers have argued that ray-tracing will eventually become faster than rasterization
- And nothing happened...
- Well, almost...
4 UNC Power Plant (12.5 Mtris, > 10 fps)
5 Four Power Plants (50 Mtris)
6 Tutorial Overview
- Introduction
- Introduction to Ray-Tracing
- Discussion: Ray-Tracing versus Rasterization
- Previous Work
- Approximating Ray-Tracing
- Accelerated Ray-Tracing
- Interactive Ray-Tracing on PCs
- Coherent Ray-Tracing Implementation
- Comparisons (SW / HW)
- Distributed RT of Massive Models
- Outlook: Hardware Architectures for Ray-Tracing
- Future Research and Conclusions
8 Introduction to Ray-Tracing
- In principle: a very simple algorithm
- For each pixel
- Create ray through that pixel
- Cast ray into scene and find closest intersection
- Shade ray at intersection point
- Can also shoot new rays during shading
- Determine visibility of point lights by shadow rays
- Compute reflected/refracted light by recursively tracing reflection/refraction rays
- Basically, that's all
9 Ray-Tracing Algorithm
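The per-pixel loop described on the previous slide can be sketched as follows. This is a minimal illustration, not the tutorial's implementation: the sphere-only scene, the single point light, the diffuse shading term and all function names are assumptions made for the example, and recursive reflection/refraction rays are left out.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(v):
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)

def hit_sphere(origin, direction, center, radius):
    """Nearest ray parameter t > 0 for a unit-length direction, or None."""
    oc = sub(origin, center)
    b = dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-4:  # small offset avoids self-intersection
            return t
    return None

def closest_hit(origin, direction, spheres):
    """Exhaustive search over all spheres (no spatial index in this sketch)."""
    best = None
    for center, radius in spheres:
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, center)
    return best

def trace(origin, direction, spheres, light):
    hit = closest_hit(origin, direction, spheres)
    if hit is None:
        return 0.0  # background
    t, center = hit
    point = tuple(origin[i] + t * direction[i] for i in range(3))
    normal = normalize(sub(point, center))
    to_light = sub(light, point)
    light_dist = math.sqrt(dot(to_light, to_light))
    light_dir = normalize(to_light)
    # Shadow ray: the light is visible if nothing is hit before reaching it.
    shadow = closest_hit(point, light_dir, spheres)
    if shadow is not None and shadow[0] < light_dist:
        return 0.0
    return max(0.0, dot(normal, light_dir))  # diffuse term only

def render(width, height, spheres, light):
    """One primary ray per pixel; camera at the origin looking down -z."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            px = (x + 0.5) / width * 2.0 - 1.0
            py = 1.0 - (y + 0.5) / height * 2.0
            row.append(trace((0.0, 0.0, 0.0), normalize((px, py, -1.0)), spheres, light))
        image.append(row)
    return image
```

Calling `render(5, 5, [((0.0, 0.0, -3.0), 1.0)], (5.0, 5.0, 0.0))` returns a 5x5 grid of brightness values in which only pixels covering the sphere are non-zero.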
10 Introduction to Ray-Tracing
- Only three main components
- Generating rays
- Finding the closest intersection of a ray
- Ray traversal
- Ray-object intersection
- Shading
11 Ray-Generation
- Generate initial ray for each pixel
- Other camera models are trivial
- Fisheye lens
- Non-linear distortions/lens effects
- Motion blur, depth of field
- Options
- More samples for anti-aliasing
- Adaptive sampling
- Combine with IBR
- E.g. RenderCache: reuse samples by reprojection
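A minimal sketch of primary-ray generation with optional supersampling, assuming a pinhole camera at the origin that looks down the negative z axis. The function name, the `fov_deg` parameter and the regular sub-pixel grid are illustrative choices, not taken from the tutorial.

```python
import math

def primary_rays(width, height, fov_deg=60.0, samples=1):
    """Yield (x, y, direction) with samples*samples unit-length rays per pixel."""
    half = math.tan(math.radians(fov_deg) / 2.0)
    aspect = width / height
    for y in range(height):
        for x in range(width):
            for sy in range(samples):
                for sx in range(samples):
                    # Sample position inside the pixel, mapped to [0, 1].
                    u = (x + (sx + 0.5) / samples) / width
                    v = (y + (sy + 0.5) / samples) / height
                    dx = (2.0 * u - 1.0) * half * aspect
                    dy = (1.0 - 2.0 * v) * half
                    n = math.sqrt(dx * dx + dy * dy + 1.0)
                    yield x, y, (dx / n, dy / n, -1.0 / n)
```

Other camera models only change how `(u, v)` is mapped to a direction, which is why they are cheap to add in a ray tracer.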
12 Ray-Traversal
- Need to find objects quickly
- Exhaustive search infeasible
- Build spatial index structure
- Grid, octree, BSP-tree, BVH, ...
- Advantages
- Logarithmic complexity
- Occlusion culling
- Early ray termination
- Problems
- Multiple intersection computations
- (objects often in multiple voxels)
- Dynamic scenes?
[Figures: Grid (2D), Octree (2D)]
13 Ray-Object Intersection
- Need to compute intersections fast
- Requires many floating point operations
- But typically dominated by traversal (2:1)
- Plenty of algorithms
- Plenty of primitives
- Even for triangles
- Optimizations
- Use SIMD CPU extensions (SSE, AltiVec, 3DNow!)
- Data parallel execution
- Proper caching of data
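As one example of the many triangle tests available, here is a sketch of the widely used Möller-Trumbore ray-triangle intersection. It is shown only for illustration; it is not claimed to be the test used in the tutorial's implementation, and the helper names are assumptions.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect_triangle(origin, direction, a, b, c, eps=1e-9):
    """Return (t, u, v) for a hit in front of the origin, else None."""
    e1, e2 = sub(b, a), sub(c, a)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, a)
    u = dot(s, p) * inv         # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv        # distance along the ray
    return (t, u, v) if t > eps else None
```

The early exits on `u` and `v` mean that most missed triangles are rejected before the full computation, which matters because misses are far more common than hits.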
14 Shading
- Lots of reflection models possible
- Phong, Cook-Torrance, Ward, ...
- Direct use of shading languages (RenderMan)
- Shading after visibility has been computed
- No overhead due to overdraw
- Every ray is shaded exactly once
- Can generate new rays
- Shadow, reflection, transmission, ...
- Need to deal with recursion
- Rendering cost linear in rays traced
15 Introduction to Ray-Tracing
- Only three main components
- Generating rays
- Finding the closest intersection of a ray
- Ray traversal
- Ray-object intersection
- Shading
- Problem
- Finding the closest intersection is very expensive
- And: lots of rays per image
16 Rasterization Pipeline
- In contrast: rasterization
- Efficient HW implementation
- Use of object coherence
- Many new features
- Rendering is driven by the application
- Application submits geometry
- Visibility determined at end
- Z-buffer fragment test
[Figure: pipeline stages: Application, T&L / Vertex Ops, Rasterization, Texturing, Fragment Ops, Fragment Tests, Framebuffer]
17 Rasterization: Drawbacks
- Drawbacks of this approach
- Use of object coherence
- Only if triangle is large
- Rendering is driven by the application
- Application has to know what is visible
- Efficient occlusion culling is hard
- Visibility determined at end
- Overdraw: discard all but one fragment
- High depth complexity: very inefficient
18 Ray-Tracing versus Rasterization
- Flexibility
- Handling unstructured groups of rays
- Image-based rendering, reflections, shadows
- Generality
- Ray-Tracing is the basis for many algorithms
- Global illumination, visibility, ...
- Used in many disciplines
- Physics, Biology, Chemistry, Telecom, ...
19 Ray-Tracing versus Rasterization
- Simple and Efficient Shading
- Shading happens after visibility computation
- Direct use of Shading Languages
- Correctness and Image Quality
- Rasterization inherently relies on approximations
- Environment maps, shadow maps, ...
- Ray-traced images are correct by default
- True reflections and shadows
- Use of approximations is optional
20 Ray-Tracing versus Rasterization
- Parallel Scalability
- Ray-Tracing is embarrassingly parallel
- (e.g. each pixel independent of all others)
- Scales well with the available hardware
- Needs fast access to scene database
21 Ray-Tracing versus Rasterization
- Scalability with Scene Size
- Occlusion Culling and Logarithmic Complexity
- RT never even looks at invisible geometry
- RT traversal allows for efficient searching: O(log N)
- Rasterization shows linear behavior: O(N)
- => RT wins for complex scenes
- But rasterization is improving
22 Ray-Tracing versus Rasterization
- Coherence
- Key to efficient rendering
- Rasterization: object coherence
- Allows for efficient HW implementation
- But only really efficient for large triangles
- Ray-Tracing: ray coherence
- Improved caching, reduced bandwidth
- Allows for data parallel computation
- RT has much more coherence than assumed
- But harder to exploit
23 Ray-Tracing versus Rasterization
- Conclusion of that Comparison
- Ray-Tracing has many advantages
- These advantages become ever more pronounced
- Not only quality, also efficiency
- But Ray-Tracing is (still) costly
- Have to make it faster!
25 Previous and Related Work
- Two ways to achieve ray-tracing-like quality interactively
- Trace fewer rays per frame: approximate ray-tracing
- Rasterization hardware
- Image-based techniques
- Interpolation of ray-traced results
- Trace more rays/sec: accelerated ray-tracing
- Better data structures
- Better algorithms
- Better implementations
- Parallel processing
27 Approximated Ray-Tracing: Rasterization Hardware
- HW-Accelerated vista/shadow buffers
- Compute visible geometry in HW
- Lookup of geometry in frame buffer
- Only works for primary rays and point lights
- Creates artifacts (e.g. shadow buffer resolution)
- Augmenting hardware with RT effects
- Selective ray-tracing
- Integrate ray-tracing with OpenGL rendering
- Rasterization for diffuse objects
- Textures or splatting [Stamminger/Haber 00/01] for ray-traced samples
28 Approximated Ray-Tracing: Corrective Textures
29 Approximated Ray-Tracing: Image-Based Techniques
- RenderCache [Walter et al. 99]
- Store ray samples per pixel (color, depth, ...)
- Reproject samples for next frame
- Detect and fill holes by sending a few new rays
- Heuristic algorithms based on neighborhood
- Locate and correct errors (shadow, etc.)
- Pseudo-randomly sample a few other pixels
- Adaptively sample near error regions
- But: reprojection and heuristics are expensive
- Pays off (only) when pixels are very expensive to compute directly (e.g. global illumination)
- Scales badly with CPUs
30 Approximated Ray-Tracing: Image-Based Techniques
- Holodeck [Ward 98]
- Similar to RenderCache, but
- Long term storage of ray samples on disk
- Fast access to samples based on grid structure
- Builds light-field-like data representation
31 Approximated Ray-Tracing: Image-Based Techniques
- Interpolation in the image plane
- Pixel-selected ray-tracing [Akimoto 89]
- Coarse sampling grid
- Adaptive refinement based on error criteria
- Linear interpolation between samples
- General ray interpolation [Bala 99]
- Object-/Ray-/Image-Space
- Time
- Error bounded
33 Accelerated Ray-Tracing: Better Data Structures/Algorithms
- Best data structure (Grid vs. BSP vs. ...)?
- Always scene and implementation dependent
- In practice, most do about equally well
- Well-researched topic => new data structures are unlikely to be found
- But: potential for better algorithms
- Can we better exploit coherence?
- Can we build data structures faster?
- Can we build data structures fully automatically?
- Also: need for dynamic data structures
34 Accelerated Ray-Tracing: Parallelization on Supercomputers
- RT of large CSG models [Muuss 95]
- Motivation: interactively render complex data sets
- Idea: use ray-tracing
- Flexibility: avoid tessellation of CSG models
- Take advantage of logarithmic complexity of RT
- Exploit parallelism
- Implementation
- Optimized, general RT algorithm
- 96 CPUs, SGI PowerChallenge, shared memory
- Results
- 1-2 frames per second at video resolution (in 95!!!)
35 Accelerated Ray-Tracing: Parallelization on Supercomputers
- Utah Parallel RT System [Parker 99]
- Similar approach to Muuss
- Parallelization on shared memory machine
- Supports general primitives and volume data sets
- Results
- Has shown scalability up to 128 CPUs
- Importance of caching analysis
- New goal: interactive visual cues for visualization (same information at less cost)
37 IRT on PCs: What to keep in mind
- PC hardware has changed dramatically
- Processors have become much faster
- But increase in ray-tracing speed is gradual
- Increasing gap between speed of CPU and memory
- But ray-tracing algorithm did not change
- SIMD extensions
- Flops become increasingly cheap
- But difficult to take advantage of in ray-tracing
- Fast (and cheap) networking: network of PCs
- But good performance on non-shared-memory is hard
- Small clusters are around everywhere
38 IRT on PCs: What to keep in mind
- PC hardware has changed dramatically
- Have to adapt our algorithms!
- Special emphasis on
- Keeping the CPU busy
- Memory and caching (1 cache miss can cost several triangle intersections)
- SIMD
- Not so important any more
- Instruction count, avoiding float ops
39 General Optimizations: Cache
- Main memory is too slow for CPU (1:10)
- (bandwidth and latency)
- Keep relevant data in caches
- Design algorithms for cache reuse => coherence
- Align data to cache lines (32 bytes)
- Separate data according to usage
- Separate volatile from non-volatile data
- Store intersection data separate from shading data (e.g. shading normals not needed for intersection)
- Prefetch data
- Design algorithms to enable data access prediction
40 General Optimizations: Cache
- Cache Reuse Example: Triangle Data Structure
- Variant 1: struct Triangle { Vec3f *a, *b, *c; }
- Intersect() routine works on this structure
- Prefetching hard (2 levels of indirection)
- Data stored in 4 different memory regions
- (1 struct + 3 vectors)
- Worst case: 8 cache misses
- (if each of the 4 data items straddles a cache-line border)
41 General Optimizations: Cache
- Cache Reuse Example: Triangle Data Structure
- Variant 2: with preprocessed intersection data
- All necessary data packed into 48 aligned bytes (see paper)
- Con: additional data to store (48 bytes/triangle)
- But several advantages
- At most 2 cache misses
- 1 contiguous memory region => trivial to prefetch
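The "at most 2 cache misses" claim can be checked with a small sketch. The record layout used here (three vertices plus a geometric normal, 12 floats) is a hypothetical stand-in chosen only because it also fills 48 bytes; the actual preprocessed layout is the one described in the paper. The sketch also assumes the buffer starts on a cache-line boundary.

```python
import struct

# Hypothetical 48-byte record: 3 vertices + 1 normal = 12 single-precision floats.
TRIANGLE = struct.Struct("<12f")
CACHE_LINE = 32  # bytes, the cache-line size quoted on the earlier cache slide

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def pack_triangles(triangles):
    """Store all triangles back to back in one contiguous buffer."""
    buf = bytearray(TRIANGLE.size * len(triangles))
    for i, (a, b, c) in enumerate(triangles):
        normal = cross(sub(b, a), sub(c, a))
        TRIANGLE.pack_into(buf, i * TRIANGLE.size, *a, *b, *c, *normal)
    return buf

def load_triangle(buf, i):
    """One offset computation, no pointer chasing."""
    f = TRIANGLE.unpack_from(buf, i * TRIANGLE.size)
    return f[0:3], f[3:6], f[6:9], f[9:12]

def cache_lines_touched(i):
    """Number of 32-byte lines covered by record i in a line-aligned buffer."""
    first = (i * TRIANGLE.size) // CACHE_LINE
    last = (i * TRIANGLE.size + TRIANGLE.size - 1) // CACHE_LINE
    return last - first + 1
```

Because 48-byte records start at offsets that are multiples of 16, every record covers exactly two 32-byte lines, and the address of record `i + 1` is known in advance, which is what makes prefetching trivial.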
42 General Optimizations: Cache
- This was only one example. Similarly for
- BSP Nodes (even more important)
- Triangle lists
- Materials
- Shading Data
43 General Optimizations: Simplification
- Today's CPUs have very long pipelines
- Simplify the code to avoid pipeline stalls
- Choose simple algorithms
- KISS wins (KISS = keep it simple and stupid)
- E.g. BSP-tree traversal simpler than grids
- Easier to maintain and optimize (e.g. prefetching)
- Write tight inner loops
- E.g. better caching and handling of branches
- Avoid conditionals/relative jumps in inner loops
- E.g. support only triangles
- Avoid memory-access stalls
- => Caching, caching, caching!!!
44 Optimization: SIMD Extensions
- Most CPUs provide SIMD extensions
- Intel SSE (others: 3DNow!, AltiVec, ...)
- Use SIMD: higher speed and lower bandwidth
- Up to four parallel floating point operations
- => For the cost of 1!
- Fetch data once to reduce bandwidth to cache
- Amortize loading cost over 4 operations
- => Factor 4 in bandwidth reduction
- Overhead due to restricted instruction set
- E.g. no SSE dot product
- Con: programming in assembly language
45 Optimization: SIMD Extensions
- How to use SIMD extensions?
- Either: instruction-parallel
- Combine 4 computations in normal algorithm
- E.g. the 4 mults in a dot product
- Or: data-parallel
- Run algorithm on 4 different data in parallel
- E.g. 4 independent dot products
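The two ways of using a 4-wide SIMD unit can be contrasted in a small sketch. Plain Python lists stand in for SSE registers here, so this only illustrates the data layout and the order of operations; it is not real SIMD code, and the function names are made up for the example.

```python
def dot_instruction_parallel(a, b):
    """One dot product: a and b are 4-lane 'registers' (x, y, z, padding).

    The four multiplies map to one SIMD multiply, but the lanes must then be
    summed horizontally, which SSE has no single instruction for.
    """
    products = [a[lane] * b[lane] for lane in range(4)]   # one SIMD multiply
    return products[0] + products[1] + products[2] + products[3]

def dot4_data_parallel(ax, ay, az, bx, by, bz):
    """Four dot products at once: each argument holds one component of four
    independent vectors (structure-of-arrays layout).

    Every step is a vertical operation on whole registers, so no horizontal
    add is needed and all four lanes do useful work.
    """
    return [ax[lane] * bx[lane] + ay[lane] * by[lane] + az[lane] * bz[lane]
            for lane in range(4)]
```

The data-parallel form is the one that carries over to tracing four rays against one triangle: the scalar algorithm stays unchanged and is simply executed on four inputs in lock-step.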
46 SIMD Intersection
- SIMD best used in data parallel fashion
- Little instruction-level parallelism (in RT)
- => Just doesn't work
- Data parallel: 1 ray against 4 triangles
- Hard to always have four triangles ready
- Data parallel traversal for 1 ray?
- Data parallel: 4 rays against 1 triangle
- Must traverse rays in parallel => ray packets
- Standard intersection code
- Overhead for terminated rays (e.g. 1 ray hits, 3 rays miss)
47 SIMD Intersection
- Performance Results
- Comparison against already optimized C code
- Amortized cost for SSE code
- => 20-36 million intersections/sec! (P-III, 800 MHz)
48 SIMD BSP-Traversal
- Recursive Traversal Algorithm
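A minimal sketch of recursive front-to-back traversal for a single ray, assuming an axis-aligned BSP tree (kd-tree). The tuple-based node encoding and the function name are assumptions made for this example; the slide's original figure is not reproduced here.

```python
# Node encoding (hypothetical):
#   ("leaf", items)
#   ("inner", axis, split, left_child, right_child)   # left: coord < split

def traverse(node, origin, direction, t_min, t_max, visited):
    """Append the contents of every leaf the ray segment [t_min, t_max]
    passes through to `visited`, in front-to-back order."""
    if node[0] == "leaf":
        visited.append(node[1])
        return
    _, axis, split, left, right = node
    o, d = origin[axis], direction[axis]
    # The child containing the ray origin is visited first.
    below_first = o < split or (o == split and d <= 0.0)
    first, second = (left, right) if below_first else (right, left)
    if d == 0.0:
        # Ray parallel to the split plane: it never changes sides.
        traverse(first, origin, direction, t_min, t_max, visited)
        return
    t_plane = (split - o) / d
    if t_plane > t_max or t_plane <= 0.0:
        traverse(first, origin, direction, t_min, t_max, visited)
    elif t_plane < t_min:
        traverse(second, origin, direction, t_min, t_max, visited)
    else:
        traverse(first, origin, direction, t_min, t_plane, visited)
        traverse(second, origin, direction, t_plane, t_max, visited)
```

A real ray tracer would intersect the triangles of each leaf as it is reached and stop as soon as a hit inside the current segment is found (early ray termination); the sketch only records the visiting order.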
49 SIMD BSP-Traversal
- SIMD-Traversal
- Traverse four rays in parallel
- Intersection with split plane, traversal decision
- Combine decision flags
- All rays must perform the same traversal
- Make sure order is consistent
- Easy to guarantee: same ray origin or same signs of direction vector
- Avoid recursion and function calls
- Maintain stack manually
- Worst case as bad as before
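The packet variant can be sketched as follows, under the assumptions that all rays in a packet share the sign of their direction on every axis and that the tree uses the hypothetical node encoding `("leaf", items)` / `("inner", axis, split, left, right)`. Python lists again stand in for SIMD registers and masks; this illustrates the control flow, not the tutorial's SSE code.

```python
import math

def traverse_packet(root, origins, dirs, t_near, t_far):
    """Traverse all rays of a packet in lock-step with a manual stack.

    Returns a list of (leaf_items, active_mask) in visiting order, where
    active_mask[i] tells whether ray i still has a non-empty segment there.
    """
    n = len(origins)
    visited = []
    stack = [(root, list(t_near), list(t_far))]
    while stack:
        node, tn, tf = stack.pop()
        while node[0] == "inner" and any(tn[i] < tf[i] for i in range(n)):
            _, axis, split, left, right = node
            # Shared direction sign fixes one traversal order for all rays.
            near_is_left = dirs[0][axis] >= 0.0
            near, far = (left, right) if near_is_left else (right, left)
            d = []
            for o, v in zip(origins, dirs):
                if v[axis] != 0.0:
                    d.append((split - o[axis]) / v[axis])
                else:  # parallel ray stays on whichever side it starts on
                    on_near = (o[axis] < split) == near_is_left
                    d.append(math.inf if on_near else -math.inf)
            active = [tn[i] < tf[i] for i in range(n)]
            # Combine the per-ray decisions into two flags for the packet.
            go_near = any(active[i] and d[i] > tn[i] for i in range(n))
            go_far = any(active[i] and d[i] < tf[i] for i in range(n))
            far_tn = [max(tn[i], d[i]) for i in range(n)]
            near_tf = [min(tf[i], d[i]) for i in range(n)]
            if go_near and go_far:
                stack.append((far, far_tn, tf))
                node, tf = near, near_tf
            elif go_near:
                node, tf = near, near_tf
            else:
                node, tn = far, far_tn
        if node[0] == "leaf":
            visited.append((node[1], [tn[i] < tf[i] for i in range(n)]))
    return visited
```

Rays whose segment becomes empty in a subtree are carried along as inactive lanes, which is the overhead the slide refers to: in the worst case the packet visits the union of the nodes its rays would have visited individually.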
50 SIMD BSP-Traversal
- Overhead of SIMD-Traversal (in %)
- Fixed resolution at 1024^2 (left), fixed 2x2 packet (right)
- Traversal still dominates rendering cost
- Overall speedup: factor 2 to 2.3
51 Coherent Algorithm: Tracing Ray Packets
- Many rays are very similar
- E.g. primary and shadow rays, but others too
- Handle rays together in packets of 4 rays
- Process them in lock-step (=> SIMD)
- Reorder computations to be partly breadth-first
- Load data once and use it for all rays
- Reduces memory bandwidth (e.g. SSE: factor 4!)
- Increases cache utilization
- Coherence increases with image resolution
- More rays in same view frustum
52 Coherent Algorithms: Shading
- SIMD Phong-Shading
- Fixed cost per image
- Rearrange data from ray packets
- Different depth: non-coherent shadow rays
- Different materials: different shaders
- Algorithm
- Parallel shadow rays to light sources
- SIMD shading using shadow flags
- Constant shading and texturing cost (< 10%)
- Procedural shading is easy (noise)
53 Coherent Ray-Tracing: Summary
- Speedup
- Prerequisite: expose coherence in ray-tracing algorithm
- Factor > 5: general optimizations
- Factor > 2: SIMD computations
- Further optimizations are possible
- Better prefetching, more efficient shading
- Performance
- 200K to 1.5M primary rays/s (800 MHz, P-III)
- Almost linear in number of reflection and shadow rays
54 Comparison: Test Scenes
55 Comparison: Software Ray-Tracers
- Time per primary ray (1 CPU, 512^2, in µs)
- Main memory: RTRT 256 MB, others up to 1 GB
- Rayshade: best grid resolution
56 Comparison: OpenGL Hardware
- Frame rate with SGI Performer (512^2, fps)
- HW: Octane V8, Onyx3/IR3, GeForce II GTS
- CPUs: Onyx 8, nVidia 2, RTRT 1
57 Comparison: Scaling with Scene Size
- Render time of subsampled terrain (spf, seconds per frame)
- Typical linear scaling of rasterization HW
- Worst case for RT: no occlusion
- Only 1 CPU!
59 Distributed RT of Massive Models
60 Reference Model (12.5 Mtris)
61 Previous Work
- Rendering of Massive Models [Aliaga 99]
- Frame rate: 5 to 15 fps for single power plant
- Needs shared-memory supercomputer (SGI)
- Framework of algorithms
- Textured depth meshes (96% reduction in tris)
- View-frustum culling and LOD (50% each)
- Hierarchical occlusion maps (10%)
- Extensive preprocessing required
- Entire model: 3 weeks (estimated)
- Only semi-automatic
62 Distributed RT of Massive Models
- Ray-Tracing and massive models just match
- Logarithmic scaling in primitives
- Ideal for big models
- Preprocessing
- Simple and fast spatial sorting, fully automatic
- Distributed computing
- Parallel scalability to many networked computers
- No scene replication
- => Our approach: use coherent ray-tracing
- Caching of scene data in network
- Deal with network issues by reordering
63 Ray-Tracing Issues
- Distributed Scene Management
- Several GB of scene data
- File size and virtual address space (32 bit)
- Cannot use OS caching (demand paging)
- Cache miss will stall the entire process
- 1 ms network latency = time to trace several hundred rays
- Reordering would need non-blocking memory read
- Need to handle cache manually
- No longer limited by address space
- Allows reordering of computations
- Do not wait for missing data
- Continue with other rays while data is being fetched
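The idea of continuing with other rays while a fetch is outstanding can be sketched with a toy scheduler. Everything here is an assumption made for illustration: rays are reduced to (ray id, voxel id) pairs, time advances in abstract steps, and a fetch simply completes after `fetch_latency` steps. The actual system's networking and cache policy are not modeled.

```python
from collections import deque

def trace_with_reordering(rays, cache, fetch_latency=2):
    """Process rays without ever blocking on a cache miss.

    rays:  list of (ray_id, voxel_id) pairs
    cache: set of voxel ids already resident on this client (updated in place)
    Returns the ray ids in the order in which they were completed.
    """
    ready = deque(rays)
    pending = {}     # voxel_id -> step at which its data arrives
    suspended = {}   # voxel_id -> rays waiting for that voxel
    done = []
    step = 0
    while ready or suspended:
        # Deliver fetches that have completed and wake up their rays.
        for voxel in [v for v, arrival in pending.items() if arrival <= step]:
            del pending[voxel]
            cache.add(voxel)
            ready.extend(suspended.pop(voxel))
        if ready:
            ray_id, voxel = ready.popleft()
            if voxel in cache:
                done.append(ray_id)                  # trace immediately
            else:
                if voxel not in pending:
                    pending[voxel] = step + fetch_latency   # non-blocking request
                suspended.setdefault(voxel, []).append((ray_id, voxel))
        step += 1
    return done
```

With demand paging, the first miss would stall the whole process for the full latency; here the miss only delays the rays that actually need the missing voxel.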
64 Massive Models: Caching
- 2-Level BSP-Trees
- Caching based on voxels
- Voxels are completely self-contained
65 Structure of the BSP-Tree
66 Distribution Issues
- Preprocessing
- Simple spatial sorting
- Need out-of-core algorithm due to model size
- Simplistic implementation: 2.5 hours
- Estimated with optimizations: < 30 min
- Model Server
- Single server provides all model data
- Potential bottleneck
- Should be distributed as well
- At least for more than 10 clients
- Trivial to implement
67 Distribution Issues
- Load Balancing
- Tile-based (32x32 pixels)
- Demand-driven
- Avoid idle times
- Prefetching tiles
- Asynchronous communication
- Frame-to-Frame Coherence
- Keep rays on the same client
- Simple: keep tiles on the same client
- Better: assign tiles based on reprojected pixels
- Larger effective cache size
- Increases with number of clients
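Tile-based, demand-driven load balancing can be sketched as a small simulation. The tile size of 32x32 pixels is from the slide; the per-tile cost values, the function names and the use of a priority queue to model "whichever client becomes idle first asks for the next tile" are assumptions for illustration. Tile prefetching and frame-to-frame tile affinity are not modeled.

```python
import heapq

def make_tiles(width, height, tile=32):
    """Split the image into (x, y, w, h) tiles; border tiles may be smaller."""
    return [(x, y, min(tile, width - x), min(tile, height - y))
            for y in range(0, height, tile)
            for x in range(0, width, tile)]

def schedule(tile_costs, num_clients):
    """Hand out tiles on demand: the client that becomes idle first gets the next one.

    tile_costs: rendering time of each tile, in arbitrary units
    Returns (tiles assigned to each client, time at which the frame is done).
    """
    idle_at = [(0.0, client) for client in range(num_clients)]
    heapq.heapify(idle_at)
    assigned = [[] for _ in range(num_clients)]
    finish = 0.0
    for index, cost in enumerate(tile_costs):
        t, client = heapq.heappop(idle_at)      # next client asking for work
        assigned[client].append(index)
        heapq.heappush(idle_at, (t + cost, client))
        finish = max(finish, t + cost)
    return assigned, finish
```

For tile costs `[4, 1, 1, 1, 1]` on two clients, the expensive tile keeps one client busy while the other renders the four cheap ones, so both finish at time 4; a static half-and-half split would take longer.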
68 Results
- Setup
- Seven dual Pentium-III 800-866 MHz
- FastEthernet (100 Mbit) for normal clients
- Gigabit Ethernet only for display and model server
- Performance for one Power Plant
- 4-5 fps without SSE optimization
- Factor 2 speedup with SSE
- Almost perfect scaling from 1 to 14 CPUs
- Never tried any more than that
69 Animation: Frame Rate vs. Bandwidth
70 Speedup
73 Ray-Tracing Hardware
- Summary so far
- RT has many technical advantages
- Better performance for large scenes (log N vs. N)
- Better image quality, more features
- But: high initial cost on main CPU
- => Hardware support would help
74 Ray-Tracing Hardware: Why today?
- The setting has changed
- Real scenes aren't suited for rasterization any more
- High depth complexity
- Large scenes, small triangles
- Shading becomes more expensive
- Demand for more features (shading, programmability)
- Advantages of ray-tracing finally come into play
- Also: flops aren't that expensive any more
- Number of gigaflops per GeForce?
- Neither is memory
75 Ray-Tracing Hardware: Previous Work
- Over the last decade: several research systems
- Often suffered from lack of resources
- Memory and flops too expensive 10 years ago
- Offline ray-tracing: AR250 (ART)
- Accelerated offline rendering, bandwidth limited
- Volume ray-casting systems
- Full volume ray casting on a chip
- Many, some already commercially successful
76 Ray-Tracing Hardware: The SHARP Architecture
- SHARP architecture [Tim Purcell, Stanford]
- Mixed SW/HW approach
- Based on SmartMemories [Mai 00]
- Multiprocessor on a chip
- Roughly 64 R10k, with 8 GB/s (!) memory bandwidth
77 Ray-Tracing Hardware: The SHARP Architecture
- Conclusions from SHARP (also see Siggraph 2001, Course 13)
- Simple caching works very well
- Good ray coherence
- Off-chip bandwidth is minimal
- Simple memory access design
- Performance (512x512)
- Conference scene: 50 fps
- Reconfigurability allows adapting to demands
- Adapt number of shading/traversal units to scene
78 Ray-Tracing Hardware: Other Architectures
- RAYA (MERL, Siggraph 2001, Course 13)
- Based on Memory Coherent Ray-Tracing [Pharr]
- CORA (Saarbrücken)
- Hardware version of Coherent RT Algorithm
- Custom-design chip
- Est. performance: 30/25 fps at 1024x768
- Cruiser: 3.5 Mtris, 2 lights
- BunnyQuake: 110 Ktris, 2 lights, 3 reflection levels
80 What you should take home with you
- Interactive Ray Tracing IS feasible
- If attention is paid to the underlying hardware
- It's not only feasible, it's already there
- Not only a theoretical fantasy any more
- And even on cheap PCs
- Not only better, it can even be faster
- At least for certain applications
81 The Future
- IRT enables completely new applications
- Just think what has been done with OpenGL
- Large-scale visualization, engineering, ...
- Handling of huge models
- Interactive global illumination (?)
- Need to adapt algorithms to new situation
- Flexible rendering
- Gaze tracking and non-uniform sampling density
- Image-Based or Frameless rendering
- Question: What can IRT do for you?
82 Open Research Problems
- Can we make it even faster?
- Hardware
- What is the best HW architecture?
- Dynamic Scenes
- Optimized rebuild or transformation of index?
- API
- Better alternative to OpenGL's push model?
- OpenGL not suited for Ray-Tracing
- Global Illumination
- Efficient new algorithms
83 Acknowledgements
- AMD
- Generous support, sponsoring and collaboration
- Soon: 24-node dual-Athlon IV, 1.5 GHz cluster
- Presenters of the Siggraph 2001 Course 13
- Images, material, and information
- Tim Purcell and Pat Hanrahan (Stanford)
- Many discussions and ideas
- The Max-Planck-Institute at Saarbruecken
- Collaboration and use of their Graphics Hardware
- C. Benthin, M. Wagner and others
- Work on the RT implementation and discussions
84 Links
- mailto:wald@graphics.cs.uni-sb.de
- For any questions or comments
- http://graphics.cs.uni-sb.de/rtrt
- The Saarland University Real-Time Ray-Tracing Project
- http://graphics.cs.uni-sb.de/pub/afrigraph01
- Tutorial Notes (Slides, Papers)
- http://www.openrt.de
- The OpenRT Interactive Raytracing API (not yet online)
85 The Future
- Applications on compute clusters
- Visualization of large models
- Previewing of animations with full shading
- Hardware support for IRT
- At least for specialized applications
- Convergence between RT and TR
- Occlusion culling
- Improved shading capabilities
- Eventually based on the same API?
86 Open Research Problems: Global Illumination
- New situation
- Ray-tracing bottleneck is gone (well, almost)
- New challenges
- Need for coherence
- Efficient computations
- Usage of view-importance
- High degree of parallelism
- Small communication overhead
- Interactivity!!!
- Can we trade quality for speed?