Title: Scalability
1Scalability
- Advanced D3D Programming
- Richard Huddy
- RichardH_at_nvidia.com
2Basic Objectives
- To produce the best experience on every user's machine
- To exploit all of the resources available
- To cope with a broad spread of hardware
- To avoid bottoming out during the shelf-life of the game / engine
3What is a high-end PC?
- A 125 mega-texel device
- A 125 mega-pixel device
- A fast CPU (> 350MHz)
- AGP 2X/4X Bus
- Lots of system RAM (> 64MB)
- Huge frame buffers (16 to 32 MB)
- Multi-Texture at low cost
4Power Trends
[Chart: CPU Speed, Fill Rate, ?]
Appreciate the absolute values and the ratios.
5So what's the problem?
[Diagram: two frame timelines running from BeginScene() to EndScene(). Each shows a CPU track issuing batches A, B, C and a Graphics track rendering a, b, c over time.]
- Second generation hardware
- Third generation hardware: "Wow, 10% faster!"
6What can you do to help?
- Scalability is the key
- Run at higher screen resolutions
- Run at higher color depths
- Use more complex rendering techniques on good hardware
- Ship multiple geometry models
- Protect your CPU
- Unlock the frame rate
7Higher Screen Resolutions
- 1) Include direct support for higher resolution modes (uses lots of disk space).
- 2) Store high resolution art and filter down to produce lower resolution art.
- 3) Store low resolution art and pixel double
- If you have art at 512x384 use it for 1024x768
- If you have art at 640x480 use it on 1280x1024
- (but only use a 1280x960 viewport)
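The pixel-doubling arithmetic above can be sketched in C. This is a minimal illustration, not code from the talk: `Viewport` and `fit_art_to_mode` are hypothetical names, and centring the scaled art inside the display mode is an assumption.

```c
#include <assert.h>

/* A rectangle inside the display mode, in pixels (hypothetical type). */
typedef struct {
    int x, y;          /* top-left corner of the viewport */
    int width, height; /* size of the scaled art */
    int scale;         /* whole-number pixel replication factor, 0 if the art does not fit */
} Viewport;

/* Picks the largest whole-number scale at which art_w x art_h fits inside
   mode_w x mode_h. Centring the result is an assumption of this sketch. */
Viewport fit_art_to_mode(int art_w, int art_h, int mode_w, int mode_h)
{
    Viewport vp = {0, 0, 0, 0, 0};
    int scale_x, scale_y, scale;

    assert(art_w > 0 && art_h > 0);
    scale_x = mode_w / art_w;
    scale_y = mode_h / art_h;
    scale = scale_x < scale_y ? scale_x : scale_y;
    if (scale < 1)
        return vp; /* the art is larger than the mode */

    vp.scale = scale;
    vp.width = art_w * scale;
    vp.height = art_h * scale;
    vp.x = (mode_w - vp.width) / 2;
    vp.y = (mode_h - vp.height) / 2;
    return vp;
}
```

For 640x480 art in a 1280x1024 mode this yields a 1280x960 viewport at scale 2, leaving 32 unused rows above and below.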
8Higher Color Depths
- Runs at much the same speed but gives the user a much richer experience
- Uses frame buffer memory constructively
- You can re-use the previous 16 bit assets
- The main performance loss in true color is often
due to texture management
But beware the Frame Buffer depth = Z Buffer depth constraint on Riva TNT
9Complex Rendering Techniques - I
- Environment Mapping
- Beware of spending too much CPU on this.
- Dual Texture Lighting
- Bump Mapping
- Use more alpha transparency
- But see also Alpha sort issues later on
- Please try to use the extra fill rate!
10Complex Rendering Techniques - II
- Trilinear mipmapping for almost everything
- Use Detail textures
- Large textures for extra realism
- 32 bit textures - where it's a quality win
- Compressed textures as long as quality is not
compromised
11Protect your CPU
- The big ones
- __ftol and other type conversion nightmares
- sqrt()
- that'll be seventy cycles please...
- Reciprocal square root
- One hundred and nine cycles through the FPU
- Transform and lighting (more on that later)
12Removing __ftol
- Remember that the compiler doesn't have a choice, but you can check the output
- Write your own inline assembler conversion routine if
- You can accept differing rounding rules
- This doesn't break the optimiser!
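As a rough C stand-in for the inline assembler routine mentioned above (not the author's routine), the sketch below converts a float to an integer without a truncating cast by adding a large constant and reading the low mantissa bits. `fast_round_to_int` is a hypothetical name. The sketch assumes IEEE 754 doubles, the default round-to-nearest mode, and inputs well inside the 32-bit integer range. It rounds to nearest (ties to even) rather than truncating, which is exactly the "differing rounding rules" trade-off.

```c
#include <stdint.h>
#include <string.h>

/* Converts a float to the nearest integer without a truncating cast.
   Assumes IEEE 754 doubles and the default round-to-nearest mode. */
int32_t fast_round_to_int(float value)
{
    /* 1.5 * 2^52: after the addition, one unit in the last place equals 1.0,
       so the rounded integer sits in the low bits of the mantissa. */
    const double magic = 6755399441055744.0;
    double biased = (double)value + magic;
    uint64_t bits;

    memcpy(&bits, &biased, sizeof bits);
    /* The low 32 bits hold the result in two's complement form. */
    return (int32_t)(uint32_t)bits;
}
```

Note that 2.5 becomes 2 and 3.7 becomes 4 here, whereas a C cast would give 2 and 3.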
13Replacement for sqrt()
- Sqrt seems natural if you are normalising vectors, calculating environment map coordinates or calculating distances - but it's sloooow
- Sample code is available from the developer web site or from me directly and will be in future versions of the SDK.
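The sample code referred to above is not reproduced here. As an illustration of the general idea only, the sketch below uses a widely known approximation: an integer bit trick for the first guess, followed by one Newton-Raphson step. `approx_rsqrt` and `normalise3` are hypothetical names; the input is assumed to be a positive, finite IEEE 754 float, and the result is accurate to a fraction of a percent rather than exact.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, finite x (IEEE 754 floats assumed). */
float approx_rsqrt(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);  /* first guess from the bit pattern */
    memcpy(&y, &bits, sizeof y);
    y = y * (1.5f - 0.5f * x * y * y); /* one Newton-Raphson refinement */
    return y;
}

/* Normalises a 3-vector in place using the approximation above. */
void normalise3(float v[3])
{
    float inv_len = approx_rsqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    v[0] *= inv_len;
    v[1] *= inv_len;
    v[2] *= inv_len;
}
```

Whether the approximation is precise enough depends on the use: it is usually fine for lighting normals, less so for physics.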
14Saturation Arithmetic (C)
- Limiting a floating point number to lie in the range 0.0 to 1.0 inclusive (traditional method)
    if (f < 0.0f)
        f = 0.0f;
    else if (f > 1.0f)
        f = 1.0f;
15Saturation Arithmetic (Pentium)
    /* 0x3f800000 is the bit pattern of 1.0f */
    if (*(long *)&f < 0)
        *(long *)&f = 0;
    else if (*(long *)&f > 0x3f800000)
        *(long *)&f = 0x3f800000;
- This is faster on a Pentium class processor since the FPU is non-optimal (i.e. slow) and the integer unit is much faster.
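The integer comparison works because non-negative IEEE 754 floats sort in the same order as their bit patterns, and any float with the sign bit set reads as a negative integer. A self-contained sketch of the same idea follows, using `memcpy` instead of pointer casts to stay within the C aliasing rules. `clamp01_int` is a hypothetical name, and 32-bit IEEE 754 floats are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Clamps f to [0.0, 1.0] using integer compares on its bit pattern.
   Assumes 32-bit IEEE 754 floats. NaN inputs are not handled specially. */
float clamp01_int(float f)
{
    int32_t bits;

    memcpy(&bits, &f, sizeof bits);
    if (bits < 0)
        bits = 0;               /* sign bit set: negative value or -0.0 */
    else if (bits > 0x3f800000)
        bits = 0x3f800000;      /* bit pattern of 1.0f */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```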
16Saturation Arithmetic (Pentium II)
- Use the cmov instructions (pseudo-code, treating f as a signed 32-bit integer)
- cmp f,0
- cmovl f,0
- cmp f,0x3f800000
- cmovg f,0x3f800000
Faster since unpredictable branches are the
bottleneck here. Unavailable on a Pentium.
17Unlock the Frame Rate
- It's essential that your physics model can run at high refresh rates.
- At least 100fps
- 30 or 60 fps limits are not acceptable and lead to flat performance on high end hardware
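One way to keep physics independent of the frame rate is to express motion per second and scale each update by the elapsed time. The sketch below is an assumption about how that might look, not code from the talk: `Body`, `step_body` and `simulate_fall` are hypothetical names. It uses the exact update for constant acceleration, so 30 fps and 120 fps arrive at the same state.

```c
/* A point mass moving along one axis (hypothetical type). */
typedef struct {
    double position; /* metres */
    double velocity; /* metres per second */
} Body;

/* Advances the body by dt seconds under constant acceleration.
   The update is exact for constant acceleration, whatever dt is. */
void step_body(Body *body, double acceleration, double dt)
{
    body->position += body->velocity * dt + 0.5 * acceleration * dt * dt;
    body->velocity += acceleration * dt;
}

/* Drops a body from 100 m and returns its height after `seconds`,
   stepping once per rendered frame at the given frame rate. */
double simulate_fall(int frames_per_second, int seconds)
{
    Body body = {100.0, 0.0};
    double dt = 1.0 / frames_per_second;
    int frames = frames_per_second * seconds;
    int i;

    for (i = 0; i < frames; ++i)
        step_body(&body, -9.81, dt);
    return body.position;
}
```

More complex models usually need fixed sub-steps or a better integrator, but the principle of never hard-coding "per frame" quantities stays the same.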
18The Value of Batching
- Case Specifics
- The average of Polys Per Call (PPC) to DrawPrimitive was 2.6, producing 40fps
- Removing state changes to raise the average PPC to 50 produced 58fps
- Most of the removed state changes were reasonable, i.e. not logically redundant
- The changes did not reduce visual quality at all
- PPC of 200 is optimal
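A minimal sketch of why sorting by state raises the polys-per-call figure. It makes no D3D calls; `DrawItem`, `sort_by_texture` and `count_batches` are hypothetical names, and the texture is assumed to be the only state that forces a new call.

```c
#include <stdlib.h>

/* One queued piece of geometry (hypothetical type). */
typedef struct {
    int texture_id; /* the only render state considered in this sketch */
    int poly_count;
} DrawItem;

/* qsort comparator: ascending texture id. */
int compare_by_texture(const void *a, const void *b)
{
    int ta = ((const DrawItem *)a)->texture_id;
    int tb = ((const DrawItem *)b)->texture_id;
    return (ta > tb) - (ta < tb);
}

/* Reorders the queue so items sharing a texture become neighbours. */
void sort_by_texture(DrawItem *items, size_t count)
{
    qsort(items, count, sizeof items[0], compare_by_texture);
}

/* Counts the draw calls needed when neighbouring items that share a
   texture are merged into one call. */
int count_batches(const DrawItem *items, size_t count)
{
    int batches = 0;
    size_t i;

    for (i = 0; i < count; ++i)
        if (i == 0 || items[i].texture_id != items[i - 1].texture_id)
            ++batches;
    return batches;
}
```

With five items using textures 3, 1, 3, 1, 2, the unsorted queue needs five calls and the sorted queue needs three, so the same polygons are spread over fewer calls.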
19Alpha Sort Issues
- The standard solution is
- 1) Draw all non-alpha polys (sort by texture)
- 2) Draw all alpha polys in back to front order
with Z compare enabled and Z update disabled.
This copes with overlapping alpha polys but you can't sort by texture. (Intersection requires decimation).
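Step 2 above amounts to a depth sort. A minimal sketch, assuming each alpha poly carries a single view-space depth that grows with distance from the camera; `AlphaPoly` and `sort_back_to_front` are hypothetical names.

```c
#include <stdlib.h>

/* An alpha-blended polygon queued for drawing (hypothetical type). */
typedef struct {
    int id;
    float view_depth; /* assumed to grow with distance from the camera */
} AlphaPoly;

/* qsort comparator: larger depth (further away) sorts first. */
int compare_back_to_front(const void *a, const void *b)
{
    float da = ((const AlphaPoly *)a)->view_depth;
    float db = ((const AlphaPoly *)b)->view_depth;
    return (da < db) - (da > db);
}

/* Orders the polys so the furthest is drawn first. */
void sort_back_to_front(AlphaPoly *polys, size_t count)
{
    qsort(polys, count, sizeof polys[0], compare_back_to_front);
}
```

A single depth per polygon cannot order polygons that intersect, which is why the slide notes that intersection needs the polys to be split.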
20Alpha Sort with Bounding Boxes
- When you are ready to draw your alpha polys then
draw non-overlapping sets using the
sort-by-texture technique as before
[Diagram: Viewport containing alpha poly sets A, B and C]
Here, you can safely draw all of A before any of B or C. B & C need sorting.
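The overlap check behind this technique can be sketched with screen-space rectangles. The types and names (`ScreenRect`, `rects_overlap`, `set_is_isolated`) are hypothetical, and each set of alpha polys is assumed to have a bounding rectangle already computed in screen coordinates.

```c
/* Screen-space bounding rectangle of one set of alpha polys (hypothetical type). */
typedef struct {
    float min_x, min_y, max_x, max_y;
} ScreenRect;

/* Returns 1 if the two rectangles share any area, 0 otherwise. */
int rects_overlap(const ScreenRect *a, const ScreenRect *b)
{
    return a->min_x < b->max_x && b->min_x < a->max_x &&
           a->min_y < b->max_y && b->min_y < a->max_y;
}

/* Returns 1 if set `index` overlaps no other set, so its polys can be drawn
   sorted by texture without depth-ordering them against the other sets. */
int set_is_isolated(const ScreenRect *sets, int count, int index)
{
    int i;

    for (i = 0; i < count; ++i)
        if (i != index && rects_overlap(&sets[index], &sets[i]))
            return 0;
    return 1;
}
```

Sets that fail the test still need the back-to-front sort against the sets they overlap.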
21Geometry - Part 1
- Use the DX6 Transform and Clip engine - it'll be nearly as fast as your best efforts
- It takes advantage of CPU specific optimisations done by Intel, AMD etc.
- It uses the guard band clipping region to enhance performance
- Use the DX7 interface ASAP
22Geometry - Part 2
- This gets you ready for hardware which can do the job much faster than the CPU
- Tell the chip designers if you need anything non-standard
- If you think DX is too slow then use a run-time benchmark to select between DX and your own code
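A run-time benchmark of this kind might look like the sketch below. Nothing here is a DirectX call: `transform_library` and `transform_custom` are hypothetical stand-ins for the DX path and a hand-written path, and `clock()` is used only because it is portable. A real benchmark would use representative geometry and a higher resolution timer.

```c
#include <stddef.h>
#include <time.h>

typedef void (*TransformFn)(const float *in, float *out, size_t count, float scale);

/* Stand-in for the library pipeline (hypothetical). */
void transform_library(const float *in, float *out, size_t count, float scale)
{
    size_t i;
    for (i = 0; i < count; ++i)
        out[i] = in[i] * scale;
}

/* Stand-in for hand-written code (hypothetical): same result, unrolled by four. */
void transform_custom(const float *in, float *out, size_t count, float scale)
{
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        out[i] = in[i] * scale;
        out[i + 1] = in[i + 1] * scale;
        out[i + 2] = in[i + 2] * scale;
        out[i + 3] = in[i + 3] * scale;
    }
    for (; i < count; ++i)
        out[i] = in[i] * scale;
}

/* Times `repeats` runs of one path over a fixed test buffer, in seconds. */
double time_path(TransformFn fn, int repeats)
{
    enum { N = 1024 };
    static float in[N], out[N];
    clock_t start, end;
    int r;
    size_t i;

    for (i = 0; i < N; ++i)
        in[i] = (float)i;
    start = clock();
    for (r = 0; r < repeats; ++r)
        fn(in, out, N, 2.0f);
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Runs both paths once at start-up and keeps whichever was faster. */
TransformFn select_transform_path(int repeats)
{
    double t_library = time_path(transform_library, repeats);
    double t_custom = time_path(transform_custom, repeats);
    return (t_custom < t_library) ? transform_custom : transform_library;
}
```

The library path is kept on a tie, so the custom code has to prove itself on the user's machine.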
23Geometry - Part 3
- Use the DX pipeline for geometry which may be rendered
- Use your own transform for bounding boxes, collisions, portals etc
- Treat hardware T&L as
- Write only
- Not necessarily pixel identical to CPU T&L
DIPVB()
24Geometry - Part 4
- Consider choosing between models at game start-up time
- More complex Geometry should be several times more complex
- Introduce some LOD management
- Your artists are probably generating more complex models and then throwing them away
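A very simple form of LOD management picks a model by distance from the camera. The sketch below is an illustration under assumptions, not the talk's method: `select_lod` is a hypothetical name, level 0 is taken to be the most detailed model, and the distance thresholds would be tuned per game (or scaled at start-up according to the machine's power).

```c
#include <stddef.h>

/* Returns the index of the model to draw; 0 is the most detailed.
   max_distance[i] is the furthest distance at which level i is still used,
   so it needs level_count - 1 entries in ascending order. */
size_t select_lod(float distance, const float *max_distance, size_t level_count)
{
    size_t level = 0;

    while (level + 1 < level_count && distance > max_distance[level])
        ++level;
    return level;
}
```

With thresholds of 10 and 50 units and three models, an object 30 units away uses the middle model and anything beyond 50 units uses the coarsest.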
25Lighting - Part 1
- If the DX Lighting model is good enough then there are people who want to help you
- Multi-texture shadow maps and light maps can be very fast now
- remember that (multi-pass != multi-texture)
- Tell the chip companies what you need
26Lighting - Part 2
- Support more lights
- Use a richer set of light types
- Scale with available power
- If you have more complex geometry you get better
lighting quality
27Summary
- Use the D3D pipeline as much as possible
- Use the CPU carefully
- Abuse the fill rate
- Get on board with DX7
- Offer the richest experience possible
- You may have to treat the PC as two distinct
platforms, High-end and Low-end
28Questions
?
Richard Huddy RichardH_at_nvidia.com www.nvidia.com