Scalability - PowerPoint PPT Presentation

About This Presentation
Title:

Scalability

Description:

Scalability Advanced D3D Programming Richard Huddy RichardH_at_nvidia.com – PowerPoint PPT presentation

Slides: 29
Provided by: Richard1992
Transcript and Presenter's Notes

Title: Scalability


1
Scalability
  • Advanced D3D Programming
  • Richard Huddy
  • RichardH_at_nvidia.com

2
Basic Objectives
  • To produce the best experience on every user's
    machine
  • To exploit all of the resources available
  • To cope with a broad spread of hardware
  • To avoid bottoming out during the shelf-life of
    the game / engine

3
What is a high-end PC?
  • A 125 mega-texel device
  • A 125 mega-pixel device
  • A fast CPU (> 350 MHz)
  • AGP 2X/4X Bus
  • Lots of system RAM (> 64 MB)
  • Huge frame buffers (16 to 32 MB)
  • Multi-Texture at low cost

4
Power Trends
  • [Chart: CPU Speed and Fill Rate trends]
  • Appreciate the absolute values and the ratios.
5
So what's the problem?
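  • Second generation hardware
  • [Timeline diagram: after BeginScene() the CPU issues A, B, C and then EndScene(), while the graphics chip works on a, b, c]
  • Third generation hardware: "Wow, 10 faster!"
  • [Timeline diagram: the CPU issues A, B, C and then EndScene(), with the graphics chip's a, b, c shown against the same time axis]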
6
What can you do to help?
  • Scalability is the key
  • Run at higher screen resolutions
  • Run at higher color depths
  • Use more complex rendering techniques on good
    hardware
  • Ship multiple geometry models
  • Protect your CPU
  • Unlock the frame rate

7
Higher Screen Resolutions
  • 1) Include direct support for higher resolution
    modes (uses lots of disk space).
  • 2) Store high resolution art and filter down to
    produce lower resolution art.
  • 3) Store low resolution art and pixel double
  • If you have art at 512x384 use it for 1024x768
  • If you have art at 640x480 use it on 1280x1024
  • (but only use a 1280x960 viewport)
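
Option 3 (pixel doubling) can be sketched in a few lines of C. This is a minimal illustration only, assuming 32-bit pixels stored row by row; the function name `pixel_double` and the buffer layout are hypothetical and not taken from the presentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Pixel-double a src_w x src_h image of 32-bit pixels.
   dst must have room for (2 * src_w) x (2 * src_h) pixels.
   Each source pixel is copied into a 2x2 block of the destination. */
static void pixel_double(const uint32_t *src, size_t src_w, size_t src_h,
                         uint32_t *dst)
{
    size_t dst_w = src_w * 2;

    for (size_t y = 0; y < src_h; ++y) {
        for (size_t x = 0; x < src_w; ++x) {
            uint32_t p = src[y * src_w + x];
            uint32_t *out = &dst[(2 * y) * dst_w + 2 * x];

            out[0] = p;           /* top-left     */
            out[1] = p;           /* top-right    */
            out[dst_w] = p;       /* bottom-left  */
            out[dst_w + 1] = p;   /* bottom-right */
        }
    }
}
```

Doubling 640x480 art this way gives 1280x960, which is why the slide suggests a 1280x960 viewport inside a 1280x1024 mode.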

8
Higher Color Depths
  • Runs at much the same speed but gives the user a
    much richer experience
  • Uses frame buffer memory constructively
  • You can re-use the previous 16 bit assets
  • The main performance loss in true color is often
    due to texture management

But beware the Frame Buffer = Z Buffer depth
constraint on Riva TNT
9
Complex Rendering Techniques - I
  • Environment Mapping
  • Beware of spending too much CPU on this.
  • Dual Texture Lighting
  • Bump Mapping
  • Use more alpha transparency
  • But see also Alpha sort issues later on
  • Please try to use the extra fill rate!

10
Complex Rendering Techniques - II
  • Trilinear mipmapping for almost everything
  • Use Detail textures
  • Large textures for extra realism
  • 32 bit textures - where it's a quality win
  • Compressed textures as long as quality is not
    compromised

11
Protect your CPU
  • The big ones
  • __ftol and other type conversion nightmares
  • sqrt()
  • that'll be seventy cycles please...
  • Reciprocal square root
  • One hundred and nine cycles through the FPU
  • Transform and lighting (more on that later)

12
Removing __ftol
  • Remember that the compiler doesn't have a choice
    but you can check the output
  • Write your own inline assembler conversion routine
    if:
  • You can accept differing rounding rules
  • This doesn't break the optimiser!
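
As an illustration of a conversion with "differing rounding rules", here is a minimal sketch in portable C rather than inline assembler. It uses the well-known "magic number" trick, which is a substitute technique and not necessarily the routine the presenter had in mind. The name `fast_round_to_int` is hypothetical. The sketch assumes IEEE-754 single precision and the default rounding mode, and it is only valid for |f| < 2^22. Unlike a C cast, it rounds to nearest (ties to even) instead of truncating.

```c
#include <stdint.h>
#include <string.h>

/* Convert a float to an integer without a C cast (and so without the
   compiler's float-to-long helper). Assumes IEEE-754 floats, default
   rounding mode, and |f| < 2^22. Rounds to nearest, ties to even. */
static int32_t fast_round_to_int(float f)
{
    /* Adding 1.5 * 2^23 pushes the value into a range where one unit in
       the last place equals 1, so the FPU rounds f to an integer and
       leaves it in the low mantissa bits. volatile forces a real store
       to single precision. */
    volatile float biased = f + 12582912.0f;
    float stored = biased;
    int32_t bits;

    memcpy(&bits, &stored, sizeof bits);
    return bits - 0x4B400000;   /* bit pattern of 12582912.0f */
}
```

For example, 2.9f converts to 3 here, whereas `(long)2.9f` gives 2, so any code relying on truncation would need adjusting.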

13
Replacement for sqrt()
  • Sqrt seems natural if you are normalising
    vectors, calculating environment map coordinates
    or calculating distances - but it's sloooow
  • Sample code is available from the developer web
    site or from me directly and will be in future
    versions of the SDK.
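
The sample code mentioned above is not reproduced in this transcript. As a stand-in, the sketch below shows one widely known way to approximate a reciprocal square root with integer operations plus a single Newton-Raphson step. It is an assumption about the kind of replacement meant, not the presenter's code; the names `approx_rsqrt` and `normalise3` are hypothetical. It assumes IEEE-754 single precision and x > 0, and is accurate to roughly 0.2%.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for x > 0 (IEEE-754 single precision assumed). */
static float approx_rsqrt(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);       /* initial guess from the bit pattern */
    memcpy(&y, &bits, sizeof y);

    return y * (1.5f - 0.5f * x * y * y);   /* one Newton-Raphson refinement */
}

/* Normalise a 3-component vector in place using the approximation. */
static void normalise3(float v[3])
{
    float r = approx_rsqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);

    v[0] *= r;
    v[1] *= r;
    v[2] *= r;
}
```

Whether the reduced accuracy is acceptable depends on the use: it is usually fine for lighting normals and environment map coordinates, less so for physics.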

14
Saturation Arithmetic (C)
  • Limiting a floating point number to lie in the
    range 0.0 to 1.0 inclusive (traditional method)
  • if (f < 0.0)
  •     f = 0.0;
  • else if (f > 1.0)
  •     f = 1.0;

15
Saturation Arithmetic (Pentium)
  • if (*(long *)&f < 0)
  •     *(long *)&f = 0;
  • else if (*(long *)&f > 0x3f800000)
  •     *(long *)&f = 0x3f800000;
  • This is faster on a Pentium class processor since
    the FPU is non-optimal (i.e. slow) and the
    integer unit is much faster.
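
The integer-compare clamp on this slide can be sketched as a self-contained C function. This version uses `memcpy` instead of pointer casts to stay within the C aliasing rules; the name `saturate_bits` is hypothetical. It assumes IEEE-754 single precision, where any float with the sign bit set reads as a negative integer and 1.0f has the bit pattern 0x3f800000.

```c
#include <stdint.h>
#include <string.h>

/* Clamp f to [0.0, 1.0] using integer comparisons on its bit pattern.
   Assumes IEEE-754 single precision. Note that -0.0f becomes +0.0f and
   a positive NaN clamps to 1.0f, unlike the floating point version. */
static float saturate_bits(float f)
{
    int32_t bits;

    memcpy(&bits, &f, sizeof bits);
    if (bits < 0)
        bits = 0;             /* sign bit set: clamp to 0.0f */
    else if (bits > 0x3f800000)
        bits = 0x3f800000;    /* greater than 1.0f: clamp to 1.0f */
    memcpy(&f, &bits, sizeof f);

    return f;
}
```

The trick works because non-negative IEEE-754 floats sort in the same order as their bit patterns read as integers.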

16
Saturation Arithmetic (Pentium II)
  • Use the cmov instructions
  • cmp f,0
  • cmovb f,0
  • cmp f,0x3f800000
  • cmova f,0x3f800000

Faster since unpredictable branches are the
bottleneck here. Unavailable on a Pentium.
17
Unlock the Frame Rate
  • It's essential that your physics model can run at
    high refresh rates.
  • At least 100fps
  • 30 or 60 fps limits are not acceptable and lead
    to flat performance on high end hardware
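
One common way to keep physics independent of the frame rate is to scale every update by the elapsed time. The sketch below is a minimal illustration under that assumption; `Body`, `physics_step` and `simulate` are hypothetical names, and in a real game `dt` would come from a timer rather than a fixed frame count.

```c
/* Hypothetical minimal physics state: one body moving along one axis. */
typedef struct {
    double position;
    double velocity;
} Body;

/* Advance the body by dt seconds (semi-implicit Euler integration). */
static void physics_step(Body *b, double accel, double dt)
{
    b->velocity += accel * dt;
    b->position += b->velocity * dt;
}

/* Simulate a body starting at rest for `seconds` at a given frame rate. */
static Body simulate(double seconds, int fps, double accel)
{
    Body b = {0.0, 0.0};
    double dt = 1.0 / fps;
    int frames = (int)(seconds * fps + 0.5);

    for (int i = 0; i < frames; ++i)
        physics_step(&b, accel, dt);

    return b;
}
```

Running `simulate(1.0, 30, -9.8)` and `simulate(1.0, 200, -9.8)` gives nearly the same final state, with the higher rate closer to the exact answer. Nothing in the update assumes a 30 or 60 fps cap.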

18
The Value of Batching
  • Case Specifics
  • The average number of Polys Per Call (PPC) to
    DrawPrimitive was 2.6, producing 40fps
  • Removing state changes to raise the average PPC
    to 50 produced 58fps
  • Most of the removed state changes were
    reasonable, i.e. not logically redundant
  • The changes did not reduce visual quality at all
  • PPC of 200 is optimal
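
The slide does not say how the state changes were removed. One plausible approach, sketched here as an assumption, is to sort the draw list by texture so that consecutive items sharing a state can be merged into a single call. `DrawItem`, `sort_by_texture` and `count_batches` are hypothetical names, and no real Direct3D calls are made.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical draw-list entry: which texture it needs and how many polys. */
typedef struct {
    int texture_id;
    int poly_count;
} DrawItem;

static int compare_by_texture(const void *a, const void *b)
{
    int ta = ((const DrawItem *)a)->texture_id;
    int tb = ((const DrawItem *)b)->texture_id;

    return (ta > tb) - (ta < tb);
}

/* Sort the list so that items using the same texture become adjacent. */
static void sort_by_texture(DrawItem *items, size_t n)
{
    qsort(items, n, sizeof *items, compare_by_texture);
}

/* Number of draw calls needed if adjacent items with the same texture
   are merged into one call. */
static int count_batches(const DrawItem *items, size_t n)
{
    int batches = 0;

    for (size_t i = 0; i < n; ++i) {
        if (i == 0 || items[i].texture_id != items[i - 1].texture_id)
            ++batches;
    }
    return batches;
}
```

A list that alternates between two textures needs one call per item before sorting and only two calls afterwards, which raises the average PPC accordingly.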

19
Alpha Sort Issues
  • The standard solution is
  • 1) Draw all non-alpha polys (sort by texture)
  • 2) Draw all alpha polys in back to front order
    with Z compare enabled and Z update disabled.
    This copes with overlapping alpha polys but you
    can't sort by texture. (Intersection requires
    decimation).
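
Step 2 can be sketched as a depth sort. The example assumes each alpha poly carries a single view-space depth where a larger value means farther from the camera; `AlphaPoly` and `sort_back_to_front` are hypothetical names. The render state changes (Z compare on, Z update off) are left as a comment because they depend on the API version in use.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical alpha polygon: an id plus one representative view-space
   depth. Larger view_z is assumed to mean farther from the camera. */
typedef struct {
    int id;
    float view_z;
} AlphaPoly;

/* qsort comparator that places the farthest polys first. */
static int farther_first(const void *a, const void *b)
{
    float za = ((const AlphaPoly *)a)->view_z;
    float zb = ((const AlphaPoly *)b)->view_z;

    return (za < zb) - (za > zb);
}

/* Sort alpha polys back to front. The caller then draws them in order
   with Z compare enabled and Z update disabled. */
static void sort_back_to_front(AlphaPoly *polys, size_t n)
{
    qsort(polys, n, sizeof *polys, farther_first);
}
```

A single depth per poly is only a valid sort key when polys do not intersect, which is why the slide notes that intersection requires decimation.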

20
Alpha Sort with Bounding Boxes
  • When you are ready to draw your alpha polys then
    draw non-overlapping sets using the
    sort-by-texture technique as before

[Diagram: a viewport containing three sets of alpha polys, A, B and C]
Here, you can safely draw all of A before any of
B or C. B and C need sorting.
21
Geometry - Part 1
  • Use the DX6 Transform and Clip engine - it'll be
    nearly as fast as your best efforts
  • It takes advantage of CPU specific optimisations
    done by Intel, AMD etc.
  • It uses the guard band clipping region to enhance
    performance
  • Use the DX7 interface ASAP

22
Geometry - Part 2
  • This gets you ready for hardware which can do the
    job much faster than the CPU
  • Tell the chip designers if you need anything
    non-standard
  • If you think DX is too slow then use a run-time
    benchmark to select between DX and your own code
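
A run-time benchmark of this kind could look like the sketch below. Everything here is an assumption for illustration: `TransformFn`, `pick_faster` and the two stub workloads are hypothetical stand-ins for the DX path and your own transform code, and `clock()` is used only because it is standard C. A real benchmark would transform representative geometry and use a higher resolution timer.

```c
#include <time.h>

/* Hypothetical signature for a routine that transforms a test batch. */
typedef void (*TransformFn)(void);

static volatile float sink;   /* stops the compiler removing the stub work */

/* Stand-ins for the two code paths being compared. */
static void stub_dx_transform(void)
{
    sink = sink * 1.0001f + 1.0f;
}

static void stub_own_transform(void)
{
    for (int i = 0; i < 4; ++i)
        sink = sink * 1.0001f + 1.0f;
}

/* Time `iterations` calls of fn, in seconds of CPU time. */
static double time_calls(TransformFn fn, int iterations)
{
    clock_t start = clock();

    for (int i = 0; i < iterations; ++i)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Return whichever path ran faster; ties go to the DX path. */
static TransformFn pick_faster(TransformFn dx_path, TransformFn own_path,
                               int iterations)
{
    double t_dx = time_calls(dx_path, iterations);
    double t_own = time_calls(own_path, iterations);

    return (t_own < t_dx) ? own_path : dx_path;
}
```

Calling `pick_faster(stub_dx_transform, stub_own_transform, 100000)` once at start-up and storing the returned pointer keeps the choice out of the per-frame path.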

23
Geometry - Part 3
  • Use the DX pipeline for geometry which may be
    rendered
  • Use your own transform for bounding boxes,
    collisions, portals etc
  • Treat hardware T&L as
  • Write only
  • Not necessarily pixel identical to CPU T&L

DIPVB()
24
Geometry - Part 4
  • Consider choosing between models at game start-up
    time
  • More complex Geometry should be several times
    more complex
  • Introduce some LOD management
  • Your artists are probably generating more complex
    models and then throwing them away

25
Lighting - Part 1
  • If the DX Lighting model is good enough then
    there are people who want to help you
  • Multi-texture shadow maps and light maps can be
    very fast now
  • remember that (multi-pass != multi-texture)
  • Tell the chip companies what you need

26
Lighting - Part 2
  • Support more lights
  • Use a richer set of light types
  • Scale with available power
  • If you have more complex geometry you get better
    lighting quality

27
Summary
  • Use the D3D pipeline as much as possible
  • Use the CPU carefully - Abuse the fill rate
  • Get on board with DX7
  • Offer the richest experience possible
  • You may have to treat the PC as two distinct
    platforms, High-end and Low-end

28
Questions
Richard Huddy RichardH_at_nvidia.com www.nvidia.com