Title: Iosif Antochi
1. 3D Graphics Benchmarks for Low-Power Architectures
Computer Engineering Laboratory, Delft University of Technology, The Netherlands
2. Overview
- Part I (Benchmarking environment)
- Overview
- Our tracer (Grtrace) and player
- OpenGL Implementation
- Front-End
- Back-end (Rasterizer Simulator)
- Part II (Benchmarks and Statistics)
- The proposed benchmark set
- Architectural implications (Detailed statistics)
- Conclusions and future work
3. PART I: Benchmarking Environment
4. Benchmarking Environment: Overview
5. Simulator Framework (Front-end)
[Figure: front-end block diagram]
- Applications (Benchmarks)
- Mesa library: Mesa Core, Device Driver Entry Points, Software Rasterizer (OS Mesa)
- Rasterizer instructions: LDTRI, DEPTHEN, SETDPTFCT, BLENDDIS, ..., SWPBUFF
- Decoupled data transfer methods: TCP/IP, files on disk
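The decoupling between the front-end and the back-end can be illustrated with a small transport abstraction. This is only a sketch under stated assumptions: the opcode values, the one-byte encoding, and the names `rast_op`, `transport`, `mem_sink`, `mem_send`, and `emit_op` are hypothetical and are not taken from the simulator. Only the instruction mnemonics come from the slide.

```c
#include <stddef.h>
#include <string.h>

/* Mnemonics from the slide; the numeric values are hypothetical. */
enum rast_op {
    RAST_LDTRI = 1,
    RAST_DEPTHEN,
    RAST_SETDPTFCT,
    RAST_BLENDDIS,
    RAST_SWPBUFF
};

/* A transport is a send callback plus its context. A file-backed or
 * socket-backed transport would supply a different callback, while the
 * code that emits instructions stays unchanged. */
struct transport {
    int (*send)(void *ctx, const void *data, size_t len);
    void *ctx;
};

/* In-memory sink, used here in place of a file or a TCP/IP connection. */
struct mem_sink {
    unsigned char bytes[256];
    size_t used;
};

/* Appends data to the sink; returns 0 on success, -1 if it does not fit. */
int mem_send(void *ctx, const void *data, size_t len)
{
    struct mem_sink *s = ctx;
    if (len > sizeof s->bytes - s->used)
        return -1;
    memcpy(s->bytes + s->used, data, len);
    s->used += len;
    return 0;
}

/* Emits one operand-less instruction as a single byte (assumed encoding). */
int emit_op(const struct transport *t, enum rast_op op)
{
    unsigned char code = (unsigned char)op;
    return t->send(t->ctx, &code, 1);
}
```

With this shape, switching between TCP/IP and files would amount to choosing which `transport` the front-end is given.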
6. Simulator Framework (Back-end)
[Figure: back-end block diagram of the rasterizer simulator]
- Bus Interface
- Control Logic
- Triangle Setup
- Span Interpolation
- Texture Processing
- Pixel Processing
- Video Memory: Frame, Z, Accumulation, and Stencil Buffers
- Texture Memory
- Results
- GUI written in Qt, used for data visualization and interpretation
7. Simulator Framework Status
- The front-end is almost complete
- Hooks for function calls (ISA oriented)
- Communication between front-end and back-end is implemented using
- TCP/IP
- Files
- The back-end needs more work
- Texture unit not finished
- Graphical interface (rewritten in Qt from GLUT)
is working but must be extended
8. Benchmark Characteristics
- Relevance
- Applications that are often used on the targeted architecture
- Relevant application groups for low-power 3D graphics accelerators
- 3D games such as first-person shooters (FPS)
- Virtual tour guides
- E-commerce (3D models of products)
- Irrelevant applications
- CAD/CAM applications
- Repeatability
- The possibility of obtaining the same workload every time the application is run
- Interactive games are usually not repeatable
9. Obtaining Repeatability
- By using a pre-recorded demo
- Some interactive games allow playing pre-recorded demos
- By using a tracing environment
- A tracing library that intercepts all the graphics calls is placed between the application and the graphics library.
- Requires either dynamic linking or the source code of the application (next slide)
- A player is used to generate the workload based on the recorded traces
10. OpenGL Environment
- Ways of linking an application with the OpenGL library
- Static
- Dynamic
- Dynamic linking allows OpenGL calls to be intercepted, while static linking does not
[Figure: Application and OpenGL library, static linking vs. dynamic linking]
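Interception under dynamic linking can be sketched as a wrapper that forwards each call through a table of pointers to the real entry points. The sketch below is illustrative only: all names (`gl_targets`, `tracer_init`, `traced_alpha_func`, `real_alpha_func`) are hypothetical, and a local stand-in function plays the role of the real library so that the example is self-contained. On a POSIX system, a real tracer would typically fill the table with `dlopen`/`dlsym` lookups instead.

```c
/* Signature of the call being intercepted (simplified, plain C types). */
typedef void (*alpha_func_fn)(unsigned int func, float ref);

/* Table of pointers to the real library entry points. */
struct gl_targets {
    alpha_func_fn alpha_func;
};

struct gl_targets targets;
unsigned long intercepted_calls;

/* Stand-in for the real library function; it only records its arguments. */
unsigned int seen_func;
float seen_ref;

void real_alpha_func(unsigned int func, float ref)
{
    seen_func = func;
    seen_ref = ref;
}

/* Resolves the real entry points. Assumption: a real tracer would look
 * them up in the dynamically loaded OpenGL library here. */
void tracer_init(void)
{
    targets.alpha_func = real_alpha_func;
}

/* The wrapper the application ends up calling: it notes the call, then
 * forwards it unchanged to the real implementation. */
void traced_alpha_func(unsigned int func, float ref)
{
    intercepted_calls++;            /* a tracer would write a record here */
    targets.alpha_func(func, ref);
}
```

With static linking the application's calls are bound to the library code at link time, so there is no point at which such a wrapper can be substituted without the application's source code.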
11. OpenGL tracers
- We developed our own OpenGL tracer
- Existing but unsuitable tracers
- ZAPdb from IBM
- gldebug from SGI
- Mesa
- gltrace from Stanford
- spyGLass
- gltrace from Hawksoft
12. Grtrace
- Based on the gltrace library, version 2.3a, from HawkSoft
- Has several bugs fixed
- Code was rearranged to be more flexible
- Various improvements
- Frame image capture
- Improved speed
- Tuned towards complete traces
- A player for the generated traces was added
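A trace player essentially reads records back and re-issues the corresponding calls. The following is a minimal sketch, assuming each record consists of a 2-byte opcode, a 2-byte body length, and the body, all in host byte order. The opcode value and the names `PLAYER_OP_ALPHAFUNC`, `replay_trace`, `replay_stats`, and `target_alpha_func` are hypothetical and do not describe Grtrace's actual file format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PLAYER_OP_ALPHAFUNC 0x0001u   /* hypothetical opcode value */

struct replay_stats {
    unsigned long replayed;   /* records dispatched to a target call */
    unsigned long skipped;    /* records with an unknown opcode */
};

/* Stand-in for the target library call; it only records its arguments. */
uint32_t replayed_func;
float replayed_ref;

void target_alpha_func(uint32_t func, float ref)
{
    replayed_func = func;
    replayed_ref = ref;
}

/* Walks the records in buf and dispatches each one. Stops at the first
 * truncated record and returns the number of bytes consumed. */
size_t replay_trace(const unsigned char *buf, size_t size,
                    struct replay_stats *st)
{
    size_t pos = 0;
    while (size - pos >= 4) {
        uint16_t op, len;
        memcpy(&op, buf + pos, 2);
        memcpy(&len, buf + pos + 2, 2);
        if ((size_t)len > size - pos - 4)
            break;                          /* body is incomplete */
        const unsigned char *body = buf + pos + 4;
        if (op == PLAYER_OP_ALPHAFUNC && len == 8) {
            uint32_t func;
            float ref;
            memcpy(&func, body, 4);
            memcpy(&ref, body + 4, 4);
            target_alpha_func(func, ref);
            st->replayed++;
        } else {
            st->skipped++;                  /* unknown record: skip body */
        }
        pos += 4 + (size_t)len;
    }
    return pos;
}
```

Because every record carries its own length, the player can skip records it does not understand, which keeps traces usable across player versions.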
13. Grtrace (II)
- Portable traces
- Create generic calls for window-dependent functions
- Improving tracing performance
- Tracing can significantly slow down the traced application and can modify its behavior
- Solutions
- Binary traces
- Buffering
    void GLAPIENTRY glAlphaFunc (GLenum func, GLclampf ref)
    {
        STARTBIN(TKG_ALPHAFUNC)            //writes 2 bytes
        CMDLEN(8)                          //command body len, writes 2 or 4 bytes
        print_value_bin(_GLenum, func)     //writes 4 bytes
        print_value_bin(_GLclampf, ref)    //writes 4 bytes
        ENDBIN
        GLV.glAlphaFunc (func, ref);       //call to the target glAlphaFunc
    }
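The combination of binary records and buffering can be sketched without the tracer's macros. This is an illustration under stated assumptions, not Grtrace's implementation: the opcode value, the fixed 2-byte length field (the slide mentions 2 or 4 bytes), the buffer size, and the names `trace_buf`, `trace_flush`, and `trace_alpha_func` are all hypothetical. Values are written in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TRACE_BUF_SIZE 4096u
#define TRACE_OP_ALPHAFUNC 0x0001u   /* hypothetical opcode value */

struct trace_buf {
    unsigned char data[TRACE_BUF_SIZE];
    size_t used;
    FILE *out;     /* trace file; NULL means flushed bytes are discarded */
};

/* Writes the buffered bytes to the trace file in one call and empties
 * the buffer. Returns the number of bytes that were buffered. */
size_t trace_flush(struct trace_buf *tb)
{
    size_t n = tb->used;
    if (tb->out != NULL && n > 0)
        fwrite(tb->data, 1, n, tb->out);
    tb->used = 0;
    return n;
}

/* Appends len bytes (len <= TRACE_BUF_SIZE), flushing first if needed. */
static void put_bytes(struct trace_buf *tb, const void *src, size_t len)
{
    if (tb->used + len > TRACE_BUF_SIZE)
        trace_flush(tb);
    memcpy(tb->data + tb->used, src, len);
    tb->used += len;
}

/* Emits one 12-byte record: opcode (2), body length (2), func (4), ref (4). */
void trace_alpha_func(struct trace_buf *tb, uint32_t func, float ref)
{
    uint16_t op = TRACE_OP_ALPHAFUNC;
    uint16_t len = 8;
    put_bytes(tb, &op, sizeof op);
    put_bytes(tb, &len, sizeof len);
    put_bytes(tb, &func, sizeof func);
    put_bytes(tb, &ref, sizeof ref);
}
```

A binary record avoids formatting numbers as text on every call, and buffering replaces many small writes with occasional large ones. Both reduce the tracer's overhead and therefore its influence on the traced application's behavior.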
14. Grtrace (III)
- Reproducing OpenGL calls made by applications
- Requires complete traces
- Problematic OpenGL calls
- All the OpenGL functions related to arrays must be expanded since array sizes are unknown.
- Affected functions
- glArrayElement
- glDrawArrays
- glDrawElements
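The expansion is needed because a call such as `glVertexPointer` registers only a pointer, a component count, and a stride; the number of elements becomes known only when a draw call names a range. A minimal sketch of expanding such a range into the trace is shown below, assuming float vertex data. The names `vertex_array` and `expand_draw_arrays` are hypothetical and not part of Grtrace or OpenGL.

```c
#include <stddef.h>
#include <string.h>

/* Client-side array state as registered by the application: base pointer,
 * components per vertex, and stride in bytes (0 means tightly packed). */
struct vertex_array {
    const float *ptr;
    int size;
    size_t stride;
};

/* Copies vertices [first, first + count) into out as tightly packed
 * floats, so the trace carries the data itself rather than a pointer.
 * Returns the number of floats written, or 0 on invalid input or if
 * out_cap (in floats) is too small. */
size_t expand_draw_arrays(const struct vertex_array *va, int first, int count,
                          float *out, size_t out_cap)
{
    if (va->ptr == NULL || va->size <= 0 || first < 0 || count <= 0)
        return 0;
    size_t comps = (size_t)va->size;
    size_t need = (size_t)count * comps;
    if (need > out_cap)
        return 0;
    size_t stride = va->stride != 0 ? va->stride : comps * sizeof(float);
    const unsigned char *base = (const unsigned char *)va->ptr;
    for (int i = 0; i < count; i++) {
        const unsigned char *src = base + ((size_t)first + (size_t)i) * stride;
        memcpy(out + (size_t)i * comps, src, comps * sizeof(float));
    }
    return need;
}
```

The same idea would apply to `glArrayElement` (a range of one element) and to `glDrawElements`, where the index list determines which elements have to be copied.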
15. PART II: Benchmarks and Statistics
16. Unsuitability of Current 3D Graphics Benchmarks
- Existing 3D graphics benchmark suites
- ViewPerf: partially suitable
- Mostly CAD/CAM applications
- Designed for high image resolutions (over 800x600 pixels)
- High number of polygons (over 20k)
- Recent games
- Designed for high-end graphics accelerators that consume a lot of power
17. The Proposed Benchmark Set
- Quake III (Q3)
- Q3L
- Q3H
- Tux Racer (Tux)
- AWadvs-04 (AW)
- VRMLview (VR), not ready yet
- Austrian National Library
- Gratz
- Dino
18. Quake III
- 3D FPS Game
- We used two profiles
- Q3L
- Low resolution: 320x240
- Q3H
- High resolution: 640x480
19. Tux Racer
- Fast-paced arcade game
- We used a resolution of 640x480
- Higher image quality than Q3
- Uses automatic texture coordinate generation
20. AWadvs-04
- Part of the ViewPerf package
- Resolution used: 640x480
- Uses the highest number of triangles (up to 25k)
21. VRMLView - VRML models (I)
Austrian National Library - Total 10292 polygons
22. VRMLView - VRML models (II)
Gratz3D - Total 8859 polygons
23. VRMLView - VRML models (III)
Dino - Total 4300 triangles. Has a lower complexity than the previous 3D models
24. Results
25. Detailed unit usage (based on partial results)
26. 3D Graphics Pipeline
27. Detailed statistics
28. Architectural Implications (I)
- Highly used units should be highly optimized
- Texture unit
- Depth unit
- Blending unit
29. Architectural Implications (II)
- Less used units should be implemented off the critical path.
- Alpha unit
- Fog unit
- Dithering (might be used more on low-resolution devices with low color depth)
- Unused units could be implemented in software; when they are used, the rendering speed would be greatly reduced
- Color Sum
- LogicOp unit
- Stencil
30. Conclusions and Future Work
- We developed a flexible OpenGL tracer suitable for tracing and reproducing interactive OpenGL applications.
- The tracer can be used to generate benchmarks from interactive applications
- We propose a set of benchmarks
- Relevant results obtained from simulation on our architectural simulator were presented