Title: Tr
1- A very short presentation
- of the Perpignan group
2- People involved
- Permanent staff
- Philippe Langlois (30%)
- Marc Daumas (20%)
- David Defour (10%)
- Non-permanent staff
- Nicolas Louvet
- Sylvain Collange
3- Progress report
- Automatic proofs
- M. Daumas
- Using statistics to characterize the behavior of floating-point code
- Specification
- M. Daumas, D. Defour
- Exotic arithmetics (GPUs)
- Algorithms for evaluating floating-point expressions
- N. Louvet, P. Langlois, M. Daumas, D. Defour
- Compensated algorithms
- Table-based bivariate approximation
4- Statistics and errors in floating-point arithmetic
5- Systems now run fast enough and long enough for their errors to impact their functionality
- Worst-case analysis is meaningless for applications that run for a long time
- For example
- A process adds numbers in ±1 in single precision
- Each addition produces a round-off error of at most 2^-25
- This process adds 2^25 items
- The accumulated error can then reach 2^25 × 2^-25 = 1
- Note that
- 10 hours of flight time
- at an operating frequency of 1 kHz
- is approximately 2^25 operations (10 × 3600 × 1000 = 3.6 × 10^7, while 2^25 ≈ 3.4 × 10^7)
- Provided round-off errors are not correlated, the actual accumulated error will be much smaller
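The arithmetic on this slide can be checked with a short Python sketch. It compares the worst-case bound n · 2^-25 with a simple statistical model in which each round-off error is assumed independent and uniformly distributed on [-2^-25, 2^-25]. This model, the function names, the seed and the reduced value of n are illustrative assumptions; the sketch does not run real single-precision code.

```python
import math
import random

U = 2.0 ** -25  # per-addition round-off bound used on the slide

def worst_case_error(n: int) -> float:
    """All n errors maximal and of the same sign."""
    return n * U

def random_model_error(n: int, seed: int = 1) -> float:
    """Sum of n independent errors, each uniform on [-U, U] (modelling assumption)."""
    rng = random.Random(seed)
    return math.fsum(rng.uniform(-U, U) for _ in range(n))

def random_model_sigma(n: int) -> float:
    """Standard deviation of that sum: each term has variance U**2 / 3."""
    return U * math.sqrt(n / 3)

# 10 hours at 1 kHz is 3.6e7 operations, close to 2**25 (about 3.4e7).
flight_ops = 10 * 3600 * 1000
print(flight_ops, 2 ** 25, worst_case_error(2 ** 25))

n = 2 ** 16  # kept small so that the simulation runs quickly
print(worst_case_error(n), abs(random_model_error(n)), random_model_sigma(n))
```

Under the independence assumption the typical accumulated error grows like the square root of n rather than like n, which is the gap the slide points to.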
6- FAA regulations for aircraft require
- The probability of an error to be below 10^-9 for a 10-hour flight
- Provides a bound on the number of numeric operations (fixed or floating point) that can safely be performed before accuracy is lost
- Important implications for control systems with safety-critical software
- Worst-case analysis would blindly advise replacing existing systems that have been running successfully for years
- Set of formal theorems validated by the PVS proof assistant
- Allows code analysis tools to produce formal certificates
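One standard way to turn such a probability requirement into a bound on the number of operations is Hoeffding's inequality. It is used here purely for illustration, under the assumption that round-off errors are independent, zero-mean and bounded by u = 2^-25; it is not necessarily the theorem formalized in PVS. If |e_i| ≤ u, then P(|e_1 + ... + e_n| ≥ t) ≤ 2 exp(-t^2 / (2 n u^2)), so the bound stays below p as long as n ≤ t^2 / (2 u^2 ln(2/p)). The function names and the tolerance t = 1 are hypothetical.

```python
import math

def hoeffding_tail(n: int, tolerance: float, u: float) -> float:
    """Hoeffding bound on P(|e_1 + ... + e_n| >= tolerance) for independent,
    zero-mean errors with |e_i| <= u."""
    return 2.0 * math.exp(-tolerance ** 2 / (2.0 * n * u ** 2))

def max_safe_operations(tolerance: float, u: float, p: float) -> int:
    """Largest n for which the Hoeffding bound stays at or below p."""
    return math.floor(tolerance ** 2 / (2.0 * u ** 2 * math.log(2.0 / p)))

U = 2.0 ** -25
n_safe = max_safe_operations(tolerance=1.0, u=U, p=1e-9)
print(n_safe, 2 ** 25)
```

With t = 1 and p = 10^-9 this allows on the order of 10^13 operations, far more than the 2^25 permitted by the worst-case argument, provided the independence assumption actually holds.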
7- Some easy ways to obtain worst-case behavior
- Systematic ad-hoc errors may lead to the slow accumulation of small quantities of the same sign
- Biased measurements
- Synchronized time shift
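Here is a hypothetical example of such a systematic error: a clock that adds a 1 ms period in single precision at every tick of a 1 kHz timer. Within each binade the same rounding happens at every tick, so the errors share a sign and accumulate linearly instead of averaging out. The sketch emulates binary32 by rounding binary64 results with the struct module; the function names and the 500 s duration are assumptions made for illustration.

```python
import struct

def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value (ties to even)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def float32_clock(ticks: int, period: float = 0.001) -> float:
    """Hypothetical clock: adds a constant period at every tick in single precision."""
    step = to_f32(period)          # 1 ms is not exactly representable in binary32
    t = 0.0
    for _ in range(ticks):
        # Adding two binary32 values in binary64 and rounding the result to
        # binary32 gives the correctly rounded binary32 sum: binary64 has
        # enough extra precision for this double rounding to be harmless.
        t = to_f32(t + step)
    return t

ticks = 500_000                    # 500 s at 1 kHz
drift = float32_clock(ticks) - ticks * 0.001
print(f"clock drift after {ticks} ticks: {drift:.3f} s")
```

After 500 simulated seconds the counter is more than one second off, whereas independent errors of the same size would leave it off by only milliseconds.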
8- Developing probabilities on floating-point arithmetic
- Formal proof assistants such as ACL2, HOL, Coq and PVS are used in areas where
- errors can cause loss of life or significant financial damage
- common misunderstandings can falsify key assumptions
- Developments in probability share many features with developments in floating-point arithmetic
- Each result usually relies on a long list of hypotheses, and slight variations produce a large number of results that look almost identical
- Most people want a trustworthy result, but they are not proficient enough either to select the best scheme or to detect minor faults that can quickly lead to huge problems
- Validation of safety-critical numeric software using probability should be done with an automatic proof checker
9- The Central Limit Theorem in action (n = 1, 2 or 5)
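The convergence shown on this slide can also be measured numerically. The sum of n independent Uniform(0, 1) errors follows the Irwin-Hall distribution, and the largest distance between its standardized CDF and the normal CDF shrinks as n grows. Modelling round-off errors as uniform is an assumption, the grid on [-4, 4] is arbitrary, and the function names are hypothetical.

```python
import math

def irwin_hall_cdf(x: float, n: int) -> float:
    """CDF of S = U_1 + ... + U_n with U_i independent Uniform(0, 1) (Irwin-Hall)."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    s = sum((-1) ** k * math.comb(n, k) * (x - k) ** n
            for k in range(math.floor(x) + 1))
    return s / math.factorial(n)

def normal_cdf(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def max_cdf_gap(n: int, points: int = 2001) -> float:
    """Largest |P((S - n/2) / sd <= z) - Phi(z)| over a grid of z in [-4, 4]."""
    mean, sd = n / 2, math.sqrt(n / 12)
    gap = 0.0
    for i in range(points):
        z = -4 + 8 * i / (points - 1)
        gap = max(gap, abs(irwin_hall_cdf(mean + z * sd, n) - normal_cdf(z)))
    return gap

for n in (1, 2, 5):
    print(n, max_cdf_gap(n))
```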
10- Limitations of the Central Limit Theorem for a target probability of 10^-9 (n = 5, 40, 100 or 200)
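The same Irwin-Hall model shows why the normal approximation is delicate at 10^-9. The comparison has to be made deep in the tail, where floating-point evaluation of the Irwin-Hall formula cancels catastrophically, so this sketch evaluates it exactly with fractions.Fraction. It places the threshold where the Gaussian tail equals 10^-9 and returns the exact tail at that point for the values of n on the slide. The uniform error model is again an assumption, and the names are hypothetical.

```python
import math
from fractions import Fraction
from statistics import NormalDist

def irwin_hall_cdf_exact(x: Fraction, n: int) -> Fraction:
    """Exact Irwin-Hall CDF in rational arithmetic (no cancellation problems)."""
    if x <= 0:
        return Fraction(0)
    if x >= n:
        return Fraction(1)
    total = sum((-1) ** k * math.comb(n, k) * (x - k) ** n
                for k in range(math.floor(x) + 1))
    return total / math.factorial(n)

def tails_at_target(n: int, p: float = 1e-9):
    """Return (exact tail, Gaussian tail) at the point where the Gaussian tail is p."""
    sd = math.sqrt(n / 12)
    z = -NormalDist().inv_cdf(p)                      # about 6.0 for p = 1e-9
    x = Fraction(n / 2 + z * sd).limit_denominator(10 ** 6)
    exact = irwin_hall_cdf_exact(n - x, n)            # P(S >= x) = P(S <= n - x)
    z_x = (float(x) - n / 2) / sd
    gauss = 0.5 * math.erfc(z_x / math.sqrt(2))
    return float(exact), gauss

for n in (5, 40, 100, 200):
    print(n, tails_at_target(n))
```

For n = 5 the Gaussian threshold lies beyond the largest possible sum, so the exact tail is 0; for larger n the two tails can be compared directly.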
12- Problem statement
- Our expertise: IEEE-754 arithmetic
- A very precise framework
- Precision, rounding, exception handling
- Portability
- New architectures (GPUs)
- Do not comply with the standard
- Rounding and exception handling
- Questions
- How will algorithms behave on these architectures?
- Is it possible to define robust algorithms?
13- Characteristics of GPU arithmetic
- Depends on the generation and the vendor
- Several computing units
- 3 MAD units
- A·x + B
- 1 unit for special functions (exp, log, cos, sin, 1/x, 1/√x)
- 1 bilinear, trilinear and anisotropic interpolator
- Example: a·x + b·y, a0·(a1·x1 + b1·y1) + b0·(a2·x2 + b2·y2)
- 1 blending unit
- Example: r = a·r + b·y
- Each unit sits along a pipeline
- Constraints on how they can be used
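The interpolator's example expressions are the building blocks of bilinear filtering. The sketch below is a model in binary64 Python floats with hypothetical names (lerp2, bilinear, trilinear, fx, fy, fz). It shows how a0·(a1·x1 + b1·y1) + b0·(a2·x2 + b2·y2) becomes bilinear interpolation of four texels when the weights are 1 - f and f. It says nothing about the precision the hardware actually uses.

```python
def lerp2(a: float, x: float, b: float, y: float) -> float:
    """Two-term weighted sum a*x + b*y (the interpolator's basic expression)."""
    return a * x + b * y

def bilinear(t00: float, t10: float, t01: float, t11: float,
             fx: float, fy: float) -> float:
    """Bilinear filtering as a0*(a1*x1 + b1*y1) + b0*(a2*x2 + b2*y2).

    t00..t11 are four neighbouring texel values; fx, fy in [0, 1] give the
    fractional position between them (hypothetical names).
    """
    row0 = lerp2(1 - fx, t00, fx, t10)     # a1*x1 + b1*y1
    row1 = lerp2(1 - fx, t01, fx, t11)     # a2*x2 + b2*y2
    return lerp2(1 - fy, row0, fy, row1)   # a0*(...) + b0*(...)

def trilinear(lo_level: float, hi_level: float, fz: float) -> float:
    """Trilinear filtering blends two bilinear results with one more a*x + b*y."""
    return lerp2(1 - fz, lo_level, fz, hi_level)

print(bilinear(0.0, 1.0, 2.0, 3.0, 0.5, 0.5))
```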
14- Block diagram of a GPU
- Command and data fetch
- Vertex shader
- Cull / clip / setup
- Rasterization
- Z-cull
- Shared L2 texture cache
- Pixel shader
- Fragment/pixel crossbar
- Z-compare and blending
- 4 memory partitions, each with 128 MB of GDDR3
15- Programmable vertex shader
- Diagram blocks: vertex data; VLIW 4-way MIMD MAD; vertex texture fetch; FP32 scalar unit (1-way: sin, cos, log, exp, rcp, rsq); L1 cache; shared L2 texture cache; branch unit; texture memory; primitive assembly; viewport processing; triangle setup
- Vertex engine
- Multithreaded
- Branching without penalty
- 2 instructions / cycle
- 9 FLOPS
16- Pixel shader
- Diagram blocks: texture data; pixel data; 4-way SIMD MADD units with mini-ALUs; texture address computation; FP16 normalization; FP texture processor (mip-mapping, filtering); L1 cache; shared L2 texture cache; branch unit; fog unit; texture memory; fragment/pixel crossbar
- Pixel engine
- Multithreaded
- SIMD
17- Our work
- Characterization of the MAD units
- A·x + B with rounding in the middle (≠ FMA)
- Truncation rounding mode
- Between 0 and 2 extra bits
- Multiplication without computing all partial products
- Possible addition of a bias constant
- No support for denormals (flushed to 0)
- No qNaN
- Precision
- Definition of working float-float algorithms
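Here is a minimal model of the MAD behavior listed on this slide, in exact rational arithmetic. The product is rounded to 24 bits before the addition (two roundings), whereas an FMA rounds once. Rounding is either to nearest-even or by truncation, and results below the smallest normal number are flushed to zero. Extra guard bits, incomplete partial products and bias constants are not modelled, and the function names are hypothetical.

```python
import math
from fractions import Fraction

P = 24                                # binary32 precision, implicit bit included
MIN_NORMAL = Fraction(2) ** -126      # smallest positive normal binary32 number

def round_p(x: Fraction, mode: str = "nearest") -> Fraction:
    """Round x to P significant bits.

    mode is "nearest" (ties to even) or "truncate" (toward zero). Results whose
    magnitude falls below MIN_NORMAL are flushed to zero; overflow is ignored.
    """
    if x == 0:
        return Fraction(0)
    a = abs(x)
    # Find e with 2**e <= a < 2**(e + 1) using exact integer bit lengths.
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if a < Fraction(2) ** e:
        e -= 1
    scale = Fraction(2) ** (e - P + 1)        # weight of the last kept bit
    m = a / scale                             # 2**(P-1) <= m < 2**P
    q = math.floor(m) if mode == "truncate" else round(m)  # round(): ties to even
    r = q * scale
    if r < MIN_NORMAL:                        # no subnormals: flush to zero
        return Fraction(0)
    return r if x > 0 else -r

def mad(a, x, b, mode="nearest"):
    """MAD as characterized: round(round(a*x) + b), i.e. two roundings."""
    return round_p(round_p(a * x, mode) + b, mode)

def fma(a, x, b, mode="nearest"):
    """Fused multiply-add: a single rounding of the exact a*x + b."""
    return round_p(a * x + b, mode)

v = 1 + Fraction(2) ** -12                    # exactly representable in binary32
print(float(mad(v, v, -1)), float(fma(v, v, -1)))
print(float(mad(v, v, -1, "truncate")), float(fma(v, v, -1, "truncate")))
```

With v = 1 + 2^-12, the exact value of v·v - 1 is 2^-11 + 2^-24. The FMA returns it exactly, while the MAD loses 2^-24 when the product is rounded.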
18- Reflection
- Goals
- Define robust algorithms in the absence of a floating-point standard
- Quantify the resulting overhead
- Example
- Float-float addition / multiplication with faithful arithmetic
D. M. Priest, On Properties of Floating Point Arithmetics: Numerical Stability and the Cost of Accurate Computations, PhD thesis, 1992
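For reference, the sketch below shows the classical float-float addition built on Knuth's TwoSum and Dekker's FastTwoSum, with binary32 emulated by rounding binary64 results. It assumes IEEE-754 round-to-nearest, which is precisely what the GPUs characterized above do not guarantee. It is not Priest's faithful-arithmetic algorithm, only the baseline that such algorithms must replace. ff_add follows a widely used "accurate" double-word addition; the names are hypothetical.

```python
import struct
from fractions import Fraction

def f32(x: float) -> float:
    """Round to the nearest binary32 value; emulates one IEEE single-precision operation."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a: float, b: float):
    """Knuth's TwoSum: s = RN(a + b) and e such that a + b == s + e exactly."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e

def fast_two_sum(a: float, b: float):
    """Dekker's FastTwoSum: same result as TwoSum, valid when |a| >= |b|."""
    s = f32(a + b)
    e = f32(b - f32(s - a))
    return s, e

def ff_add(x, y):
    """Float-float addition of x = (xh, xl) and y = (yh, yl)."""
    sh, sl = two_sum(x[0], y[0])
    th, tl = two_sum(x[1], y[1])
    c = f32(sl + th)
    vh, vl = fast_two_sum(sh, c)
    w = f32(tl + vl)
    return fast_two_sum(vh, w)

a, b = f32(0.1), f32(1e-5)
s, e = two_sum(a, b)
print(s, e, Fraction(s) + Fraction(e) == Fraction(a) + Fraction(b))
```

On hardware that truncates or keeps too few guard bits, the error term e computed by TwoSum is no longer guaranteed to be exact, which is why the slide asks for algorithms that only require faithful arithmetic.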
19- Float-float operators
D. M. Priest, Algorithms for arbitrary precision
floating point arithmetic, Proceedings of the
10th IEEE Symposium on Computer Arithmetic
(Arith-10), 1991
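The multiplication counterpart relies on the same round-to-nearest assumption. Veltkamp's splitting and Dekker's TwoProduct compute the exact error of a binary32 product without an FMA, and ff_mul combines them into a float-float product that neglects the xl·yl term. The names are hypothetical, and binary32 is emulated by rounding binary64 results with struct.

```python
import struct
from fractions import Fraction

def f32(x: float) -> float:
    """Round to the nearest binary32 value; emulates one IEEE single-precision operation."""
    return struct.unpack("f", struct.pack("f", x))[0]

SPLITTER = 2.0 ** 12 + 1   # Veltkamp constant 2**ceil(24/2) + 1 for binary32

def split(a: float):
    """Veltkamp splitting: a == hi + lo, with halves narrow enough that
    products of halves are exact in binary32."""
    c = f32(SPLITTER * a)
    hi = f32(c - f32(c - a))
    lo = f32(a - hi)
    return hi, lo

def two_prod(a: float, b: float):
    """Dekker's TwoProduct (no FMA): p = RN(a*b) and e with a*b == p + e exactly."""
    p = f32(a * b)
    ah, al = split(a)
    bh, bl = split(b)
    e = f32(f32(f32(ah * bh) - p) + f32(ah * bl))
    e = f32(e + f32(al * bh))
    e = f32(e + f32(al * bl))
    return p, e

def fast_two_sum(a: float, b: float):
    """Dekker's FastTwoSum, valid when |a| >= |b|."""
    s = f32(a + b)
    return s, f32(b - f32(s - a))

def ff_mul(x, y):
    """Float-float product of x = (xh, xl) and y = (yh, yl); the xl*yl term is dropped."""
    ch, cl1 = two_prod(x[0], y[0])
    cl2 = f32(f32(x[0] * y[1]) + f32(x[1] * y[0]))
    cl3 = f32(cl1 + cl2)
    return fast_two_sum(ch, cl3)

a, b = f32(0.1), f32(0.3)
p, e = two_prod(a, b)
print(p, e, Fraction(p) + Fraction(e) == Fraction(a) * Fraction(b))
```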
20- Floating-point arithmetic on GPUs
Reference          Total bits  Sign  Exponent  Mantissa  Special values
Nvidia             16          1     5         10        NaN, Inf
Nvidia             32          1     8         23 (+1)   NaN, Inf
ATI                16          1     5         10        No
ATI                24          1     7         16        No
ATI                32          1     8         23 (+1)   No documentation
IEEE-754 ANSI-ISO  32          1     8         23 (+1)   NaN, Inf
IEEE-754 ANSI-ISO  64          1     11        52 (+1)   NaN, Inf
(+1): implicit leading bit of the significand
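To make the table concrete, here is a decoder for a generic 1 + e + m bit format. It assumes an IEEE-754-style layout (bias 2^(e-1) - 1, subnormals, all-ones exponent reserved for Inf/NaN). The table says the ATI formats have no special values, so their real encodings may differ, and the function names are hypothetical. The struct module's binary16 ("e") and binary32 ("f") decoders serve as a cross-check.

```python
import math
import struct

def decode(bits: int, exp_bits: int, man_bits: int) -> float:
    """Value of a (1 + exp_bits + man_bits)-bit pattern, assuming an IEEE-754-style
    layout: bias 2**(exp_bits - 1) - 1, subnormals, all-ones exponent for Inf/NaN."""
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    if exp == 0:                                   # zero or subnormal
        return sign * man * 2.0 ** (1 - bias - man_bits)
    if exp == (1 << exp_bits) - 1:                 # Inf or NaN (IEEE convention)
        return sign * math.inf if man == 0 else math.nan
    # Normal number: implicit leading 1, the "(+1)" of the table.
    return sign * (man + (1 << man_bits)) * 2.0 ** (exp - bias - man_bits)

def struct_value(bits: int, fmt: str, size: int) -> float:
    """Reference decoding with the struct module ("<e" = binary16, "<f" = binary32)."""
    return struct.unpack(fmt, bits.to_bytes(size, "little"))[0]

# Spacing of the ATI 24-bit format just above 1.0, under the IEEE-style assumption.
print(decode(0x3F0001, 7, 16) - 1.0)
print(decode(0x40490FDB, 8, 23), struct_value(0x40490FDB, "<f", 4))
```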