Title: Claude Tadonki
1- Claude Tadonki
- Mines ParisTech CRI Mathématiques et Systèmes
- Laboratoire de l'Accélérateur Linéaire / IN2P3 / CNRS
- France
- claude.tadonki_at_u-psud.fr
2nd Workshop on Architecture and Multi-Core Applications,
23rd International Symposium on Computer Architecture and
High Performance Computing (SBAC-PAD 2011), October 26-29, 2011,
Vitória, Espírito Santo, Brazil.
2- Large Scale Kronecker Product on Supercomputers
C. TADONKI
The Kronecker product (definition and applications)
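For reference, the standard definition of the Kronecker product can be written as follows. The symbols A, B, m, n, p and q are illustrative choices made here, not taken from the slide: A is an m x n matrix with entries a_ij and B is a p x q matrix.

```latex
\[
A \otimes B =
\begin{pmatrix}
a_{11} B & \cdots & a_{1n} B \\
\vdots   & \ddots & \vdots   \\
a_{m1} B & \cdots & a_{mn} B
\end{pmatrix}
\in \mathbb{R}^{mp \times nq}.
\]
```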
The Kronecker product (properties and problem
formulation)
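As a hedged reminder of the standard facts this slide title points to, the mixed-product property, the resulting normal factorization and the matrix-vector problem can be stated as below. The notation is assumed here rather than taken from the slide: each A_i is square of order n_i, L = n_1 n_2 ... n_N, I_k is the identity of order k (with I_1 = 1 for empty products), and the first identity holds whenever the ordinary products AC and BD are defined.

```latex
\[
(A \otimes B)(C \otimes D) = (AC) \otimes (BD),
\]
\[
A_1 \otimes A_2 \otimes \cdots \otimes A_N
= \prod_{i=1}^{N}
\left( I_{n_1 \cdots n_{i-1}} \otimes A_i \otimes I_{n_{i+1} \cdots n_N} \right),
\]
\[
y = \left( A_1 \otimes A_2 \otimes \cdots \otimes A_N \right) x,
\qquad x, y \in \mathbb{R}^{L}.
\]
```

The goal is then to compute y for a given x without ever forming the L x L matrix.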
The Kronecker product (complexity and recurrence equation)
- Forming the matrix first would
  - require a huge amount of memory
  - yield a lot of redundant multiplications, which in total would be [formula missing in source]
- Using the so-called normal factorization, we can derive an optimal scheme which reduces the number of floating-point multiplications to [formula missing in source]
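A minimal sequential sketch of the idea, assuming square matrices A_i of order n_i and NumPy's row-major `np.kron` convention. It applies one factor at a time to the vector viewed as an N-dimensional array, which is the textbook consequence of the normal factorization and not necessarily the exact scheme used in the talk. The function names `kron_matvec` and `mult_counts` are hypothetical.

```python
import math
import numpy as np

def kron_matvec(mats, x):
    """Return (mats[0] (x) ... (x) mats[-1]) @ x without forming the product.

    Assumes every matrix in `mats` is square; x has length prod(orders).
    """
    dims = [a.shape[0] for a in mats]
    t = np.asarray(x, dtype=float).reshape(dims)   # view x as an N-d array
    for i, a in enumerate(mats):
        # Multiply along axis i: contract the columns of `a` with axis i of t,
        # then move the resulting axis back to position i.
        t = np.moveaxis(np.tensordot(a, t, axes=(1, i)), 0, i)
    return t.reshape(-1)

def mult_counts(orders):
    """Multiplication counts: (explicit L x L matvec, factor-by-factor scheme)."""
    L = math.prod(orders)
    return L * L, L * sum(orders)
```

Under these assumptions, applying factor i costs n_i * L multiplications, so the whole product costs L * (n_1 + ... + n_N) instead of the L^2 needed once the full matrix is formed. For orders (2, 3, 4), `mult_counts` returns (576, 216).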
The Kronecker product and its applications
Performance issues and a heuristic for finding a good topology
- The total (parallel) execution time depends on
  - the sizes of the matrices
  - the gap between the virtual topology and the physical topology
  - the way the task is split among the processors (decomposition)
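The slides do not spell out the heuristic itself, so the following is only an illustrative sketch and not the author's method. It assumes that a decomposition is a virtual grid p_1 x ... x p_N of p processors in which each p_i divides the order n_i of the corresponding matrix, and it simply enumerates the candidates that a heuristic would have to choose among. The name `candidate_topologies` is hypothetical.

```python
def candidate_topologies(orders, p):
    """Yield tuples (p_1, ..., p_N) with p_1 * ... * p_N == p and p_i | orders[i]."""
    def rec(i, remaining):
        if i == len(orders):
            # A complete tuple is valid only if all p processors are used.
            if remaining == 1:
                yield ()
            return
        for d in range(1, min(orders[i], remaining) + 1):
            # d must divide both the matrix order and the processors left.
            if orders[i] % d == 0 and remaining % d == 0:
                for rest in rec(i + 1, remaining // d):
                    yield (d,) + rest
    yield from rec(0, p)
```

For example, `list(candidate_topologies((4, 6), 4))` gives `[(2, 2), (4, 1)]`. Ranking such candidates would need a cost model for computation and communication, which is not recoverable from the slides.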
Performance
We consider N = 6 matrices of orders 30, 36, 32, 18, 24, 16, thus L = 159 252 480
- We see that
  - our heuristic yields a significant improvement compared to trivial decompositions
  - we start losing scalability when the number of cores increases (communication)
- We then turn to a hybrid implementation
Performance of the hybrid implementation
- We see that
  - the hybrid implementation is better for a larger number of cores
  - for a smaller number of cores, the shared-memory (SM) implementation suffers from more cache misses
- We need to investigate the trade-off and a better memory layout.
End. Questions?