Title: No slide title
1 Gesellschaft für Parallele Anwendungen und Systeme mbH
WP1 Contribution
Karl Solchenbach, Pallas GmbH
Apart Meeting, June 25, 1999
Pallas GmbH
Hermülheimer Straße 10
D-50321 Brühl, Germany
info_at_pallas.de
http://www.pallas.com
2 Performance Evaluation
- Intelligent use of Vampir and Dimemas
- To qualify the performance of a parallel code
- (Speed-up of 9.3 on 16 processors, good or bad?)
- To understand the reasons for poor performance
- (Communication, latency, I/O, imbalance, ...)
- To develop strategies for code tuning
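The "good or bad?" question above can be made concrete with the textbook definition of parallel efficiency, speed-up divided by processor count. This is a minimal sketch; the function name is an illustrative choice and is not part of Vampir or Dimemas.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Textbook parallel efficiency: speed-up divided by processor count."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figure quoted on the slide: speed-up of 9.3 on 16 processors.
efficiency = parallel_efficiency(9.3, 16)
print(f"{efficiency:.1%}")  # 58.1%
```

A single number like this says how much is lost, but not where, which is why the loss is split into separate efficiencies.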
3 Total time = Useful time + Overhead
Overhead times → efficiencies?
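Reading the slide title as "Total time = Useful time + Overhead", an overhead time can be turned into an efficiency as useful time over total time. A small sketch with made-up timings; the numbers and the function name are illustrative assumptions, not measurements from the talk.

```python
def efficiency_from_times(useful: float, overhead: float) -> float:
    """Efficiency = useful time / total time, with total = useful + overhead."""
    total = useful + overhead
    if total <= 0:
        raise ValueError("total time must be positive")
    return useful / total


# Hypothetical timings in seconds, for illustration only.
print(efficiency_from_times(useful=8.0, overhead=2.0))  # 0.8
```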
4 Other efficiencies
- Single node efficiency
- sustained vs. peak rate (Enode = r / r_peak)
- often < 10%
- Total efficiency
- sustained vs. parallel peak rate (Enode × Epar)
- Algorithmic efficiency
- best sequential algorithm vs. parallel algorithm
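As a rough illustration of how single node efficiency and parallel efficiency combine, the sketch below assumes the total efficiency is the product of Enode and Epar, and that Enode is the sustained rate divided by the peak rate. The rates and the parallel efficiency used here are hypothetical values, not figures from the talk.

```python
def node_efficiency(sustained_rate: float, peak_rate: float) -> float:
    """Enode: sustained rate relative to the peak rate of one node."""
    if peak_rate <= 0:
        raise ValueError("peak_rate must be positive")
    return sustained_rate / peak_rate


def total_efficiency(e_node: float, e_par: float) -> float:
    """Assumed relation: total efficiency = Enode * Epar."""
    return e_node * e_par


# Hypothetical node: 60 Mflop/s sustained against a 600 Mflop/s peak.
e_node = node_efficiency(60.0, 600.0)
# Hypothetical parallel efficiency of 80%.
print(round(total_efficiency(e_node, 0.8), 3))  # 0.08
```

Under these assumptions even a well-parallelized code stays far below the parallel peak rate when the single node efficiency is low.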
5 Example: Sequential chain
- Etrans = 1
- Elb = 1
- Edep = 1/P
- Econc = 1/P
6 Example: One process busy
- Etrans = 1
- Elb = 1/P
- Edep = 1
- Econc = 1/P
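The two idealized examples above are consistent with Econc being the product of Etrans, Elb and Edep. That relation is inferred from the values on the slides rather than stated there, so the sketch below treats it as an assumption; the function names are illustrative.

```python
from fractions import Fraction


def concurrency_efficiency(e_trans, e_lb, e_dep):
    """Assumed relation: Econc = Etrans * Elb * Edep."""
    return e_trans * e_lb * e_dep


def sequential_chain(p: int) -> Fraction:
    # Work is balanced, but each process waits for its predecessor: Edep = 1/P.
    return concurrency_efficiency(Fraction(1), Fraction(1), Fraction(1, p))


def one_process_busy(p: int) -> Fraction:
    # No dependencies, but all work sits on one process: Elb = 1/P.
    return concurrency_efficiency(Fraction(1), Fraction(1, p), Fraction(1))


for p in (2, 8, 16):
    print(p, sequential_chain(p), one_process_busy(p))
```

Both cases end at the same Econc = 1/P, yet the cause differs (dependencies versus load imbalance), which is the distinction the separate factors are meant to expose.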
7 Example: Ring
- Elat = 85%
- Ebw = 59%
- Elb = 1
- Edep = 1
- Econc = 50%
8 Example: NAS LU (Class W) on T3E (8 PEs)
- Elat = 91.9%
- Ebw = 99.6%
- Etrans = 91.5%
- Elb = 91.1%
- Edep = 93.7%
- Eblock = 85.4%
- Econc = 78.1%
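The NAS LU figures above appear to multiply out: Etrans matches Elat times Ebw, Eblock matches Elb times Edep, and Econc matches Etrans times Eblock. The slide does not spell these relations out, so the following check treats them as an assumption and only tests them against the published numbers. The tolerance is an arbitrary choice to absorb rounding to one decimal.

```python
import math

# Values from the slide, in percent.
nas_lu_t3e = {
    "Elat": 91.9, "Ebw": 99.6, "Etrans": 91.5,
    "Elb": 91.1, "Edep": 93.7, "Eblock": 85.4, "Econc": 78.1,
}


def product_percent(*factors: float) -> float:
    """Multiply efficiencies given in percent and return a percentage."""
    return 100.0 * math.prod(f / 100.0 for f in factors)


def check_decomposition(values: dict, tolerance: float = 0.15) -> dict:
    """Compare each composite value with the product of its assumed factors."""
    derived = {
        "Etrans": product_percent(values["Elat"], values["Ebw"]),
        "Eblock": product_percent(values["Elb"], values["Edep"]),
        "Econc": product_percent(values["Etrans"], values["Eblock"]),
    }
    return {
        name: (round(d, 1), abs(d - values[name]) <= tolerance)
        for name, d in derived.items()
    }


for name, (derived, ok) in check_decomposition(nas_lu_t3e).items():
    print(name, derived, "consistent" if ok else "mismatch")
```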
9 Example: NAS LU (detail)
dependency
10 Comparison on T3E and SX-4
          Cray T3E   NEC SX-4
Elat      91.9       96.7
Ebw       99.6       100.0
Etrans    91.5       96.7
Elb       91.1       95.0
Edep      93.7       90.0
Eblock    85.4       85.5
Econc     78.1       82.7
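One way to read the comparison is to look at the individual factors per platform and ask which one is weakest and how much each differs between machines. The sketch below uses only Elat, Ebw, Elb and Edep, on the assumption that the remaining values are derived from them; the helper names are illustrative.

```python
# Measured factors from the comparison slide, in percent.
t3e = {"Elat": 91.9, "Ebw": 99.6, "Elb": 91.1, "Edep": 93.7}
sx4 = {"Elat": 96.7, "Ebw": 100.0, "Elb": 95.0, "Edep": 90.0}


def weakest_factor(factors: dict) -> str:
    """Name of the factor with the lowest efficiency."""
    return min(factors, key=factors.get)


def differences(a: dict, b: dict) -> dict:
    """Per-factor difference b - a in percentage points."""
    return {name: round(b[name] - a[name], 1) for name in a}


print(weakest_factor(t3e))     # Elb
print(weakest_factor(sx4))     # Edep
print(differences(t3e, sx4))
```

On these numbers the SX-4 gains on latency and load balance but loses on dependencies, so the dominant source of loss shifts between the two platforms.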
11 FE-Code: Scalability considerations
12 Increasing floating-point speed
13 Conclusion
- Vampir and Dimemas can quantify and separate the
main sources of efficiency loss (without source
analysis or code know-how)
- Strategy for code tuning can be developed
- replace parallel algorithm?
- improve implementation of existing algorithm?
- use/buy other HPC platform?
- Powerful tools and their intelligent use improve
productivity in performance tuning