Title: HPCBugBase: An Experience Base for HPC Defects
1. HPCBugBase: An Experience Base for HPC Defects
- Taiga Nakamura, University of Maryland
2. Study of HPC Defects
- Software defects (bugs) are major bottlenecks to development productivity in HPC
  - Knowledge about common defects will help us explore how to reduce the time spent debugging
  - Novice developers can learn how to detect/prevent them
  - Someone may develop tools and/or improve languages
- We are building defect patterns in HPC
  - Symptoms: help identify the nature of the bug
  - Advice for avoiding/preventing them: helps developers introduce fewer bugs
  - Examples: specific examples
- Based on the empirical data we collected in various studies
3. Defect Experience Base
- HPCBugBase (www.hpcbugbase.org): a Wiki-based implementation of a defect experience base
- Public website: anyone can view/edit the content
- Additional feature for accepting feedback easily
4. Building knowledge about defects
Goal: Provide guidance about the types of defects likely to occur during HEC software development, using an iterative/incremental process
Process 1: building initial defect patterns
Activities: Build and evolve an experience base for storing and sharing results of studies
Collecting knowledge
- Collecting defect reports
Applying knowledge
Hypothesis: Knowledge about domain-specific defects can help developers avoid them
Process 3: packaging knowledge
- Build Defect Experience Base
- Reports on tool demands
- Educational materials
Refining knowledge
- Feedback from experts/developers
Process 2: validating and adding knowledge
5. Defect Classification
6. Top-level Defect Type
- Pattern: Space Decomposition
  - Incorrect mapping between the problem space and the program memory space
- Symptoms
  - Segmentation fault (if array index is out of range)
  - Incorrect or slightly incorrect output
- Causes
  - Mapping in the parallel version can be different from that in the serial version
    - E.g., array origin is different in every processor
    - E.g., additional memory space for communication can complicate the mapping logic
- Cures and preventions
  - Validate the array origin, whether the buffer includes guard buffers, whether the buffer refers to global space or local space, etc.; these can change while parallelizing the code
  - Encapsulate the mapping logic in a dedicated function
  - Consider designing code that is easy to parallelize
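The advice to encapsulate the mapping logic could look roughly like the following C sketch. It assumes a one-dimensional block decomposition with one guard cell on each side of the local buffer. The function names (`block_count`, `block_start`, `global_to_local`, `local_to_global`) are hypothetical and are not taken from HPCBugBase.

```c
#include <assert.h>

/* Hypothetical helpers: block decomposition of N cells over `size` ranks.
   Assumption: local buffers keep one guard cell at index 0 and one at
   index count + 1, so owned cells live at local indices 1..count. */

/* Number of cells owned by `rank`; the first N % size ranks get one extra. */
int block_count(int N, int size, int rank) {
    assert(size > 0 && rank >= 0 && rank < size);
    return N / size + (rank < N % size ? 1 : 0);
}

/* Global index of the first cell owned by `rank`. */
int block_start(int N, int size, int rank) {
    assert(size > 0 && rank >= 0 && rank < size);
    int base = N / size;
    int extra = N % size;
    return rank * base + (rank < extra ? rank : extra);
}

/* Global index -> local buffer index (skipping the leading guard cell). */
int global_to_local(int N, int size, int rank, int g) {
    int start = block_start(N, size, rank);
    assert(g >= start && g < start + block_count(N, size, rank));
    return g - start + 1;
}

/* Local buffer index (1..count) -> global index. */
int local_to_global(int N, int size, int rank, int l) {
    assert(l >= 1 && l <= block_count(N, size, rank));
    return block_start(N, size, rank) + l - 1;
}
```

Keeping the array origin and guard-cell offset in one place means the checks listed under "Cures and preventions" only have to be made once, rather than at every indexing expression.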
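7. Defect Example
Defects illustrated (the code is intentionally left defective):
- Problem space may not be equally divisible
- Incorrect loop boundary and array index

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    nlocal = N / size;  /* defect: N may not be equally divisible by size */
    buffer = (int *)malloc((nlocal + 2) * sizeof(int));
    nextbuffer = (int *)malloc((nlocal + 2) * sizeof(int));
    /* Main loop */
    for (n = 0; n < steps; n++) {
        for (x = 0; x < nlocal; x++) {  /* defect: incorrect loop boundary */
            /* defect: array index uses the global size N on a local buffer */
            nextbuffer[x] = (buffer[(x - 1 + N) % N] + buffer[(x + 1) % N]) % 10;
        }
        /* Exchange boundary cells with neighbors */
        ...
        tmp = buffer;
        buffer = nextbuffer;
        nextbuffer = tmp;
    }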
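For comparison, one possible repair of the inner loop is sketched below in plain C, with the MPI exchange left out as it is in the example. The sketch assumes the owned cells sit at local indices 1..nlocal and the two extra allocated cells are guard cells. `step_local` and `fill_guards_periodic` are hypothetical names, and the second function is only a single-process stand-in for the boundary exchange. `nlocal` is passed in so that the caller can give ranks different sizes when N is not divisible by the number of processes.

```c
#include <assert.h>

/* One update step on a local buffer of nlocal owned cells.
   Assumption: buffer[0] and buffer[nlocal + 1] are guard cells that
   already hold the neighbors' boundary values. */
void step_local(const int *buffer, int *nextbuffer, int nlocal) {
    assert(nlocal >= 0);
    /* Loop over owned cells only: 1..nlocal, not 0..nlocal-1. */
    for (int x = 1; x <= nlocal; x++) {
        /* Local neighbors; no modulo by the global size N. */
        nextbuffer[x] = (buffer[x - 1] + buffer[x + 1]) % 10;
    }
}

/* Single-process stand-in for the boundary exchange (periodic domain).
   In an MPI program the guard cells would be filled by communication. */
void fill_guards_periodic(int *buffer, int nlocal) {
    assert(nlocal >= 1);
    buffer[0] = buffer[nlocal];
    buffer[nlocal + 1] = buffer[1];
}
```

With more than one process, the guard cells would be filled from the neighboring ranks before each call to `step_local`.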
8. Defect Experience Base
Content structure
Classification schemes
Articles
- Initial Classification Scheme
- Use Case Scenarios
Defect types
- Side Effects
- Synchronization
- Performance
- Space Decomposition
- Use of Lang Features
- Algorithm
- Programming Practices
- I/O Defects
- Defects with Random Funcs
- Load Balancing
- Scheduling
- Deadlock
- Race
Specific defects
- Using the Same Randomization Seed in All Processes
- Missing MPI_Wait
- Bottleneck in Message Scheduling
- Missing MPI_Finalize
- Corrupted File Output
Defect instances
- Instance (multiple)
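As an illustration of one specific defect listed here, "Using the Same Randomization Seed in All Processes", a minimal C sketch of one common remedy follows. It assumes each process knows its rank and derives its seed from a shared base seed. `seed_for_rank` is a hypothetical helper, not something taken from HPCBugBase.

```c
#include <assert.h>

/* Hypothetical helper: derive a different seed for each process from a
   shared base seed, so that processes do not all draw the same sequence. */
unsigned int seed_for_rank(unsigned int base_seed, int rank) {
    assert(rank >= 0);
    return base_seed + (unsigned int)rank;
}
```

Each process would pass the result to its generator, for example `srand(seed_for_rank(base_seed, rank))` after obtaining `rank` from `MPI_Comm_rank`. Distinct seeds alone do not guarantee statistically independent streams, so a generator designed for parallel use may be a better fit where that matters.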
9. Defect Experience Base
Defect type
- Name: Name of the defect type
- Entries: List of sub-types and specific defects that belong to the defect type
- Symptoms: Advice for detection (how does someone know there is a defect?)
- Cures and preventions: Advice for solving and avoiding a defect
Specific defect
- Name: Name of the defect
- Fault description: What was wrong in the code?
- Instances: List of defect instances
- Other findings: Other findings and contexts
Instance
- Defect: Link to a description of the defect
- Code: Source code containing the defect
- Location: Where the defect was found
- Time to find and fix: When the defect was inserted and fixed
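The three page templates can be read as a small nested data model. The C sketch below mirrors the listed fields one-to-one. It is only an illustration: HPCBugBase is a wiki, the struct and function names are hypothetical, and sub-types inside "Entries" are omitted for brevity.

```c
#include <stddef.h>

/* Hypothetical in-memory mirror of the "Instance" template. */
struct defect_instance {
    const char *defect;                /* link to a description of the defect */
    const char *code;                  /* source code containing the defect */
    const char *location;              /* where the defect was found */
    const char *time_to_find_and_fix;  /* when it was inserted and fixed */
};

/* Hypothetical mirror of the "Specific defect" template. */
struct specific_defect {
    const char *name;
    const char *fault_description;     /* what was wrong in the code */
    const struct defect_instance *instances;
    size_t instance_count;
    const char *other_findings;
};

/* Hypothetical mirror of the "Defect type" template
   (simplification: entries hold specific defects only, no sub-types). */
struct defect_type {
    const char *name;
    const struct specific_defect *entries;
    size_t entry_count;
    const char *symptoms;
    const char *cures_and_preventions;
};

/* Total number of recorded instances under one defect type. */
size_t count_instances(const struct defect_type *type) {
    size_t total = 0;
    for (size_t i = 0; i < type->entry_count; i++) {
        total += type->entries[i].instance_count;
    }
    return total;
}
```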
10. Current Status
- Defect patterns have been developed and evolved
  - Mainly MPI, plus OpenMP and UPC
- Major products
  - Defect experience base (HPCBugBase)
  - Educational materials
11. Teaching Defect Patterns
- Question: if students are explicitly taught common defect patterns in a lecture, can they avoid and/or find/fix them?
Reference group (2 classes, 21 students): Language basics -> Programming assignment -> Defect?
Test group (2 classes, 13 students): Language basics -> Treatment (Lecture + Homework) -> Programming assignment -> Defect?
Compare the defects of the two groups
12. Teaching Defect Patterns
Number of defects identified for each student
[Chart series: # of defects (never resolved), # of defects (resolved), # of defects (total)]
- The test group made fewer defects at a statistically significant level (P = 0.048) [chart annotation: 34]
- The test group left fewer defects in the code at a statistically significant level (P = 0.029) [chart annotation: 31]
- The test group found 42.3% of the defects on average, while the reference group found 39.8% of the defects on average
13. Expanding the Scope
- Study defect patterns in new HPCS languages (interactions with vendors)
- Improve usability
- Collect defect patterns from real HPC projects (Flash, GS2, ...)
Diagram axes around HPCBugBase:
Research method: Expert validation (top-down), Code analysis (bottom-up)
Programming model: MPI, OpenMP, UPC, CAF, X10, Chapel
Problem size
14. Discussion Points
- What functionality is desired?
  - E.g., Search
- What can be done to encourage the use of HPCBugBase?
  - Advertisement?
- What data sources are available for defect study?
- Who will maintain HPCBugBase?