Title: Empirical Software Engineering Strategies
1 Empirical Software Engineering Strategies
- Anneliese Amschler Andrews
- Department of Computer Science
- University of Denver
- andrews_at_cs.du.edu
2Types
- Survey (interview, questionnaire)
- ex post facto
- Case Study (observation, measurement, statistical analysis)
- Experiment (highly controlled, manipulate one or more variables)
- Quasi-experiment (no random assignment)
3Surveys
- Retrospective
- Understand population
- Many variables possible
- Purposes: descriptive, explanatory, explorative
- Questionnaires, Interviews
4Case Studies
- Single entity or phenomenon
- Specific time and space
- Data collection and analysis
- Industrial evaluation (process improvement, different techniques)
- Comparison (sister project, partial application)
- Validation
- Multiple sites
5Case study
- Concern
- Small case study may not scale
- Lack of control
- Influence of confounding factors
- Generalizability
- Not reproducible
6Case Study Positives
- Positives
- Typical behavior
- Realistic
- Scale
- Usually many variables
7Experiments
- Precise control
- Manipulate variables of interest (software, treatments, expertise)
- Small scale tasks
- High cost
- Confirm theories, conventional wisdom
- Explore relationships
- Evaluate accuracy of models
- Validate measures
8Experiment process
- Definition
- Planning
- Operation
- Analysis and interpretation
- Presentation and package
9Strategies Comparison
16 Define Experiment: Goal definition template
- Object of study
- Purpose
- Quality focus
- Perspective
- Context
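The five template slots above are commonly combined into one goal sentence of the form "Analyze <object> for the purpose of <purpose> with respect to <quality focus> from the point of view of <perspective> in the context of <context>". A minimal Python sketch, assuming that sentence form; the class name `ExperimentGoal` and all example values are hypothetical illustrations and do not come from the slides:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentGoal:
    """Hypothetical container for the five goal-template slots."""
    object_of_study: str
    purpose: str
    quality_focus: str
    perspective: str
    context: str

    def sentence(self) -> str:
        # Assemble the slots into a single goal statement.
        return (
            f"Analyze {self.object_of_study} "
            f"for the purpose of {self.purpose} "
            f"with respect to {self.quality_focus} "
            f"from the point of view of {self.perspective} "
            f"in the context of {self.context}."
        )


# Example values are invented for illustration only.
goal = ExperimentGoal(
    object_of_study="two code inspection techniques",
    purpose="evaluation",
    quality_focus="fault detection effectiveness",
    perspective="the researcher",
    context="graduate students inspecting seeded-fault code",
)
print(goal.sentence())
```

Writing the goal down in this fixed shape makes it harder to skip a slot, which is the main point of a definition template.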
17Experiment context
18Definition framework
19Experiment Planning
- Context selection
- Hypothesis formulation
- Variable selection
- Selection of subjects
- Experiment design
- Instrumentation
- Validity evaluation
20Context Selection
- Goal: large, real projects, professional staff
- Trade-offs necessary
- Off-line vs. on-line
- Students vs. professionals
- Toy vs. real problems
- Specific vs. general
21Hypothesis Formulation
- Null hypothesis, H0
- only reason for different outcomes is accidental
- Alternative hypothesis, Ha, H1
- Reject null hypothesis
- Formulation of hypothesis drives statistical tests
22Errors in Hypothesis Testing
- Type-I error
- P(type-I error) = P(reject H0 | H0 true)
- Type-II error
- P(type-II error) = P(not reject H0 | H0 false)
- Power of test: can it reveal a true pattern in the collected data?
- power = P(reject H0 | H0 false) = 1 - P(type-II error)
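The three probabilities above can be estimated by simulation. The sketch below is illustrative only: it assumes a two-sided z-test with known standard deviation, normally distributed data, and hypothetical parameter values (`n=25`, a true shift of 0.5); none of these choices come from the slides.

```python
import random
from statistics import NormalDist, mean


def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0 (sigma assumed known)."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, trials=4000, seed=1):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        z_test_rejects([rng.gauss(true_mu, sigma) for _ in range(n)], mu0, sigma)
        for _ in range(trials)
    )
    return rejections / trials


type_i_rate = rejection_rate(true_mu=0.0)  # H0 true: estimates P(type-I error)
power = rejection_rate(true_mu=0.5)        # H0 false: estimates the power
type_ii_rate = 1 - power                   # estimates P(type-II error)
print(type_i_rate, power, type_ii_rate)
```

With H0 true, the rejection rate should land near the chosen significance level; with H0 false, it estimates how often the test detects the real difference.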
23Variables Selection
- Independent variables
- Controllable
- Influence dependent ones
- Levels of measurement
- Dependent variables
- One or more
- Direct or indirect (validate!)
- Define measurement scales, range of values
24Selection of Subjects
- Influences generalization of results
- Population sample
- Probability or non-probability sampling
25 Sampling Types
- Probability sampling
- Simple random sampling
- Systematic sampling
- Stratified random sampling
- Non-probability sampling
- Convenience sampling
- Quota sampling
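The probability-based schemes in this list (simple random, systematic, stratified random) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the population of 40 developers, the two strata, and the function names are all hypothetical, and the stratified version uses proportional allocation.

```python
import random


def simple_random_sample(population, n, rng):
    """Every subject has the same chance of being selected."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th subject from the list."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from each stratum."""
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


rng = random.Random(42)
developers = [f"dev{i:02d}" for i in range(40)]  # hypothetical population
strata = {"junior": developers[:30], "senior": developers[30:]}

srs = simple_random_sample(developers, 8, rng)
systematic = systematic_sample(developers, 8, rng)
stratified = stratified_sample(strata, 0.2, rng)
print(srs, systematic, stratified, sep="\n")
```

Convenience and quota sampling are not shown because they involve no random selection step.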
26Choosing Sample Size
- Influences error, power of statistical test
- Large variability => large sample
- Data analysis method influences sample size
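How variability and power drive sample size can be illustrated with the usual normal-approximation formula for comparing two group means, n = 2 * ((z_(1-alpha/2) + z_power) * sigma / delta)^2 per group. The sketch assumes a two-sided test, two independent groups of equal size, and a common standard deviation; the function name and the example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect a mean difference delta.

    Normal approximation for a two-sided, two-sample comparison with
    common standard deviation sigma (an assumption of this sketch).
    """
    z = NormalDist().inv_cdf
    n = 2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2
    return ceil(n)


small_variability = subjects_per_group(sigma=1.0, delta=1.0)
large_variability = subjects_per_group(sigma=2.0, delta=1.0)
print(small_variability, large_variability)
```

Doubling the standard deviation roughly quadruples the required sample, which is the "large variability, large sample" point in numbers. A different analysis method (for example a paired or non-parametric test) would need a different formula.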
27Experiment Design
- Influenced by
- hypotheses,
- necessary treatments,
- power of statistical tests,
- measurement scales,
- objects and subjects
28General Design Principles
- Randomization (objects, subjects, order of tests, population sampling)
- Blocking (factor of no interest that influences outcomes): group by factor level (e.g. experience)
- Balancing (equal number of subjects for each treatment)
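The three principles can be combined in one assignment procedure: block subjects by a nuisance factor, randomize within each block, and deal treatments out evenly. A minimal sketch, assuming experience level as the blocking factor; the subject names, block labels, and function name are hypothetical.

```python
import random
from collections import Counter


def blocked_balanced_assignment(subjects, treatments, rng):
    """subjects maps subject name -> block level; returns name -> treatment."""
    # Blocking: group subjects by the level of the nuisance factor.
    blocks = {}
    for name, level in subjects.items():
        blocks.setdefault(level, []).append(name)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # Randomization within the block.
        for position, name in enumerate(shuffled):
            # Balancing: cycle through the treatments so counts stay equal.
            assignment[name] = treatments[position % len(treatments)]
    return assignment


subjects = {
    "s1": "novice", "s2": "novice", "s3": "novice", "s4": "novice",
    "s5": "expert", "s6": "expert", "s7": "expert", "s8": "expert",
}
assignment = blocked_balanced_assignment(subjects, ["T1", "T2"], random.Random(7))
per_block = Counter((subjects[name], t) for name, t in assignment.items())
print(per_block)
```

Counts are exactly equal only when each block size is a multiple of the number of treatments, as it is in this example.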
29Standard Design Types
- One factor, 2 treatments
- One factor, >2 treatments
- Two factors, 2 treatments
- More than two factors each with 2 treatments
30 1 factor, 2 treatments
- Completely randomized
- 1 treatment (T1, T2) per subject
- Needs more subjects
- Random assignment of S to T1/T2
- Paired Comparison
- Subject has both treatments
- Order of treatment randomized
- Balanced: same number of subjects for each order of treatments
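Both designs for one factor with two treatments can be sketched as assignment functions. This is an illustration under assumptions: an even number of subjects, treatment labels `T1`/`T2`, and hypothetical function names.

```python
import random
from collections import Counter


def completely_randomized(subjects, rng):
    """Each subject gets exactly one treatment; groups are equal in size."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {s: ("T1" if i < half else "T2") for i, s in enumerate(shuffled)}


def paired_comparison(subjects, rng):
    """Each subject gets both treatments; both orders are used equally often."""
    orders = [("T1", "T2"), ("T2", "T1")] * (len(subjects) // 2)
    rng.shuffle(orders)
    return dict(zip(subjects, orders))


rng = random.Random(3)
subjects = [f"S{i}" for i in range(1, 9)]  # even number of subjects assumed
between = completely_randomized(subjects, rng)
within = paired_comparison(subjects, rng)
print(Counter(between.values()), Counter(within.values()))
```

In the completely randomized design each subject contributes one observation, which is why it needs more subjects than the paired design for the same number of observations per treatment.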
31 1 factor, >2 treatments
- Completely randomized
- 1 treatment per subject
- Random assignment of treatments
- Needs more subjects
- Randomized complete block design
- Variability between subjects large
- Block subjects into groups
- Assign order of treatments within block randomly
32 2 factors
- 2^2 factorial design
- 2 factors, each with 2 treatments
- 4 possible treatment pairs
- Randomly assign subjects to each treatment pair
- Two-stage nested design
- Connections between factors leading to related treatments
- Ex.: F1 = PL (OO/F), F2 = (not) fault-prone
- Question: efficiency of unit testing
33 >2 factors
- 2^k factorial design
- Factors: k = 3, 2 treatments per factor
- 8 combinations of factors and treatments
- Assign subjects randomly to each (balance!)
- More than 2 treatments: construct Latin square
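Enumerating the treatment combinations of a full factorial design and assigning subjects to them in a balanced way can be sketched as follows. The three factors and their levels are hypothetical examples, as are the function names; only the structure (k = 3 factors with 2 treatments each, giving 8 combinations) follows the slide.

```python
import random
from itertools import product


def full_factorial(factors):
    """factors maps factor name -> tuple of treatments; returns all combinations."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def balanced_assignment(subjects, combinations, rng):
    """Shuffle subjects, then deal them out evenly over the combinations."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    return {s: combinations[i % len(combinations)] for i, s in enumerate(shuffled)}


# Hypothetical factors, each with two treatments (k = 3).
factors = {
    "language": ("OO", "functional"),
    "tool_support": ("no", "yes"),
    "experience": ("novice", "expert"),
}
combinations = full_factorial(factors)
subjects = [f"S{i}" for i in range(16)]
plan = balanced_assignment(subjects, combinations, random.Random(11))
print(len(combinations))
```

The same `full_factorial` call with two factors yields the four treatment pairs of a 2^2 design.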
34 2^k fractional factorial design
- Assumption: higher-order factor interactions negligible
- Sparsity of effect principle
- Projection property
- Sequential experimentation
- One-half fractional factorial design
- Choose half of full factorial design
- Removing 1 factor results in full factorial
design for remainder
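The claim that removing one factor from a one-half fraction leaves a full factorial design can be checked directly. The sketch below assumes the common construction in which levels are coded -1/+1 and the half fraction keeps the runs whose level product is +1 (defining relation I = ABC for k = 3); this construction is an assumption of the example and is not stated on the slides.

```python
from itertools import product


def half_fraction(k):
    """Runs of the 2^k design (levels coded -1/+1) whose level product is +1."""
    runs = []
    for run in product((-1, 1), repeat=k):
        sign = 1
        for level in run:
            sign *= level
        if sign == 1:
            runs.append(run)
    return runs


def drop_factor(runs, index):
    """Project the design onto the remaining factors."""
    return {run[:index] + run[index + 1:] for run in runs}


fraction = half_fraction(3)                  # 4 of the 8 possible runs
full_2x2 = set(product((-1, 1), repeat=2))   # full factorial in 2 factors
projections_are_full = all(
    drop_factor(fraction, i) == full_2x2 for i in range(3)
)
print(fraction, projections_are_full)
```

This is the projection property in miniature: if one factor turns out to be unimportant, the data already collected form a complete design in the other two.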
35 2^k fractional factorial design (cont.)
- One-quarter fractional factorial design
- Choose one quarter of the full factorial design
- Assumes fewer important factor interactions
- May be used sequentially to screen importance of
factor interactions
36Instrumentation
- Objects: specs, code, designs
- Need to know/control properties (e.g. seeded/actual faults for code inspection)
- Guidelines for participants
- Written instructions
- Training
- Measurement instruments
- Unobtrusive during task
- Tests, forms, questionnaires
- Instrumentation must not affect outcome of
experiment
37Validity
- Conclusion validity
- Treatment has statistical relationship to outcome
- Internal validity
- Treatment causes outcome
- Construct validity
- Relationship between theory and observation
- Cause construct/treatment
- Effect construct/outcome
- External validity
- generalization
38Conclusion validity
- Low statistical power
- Violation of assumptions of statistical tests
- Fishing and error rate
- Reliability of measures
- Reliability of treatment implementation
- Random irrelevancies in experimental setting
- Random heterogeneity of subjects
39Internal validity
- Single group
- History
- Maturation
- Testing
- Instrumentation
- Statistical regression
- Selection
- Mortality
- Ambiguity about direction of causal influence
40Internal validity (cont.)
- Multigroup
- Interactions with selection
- Social threats
- Diffusion or imitation of treatments
- Compensatory equalization of treatments
- Compensatory rivalry
- Resentful demoralization
41Construct validity
- Design
- Inadequate preoperational explication of constructs
- Mono-operation bias
- Mono-method bias
- Confounding constructs and levels of constructs
- Interaction of different treatments
- Interaction of testing and treatment
- Restricted generalizability across constructs
42Construct validity (cont.)
- Social threats
- Hypothesis guessing
- Evaluation apprehension
- Experimenter expectancies
43External validity
- Interaction of selection and treatment
- Interaction of setting and treatment
- Interaction of history and treatment
44Setting priorities among validity threats
- Depends on purpose of experiment
- Theory testing
- Internal, construct, conclusion, external
- Applied research
- Internal, external, construct, conclusion
45Experiment Operation
- Preparation
- Commit participants
- Ascertain proper instrumentation
- Execution
- Data collection
- Experimental environment
- Data validation
46Commit Participants
- Obtain consent
- Sensitive results
- Inducements
- Deception
47Ethical considerations
- Trust with information received from company
- Monitoring and enforceability
- Responsibility to disclose all relevant information
- Publications
- Conflict of interest
- Regulatory compliance
- Disclose all potential adverse effects
- Don't use results against the company's employees
48Analysis and Interpretation
- Descriptive statistics
- Data set reduction
- Hypothesis testing
49Descriptive Statistics
50Measures of Dispersion
- Variance
- Standard deviation
- Range
- Variation interval (xmin, xmax)
- Coefficient of variation
- Frequency
- Relative frequency
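Each measure in this list can be computed with the Python standard library. The data set below is hypothetical, and the sketch uses the sample variance (division by n - 1), which is one of two common conventions.

```python
from collections import Counter
from statistics import mean, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

sample_variance = variance(data)             # divides by n - 1
standard_deviation = stdev(data)             # square root of the sample variance
value_range = max(data) - min(data)
variation_interval = (min(data), max(data))  # (xmin, xmax)
coefficient_of_variation = standard_deviation / mean(data)

frequency = Counter(data)                    # how often each value occurs
relative_frequency = {
    value: count / len(data) for value, count in frequency.items()
}

print(sample_variance, standard_deviation, value_range, variation_interval)
print(coefficient_of_variation, dict(frequency), relative_frequency)
```

The coefficient of variation is only meaningful on a ratio scale, since it divides by the mean.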
51Frequency Table Example
52Measures of Dependency
53Measures of Dependency
54Measures of Dependency
- Correlation coefficient
- Pearson (interval/ratio, close to normal): linear dependency
- Spearman rank (ordinal, far from normal)
- Kendall's rank correlation
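The difference between the Pearson and Spearman coefficients shows up on data that are monotone but not linear. The sketch below implements both from their definitions using only the standard library; the function names and the data are hypothetical, and ties receive average ranks.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


size = [1, 2, 3, 4, 5]        # hypothetical module sizes
effort = [1, 4, 9, 16, 25]    # grows monotonically but not linearly
r_pearson = pearson(size, effort)
r_spearman = spearman(size, effort)
print(r_pearson, r_spearman)
```

Because the relationship is perfectly monotone, the rank-based coefficient reaches 1 while the Pearson coefficient stays below it.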
55Measures of dependency (cont.)
- Multivariate analysis
- Multiple regression
- Principal components analysis
- Cluster analysis
- Discriminant analysis
56Graphical visualization
57Graphical visualization
58Graphical visualization
- Cumulative histogram
- Pie chart
59Data Set Reduction
- Outlier analysis
- Validation
- May need to include outlier
- May lead to grouping by factor levels not
considered before
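One common way to flag candidate outliers is the box-plot rule: values more than 1.5 interquartile ranges outside the quartiles. The slides do not name a specific method, so this rule, the function name, and the data are assumptions of the sketch.

```python
from statistics import quantiles


def iqr_outliers(data, factor=1.5):
    """Return values lying more than factor * IQR outside the quartiles."""
    q1, _, q3 = quantiles(data, n=4)
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    return [x for x in data if x < low or x > high]


effort_hours = [10, 12, 11, 13, 12, 11, 95]  # hypothetical measurements
flagged = iqr_outliers(effort_hours)
print(flagged)
```

Flagging is only the first step. Each flagged point still has to be validated: it may be a data-collection error to remove, a legitimate observation to keep, or a sign of a factor level that was not considered in the design.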