1. Evaluation of User Interfaces (L'évaluation des interfaces utilisateurs)
- N.B. In these slides, BGBG refers to the 2nd edition of the book Human-Computer Interaction by Baecker, Grudin, Buxton and Greenberg (1995)
2. Formative vs Summative Evaluation
- Formative evaluation (Évaluation formative)
- Happens throughout the design process
- Can evaluate scenarios, sketches, models, prototypes
- Summative evaluation (Évaluation sommative/récapitulative)
- Typically happens at the end
- Assesses system and interface design quality, i.e., how well have we done?
3. Analytic vs Empirical Evaluations (BGBG pp. 228-229)
- Analytic Evaluations (Évaluations analytiques)
- Do not involve actual users
- Focus is on why things happen the way they do, and on the components of the system
- Produce interpretations and suggestions, not solid facts
- Better for formative evaluation than summative evaluation
- Can be used early in the design process, before any high-fidelity prototype exists
- Examples: heuristic evaluation, walkthrough, claims analysis
- Empirical Evaluations (Évaluations empiriques)
- Involve actual users
- Focus is on what actually happens in practice
- Produce factual measurements and observations
- Good for summative evaluation, but may not clearly point to what changes to make
- Can produce a lot of data that is laborious to analyze
- Examples: experiments, usability testing, field studies
4. Empirical Evaluation: Naturalistic Observation vs True Experiments (Example: Ray and Ravizza, 1985)
5. Empirical Evaluation: User Testing
- Design and implement scenario or prototype
- Record user behaviour
- Typical usage, or critical incidents
- Keystroke and mouse event recording
- Thinking aloud protocols
- Audio or video recording
- Collect subjective impressions (questionnaire, interview)
- Analyze recordings of user behaviour
6. Typical Steps in User Testing (Gomoll, in Laurel, pp. 85-90)
- Set up the observation
- Describe the purpose of the study, and how the data collected will be used
- Tell the user (verbally and on paper) that it's OK to quit at any time
- Ask participants if they are willing to sign a form to give their permission to begin
- Pre-questionnaire (name, age, handedness, background, education, experience with computers, etc.)
- Talk about and demonstrate the equipment
- Explain how to think aloud
- Explain that you will not provide help
- Describe the task and introduce the system
- Ask if there are questions before you start; then begin the observation
- Post-questionnaire and/or interview to solicit opinions, impressions, etc.
- Conclude the observation and debrief participants
- Transcribe and tabulate the data and results
- Analyze and interpret the results
7. User Testing (BGBG, Fig. 2.8, p. 85, adapted from Nielsen, 1992)
- Practical study design
- Reflect on the participants' backgrounds and how they might affect the study
- Be aware of problems that arise when experimenters know the users personally
- Prepare for the study carefully (avoid last-minute panic)
- Select the tasks carefully to be representative and to fit the allotted time
- In general, start with an easier (but not frivolous) task
- Write down features of the system not being tested as well as those that are!
- Define the start-up state for the study precisely
- Define precise rules for when and how users can be helped during the study
- Plan timing and a cut-off procedure (if the subject gets stuck) for each part of the study
- Include provisions for data collection (e.g., audio, video, or keystroke capture)
- Plan data analysis techniques in advance
- Carry out an initial pilot study to test your protocol
- Written materials
- Participant release (permission) form
- Pre-questionnaire covering prior experience, etc.
- Introduction to the study for users, including scenario of use and description of tasks
- Checklist for experimenters, and paper for note-taking
- Post-questionnaire or survey
8. User Testing (BGBG, Fig. 2.8, p. 85, adapted from Nielsen, 1992)
- Carrying out the study
- Let users know that complete anonymity will be preserved
- Let them know that they may quit at any time
- Stress that the system is being tested, not the participant
- Note: "participant" is the more modern term for "subject"
- Indicate that you are only interested in their thoughts relevant to the system
- Demonstrate the thinking-aloud method by acting it out for a simple task, e.g., figuring out how to load a stapler
- Hand out instructions for each part of the study individually, not all at once
- Maintain a relaxed environment free of interruptions
- Occasionally encourage users to talk if they grow silent
- If users ask questions, try to get them to talk (e.g., "What do you think is going on?"), and follow predefined rules on when to help or interrupt to help
- Debrief each user after the experiment
9. Thinking Aloud
- Attempt to elicit thought processes of participant, thereby yielding valuable insights (although process is slowed down and may be changed)
- Participant talking while they are doing:
- Problems they are having
- Solutions they are considering
- Why they are having trouble
- Insights that they have
- Wishes that they have
- Co-Discovery: pairs of participants conversing (Co-Discovery Learning, Kennedy paper in BGBG, pp. 182-185)
10. Data Capture and Analysis
- Keystroke/mouse logging (see the sketch below)
- Record precise user behaviour
- Record times to carry out actions
- Record user errors
- Observation and note-taking by observers, especially of user problems and critical incidents
- Best if note-taking is done by a second observer
- Audio and video recordings
- Can't observe and record all behaviour in real time
- Preserve behaviour for review (even non-verbal behaviour)
- Can produce a lot of data
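To make the logging idea concrete, here is a minimal Python sketch of timestamped event logging; the event names, the log_event helper, and the CSV layout are illustrative assumptions rather than part of any particular toolkit.

```python
# Minimal sketch of keystroke/mouse event logging for a user test.
# All names here (log_event, events.csv) are illustrative assumptions,
# not part of any specific GUI toolkit discussed in the slides.
import csv
import time

LOG_PATH = "events.csv"

def log_event(writer, event_type, detail):
    """Append one timestamped event (e.g., keypress, click) to the log."""
    writer.writerow({
        "timestamp_s": f"{time.time():.3f}",  # wall-clock time in seconds
        "event_type": event_type,             # e.g. "keypress", "mouse_click"
        "detail": detail,                     # e.g. key name or (x, y) position
    })

if __name__ == "__main__":
    with open(LOG_PATH, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["timestamp_s", "event_type", "detail"])
        writer.writeheader()
        # In a real study these calls would be wired to GUI callbacks;
        # here we simulate a few events so the script runs on its own.
        log_event(writer, "mouse_click", "(120, 340)")
        log_event(writer, "keypress", "Enter")
        log_event(writer, "task_complete", "task_1")
```

In practice such a log is later joined with the video/audio record and the questionnaire data during analysis.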
11. Asking Users in Addition to Observing Them
- Methods
- (Post-)questionnaire design
- Formulating and asking questions, analyzing answers
- Hard to avoid bias in the phrasing of questions
- Therefore requires pre-testing (pilot testing)
- Surveys (Sondages): (possibly large-scale) administration of questionnaires to appropriate samples of individuals chosen from a population
- Administration of questions through interviews
12. Ethical Issues
- Basic principles
- Do no harm
- Voluntary participation
- Informed consent
- Right to privacy
- Use of research protocols and consent forms
- Explanation of study and purpose
- Anonymity
- Ability to withdraw at any time
- For example, see p. 256 of Rosson & Carroll
13. A Taxonomy of Several Evaluation Techniques (Une taxonomie de plusieurs techniques d'évaluation)
14. McGrath's Taxonomy (Taxonomie de McGrath)
- [Quadrant diagram: methods range from unobtrusive (discret) to intrusive/disruptive (intrus, dérangeant)]
15. Quadrant 1: Field Strategies
- Study systems in real use on real tasks in real work environments, i.e., observe under settings with conditions as natural as possible
- Field studies: study systems in situ, disturbing as little as possible, e.g., with ethnography, contextual inquiry
- Field experiments: observe the impact of changing (ideally) one aspect of a work environment, e.g., in beta testing, studies of technological change and new technology introduction
16. Quadrant 2: Experimental Strategies
- Study systems in a lab under controlled conditions, i.e., conditions concocted for research purposes
- Laboratory experiments: carry out controlled experiments studying the impacts of (ideally) one (or two) interface parameter(s)
- Experimental simulations: create in the lab, for experimental purposes, a real system that is used by real users on (usually) artificially simplified tasks, e.g., user testing, usability engineering
17. Quadrant 3: Respondent Strategies
- Ask informants to tell us something about themselves and/or their work, or about an interface, i.e., where the setting in which questions are asked plays no role
- Judgment studies: ask respondents about an interface, e.g., in a demonstration, or with usability inspection
- Sample surveys: ask respondents about themselves and/or their work, e.g., with questionnaires, surveys, interviews
18. Usability Inspection (a Respondent Strategy)
- Methods
- Heuristic evaluation: judgments by a panel of evaluators (e.g., 3 to 5) of the degree to which an interface satisfies a set of usability guidelines, followed by discussion and analysis
- Cognitive walkthroughs
- Roles
- Evaluation without users (contrast to usability tests, etc.)
- Elicit expert opinions about the user's model, functionality, look & feel, etc.
19. Usability Inspection (cont'd)
- Advantages
- Structured method of using the accumulated wisdom of experts
- Disadvantages
- Doesn't take advantage of real insights from real users
- Example: heuristic evaluation with 10 usability guidelines (Nielsen, BGBG, Fig. 2.7, p. 83)
- Visibility of system status
- Match between system and the real world
- User control and freedom
- Consistency and standards
- Error prevention
- Recognition rather than recall
- Flexibility and efficiency of use
- Aesthetic and minimalist design
- Help users recognize, diagnose, and recover from errors
- Help and documentation
20. Demonstrations (a Respondent Strategy)
- Demonstrate system to
- Any random person
- Management, potential investors, journalists
- Potential customers
- Potential users
- Potential business partners
- Take detailed notes
- Elicit reactions to the user's model, functionality, interface
- Advantages
- Get feedback early in prototype or system construction
- You're going to have to give demos anyway; why not learn from them?
- Disadvantages
- System still rough, which introduces noise into the process
21. Quadrant 4: Theoretical Strategies
- Ask a theory to tell us something about people's work and/or about an interface, i.e., no observation of behaviour, experiments, or questions are required
- Formal theory: use a qualitative theory or some equations, e.g., a behavioural theory such as colour vision or Fitts' Law (see the sketch below)
- Computer simulation: use and run a computer model, e.g., human information processing theory
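As a concrete instance of a formal theory, the sketch below computes predicted movement times from Fitts' Law in its common Shannon formulation, MT = a + b * log2(D/W + 1); the coefficients a and b are invented placeholders, since real values must be fit to measured pointing data.

```python
# Sketch: predicting movement time with Fitts' Law (Shannon formulation).
# MT = a + b * log2(D/W + 1), where D is distance to the target and W its width.
# The constants a and b below are illustrative placeholders; in practice
# they are estimated by regression from measured pointing data.
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time in seconds for one pointing task."""
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty

if __name__ == "__main__":
    for d, w in [(100, 20), (400, 20), (400, 5)]:
        print(f"D={d:4} W={w:3}  predicted MT = {fitts_movement_time(d, w):.3f} s")
```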
22. Summary of Evaluation Techniques (Résumé des techniques d'évaluation)
- Field Strategies (Stratégies sur le terrain)
- Field Studies (Études sur le terrain)
- Observe processes in situ, changing the system as little as possible
- Examples: ethnographic studies, contextual inquiry (enquêtes contextuelles) (BGBG pages 42, 46) (not required for the exam)
- Field Experiments (Expérimentations sur le terrain)
- Change one aspect of the environment and observe the effects
- Experimental Strategies (Stratégies expérimentales)
- Laboratory Experiments / Controlled Experiments (Expérimentations de laboratoire)
- Vary or manipulate, precisely, one or more independent variables
- Measure, precisely, one or more dependent variables
- Try to control the conditions carefully
- Experimental simulation (Simulation expérimentale)
- Create a real system, in a laboratory, for real users
- Examples:
- Usability tests / user tests (Tests d'utilisabilité)
- Often use a think-aloud protocol and/or a discovery phase where the user explores the interface; often also use questionnaires and/or interviews
- Usability engineering (Génie d'utilisabilité)
- More formal than usability testing
- Quantitative performance measures (metrics)
23. Summary of Evaluation Techniques (2)
- Respondent Strategies (Stratégies de répondants)
- Judgment studies (Études de jugement)
- Example: usability inspection or expert review
- Done by experts or designers, without users
- Example: heuristic evaluation (évaluation heuristique)
- Uses a set of design guidelines or rules (heuristics), e.g., Nielsen's heuristics
- Example: cognitive walkthrough
- Example: demonstrations
- Surveys (Sondages)
- Examples: questionnaires, interviews
- Theoretical Strategies (Stratégies théoriques)
- Formal theories
- Involve a model of the user, the system, and the interaction between the two
- Examples: Fitts' law, Hick-Hyman law, KLM, GOMS, etc.
- Computer simulations (Simulations à l'ordinateur)
- Simulate a model
24. Tradeoffs (Compromis)
- A: Generalizable (external validity)
- B: Precise (internal validity (?))
- C: Realistic (ecological validity)
25. Controlled Experiments
26. Controlled Experiments
- Method
- Manipulate independent variables, system characteristics
- Control for other variables (hold them constant)
- Measure dependent variables, user behaviour
- Roles
- Understanding factors influencing interface quality
- Determining which conditions or which interface is best
27. Controlled Experiments
- Advantages
- Strong statements about causality (good internal validity)
- Many experimental designs suitable for varying situations
- Disadvantages
- Require time and planning; may be expensive
- Complex designs (more than 3 or 4 independent variables) are often difficult to interpret
- Often lack external validity and especially ecological validity
28. Examples
- Of 3 interfaces, A, B, C, which enables the fastest performance at a given task?
- Does Prozac have an effect on performance at tying shoelaces?
- How does the frequency of advertisements on television affect voting behaviour?
- Can casting a spell on a pair of dice affect what numbers appear on them?
29. Elements of an Experiment
- Population
- Set of all possible subjects / observations
- Sample
- Subset of the population chosen for study: a set of subjects / observations
- Subjects
- People/users under study. The more politically correct term within HCI is "participants."
- Observations / Dependent variable(s)
- Individual data points that are measured/collected/recorded
- E.g., time to complete a task, errors, etc.
- Condition / Treatment / Independent variable(s)
- Something done to the samples that distinguishes them (e.g., giving a drug vs a placebo, or using interface A vs B)
- Goal of the experiment is often to determine whether the conditions have an effect on the observations, and what that effect is (see the data-layout sketch below)
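One common way to organize such observations is a "long" table with one row per participant per condition; the short Python sketch below illustrates that layout (the participant IDs, condition names, file name, and times are all invented).

```python
# Sketch: one common "long" layout for experimental observations,
# with one row per (participant, condition) pair. All values are invented.
import csv

rows = [
    # participant, condition (independent variable), task_time_s (dependent variable)
    {"participant": "P01", "condition": "interface_A", "task_time_s": 38.2},
    {"participant": "P01", "condition": "interface_B", "task_time_s": 31.5},
    {"participant": "P02", "condition": "interface_A", "task_time_s": 44.9},
    {"participant": "P02", "condition": "interface_B", "task_time_s": 40.1},
]

with open("observations.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["participant", "condition", "task_time_s"])
    writer.writeheader()
    writer.writerows(rows)
```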
30. Tasks to Design and Run an Experiment
- Design
- Choose independent variables
- Choose dependent variables
- Develop hypothesis
- Choose design paradigm
- Choose control procedures
- Choose a sample size
- Pilot experiment
- Often more exploratory, varying a greater number of variables to get a feel for where the effect(s) might be
- Run experiment
- Focuses in on the suspected effect; tries to gather lots of data under key or optimal conditions to result in a strong conclusion
- Analyze data
- Using statistical tests such as ANOVA
- Interpret results
31. The Problem: Effectiveness of a New Method of Source Code Presentation
- Source code appearance makes inadequate use of the capabilities of digital typography
- Potential to make code more readable and more comprehensible with a new and enhanced presentation format
- See the book by Baecker and Marcus, Human Factors and Typography for More Readable Programs, Addison-Wesley, 1990
- On the following slides, bullet points that refer to an experimental study of our new presentation format are specially marked
32. Conventional Presentation
33. New Presentation
34. Independent Variables
- The variable manipulated by the experimenter
- Also known as factor or treatment
- Experiment may involve one or many independent variables
- Each independent variable:
- Has 2 or more levels (i.e., values)
- May be metric (continuous, like the length of a menu) or categorical (discrete, like mouse vs. trackball, or a Likert scale)
- In our example: just one independent variable, with two levels, new typesetting format or traditional presentation format
35. Dependent Variables
- Definition
- Variable measured by the experimenter
- Variable which may depend on the independent variables
- Relationship is not necessarily causal; e.g., may only be correlated
- Examples
- Accuracy, or number of errors
- Number of subtasks completed in a given time period
- Time to complete each task
- In our example: ability to comprehend a program, as measured by the number of questions answered in a given time
36. Hypotheses
- Statement, to be tested, of the relationship between independent and dependent variables
- The null hypothesis is that the independent variables have no effect on the dependent variables
- Hypothesis in our example: reading comprehension, as defined above, is improved by the new method of source code presentation
37. Experimental Design Paradigms
- Between-subjects or within-subjects manipulation (entre participants vs à travers tous les participants)
- Example designs with one independent variable:
- Between-subjects (randomized group) design
- One independent variable with 2 or more levels
- Subjects randomly assigned to groups
- Each subject tested under only 1 condition
- Within-subjects (repeated measures) design
- One independent variable with 2 or more levels
- Each subject tested under all conditions
- Order of conditions randomized or counterbalanced (why?) (see the assignment sketch below)
- In our example: within subjects, with two conditions, i.e., two sample programs
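The sketch below illustrates the two paradigms in Python: random assignment of participants to groups for a between-subjects design, and counterbalanced condition orders for a within-subjects design (participant IDs and condition names are invented).

```python
# Sketch: between-subjects assignment vs within-subjects counterbalancing.
# Participant IDs and condition names are invented for illustration.
import random

random.seed(42)  # fixed seed so the example is reproducible

participants = [f"P{i:02d}" for i in range(1, 9)]
conditions = ["new_format", "traditional_format"]

# Between-subjects: each participant sees exactly one condition.
shuffled = participants[:]
random.shuffle(shuffled)
half = len(shuffled) // 2
between = {p: conditions[0] for p in shuffled[:half]}
between.update({p: conditions[1] for p in shuffled[half:]})

# Within-subjects: each participant sees all conditions;
# the order is counterbalanced so half start with each condition.
within = {}
for i, p in enumerate(participants):
    order = conditions if i % 2 == 0 else list(reversed(conditions))
    within[p] = order

print("Between-subjects groups:", between)
print("Within-subjects condition orders:", within)
```

Counterbalancing matters because practice and fatigue otherwise favour whichever condition happens to come second or first.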
38. Control Procedures
- Goal is to eliminate the confound hypothesis, i.e., that there are alternative explanation(s) for the observed effect(s)
- To do this: make sure there are no systematic differences between conditions other than the independent variable
- In our example: ensure that the two sample programs are identical in length, complexity, difficulty
39. What To Control
- Subject characteristics
- Gender, handedness, etc.
- Ability
- Experience
- Task variables
- Instructions
- Materials used
- Environmental variables
- Setting
- Noise, light, etc.
- Order effects
- Practice
- Fatigue
40. How to Control
- Hold constant
- Use males only, or students from the same class only
- Novices only
- Randomize
- Subjects to groups
- Counterbalance
- Half (chosen randomly) get the new presentation format first
41. Sample Size Selection
- More subjects --> more confidence in results, i.e., greater statistical significance
- But this can be very expensive
- Many methods to reduce the required number of subjects (see the power-analysis sketch below)
- Most HCI experiments: 4 to 25 subjects per group
- In our example: 44 subjects chosen from a 3rd-year programming course
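As a rough illustration of how sample size can be chosen in advance, here is a hedged Python sketch using statsmodels' TTestIndPower; the assumed effect size is a placeholder, and any standard power-analysis routine would serve equally well.

```python
# Sketch: estimating the per-group sample size needed to detect a given
# effect with a two-sample t-test. The effect size (Cohen's d = 0.8) is an
# invented placeholder; real values come from pilot data or the literature.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.8,  # assumed "large" effect
                                    alpha=0.05,       # significance level
                                    power=0.8)        # desired power
print(f"Approx. participants needed per group: {n_per_group:.1f}")
```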
42. Designing and Running the Experiment and Collecting the Data
- Run pilot studies
- Check experimental design
- Test and improve
- Task definition
- Experimental materials (often the most difficult)
- Instructions
- Practice tasks
- Develop experimenter skills
- Identify and deal with special problems
- Run actual experiment
- Record data
- Observe behaviour
43. The Presentation Format Experiment
- Within-subjects design, 44 subjects from a 3rd-year programming course
- Two similar short C programs, roughly 200 lines of code, 4 to 5 pages
- 40 minutes to skim the first program and attempt to answer 18 questions, half in the familiar format and half in the new format
- Then each group given the other program in the other format
44. Data Analysis and Hypothesis Testing
- Describe data
- Descriptive statistics (means, medians, standard deviations)
- Graphs and tables
- Perform statistical analysis of results
- Are results due to chance? (That is, with what probability?)
- In our example: mean percentage of correct answers with the new format 44%, with the conventional format 35%
- Analysis of variance showed that the effect of presentation format in increasing program readability was significant, F(1,42) = 18.25, p < 0.0001 (see the analysis sketch below)
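A sketch of this style of analysis is shown below, using synthetic scores rather than the study's actual data; for a within-subjects design with only two conditions, a paired t-test is equivalent to the reported one-way repeated-measures ANOVA (F(1, n-1) = t²).

```python
# Sketch: descriptive statistics and a paired comparison for a
# two-condition within-subjects design. The scores are synthetic
# placeholders, not the data from the presentation-format study.
import statistics
from scipy import stats

# Percentage of correct answers per participant, in each condition.
new_format = [48, 41, 52, 39, 46, 44, 50, 37]
conventional = [36, 33, 41, 30, 38, 35, 39, 29]

print("Mean (new):         ", statistics.mean(new_format))
print("Mean (conventional):", statistics.mean(conventional))

# Paired t-test; for two within-subjects conditions, F = t**2.
t, p = stats.ttest_rel(new_format, conventional)
print(f"t({len(new_format) - 1}) = {t:.2f}, F = {t**2:.2f}, p = {p:.4f}")
```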
45. ANOVA
- Analysis of Variance
- A statistical test that compares the distributions of multiple samples, and determines the probability that the differences in the distributions are due to chance
- In other words, it determines the probability of seeing differences this large if the null hypothesis were true
- If that probability is below 0.05 (i.e., 5%), then we reject the null hypothesis, and we say that we have a (statistically) significant result (see the sketch below)
- Why 0.05? Dangers of using this value?
46. Techniques for Making an Experiment More Powerful (i.e., Able to Detect Effects)
- Reduce noise (i.e., reduce variance)
- Increase sample size
- Control for confounding variables
- E.g., psychologists often use inbred rats for experiments!
- Increase the magnitude of the effect
- E.g., give a larger dosage of the drug
47. Uses of Controlled Experiments within HCI
- Evaluate or compare existing systems/features/interfaces
- Discover and test useful scientific principles
- Examples?
- Establish benchmarks/standards/guidelines
- Examples?