1
The Evaluation of User Interfaces
  • N.B.: In these slides, "BGBG" refers to the 2nd
    edition of the book "Human-Computer Interaction"
    by Baecker, Grudin, Buxton and Greenberg (1995)

2
Formative vs Summative Evaluation
  • Formative evaluation (Évaluation formative)
  • Happens throughout the design process
  • Can evaluate scenarios, sketches, models,
    prototypes
  • Summative evaluation (Évaluation
    sommative/récapitulative)
  • Typically happens at the end
  • Assesses system and interface design quality,
    i.e., how well have we done?

3
Analytic vs Empirical Evaluations (BGBG pp.
228-229)
  • Analytic Evaluations (Évaluations analytiques)
  • Do not involve actual users
  • Focus is on why things happen the way they do,
    and on the components of the system
  • Produce interpretations and suggestions, not
    solid facts
  • Better for formative evaluation than summative
    evaluation
  • Can be used early in design process, before any
    high-fidelity prototype exists
  • Examples: heuristic evaluation, walkthrough,
    claims analysis
  • Empirical Evaluations (Évaluations empiriques)
  • Involve actual users
  • Focus is on what actually happens in practice
  • Produce factual measurements and observations
  • Good for summative evaluation, but may not
    clearly point to what changes to make
  • Can produce a lot of data that is laborious to
    analyze
  • Examples: experiments, usability testing, field
    studies

4
Empirical Evaluation: Naturalistic Observation vs
True Experiments (Example: Ray and Ravizza, 1985)
5
Empirical Evaluation: User Testing
  • Design and implement scenario or prototype
  • Record user behaviour
  • Typical usage, or critical incidents
  • Keystroke and mouse event recording
  • Thinking aloud protocols
  • Audio or video recording
  • Collect subjective impressions (questionnaire,
    interview)
  • Analyze recordings of user behaviour

6
Typical Steps in User Testing (Gomoll, in Laurel,
pp. 85-90)
  • Set up the observation
  • Describe the purpose of the study, and how the
    data collected will be used
  • Tell the user (verbally and on paper) that it's
    OK to quit at any time
  • Ask the participant if they are willing to sign a
    form giving their permission to begin
  • Pre-questionnaire (name, age, handedness,
    background, education, experience with computers,
    etc.)
  • Talk about and demonstrate the equipment
  • Explain how to think aloud
  • Explain that you will not provide help
  • Describe the task and introduce the system
  • Ask if there are questions before you start, then
    begin the observation
  • Post-questionnaire and/or interview to solicit
    opinions, impressions, etc.
  • Conclude the observation and debrief participants
  • Transcribe, tabulate the data and results
  • Analyze, interpret the results

7
User Testing (BGBG, Fig. 2.8, p. 85, adapted from
Nielsen, 1992)
  • Practical study design
  • Reflect on the participants' backgrounds and how
    they might affect the study
  • Be aware of problems that arise when
    experimenters know the users personally
  • Prepare for the study carefully (avoid last
    minute panic)
  • Select the tasks carefully to be representative
    and to fit the allotted time
  • In general, start with an easier (but not
    frivolous) task
  • Write down features of system not being tested as
    well as those that are!
  • Define the start-up state for the study precisely
  • Define precise rules for when and how users can
    be helped during the study
  • Plan timing and cut-off procedure (if subject
    gets stuck) for each part of study
  • Include provisions for data collection (e.g.,
    audio, video, or keystroke capture)
  • Plan data analysis techniques in advance
  • Carry out an initial pilot study to test your
    protocol
  • Written materials
  • Participant release (permission) form
  • Pre-questionnaire covering prior experience etc.
  • Introduction to the study for users, including
    scenario of use, and description of tasks
  • Checklist for experimenters, and paper for
    note-taking
  • Post-questionnaire or survey

8
User Testing (BGBG, Fig. 2.8, p. 85, adapted from
Nielsen, 1992)
  • Carrying out the study
  • Let users know that complete anonymity will be
    preserved
  • Let them know that they may quit at any time
  • Stress that the system is being tested, not the
    participant
  • Note: participant is the more modern term for
    subject
  • Indicate that you are only interested in their
    thoughts relevant to the system
  • Demonstrate the thinking-aloud method by acting
    it out for a simple task, e.g., figuring out how
    to load a stapler
  • Hand out instructions for each part of the study
    individually, not all at once
  • Maintain a relaxed environment free of
    interruptions
  • Occasionally encourage users to talk if they grow
    silent
  • If users ask questions, try to get them to talk
    (e.g., "What do you think is going on?"), and
    follow predefined rules on when to help or
    interrupt to help
  • Debrief each user after the experiment

9
Thinking Aloud
  • Attempt to elicit thought processes of
    participant, thereby yielding valuable insights
    (although process is slowed down and may be
    changed)
  • The participant talks while working, verbalizing:
  • Problems they are having
  • Solutions they are considering
  • Why they are having trouble
  • Insights that they have
  • Wishes that they have
  • Co-Discovery: pairs of participants conversing
    (Co-Discovery Learning; Kennedy paper in BGBG,
    pp. 182-185)

10
Data Capture and Analysis
  • Keystroke/mouse logging
  • Record precise user behaviour
  • Record times to carry out actions
  • Record user errors
  • Observation and note taking by observers,
    especially of user problems and critical
    incidents
  • Best if note taking done by a 2nd observer
  • Audio and video recordings
  • Can't observe and record all behaviour in
    real-time
  • Preserve behaviour for review (even non-verbal
    behaviour)
  • Can produce a lot of data (laborious to analyze)
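  • As an illustration of keystroke/mouse logging, a
    minimal sketch using only Python's standard
    library (tkinter) is shown below; the log file
    name and event format are assumptions for the
    example, not a prescribed protocol.

    # Minimal sketch: timestamped keystroke/mouse logging (tkinter).
    import csv
    import time
    import tkinter as tk

    root = tk.Tk()
    root.title("Logging demo: type or click in this window")
    log_file = open("session_log.csv", "w", newline="")  # hypothetical name
    writer = csv.writer(log_file)
    writer.writerow(["elapsed_s", "event", "detail"])
    start = time.perf_counter()

    def log(event_name, detail):
        # Elapsed timestamps let us compute action times later.
        writer.writerow([f"{time.perf_counter() - start:.3f}",
                         event_name, detail])

    root.bind("<Key>", lambda e: log("key", e.keysym))
    root.bind("<Button>", lambda e: log("mouse", f"button{e.num} at {e.x},{e.y}"))
    root.mainloop()
    log_file.close()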

11
Asking Users in Addition to Observing Them
  • Methods
  • (Post-) Questionnaire design
  • Formulating and asking questions, analyzing
    answers
  • Hard to avoid bias in the phrasing of questions
  • Therefore requires pre-testing (pilot testing)
  • Surveys (Sondages): (possibly large-scale)
    administration of questionnaires to appropriate
    samples of individuals chosen from a population
  • Administration of questions through interviews

12
Ethical Issues
  • Basic principles
  • Do no harm
  • Voluntary participation
  • Informed consent
  • Right to privacy
  • Use of research protocols and consent forms
  • Explanation of study and purpose
  • Anonymity
  • Ability to withdraw at any time
  • For example, see p. 256 of Rosson & Carroll

13
A Taxonomy of Several Evaluation Techniques
14
McGrath's Taxonomy
(diagram; axis labels: unobtrusive vs.
intrusive/disruptive)
15
Quadrant 1: Field Strategies
  • Study systems in real use on real tasks in real
    work environments, i.e., observe under settings
    with conditions as natural as possible
  • Field studies: Study systems in situ, disturbing
    as little as possible, e.g., with ethnography,
    contextual inquiry
  • Field experiments: Observe impact of changing
    (ideally) one aspect of a work environment, e.g.,
    in beta testing, studies of technological change
    and new technology introduction

16
Quadrant 2: Experimental Strategies
  • Study systems in a lab under controlled
    conditions, i.e., conditions concocted for
    research purposes
  • Laboratory experiments: Carry out controlled
    experiments studying impacts of (ideally) one (or
    two) interface parameter(s)
  • Experimental simulations: Create in the lab, for
    experimental purposes, a real system that is used
    by real users on (usually) artificially
    simplified tasks, e.g., user testing, usability
    engineering

17
Quadrant 3: Respondent Strategies
  • Ask informants to tell us something about
    themselves and/or their work or about an
    interface, i.e., where the setting in which
    questions are asked plays no role
  • Judgment studies: Ask respondents about an
    interface, e.g., in a demonstration, or with
    usability inspection
  • Sample surveys: Ask respondents about themselves
    and/or their work, e.g., with questionnaires,
    surveys, interviews

18
Usability Inspection (a Respondent strategy)
  • Methods
  • Heuristic evaluation: Judgments by a panel of
    evaluators (e.g., 3 to 5) of the degree to which
    an interface satisfies a set of usability
    guidelines, followed by discussion and analysis
  • Cognitive walkthroughs
  • Roles
  • Evaluation without users (contrast to usability
    tests, etc.)
  • Elicit expert opinions about the user's model,
    functionality, look & feel, etc.

19
Usability Inspection (cont'd)
  • Advantages
  • Structured method of using accumulated wisdom of
    experts
  • Disadvantages
  • Doesn't take advantage of real insights from real
    users
  • Example: Heuristic evaluation with 10 usability
    guidelines (Nielsen, BGBG, Fig. 2.7, p. 83)
  • Visibility of system status
  • Match between system and the real world
  • User control and freedom
  • Consistency and standards
  • Error prevention
  • Recognition rather than recall
  • Flexibility and efficiency of use
  • Aesthetic and minimalist design
  • Help users recognize, diagnose, and recover from
    errors
  • Help and documentation

20
Demonstrations (a Respondent strategy)
  • Demonstrate system to
  • Any random person
  • Management, potential investors, journalists
  • Potential customers
  • Potential users
  • Potential business partners
  • Take detailed notes
  • Elicit reactions to user's model, functionality,
    interface
  • Advantages
  • Get feedback early in prototype or system
    construction
  • You're going to have to give demos anyway, so why
    not learn from them?
  • Disadvantages
  • System still rough, which introduces noise into
    process

21
Quadrant 4: Theoretical Strategies
  • Ask a theory to tell us something about people's
    work and/or about an interface, i.e., no
    observation of behaviour, experiments, or
    questions are required
  • Formal theory: Use a qualitative theory or some
    equations, e.g., behavioural theory, such as
    colour vision or Fitts' Law
  • Computer simulation: Use and run a computer
    model, e.g., human information processing theory
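  • To make the formal-theory idea concrete, below is
    a minimal sketch of a Fitts' Law prediction; the
    constants a and b are hypothetical and would
    normally be fit by regression to pointing data.

    # Fitts' Law (Shannon formulation): MT = a + b * log2(D/W + 1)
    import math

    def movement_time(distance, width, a=0.1, b=0.15):
        # a (intercept) and b (slope) are hypothetical constants here.
        index_of_difficulty = math.log2(distance / width + 1)  # bits
        return a + b * index_of_difficulty  # seconds

    print(movement_time(distance=800, width=20))   # small, far target
    print(movement_time(distance=100, width=100))  # large, near target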

22
Summary of Evaluation Techniques
  • Field Strategies
  • Field Studies
  • Observe processes in situ, changing the system
    as little as possible
  • Examples: ethnographic studies, contextual
    inquiry (BGBG pages 42, 46) (not required for
    the exam)
  • Field Experiments
  • Change one aspect of the environment and observe
    the effects
  • Experimental Strategies
  • Laboratory Experiments / Controlled Experiments
  • Precisely vary or manipulate one or more
    independent variables
  • Precisely measure one or more dependent
    variables
  • Try to control the conditions carefully
  • Experimental simulation
  • Create a real system, in a laboratory, for real
    users
  • Examples
  • Usability tests / user tests
  • Often use a think-aloud protocol and/or a
    discovery phase where the user explores the
    interface; often also use questionnaires and/or
    interviews
  • Usability engineering
  • More formal than usability tests
  • Quantitative performance measures (metrics)

23
Summary of Evaluation Techniques (2)
  • Respondent Strategies
  • Judgment studies
  • Example: usability inspection or expert review
  • Done by experts or designers, without users
  • Examples: heuristic evaluation
  • Uses a set of design guidelines or rules
    (heuristics) (example: Nielsen's heuristics)
  • Example: cognitive walkthrough
  • Example: demonstrations
  • Surveys
  • Examples: questionnaires, interviews
  • Theoretical Strategies
  • Formal theories
  • Involve a model of the user, the system, and the
    interaction between the two
  • Examples: Fitts' law, Hick-Hyman law, KLM, GOMS,
    etc.
  • Computer simulations
  • Simulate a model

24
Tradeoffs
A: Generalizable (external validity)
B: Precise (internal validity (?))
C: Realistic (ecological validity)
25
Controlled Experiments
26
Controlled Experiments
  • Method
  • Manipulate independent variables (system
    characteristics)
  • Control for other variables (hold them constant)
  • Measure dependent variables (user behaviour)
  • Roles
  • Understanding factors influencing interface
    quality
  • Determining which conditions or which interface
    is best

27
Controlled Experiments
  • Advantages
  • Strong statements about causality (good internal
    validity)
  • Many experimental designs suitable for varying
    situations
  • Disadvantages
  • Requires time, planning, may be expensive
  • Complex designs (more than 3 or 4 independent
    variables) are often difficult to interpret
  • Often lack external validity and especially
    ecological validity

28
Examples
  • Of 3 interfaces, A, B, C, which enables fastest
    performance at a given task?
  • Does Prozac have an effect on performance at
    tying shoe laces?
  • How does frequency of advertisements on
    television affect voting behaviour?
  • Can casting a spell on a pair of dice affect what
    numbers appear on them?

29
Elements of an Experiment
  • Population
  • Set of all possible subjects / observations
  • Sample
  • Subset of the population chosen for study: a set
    of subjects / observations
  • Subjects
  • People/users under study. The more politically
    correct term within HCI is participants.
  • Observations / Dependent variable(s)
  • Individual data points that are
    measured/collected/recorded
  • E.g., time to complete a task, errors, etc.
  • Condition / Treatment / Independent variable(s)
  • Something done to the samples that distinguishes
    them (e.g., giving a drug vs. placebo, or using
    interface A vs. B)
  • Goal of experiment is often to determine whether
    the conditions have an effect on observations,
    and what the effect is

30
Tasks to Design and Run an Experiment
  • Design
  • Choose independent variables
  • Choose dependent variables
  • Develop hypothesis
  • Choose design paradigm
  • Choose control procedures
  • Choose a sample size
  • Pilot experiment
  • Often more exploratory, varying a greater number
    of variables to get a feel for where the
    effect(s) might be
  • Run experiment
  • Focuses on the suspected effect; tries to gather
    lots of data under key or optimal conditions to
    support a strong conclusion
  • Analyze data
  • Using statistical tests such as ANOVA
  • Interpret results

31
The Problem: Effectiveness of a New Method of
Source Code Presentation
  • Source code appearance makes inadequate use of
    capabilities of digital typography
  • Potential to make code more readable, more
    comprehensible with new and enhanced
    presentation format
  • See book by Baecker and Marcus, Human Factors and
    Typography for More Readable Programs,
    Addison-Wesley, 1990
  • On the following slides, bullet points that refer
    to an experimental study of our new presentation
    format are flagged with the phrase "In our
    example"

32
Conventional Presentation
33
New Presentation
34
Independent Variables
  • The variable manipulated by the experimenter
  • Also known as factor or treatment
  • Experiment may involve one or many independent
    variables
  • Each independent variable
  • Has 2 or more levels (i.e. values)
  • May be metric (continuous, like the length of a
    menu) or categorical (discrete, like mouse vs.
    trackball, or a Likert scale)
  • In our example: just one independent variable,
    with two levels: new typesetting format vs.
    traditional presentation format

35
Dependent Variables
  • Definition
  • Variable measured by experimenter
  • Variable which may depend on the independent
    variables
  • Relationship is not necessarily causal, e.g., the
    two may only be correlated
  • Examples
  • Accuracy, or number of errors
  • Number of subtasks completed in a given time
    period
  • Time to complete each task
  • In our example: ability to comprehend a program,
    as measured by the percentage of questions
    answered correctly in a given time

36
Hypotheses
  • Statement, to be tested, of relationship between
    independent and dependent variables
  • The null hypothesis is that the independent
    variables have no effect on the dependent
    variables
  • Hypothesis in our example: reading comprehension,
    as defined above, is improved by the new method
    of source code presentation

37
Experimental Design Paradigms
  • Between-subjects or within-subjects manipulation
    (between participants vs. across all
    participants)
  • Example designs with one independent variable
  • Between subjects (randomized group) design
  • One independent variable with 2 or more levels
  • Subjects randomly assigned to groups
  • Each subject tested under only 1 condition
  • Within-subjects (repeated measures) design
  • One independent variable with 2 or more levels
  • Each subject tested under all conditions
  • Order of conditions randomized or counterbalanced
    (why?)
  • In our example: a within-subjects design was
    chosen, with two conditions, i.e., two sample
    programs
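  • One answer to the "why?" above: counterbalancing
    cancels practice and fatigue effects across the
    group. Below is a minimal sketch for a
    two-condition within-subjects design; the
    participant IDs and condition names are made up.

    # Counterbalance order in a two-condition within-subjects design:
    # half the participants (chosen at random) get A then B; the
    # other half get B then A, so order effects cancel on average.
    import random

    def counterbalanced_orders(participant_ids, conditions=("A", "B")):
        ids = list(participant_ids)
        random.shuffle(ids)
        half = len(ids) // 2
        orders = {}
        for pid in ids[:half]:
            orders[pid] = list(conditions)            # A then B
        for pid in ids[half:]:
            orders[pid] = list(reversed(conditions))  # B then A
        return orders

    print(counterbalanced_orders(range(1, 9)))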

38
Control Procedures
  • Goal is to eliminate the confound hypothesis,
    i.e., the possibility that there are alternative
    explanations for the observed effect(s)
  • To do this: make sure there are no systematic
    differences between conditions other than the
    independent variable
  • In our example: ensure that the two sample
    programs are identical in length, complexity,
    and difficulty

39
What To Control
  • Subject characteristics
  • Gender, handedness, etc.
  • Ability
  • Experience
  • Task variables
  • Instructions
  • Materials used
  • Environmental variables
  • Setting
  • Noise, light, etc.
  • Order effects
  • Practice
  • Fatigue

40
How to Control
  • Hold constant
  • Use males only, or students from same class
    only
  • Novices only
  • Randomize
  • Subjects to groups
  • Counterbalance
  • Half (chosen randomly) get new presentation
    format first
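  • A minimal sketch of the "randomize" procedure,
    assigning subjects to groups for a
    between-subjects design (the IDs and group names
    are made up):

    # Random assignment: shuffling removes systematic differences
    # between groups in expectation (ability, experience, etc.).
    import random

    def assign_to_groups(participant_ids, groups=("control", "treatment")):
        ids = list(participant_ids)
        random.shuffle(ids)
        # Deal participants out round-robin to keep group sizes balanced.
        return {g: ids[i::len(groups)] for i, g in enumerate(groups)}

    print(assign_to_groups(range(1, 13)))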

41
Sample Size Selection
  • More subjects --> more confidence in the results,
    i.e., greater statistical significance
  • But this can be very expensive
  • Many methods to reduce the required number of
    subjects
  • Most HCI experiments: 4 to 25 subjects per group
  • In our example: 44 subjects chosen from a
    3rd-year programming course
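  • A more systematic way to pick a sample size is a
    power analysis; the sketch below assumes the
    statsmodels package is available, and the effect
    size (0.5) and target power (0.8) are
    conventional but hypothetical choices.

    # How many participants per group does an independent-groups
    # t-test need to detect a "medium" effect? (All inputs are
    # assumptions, not values from the study described here.)
    from statsmodels.stats.power import TTestIndPower

    n_per_group = TTestIndPower().solve_power(
        effect_size=0.5,  # hypothetical standardized difference
        alpha=0.05,       # significance criterion
        power=0.8,        # desired chance of detecting the effect
    )
    print(f"participants needed per group: {n_per_group:.0f}")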

42
Designing and Running the Experiment and
Collecting the Data
  • Run pilot studies
  • Check experimental design
  • Test and improve
  • Task definition
  • Experimental materials (often the most difficult)
  • Instructions
  • Practice tasks
  • Develop experimenter skills
  • Identify and deal with special problems
  • Run actual experiment
  • Record data
  • Observe behaviour

43
The Presentation Format Experiment
  • Within-subjects design, 44 subjects from 3rd year
    programming course
  • Two similar short C programs, roughly 200 lines
    of code, 4 to 5 pages
  • 40 minutes to skim first program and attempt to
    answer 18 questions, half in familiar format and
    half in new format
  • Then each group was given the other program in
    the other format

44
Data Analysis and Hypothesis Testing
  • Describe data
  • Descriptive statistics (means, medians, standard
    deviations)
  • Graphs and tables
  • Perform statistical analysis of results
  • Are the results due to chance? (That is, with
    what probability?)
  • In our example: mean percentage of correct
    answers with the new format: 44%; with the
    conventional format: 35%
  • Analysis of variance showed that the effect of
    presentation format in increasing program
    readability was significant, F(1,42) = 18.25,
    p < 0.0001.
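  • To illustrate the analysis step, the sketch below
    uses made-up per-participant scores. Since each
    participant saw both formats (within-subjects), a
    paired test is appropriate; with two conditions,
    a paired t-test is equivalent to the
    repeated-measures ANOVA the study reports
    (t² = F).

    # Descriptive statistics plus a paired significance test on
    # hypothetical within-subjects comprehension scores (% correct).
    from statistics import mean, stdev
    from scipy import stats

    new_format = [48, 41, 52, 39, 46, 50, 44, 47]  # made-up data
    old_format = [36, 33, 40, 30, 37, 38, 31, 35]  # made-up data

    print(f"new: mean={mean(new_format):.1f}, sd={stdev(new_format):.1f}")
    print(f"old: mean={mean(old_format):.1f}, sd={stdev(old_format):.1f}")

    t, p = stats.ttest_rel(new_format, old_format)  # paired t-test
    print(f"t = {t:.2f}, p = {p:.4f}")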

45
ANOVA
  • Analysis of Variance
  • A statistical test that compares the
    distributions of multiple samples, and determines
    the probability that differences in the
    distributions are due to chance
  • In other words, it estimates the probability of
    observing differences this large if the null
    hypothesis were true
  • If the probability is below 0.05 (i.e., 5%), then
    we reject the null hypothesis, and we say that we
    have a (statistically) significant result
  • Why 0.05? What are the dangers of using this
    value?
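  • Below is a minimal sketch of a one-way ANOVA
    using scipy; the three groups of scores are made
    up for illustration.

    # One-way ANOVA: how likely are differences among group means
    # this large if the null hypothesis were true?
    from scipy import stats

    group_a = [44, 48, 41, 52, 39]  # made-up scores
    group_b = [35, 33, 40, 30, 37]
    group_c = [36, 38, 34, 41, 33]

    f, p = stats.f_oneway(group_a, group_b, group_c)
    print(f"F = {f:.2f}, p = {p:.4f}")
    if p < 0.05:
        print("Reject the null hypothesis: a significant result.")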

46
Techniques for Making an Experiment More Powerful
(i.e., better able to detect effects)
  • Reduce noise (i.e. reduce variance)
  • Increase sample size
  • Control for confounding variables
  • E.g., psychologists often use inbred rats for
    experiments!
  • Increase the magnitude of the effect
  • E.g. give a larger dosage of the drug

47
Uses of Controlled Experiments within HCI
  • Evaluate or compare existing
    systems/features/interfaces
  • Discover and test useful scientific principles
  • Examples?
  • Establish benchmarks/standards/guidelines
  • Examples?