Title: Usability Evaluation
1 Usability Evaluation
- Dr. Yan Liu
- Department of Biomedical, Industrial and Human Factors Engineering, Wright State University
2 Introduction
- What Is Usability Evaluation?
- Assess the extent to which the product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use
- Usability Evaluation in the Design Process
- Should occur throughout the design life cycle, with the results of the evaluation feeding back into modifications to the design
- Usability Evaluation Methods
- Expert analysis is based on expert evaluation, without direct user involvement
- Particularly useful for assessing early designs and prototypes
- User participation methods involve users to study actual use of the system
- Usually require a working prototype or implementation
- Users may also be involved in assessing early design ideas
- e.g. focus groups, in which a group of people are asked about their opinions of the system
3 Goals of Usability Evaluation
- Assess the System's Functionality
- The system's functionality must accord with the user's requirements
- Making the appropriate functionality available within the system
- Making the functionality clearly reachable by the user in terms of the actions that the user needs to take to perform the tasks
- Assess the User's Experience of the Interaction
- Aspects such as how easy the system is to learn and use, and the user's satisfaction with it
- The user's enjoyment and emotional response (particularly in systems aimed at entertainment)
- Identify Specific Problems with the Design
- Aspects of the design which, when used in their intended context, cause unexpected results or confusion among users
4 Evaluation Through Expert Analysis
- Overview
- The basic intention is to identify any areas that are likely to cause difficulties because they violate known design rules or ignore accepted empirical results
- Flexible and can be used at any stage in the development process
- Design specifications, storyboards and prototypes, full implementations
- Relatively cheap
- Does not assess actual use of the system
- Approaches
- Cognitive walkthrough
- Heuristic evaluation
- Use of models
- Use of previous work
5 Cognitive Walkthrough
- Overview
- Proposed in Polson et al. (1992)
- Main focus is usually to establish how well a system supports exploratory learning
- Phase One: Collect Information about the System, Users, and Tasks
- A fairly detailed specification or prototype of the system
- An indication of who the users are and what kind of experience and knowledge the evaluators can assume about them
- A description of the representative tasks most users would want to perform on the system
- A complete, written list of the actions needed to complete the tasks with the proposed system
6 Cognitive Walkthrough
- Phase Two
- The evaluators step through the action sequence identified earlier to critique the system and tell a believable story about its usability, asking themselves a set of questions for each step
- Q1: Is the effect of the action the same as the user's goal at that point?
- e.g. If the effect of an action is to save a document, is saving a document what the user wants to do?
- Q2: Will the user see that the action is available? (visibility of the action)
- e.g. Will the user see the control that is used to save a document?
- Q3: Once the user has found the correct action, will he/she know it is the one he/she needs? (meaning and effect of the action)
- e.g. Even if the user can see the control, will the user recognize that it is the one he/she is looking for to complete the task?
- Q4: After the action is taken, will the user understand the feedback he/she gets?
- Appropriate feedback should be provided to inform the user of what has happened
7 Cognitive Walkthrough
- Phase Two (Cont.)
- Document the cognitive walkthrough to keep a record of the evaluators' results
- Pros and cons of the system
- It is a good idea to produce some standard evaluation forms for the walkthrough
- The cover form would list the four questions asked during the walkthrough process, as well as the date and time of the walkthrough and the names of the evaluators
- For each action, a separate standard form is filled out that answers each of the four questions
- Any negative answer for a particular action should be documented in detail on a separate usability problem report sheet, including the severity of the problem
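The per-action forms described above can be sketched as a small record type. This is only an illustration of how the paperwork might be structured; the class and field names are hypothetical, not part of the published method:

```python
from dataclasses import dataclass

# One form per action in the walkthrough; field and class names are
# hypothetical, chosen to mirror the four questions Q1-Q4.
@dataclass
class ActionForm:
    action: str                   # e.g. "Press the time-record button"
    q1_right_effect: bool         # Q1: does the effect match the user's goal?
    q2_action_visible: bool       # Q2: will the user see the action?
    q3_recognized: bool           # Q3: will the user know it is the right one?
    q4_feedback_understood: bool  # Q4: will the user understand the feedback?
    notes: str = ""

def problem_reports(forms):
    """Collect forms with any negative answer, for the problem report sheet."""
    return [f for f in forms
            if not (f.q1_right_effect and f.q2_action_visible
                    and f.q3_recognized and f.q4_feedback_understood)]

forms = [
    ActionForm("Press the time-record button", True, True, False, True,
               notes="Unclear which button is the time-record button"),
    ActionForm("Press digits 1 8 0 0", True, True, True, True),
]
print(len(problem_reports(forms)))  # 1 problem to document in detail
```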
8 Suppose we are designing a remote control for a video recorder (VCR) and are interested in the task of programming the VCR to do timed recordings. Our initial design is shown in the following figures. This VCR allows the user to program up to three timed recordings in different streams. The next available stream number is automatically assigned. We want to evaluate the design using the cognitive walkthrough method.
After the Time-Record Button Has Been Pressed
9
- Collect information about the system, users, and tasks
- We can assume that the user is familiar with VCRs but not with this particular design
- Identify a representative task: programming the video to time-record a program starting at 1800 and finishing at 1915 on channel 4 on Oct. 16, 2008
- Specify the action sequence for the task in terms of the user's action (UA) and the system's display or response (SD)
- UA1: Press the time-record button
- SD1: Display moves to timer mode. Flashing cursor appears after Start
- UA2: Press digits 1 8 0 0
- SD2: Each digit is displayed as typed and the flashing cursor moves to the next position
- UA3: Press the time-record button
- SD3: Flashing cursor moves to after End
- UA4: Press digits 1 9 1 5
- SD4: Each digit is displayed as typed and the flashing cursor moves to the next position
- UA5: Press the time-record button
- SD5: Flashing cursor moves to after Channel
- UA6: Press digit 4
- SD6: Digit is displayed as typed and the flashing cursor moves to the next position
- UA7: Press the time-record button
- SD7: Flashing cursor moves to after Date
- UA8: Press digits 1 6 1 0 0 8
After the Time-Record Button Has Been Pressed
10
- Step through the action sequence; for each action, we must answer the four questions and tell a story about the usability of the system.
- UA1: Press the time-record button
- Q1: Is the effect of the action the same as the user's goal at that point?
- The time-record button initiates timer programming. It is reasonable to assume that a user who is familiar with VCRs would be trying to do this as his/her first goal
- Q2: Will the user see that the action is available?
- The time-record button is visible on the remote control
- Q3: Once the user has found the correct action, will he/she know it is the one he/she needs?
- It is not clear which button is the time-record button. The clock icon is a possible candidate, but this could be interpreted as a button to change the time. Other possible candidates might be the button with a filled circle or the leftmost button in the 4th row. The correct choice is the clock icon, but it is quite possible that the user could fail at this point. This identifies a potential usability problem
- Q4: After the action is taken, will the user understand the feedback he/she gets?
- Once the action is taken, the display changes to the time-record mode and shows familiar headings (Start, End, Channel, and Date). Therefore, it is reasonable to assume the user would recognize these as indicating successful completion of the first action

We have found a potential usability problem regarding recognizing the time-record button. Therefore, we may have to check whether our target user group could correctly distinguish the time-record button from the others on the remote control.
The same procedure is followed for each action in the action sequence.
11 Heuristic Evaluation
- Overview
- Proposed in Molich and Nielsen (1990)
- A method for structuring the critique of a system using a set of relatively simple and general heuristics
- Heuristics are guidelines, general principles, or rules of thumb that can guide a design decision or be used to critique a decision that has been made
- A flexible and cheap approach
- Can be performed on a design specification for evaluating early design, on storyboards and prototypes, and on fully functioning systems
- Often considered a discount usability technique
- The general idea is that several evaluators independently critique a system to come up with potential usability problems
- Between three and five evaluators is sufficient, with five usually resulting in about 75% of the overall usability problems being discovered
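The "about 75% with five evaluators" figure follows from a problem-discovery model reported by Nielsen and Landauer, Found(i) = N(1 - (1 - L)^i), where L is the chance that a single evaluator detects a given problem. A sketch, with L chosen for illustration (reported values vary by study, roughly 0.2 to 0.5):

```python
# Expected proportion of usability problems found by i independent evaluators,
# following the Nielsen-Landauer model Found(i) = N * (1 - (1 - L)**i).
# L = 0.24 is an illustrative single-evaluator detection rate that
# reproduces the "~75% with five evaluators" figure quoted above.
def proportion_found(i, detection_rate=0.24):
    return 1 - (1 - detection_rate) ** i

for i in (1, 3, 5):
    print(i, round(proportion_found(i), 2))
```

The curve flattens quickly, which is why adding evaluators beyond five yields diminishing returns.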
12 Heuristic Evaluation
- Nielsen's Ten Heuristics (Nielsen, 1994)
- A set of ten heuristics is provided to aid the evaluators in discovering usability problems
- Related to design principles and guidelines; can be supplemented where required by heuristics that are specific to the particular domain
- Each evaluator assesses the system and notes violations of any of these heuristics that would indicate a potential usability problem
- Each evaluator assesses the severity of each usability problem, based on four factors
- How common the problem is
- How easy it is for the user to overcome
- Whether it will be a one-off problem or a persistent one
- How seriously the problem will be perceived
- Once each evaluator has completed his/her separate assessment, all the problems are collected and the mean severity ratings are calculated to help the designers determine the most important problems
13 Overall severity rating on a scale of 0 to 4 in heuristic evaluation
- 0: I don't agree that this is a usability problem at all
- 1: Cosmetic problem only; need not be fixed unless extra time is available on the project
- 2: Minor usability problem; fixing this should be given low priority
- 3: Major usability problem; important to fix, so should be given high priority
- 4: Usability catastrophe; imperative to fix this before the product can be released
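The pooling step described above (collecting each evaluator's ratings and computing mean severity per problem) can be sketched as follows; the problem names and ratings are invented for illustration:

```python
# Mean severity (0-4 scale) per problem across independent evaluators;
# problems sorted so the most severe are addressed first.
ratings = {  # problem -> one rating per evaluator (illustrative data)
    "time-record button hard to identify": [3, 4, 3],
    "no confirmation after programming":   [2, 1, 2],
}

def ranked_problems(ratings):
    means = {p: sum(r) / len(r) for p, r in ratings.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

for problem, mean in ranked_problems(ratings):
    print(f"{mean:.2f}  {problem}")
```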
14 Heuristic Evaluation
- Nielsen's Ten Heuristics (Cont.)
- Heuristic 1. Visibility of system status
- Always keep users informed about what is going on, through appropriate feedback within reasonable time
- If an operation will take some time, give an indication of how long and how much is complete
- Heuristic 2. Match between the system and the real world
- The system should speak the users' language, with words, phrases and concepts familiar to the user, rather than system-oriented terms
- Follow real-world conventions, making information appear in a natural and logical order
- Heuristic 3. User control and freedom
- Users often choose system functions by mistake and will need a clearly marked "emergency exit" to leave the unwanted state without having to go through an extended dialogue
- Support undo and redo
15 Heuristic Evaluation
- Nielsen's Ten Heuristics (Cont.)
- Heuristic 4. Consistency and standards
- Users should not have to wonder whether different words, situations, or actions mean the same thing
- Follow platform conventions and accepted standards
- Heuristic 5. Error prevention
- Make it difficult to make errors
- Even better than good error messages is a careful design which prevents a problem from occurring in the first place
- Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action
- Heuristic 6. Recognition rather than recall
- Minimize the user's memory load by making objects, actions, and options visible
- The user should not have to remember information from one part of the dialogue to another
- Instructions for use of the system should be visible or easily retrievable whenever appropriate
16 Heuristic Evaluation
- Nielsen's Ten Heuristics (Cont.)
- Heuristic 7. Flexibility and efficiency of use
- Allow users to tailor frequent actions
- Accelerators, unseen by the novice user, may often speed up the interaction for the expert user, so that the system can cater to both experienced and inexperienced users
- Heuristic 8. Aesthetic and minimalist design
- Dialogues should not contain information which is irrelevant or rarely needed
- Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility
- Heuristic 9. Help users recognize, diagnose, and recover from errors
- Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution
- Heuristic 10. Help and documentation
- Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation
- Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large
17 Model-Based Approach
- Cognitive and Design Models
- Provide a means of combining design specification and evaluation into the same framework
- GOMS (goals, operators, methods and selection) model
- Predicts user performance with a particular interface and can be used to filter particular design options
- Low-level modeling techniques
- Provide predictions of the time users will take to perform low-level physical tasks
- e.g. the keystroke-level model
18 Use of Previous Studies
- Experimental Results and Empirical Evidence from Previous Studies
- Can be used to support or refute some aspects of the design
- Some are specific to particular domains, but many deal with more generic issues and can be applied in a variety of situations
- Review Previous Studies Carefully
- Examine the experimental design, participants, data analyses, and assumptions
- e.g. An experiment testing the usability of a particular style of help system using novice participants may not be applicable to the evaluation of a help system designed for expert users
19 Evaluation Through User Participation
- Overview
- User participation in evaluation tends to occur in the later stages of development
- Tested on a working prototype
- Prototypes range from a simulation of the system's interactive capabilities without its underlying functionality, through a basic functional prototype, to a fully implemented system
- Observing and surveying users can contribute to earlier design stages
- Design specification and requirements capture
- Evaluation Styles
- Laboratory studies
- Field studies
20 Evaluation Through User Participation
- Laboratory Studies
- Users are taken out of their normal work environment to take part in controlled tests (often in a specialist usability laboratory)
- Advantages
- Allow manipulation of the situation in order to uncover problems or observe less-used procedures
- Allow comparison of alternative designs within a controlled context, which reduces ambiguity in interpreting results regarding cause and effect
- The possible influence of extraneous factors is reduced
- Laboratory observation is the only option in some situations
- e.g. the system is located in a dangerous or remote location (such as a space station)
21 Evaluation Through User Participation
- Laboratory Studies (Cont.)
- Disadvantages
- Artificiality of laboratory experiments
- A well-equipped usability laboratory may contain sophisticated equipment (e.g. audio/visual recording and analysis facilities) that cannot be replicated in the work environment
- Participants usually operate in an interruption-free environment in the laboratory setting, which is seldom the case in the real world
- It is especially difficult to observe several people cooperating on a task in a laboratory situation, because interpersonal communication is so heavily dependent on context
22 Evaluation Through User Participation
- Field Studies
- The designer or evaluator goes into the users' work environment in order to observe the system in action
- Advantages
- Users are observed in their natural environment
- Allow observation of interactions between systems and between individuals that would have been missed in laboratory studies
- Disadvantages
- Lack of control over many aspects of the situation makes it difficult to draw cause-and-effect relationships when interpreting results
- High levels of ambient noise, greater levels of movement and constant interruptions (e.g. phone calls) make field observation difficult
23 Experimental Evaluation
- Overview
- One of the most powerful methods to compare alternative designs
- Provides empirical evidence to support particular hypotheses
- Important Elements
- A hypothesis to test
- e.g. icons with naturalistic images are easier to remember than icons with abstract images
- Independent variable(s)
- The variable(s) that will be manipulated by the experimenter to study its (their) impact on the dependent variable
- e.g. the type of icons (naturalistic images vs. abstract images)
- Dependent variable(s)
- The variable(s) that will be measured to describe the outcome of experimental runs
- e.g. the number of mistakes made in using the icons
24 Experimental Evaluation
- Important Elements (Cont.)
- Experimental method
- Depends on the available resources and the tasks performed in the experiment
- Between-subject design
- Participants are randomly assigned to the various conditions so that each participates in only one group
- Within-subject design
- Each participant participates in all conditions
- Mixed design
- A combination of between-subject and within-subject designs
- Research participants
- How to recruit the research participants, their characteristics, how many, etc.
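Random assignment for a between-subject design like the icon experiment above can be sketched as follows; the participant IDs and condition names are invented for illustration:

```python
import random

# Randomly assign participants to conditions for a between-subject design;
# each participant experiences exactly one condition, with equal group sizes.
def assign_between_subjects(participants, conditions, seed=0):
    rng = random.Random(seed)  # fixed seed for a reproducible assignment
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {c: [] for c in conditions}
    for i, p in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(p)
    return groups

groups = assign_between_subjects([f"P{i}" for i in range(12)],
                                 ["naturalistic icons", "abstract icons"])
print({c: len(ps) for c, ps in groups.items()})  # 6 participants per condition
```

A within-subject design would instead present every condition to every participant, typically counterbalancing the order to control for learning effects.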
25 Observational Techniques
- Overview
- Gather information about actual use of a system by observing users interacting with it
- Users are usually asked to complete a set of predetermined tasks
- Users may be observed going about their normal duties if the observation is carried out in their place of work
- Think Aloud
- A form of observation during which the user is asked to say aloud what he/she is doing while being observed
- Advantages
- Simple; requires little expertise to perform
- Can provide useful insight into problems with an interface
- Can be used for evaluation throughout the design process
- Disadvantages
- The information provided is often subjective and may be selective, depending on the tasks provided
- The very act of describing what he/she is doing often changes the way the user does it
26 Observational Techniques
- Cooperative Evaluation
- A more relaxed variation of the think aloud process
- The user is encouraged to see himself/herself as a collaborator in the evaluation and not simply as an experimental participant
- The evaluator can ask the user questions if his/her behavior is unclear
- The user can ask the evaluator for clarification if a problem arises
- Advantages
- The process is less constrained and therefore easier for the evaluator to learn to use
- The user is encouraged to criticize the system
- The evaluator can clarify points of confusion at the time they occur and so maximize the effectiveness of the approach for identifying problem areas
27 Observational Techniques
- Post-Task Walkthrough (Retrospective Recall)
- Transcripts of the participant's actions are played back to the participant, who is invited to comment or is directly questioned by the evaluator
- Usually done straightaway
- May be done after a delay
- The evaluator has some time to frame suitable questions and focus on specific incidents
- The answers are more likely to be the participant's post hoc interpretation
- Useful to identify reasons for actions and alternatives considered
- Necessary in cases where think aloud is not possible
- e.g. during a critical task or when the task is too intensive
28 Observational Techniques
- Protocol Recording
- A protocol refers to the record of an evaluation session
- Paper and pencil
- Primitive and cheap
- Allows the evaluator to note interpretations and extraneous events as they occur
- Hard to get detailed information; limited by the evaluator's writing speed
- Coding schemes for frequent activities can improve the rate of recording substantially, but can take some time to develop
- A variation is to use a notebook computer for direct entry
- Limited by the evaluator's typing speed
- Loses the flexibility of paper for writing styles, quick diagrams and spatial layout
- A dedicated note-taker, separate from the evaluator, is recommended if this is the only recording facility available
29 Observational Techniques
- Protocol Recording (Cont.)
- Audio recording
- Useful if the user is actively thinking aloud
- May be difficult to record sufficient information to identify exact actions in later analysis
- Can be difficult to match an audio recording to some other form of protocol (e.g. a handwritten script)
- Video recording
- Allows us to see what the participant is doing
- Choosing suitable camera positions and viewing angles to get sufficient detail can be difficult when the user may move out of the view of the camera
- For single-user computer-based tasks, two video cameras are typically used
- One camera looks at the computer screen (may not be necessary if the computer system is being logged)
- One camera with a wider focus records the user's face and hands
30 Observational Techniques
- Protocol Recording (Cont.)
- Computer logging
- Advantages
- Relatively easy and cheap method to record user actions at a keystroke level
- One of the most popular recording methods that observe users without interrupting their plans and actions
- Can be used for longitudinal studies where we observe users over periods of weeks or months
- Disadvantages
- Keystroke data only tell us about the lowest-level actions, not why they are performed or how they are structured
- The sheer volume of data collected can become unmanageable without automatic analysis
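As a first step toward the automatic analysis the slide calls for, action frequencies can be counted from a log. The (timestamp, event type, action) log format below is an assumption for illustration, not any particular logging tool's output:

```python
from collections import Counter

# Summarize a keystroke-level log: the raw events say nothing about intent,
# but frequency counts are a first step toward automatic analysis.
log = [  # (timestamp, event type, action) -- illustrative format
    ("10:02:01", "key", "Ctrl+S"),
    ("10:02:05", "key", "a"),
    ("10:02:05", "key", "Ctrl+S"),
    ("10:02:09", "click", "Save button"),
]

def action_frequencies(log):
    return Counter(action for _, _, action in log)

print(action_frequencies(log).most_common(1))  # [('Ctrl+S', 2)]
```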
31 Observational Techniques
- Protocol Recording (Cont.)
- User notebooks
- The participants are asked to keep logs of their activities or problems
- Records are at a very coarse level
- Entries every few minutes or hourly
- Especially useful in longitudinal studies and when we want a log of unusual or infrequent tasks and problems
- Mixture of recording methods
- Different methods can complement one another
- e.g. keep a paper note of special events as well as use more sophisticated audio/visual recording
- Synchronization problems arise when using a collection of different sources
32 Observational Techniques
- Automatic Protocol Analysis Tools
- Very important as evaluation tools, offering a means of handling the large volumes of data collected in observational studies and allowing a systematic approach to data analysis
- Noldus Observer XT (http://www.noldus.com)
- Select data for analysis
- Visualize data
- Analyze data with different techniques
- Multi-level analysis
- Statistical analysis
- Compare results from different analyses
- Calculate inter- and intra-rater reliability
- etc.
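Inter-rater reliability, listed above among the tool's analyses, is commonly quantified with Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch with invented codings (this is a generic illustration, not the Noldus implementation):

```python
from collections import Counter

# Cohen's kappa: agreement between two raters coding the same events,
# corrected for the agreement expected by chance.
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    expected = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["look", "type", "look", "pause", "type", "type"]
b = ["look", "type", "pause", "pause", "type", "look"]
print(round(cohens_kappa(a, b), 2))  # 0.5
```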
33 Query Techniques
- Overview
- Directly ask the user about the interface
- Useful in eliciting detail of the user's view of a system
- Advantages
- Get the user's viewpoint directly
- Reveal issues that have not been considered by the designer
- Relatively simple and cheap to administer
- Disadvantages
- The information gathered is necessarily subjective
- The information may be a rationalized account of events rather than a wholly accurate one
- Difficult to get accurate feedback about alternative designs if the user has not experienced them
- Provide useful supplementary material to other methods
34 Query Techniques
- Interviews
- A direct and structured way of gathering information
- Advantages
- The level of questions can be varied to suit the context
- Can be effective for high-level evaluation, particularly in eliciting information about user preferences, impressions and attitudes
- May also reveal problems that have not been anticipated by the designer or that have not occurred under observation
- The evaluator can probe the user more deeply on interesting issues as they arise
- Interviews should be planned in advance
- Interviews are structured around a set of prepared central questions
- Helps to focus the purpose of the interview
- Ensures a base of consistency between the interviews of different users
35 Query Techniques
- Questionnaires
- Disadvantages
- Less flexible than interviews
- Questions are fixed in advance
- Questions are less probing
- Advantages
- Can reach a wider participant group
- Take less time to administer
- Can be analyzed more rigorously
- Types of questions
- General questions
- Help to establish the background of the user and his/her place within the user population
- e.g. age, gender, occupation, previous experience with computers, etc.
36 Query Techniques
- Questionnaires
- Types of questions (Cont.)
- Open-ended questions
- Ask the user to provide his/her unprompted opinion on a question
- e.g. Can you suggest any improvements to the interface?
- Useful for gathering subjective information, but difficult to analyze in any rigorous way
- Responses may identify errors or make suggestions that have not been considered by the designer
- Ranking
- Ask the user to judge a specific statement on a numeric scale, usually corresponding to a measure of agreement or disagreement with the statement
- e.g. It is easy to recover from mistakes. Disagree 1 2 3 4 5 Agree
- Multi-choice
- Offer the user a choice of explicit responses
- The user may select only one response or as many as apply
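Responses to a ranking question like the 1-to-5 agreement scale above can be summarized numerically; the ratings below are invented for illustration:

```python
from statistics import mean, median

# 1-5 agreement ratings for "It is easy to recover from mistakes"
# collected from ten respondents (illustrative data).
responses = [4, 5, 3, 4, 2, 5, 4, 3, 4, 5]

# Ratings are ordinal data, so the median and the response distribution
# are often safer summaries than the mean, but both are commonly reported.
print("mean:", mean(responses))      # 3.9
print("median:", median(responses))  # 4.0
print("agree (4 or 5):", sum(r >= 4 for r in responses), "of", len(responses))
```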
37 Monitoring Physiological Responses
- Eye Tracking for Usability Evaluation
- Eye movements are believed to reflect the amount of cognitive processing a display requires, and thus how easy or difficult it is to process
- Measuring not only where people look but also their patterns of eye movement may tell us which areas of a screen they are finding easy or difficult to understand
- Possible measurements
- Number of fixations (periods where the eye retains a stable position)
- The more fixations, the less efficient the search strategy
- Fixation duration
- Longer fixations may indicate difficulty with a display
- Scan path
- Indicates areas of interest, search strategy and cognitive load
- Plotting scan paths and fixations can indicate what people look at, how often and for how long
- Eye tracking for usability is still very new and the equipment is prohibitively expensive for everyday use
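The fixation and scan-path measures listed above can be computed from an eye tracker's fixation output. The (x, y, duration) record format here is an assumption for illustration, not a particular vendor's format:

```python
from math import hypot

# Each fixation: (x, y, duration in ms); illustrative data and format.
fixations = [(120, 80, 210), (450, 95, 640), (460, 300, 180), (130, 310, 400)]

n_fixations = len(fixations)
mean_duration = sum(d for _, _, d in fixations) / n_fixations

# Scan path length: summed distance between consecutive fixations (pixels);
# a longer path for the same task suggests a less efficient search.
scan_path = sum(hypot(x2 - x1, y2 - y1)
                for (x1, y1, _), (x2, y2, _) in zip(fixations, fixations[1:]))

print(n_fixations, round(mean_duration), round(scan_path))
```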
38 Monitoring Physiological Responses
- Physiological Measures
- Recordings of responses of the body which may reflect the user's emotional response to the system
- Galvanic skin response (GSR)
- A measure of general emotional arousal and anxiety
- Measures the electrical conductance of the skin, which changes when sweating occurs
- Electromyogram (EMG)
- A measure of tension or stress
- Measures muscle tension
- Electroencephalogram (EEG)
- A measure of the electrical activity of brain cells
- Records general brain arousal as a response to different situations, activity in different parts of the brain as learning occurs, etc.
39 Monitoring Physiological Responses
- Physiological Measures (Cont.)
- Magnetic resonance imaging (MRI)
- Provides an image of the brain structure of an individual
- Allows comparing the brain structure of individuals with a particular condition (e.g. cognitive impairment) with the brain structure of those without the condition
- Functional MRI (fMRI) can be used to scan areas of the brain while a participant performs a physical or cognitive task
- Provides evidence for what brain processes are involved in these tasks
- Other physiological measures
- Body temperature, heart rate, etc.
40 Choosing an Evaluation Method
- The Stage in the Cycle at Which the Evaluation Is Carried Out
- Evaluation at the early design stage needs to be quick and cheap, hence it might involve design experts only and be analytic
- Evaluation of the implementation needs to be more comprehensive and thus brings in users as participants
- There are exceptions
- Participatory design involves users throughout the design process
- Cognitive walkthrough is expert-based and analytic but can be used to evaluate implementations as well as designs
- Laboratory vs. Field Studies
- Laboratory studies allow controlled experimentation and observation while losing some naturalness of the user's environment
- Field studies retain the naturalness of the user's environment but do not allow control over user activity
41 Choosing an Evaluation Method
- Subjective vs. Objective
- The more subjective techniques rely to a large extent on the knowledge and expertise of the evaluator, who must recognize problems and understand what the user is doing
- Can be powerful if used correctly and provide information that may not be available from more objective methods
- The problem of evaluator bias should be recognized and avoided
- One way to decrease the possibility of bias is to use more than one evaluator
- Objective techniques can produce repeatable results which are not dependent on the persuasion of the particular evaluator
- Avoid bias and provide comparable results
- May not reveal unexpected problems or give detailed feedback on user experiences
- Both objective and subjective approaches should be used
42 Choosing an Evaluation Method
- Quantitative vs. Qualitative Measures
- Quantitative measures are usually numeric and can be easily analyzed using statistical techniques
- Qualitative measures are non-numeric and therefore more difficult to analyze, but can provide important detail that cannot be determined from numbers
- Information Provided
- The information provided by an evaluation at any stage of the design process may range from low-level information that enables a design decision to be made, to high-level information
- Controlled experiments are excellent at providing low-level information
- An experiment can be designed to measure a particular aspect of the interface
- Higher-level information can be gathered using questionnaire and interview questions
- These provide a more general impression of the user's view of the system
43 Choosing an Evaluation Method
- Immediacy of Response
- Some methods (e.g. post-task walkthrough) rely on the user's recollection of events
- Recollection is liable to suffer from bias in recall and reconstruction, with users interpreting events according to their preconceptions
- Recall may also be incomplete
- Some methods (e.g. think aloud) record the user's behavior at the time of the interaction itself
- The process of measurement can actually alter the way the user works
- Intrusiveness of Response
- Related to the immediacy of response
- Most immediate evaluation techniques are intrusive to the user during the interaction and thus run the risk of influencing the way the user behaves
44 Choosing an Evaluation Method
- Resources
- Resources to consider include equipment, time, money, participants, evaluator expertise and context
- e.g. It is impossible to produce a video protocol without access to a video camera; cognitive walkthrough relies more on evaluator expertise than laboratory studies do

Tables 9.4 to 9.6 in the textbook show the classification of evaluation techniques, which can help you choose the techniques that most closely fit your evaluation requirements.