Title: Formal User Testing
1Formal User Testing
- MIS 441 User Interface Design, Prototyping, and
Evaluation - Class 19 - March 27, 2000
2Agenda for Today
- Administrivia
- Milestone 4 due today
- Heuristic evaluation assignment due
- Essay 1 should be returned next class
- Milestone 5 (Hi-Fi and user test plan) due Mon, Apr 17
- Milestone 6 (HE) due Wed, Apr 26
- Review heuristic evaluation (HE)
- let's talk about the aggregated list of heuristic violations
- Formal user testing
3...
- Multiple evaluators (Evaluator 1, Evaluator 2) independently produce a list of usability problems (i.e., identify design elements that violate one or more heuristics)
- The findings are aggregated into a single list of problems and the heuristics violated (columns: Problem, Heuristic Violated, Description). At this stage, redundancies are eliminated and clarifications are made
- The aggregated list is then sent back out to each evaluator, who independently reviews the list and assigns a severity rating to each problem
- The lists are collected and a summary report is created that includes the average severity rating for each problem. The evaluators and design team then go through a debriefing session, discuss the problems and potential fixes, and add fix ratings to the summary report
- Outputs: Summary Report, Final Report, UI Redesign / Next Prototype
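The averaging step in the summary report can be sketched in a few lines of Python. This is only an illustration: the problem names, the evaluator labels used as keys, and the 0-4 severity scale are assumptions, not taken from the slides.

```python
from statistics import mean

# Hypothetical severity ratings (assumed 0-4 scale), one dict per evaluator,
# keyed by the problems on the aggregated list.
ratings = {
    "Evaluator 1": {"No undo on delete": 4, "Inconsistent menu labels": 2},
    "Evaluator 2": {"No undo on delete": 3, "Inconsistent menu labels": 1},
}

def average_severity(ratings_by_evaluator):
    """Return {problem: average severity across the evaluators who rated it}."""
    collected = {}
    for scores in ratings_by_evaluator.values():
        for problem, severity in scores.items():
            collected.setdefault(problem, []).append(severity)
    return {problem: mean(values) for problem, values in collected.items()}

summary = average_severity(ratings)

# List the most severe problems first, as a summary report might.
for problem, avg in sorted(summary.items(), key=lambda item: item[1], reverse=True):
    print(f"{avg:.1f}  {problem}")
```

Sorting by average severity is one possible way to order the debriefing discussion; fix ratings could be added as a second column in the same way.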
4What is User Testing?
- Participants are real (or representative) users
- Participants perform real tasks in a real work context
- The administrator
- observes / records what participants do and say
- need to decide what to measure and how to measure it
- quantitative and qualitative performance and preference measures
- analyzes the data
- diagnoses the problem
- recommends changes to fix those problems
5Why Do User Testing?
- Can't tell how good or bad a UI is until
- people use it!
- preference vs. choice
- e.g., surveys, interviews, focus groups, beta testing
- Other methods are based on evaluators who
- may know too much (about the intent of the design) or
- may not know enough (about tasks, etc.)
- e.g., cognitive walkthroughs, heuristic evaluations
- Hard to predict what real users will do until they do it
6Observation: A Critical Difference
- Observing seems easy but is very complicated
- Requires careful consideration and skill
- Types of observation
- direct observation
- video recording
- data logging software
- Disadvantages of observation??
- experiment effect
- Hawthorne effect (1939)
7Who Should Be On a User Testing Team?
- Human factors specialist
- Product marketing specialist
- Software / hardware engineer
- System designers and programmers
- Technical communicators
- Job training specialists
- Customer service representatives
- And many more...
8Planning a User Test: User Test Proposal
- Problem statement or test objective
- Participant profile
- Scenarios
- Measures to collect
- Data collection methods
- Testing environment
- Roles of design team members
9Test Objectives
Test Objective User profile Scenarios Measure to
collect Data collection methods Testing
environment
- What is the focus of each user test (evaluation)?
- easy to learn, easy to remember, efficient to use, few errors, aesthetically pleasing
- General objective example
- will new users be able to navigate through the menus quickly and easily? (learnability)
- Specific objective example
- will new users be able to find the right menu path to read, write, send, respond to, forward, save, and delete a message?
- What you want to learn from the test will lead to
- who the participants are, what tasks they will perform during the evaluation, and what measures to collect
10Measures to Collect
- Two types of data
- process data
- observations of what users are doing and thinking
- bottom-line data (i.e., performance measures)
- counts of actions / behaviors that you see
- time, errors, successes
11Using the Results of Process Data (Think Aloud)
- Summarize the data
- make a list of all critical incidents (CI)
- positive: something they liked or that worked well
- negative: difficulties with the UI
- include references back to the original data
- try to judge why each difficulty occurred
- What does the data tell you?
- did the UI work the way you thought it would?
- is your model consistent with the users' conceptual model?
- great way to better understand users' conceptual model
- is something missing?
12Using the Results (Think Aloud)
- Update task analysis and rethink design
- rate severity and ease of fixing critical incidents
- fix severe problems and make the easy fixes
- Will thinking aloud give the right answers?
- not always
- if you ask a question, people will always give an
answer, even when it has nothing to do with the
facts
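The advice to rate critical incidents by severity and ease of fixing, then fix the severe problems and make the easy fixes, can be sketched as a small filter-and-sort. The incident descriptions, the two 1-4 scales, and the thresholds are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Incident:
    description: str
    severity: int    # assumed scale: 1 (cosmetic) to 4 (blocks the task)
    fix_effort: int  # assumed scale: 1 (easy) to 4 (hard)

# Hypothetical negative critical incidents from think-aloud sessions.
incidents = [
    Incident("Could not find the Send command", 4, 3),
    Incident("Typo in dialog title", 1, 1),
    Incident("Disliked the icon colors", 1, 4),
]

def worth_fixing(incident, severe_at=3, easy_at=1):
    """Keep incidents that are severe, or that are cheap to fix."""
    return incident.severity >= severe_at or incident.fix_effort <= easy_at

# Most severe first; among equals, the easier fix first.
to_fix = sorted(
    (i for i in incidents if worth_fixing(i)),
    key=lambda i: (-i.severity, i.fix_effort),
)

for incident in to_fix:
    print(incident.severity, incident.fix_effort, incident.description)
```

With these made-up ratings, the low-severity, hard-to-fix incident drops off the list while the severe problem and the trivial typo both stay.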
13Measuring Bottom-Line Usability
- Situations in which numbers are useful
- time requirements for task completion
- number of successful completions
- number of errors made by users
- compare 2 designs on speed or number of errors
- Do not combine with think aloud protocol
- talking can affect speed and accuracy (neg. and pos.)
- your project is an exception to this general rule
- Time is easy to record
- Error or successful completion is harder
- define in advance what this means
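A minimal sketch of how these bottom-line numbers might be tallied when comparing two designs. The session records are invented, and the "completed" flag is assumed to have been judged against a success criterion written down before testing.

```python
from statistics import mean

# Hypothetical session logs: (design, seconds to finish, errors, completed?)
sessions = [
    ("A", 95, 2, True),
    ("A", 140, 5, False),
    ("A", 80, 1, True),
    ("B", 60, 0, True),
    ("B", 75, 1, True),
    ("B", 90, 2, True),
]

def summarize(design):
    """Tally completions, mean task time, and total errors for one design."""
    rows = [s for s in sessions if s[0] == design]
    return {
        "completions": sum(1 for s in rows if s[3]),
        "mean_time": mean(s[1] for s in rows),
        "total_errors": sum(s[2] for s in rows),
    }

for design in ("A", "B"):
    print(design, summarize(design))
```

With only three sessions per design, a difference in these tallies is a hint rather than proof that one design is faster or less error-prone.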
14Bottom-Line Data
- Typical Performance Measures
- time to finish a task
- time spent navigating menus
- time spent in the online help
- time to find information in the manual
- time spent recovering from errors
- number of wrong menu choices
- number of incorrect choices in the dialog boxes
- number of wrong icon choices
- number of repeated errors (the same error more than once)
- number of calls to the help desk or for aid
- number of screens of on-line help looked at
- number of repeated looks at the same help screen
- number of times turned to the manual
- number of pages looked at in each visit to the
manual
- Typical Subjective User Preference Measures
- Ratings of
- ease of learning
- ease of using the product
- ease of doing a particular task
- ease of installing the product
- helpfulness of the on-line help
- ease of finding information in the manual
- ease of understanding the information
- usefulness of the examples in the help
- Preferences and reasons for the preferences
- over a previous version
- over a competitor's product
- over the way they are doing their tasks now
- Predictions
- Would you buy this product?
- Would you pay extra for the manual?
- How much would you pay for this product?
- Spontaneous Comments
- "I don't understand this message!"
15Statistical Analysis of Bottom-Line Data
- Example: trying to get task time < 30 min.
- test gives 20, 15, 40, 90, 10, 5
- Sample mean = 30, median = 17.5. Looks good!
- wrong answer, not certain of anything
- Factors contributing to our uncertainty
- small number of test users (n = 6)
- results are very variable (standard deviation = 32)
- general rule: 95% confident that true mean lies within 2 standard deviations from the sample mean
- Confidence interval is about -34 minutes to 94 minutes
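The numbers in this example can be checked with Python's standard library. The sketch first reproduces the slide's rule of thumb (mean plus or minus 2 standard deviations). It then prints, as an assumption that is not on the slide, the interval many statistics texts use for a mean, which replaces the standard deviation with the standard error s / sqrt(n).

```python
import math
import statistics

times = [20, 15, 40, 90, 10, 5]  # task times in minutes, from the slide

n = len(times)
mean = statistics.mean(times)      # sample mean
median = statistics.median(times)  # middle value of the sorted times
sd = statistics.stdev(times)       # sample standard deviation (n - 1 divisor)

# Rule of thumb as stated on the slide: mean +/- 2 standard deviations.
slide_low, slide_high = mean - 2 * sd, mean + 2 * sd

# Alternative (assumption, not from the slide): mean +/- 2 standard errors.
sem = sd / math.sqrt(n)
se_low, se_high = mean - 2 * sem, mean + 2 * sem

print(f"mean={mean}, median={median}, sd={sd:.1f}")
print(f"slide rule:     {slide_low:.0f} to {slide_high:.0f} minutes")
print(f"standard error: {se_low:.0f} to {se_high:.0f} minutes")
```

Either way the interval extends well past the 30-minute target, so six users are not enough to conclude that the goal has been met.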
16Measuring User Preferences
- How much users like or dislike the system
- Likert scale
- Semantic differential scale
- can ask users to rate on a scale of 1 to 10
- can have them choose among statements
- "Best UI I've ever used", "better than average", ...
- If you get many low ratings, you are in trouble
- Can get some useful data by asking open-ended questions about
- what they liked, disliked, where they had trouble, best part, worst part, etc.
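One possible way to summarize rating-scale answers is to look at the distribution and the share of low ratings rather than only an average. The question wording, the 5-point scale, the cutoff for a "low" rating, and the responses below are all assumptions for illustration.

```python
from collections import Counter
from statistics import median

# Hypothetical responses to "The system was easy to use" on an assumed
# 5-point Likert scale (1 = strongly disagree, 5 = strongly agree).
responses = [4, 5, 2, 4, 3, 1, 4, 2]

counts = Counter(responses)
# Count for every scale point, including points nobody chose.
distribution = {point: counts.get(point, 0) for point in range(1, 6)}

typical = median(responses)
# Share of "low" ratings, here taken to mean 1 or 2.
low_share = sum(1 for r in responses if r <= 2) / len(responses)

print(distribution)
print(f"median={typical}, low ratings={low_share:.0%}")
```

A large share of low ratings is the warning sign mentioned above; the open-ended answers then help explain where those ratings come from.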
17Simple Single-Room Setup
- Advantages
- test monitor can see what is going on with the participant
- verbal cues, facial expressions, mannerisms
- allows interaction with participant in early, exploratory tests
- may be more natural to think aloud with someone in the room
- Disadvantages
- test monitor's behavior may affect the participant's behavior
- there is limited space for observers
18Modified Single-Room Setup
- Advantages
- Test monitor can be less concerned about controlling body language, mannerisms, taking notes, etc.
- Participant does not feel isolated since monitor is still in the room
- Participant more likely to think aloud
- Disadvantages
- Monitor can't see subtle facial expressions / mannerisms as well
- Monitor location may make user feel self-conscious or uneasy
19Electronic Observation-Room Setup
- Advantages
- Same as single-room setup
- Observers don't interfere with or bias the users
- Disadvantages
- Monitor behavior can bias user
- Requires the use of 2 rooms at a time
20Classic Testing Laboratory Setup
- Advantages
- Unobtrusive data collection (but user still knows she is being videotaped)
- Monitors and observers can talk to each other and discuss how to solve problems that come up
- Setup can accommodate many observers
- Disadvantages
- Requires lots of money, resources, and commitment
to testing
21Testing Environment Trade-Offs
- Test monitor access to participant
- Accommodations for the observers
- location
- number of observers allowed
- Cost
- equipment: video cameras, data-logging equipment, one-way mirrors, etc.
- space: number and size of rooms occupied during testing
22Roles of the Design Team Members During Evaluation
- Test monitor / administrator
- greets, interacts with, and debriefs the test users
- accumulates and communicates test results
- Timers
- keep track of beginning, ending, and elapsed time of test activities
- Video recording operators
- record comments by test users, instructions by monitor, and all interactions between monitor, participant, and prototype
- camera angles to maximize user/product visibility
23Roles of the Design Team Members During Evaluation (Continued)
- Product / technical experts
- make sure system does not malfunction during the test
- Other testing roles
- play a customer role in the test
- simulate help calls on a hotline
- Test observers
- development team: leads to better appreciation for user-centered design perspective and the problems users will have
- do not let managers of test users be observers at the test
- members of other project development teams
24Characteristics of an Effective Test Monitor
- Grounding in basic usability engineering
- cognitive/information processing, user-centered design, human factors expertise
- Quick learner
- understand / interpret the comments / actions of test users
- able to probe users and ask effective follow-up questions
- Instant rapport with participants
- make friends, put user at ease, develop trust
- Excellent memory
25Characteristics of an Effective Test Monitor
(Continued)
- Good listener
- listen with new ears each time
- Comfortable with ambiguity
- Flexibility
- know when to deviate from the test plan
- Long attention span
- There is no predicting when a gem of a discovery will arise during a test session
- Usually 10-20 sessions, 2-3 hours each, watching the same tasks repeatedly
26Characteristics of an Effective Test Monitor
(Continued)
- Empathetic people person
- Good communicator
- presenting information to design team
- making recommendations
- writing skills for written report
- presentation skills for convincing team members of changes that need to be made
- Good organizer
27Preparing Test Materials
- Recruiting letter and pretest questionnaire
- Test / orientation script (sample in Rubin, page 150)
- read verbatim usually
- tells users what will happen during the test
- intended to put them at ease
- product is being evaluated, not the user
- Nondisclosure agreement and tape consent form
28Preparing Test Materials (Continued)
- Task scenarios
- List of measures / data to be collected
- performance and preference data
- Posttest questionnaire
- preference information (opinions and feelings) from the user
- usually lots of Likert and semantic differential scales
- Debriefing topics (issues)
- get open-ended feedback and clarifications from the user
29Usability Testing Services
- Usability Sciences
- http://www.usabilitysciences.com/
- seeking users to usability test software products and get paid!
- Users are videotaped and asked for feedback as
they perform a set of tasks with the product(s)
being tested. Your feedback is turned into
recommendations for the client. In most studies,
tests last roughly 2-3 hours. All users are
compensated for their time. In most cases, all
testing is conducted in Usability Sciences'
testing labs in Las Colinas in the Dallas/Fort
Worth metroplex. If you would like to
participate in a usability test, please contact
Stephanie Farley at testing_at_usabilitysciences.com,
or call us at (972) 550-1599.
30Usability Testing Services (Continued)
- Interface Analysis Associates
- http://www.interface-analysis.com/home.shtml
- Ergosoft Laboratories Incorporated
- http://www.ergolabs.com/
- Check out Ergosoft's links and downloads page
- http://www.ergolabs.com/links_and_downloads/links_and_downloads.htm
- There are many others...
31Usability Testing Services (Continued)
- Human Factors International, Inc.
- Design and UT consultants (colors / layout / wording)
- http://www.humanfactors.com/
- Siemens Usability Center
- http://www.aut.sea.siemens.com/usability/testing.htm
32On-line PC Magazine Article
- Making Software Easier Through Usability Testing
- http://www.zdnet.com/pcmag/pctech/content/17/17/tu1717.001.html
- Microsoft's usability lab
- Talks about the setup of Microsoft's usability testing labs
- User testing of Windows 95, Windows 98, and Office 97
- http://www.microsoft.com/usability
- And there are job openings at Microsoft for usability groups
- IBM (user-centered design)
- http://www-3.ibm.com/ibm/easy/eou_ext.nsf/Publish/17
33Milestone 5
- Milestone 5a
- develop the revised lo-fi storyboards
- develop the hi-fi prototype based on these storyboards
- create the hi-fi storyboards with screen shots
- <Print Screen> captures the entire monitor screen
- <Alt><Print Screen> captures just the active window
- Milestone 5b
- develop the formal user test proposal
- from test objectives to roles of the design team members
- prepare the test materials to be used in the test
- however, you should not perform the user test at this point
34 Next Class
- More specifics on conducting a user test
- continue reading through the assigned readings from Rubin
- you can skim through sections with which you are familiar (e.g., discussions about the user profile, task analysis, scenarios, etc.)