Title: UI Observational Techniques
1. UI Observational Techniques
2. Agenda
- Goals for observation
- Usability specifications
- Participants, IRB, ethics
- What to observe
- Think aloud
- Cooperative evaluation
- Performing Tasks
- Observation mechanisms
- Direct
- Recording by audio or video
- Computer logging
3. Observation - What, Why
- Watching users as they perform
- Summative or formative
- Depends on the purpose of exercise
- Qualitative or quantitative
- Depends on recording and analysis
4. Usability Specifications
- Is it good enough
- to stop working on it?
- to get paid?
How do we judge these things?
5. Why Evaluate?
- Recall
- Users and their tasks were identified
- Needs and requirements were specified
- Interface was designed, prototype built
- But is it any good? Does the system support the
users in their tasks? Is it better than what was
there before (if anything)?
6. Usability Specifications
- Quantitative usability goals, used as a guide for knowing when the interface is good enough
- Should be established as early as possible
- Generally a large part of the Requirements Specifications at the center of a design contract
- Evaluation is often used to demonstrate that the design meets certain requirements (and so the designer/developer should get paid)
- Often driven by the competition's usability, features, or performance
7. Formulating Specifications
- Better be more useful than this
8. Measurement Process
- If you can't measure it, you can't manage it
- Need to keep gathering data on each iterative evaluation and refinement
- Compare benchmark task performance to specified levels
- Know when to get it out the door!
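The compare-to-spec step can be sketched as a small decision rule. This is a hedged illustration, not any standard tool: the function name, verdict strings, and the example numbers (worst 30 secs, target 20 secs for adding an appointment) are assumptions made up for this sketch.

```python
# Sketch: classify one observed benchmark measurement against the
# worst-acceptable and planned-target levels of a usability spec.
# All names and numbers are illustrative assumptions.

def classify(observed, worst, target, lower_is_better=True):
    """Return a verdict for one usability attribute."""
    if not lower_is_better:          # e.g. a satisfaction rating
        observed, worst, target = -observed, -worst, -target
    if observed > worst:
        return "fails worst acceptable level - keep iterating"
    if observed <= target:
        return "meets planned target level"
    return "acceptable, but short of target"

# Example: time (secs) to add an appointment; worst 30, target 20
print(classify(18.0, worst=30.0, target=20.0))
print(classify(25.0, worst=30.0, target=20.0))
```

Running such a check after each iteration makes "know when to get it out the door" an explicit pass/fail question rather than a gut feeling.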
9. What is Included?
- Common usability attributes that are often captured in usability specs:
- Initial performance
- Long-term performance
- Learnability
- Retainability
- Advanced feature usage
- First impression
- Long-term user satisfaction
Quantitative
10. Assessment Technique
How will you judge whether your design meets the criteria?

Usability attribute | Measuring instrument | Value to be measured | Current level | Worst acceptable level | Planned target level | Best possible level | Observed results
Initial performance | Benchmark task | Length of time to successfully add an appointment on the first trial | 15 secs (manual) | 30 secs | 20 secs | 10 secs |
First impression | Questionnaire | Likert scale rating (-2, -1, 0, 1, 2) | 0.2 | -2 | 1.5 | 2 |
11. Specific Data
- Measuring Instrument
- Questionnaires, Benchmark tasks
- Value to be measured
- Time to complete task
- Number or percentage of errors
- Percent of task completed in given time
- Ratio of successes to failures
- Number of commands used
- Frequency of help usage
- Target level
- Often established by comparison with competing
system or non-computer based task
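Several of the values listed above can be computed mechanically from per-trial session records. The sketch below is a hypothetical illustration; the field names and trial data are invented, not from any real study.

```python
# Sketch: derive common "value to be measured" candidates from
# per-trial records. Data and field names are invented.
trials = [
    {"time_s": 22.0, "errors": 1, "success": True,  "help_uses": 0},
    {"time_s": 35.0, "errors": 3, "success": False, "help_uses": 2},
    {"time_s": 18.0, "errors": 0, "success": True,  "help_uses": 0},
]

n = len(trials)
mean_time = sum(t["time_s"] for t in trials) / n          # time to complete task
errors_per_trial = sum(t["errors"] for t in trials) / n   # number of errors
successes = sum(t["success"] for t in trials)
failures = n - successes                                   # guard against 0 in real use
success_ratio = successes / failures                       # ratio of successes to failures
help_per_trial = sum(t["help_uses"] for t in trials) / n  # frequency of help usage

print(mean_time, errors_per_trial, success_ratio, help_per_trial)
```

The same records also support the target-level comparison: compute the measure, then check it against the level established from the competing system or non-computer task.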
12. Data
- Information gathered can be objective or subjective
- Information also can be qualitative or quantitative
Which are tougher to measure?
13. Summary
- Usability specs can be useful in tracking the effectiveness of redesign efforts
- They are often part of a contract
- Designers can set their own usability specs, even if the project does not specify them in advance
- Know when it is good enough, and be confident enough to move on to the next project
14. One Way to Use User Testing
Evaluation can help your design
15. Types of Evaluation
- Interpretive and Predictive (a reminder)
- Heuristic evaluation, cognitive walkthroughs, ethnography
- Summative vs. Formative
- What were they, again?
16. Now With Users Involved
- Interpretive (naturalistic) vs. Empirical
- Naturalistic
- In a realistic setting; usually includes some detached observation, careful study of users
- Empirical
- People use the system; manipulate independent variables and observe dependent ones
17. Conducting an Evaluation
- Determine the performance measures
- Determine the tasks
- Develop the plan
- IRB approval
- Recruit participants
- Collect the data
- Inspect and analyze the data
- Draw conclusions to resolve design problems
- Redesign and implement the revised interface
- Keep the designers gagged in the background
18. The Task
- Benchmark tasks - gather quantitative data
- Representative tasks - add breadth, can help you understand the process
- Tell them what to do, not how to do it
- Real people doing real tasks
- Issues
- Lab testing vs. field testing
- Validity - typical users, typical tasks, typical setting?
- Run pilot versions to shake out the bugs
19. Benchmark Tasks
- Specific, clearly stated task for users to carry out
- Example: Email handler
- Find the message from Mary and reply with a response of Tuesday morning at 11.
- Users perform these under a variety of conditions and you measure performance
20. Defining Performance
- Based on the task
- Specific, objective measures/metrics
- Examples
- Speed (reaction time, time to complete)
- Time to attain a level of proficiency
- Until can do a specific task in 30 minutes
- Accuracy (errors, hits/misses)
- Production (number of files processed)
- Score (number of points earned)
- others?
21. Speed of Learning
- Typical users with typical training
- How much practice is needed until the user can do some benchmark task in a fixed time
- Such as edit a marked-up document in 30 minutes
- Can't practice on the same document all the time
- To test documentation and on-line help, provide no other training
22. Speed of Use
- On a number of different tasks
- Recovering from errors is part of task completion
- Meaning that you don't explicitly worry about how many errors are made
23. Now What?
- You've got your task, performance measures, testing design, etc.
- Now you need to gather the data
- So you need PARTICIPANTS
24. IRB, Participants, Ethics
- Institutional Review Board (IRB)
- http://www.osp.gatech.edu/compliance.htm
- Reviews all research involving human (or animal) participants
- Safeguards the participants, and thereby the researcher and university
- Not a science review (i.e., not to assess your research ideas), only safety and ethics
- Complete Web-based forms; submit a research summary, sample consent forms, etc.
- All experimenters must complete the NIH online history/ethics course prior to submitting
25. Recruiting Participants
- Various subject pools
- Volunteers
- Paid participants
- Students (e.g., psych undergrads) for course credit
- Friends, acquaintances, family, lab members
- Public space participants - e.g., observing people walking through a museum
- Must fit user population (validity)
- Motivation is a big factor - not only payment but also explaining the importance of the research
- Note: Ethics, IRB, and consent apply to all participants, including friends and pilot subjects
26. Ethics
- Testing can be arduous
- Each participant should consent to be in the experiment (informal or formal)
- Know what the experiment involves, what to expect, what the potential risks are
- Must be able to stop without danger or penalty
- All participants are to be treated with respect
27. Consent
- Why important?
- People can be sensitive about this process and its issues
- Errors will likely be made; the participant may feel inadequate
- May be mentally or physically strenuous
- What are the potential risks (there are always risks)? Examples?
- Vulnerable populations need special care and consideration (and IRB review)
- Children, the disabled, the pregnant, students (why?)
28. Before Study
- Be well prepared so participants' time is not wasted
- Make sure they know you are testing the software, not them
- (Usability testing, not user testing)
- Maintain privacy
- Explain procedures without compromising results
- Can quit anytime
- Administer signed consent form
29. During Study
- Make sure participant is comfortable
- Session should not be too long
- Maintain relaxed atmosphere
- Never indicate displeasure or anger
30. After Study
- State how the session will help you improve the system (debriefing)
- Show the participant how to perform failed tasks
- Don't compromise privacy (never identify people; only show videos with explicit permission)
- Data to be stored anonymously, securely, and/or destroyed
31. Attribution Theory
- Studies why people believe that they succeeded or failed - due to themselves or to outside factors (gender, age differences)
- Want your subjects to attribute problems not to themselves, but to the interface
- Explain that errors or failures are not the participant's problem - they mark places where the interface needs to be improved. You need their help!!
32. Evaluation is Detective Work
- Goal: gather evidence that can help you determine whether your usability goals are being met
- Evidence (data) should be
- Relevant
- Diagnostic
- Credible
- Corroborated
33. Data as Evidence
- Relevant
- Appropriate to address the hypotheses
- e.g., Does measuring the number of errors provide insight into how effectively your new air traffic control system supports the users' tasks?
- Diagnostic
- Data unambiguously provide evidence one way or the other
- e.g., Does asking the users' preferences clearly tell you if the system performs better? (Maybe)
34. Data as Evidence
- Credible
- Are the data trustworthy?
- Gather data carefully; gather enough data
- Corroborated
- Does more than one source of evidence support the hypotheses?
- e.g., Both accuracy and user opinions indicate that the new system is better than the previous system. But what if completion time is slower?
35. General Recommendations
- Include both objective and subjective data
- e.g., completion time and preference
- Use multiple measures within a type
- e.g., reaction time and accuracy
- Use quantitative measures where possible
- e.g., preference score (on a scale of 1-7)
- Note: Only gather the data required, and do so with the minimum interruption, hassle, time, etc.
36. Types of Data to Collect
- Demographics
- Info about the participant, used for grouping or for correlation with other measures
- e.g., handedness, age, first/best language, SAT score
- Note: Gather it only if it is relevant. It does not have to be self-reported; you can use tests (e.g., the Edinburgh Handedness Inventory)
- Quantitative data
- What you measure
- e.g., reaction time, number of yawns
- Qualitative data
- Descriptions, observations that are not quantified
- e.g., different ways of holding the mouse, approaches to solving a problem, trouble understanding the instructions
37. Collecting Data
- Capturing the session
- Observation and note-taking
- Audio and video recording
- Instrumented user interface
- Software logs
- Think-aloud protocol - can be very helpful
- Critical incident logging - positive and negative
- User journals
- Post-session activities
- Structured interviews, debriefing
- What did you like best/least? How would you change...?
- Questionnaires, comments, and rating scales
- Post-hoc video coding/rating by experimenter
38. Pros and Cons of Recording
- Richness of record
- Time to transcribe and analyze
39. Observing Users
- Not as easy as you think
- One of the best ways to gather feedback about your interface
- Watch, listen, and learn as a person interacts with your system
- Preferable to have it done by someone other than the developers
- Keep developers in the background, gagged
40. Observation
- Direct
- In same room
- Can be intrusive
- Users aware of your presence
- Only see it one time
- May use 1-way mirror to reduce intrusion
- Cheap, quicker to set up and to analyze
- Indirect
- Video recording
- Reduces intrusion, but doesn't eliminate it
- Cameras focused on screen, face, and keyboard
- Gives archival record, but can spend a lot of
time reviewing it
41. Location
- Observations may be
- In lab - maybe a specially built usability lab
- Easier to control
- Can have user complete set of tasks
- In field
- Watch their everyday actions
- More realistic
- Harder to control other factors
42. Challenge
- In simple observation, you observe actions but don't know what's going on in the user's head
- Often utilize some form of verbal protocol in which users describe their thoughts
43. Verbal Protocol
- One technique: Think-aloud
- The user describes verbally what s/he is thinking while performing the tasks
- What they believe is happening
- Why they take an action
- What they are trying to do
44. Think Aloud
- Very widely used, useful technique
- Allows you to understand users' thought processes better
- Potential problems
- Can be awkward for the participant
- Thinking aloud can modify the way the user performs the task
45. Teams
- Another technique: Co-discovery learning (constructive interaction)
- Join pairs of participants to work together
- Use think aloud
- Perhaps have one person be a semi-expert (coach) and one be a novice
- More natural (like conversation), so it removes some of the awkwardness of individual think aloud
46. Alternative
- What if thinking aloud during the session would be too disruptive?
- Can use a post-event protocol
- The user performs the session, then watches the video and describes what s/he was thinking
- Sometimes difficult to recall
- Opens up the door to interpretation
47. Historical Record
- In observing users, how do you capture events in the session for later analysis?
- ?
48. Capturing a Session - Paper and Pencil
- Can be slow
- May miss things
- Is definitely cheap and easy

Example coding sheet:

         Task 1   Task 2   Task 3
Time     10:00    10:03    10:08
                           10:22
Events   S e      S e
49. Capturing a Session - Recording
- (audio and/or video)
- Good for talk-aloud
- Hard to tie to interface
- Multiple cameras probably needed
- Good, rich record of session
- Can be intrusive
- Can be painful to transcribe and analyze
50. Sun Microsystems Usability Lab
51. Observation Room
Large viewing area in this one-way mirror, which includes an angled sheet of glass that improves light capture and prevents sound transmission between rooms. Doors for the participant room and observation room are located such that participants are unaware of observers' movements in and out of the observation room.
http://www.surgeworks.com/services/observation_room2.htm
52. Usability Lab - Observation Room
- State-of-the-art observation room equipped with three monitors to view the participant, the participant's monitor, and a composite picture-in-picture.
- One-way mirror plus angled glass captures light and isolates sound between rooms.
- Comfortable and spacious for three people, but room enough for six seated observers.
- Digital mixer for unlimited mixing of input images and recording to VHS, SVHS, or MiniDV recorders.
53. Usability Lab - Participant Room
- Soundproof participant room with a feel similar to a standard office environment.
- Pan-tilt-zoom high-resolution digital camera (visible in the upper right corner of the image at left).
- Microphone pickup can be moved near the participant or left in its location, just below the right side of the observation window.
- The observation room door is not visible to participants from the reception/waiting area. Participants are unaware of people entering or leaving the observation room.
54. Usability Lab - Participant Room
- Note the half-silvered mirror
55. Capturing a Session - Software
- Modify software to log user actions
- Can give time-stamped keypress or mouse events
- Sync with video
- Commercial software available
- Two problems
- Too low-level; want higher-level events
- Massive amount of data; need analysis tools
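A minimal sketch of what such instrumentation might look like. This is an assumption-laden illustration, not any commercial logging tool: the event names and the toy aggregation step are invented for this example.

```python
# Sketch: log time-stamped low-level UI events, then do one simple
# analysis pass over them. Event names are illustrative only.
import time

log = []

def log_event(kind, detail):
    """Append one time-stamped low-level event (keypress, mouse, etc.)."""
    log.append({"t": time.monotonic(), "kind": kind, "detail": detail})

# The instrumented UI would call this from its input handlers, e.g.:
log_event("mouse_down", "Save button")
log_event("key", "Ctrl+S")

def summarize(events):
    """Toy analysis step: count events by kind. Real sessions produce a
    massive event stream, so real analysis tools do much more than this."""
    counts = {}
    for e in events:
        counts[e["kind"]] = counts.get(e["kind"], 0) + 1
    return counts

print(summarize(log))
```

The time stamps are what let you sync the log with video; the aggregation step hints at the "too low-level" problem, since raw keypresses must be lifted into higher-level events before they say anything about the task.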
56. Issues
- What if the user gets stuck on a task?
- You can ask
- What are you trying to do...?
- What made you think...?
- How would you like to perform...?
- What would make this easier to accomplish...?
- Maybe offer hints
- Can provide design ideas
57. Post-Task Walkthroughs
- Discussion with the subject after observation
- Added richness and interpretations
- Warning: post hoc interpretation