SBD: Usability Evaluation

1
SBD: Usability Evaluation
  • Chris North
  • cs3724 HCI

2
[Scenario-Based Design framework diagram]
ANALYZE (analysis of stakeholders, field studies; claims about current practice) -> Problem scenarios
DESIGN (metaphors, information technology, HCI theory, guidelines) -> Activity scenarios, Information scenarios, Interaction scenarios
PROTOTYPE & EVALUATE (formative evaluation, summative evaluation) -> Usability specifications
Iterative analysis of usability claims and re-design throughout
3
Evaluation
  • Formative vs. Summative
  • Analytic vs. Empirical

4
Usability Engineering
[Diagram: usability engineering cycle of Reqs Analysis, Design, Develop, and Evaluate; many iterations]
5
Usability Engineering
Formative evaluation
Summative evaluation
6
Usability Evaluation
  • Analytic Methods
    • Usability inspection, expert review
    • Heuristic evaluation
    • Cognitive walkthrough
    • GOMS analysis
  • Empirical Methods
    • Usability testing
      • Field or lab
      • Observation, problem identification
    • Controlled experiment
      • Formal, controlled scientific experiment
      • Comparisons, statistical analysis

7
User Interface Metrics
  • Ease of learning
    • learning time
  • Ease of use
    • performance time, error rates
  • User satisfaction
    • surveys
  • NOT "user friendly" (too vague to measure)

8
Usability Testing
9
Usability Testing
  • Formative: helps guide design
  • Early in the design process
    • once the architecture is finalized, it's too late!
  • A few users
  • Usability problems, incidents
  • Qualitative feedback from users
  • Quantitative usability specification

10
Usability Specification Table
Scenario task                           | Worst case | Planned target | Best case (expert) | Observed
Find the most expensive house for sale  | 1 min.     | 10 sec.        | 3 sec.             | ??? sec.
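Below is a minimal sketch (Python; not part of the original slides) of how one row of such a specification can be checked against the observed value once the test has been run. The field names and the observed time are illustrative assumptions.

# Hypothetical usability-specification row; values come from the table above,
# field names and the observed measurement are illustrative.
spec = {
    "task": "Find the most expensive house for sale",
    "worst_case_sec": 60,      # worst case: 1 min.
    "planned_target_sec": 10,  # planned target: 10 sec.
    "best_case_sec": 3,        # best case (expert): 3 sec.
}

observed_sec = 14  # example measurement filled in after the usability test

if observed_sec <= spec["planned_target_sec"]:
    print("Meets the planned target")
elif observed_sec <= spec["worst_case_sec"]:
    print("Usable, but keep iterating toward the target")
else:
    print("Worse than the worst-case level: redesign needed")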

11
Usability Test Setup
  • Set of benchmark tasks
    • Easy to hard, specific to open-ended
    • Coverage of different UI features
    • E.g. "find the 5 most expensive houses for sale"
    • Different types: learnability vs. performance
  • Consent forms
    • Not needed unless video-taping the user's face (new rule)
  • Experimenters
    • Facilitator: instructs the user
    • Observers: take notes, collect data, video-tape the screen
    • Executor: runs the prototype if it is faked
  • Users
    • 3-5 users; quality, not quantity

12
Usability Test Procedure
  • Goal: mimic real life
    • Do not cheat by showing them how to use the UI!
  • Initial instructions
    • "We are evaluating the system, not you."
  • Repeat:
    • Give the user a task
    • Ask the user to think aloud
    • Observe; note mistakes and problems
    • Avoid interfering; hint only if the user is completely stuck
  • Interview
    • Verbal feedback
    • Questionnaire
  • 1 hour / user

13
Usability Lab
  • E.g. McBryde 102

14
Data
  • Note taking
    • E.g. "user keeps clicking on the wrong button"
  • Verbal protocol: think aloud
    • E.g. user thinks that button does something else
  • Rough quantitative measures
    • HCI metrics, e.g. task completion time, ...
  • Interview feedback and surveys
  • Video-tape screen & mouse
  • Eye tracking, biometrics?

15
Analyze
  • Initial reaction
    • "Stupid user!", "That's developer X's fault!", "This sucks."
  • Mature reaction
    • "How can we redesign the UI to solve that usability problem?"
    • The user is always right
  • Identify usability problems
    • Learning issues, e.g. can't figure out or didn't notice a feature
    • Performance issues, e.g. arduous, tiring to solve tasks
    • Subjective issues, e.g. annoying, ugly
  • Problem severity: critical vs. minor

16
Cost-Importance Analysis
  • Importance: 1-5 (task effect, frequency)
    • 5: critical, major impact on the user, frequent occurrence
    • 3: user can complete the task, but with difficulty
    • 1: minor problem, small speed bump, infrequent
  • Ratio: importance / cost
    • Sort by this (see the sketch below)
  • 3 categories: must fix, next version, ignored

Problem | Importance | Solutions | Cost | Ratio I/C
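A small illustrative sketch (Python) of the calculation behind this table; the problems, importance ratings, and fix costs below are invented examples, not data from the course.

# Cost-importance analysis: rate importance (1-5), estimate fix cost,
# then sort by the importance/cost ratio. All values are made-up examples.
problems = [
    {"problem": "Can't find the zoom feature", "importance": 5, "cost": 2},
    {"problem": "Confusing button label",      "importance": 3, "cost": 1},
    {"problem": "Ugly splash screen",          "importance": 1, "cost": 4},
]

for p in problems:
    p["ratio"] = p["importance"] / p["cost"]

# Highest importance-per-unit-cost first: top entries are the "must fix" candidates.
for p in sorted(problems, key=lambda p: p["ratio"], reverse=True):
    print(f'{p["problem"]:30s} I={p["importance"]}  C={p["cost"]}  I/C={p["ratio"]:.2f}')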

17
Refine UI
  • Simple solutions vs. major redesigns
  • Solve problems in order of importance/cost
  • Example
    • Problem: user didn't know he could zoom in to see more
    • Potential solutions:
      • Better zoom button icon, tooltip
      • Add a zoom bar slider (like Moosburg)
      • Icons for different zoom levels: boundaries, roads, buildings
      • NOT more help documentation!!! You can do better.
  • Iterate
    • Test, refine, test, refine, test, refine, ...
    • Until? It meets the usability specification

18
Project Usability Evaluation
  • Usability Evaluation
    • > 3 users; not (tainted) HCI students
    • Simple data collection (biometrics optional!)
    • Exploit this opportunity to improve your design
  • Report
    • Procedure (users, tasks, specs, data collection)
    • Usability problems identified, specs not met
    • Design modifications

19
Controlled Experiments
20
Usability Test vs. Controlled Experiment
  • Usability test
    • Formative: helps guide design
    • Single UI, early in the design process
    • Few users
    • Usability problems, incidents
    • Qualitative feedback from users
  • Controlled experiment
    • Summative: measures the final result
    • Compares multiple UIs
    • Many users, strict protocol
    • Independent & dependent variables
    • Quantitative results, statistical significance

21
What is Science?
  • Measurement
  • Modeling

22
Scientific Method
  • Form Hypothesis
  • Collect data
  • Analyze
  • Accept/reject hypothesis
  • How to prove a hypothesis in science?
    • Easier to disprove things, by counterexample
    • Null hypothesis: the opposite of the hypothesis
    • Disprove the null hypothesis
    • Hence, the hypothesis is "proved"

23
Empirical Experiment
  • Typical question:
    • Which visualization is better in which situations?
    • Spotfire vs. TableLens

24
Cause and Effect
  • Goal: determine cause and effect
    • Cause: visualization tool (Spotfire vs. TableLens)
    • Effect: user performance time on task T
  • Procedure
    • Vary the cause
    • Measure the effect
  • Problem: random variation
    • Cause = vis tool, OR random variation?

[Diagram: real world -> (random variation) -> collected data -> uncertain conclusions]
25
Stats to the Rescue
  • Goal
    • Measured effect is unlikely to result from random variation
  • Hypothesis
    • Cause = visualization tool (e.g. Spotfire ≠ TableLens)
  • Null hypothesis
    • Visualization tool has no effect (e.g. Spotfire = TableLens)
    • Hence cause = random variation
  • Stats
    • If the null hypothesis were true, the measured effect would occur with probability < 5% (e.g. measured effect >> random variation)
  • Hence
    • Null hypothesis unlikely to be true
    • Hence, hypothesis likely to be true

26
Variables
  • Independent variables (what you vary) and treatments (the variable values)
    • Visualization tool
      • Spotfire, TableLens, Excel
    • Task type
      • Find, count, pattern, compare
    • Data size (# of items)
      • 100, 1000, 1,000,000
  • Dependent variables (what you measure)
    • User performance time
    • Errors
    • Subjective satisfaction (survey)
    • HCI metrics

27
Example: 2 x 3 design
  • n users per cell

                          Ind Var 2: Task Type
                          Task 1     Task 2     Task 3
Ind Var 1:   Spotfire     n users    n users    n users
Vis. Tool    TableLens    n users    n users    n users

Each cell holds the measured user performance times (dep var).
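A short sketch (Python) of how this factorial design enumerates its cells; the treatment names come from the slide, while n and the code structure are illustrative assumptions.

# Enumerate the cells of the 2 x 3 design: every combination of the two
# independent variables is one cell, with n users measured per cell.
from itertools import product

vis_tools  = ["Spotfire", "TableLens"]        # independent variable 1 (2 treatments)
task_types = ["Task 1", "Task 2", "Task 3"]   # independent variable 2 (3 treatments)
n = 20                                        # users per cell (example value)

cells = list(product(vis_tools, task_types))  # 2 x 3 = 6 cells
print(f"{len(cells)} cells, {n} users each")
for tool, task in cells:
    print(f"{tool} / {task}: record performance time (dep. var) for {n} users")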
28
Groups
  • Between-subjects variable
    • 1 group of users for each variable treatment
    • Group 1: 20 users, Spotfire
    • Group 2: 20 users, TableLens
    • Total: 40 users, 20 per cell
  • Within-subjects (repeated) variable
    • All users perform all treatments
    • Counter-balance the order effect (see the sketch below)
    • Group 1: 20 users, Spotfire then TableLens
    • Group 2: 20 users, TableLens then Spotfire
    • Total: 40 users, 40 per cell
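A rough sketch (Python) of the two assignment schemes for 40 users and two treatments; the helper names and the simple alternating counter-balancing are illustrative assumptions, not the course's exact protocol.

# Between-subjects: each user sees one treatment.
# Within-subjects: each user sees all treatments, with the order counter-balanced.
users = [f"user{i:02d}" for i in range(1, 41)]
treatments = ["Spotfire", "TableLens"]

def between_subjects(users, treatments):
    # Split the pool into one group per treatment (here: 20 users per cell).
    k = len(treatments)
    return {t: users[i::k] for i, t in enumerate(treatments)}

def within_subjects(users, treatments):
    # Every user performs every treatment; alternate the order across users
    # so half do A then B and half do B then A (counter-balancing).
    orders = [list(treatments), list(reversed(treatments))]
    return {u: orders[i % 2] for i, u in enumerate(users)}

print(between_subjects(users, treatments))  # 2 groups of 20 -> 20 users per cell
print(within_subjects(users, treatments))   # 40 users, all do both -> 40 per cell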

29
Issues
  • Eliminate or measure extraneous factors
    • Randomization
    • Fairness
    • Identical procedures
    • Bias
  • User privacy, data security
    • IRB (Institutional Review Board)

30
Procedure
  • For each user:
    • Sign legal forms
    • Pre-survey: demographics
    • Instructions
      • Do not reveal the true purpose of the experiment
    • Training runs
    • Actual runs
      • Give a task
      • Measure performance
    • Post-survey: subjective measures
  • n users

31
Data
  • Measured dependent variables
  • Spreadsheet

User | Spotfire task 1 | Spotfire task 2 | Spotfire task 3 | TableLens task 1 | TableLens task 2 | TableLens task 3
...  | ...             | ...             | ...             | ...              | ...              | ...
32
Step 1: Visualize it
  • Dig out interesting facts
  • Qualitative conclusions
  • Guide stats
  • Guide future experiments

33
Step 2: Stats

Average user performance times (dep var):

                          Ind Var 2: Task Type
                          Task 1     Task 2     Task 3
Ind Var 1:   Spotfire     37.2       54.5       103.7
Vis. Tool    TableLens    29.8       53.2       145.4
34
TableLens better than Spotfire?
  • Problem with averages: lossy
  • Compares only 2 numbers
  • What about the 40 data values? ("Show me the data!")

[Bar chart: avg perf time (secs), Spotfire vs. TableLens]
35
The real picture
  • Need stats that compare all the data

[Chart: perf time (secs) for Spotfire vs. TableLens, showing all the individual data points]
36
Statistics
  • t-test (see the sketch below)
    • Compares 1 dep var on 2 treatments of 1 ind var
  • ANOVA (Analysis of Variance)
    • Compares 1 dep var on n treatments of m ind vars
  • Result
    • p = probability that the difference between treatments is random (null hypothesis)
    • Statistical significance level
    • Typical cut-off: p < 0.05
    • Hypothesis confidence = 1 - p
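The next slide shows the test run in Excel; as an alternative illustration, here is a minimal t-test sketch in Python using scipy.stats.ttest_ind. The timing data are invented, and the 0.05 cut-off follows the slide.

# Compare one dependent variable (task time) across the 2 treatments of
# one independent variable (vis tool). Data values are made up.
from scipy import stats

spotfire  = [37.2, 41.0, 35.5, 39.8, 36.1, 40.3]   # task completion times (sec)
tablelens = [29.8, 31.5, 28.0, 33.2, 30.9, 27.5]

t, p = stats.ttest_ind(spotfire, tablelens)
print(f"t = {t:.2f}, p = {p:.4f}")

if p < 0.05:
    # Significant: the difference is unlikely to be random variation;
    # compare the means to see which tool was faster.
    print("Reject the null hypothesis")
else:
    # Not significant: no difference detected, which is NOT proof of no difference.
    print("No significant difference detected")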

37
In Excel
38
p < 0.05
  • Woohoo!
  • Found a statistically significant difference
  • Averages determine which is better
  • Conclusion
    • Cause = visualization tool (e.g. Spotfire ≠ TableLens)
    • Vis tool has an effect on user performance for task T
    • 95% confident that TableLens is better than Spotfire
    • NOT "TableLens beats Spotfire 95% of the time"
    • 5% chance of being wrong!
  • Be careful about generalizing

39
p > 0.05
  • Hence, no difference?
    • Vis tool has no effect on user performance for task T?
    • Spotfire = TableLens?
  • NOT!
    • Did not detect a difference, but they could still be different
    • A real effect may simply not have overcome the random variation
    • Provides evidence for Spotfire = TableLens, but not proof
  • Boring; basically found nothing
  • How does this happen?
    • Not enough users
    • Need better tasks, data, ...

40
Data Mountain
  • Robertson, Data Mountain (Microsoft)

41
Data Mountain Experiment
  • Data Mountain vs. IE Favorites
  • 32 subjects
  • Organize 100 pages, then retrieve them based on cues
  • Independent variables
    • UI: Data Mountain (old, new), IE
    • Cue: title, summary, thumbnail, all 3
  • Dependent variables
    • User performance time
    • Error rates: wrong pages, failed to find within 2 min.
    • Subjective ratings

42
Data Mountain Results
  • Spatial Memory!
  • Limited scalability?