Empirical Evaluation - PowerPoint PPT Presentation

About This Presentation
Title:

Empirical Evaluation

Description:

Deep Questions. Is 'computer science' science? How can you 'prove' a hypothesis with science? ... Typical question: Which visualization is better in which ...

Slides: 22
Provided by: chris78

Transcript and Presenter's Notes

Title: Empirical Evaluation


1
Empirical Evaluation
  • Chris North
  • cs5984 Information Visualization

2
Evaluating Visualizations
  • Expert Review
  • Examination by visualization expert
  • Heuristic Evaluation
  • Principles, Guidelines
  • Algorithmic
  • Usability Evaluation
  • Observation, problem identification
  • Empirical Experiment
  • Controlled scientific experiment, user study
  • Comparisons, statistical analysis

3
What is Science?
  • Measurement
  • Modeling

4
Scientific Method
  • Form Hypothesis
  • Collect data
  • Analyze
  • Accept/reject hypothesis

5
Deep Questions
  • Is computer science science?
  • How can you prove a hypothesis with science?

6
Empirical Experiment
  • Typical question
  • Which visualization is better in which
    situations?
  • Lifelines vs. PerspectiveWall

7
More Rigorous Question
  • Does Vis Tool (Lifelines or PerspWall) have an
    effect on user performance time for task X?
  • Null hypothesis
  • No effect
  • Lifelines = PerspWall
  • Want to disprove, provide counter-example, show
    an effect

8
Variables
  • Independent Variables (what you vary) and
    treatments (the variable values)
  • Visualization tool
  • Lifelines, Perspective Wall, Text UI
  • Task type
  • Find, count, pattern, compare
  • Data size (# of items)
  • 100, 1000, 1000000
  • Dependent Variables (what you measure)
  • User performance time
  • Errors
  • Subjective satisfaction (survey)
  • HCI metrics!
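
Crossing the independent variables listed above gives the cells of a full factorial design. The following is a minimal sketch of that enumeration, assuming all three variables are fully crossed (the slide does not say they must be); the variable names are illustrative.

```python
from itertools import product

# Independent variables and their treatments, as listed on the slide.
tools = ["Lifelines", "Perspective Wall", "Text UI"]
task_types = ["find", "count", "pattern", "compare"]
data_sizes = [100, 1000, 1000000]

# Each combination of treatments is one cell of the design.
cells = list(product(tools, task_types, data_sizes))

print(len(cells))  # 3 * 4 * 3 = 36 cells
for tool, task, size in cells[:3]:
    print(tool, task, size)
```

With n users per cell, a fully between-subjects version of this design would need 36 * n users, which is one reason experiments often vary fewer variables at a time.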

9
Example 2 x 3 design
  • n users per cell

[Table: Ind Var 1 (Vis. Tool) by Ind Var 2 (Task Type); cells hold measured user performance times (dep var)]

10
Groups
  • Between-subjects variable
  • 1 group of users for each variable treatment
  • Group 1: 20 users, Lifelines
  • Group 2: 20 users, PerspWall
  • Total: 40 users, 20 per cell
  • Within-subjects (repeated) variable
  • All users perform all treatments
  • Counterbalancing for order effects
  • Group 1: 20 users, Lifelines then PerspWall
  • Group 2: 20 users, PerspWall then Lifelines
  • Total: 40 users, 40 per cell
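
The two grouping schemes above can be sketched as random assignment routines. This is only an illustration under stated assumptions: the participant IDs, the function names `assign_between` and `assign_within`, and the fixed seed are all hypothetical, and the within-subjects version handles exactly two tools.

```python
import random

def assign_between(users, tools, seed=0):
    """Between subjects: shuffle users, then deal them evenly across tools."""
    rng = random.Random(seed)
    shuffled = users[:]
    rng.shuffle(shuffled)
    return {tool: shuffled[i::len(tools)] for i, tool in enumerate(tools)}

def assign_within(users, tools, seed=0):
    """Within subjects (two tools): every user uses both tools,
    and half of the users get each order (counterbalancing)."""
    rng = random.Random(seed)
    shuffled = users[:]
    rng.shuffle(shuffled)
    orders = [tuple(tools), tuple(reversed(tools))]
    return {user: orders[i % 2] for i, user in enumerate(shuffled)}

users = [f"user{i:02d}" for i in range(40)]  # hypothetical participant IDs
tools = ["Lifelines", "PerspWall"]

between = assign_between(users, tools)
within = assign_within(users, tools)

print({tool: len(group) for tool, group in between.items()})
print(sum(1 for order in within.values() if order[0] == "Lifelines"))
```

In the between-subjects case each tool ends up with 20 of the 40 users; in the within-subjects case all 40 users contribute data to both tools, with 20 starting on each.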

11
Issues
  • Randomized
  • Fairness
  • Identical procedures
  • Bias
  • User privacy, data security

12
Procedure
  • For each user
  • Sign legal forms
  • Pre-survey: demographics
  • Instructions
  • Do not reveal true purpose of experiment
  • Training runs
  • Actual runs
  • Post-survey: subjective measures
  • n users

13
Data
  • Measured dependent variables
  • Spreadsheet
  • Lifelines task 1, 2, 3; PerspWall task 1, 2, 3
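
As a rough illustration of how such a spreadsheet might be processed, the sketch below stores one measurement per row and averages the times in each (tool, task) cell. The row layout and every number in it are made up for the example; they are not data from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements: (user, tool, task, seconds). Values are made up.
rows = [
    ("u1", "Lifelines", 1, 12.0), ("u2", "Lifelines", 1, 18.0),
    ("u1", "Lifelines", 2, 30.0), ("u2", "Lifelines", 2, 26.0),
    ("u3", "PerspWall", 1, 10.0), ("u4", "PerspWall", 1, 14.0),
    ("u3", "PerspWall", 2, 35.0), ("u4", "PerspWall", 2, 41.0),
]

# Group the measured times by design cell (tool, task).
cell_times = defaultdict(list)
for user, tool, task, seconds in rows:
    cell_times[(tool, task)].append(seconds)

# One average per cell.
averages = {cell: mean(times) for cell, times in cell_times.items()}
for (tool, task), avg in sorted(averages.items()):
    print(f"{tool:10s} task {task}: {avg:.1f} s")
```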

14
Averages
[Table: Ind Var 1 (Vis. Tool) by Ind Var 2 (Task Type); cells hold average user performance times (dep var)]

15
PerspWall better than Lifelines?
  • Problem with averages: lossy
  • Compares only 2 numbers
  • What about the 40 data values? (Show me the data!)
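
To see why averages are lossy, consider two made-up experiments whose group means are identical but whose individual values are spread very differently. The numbers below are invented purely for illustration.

```python
from statistics import mean, stdev

# Made-up times in seconds. Both experiments have group means of 20 and 25.
tight_a, tight_b = [19, 20, 21, 20], [24, 25, 26, 25]
wide_a, wide_b = [5, 35, 10, 30], [10, 40, 15, 35]

for label, a, b in [("tight", tight_a, tight_b), ("wide", wide_a, wide_b)]:
    print(label,
          "means:", mean(a), mean(b),
          "std devs:", round(stdev(a), 2), round(stdev(b), 2))
```

In the tight case the two groups barely overlap; in the wide case the same 5-second gap in means is small compared with the variation between users, so the averages alone cannot say whether the gap is meaningful.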

[Chart: perf time (secs), Lifelines and PerspWall]

16
The real picture
  • Need stats that take all data into account

[Chart: perf time (secs), Lifelines and PerspWall]

17
Statistics
  • t-test
  • Compares 1 dep var on 2 treatments of 1 ind var
  • ANOVA: Analysis of Variance
  • Compares 1 dep var on n treatments of m ind vars
  • Result: significant difference between
    treatments?
  • p = significance level (confidence)
  • typical cut-off: p < 0.05
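
A minimal sketch of the t-test for a between-subjects comparison follows, using only the Python standard library. It computes Student's two-sample t statistic under an equal-variance assumption (the slides do not say which variant was intended), and the timing data are made up. The critical value 2.228 is the standard two-tailed table entry for p < 0.05 at 10 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Student's two-sample t statistic (equal variances assumed) and its df."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, df

# Made-up performance times in seconds, 6 users per group.
lifelines = [14.2, 15.1, 13.8, 16.0, 15.5, 14.9]
perspwall = [12.1, 13.0, 12.6, 11.8, 13.4, 12.9]

t, df = t_statistic(lifelines, perspwall)
T_CRIT_DF10 = 2.228  # two-tailed critical value for p < 0.05 at df = 10

print(f"t = {t:.2f}, df = {df}")
print("significant at p < 0.05" if abs(t) > T_CRIT_DF10 else "not significant")
```

If SciPy is installed, `scipy.stats.ttest_ind(lifelines, perspwall)` should report the same statistic together with the p value itself, which avoids the table lookup.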

18
p < 0.05
  • Woohoo!
  • Found a statistically significant difference
  • Averages determine which is better
  • Conclusion
  • Vis Tool has an effect on user performance for
    task1
  • PerspWall better user performance than Lifelines
    for task1
  • 95% confident that PerspWall better than
    Lifelines
  • Not: PerspWall beats Lifelines 95% of time
  • Found a counter-example to the null hypothesis
  • Null hypothesis: Lifelines = PerspWall
  • Hence Lifelines ≠ PerspWall

19
p > 0.05
  • Hence, same?
  • Vis Tool has no effect on user performance for
    task1?
  • Lifelines = PerspWall?
  • NOT!
  • We did not detect a difference, but could still
    be different
  • Did not find a counter-example to null hypothesis
  • Provides evidence for Lifelines = PerspWall, but
    not proof
  • Boring! Basically found nothing
  • How?
  • Not enough users
  • Need better tasks, data, ...

20
Data Mountain
  • Robertson, Data Mountain (Microsoft)
  • Quoc, Reenal

21
Assignment
  • Thurs: Visualization Development
  • Bederson, Jazz
  • Jun, Rohit
  • Literature Review due Thurs
  • Homework 2 due Thurs, Oct 4