Statistical Tests for HCI Evaluation

1
Statistical Tests for HCI Evaluation
  • SEG3510 Tutorial 7 (Mar 7)
  • 2007 Spring Semester
  • Prepared by Kelvin Chan (chansk_at_se.cuhk.edu.hk)

2
Contents
  • Motivation
  • Review of basic concepts
  • Statistical tests for evaluation of HCI

3
Why statistical tests?
  • 100 m sprint times
  • Tom: 13.9 s, 12.9 s, 13.4 s, 15.9 s, 13.7 s
    (mean 13.96 s)
  • Jerry: 13.2 s, 13.8 s, 13.9 s, 13.2 s, 14.9 s
    (mean 13.80 s)
  • Can we conclude that Jerry runs faster than Tom
    does?
  • Comparing the means alone is not persuasive enough

4
Null Hypothesis
  • We want to assert that the two populations are
    different
  • The approach is to assume that the two
    populations are the same: the null hypothesis H0
  • The data support the alternative hypothesis H1,
    that the two populations are different, if and
    only if they reject (nullify) H0
  • Spirit: innocent until proven guilty. H0 is
    presumed true initially and is rejected only when
    the data make it evidently false

5
Parametric vs. Non-parametric
  • Parametric statistical tests make assumptions
    about the distribution of the data
  • Most common underlying distribution: the
    Gaussian (normal, bell-shaped) distribution
  • Examples: z-test, Student's t-test, ANOVA
  • Non-parametric statistical tests make no
    assumption about the underlying distribution
  • Examples: chi-squared test, Wilcoxon signed-rank
    test, Mann-Whitney-Wilcoxon test

6
Two-tailed vs. One-tailed Test
  • Two-tailed: concerns only whether the value of
    the test statistic falls in either tail of the
    distribution
  • One-tailed (left- or right-tailed): concerns
    whether the value of the test statistic falls
    into one specified tail
  • For a two-tailed (z- or t-) test the level of
    significance is split equally between the two
    tails, so the critical value is read from the
    table at half the level of significance

7
Wilcoxon Rank Sum Test
  • A non-parametric statistical test
  • No assumption on the form of the distribution
    (unlike the z-test and t-test, which rely on the
    normal distribution)
  • Can be applied to a small set of data
  • Details and examples covered in the lecture
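
As a rough illustration of the ranking step only, the sketch below computes the two rank sums for the sprint times from slide 3. The function name `rank_sums` is an illustrative choice, and giving tied values the average of the ranks they occupy is an assumption about how ties are handled; deciding significance still needs the table or procedure from the lecture.

```python
def rank_sums(sample_a, sample_b):
    """Return the rank sums (W_a, W_b) of two samples ranked together.

    Tied observations share the average of the ranks they occupy.
    """
    pooled = sorted(sample_a + sample_b)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # 0-based positions i..j-1 hold ranks i+1..j; store their average.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    w_a = sum(rank_of[x] for x in sample_a)
    w_b = sum(rank_of[x] for x in sample_b)
    return w_a, w_b


tom = [13.9, 12.9, 13.4, 15.9, 13.7]
jerry = [13.2, 13.8, 13.9, 13.2, 14.9]
w_tom, w_jerry = rank_sums(tom, jerry)
print(f"rank sum Tom = {w_tom}, rank sum Jerry = {w_jerry}")
```

For these ten times the two rank sums come out equal (27.5 each), so the ranks give no hint of a difference between the runners.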

8
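Small-sample t-test
  • A parametric statistical test
  • Formula: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$
  • Here $S_1^2$, $S_2^2$ are the variances calculated
    from the samples
  • $\bar{X}_1$, $\bar{X}_2$ are the sample means
  • $n_1$, $n_2$ are the sample sizes
  • We call $n_1 + n_2 - 2$ the degrees of freedom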
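
A minimal Python sketch of the small-sample t statistic, assuming the pooled-variance (equal-variance) two-sample form, which matches the n1 + n2 - 2 degrees of freedom stated above. The function name `pooled_t`, its argument names, and the numbers in the usage lines are illustrative and not taken from the slides.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample t-test.

    var1 and var2 are sample variances (computed with n - 1 in the denominator).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df


# Hypothetical summary statistics, for illustration only.
t, df = pooled_t(10.0, 4.0, 8, 8.0, 4.0, 8)
print(f"t = {t:.3f}, df = {df}")
```

The value of t is then compared with the critical value read from a t table at the chosen level of significance and degrees of freedom.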

9
Going back to the example
  • To say they are different, a two-tailed,
    two-sample t-test can be applied
  • Select level of significance α = 0.1; with
    n1 + n2 - 2 = 8 degrees of freedom,
    t_0.05 = 1.860, so the null hypothesis H0 is
    rejected if t < -1.860 or t > 1.860
  • However, t lies between -1.860 and 1.860
  • Even testing with α = 0.1, 0.05, 0.02, 0.0125 and
    0.01, t stays within the corresponding critical
    values
  • We cannot reject H0: the data do not show that
    Tom and Jerry differ at any of these levels of
    significance

10
Going back to the example
  • Say it is discovered that the stopwatch measuring
    Jerry's time always reads 1 s more than the true
    value
  • Let H0: Jerry does not run faster than Tom does
  • No value changes except that Jerry's mean time
    becomes 1 s faster
  • Using α = 0.05, t_0.05 = 1.860 < 1.932 = t
  • t falls in the rejection region t > 1.860
  • We reject the null hypothesis and assert that
    Jerry runs faster than Tom does at the 5% level
    of significance (95% confidence)
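
The t values in this example can be checked with a short Python sketch, assuming the pooled-variance two-sample statistic with Tom's mean minus Jerry's mean in the numerator. The helper name `two_sample_t` is illustrative. It gives t of about 0.266 for the times as measured and reproduces the t of about 1.932 quoted above once Jerry's times are corrected by 1 s.

```python
import math
import statistics

tom = [13.9, 12.9, 13.4, 15.9, 13.7]
jerry = [13.2, 13.8, 13.9, 13.2, 14.9]


def two_sample_t(a, b):
    """Pooled-variance t statistic for mean(a) - mean(b), plus degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / std_err, df


t_raw, df = two_sample_t(tom, jerry)

# Stopwatch correction: each of Jerry's times is reduced by 1 s.
jerry_corrected = [x - 1.0 for x in jerry]
t_corrected, _ = two_sample_t(tom, jerry_corrected)

print(f"df = {df}")
print(f"t (as measured) = {t_raw:.3f}")
print(f"t (Jerry corrected by 1 s) = {t_corrected:.3f}")
```

Shifting every one of Jerry's times by the same amount changes his mean but not his variance, which is why only the numerator of t moves.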

11
Statistical tests for evaluation of HCI
  • Statistical analysis is required for empirical
    methods of HCI evaluation
  • Quantitative data gathered:
  • Satisfaction on a Likert scale
  • Time to complete a certain task
  • Reported error rate
  • Purpose:
  • Compare alternatives
  • Show improvement over the previous version

12
Example - Usability Measurement between
Alternative Designs
  • A bank wants to adopt a new design for its ATM
    interface
  • A straightforward comparison is the time taken to
    complete an operation
  • Assume the data are gathered from a testing ground
    where volunteer clients tried out the system

13
Example - Usability Measurement between
Alternative Designs
  • Design A: n = 15, mean time = 121 s,
    variance = 23 s²
  • Design B: n = 16, mean time = 115 s,
    variance = 40 s²
  • Degrees of freedom: 15 + 16 - 2 = 29
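
The sketch below computes a pooled-variance t statistic directly from the summary statistics listed on this slide; the variable names are illustrative. Taken exactly as listed, these figures give t of about 2.96 rather than the 4.238 used in the cases that follow, so the transcribed summary values may differ from those behind the original calculation. Either value exceeds the critical values 2.756 and 2.462, so the three decisions are unaffected.

```python
import math

# Summary statistics as listed on the slide (variances in s^2).
n_a, mean_a, var_a = 15, 121.0, 23.0
n_b, mean_b, var_b = 16, 115.0, 40.0

df = n_a + n_b - 2
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t = (mean_a - mean_b) / std_err

print(f"df = {df}, pooled variance = {pooled_var:.2f}, t = {t:.3f}")
```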

14
Example - Usability Measurement between
Alternative Designs
  • Case I
  • H0: the times to complete an operation on the two
    designs are the same
  • This is a two-tailed test
  • Set α = 0.01; from the table, with 29 degrees of
    freedom, t_0.005 = 2.756
  • 4.238 > 2.756, so H0 is rejected
  • The times to complete an operation on the two
    designs are different at the 1% level of
    significance (99% confidence)

15
Example - Usability Measurement between
Alternative Designs
  • Case II
  • H0: the time to complete an operation on design A
    is no longer than that on design B
  • This is a right-tailed test
  • Set α = 0.01; from the table, with 29 degrees of
    freedom, t_0.01 = 2.462
  • 4.238 > 2.462, so H0 is rejected
  • The time to complete an operation is longer on
    design A than on design B at the 1% level of
    significance (99% confidence)

16
Example - Usability Measurement between
Alternative Designs
  • Case III
  • H0: the time to complete an operation on design B
    is no shorter than that on design A
  • This is a left-tailed test
  • Set α = 0.01; from the table, with 29 degrees of
    freedom, t_0.01 = 2.462
  • -4.238 < -2.462, so H0 is rejected
  • The time to complete an operation is shorter on
    design B than on design A at the 1% level of
    significance (99% confidence)
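
The three cases differ only in which tail is tested and which table value is used. A minimal sketch of that decision rule follows, using the statistic and critical values quoted in Cases I to III; the function name `reject_h0` and the tail labels are illustrative choices.

```python
def reject_h0(t, critical, tail):
    """Decide whether to reject H0 given a t statistic and a positive critical value.

    tail is "two", "right" or "left"; critical is the table value for the
    chosen tail area (alpha/2 for a two-tailed test, alpha for a one-tailed test).
    """
    if tail == "two":
        return abs(t) > critical
    if tail == "right":
        return t > critical
    if tail == "left":
        return t < -critical
    raise ValueError(f"unknown tail: {tail!r}")


# Values quoted on the slides: 29 degrees of freedom, alpha = 0.01.
cases = [
    ("Case I", 4.238, 2.756, "two"),
    ("Case II", 4.238, 2.462, "right"),
    ("Case III", -4.238, 2.462, "left"),
]
results = {name: reject_h0(t, crit, tail) for name, t, crit, tail in cases}
for name, rejected in results.items():
    print(name, "reject H0" if rejected else "fail to reject H0")
```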

17
Reference
  • R. A. Johnson, Miller & Freund's Probability and
    Statistics for Engineers, 7th edition, Prentice
    Hall
  • t-test table:
    http://www.eridlc.com/onlinetextbook/index.cfm?fuseaction=textbook.appendix&FileName=Table3