CS 160: Lecture 16

Learn more at: https://people.eecs.berkeley.edu

1
CS 160 Lecture 16

Professor John Canny
Fall 2004

2
Outline

Basics of quantitative methods
Random variables, probabilities, distributions
Review of statistics
Collecting data
Analyzing the data

3
Incentives

Paying every subject a fixed amount can be too
expensive.
A small chance at a large reward is a good
incentive e.g. subjects get a 1/n chance to
receive an MP3 player.
Offer to tell them the results of the study.
Free software for software companies.
Some courses require participation as research
subjects.

4
Qualitative vs. Quantitative Studies

Qualitative: what we've been doing so far
Contextual Inquiry: trying to understand users'
tasks and their conceptual model.
Usability Studies: looking for critical incidents
in a user interface.
In general, we use qualitative methods to
understand what's going on, look for problems, or
get a rough idea of the usability of an
interface.

5
Qualitative vs. Quantitative Studies

Quantitative
Use to reliably measure something.
Requires us to know how to measure it.
Examples:
Time to complete a task.
Average number of errors on a task.
Users' ratings of an interface:
ease of use, elegance, performance, robustness,
speed, ...
- You could argue that users' perception of
speed, error rates, etc. is more important than
their actual values.

6
Quantitative Methods

Very often, we want to compare values for two
different designs (which is faster?). This
requires some statistical methods.
We begin by defining some quantities to measure -
variables.

7
Random variables

Random variables take on different values
according to a probability distribution.
E.g. $X \in \{1, 2, 3\}$ is a discrete random variable.
To characterize the variable, we need to define
the probabilities:
$\Pr[X=1] = \Pr[X=2] = \tfrac{1}{4}$, $\Pr[X=3] = \tfrac{1}{2}$

8
Random variables

Given $\Pr[X=1] = \Pr[X=2] = \tfrac{1}{4}$, $\Pr[X=3] = \tfrac{1}{2}$, we can
also represent the distribution with a graph:

[Figure: bar chart of the distribution over X = 1, 2, 3, with heights ¼ and ½ marked]

9
Continuous Random variables

Some random variables take on continuous values,
e.g. $Y \in [-1, 1]$.
The probability must be defined by a probability
density function (pdf).
E.g. $p(Y) = \tfrac{3}{4}(1 - Y^2)$
Note that the area under the curve is the total
probability, which is 1.

[Figure: plot of p(Y) on [-1, 1], with ¾ marked on the vertical axis]

10
Continuous Random variables

The area between two values gives the probability
that the value of the variable lies in that
range.
i.e. $\Pr[a < Y < b]$

[Figure: plot of p(Y) on [-1, 1], with a and b marked on the horizontal axis]

11
Meaning of the distribution

The limit of the area as the range [a, b] goes to
zero gives the value of p(Y):
$\Pr[a < Y < a + dY] = p(a)\,dY$

[Figure: plot of p(Y) on [-1, 1], with a marked on the horizontal axis]

12
CDF: Cumulative Distribution

The CDF is the area under the distribution from
$-\infty$ to some value $v$.
So $C(-\infty) = 0$ and $C(\infty) = 1$.

[Figure: plot of the density on [-1, 1], with v marked on the horizontal axis]

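The pdf and CDF slides can be checked numerically. The sketch below assumes the example density $p(Y) = \tfrac{3}{4}(1 - Y^2)$ on [-1, 1] and integrates it by hand to get a closed-form CDF. The function names (`pdf`, `cdf`, `prob_between`, `riemann_area`) are illustrative and not from the lecture.

```python
def pdf(y: float) -> float:
    """Density p(Y) = 3/4 (1 - Y^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - y * y) if -1.0 <= y <= 1.0 else 0.0


def cdf(v: float) -> float:
    """C(v): area under the density from -infinity up to v."""
    if v <= -1.0:
        return 0.0
    if v >= 1.0:
        return 1.0
    # Integral of 3/4 (1 - y^2) from -1 to v, worked out by hand.
    return 0.75 * (v - v**3 / 3.0) + 0.5


def prob_between(a: float, b: float) -> float:
    """Pr[a < Y < b] as a difference of two CDF values."""
    return cdf(b) - cdf(a)


def riemann_area(a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the area under pdf between a and b."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width


print(prob_between(-1.0, 1.0))   # total probability
print(prob_between(-0.5, 0.5))   # closed form
print(riemann_area(-0.5, 0.5))   # numerical area, should agree closely
```

The closed form and the numerical area should agree to several decimal places, which is the sense in which "area between two values" and "difference of CDF values" are the same thing.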
13
Mean and Variance

The mean is the expected value of the variable.
It's roughly the average value of the variable
over many trials.
Mean $= E[Y]$
In this case $E[Y] = \tfrac{1}{2}$

[Figure: plot of a density on [-1, 1], with ¾ and ½ marked]

14
Variance

Variance is the expected value of the squared
difference from the mean. It's roughly the squared
width of the distribution.
$\mathrm{Var}[Y] = E[(Y - E[Y])^2]$
Standard deviation is the square root of
variance.

[Figure: plot of a density on [-1, 1], with ¾ and ½ marked]

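To make the two definitions concrete, here is a small sketch that applies them to the discrete example from the "Random variables" slides (Pr[X=1] = Pr[X=2] = ¼, Pr[X=3] = ½). The helper names `mean` and `variance` are my own, and exact fractions are used so no rounding creeps in.

```python
from fractions import Fraction
from math import sqrt

# Discrete distribution from the earlier slides: value -> probability.
pmf = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}


def mean(pmf):
    """E[X]: each value weighted by its probability."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var[X] = E[(X - E[X])^2]: expected squared difference from the mean."""
    mu = mean(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())


print(mean(pmf))            # 9/4
print(variance(pmf))        # 11/16
print(sqrt(variance(pmf)))  # standard deviation, roughly 0.83
```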
15
Mean and Variance

What are the mean and variance of the following
distribution?

[Figure: discrete distribution over the values 2, 3, 4, with heights ¼ and ½ marked]

16
Independent trials

For independent trials, both the means and the
variances add, i.e. for r.v.s X and Y,
$E[X + Y] = E[X] + E[Y]$
$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$

17
Identical trials

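For independent trials with the same mean and
variance:
$E[X_1 + \dots + X_n] = n\,E[X]$
$\mathrm{Var}[X_1 + \dots + X_n] = n\,\mathrm{Var}[X]$
$\mathrm{Std}[X_1 + \dots + X_n] = \sqrt{n}\,\mathrm{Std}[X]$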

18
Identical trials

As the number of trials increases, the ratio of
std. deviation to mean decreases (as $1/\sqrt{n}$),
i.e. the distribution narrows in a relative
sense.
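The additivity rules can be verified exactly on a small case. The sketch below reuses the discrete example distribution (Pr[X=1] = Pr[X=2] = ¼, Pr[X=3] = ½), builds the distribution of a sum of n independent copies by convolution, and checks that the mean and variance scale by n. All function names are illustrative assumptions.

```python
from fractions import Fraction
from math import sqrt

pmf = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}


def mean(pmf):
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    mu = mean(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())


def add_independent(pmf_a, pmf_b):
    """Distribution of A + B for independent A and B (discrete convolution)."""
    total = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            total[a + b] = total.get(a + b, 0) + pa * pb
    return total


def sum_of_trials(pmf, n):
    """Distribution of X_1 + ... + X_n for n independent copies of X."""
    total = pmf
    for _ in range(n - 1):
        total = add_independent(total, pmf)
    return total


def relative_width(pmf, n):
    """Std / mean of the sum of n trials."""
    s = sum_of_trials(pmf, n)
    return sqrt(variance(s)) / float(mean(s))


for n in (1, 4, 16):
    s = sum_of_trials(pmf, n)
    assert mean(s) == n * mean(pmf)          # means add
    assert variance(s) == n * variance(pmf)  # variances add
    print(n, relative_width(pmf, n))
```

Each fourfold increase in n halves the printed relative width, which is the "narrows in a relative sense" behaviour.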

19
Variable types

Independent variables: the ones you control
Aspects of the interface design
Characteristics of the testers
Discrete: A, B or C
Continuous: time between clicks for double-click
Dependent variables: the ones you measure
Time to complete tasks
Number of errors

20
Deciding on Data to Collect

Two types of data:
process data:
observations of what users are doing and thinking
bottom-line data:
summary of what happened (time, errors, success),
i.e., the dependent variables

21
Process Data vs. Bottom Line Data

Focus on process data first:
gives good overview of where problems are
Bottom-line data doesn't tell you where to fix:
just says "too slow", "too many errors", etc.
Hard to get reliable bottom-line results:
need many users for statistical significance

Break

23
Some statistics

Variables X and Y
A relation (hypothesis), e.g. X > Y
We would often like to know if a relation is true,
e.g. X = time taken by novice users,
Y = time taken by users with some training.
To find out if the relation is true, we do
experiments to get lots of x's and y's
(observations).
Suppose avg(x) > avg(y), or that most of the x's
are larger than all of the y's. What does that
prove?

24
Significance

The significance or p-value of an outcome is the
probability that it happens by chance if the
relation does not hold.
E.g. p = 0.05 means that there is a 1/20 chance
that the observation happens if the hypothesis is
false.
So the smaller the p-value, the greater the
significance.
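One way to see a p-value as "the probability that it happens by chance" is an exact permutation test. This is a technique swapped in for illustration, not one the slides name. If the relation X > Y did not hold, the group labels would be arbitrary, so the sketch counts how many relabelings of the pooled data give a difference in means at least as large as the observed one. The task times below are made up.

```python
from itertools import combinations


def permutation_p_value(xs, ys):
    """One-sided exact permutation test for 'mean of X > mean of Y'."""
    pooled = list(xs) + list(ys)
    n_x, n_y = len(xs), len(ys)
    total = sum(pooled)
    observed = sum(xs) / n_x - sum(ys) / n_y

    at_least_as_extreme = 0
    relabelings = 0
    # Every way of choosing which pooled observations get the label "X".
    for chosen in combinations(range(len(pooled)), n_x):
        sum_x = sum(pooled[i] for i in chosen)
        diff = sum_x / n_x - (total - sum_x) / n_y
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
        relabelings += 1
    return at_least_as_extreme / relabelings


# Hypothetical task times in seconds (not data from the lecture).
novice = [31, 35, 38, 42, 45]
trained = [22, 25, 27, 29, 30]
print(permutation_p_value(novice, trained))  # 1/252, about 0.004
```

Here every novice time exceeds every trained time, so only the observed labeling is that extreme and the p-value is 1 out of the 252 possible labelings. With five subjects per group it cannot go lower than that, which is one concrete form of "significance increases with number of trials".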

25
Significance

For instance, p = 0.001 means there is a 1/1000
chance that the observation would happen if the
hypothesis is false. So the hypothesis is almost
surely true.
Significance increases with number of trials.
CAVEAT: You have to make assumptions about the
probability distributions to get good p-values.
There is always an implied model of user
performance.

26
Normal distributions

Many variables have a Normal distribution (pdf)
At left is the density, right is the cumulative
prob.
Normal distributions are completely characterized
by their mean and variance (mean squared
deviation from the mean).

27
Normal distributions

For a normal distribution, the density one std.
deviation from the mean is at about 60% of its
peak value.

[Figure: normal density with one standard deviation marked]
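The "about 60%" figure is the ratio of the normal density one standard deviation from the mean to its peak value, which works out to exp(-1/2). A quick check with Python's standard library follows; the function name `density_ratio_at_one_sigma` is my own.

```python
from math import exp
from statistics import NormalDist


def density_ratio_at_one_sigma(mu: float, sigma: float) -> float:
    """Density at mu + sigma divided by the peak density at mu."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.pdf(mu + sigma) / dist.pdf(mu)


print(density_ratio_at_one_sigma(0.0, 1.0))   # about 0.6065
print(density_ratio_at_one_sigma(20.0, 6.0))  # same ratio for any mean and std. deviation
print(exp(-0.5))                              # closed form
```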
28
T-test

The T-test asks for the probability that $E[X] >
E[Y]$ is false,
i.e. the null hypothesis for the T-test is
that $E[X] = E[Y]$.
What is the probability of that given the
observations?

29
T-test

We actually ask for the probability, assuming
$E[X] = E[Y]$, that the sample means would be at
least as different as the observed means.

[Figure: distributions of X and Y]
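The quantity a T-test works from is the difference in sample means measured in units of its standard error. The slides do not say which variant of the test is meant, so the sketch below assumes Welch's form for two independent groups. It stops at the t statistic, because turning it into a p-value needs the Student t distribution, which is not in Python's standard library. The name `welch_t` and the sample numbers are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(xs, ys):
    """t statistic for the difference of two sample means.

    Uses the sample variance of each group separately (Welch's form),
    so the groups may differ in size and spread.
    """
    standard_error = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    return (mean(xs) - mean(ys)) / standard_error


# Tiny made-up samples, chosen so the arithmetic is easy to follow:
# means 4 and 2, sample variances 4 and 1, standard error sqrt(5/3).
print(welch_t([2, 4, 6], [1, 2, 3]))  # about 1.549
```

A larger t means the observed means are further apart than sampling noise alone would usually produce, which corresponds to a smaller p-value.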
30
Analyzing the Numbers

Example: prove that task 1 is faster on design A
than design B.
Suppose the average time for design B is 20%
higher than A.
Suppose subjects' times in the study have a std.
dev. which is 30% of their mean time (typical).
How many subjects are needed?

31
Analyzing the Numbers

Need at least 13 subjects for significance p = 0.01
Need at least 22 subjects for significance
p = 0.001
(assumes subjects use both designs)
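The slides do not show how 13 and 22 were derived. One calculation that reproduces them is a one-sided normal approximation in which each subject's B-minus-A difference has mean 20% and std. deviation 30% of the mean time. That model is an assumption on my part. Under it, the mean difference over n subjects has standard error 0.30/√n, and n must be large enough that 0.20 lies z standard errors above zero. The function name `subjects_needed` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def subjects_needed(mean_diff: float, std_diff: float, p: float) -> int:
    """Smallest n with mean_diff / (std_diff / sqrt(n)) >= z, one-sided.

    mean_diff and std_diff are in the same units (here, fractions of the
    mean task time); z is the standard normal quantile for 1 - p.
    """
    z = NormalDist().inv_cdf(1.0 - p)
    return ceil((z * std_diff / mean_diff) ** 2)


print(subjects_needed(0.20, 0.30, 0.01))   # -> 13
print(subjects_needed(0.20, 0.30, 0.001))  # -> 22
```

Because n grows with the square of std_diff / mean_diff, halving the difference between designs would require roughly four times as many subjects.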

32
Analyzing the Numbers (cont.)

i.e. even with a strong (20%) difference, need lots
of subjects to prove it.
Usability test data is quite variable:
4 times as many tests will only narrow range by
2x
breadth of range depends on sqrt of # of test
users
This is when online methods become useful:
easy to test w/ large numbers of users (e.g.,
Landay's NetRaker system)

33
Lies, damn lies and statistics

A common mistake (made by famous HCI researchers):
Increasing n, the number of trials, by running
each subject several times.
No! The analysis only works when trials are
independent.
All the trials for one subject are dependent,
because that subject may be faster/slower/less
error-prone than others.
- making this error will not help you become a
famous HCI researcher :).

34
Statistics with care

What you can do to get better significance:
Run each subject several times, compute the
average for each subject.
Run the analysis as usual on subjects' average
times, with n = number of subjects.
This decreases the per-subject variance, while
keeping data independent.
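The procedure above can be sketched as follows: collapse each subject's repeated trials to a single average, then treat those averages as the n independent observations. The data layout (a dict from subject id to a list of times) and the function names are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def per_subject_means(trials_by_subject):
    """One average time per subject, so each subject counts once."""
    return {subject: mean(times) for subject, times in trials_by_subject.items()}


def summarize(trials_by_subject):
    """Mean and standard error computed over subjects, not over raw trials."""
    averages = list(per_subject_means(trials_by_subject).values())
    n = len(averages)  # n = number of subjects, not number of trials
    return {
        "n": n,
        "mean": mean(averages),
        "std_error": stdev(averages) / sqrt(n),
    }


# Hypothetical times in seconds: three subjects, two trials each.
trials = {"s1": [10, 12], "s2": [20, 22], "s3": [30, 32]}
print(summarize(trials))  # n is 3 here, even though there are 6 trials
```

Pooling all six trials and using n = 6 would understate the standard error, which is the mistake described on the previous slide.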

35
Measuring User Preference

How much users like or dislike the system:
can ask them to rate on a scale of 1 to 10
or have them choose among statements:
"best UI I've ever ...", "better than average", ...
hard to be sure what data will mean:
novelty of UI, feelings, not realistic setting,
etc.
If many give you low ratings -> trouble
Can get some useful data by asking
what they liked, disliked, where they had
trouble, best part, worst part, etc. (redundant
questions)

36
Using Subjects