Evaluate: Quantitative Methods

About This Presentation

Slides: 69

Provided by: IBMU303

Transcript and Presenter's Notes

1
Evaluate: Quantitative Methods

October 4, 2007
Turn in Project Proposal

2
Today

Quantitative methods
Scientific method
Aim for generalizable results
Privacy issues when collecting data

3
Quantitative methods

Reliably measure some aspect of interface
Especially to measurably compare
Approaches
Controlled experiments
Doing Psychology Experiments, David W. Martin,
7th edition, 2007
Collect usage data

4
Designing an experiment

State hypothesis
Identify variables
Independent
Dependent
Design experimental protocol
Apply for human subjects review
Select user population
Conduct experiment

5
Conducting experiment

Run pilot test
Collect data from running experiment
Perform statistical analysis
Interpret data, draw conclusions

6
Experiment hypothesis

Testable hypothesis
Precise statement of expected outcome
More specifically, how you predict the dependent
variable (e.g., accuracy) will depend on the
independent variable(s)
Null hypothesis (H0)
Stating that there will be no effect
e.g., There will be no difference in
performance between the two groups
Data used to try to disprove this null hypothesis

7
Experiment design

Independent variables
Attributes we manipulate / vary in condition
Levels, value of attribute
Dependent variables
Outcome of experiment, measures to evaluate
Usually measure user performance
Time to completion
Errors
Amount of production
Measures of satisfaction

8
Experiment design (2)

Control variables
Attributes that remain the same across conditions
Random variables
Attributes that are randomly sampled
Can be used to increase generalizability
Avoiding confounds
Confounds are attributes that changed but were
not accounted for
Confounds prevent drawing conclusions on
independent variables

9
Example: Person picker

Picking from list of names to invite to use
facebook application

Bryan Tsao, Christine Robson, David Sun, John
Tang, Jonathan Tong
10
Example: Variables

Independent variables
Picture vs. no picture
Ordered horizontally or vertically
One column vs. two columns
Dependent variables
Time to complete
Error rate
User perception
Control variables
Test setting
List to pick from
Random variables
Subject demographics
Confound
Only one woman in list
List mostly Asians

11
Experimental design goals

Internal validity
Cause and effect: a change in independent
variables → a change in dependent variables
Eliminating confounds (turn them into independent
variables or random variables)
Replicability of experiment
External validity
Results generalizable to other settings
Ecological validity: generalizable to the
real world
Confidence in results
Statistical power (number of subjects, at least
10)

12
Experimental protocol

Defining the task(s)
What are all the combinations of conditions?
How often to repeat each condition combination?
Between-subjects or within-subjects?
Avoiding bias (instructions, ordering)

13
Task

Defining task to test hypothesis
Pictures will lead to fewer errors
Same time to pick users with and without pictures
(H0)
Pictures will lead to higher satisfaction
How do you present the task?
Task: Users must select the following list of
people to share the application with
Jonathan Tong
Christine Robson
David Sun

14
Motivating user tasks

Create scenario, movie plot for task
Immerse subject in story that removes them from
user testing situation
Focus subject on goal, system becomes tool (and
more subject to critique)

15
Number of Conditions

Consider all combinations to isolate effects of
each independent variable
(2 orders) × (2 columns) × (2 formats) = 8
Horizontal, 1 column, picture + text
Horizontal, 2 column, picture + text
Vertical, 1 column, picture + text
Vertical, 2 column, picture + text
Horizontal, 1 column, text only
Horizontal, 2 column, text only
Vertical, 1 column, text only
Vertical, 2 column, text only
Adding levels or factors → exponential
combinations
This can make experiments expensive!
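
The eight combinations can be enumerated mechanically. A minimal sketch, assuming the three factors and level names from the person-picker example; the helper name `all_conditions` is made up for illustration:

```python
from itertools import product

# Levels for each independent variable in the person-picker example.
orders = ["Horizontal", "Vertical"]
columns = ["1 column", "2 column"]
formats = ["picture + text", "text only"]


def all_conditions(*factors):
    """Return every combination of levels (a full factorial design)."""
    return list(product(*factors))


conditions = all_conditions(orders, columns, formats)
for order, column, fmt in conditions:
    print(f"{order}, {column}, {fmt}")
print(len(conditions), "conditions")
```

Adding a fourth factor with three levels would multiply the count to 24, which is why the number of conditions grows so quickly.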

16
Reducing conditions

Vary only one independent variable at a time
But can miss interactions
Factor experiment into series of steps
Prune branches if no significant effects found

17
Choosing subjects

Balance sample reflecting diversity of target
user population (random variable)
Novices, experts
Age group
Gender
Culture
Example
30 college-age, normal vision or corrected to
normal, with demographic distributions of gender,
culture

18
Population as variable

Population as an independent variable
Identifies interactions
Adds conditions
Population as controlled variable
Consistency across experiment
Misses relevant features
Statistical post-hoc analysis can suggest need
for further study
Collect all the relevant demographic info

19
Recruiting participants

Subject pools
Volunteers
Paid participants
Students (e.g., psych undergrads) for course
credit
Friends, acquaintances, family, lab members
Public space participants - e.g., observing
people walking through a museum
Must fit user population (validity)
Motivation is a big factor: not only compensation
but also explaining the importance of the research

20
Current events: Population sampling issue

Currently, election polling conducted on
land-line phones
Legacy
Laws about manual dialing of cell phones
Higher refusal rates
Cell phone users pay for incoming phone
calls → have to compensate recipients
What bias is introduced by excluding cell phone
only users?
7% of population (2004), growing to 15% (2008)
Which candidate claims polls underrepresent?

http://www.npr.org/templates/story/story.php?storyId=14863373
21
Between subjects design

Different groups of subjects use different
designs
15 subjects use text only
15 subjects use text + pictures
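
Who lands in which group should be left to chance, so that subject differences act as a random variable rather than a confound. A minimal sketch, assuming 30 subjects with made-up IDs; the function name and the seed value are illustrative only:

```python
import random


def assign_groups(subjects, conditions, seed=None):
    """Shuffle subjects, then deal them round-robin into equal-sized groups."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, subject in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups


subjects = [f"S{n:02d}" for n in range(1, 31)]  # hypothetical subject IDs
groups = assign_groups(subjects, ["text only", "text + pictures"], seed=160)
for condition, members in groups.items():
    print(condition, len(members), members)
```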

22
Within subjects design

All subjects try all conditions

[Diagram: two groups of 15 subjects each]
23
Within Subjects Designs

More efficient
Each subject gives you more data - they complete
more blocks or sessions
More statistical power
Each person is their own control, fewer confounds
Therefore, can require fewer participants
May mean more complicated design to avoid order
effects
Participant may learn from first condition
Fatigue may make second performance worse

24
Between Subjects Designs

Fewer order effects
Simpler design and analysis
Easier to recruit participants (only one session,
shorter time)
Subjects can't compare across conditions
Need more subjects
Control more for confounds

25
Within Subjects: Ordering effects

Countering order effects
Equivalent tasks (less sensitive to learning)
Randomize order of conditions (random variable)
Counterbalance ordering (ensure all orderings
covered)
Latin Square ordering (partial counterbalancing)
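
A Latin square can be built by rotating the condition list one step per row. A minimal sketch, assuming a simple cyclic square; the function names are made up for illustration. A cyclic square puts each condition in each position exactly once, but it does not balance which condition follows which, so carry-over effects may remain:

```python
def latin_square(conditions):
    """Cyclic Latin square: row i is the condition list rotated by i."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for(subject_index, conditions):
    """Give each subject one row of the square, cycling through the rows."""
    square = latin_square(conditions)
    return square[subject_index % len(square)]


conditions = ["A", "B", "C", "D"]  # placeholder condition labels
for subject_index in range(8):
    print(subject_index, order_for(subject_index, conditions))
```

Four conditions need only 4 orderings here, versus 24 for full counterbalancing.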

26
Study setting

Lab setting
Complete control through isolation
Uniformity across subjects
Field study
Ecological validity
Variations across subjects

27
Before Study

Always pilot test first
Reveals unexpected problems
Can't change experiment design after collecting
data
Make sure they know you are testing software, not
them
(Usability testing, not User testing)
Maintain privacy
Explain procedures without compromising results
Can quit anytime
Administer signed consent form

28
During Study

Always follow same steps: use checklist
Make sure participant is comfortable
Session should not be too long
Maintain relaxed atmosphere
Never indicate displeasure or anger

29
After Study

State how session will help you improve system
(debriefing)
Show participant how to perform failed tasks
Don't compromise privacy (never identify people,
only show videos with explicit permission)
Data to be stored anonymously, securely, and/or
destroyed
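
One way to keep names out of the stored data is to replace them with keyed codes before anything is written down. A minimal sketch, assuming a secret key that is kept apart from the data and destroyed when the study ends; the function names and the `P-` code format are made up for illustration. Strictly speaking this is pseudonymization, not full anonymization, since anyone holding the key can re-derive the codes:

```python
import hashlib
import hmac
import secrets


def make_key():
    """Generate a random secret key; store it separately from the data."""
    return secrets.token_bytes(32)


def pseudonym(name, key):
    """Keyed hash (HMAC-SHA256) of a name; same name and key give the same code."""
    normalized = name.strip().lower().encode("utf-8")
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return "P-" + digest[:10]


key = make_key()
for name in ["Participant One", "Participant Two"]:  # placeholder names
    print(pseudonym(name, key))
```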

30
Exercise: Quantitative test

Pair up with someone who has computer, downloaded
the files
DO NOT OPEN THE FILE (yet)
Make sure one of you has a stopwatch
Cell phone
Watch
Computer user will run test, observer will time
event

31
Exercise: Task

Open the file
Find the item in the list
Highlight that entry like this

32
Example: Variables

Independent variables
Dependent variables
Control variables
Random variables
Confound

33
Data Inspection

Look at the results
First look at each participant's data
Were there outliers, people who fell asleep,
anyone who tried to mess up the study, etc.?
Then look at aggregate results and
descriptive statistics
What happened in this study, relative to
hypothesis and goals?

34
Descriptive Statistics

For all variables, get a feel for results
Total scores, times, ratings, etc.
Minimum, maximum
Mean, median, ranges, etc.

What is the difference between mean and median? Why
use one or the other?

e.g. Twenty participants completed both
sessions (10 males, 10 females; mean age 22.4,
range 18-37 years).
e.g. The median time to complete the task in
the mouse-input group was 34.5 s (min = 19.2 s,
max = 305 s).
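
The mean-versus-median question is easiest to see with one slow participant in the data. A minimal sketch using Python's `statistics` module; the six completion times are invented, chosen only so that the median, minimum, and maximum match the example above:

```python
from statistics import mean, median

# Hypothetical completion times in seconds; one participant took 305 s.
times = [19.2, 28.0, 33.0, 36.0, 41.5, 305.0]

print("min    =", min(times))
print("max    =", max(times))
print("median =", median(times))
print("mean   =", round(mean(times), 1))
```

The single 305 s outlier pulls the mean far above what a typical participant did, while the median barely moves. That is the usual reason to report medians for completion times.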

35
Subgroup Stats

Look at descriptive stats (means, medians,
ranges, etc.) for any subgroups
e.g. The mean error rate for the mouse-input
group was 3.4%. The mean error rate for the
keyboard group was 5.6%.
e.g. The median completion times (in seconds)
for the three groups were: novices 4.4, moderate
users 4.6, and experts 2.6.

36
Plot the Data

Look for the trends graphically

37
Other Presentation Methods
Scatter plot
Box plot
[Figure labels: low, high, mean, middle 50%; axes: Age, Time in secs. (0 to 20)]
38
Experimental Results

How does one know if an experiment's results mean
anything or confirm any beliefs?
Example: 40 people participated, 28 preferred
interface 1, 12 preferred interface 2
What do you conclude?
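
One way to approach the 28-versus-12 question is an exact binomial (sign) test: if people had no real preference, how often would a split at least this lopsided turn up? A minimal sketch, assuming a 50/50 null hypothesis and independent participants; the function name is made up for illustration:

```python
from math import comb


def binomial_two_sided_p(k, n):
    """Exact two-sided p-value for k of n under a fair 50/50 null."""
    tail = max(k, n - k)  # size of the larger side of the split
    upper = sum(comb(n, i) for i in range(tail, n + 1)) / 2 ** n
    return min(1.0, 2 * upper)  # double the one-sided tail, capped at 1


p = binomial_two_sided_p(28, 40)
print(f"p = {p:.4f}")
```

For this split the p-value comes out below the conventional 0.05 cutoff, so chance alone is an unlikely explanation for the preference.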

39
Inferential (Diagnostic) Stats

Tests to determine if what you see in the data
(e.g., differences in the means) are reliable
(replicable), and if they are likely caused by
the independent variables, and not due to random
effects
e.g. t-test to compare two means
e.g. ANOVA (Analysis of Variance) to compare
several means
e.g. test significance level of a correlation
between two variables

40
Means Not Always Perfect
Experiment 1
Group 1: 1, 10, 10 (mean 7)
Group 2: 3, 6, 21 (mean 10)
Experiment 2
Group 1: 6, 7, 8 (mean 7)
Group 2: 8, 11, 11 (mean 10)
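
Both experiments have the same group means, so the difference lies in the spread. A minimal sketch that prints the mean and sample standard deviation for each group, using the slide's own numbers; the helper name `summarize` is made up for illustration:

```python
from statistics import mean, stdev

experiments = {
    "Experiment 1": {"Group 1": [1, 10, 10], "Group 2": [3, 6, 21]},
    "Experiment 2": {"Group 1": [6, 7, 8], "Group 2": [8, 11, 11]},
}


def summarize(values):
    """Return (mean, sample standard deviation) for one group."""
    return mean(values), stdev(values)


for name, groups in experiments.items():
    for group, values in groups.items():
        m, s = summarize(values)
        print(f"{name} {group}: mean = {m}, sd = {s:.2f}")
```

The same 3-point gap in means is far more convincing in Experiment 2, where the scores within each group sit close together.
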
41
Inferential Stats and the Data

Ask diagnostic questions about the data

Are these really different? What would that mean?
42
Hypothesis Testing

Going back to the hypothesis: what do the data
say?
Translate hypothesis into expected difference in
measure
If First name is faster, then
TimeFirst < TimeLast
Under the null hypothesis, there should be no
difference between the completion times
H0: TimeFirst = TimeLast

43
Hypothesis Testing

Significance level (p)
The probability of getting a result at least as
extreme as the one you saw, simply by chance, if
the null hypothesis were true
The cutoff or threshold level of p (alpha
level) is often set at 0.05: you accept that 5%
of the time you'll get a result like this just by
chance
e.g. If your statistical t-test (testing the
difference between two means) returns a t-value
of t = 4.5, and a p-value of p = .01, the difference
between the means is statistically significant
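
The meaning of a p-value can be made concrete with a permutation test, used here in place of the t-test because it needs only the standard library (in practice `scipy.stats.ttest_ind` would be the usual route for a t-test). A minimal sketch, assuming two small independent groups; the completion times are invented and the function name is made up for illustration:

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    indices = range(len(pooled))
    extreme = total = 0
    # Try every way of relabeling the pooled scores into two groups.
    for chosen in combinations(indices, len(a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical completion times in seconds.
time_first = [4.1, 3.8, 4.4, 3.9, 4.0]
time_last = [5.2, 4.9, 5.6, 5.1, 4.7]
p = permutation_p_value(time_first, time_last)
print(f"p = {p:.4f}")
```

The p-value is the fraction of relabelings that produce a difference at least as large as the observed one. If that fraction is below the alpha level, H0 is rejected.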

44
Errors

Errors in analysis do occur
Main Types
Type I/False positive - You conclude there is a
difference, when in fact there isn't
Type II/False negative - You conclude there is no
difference when there is one

45
Drawing Conclusions

Make your conclusions based on the descriptive
stats, but back them up with inferential stats
e.g., The expert group performed faster than
the novice group, t(1,34) = 4.6, p < .01.
Translate the stats into words that regular
people can understand
e.g., Thus, those who have computer experience
will be able to perform better, right from the
beginning

46
Feeding Back Into Design

Your study was designed to yield information you
can use to redesign your interface
What were the conclusions you reached?
How can you improve on the design?
What are quantitative redesign benefits?
e.g. 2 minutes saved per transaction, 24%
increase in production, or $45,000,000 per year
in increased profit
What are qualitative, less tangible benefit(s)?
e.g. workers will be less bored, less tired, and
therefore more interested --> better customer
service

47
Remote usability testing

Telephone or video communication
Screen-sharing technology
Microsoft NetMeeting
https://www.microsoft.com/downloads/details.aspx?FamilyID=26c9da7c-f778-4422-a6f4-efb8abba021e&DisplayLang=en
VNC
http://www.realvnc.com/
Greater flexibility in recruiting subjects,
environments

48
Usage logging

Embed logging mechanisms into code
Study usage in actual deployment
Some code can even phone home
facebook usage metrics
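
Embedded logging can be as simple as appending one timestamped record per user action and counting the records later. A minimal sketch, assuming a JSON-lines format; the class name, field names, and event names are made up for illustration and are not part of any Facebook API:

```python
import io
import json
import time


class UsageLog:
    """Append timestamped events as JSON lines to a file-like object."""

    def __init__(self, stream, clock=time.time):
        self.stream = stream
        self.clock = clock  # injectable so tests can use a fixed time

    def record(self, user_id, event, **details):
        entry = {"t": self.clock(), "user": user_id, "event": event, "details": details}
        self.stream.write(json.dumps(entry) + "\n")
        return entry


def count_events(lines):
    """Tally how often each event name appears in a JSON-lines log."""
    counts = {}
    for line in lines:
        event = json.loads(line)["event"]
        counts[event] = counts.get(event, 0) + 1
    return counts


buffer = io.StringIO()  # stands in for a log file or a server endpoint
log = UsageLog(buffer)
log.record("P-01", "picker_opened")
log.record("P-01", "invite_sent", picked=3)
log.record("P-02", "picker_opened")
print(count_events(buffer.getvalue().splitlines()))
```

Logging a participant code rather than a name keeps the usage data in line with the privacy practices covered later in the lecture.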

49
Example: Rhythmic Work Activity

Drawn from about 50 Awarenex (IM) users
Bi-coastal teams (3-hour time difference)
Work from home team members
Based on up to 2 years of collected data
Sun Microsystems Laboratories: James "Bo" Begole,
Randall Smith, and Nicole Yankelovich

50
Activity Data Collected

Activity information
Input device activity (1-minute granularity)
Device location (office, home, mobile)
Email fetching and sending
Online calendar appointments
Activity ? Availability

51
Actogram of an Individual's Computer Activity
[Figure: axes Time of Day and Date; plotted series Computer Activity]
52
Aggregate Activity
[Figure: plotted series Computer Activity]
53
Aggregate Activity with Appointments
[Figure: plotted series Appointment and Computer Activity]
54
Comparing Aggregates Among 3 Individuals
[Figure: three panels, a, b, and c]
55
Project deployment issues

May have to be careful about widespread
deployment of application
We're only looking for a usability study with 4
people
Widespread deployment would be cool
BUT, widespread deployment may run into
provisioning issues
Provide feedback on server provisioning

56
Quantitative study of your project

What are your measures?
Task measures, performance time, errors
Usage measures (facebook utilities)
Compute summary statistics
Discussion section
Identify independent, dependent, control variables

57
Privacy issues in collecting user data

Collecting data involves respecting users' privacy

58
Informed consent

Legal condition whereby a person can be said to
have given consent based upon an appreciation and
understanding of the facts and implications of an
action
EULA?
But what about actions in public places?
What about recording in public places?

59
Consent

Why important?
People can be sensitive about this process and
issues
Errors will likely be made, participant may feel
inadequate
May be mentally or physically strenuous
What are the potential risks (there are always
risks)?
Vulnerable populations need special care and
consideration
Children, disabled, pregnant, students

60
Controlling data for privacy

What data is being collected?
How will the data be used?
How can I delete data?
Who will have access to the data?
How can I review data before public
presentations?
What if I have questions afterwards?

61
Contact info for questions
How data will be used
What activity observed
What data collected
How to Delete data
Who can access data
Review before show publicly
62
Human subjects review, participants, ethics

Academic, government research must go through
human subjects review process
Committee for Protection of Human Subjects
http://cphs.berkeley.edu/
Reviews all research involving human (or animal)
participants
Safeguarding the participants, and thereby the
researcher and university
Not a science review (i.e., not to assess your
research ideas), only safety and ethics
Complete Web-based forms, submit research
summary, sample consent forms, training, etc.
Practices in industry vary

65
The participant's perspective

User testing can be intimidating
Pressure to perform, please observer
Fear of embarrassment
Fear of critiquing (cultural)
You must remain unbiased and inviting
More tips in the "Conducting the Test" reading by
Rubin

66
Ethics

Testing can be arduous
Each participant should consent to be in
experiment (informal or formal)
Know what experiment involves, what to expect,
what the potential risks are
Must be able to stop without danger or penalty
All participants to be treated with respect

67
Assignment: Storyboard and Implementation

Create storyboard for main tasks of application
Test with at least one non-CS160 user
Reflect on what you learned
How will you change interface?
Implement initiation of facebook application and
database
