Title: Evaluate: Quantitative Methods
1. Evaluate: Quantitative Methods
- October 4, 2007
- Turn in Project Proposal
2. Today
- Quantitative methods
- Scientific method
- Aim for generalizable results
- Privacy issues when collecting data
3. Quantitative methods
- Reliably measure some aspect of interface
- Especially to measurably compare
- Approaches
- Controlled experiments
- Doing Psychology Experiments, David W. Martin, 7th edition, 2007
- Collect usage data
4. Designing an experiment
- State hypothesis
- Identify variables
- Independent
- Dependent
- Design experimental protocol
- Apply for human subjects review
- Select user population
- Conduct experiment
5. Conducting experiment
- Run pilot test
- Collect data from running experiment
- Perform statistical analysis
- Interpret data, draw conclusions
6. Experiment hypothesis
- Testable hypothesis
- Precise statement of expected outcome
- More specifically, how you predict the dependent variable (e.g., accuracy) will depend on the independent variable(s)
- Null hypothesis (H0)
- Stating that there will be no effect
- e.g., There will be no difference in performance between the two groups
- Data are used to try to reject this null hypothesis
7. Experiment design
- Independent variables
- Attributes we manipulate / vary in condition
- Levels, value of attribute
- Dependent variables
- Outcome of experiment, measures to evaluate
- Usually measure user performance
- Time to completion
- Errors
- Amount of production
- Measures of satisfaction
8. Experiment design (2)
- Control variables
- Attributes that remain the same across conditions
- Random variables
- Attributes that are randomly sampled
- Can be used to increase generalizability
- Avoiding confounds
- Confounds are attributes that changed but were not accounted for
- Confounds prevent drawing conclusions about independent variables
9. Example: Person picker
- Picking from a list of names to invite to use a facebook application
[Figure: two versions of a name list: Bryan Tsao, Christine Robson, David Sun, John Tang, Jonathan Tong]
10. Example: Variables
- Independent variables
- Picture vs. no picture
- Ordered horizontally or vertically
- One column vs. 2 column
- Dependent variables
- Time to complete
- Error rate
- User perception
- Control variables
- Test setting
- List to pick from
- Random variables
- Subject demographics
- Confound
- Only one woman in list
- List mostly Asians
11. Experimental design goals
- Internal validity
- Cause and effect: changes in independent variables → changes in dependent variables
- Eliminating confounds (turn them into independent variables or random variables)
- Replicability of experiment
- External validity
- Results generalizable to other settings
- Ecological validity: generalizable to the real world
- Confidence in results
- Statistical power (number of subjects, at least 10)
12. Experimental protocol
- Defining the task(s)
- What are all the combinations of conditions?
- How often to repeat each condition combination?
- Between-subjects or within-subjects?
- Avoiding bias (instructions, ordering)
13. Task
- Defining task to test hypothesis
- Pictures will lead to fewer errors
- Same time to pick users with and without pictures (H0)
- Pictures will lead to higher satisfaction
- How do you present the task?
- Task: Users must select the following list of people to share the application with:
- Jonathan Tong
- Christine Robson
- David Sun
14. Motivating user tasks
- Create scenario, movie plot for task
- Immerse subject in a story that removes them from the user testing situation
- Focus subject on goal; system becomes a tool (and more subject to critique)
15. Number of Conditions
- Consider all combinations to isolate effects of each independent variable
- (2 order) × (2 columns) × (2 format) = 8
- Horizontal, 1 column, picture + text
- Horizontal, 2 column, picture + text
- Vertical, 1 column, picture + text
- Vertical, 2 column, picture + text
- Horizontal, 1 column, text only
- Horizontal, 2 column, text only
- Vertical, 1 column, text only
- Vertical, 2 column, text only
- Adding levels or factors → exponential combinations
- This can make experiments expensive!
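Below is a minimal Python sketch of the full-factorial count above. It assumes the three two-level factors from the person-picker example. The factor names and level labels are illustrative and do not come from any study tool. The sketch enumerates every combination with `itertools.product`, which makes the multiplicative growth easy to check.

```python
from itertools import product

# Hypothetical labels for the three two-level factors in the person-picker example.
FACTORS = {
    "order": ["Horizontal", "Vertical"],
    "columns": ["1 column", "2 column"],
    "format": ["picture + text", "text only"],
}

def all_conditions(factors):
    """Every combination of factor levels (a full factorial design)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

conditions = all_conditions(FACTORS)
for c in conditions:
    print(", ".join(c.values()))
print(len(conditions), "conditions")
```

Adding a fourth two-level factor doubles the count to 16. Giving any one factor a third level raises it to 12.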
16. Reducing conditions
- Vary only one independent variable at a time
- But can miss interactions
- Factor experiment into series of steps
- Prune branches if no significant effects found
17. Choosing subjects
- Balance sample reflecting diversity of target user population (random variable)
- Novices, experts
- Age group
- Gender
- Culture
- Example
- 30 college-age, normal vision or corrected to
normal, with demographic distributions of gender,
culture
18. Population as variable
- Population as an independent variable
- Identifies interactions
- Adds conditions
- Population as controlled variable
- Consistency across experiment
- Misses relevant features
- Statistical post-hoc analysis can suggest need for further study
- Collect all the relevant demographic info
19. Recruiting participants
- Subject pools
- Volunteers
- Paid participants
- Students (e.g., psych undergrads) for course credit
- Friends, acquaintances, family, lab members
- Public space participants, e.g., observing people walking through a museum
- Must fit user population (validity)
- Motivation is a big factor: not only incentives (pay, course credit) but also explaining the importance of the research
20. Current events: Population sampling issue
- Currently, election polling is conducted on land-line phones
- Legacy
- Laws about manual dialing of cell phones
- Higher refusal rates
- Cell phone users pay for incoming phone calls → have to compensate recipients
- What bias is introduced by excluding cell-phone-only users?
- 7% of population (2004), growing to 15% (2008)
- Which candidate claims polls underrepresent?
http://www.npr.org/templates/story/story.php?storyId=14863373
21. Between-subjects design
- Different groups of subjects use different designs
- 15 subjects use text only
- 15 subjects use text + pictures
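In a between-subjects design like this one, participants are usually assigned to groups at random so that individual differences spread evenly across conditions. The following is a minimal sketch. It assumes 30 hypothetical participant IDs and an even split. The function name is illustrative, and the seed only makes the example reproducible.

```python
import random

def assign_between_subjects(participants, conditions, seed=None):
    """Shuffle participants and split them into equal groups, one per condition."""
    if len(participants) % len(conditions) != 0:
        raise ValueError("participants must divide evenly among conditions")
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(conditions)
    return {cond: shuffled[i * size:(i + 1) * size] for i, cond in enumerate(conditions)}

participants = [f"P{i:02d}" for i in range(1, 31)]  # 30 hypothetical IDs
groups = assign_between_subjects(participants, ["text only", "text + pictures"], seed=160)
print({cond: len(ids) for cond, ids in groups.items()})
```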
22. Within-subjects design
- All subjects try all conditions
[Figure: two panels, each labeled "15 subjects"]
23. Within-Subjects Designs
- More efficient
- Each subject gives you more data: they complete more blocks or sessions
- More statistical power
- Each person is their own control, so fewer confounds
- Therefore, can require fewer participants
- May mean a more complicated design to avoid order effects
- Participant may learn from the first condition
- Fatigue may make the second performance worse
24. Between-Subjects Designs
- Fewer order effects
- Simpler design and analysis
- Easier to recruit participants (only one session, shorter time)
- Subjects can't compare across conditions
- Need more subjects
- Control more for confounds
25. Within-Subjects Ordering effects
- Countering order effects
- Equivalent tasks (less sensitive to learning)
- Randomize order of conditions (random variable)
- Counterbalance ordering (ensure all orderings covered)
- Latin Square ordering (partial counterbalancing)
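The two counterbalancing options differ sharply in cost. A minimal sketch follows, using four hypothetical conditions labeled A to D. Full counterbalancing needs every ordering (n!), while a simple cyclic Latin square needs only n orderings, and each condition still appears once in every serial position. One limitation of the cyclic square is that each condition is always followed by the same next condition (A by B, and so on), so carryover between specific pairs is not balanced. A balanced Latin square (Williams design) is the usual fix for that.

```python
from itertools import permutations

def full_counterbalance(conditions):
    """All n! orderings of the conditions (complete counterbalancing)."""
    return [list(order) for order in permutations(conditions)]

def latin_square(conditions):
    """Cyclic Latin square: n orderings in which each condition
    appears exactly once in each serial position."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

conds = ["A", "B", "C", "D"]
print(len(full_counterbalance(conds)), "orderings for full counterbalancing")
for i, order in enumerate(latin_square(conds)):
    # Participant k would receive row k % n.
    print(f"row {i}: {' '.join(order)}")
```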
26. Study setting
- Lab setting
- Complete control through isolation
- Uniformity across subjects
- Field study
- Ecological validity
- Variations across subjects
27. Before Study
- Always pilot test first
- Reveals unexpected problems
- Can't change experiment design after collecting data
- Make sure participants know you are testing the software, not them
- (Usability testing, not user testing)
- Maintain privacy
- Explain procedures without compromising results
- Can quit anytime
- Administer signed consent form
28. During Study
- Always follow the same steps: use a checklist
- Make sure participant is comfortable
- Session should not be too long
- Maintain relaxed atmosphere
- Never indicate displeasure or anger
29. After Study
- State how the session will help you improve the system (debriefing)
- Show participant how to perform failed tasks
- Don't compromise privacy (never identify people; only show videos with explicit permission)
- Data to be stored anonymously, securely, and/or destroyed
30. Exercise: Quantitative test
- Pair up with someone who has a computer and has downloaded the files
- DO NOT OPEN THE FILE (yet)
- Make sure one of you has a stopwatch
- Cell phone
- Watch
- Computer user will run the test; observer will time the event
31. Exercise: Task
- Open the file
- Find the item in the list
- Highlight that entry like this
32. Example: Variables
- Independent variables
- Dependent variables
- Control variables
- Random variables
- Confound
33. Data Inspection
- Look at the results
- First look at each participant's data
- Were there outliers, people who fell asleep, anyone who tried to mess up the study, etc.?
- Then look at aggregate results and descriptive statistics
- What happened in this study, relative to the hypothesis and goals?
34. Descriptive Statistics
- For all variables, get a feel for results
- Total scores, times, ratings, etc.
- Minimum, maximum
- Mean, median, ranges, etc.
What is the difference between mean and median? Why use one or the other?
- e.g., "Twenty participants completed both sessions (10 males, 10 females; mean age 22.4, range 18-37 years)."
- e.g., "The median time to complete the task in the mouse-input group was 34.5 s (min = 19.2 s, max = 305 s)."
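The mean-versus-median question is easiest to see with one extreme value. In this minimal sketch the completion times are hypothetical, and one participant who got stuck takes 120 s. That single value pulls the mean far above the typical time, while the median barely moves. This is why a median is often reported for skewed timing data.

```python
import statistics

# Hypothetical task-completion times in seconds; one participant got stuck (120 s).
times = [12.0, 14.5, 15.0, 16.5, 18.0, 120.0]

summary = {
    "n": len(times),
    "min": min(times),
    "max": max(times),
    "mean": statistics.mean(times),
    "median": statistics.median(times),
}
print(f"mean = {summary['mean']:.2f} s, median = {summary['median']:.2f} s")
```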
35. Subgroup Stats
- Look at descriptive stats (means, medians, ranges, etc.) for any subgroups
- e.g., "The mean error rate for the mouse-input group was 3.4. The mean error rate for the keyboard group was 5.6."
- e.g., "The median completion times (in seconds) for the three groups were: novices, 4.4; moderate users, 4.6; and experts, 2.6."
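Subgroup summaries like these come from grouping records before computing statistics. Here is a minimal sketch using only the standard library. The (group, time) records are hypothetical and deliberately differ from the figures quoted above.

```python
import statistics
from collections import defaultdict

# Hypothetical (group, completion time in seconds) records.
records = [
    ("novice", 5.2), ("novice", 6.0), ("novice", 7.1),
    ("moderate", 4.0), ("moderate", 4.8), ("moderate", 5.5),
    ("expert", 2.9), ("expert", 3.3), ("expert", 3.0),
]

def stats_by_group(rows):
    """Descriptive statistics for each subgroup."""
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)
    return {
        g: {"n": len(v), "mean": statistics.mean(v),
            "median": statistics.median(v), "range": (min(v), max(v))}
        for g, v in by_group.items()
    }

for group, s in stats_by_group(records).items():
    print(f"{group}: median {s['median']} s, mean {s['mean']:.2f} s, range {s['range']}")
```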
36. Plot the Data
- Look for the trends graphically
37. Other Presentation Methods
[Figures: a scatter plot and a box plot; the box plot marks low, high, mean, and the middle 50% of values; axis labels include Age and Time in secs.]
38. Experimental Results
- How does one know if an experiment's results mean anything or confirm any beliefs?
- Example: 40 people participated; 28 preferred interface 1, 12 preferred interface 2
- What do you conclude?
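One way to answer "what do you conclude?" is an exact binomial (sign) test. It assumes that each participant's preference is an independent choice and that, under the null hypothesis, each interface is equally likely to be preferred (probability 0.5). The following is a minimal standard-library sketch, and the function name is illustrative.

```python
from math import comb

def sign_test_p(k, n):
    """Exact two-sided binomial (sign) test of k "successes" out of n,
    under the null hypothesis that each outcome has probability 0.5."""
    tail = max(k, n - k)
    upper = sum(comb(n, i) for i in range(tail, n + 1)) / 2 ** n
    return min(1.0, 2 * upper)

p = sign_test_p(28, 40)  # 28 of 40 preferred interface 1
print(f"p = {p:.4f}")
```

For 28 of 40 this gives p of about 0.017, which is below the conventional 0.05 cutoff. An even 20/20 split gives p = 1.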
39. Inferential (Diagnostic) Stats
- Tests to determine whether what you see in the data (e.g., differences in the means) is reliable (replicable), and whether it is likely caused by the independent variables rather than by random effects
- e.g., t-test to compare two means
- e.g., ANOVA (Analysis of Variance) to compare several means
- e.g., test significance level of a correlation between two variables
40. Means Not Always Perfect
- Experiment 1: Group 1 = 1, 10, 10 (mean 7); Group 2 = 3, 6, 21 (mean 10)
- Experiment 2: Group 1 = 6, 7, 8 (mean 7); Group 2 = 8, 11, 11 (mean 10)
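The same 3-point difference in means is convincing in Experiment 2 but not in Experiment 1. One way to see why is to scale the difference by the within-group variability. This minimal sketch computes Welch's t statistic, which does not assume equal variances, using the standard library. The function name is ours. With only three values per group, the statistic is illustrative rather than a basis for a real conclusion.

```python
import statistics
from math import sqrt

def welch_t(a, b):
    """Welch's t statistic for the difference in means (b - a)."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se

exp1 = welch_t([1, 10, 10], [3, 6, 21])
exp2 = welch_t([6, 7, 8], [8, 11, 11])
print(f"Experiment 1: t = {exp1:.2f}")
print(f"Experiment 2: t = {exp2:.2f}")
```

Experiment 1 gives t of about 0.47, because its large spread swamps the difference. Experiment 2 gives t of about 2.60.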
41. Inferential Stats and the Data
- Ask diagnostic questions about the data
Are these really different? What would that mean?
42. Hypothesis Testing
- Going back to the hypothesis: what do the data say?
- Translate hypothesis into expected difference in measure
- If "First name" is faster, then
- TimeFirst < TimeLast
- If null hypothesis, there should be no difference between the completion times
- H0: TimeFirst = TimeLast
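If the comparison is run within subjects, so that each participant tries both the first-name and last-name ordering, the hypothesis becomes a statement about per-participant differences. Below is a minimal paired t-statistic sketch in which the four participants' times are hypothetical. A positive t here means TimeLast is larger, which is the direction the hypothesis predicts.

```python
import statistics
from math import sqrt

def paired_t(first, last):
    """Paired t statistic for the mean of (last - first) differences."""
    diffs = [l - f for f, l in zip(first, last)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / sqrt(len(diffs)))

# Hypothetical completion times (seconds) for four participants.
time_first = [10.0, 12.0, 9.0, 11.0]
time_last = [12.0, 14.0, 10.0, 13.0]
t = paired_t(time_first, time_last)
print(f"t({len(time_first) - 1}) = {t:.2f}")
```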
43. Hypothesis Testing
- Significance level (p)
- The probability of getting a result at least as extreme as yours simply by chance, if the null hypothesis were true
- The cutoff or threshold level of p (alpha level) is often set at 0.05: if there were really no effect, you would get a result like the one you saw at most 5% of the time, just by chance
- e.g., If your statistical t-test (testing the difference between two means) returns a t-value of t = 4.5 and a p-value of p = .01, the difference between the means is statistically significant
44. Errors
- Errors in analysis do occur
- Main types
- Type I / false positive: you conclude there is a difference when in fact there isn't
- Type II / false negative: you conclude there is no difference when there is
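A quick simulation makes the Type I error rate concrete. If both groups come from the same distribution (the null hypothesis is true) and we test at alpha = 0.05, roughly 5% of experiments still come out "significant". This sketch assumes normally distributed times with a known standard deviation, so that it can use a simple two-sided z-test from the standard library. The population values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def false_positive_rate(trials=2000, n=15, mu=30.0, sigma=5.0, alpha=0.05, seed=1):
    """Share of simulated experiments with NO real effect that still reach p < alpha.

    Both groups are drawn from the same normal distribution, and each
    experiment is tested with a two-sided z-test (sigma assumed known).
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    false_positives = 0
    for _ in range(trials):
        a = [rng.gauss(mu, sigma) for _ in range(n)]
        b = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (sum(b) / n - sum(a) / n) / (sigma * sqrt(2 / n))
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p < alpha:
            false_positives += 1
    return false_positives / trials

rate = false_positive_rate()
print(f"Type I error rate: {rate:.3f}")
```

Lowering alpha reduces these false positives. However, with the same sample size it makes Type II errors (missed real effects) more likely.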
45. Drawing Conclusions
- Make your conclusions based on the descriptive stats, but back them up with inferential stats
- e.g., "The expert group performed faster than the novice group, t(34) = 4.6, p < .01."
- Translate the stats into words that regular people can understand
- e.g., "Thus, those who have computer experience will be able to perform better, right from the beginning."
46. Feeding Back Into Design
- Your study was designed to yield information you can use to redesign your interface
- What were the conclusions you reached?
- How can you improve on the design?
- What are the quantitative redesign benefits?
- e.g., 2 minutes saved per transaction, 24% increase in production, or $45,000,000 per year in increased profit
- What are the qualitative, less tangible benefit(s)?
- e.g., workers will be less bored, less tired, and therefore more interested → better customer service
47. Remote usability testing
- Telephone or video communication
- Screen-sharing technology
- Microsoft NetMeeting
- https://www.microsoft.com/downloads/details.aspx?FamilyID=26c9da7c-f778-4422-a6f4-efb8abba021e&DisplayLang=en
- VNC
- http://www.realvnc.com/
- Greater flexibility in recruiting subjects, environments
48. Usage logging
- Embed logging mechanisms into code
- Study usage in actual deployment
- Some code can even phone home
- facebook usage metrics
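Here is a minimal sketch of embedding logging in application code. A local JSON-lines file stands in for whatever server a real deployment would "phone home" to. The class, event names, and salt are hypothetical. The sketch also hashes user IDs before writing, in keeping with the privacy concerns about collected data. Note that salted hashing is pseudonymization, which is weaker than full anonymization.

```python
import hashlib
import json
import os
import tempfile
import time

class UsageLogger:
    """Appends usage events to a JSON-lines file.

    Raw user IDs are never written: each is replaced by a salted SHA-256 hash.
    """

    def __init__(self, path, salt):
        self.path = path
        self.salt = salt

    def _pseudonym(self, user_id):
        digest = hashlib.sha256((self.salt + user_id).encode("utf-8")).hexdigest()
        return digest[:12]

    def log(self, user_id, event, **details):
        record = {"t": time.time(), "user": self._pseudonym(user_id), "event": event, **details}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

def read_events(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

# Example: log two events from a hypothetical person-picker session.
log_path = os.path.join(tempfile.mkdtemp(), "usage.jsonl")
logger = UsageLogger(log_path, salt="hypothetical-study-salt")
logger.log("alice@example.com", "open_picker", layout="vertical")
logger.log("alice@example.com", "select", position=3)
events = read_events(log_path)
print(len(events), [e["event"] for e in events])
```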
49. Example: Rhythmic Work Activity
- Drawn from about 50 Awarenex (IM) users
- Bi-coastal teams (3-hour time difference)
- Work from home team members
- Based on up to 2 years of collected data
- Sun Microsystems Laboratories James "Bo" Begole,
Randall Smith, and Nicole Yankelovich
50. Activity Data Collected
- Activity information
- Input device activity (1-minute granularity)
- Device location (office, home, mobile)
- Email fetching and sending
- Online calendar appointments
- Activity ? Availability
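Aggregate activity views like the ones that follow can be built from 1-minute activity records by folding them onto the hour of the day. This minimal sketch shows one such aggregation. It assumes each timestamp marks a minute with input activity. The function name and sample data are hypothetical, and the sketch is not the Awarenex implementation.

```python
from collections import Counter
from datetime import datetime

def hourly_activity_profile(active_minutes):
    """For each hour 0-23, the fraction of days with any input activity in that hour.

    Days are counted from the timestamps themselves, so days with no recorded
    activity at all are not included in the denominator.
    """
    days = {ts.date() for ts in active_minutes}
    if not days:
        return [0.0] * 24
    day_hours = {(ts.date(), ts.hour) for ts in active_minutes}
    per_hour = Counter(hour for _, hour in day_hours)
    return [per_hour[h] / len(days) for h in range(24)]

# Hypothetical 1-minute activity samples for one user over two days.
samples = [
    datetime(2007, 10, 1, 9, 15), datetime(2007, 10, 1, 9, 16),
    datetime(2007, 10, 1, 14, 0),
    datetime(2007, 10, 2, 9, 30), datetime(2007, 10, 2, 22, 5),
]
profile = hourly_activity_profile(samples)
for hour, share in enumerate(profile):
    if share:
        print(f"{hour:02d}:00  {share:.0%}")
```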
51. Actogram of an Individual's Computer Activity
[Figure: computer activity plotted by time of day and date]
52. Aggregate Activity
[Figure: aggregate computer activity]
53. Aggregate Activity with Appointments
[Figure: aggregate computer activity with appointments]
54. Comparing Aggregates Among 3 Individuals
[Figure: panels a, b, c]
55. Project deployment issues
- May have to be careful about widespread deployment of the application
- We're only looking for a usability study with 4 people
- Widespread deployment would be cool
- BUT, widespread deployment may run into provisioning issues
- Provide feedback on server provisioning
56. Quantitative study of your project
- What are your measures?
- Task measures, performance time, errors
- Usage measures (facebook utilities)
- Compute summary statistics
- Discussion section
- Identify independent, dependent, control variables
57. Privacy issues in collecting user data
- Collecting data involves respecting users' privacy
58. Informed consent
- Legal condition whereby a person can be said to have given consent based upon an appreciation and understanding of the facts and implications of an action
- EULA?
- But what about actions in public places?
- What about recording in public places?
59. Consent
- Why important?
- People can be sensitive about this process and its issues
- Errors will likely be made; participant may feel inadequate
- May be mentally or physically strenuous
- What are the potential risks (there are always risks)?
- Vulnerable populations need special care and consideration
- Children, disabled people, pregnant women, students
60. Controlling data for privacy
- What data is being collected?
- How will the data be used?
- How can I delete data?
- Who will have access to the data?
- How can I review data before public presentations?
- What if I have questions afterwards?
61. [Figure: example consent form, annotated with: contact info for questions; how data will be used; what activity is observed; what data is collected; how to delete data; who can access data; review before showing publicly]
62. Human subjects review, participants, ethics
- Academic and government research must go through a human subjects review process
- Committee for Protection of Human Subjects
- http://cphs.berkeley.edu/
- Reviews all research involving human (or animal) participants
- Safeguarding the participants, and thereby the researcher and university
- Not a science review (i.e., not to assess your research ideas); only safety and ethics
- Complete Web-based forms, submit research summary, sample consent forms, training, etc.
- Practices in industry vary
65. The participant's perspective
- User testing can be intimidating
- Pressure to perform, please the observer
- Fear of embarrassment
- Fear of critiquing (cultural)
- You must remain unbiased and inviting
- More tips in the "Conducting the Test" reading by Rubin
66. Ethics
- Testing can be arduous
- Each participant should consent to be in the experiment (informal or formal)
- Know what the experiment involves, what to expect, what the potential risks are
- Must be able to stop without danger or penalty
- All participants to be treated with respect
67. Assignment: Storyboard, Implementation
- Create storyboard for main tasks of application
- Test with at least one non-CS160 user
- Reflect on what you learned
- How will you change interface?
- Implement initiation of facebook application and
database
68. Next time
- Lecture on implementing: hardware, sensors
- Tom Zimmerman, guest lecture