Evaluate: Quantitative Methods

Slides: 69
Provided by: IBMU303

1
Evaluate: Quantitative Methods
  • October 4, 2007
  • Turn in Project Proposal

2
Today
  • Quantitative methods
  • Scientific method
  • Aim for generalizable results
  • Privacy issues when collecting data

3
Quantitative methods
  • Reliably measure some aspect of interface
  • Especially to measurably compare
  • Approaches
  • Controlled experiments
  • Doing Psychology Experiments, David W.
    Martin, 7th edition, 2007
  • Collect usage data

4
Designing an experiment
  • State hypothesis
  • Identify variables
  • Independent
  • Dependent
  • Design experimental protocol
  • Apply for human subjects review
  • Select user population
  • Conduct experiment

5
Conducting experiment
  • Run pilot test
  • Collect data from running experiment
  • Perform statistical analysis
  • Interpret data, draw conclusions

6
Experiment hypothesis
  • Testable hypothesis
  • Precise statement of expected outcome
  • More specifically, how you predict the dependent
    variable (e.g., accuracy) will depend on the
    independent variable(s)
  • Null hypothesis (H0)
  • Stating that there will be no effect
  • e.g., There will be no difference in
    performance between the two groups
  • Data used to try to disprove this null hypothesis

7
Experiment design
  • Independent variables
  • Attributes we manipulate / vary in condition
  • Levels, value of attribute
  • Dependent variables
  • Outcome of experiment, measures to evaluate
  • Usually measure user performance
  • Time to completion
  • Errors
  • Amount of production
  • Measures of satisfaction

8
Experiment design (2)
  • Control variables
  • Attributes that remain the same across conditions
  • Random variables
  • Attributes that are randomly sampled
  • Can be used to increase generalizability
  • Avoiding confounds
  • Confounds are attributes that changed but were
    not accounted for
  • Confounds prevent drawing conclusions on
    independent variables

9
Example: Person picker
  • Picking from list of names to invite to use
    facebook application

(Figure: the same list of names shown in two layouts: Bryan Tsao, Christine Robson, David Sun, John Tang, Jonathan Tong)
10
Example: Variables
  • Independent variables
  • Picture vs. no picture
  • Ordered horizontally or vertically
  • One column vs. two columns
  • Dependent variables
  • Time to complete
  • Error rate
  • User perception
  • Control variables
  • Test setting
  • List to pick from
  • Random variables
  • Subject demographics
  • Confound
  • Only one woman in list
  • List mostly Asians

11
Experimental design goals
  • Internal validity
  • Cause and effect: a change in the independent
    variables → a change in the dependent variables
  • Eliminating confounds (turn them into independent
    variables or random variables)
  • Replicability of experiment
  • External validity
  • Results generalizable to other settings
  • Ecological validity: generalizable to the
    real world
  • Confidence in results
  • Statistical power (number of subjects, at least
    10)

12
Experimental protocol
  • Defining the task(s)
  • What are all the combinations of conditions?
  • How often to repeat each condition combination?
  • Between-subjects or within-subjects?
  • Avoiding bias (instructions, ordering)

13
Task
  • Defining task to test hypothesis
  • Pictures will lead to fewer errors
  • Same time to pick users with and without pictures
    (H0)
  • Pictures will lead to higher satisfaction
  • How do you present the task?
  • Task: Users must select the following list of
    people to share the application with
  • Jonathan Tong
  • Christine Robson
  • David Sun

14
Motivating user tasks
  • Create scenario, movie plot for task
  • Immerse subject in story that removes them from
    user testing situation
  • Focus subject on goal, system becomes tool (and
    more subject to critique)

15
Number of Conditions
  • Consider all combinations to isolate effects of
    each independent variable
  • (2 orders) × (2 columns) × (2 formats) = 8
  • Horizontal, 1 column, picture + text
  • Horizontal, 2 column, picture + text
  • Vertical, 1 column, picture + text
  • Vertical, 2 column, picture + text
  • Horizontal, 1 column, text only
  • Horizontal, 2 column, text only
  • Vertical, 1 column, text only
  • Vertical, 2 column, text only
  • Adding levels or factors → exponential
    combinations
  • This can make experiments expensive!
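
The eight conditions above are the full set of combinations of the three independent variables. As a minimal sketch (the dictionary keys and the helper name `all_conditions` are illustrative, not from the lecture), the list can be generated rather than written out by hand, which also makes it easy to see how quickly the count grows when a factor or level is added:

```python
from itertools import product

# Levels of each independent variable, taken from the person-picker example.
factors = {
    "order": ["horizontal", "vertical"],
    "columns": [1, 2],
    "format": ["picture + text", "text only"],
}


def all_conditions(factors):
    """Return every combination of levels (a full factorial design)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


conditions = all_conditions(factors)
for condition in conditions:
    print(condition)
print(len(conditions), "conditions")
```

Adding a fourth two-level factor to `factors` would double the count to 16.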

16
Reducing conditions
  • Vary only one independent variable at a time
  • But can miss interactions
  • Factor experiment into series of steps
  • Prune branches if no significant effects found

17
Choosing subjects
  • Balance sample reflecting diversity of target
    user population (random variable)
  • Novices, experts
  • Age group
  • Gender
  • Culture
  • Example
  • 30 college-age, normal vision or corrected to
    normal, with demographic distributions of gender,
    culture

18
Population as variable
  • Population as an independent variable
  • Identifies interactions
  • Adds conditions
  • Population as controlled variable
  • Consistency across experiment
  • Misses relevant features
  • Statistical post-hoc analysis can suggest need
    for further study
  • Collect all the relevant demographic info

19
Recruiting participants
  • Subject pools
  • Volunteers
  • Paid participants
  • Students (e.g., psych undergrads) for course
    credit
  • Friends, acquaintances, family, lab members
  • Public space participants - e.g., observing
    people walking through a museum
  • Must fit user population (validity)
  • Motivation is a big factor - not only payment but also
    explaining the importance of the research

20
Current events: Population sampling issue
  • Currently, election polling conducted on
    land-line phones
  • Legacy
  • Laws about manual dialing of cell phones
  • Higher refusal rates
  • Cell phone users pay for incoming phone
    calls; have to compensate recipients
  • What bias is introduced by excluding cell phone
    only users?
  • 7% of population (2004), growing to 15% (2008)
  • Which candidate claims polls underrepresent?

http://www.npr.org/templates/story/story.php?storyId=14863373
21
Between subjects design
  • Different groups of subjects use different
    designs
  • 15 subjects use text only
  • 15 subjects use text + pictures
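
In a between-subjects design, who ends up in which group should be left to chance so that subject differences act as a random variable rather than a confound. The following is a minimal sketch of random assignment into equal-sized groups; the subject IDs, the seed, and the function name `assign_between_subjects` are hypothetical.

```python
import random


def assign_between_subjects(subject_ids, conditions, seed=None):
    """Shuffle subjects, then deal them round-robin into equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, subject in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups


# 30 hypothetical subject IDs split across the two designs.
subjects = [f"S{n:02d}" for n in range(1, 31)]
groups = assign_between_subjects(subjects, ["text only", "text + pictures"], seed=160)
for condition, members in groups.items():
    print(condition, len(members), members)
```

Fixing the seed only makes the assignment reproducible for record keeping; any seed gives a valid random split.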

22
Within subjects design
  • All subjects try all conditions

(Figure: the two designs, each labeled "15 subjects")
23
Within Subjects Designs
  • More efficient
  • Each subject gives you more data - they complete
    more blocks or sessions
  • More statistical power
  • Each person is their own control, fewer confounds
  • Therefore, can require fewer participants
  • May mean more complicated design to avoid order
    effects
  • Participant may learn from first condition
  • Fatigue may make second performance worse

24
Between Subjects Designs
  • Fewer order effects
  • Simpler design and analysis
  • Easier to recruit participants (only one session,
    shorter time)
  • Subjects can't compare across conditions
  • Need more subjects
  • Control more for confounds

25
Within Subjects: Ordering effects
  • Countering order effects
  • Equivalent tasks (less sensitive to learning)
  • Randomize order of conditions (random variable)
  • Counterbalance ordering (ensure all orderings
    covered)
  • Latin Square ordering (partial counterbalancing)
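
One simple way to build a Latin square ordering is to rotate the condition list by one position per row, so that every condition appears exactly once in every position. This is a sketch of that cyclic construction only (the function names are illustrative); it balances position but not which condition immediately precedes which, so it is partial counterbalancing in the sense used above.

```python
def latin_square_orders(conditions):
    """Cyclic Latin square: row i is the condition list rotated by i.

    Each condition appears once in every position across the rows.
    """
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for_subject(subject_index, conditions):
    """Cycle subjects through the rows of the square."""
    orders = latin_square_orders(conditions)
    return orders[subject_index % len(orders)]


square = latin_square_orders(["A", "B", "C", "D"])
for row in square:
    print(row)
```

With four conditions this needs only 4 orderings instead of the 24 required for full counterbalancing; the number of subjects should be a multiple of the number of rows.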

26
Study setting
  • Lab setting
  • Complete control through isolation
  • Uniformity across subjects
  • Field study
  • Ecological validity
  • Variations across subjects

27
Before Study
  • Always pilot test first
  • Reveals unexpected problems
  • Can't change experiment design after collecting
    data
  • Make sure they know you are testing software, not
    them
  • (Usability testing, not User testing)
  • Maintain privacy
  • Explain procedures without compromising results
  • Can quit anytime
  • Administer signed consent form

28
During Study
  • Always follow the same steps: use a checklist
  • Make sure participant is comfortable
  • Session should not be too long
  • Maintain relaxed atmosphere
  • Never indicate displeasure or anger

29
After Study
  • State how session will help you improve system
    (debriefing)
  • Show participant how to perform failed tasks
  • Don't compromise privacy (never identify people,
    only show videos with explicit permission)
  • Data to be stored anonymously, securely, and/or
    destroyed

30
Exercise: Quantitative test
  • Pair up with someone who has computer, downloaded
    the files
  • DO NOT OPEN THE FILE (yet)
  • Make sure one of you has a stopwatch
  • Cell phone
  • Watch
  • Computer user will run test, observer will time
    event

31
Exercise: Task
  • Open the file
  • Find the item in the list
  • Highlight that entry like this

32
Example: Variables
  • Independent variables
  • Dependent variables
  • Control variables
  • Random variables
  • Confound

33
Data Inspection
  • Look at the results
  • First look at each participant's data
  • Were there outliers, people who fell asleep,
    anyone who tried to mess up the study, etc.?
  • Then look at aggregate results and
    descriptive statistics
  • What happened in this study, relative to
    hypothesis and goals?

34
Descriptive Statistics
  • For all variables, get a feel for results
  • Total scores, times, ratings, etc.
  • Minimum, maximum
  • Mean, median, ranges, etc.

What is the difference between mean and median? Why
use one or the other?
  • e.g. Twenty participants completed both
    sessions (10 males, 10 females; mean age 22.4,
    range 18-37 years).
  • e.g. The median time to complete the task in
    the mouse-input group was 34.5 s (min = 19.2 s,
    max = 305 s).
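
To see why the choice between mean and median matters, the sketch below computes both for a small set of completion times. The individual times are made up for illustration; they were chosen only so that the minimum, maximum, and median match the example above. One very slow participant pulls the mean far above the median, which is why the median is often reported for skewed timing data.

```python
import statistics

# Hypothetical completion times in seconds; one participant took far longer.
times = [19.2, 28.0, 31.5, 33.0, 36.0, 38.5, 41.0, 305.0]

summary = {
    "min": min(times),
    "max": max(times),
    "mean": statistics.mean(times),
    "median": statistics.median(times),
}

for name, value in summary.items():
    print(f"{name}: {value:.1f} s")
```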

35
Subgroup Stats
  • Look at descriptive stats (means, medians,
    ranges, etc.) for any subgroups
  • e.g. The mean error rate for the mouse-input
    group was 3.4. The mean error rate for the
    keyboard group was 5.6.
  • e.g. The median completion times (in seconds)
    for the three groups were: novices 4.4, moderate
    users 4.6, and experts 2.6.

36
Plot the Data
  • Look for the trends graphically

37
Other Presentation Methods
  • Scatter plot
  • Box plot
(Figure: a scatter plot and a box plot. Recoverable labels: middle 50%, low, high, mean; axes: Age, Time in secs.)
38
Experimental Results
  • How does one know if an experiment's results mean
    anything or confirm any beliefs?
  • Example: 40 people participated; 28 preferred
    interface 1, 12 preferred interface 2
  • What do you conclude?
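
One way to approach the 28-versus-12 question is an exact binomial test. The sketch below assumes a null hypothesis that each participant is equally likely to prefer either interface; that choice of test and the function name are assumptions for illustration, not something the slides prescribe.

```python
from math import comb


def binomial_two_sided_p(successes, n):
    """Exact two-sided p-value under H0: each choice is a fair coin flip.

    The null distribution is symmetric, so the two-sided value is twice
    the probability of the more extreme tail, capped at 1.
    """
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


p = binomial_two_sided_p(28, 40)
print(f"p = {p:.4f}")
```

Under these assumptions p comes out a little under 0.02, so a 28/12 split would be unlikely if there were no real preference.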

39
Inferential (Diagnostic) Stats
  • Tests to determine whether what you see in the data
    (e.g., differences in the means) is reliable
    (replicable), and whether it is likely caused by
    the independent variables rather than by random
    effects
  • e.g. t-test to compare two means
  • e.g. ANOVA (Analysis of Variance) to compare
    several means
  • e.g. test significance level of a correlation
    between two variables

40
Means Not Always Perfect
Experiment 1
  • Group 1: 1, 10, 10 (mean 7)
  • Group 2: 3, 6, 21 (mean 10)
Experiment 2
  • Group 1: 6, 7, 8 (mean 7)
  • Group 2: 8, 11, 11 (mean 10)
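
Both experiments show the same 3-point difference in means, but the spread within the groups is very different. As a rough sketch, the code below computes a t statistic for each experiment using the unequal-variance (Welch) form; that particular variant is an assumption made here for illustration. It stops at the t statistic, since turning it into a p-value needs the t distribution, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """t statistic for the difference in means, not assuming equal variances."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_b) - mean(group_a)) / standard_error


experiment_1 = ([1, 10, 10], [3, 6, 21])
experiment_2 = ([6, 7, 8], [8, 11, 11])

t1 = welch_t(*experiment_1)
t2 = welch_t(*experiment_2)
print(f"Experiment 1: t = {t1:.2f}")
print(f"Experiment 2: t = {t2:.2f}")
```

The much larger t in Experiment 2 reflects the tighter grouping of its data: the same difference in means is more convincing when the values within each group agree with each other.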
41
Inferential Stats and the Data
  • Ask diagnostic questions about the data

Are these really different? What would that mean?
42
Hypothesis Testing
  • Going back to the hypothesis: what do the data
    say?
  • Translate hypothesis into expected difference in
    measure
  • If First name is faster, then
  • TimeFirst < TimeLast
  • If null hypothesis, there should be no
    difference between the completion times
  • H0: TimeFirst = TimeLast

43
Hypothesis Testing
  • Significance level (p)
  • The probability of getting a result at least as
    extreme as the one you saw, simply by chance, if
    the null hypothesis were true
  • The cutoff or threshold level of p (alpha
    level) is often set at 0.05: if there is no real
    effect, 5% of the time you'll still get a result
    like this, just by chance
  • e.g. If your statistical t-test (testing the
    difference between two means) returns a t-value
    of t = 4.5, and a p-value of p = .01, the difference
    between the means is statistically significant

44
Errors
  • Errors in analysis do occur
  • Main Types
  • Type I/False positive - You conclude there is a
    difference, when in fact there isn't
  • Type II/False negative - You conclude there is no
    difference when there is one
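
A Type I error rate can be estimated by simulation: run many imaginary studies in which there is truly no difference and count how often the test still reports one. The sketch below is illustrative only. It assumes a preference study with 40 participants analyzed with an exact two-sided binomial test at alpha = 0.05; the study size, number of simulated studies, and seed are arbitrary choices.

```python
import random
from math import comb


def binomial_two_sided_p(successes, n):
    """Exact two-sided p-value under H0: each choice is a fair coin flip."""
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


def false_positive_rate(n_participants=40, n_studies=2000, alpha=0.05, seed=0):
    """Fraction of no-effect studies that are wrongly declared significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_studies):
        # No real preference: every participant picks either option at random.
        prefers_first = sum(rng.random() < 0.5 for _ in range(n_participants))
        if binomial_two_sided_p(prefers_first, n_participants) < alpha:
            false_positives += 1
    return false_positives / n_studies


rate = false_positive_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimate should land near, and typically somewhat below, the chosen alpha, because an exact test on discrete counts cannot hit 0.05 exactly.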

45
Drawing Conclusions
  • Make your conclusions based on the descriptive
    stats, but back them up with inferential stats
  • e.g., The expert group performed faster than
    the novice group, t(1,34) = 4.6, p < .01.
  • Translate the stats into words that regular
    people can understand
  • e.g., Thus, those who have computer experience
    will be able to perform better, right from the
    beginning

46
Feeding Back Into Design
  • Your study was designed to yield information you
    can use to redesign your interface
  • What were the conclusions you reached?
  • How can you improve on the design?
  • What are quantitative redesign benefits?
  • e.g. 2 minutes saved per transaction, 24%
    increase in production, or $45,000,000 per year
    in increased profit
  • What are qualitative, less tangible benefit(s)?
  • e.g. workers will be less bored, less tired, and
    therefore more interested → better customer
    service

47
Remote usability testing
  • Telephone or video communication
  • Screen-sharing technology
  • Microsoft NetMeeting
  • https://www.microsoft.com/downloads/details.aspx?FamilyID=26c9da7c-f778-4422-a6f4-efb8abba021e&DisplayLang=en
  • VNC
  • http://www.realvnc.com/
  • Greater flexibility in recruiting subjects,
    environments

48
Usage logging
  • Embed logging mechanisms into code
  • Study usage in actual deployment
  • Some code can even phone home
  • facebook usage metrics
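
As a minimal sketch of embedding a logging mechanism into code, the class below appends one timestamped JSON record per event to a local file, and a helper tallies events afterwards. The class name, file format, and event names are hypothetical; a deployed application would more likely send records to a server, and should log an anonymous ID rather than a name.

```python
import json
import time


class UsageLogger:
    """Append one JSON object per event to a local log file."""

    def __init__(self, path, user_id):
        self.path = path
        self.user_id = user_id  # an anonymous ID, not a real name

    def log(self, event, **details):
        """Record an event with a timestamp and any extra details."""
        record = {"t": time.time(), "user": self.user_id, "event": event, **details}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        return record


def count_events(path):
    """Tally how many times each event name appears in the log."""
    counts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)["event"]
            counts[event] = counts.get(event, 0) + 1
    return counts
```

A call such as `UsageLogger("usage.jsonl", "p01").log("invite_sent", recipients=3)` at the point where the user completes an action is enough to start collecting usage counts.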

49
Example: Rhythmic Work Activity
  • Drawn from about 50 Awarenex (IM) users
  • Bi-coastal teams (3-hour time difference)
  • Work from home team members
  • Based on up to 2 years of collected data
  • Sun Microsystems Laboratories: James "Bo" Begole,
    Randall Smith, and Nicole Yankelovich

50
Activity Data Collected
  • Activity information
  • Input device activity (1-minute granularity)
  • Device location (office, home, mobile)
  • Email fetching and sending
  • Online calendar appointments
  • Activity ? Availability
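
Minute-level input activity like that listed above can be pooled across days to show when in the day a person tends to be active. The following sketch is not the Awarenex implementation; it only illustrates the idea with a hypothetical log in which each entry is one minute where input activity was seen.

```python
from collections import Counter
from datetime import datetime


def active_minutes_by_hour(active_minute_timestamps):
    """Count active minutes in each hour of the day, pooled across all dates."""
    counts = Counter(ts.hour for ts in active_minute_timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


# Hypothetical log: each entry is one minute in which input activity was seen.
log = [
    datetime(2007, 10, 1, 9, 5),
    datetime(2007, 10, 1, 9, 6),
    datetime(2007, 10, 1, 14, 30),
    datetime(2007, 10, 2, 9, 10),
]

histogram = active_minutes_by_hour(log)
for hour, minutes in enumerate(histogram):
    if minutes:
        print(f"{hour:02d}:00  {minutes} active minute(s)")
```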

51
(Figure: Actogram of an Individual's Computer Activity. Labels: Time of Day, Date, Computer Activity.)
52
(Figure: Aggregate Activity. Label: Computer Activity.)
53
(Figure: Aggregate Activity with Appointments. Labels: Appointment, Computer Activity.)
54
(Figure: Comparing Aggregates Among 3 Individuals, panels a, b, c.)
55
Project deployment issues
  • May have to be careful about widespread
    deployment of application
  • We're only looking for a usability study with 4
    people
  • Widespread deployment would be cool
  • BUT, widespread deployment may run into
    provisioning issues
  • Provide feedback on server provisioning

56
Quantitative study of your project
  • What are your measures?
  • Task measures, performance time, errors
  • Usage measures (facebook utilities)
  • Compute summary statistics
  • Discussion section
  • Identify independent, dependent, control variables

57
Privacy issues in collecting user data
  • Collecting data involves respecting users' privacy

58
Informed consent
  • Legal condition whereby a person can be said to
    have given consent based upon an appreciation and
    understanding of the facts and implications of an
    action
  • EULA?
  • But what about actions in public places?
  • What about recording in public places?

59
Consent
  • Why important?
  • People can be sensitive about this process and
    issues
  • Errors will likely be made, participant may feel
    inadequate
  • May be mentally or physically strenuous
  • What are the potential risks (there are always
    risks)?
  • Vulnerable populations need special care and
    consideration
  • Children, disabled, pregnant, students

60
Controlling data for privacy
  • What data is being collected?
  • How will the data be used?
  • How can I delete data?
  • Who will have access to the data?
  • How can I review data before public
    presentations?
  • What if I have questions afterwards?

61
(Figure: sample consent form with callouts: contact info for questions, how data will be used, what activity is observed, what data is collected, how to delete data, who can access data, review before showing publicly.)
62
Human subjects review, participants, ethics
  • Academic, government research must go through
    human subjects review process
  • Committee for Protection of Human Subjects
  • http://cphs.berkeley.edu/
  • Reviews all research involving human (or animal)
    participants
  • Safeguarding the participants, and thereby the
    researcher and university
  • Not a science review (i.e., not to assess your
    research ideas), only safety and ethics
  • Complete Web-based forms, submit research
    summary, sample consent forms, training, etc.
  • Practices in industry vary

65
The participant's perspective
  • User testing can be intimidating
  • Pressure to perform, please observer
  • Fear of embarrassment
  • Fear of critiquing (cultural)
  • You must remain unbiased and inviting
  • More tips in the "Conducting the Test" reading by
    Rubin

66
Ethics
  • Testing can be arduous
  • Each participant should consent to be in
    experiment (informal or formal)
  • Know what experiment involves, what to expect,
    what the potential risks are
  • Must be able to stop without danger or penalty
  • All participants to be treated with respect

67
Assignment: Storyboard and Implementation
  • Create storyboard for main tasks of application
  • Test with at least one non-CS160 user
  • Reflect on what you learned
  • How will you change interface?
  • Implement initiation of facebook application and
    database

68
Next time
  • Lecture on implementing: hardware, sensors
  • Tom Zimmerman, guest lecture