Experimental Design, Statistical Analysis - PowerPoint PPT Presentation

Slides: 20
Provided by: scie355
Learn more at: http://www.cse.msu.edu

1
Experimental Design, Statistical Analysis
  • CSCI 4800/6800
  • University of Georgia
  • Spring 2007
  • Eileen Kraemer

2
Terminology
  • empirical study
  • based on observations and measurements
  • probabilistic
  • based on probabilities, inferences
  • causal
  • based on cause-effect relationships

3
Types of research questions
  • Descriptive.
  • designed primarily to describe what is going on
    or what exists.
  • For example, a study in which you observe and
    note the current practice of users in performing
    a task of interest
  • Relational.
  • designed to look at the relationships between two
    or more variables.
  • For example, a study in which you look at the
    relationship between users' preferred background
    color and font type
  • Causal
  • a study is designed to determine whether one or
    more variables (e.g., a program or treatment
    variable) causes or affects one or more outcome
    variables.
  • For example, a study of the effect of font size
    on time-to-complete or error rates in a task of
    interest

4
Time in research
  • cross-sectional versus longitudinal studies.
  • A cross-sectional study is one that takes place
    at a single point in time.
  • A longitudinal study is one that takes place over
    time -- we have at least two (and often more)
    waves of measurement in a longitudinal design.
  • repeated measures - two or a few waves of
    measurement
  • time series - many waves of measurement over time
  • Analysis considerations: time series analysis
    requires that you have at least twenty or so
    observations. Repeated measures analyses (like
    repeated measures ANOVA) aren't often used with
    as many as twenty waves of measurement.

5
Relationships between variables
  • correlational v. causal
  • positive
  • negative (inverse)
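
The direction of a relationship can be checked numerically with a correlation coefficient. The sketch below is one possible illustration, not part of the original slides: it computes Pearson's r by hand on invented data (the variable names and values are hypothetical). A positive r indicates a positive relationship and a negative r an inverse one; neither establishes a causal relationship.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
font_size = [8, 10, 12, 14, 16]
reading_speed = [100, 110, 120, 130, 140]  # rises with font size
error_count = [9, 7, 6, 4, 2]              # falls with font size

r_positive = pearson_r(font_size, reading_speed)
r_negative = pearson_r(font_size, error_count)
print(round(r_positive, 3), round(r_negative, 3))
```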

6
Variables
  • variable
  • any entity that can take on different values.
  • attribute
  • a specific value of a variable.
  • independent variable
  • the thing you change
  • dependent variable
  • the thing you expect to change as a result of
    your manipulation of the independent variable;
    the thing you measure
  • Example: I perform an experiment in which I
    apply fertilizer at concentrations of 5, 10,
    and 20 to some number of otherwise identical
    plants. I measure the growth of the plants. The
    concentration of fertilizer is the independent
    variable; the height of the plant is the
    dependent variable.

7
Well-chosen variables/attributes
  • exhaustive - should include all possible
    responses (all possible values of plant height,
    for example)
  • mutually exclusive - no response should be able
    to have two attributes simultaneously (2-4 inches
    and 4-6 inches shouldn't be two possible
    attributes: exactly 4 in. would fall into two
    categories)
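
One common way to satisfy both properties is to use half-open intervals with an open-ended top category. The sketch below is only an illustration: the function name `height_category` and the bin edges are hypothetical choices, not taken from the slides.

```python
def height_category(height_in, edges=(0, 2, 4, 6)):
    """Return the label of the half-open bin [low, high) containing height_in."""
    if height_in < edges[0]:
        raise ValueError("height below the lowest bin edge")
    for low, high in zip(edges, edges[1:]):
        # Half-open test: a boundary value belongs to exactly one bin.
        if low <= height_in < high:
            return f"{low} to under {high} in."
    # Open-ended top bin keeps the categories exhaustive.
    return f"{edges[-1]} in. or more"

print(height_category(4))    # boundary value lands in a single category
print(height_category(3.9))
print(height_category(10))
```

Because each interval includes its lower edge and excludes its upper edge, a plant of exactly 4 in. belongs to one category only.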

8
Hypotheses
  • hypothesis
  • a specific statement of prediction
  • For the hypothetico-deductive model: two
    hypotheses
  • one that describes your prediction (alternative
    hypothesis) (H1)
  • one that describes all the other possible
    outcomes (null hypothesis) (H0)
  • One-tailed v. two-tailed
  • One-tailed: the prediction specifies a direction
  • H1: increased fertilizer application will increase
    the height of the plant
  • H0: increased fertilizer application will not
    increase the height of the plant
  • Two-tailed: the prediction does not specify a
    direction
  • H1: increased fertilizer application will affect
    the height of the plant
  • H0: increased fertilizer application will not
    affect the height of the plant
  • In either case, the goal of testing and analysis
    is to accept one hypothesis and reject the other
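
The practical difference between the two kinds of hypothesis shows up in how a p-value is computed. The slides do not name a specific test, so the sketch below assumes a z statistic and a standard normal reference distribution; the function name `p_values` and the value 1.8 are hypothetical.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one_tailed, two_tailed) p-values for a z statistic.

    The one-tailed value assumes H1 predicts an increase (z > 0).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    one_tailed = 1 - std_normal.cdf(z)
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    return one_tailed, two_tailed

# Hypothetical test statistic, chosen only for illustration.
one, two = p_values(1.8)
print(f"one-tailed p = {one:.4f}, two-tailed p = {two:.4f}")
```

For a result in the predicted direction, the two-tailed p-value is twice the one-tailed value, so a directional prediction is easier to support but cannot detect an effect in the opposite direction.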

9
Sampling
  • the process of selecting units from a population
    of interest in a way that permits us to study
    those samples and then generalize our results
    back to the population
  • external validity
  • the degree to which study conclusions would hold
    for other experimenters in other similar studies
  • the approximate truth of the inferences and
    conclusions that result from the study

10
Sampling Model for generalization
  • identify the population you wish to generalize to
  • draw a fair sample of that population
  • conduct research with sample
  • generalize back to population from which sample
    is drawn

11
Sampling terminology
  • theoretical population
  • group you wish to generalize to
  • accessible population
  • subset of that population that is accessible to
    the experimenter
  • sampling frame
  • list of the accessible population you'll draw
    your sample from
  • sample
  • group selected to be in your study

12
Selection, assignment
  • selection
  • how you draw your sample from a population
  • assignment
  • how you assign members of the sample to groups in
    your study
  • Goal: avoid systematic error, bias
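
Random selection and random assignment are the usual ways to avoid systematic bias at both steps. The sketch below is a minimal illustration using Python's standard `random` module; the function name, group labels, and participant IDs are hypothetical.

```python
import random

def select_and_assign(sampling_frame, sample_size, groups, seed=None):
    """Randomly select a sample, then randomly assign its members to groups."""
    rng = random.Random(seed)
    # Selection: every unit in the frame has the same chance of being drawn.
    sample = rng.sample(sampling_frame, sample_size)
    # Assignment: shuffle, then deal units to groups in turn.
    rng.shuffle(sample)
    assignment = {group: [] for group in groups}
    for i, unit in enumerate(sample):
        assignment[groups[i % len(groups)]].append(unit)
    return assignment

# Hypothetical sampling frame of 200 participant IDs.
frame = [f"user{i:03d}" for i in range(200)]
result = select_and_assign(frame, 30, ["control", "treatment"], seed=1)
print({group: len(members) for group, members in result.items()})
```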

13
Statistical sampling terminology, continued
  • response
  • specific measurement value that a sampling unit
    provides
  • statistic
  • calculated across the responses from the sample
  • parameter
  • calculated across the population
  • a statistic provides an estimate of a parameter

14
Terminology, continued
  • sampling distribution - the distribution of a
    statistic across an infinite number of samples of
    the same size as the sample in our study
  • standard deviation
  • the spread of scores around the average in a
    single sample
  • standard error, sampling error
  • the spread of averages around the average of
    averages in a sampling distribution
  • indicates the precision of our statistical
    estimate
  • calculated based on standard deviation of sample
  • larger sample size -> smaller standard error
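
A sampling distribution cannot be observed directly, but it can be approximated by simulation. The sketch below assumes a hypothetical, normally distributed population and draws many samples of a given size; the spread of the resulting sample means approximates the standard error. All names and numbers here are illustrative assumptions.

```python
import random
from statistics import stdev

def simulated_standard_error(population, sample_size, n_samples=1000, seed=0):
    """Approximate the standard error of the mean by repeated sampling."""
    rng = random.Random(seed)
    sample_means = []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)
        sample_means.append(sum(sample) / sample_size)
    # The spread of the sample means approximates the standard error.
    return stdev(sample_means)

# Hypothetical population (mean 50, S.D. 10), generated for illustration.
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

for n in (25, 100):
    print(n, round(simulated_standard_error(population, n), 3))
```

With a population standard deviation near 10, the simulated values should land close to 10 / sqrt(25) = 2 and 10 / sqrt(100) = 1, showing the standard error shrinking as the sample size grows.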

15
Variance
  • variance: a measure of how spread out a
    distribution is; the average squared deviation
    of each value from its mean. Example: values
    1, 2, 3 (mean 2)
  • $\sigma^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} \approx 0.667$
  • for a population
  • $\sigma^2 = \sum (X - \mu)^2 / N$
  • $\mu$ = mean, $N$ = number of values
  • for a sample
  • $s^2 = \sum (X - M)^2 / (N - 1)$
  • $M$ = mean of the sample
16
Standard deviation
  • for a population
  • $\sigma = \sqrt{\sigma^2}$
  • for a sample
  • $s = \sqrt{s^2}$

17
Normal distribution: 68-95-99 rule
  • normal dist.: bell-shaped curve
  • about 68% of cases fall w/in one S.D.
  • about 95% of cases fall w/in two S.D.
  • about 99% (more precisely, 99.7%) fall w/in three
    S.D.
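
These percentages can be reproduced from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the helper name `fraction_within` is a hypothetical choice.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def fraction_within(k):
    """Fraction of a normal distribution within k S.D. of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} S.D.: {fraction_within(k):.4f}")
```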

18
Basic sampling concept
  • If we had a sampling distribution, we would be
    able to predict the 68%, 95%, and 99% confidence
    intervals for where the population parameter
    should be!
  • We don't actually have the sampling distribution
  • We do have the distribution for the sample
    itself.
  • From that distribution we can estimate the
    standard error (the sampling error) because it is
    based on the standard deviation and we have that.
  • We still don't actually know the population
    parameter value, but we can use our best estimate
    for that -- the sample statistic.
  • Now, we can use
  • the mean of our sample as the mean of the
    sampling distribution
  • standard deviation and sample size to estimate
    the standard error: $SE = s / \sqrt{N}$
  • to estimate confidence intervals for the
    population parameter.

19
An example
  • We draw a 100-member sample from a population
  • $M = 3.75$
  • $s = 0.25$
  • $SE = 0.25 / \sqrt{100} = 0.25 / 10 = 0.025$
  • $p(3.725 < \mu < 3.775) = 0.68$
  • $p(3.700 < \mu < 3.800) = 0.95$
  • $p(3.675 < \mu < 3.825) = 0.99$
  • these are known as confidence intervals
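
The arithmetic in this example can be sketched in a few lines. The function name `confidence_intervals` is hypothetical, and the multipliers 1, 2, and 3 follow the slides' rule of thumb; more exact multipliers for 95% and 99% would be about 1.96 and 2.58.

```python
from math import sqrt

def confidence_intervals(sample_mean, sample_sd, n):
    """Approximate 68/95/99% intervals as mean +/- 1, 2, 3 standard errors."""
    se = sample_sd / sqrt(n)
    return {
        level: (sample_mean - k * se, sample_mean + k * se)
        for level, k in ((68, 1), (95, 2), (99, 3))
    }

# Values from the example above: N = 100, M = 3.75, s = 0.25.
intervals = confidence_intervals(3.75, 0.25, 100)
for level, (low, high) in intervals.items():
    print(f"{level}%: {low:.3f} to {high:.3f}")
```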