Title: Lecture 27: Analysis
1Lecture 27 Analysis
09/10/99
2Goals
- Introduction to Data Analysis
- How to use a statistical package
- Review of basic statistical concepts and methods
3Steps in Data Analysis
- Enter the Data
- Choose the Analysis
- Run the Analysis
- Interpret the results
- Explore more analyses
- Learn and Communicate
4Why Use Statistics?
- Because eyeballing data can be misleading
- Because decisions require objective proof
- We need a consistent method for establishing
proof
- Statistics, combined with good experimental
design, helps researchers overcome their biases,
assumptions and expectations
- e.g., the Canals on Mars problem
- Statistics help us overcome
- Experimental Error
- Confusion of correlation with causation
- Complexity of the effects studied
5Types of Statistics
- Inferential statistics
- Concerned with making comparisons
- e.g., A is bigger than B
- e.g., there is a positive relationship (slope)
between A and B
- Inferential statistics emphasize tests of
significance
- Descriptive statistics
- Concerned with describing patterns and properties
of data
- e.g., there are three types of patient that
differ on these dimensions
- e.g., these three items in the personality test
appear to be measuring the same underlying
construct (e.g., extraversion)
6Distributions
Yield
Time order   Method   Yield
 1           A        89.7
 2           A        81.4
 3           A        84.5
 4           A        84.8
 5           A        87.3
 6           B        79.7
 7           B        85.1
 8           B        81.7
 9           B        83.7
10           B        84.5
- Data
- Time Order X Yield
- Method A vs. Method B
- Does Method affect Result?
Method A   Method B   Difference
85.54      82.94      2.60
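The group means in the table above can be reproduced with a few lines of Python. This is only a sketch: the variable and function names (observations, mean_by_method) are invented here for illustration and are not part of the lecture, which uses SPSS.

```python
from statistics import mean

# Yield data transcribed from the slide: (time order, method, yield).
observations = [
    (1, "A", 89.7), (2, "A", 81.4), (3, "A", 84.5), (4, "A", 84.8), (5, "A", 87.3),
    (6, "B", 79.7), (7, "B", 85.1), (8, "B", 81.7), (9, "B", 83.7), (10, "B", 84.5),
]


def mean_by_method(rows):
    """Group the yields by method and return the mean of each group."""
    groups = {}
    for _order, method, value in rows:
        groups.setdefault(method, []).append(value)
    return {method: mean(values) for method, values in groups.items()}


means = mean_by_method(observations)
difference = means["A"] - means["B"]
print(f"Method A: {means['A']:.2f}  Method B: {means['B']:.2f}  Difference: {difference:.2f}")
```

The difference alone does not say whether Method affects Result; that depends on how large it is relative to the variation within each method.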
7How the Distributions Look
- A looks bigger
- Lots of variation
8Ways of Visualizing Distributions
- Dot Diagrams
- Histograms
- Data Runs (previous slide)
- Stem and Leaf Plots (Tukey)
- and others
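As one illustration of these displays, a stem-and-leaf plot of the ten yields can be sketched in plain Python. The assumptions here are mine, not the lecture's: the stem is the whole-number part of each yield, the leaf is the tenths digit, and the function name stem_and_leaf is hypothetical.

```python
# The ten yields from the Distributions slide (Methods A and B combined).
yields = [89.7, 81.4, 84.5, 84.8, 87.3, 79.7, 85.1, 81.7, 83.7, 84.5]


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot (stem = units, leaf = tenths)."""
    tenths = sorted(round(v * 10) for v in values)  # work in integer tenths
    stems = {}
    for t in tenths:
        stems.setdefault(t // 10, []).append(t % 10)
    lines = []
    # Include empty stems so that gaps in the data remain visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


plot_lines = stem_and_leaf(yields)
print("\n".join(plot_lines))
```

Unlike a histogram, the plot keeps every original value readable while still showing the overall shape.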
9Ways of Describing Distributions
- Measures of Location
- Mean
- Median
- Mode
- Measures of Variability
- Standard Deviation (Variance)
- Kurtosis (peakiness)
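The measures listed above can be computed for the ten yields with Python's standard statistics module. This is a sketch only: the module has no kurtosis function, so the helper excess_kurtosis below is my own, and it assumes the population (moment-based) definition, which is one of several in use.

```python
from statistics import mean, median, mode, stdev, variance

# The ten yields from the Distributions slide (Methods A and B combined).
yields = [89.7, 81.4, 84.5, 84.8, 87.3, 79.7, 85.1, 81.7, 83.7, 84.5]


def excess_kurtosis(values):
    """Population excess kurtosis: fourth moment over squared second moment, minus 3."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


summary = {
    "mean": mean(yields),
    "median": median(yields),
    "mode": mode(yields),
    "variance": variance(yields),          # sample variance (n - 1 denominator)
    "standard deviation": stdev(yields),   # square root of the sample variance
    "excess kurtosis": excess_kurtosis(yields),
}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With only ten observations the mode and kurtosis are unstable; they are included to match the list, not because they are reliable at this sample size.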
10Randomization Test
- Imagine that we had the 10 observations shown
earlier and randomly shuffled them into two
groups of five. What are the chances that the
difference between the means of the two groups
would be greater than or equal to the observed
difference?
- If we calculate this probability across all
possible shuffles of the data, this is called a
randomization test.
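The procedure described above can be sketched as follows for the yield data. The assumptions are mine: the test is one-sided (difference greater than or equal to the observed one, as worded on the slide), all 252 ways of choosing five of the ten observations are enumerated, and values are converted to integer tenths so that ties with the observed difference are counted exactly. The function name randomization_test is hypothetical.

```python
from itertools import combinations

yields_a = [89.7, 81.4, 84.5, 84.8, 87.3]  # Method A
yields_b = [79.7, 85.1, 81.7, 83.7, 84.5]  # Method B


def randomization_test(group_a, group_b):
    """Exact one-sided randomization test for the difference in means.

    Returns (number of splits at least as extreme, total splits, probability).
    """
    # Integer tenths avoid floating-point trouble when comparing with ">=".
    a = [round(v * 10) for v in group_a]
    b = [round(v * 10) for v in group_b]
    pooled = a + b
    observed_sum = sum(a)
    extreme = 0
    total = 0
    # The pooled total is fixed, so the difference in means grows with the
    # sum of the first group; comparing sums is equivalent to comparing
    # differences in means.
    for indices in combinations(range(len(pooled)), len(a)):
        total += 1
        if sum(pooled[i] for i in indices) >= observed_sum:
            extreme += 1
    return extreme, total, extreme / total


extreme, total, p_value = randomization_test(yields_a, yields_b)
print(f"{extreme} of {total} splits are at least as extreme (p = {p_value:.3f})")
```

Full enumeration is feasible here because there are only 252 splits; for larger samples a random subset of shuffles is the usual substitute.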
11Significance Tests
- Randomization test is a significance test
- compute a statistic to test a hypothesis (e.g.,
means are significantly different)
- create reference distribution (assuming the null
hypothesis of no difference is true)
- calculate probability of observed discrepancy
occurring by chance
- If probability is low enough, reject the null
hypothesis and assert statistically significant
difference
12Normal and t-distributions
- Normal distribution is bell-curve shaped
- t-distribution serves as the reference
distribution for the difference between the means
of two samples from Normal distributions
(approximates Normal as number of observations
increases)
- Reasons why normal distribution is important
- Central Limit Theorem
- Robustness of statistical procedures to
departures from normality
13Characterizing Normal Distributions
- mean
- variance
- probabilities under the distribution
- z-values
- z-values measure position in standard deviation
units.
- t-values are sample approximations of z values
- probability that data point will differ by more
than two standard deviations from the centre of a
normal distribution is roughly 5%
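The two-standard-deviation figure can be checked numerically. The sketch below uses the complementary error function from Python's math module; the helper names z_score and two_sided_tail are mine, chosen for illustration.

```python
import math


def z_score(x, mean, sd):
    """Position of x relative to the mean, in standard deviation units."""
    return (x - mean) / sd


def two_sided_tail(z):
    """Probability that a standard Normal value lies further than |z| from 0."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical example: a value of 130 on a scale with mean 100 and SD 15.
z = z_score(130, mean=100, sd=15)
print(f"z = {z:.2f}, two-sided tail probability = {two_sided_tail(z):.4f}")
```

The exact tail probability at two standard deviations is a little under 5%; the conventional 5% cut-off corresponds to about 1.96 standard deviations.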
14t-tests
- See standard texts for calculation of t-statistic
- divide difference in means by measure of
variability
- refer result to tables of t values
- select entry based on degrees of freedom and
check out assigned probability of difference
- Paired vs. Independent samples t-test
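For the independent-samples case, the calculation outlined above can be sketched on the Method A and Method B yields. This assumes the classic pooled-variance form of the t-statistic (equal variances in both groups); the function name independent_t is hypothetical, and the paired test is not shown.

```python
import math
from statistics import mean, variance

yields_a = [89.7, 81.4, 84.5, 84.8, 87.3]  # Method A
yields_b = [79.7, 85.1, 81.7, 83.7, 84.5]  # Method B


def independent_t(sample_1, sample_2):
    """Pooled-variance t-statistic and its degrees of freedom."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    # Measure of variability: pooled sample variance of the two groups.
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Difference in means divided by the measure of variability.
    t = (mean(sample_1) - mean(sample_2)) / standard_error
    return t, df


t_value, degrees_of_freedom = independent_t(yields_a, yields_b)
print(f"t = {t_value:.3f} on {degrees_of_freedom} degrees of freedom")
```

On these data the statistic comes to roughly 1.5 on 8 degrees of freedom, which is below the tabled two-sided 5% value of 2.306, so the difference would not be declared significant at that level.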
15Entering Data in SPSS
16Putting in Numbers
17Setting the Variable Type
18Number with two decimal places
19Data Matrix
20Choosing an Analysis Method
21Analysis Options
22Sample Output
23Choosing a Variable
24Setting Options
25Chart Output
26Inferential Statistics and Analysis of Variance
- Why use statistics?
- Randomization and Randomization tests
- Tests of Significance
- Confidence Intervals
- Accounting for Error Variation
- Analysis of Variance
27Sample Data for Paired (dependent) t-test
Mnths_6 Mnths_24 124 114 94 88 115 102 110 2 116 2
139 2 116 2 110 2 129 2 120 2 105 2 88 2 120 2 12
0 2 116 2 105 2 ... ... ... ... 123 132
28T-test Results
29Anova Example Data Matrix
30Deviations from Grand Mean
31Total Sums of Squares
32One-way ANOVA
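The ANOVA example data for these slides did not survive in the text, so the sketch below reuses the Method A and Method B yields from the Distributions slide instead; that substitution, and the function name one_way_anova, are my assumptions. It shows how deviations from the grand mean give the total sum of squares, and how that total splits into between-group and within-group parts.

```python
from statistics import mean

groups = {
    "A": [89.7, 81.4, 84.5, 84.8, 87.3],
    "B": [79.7, 85.1, 81.7, 83.7, 84.5],
}


def one_way_anova(groups):
    """Partition the total sum of squares and compute the F ratio."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    # Total: squared deviations of every observation from the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Between: squared deviations of group means from the grand mean.
    ss_between = sum(len(vs) * (mean(vs) - grand_mean) ** 2 for vs in groups.values())
    # Within: squared deviations of observations from their own group mean.
    ss_within = sum((v - mean(vs)) ** 2 for vs in groups.values() for v in vs)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_ratio,
    }


anova = one_way_anova(groups)
for name, value in anova.items():
    print(f"{name}: {value:.3f}")
```

With only two groups, the F ratio equals the square of the pooled-variance t-statistic, so the one-way ANOVA and the independent-samples t-test lead to the same conclusion.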
33Two-way Data Matrix
34Table of Means
35Two-way Anova Summary
36Application to Design
- Targeted experiments (with analyses) can help
guide design
- Analysis of results can be used to support
proposed design during reviews
- Efficient experiments don't have to be expensive
37Cautions
- Formal experiments are often overkill
- Many quick iterations are probably more effective
than a few thorough iterations
- Design sufficient power into the experiment
38Summary
- Design often requires observation
- Methods of observation need to be carefully
controlled (including sampling)
- Formal experiments provide control
- Careful analysis helps interpretation/insight
- Descriptive Statistics
- Inferential Statistics
- T-tests and ANOVAs are typically used to analyze
experiments