Title: Basic Descriptive and Inferential Statistics
1 Basic Descriptive and Inferential Statistics
- Analytical Techniques for Public Service
- The Evergreen State College
- Winter 2010
2 Where Are We? You Have
- Identified your problem/research question
- Described why the issue is worth studying
- Conducted a literature review to see what others have done and to shed more light on your question
- Identified and operationalized your measures
- Identified a research design that is capable of answering, to some degree, your research question
- You will soon be in the field collecting your data
- Now What??
3 Preparing Your Data for Analysis
- Prepare code categories (e.g., 1 = female, 2 = male)
- Prepare a codebook (this tells you the location of data and the meaning of the codes in a data file).
- Create a data file based upon the codebook (e.g., Excel, SPSS, SAS, JMP8, ASCII).
- Once the data are entered, cleaned, and made analysis-ready, we are ready for analysis.
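A minimal sketch of what a codebook can look like in code, assuming a Python dictionary as the storage format. The variable names, column positions, and the alcohol codes are hypothetical; only the 1 = female, 2 = male coding comes from the slide.

```python
# Hypothetical codebook: variable name -> location in the data file and code meanings.
CODEBOOK = {
    "gender": {"column": 1, "codes": {1: "female", 2: "male"}},
    "drank_alcohol_30d": {"column": 2, "codes": {0: "no", 1: "yes"}},  # assumed coding
}


def decode(variable: str, code: int) -> str:
    """Translate a numeric code into its label; flag codes missing from the codebook."""
    return CODEBOOK[variable]["codes"].get(code, "UNKNOWN CODE")


# One coded record as it might appear in the data file (illustrative values).
record = {"gender": 1, "drank_alcohol_30d": 0}
labels = {name: decode(name, code) for name, code in record.items()}
print(labels)
```

Flagging unknown codes instead of raising an error is one simple way to spot data-entry problems during cleaning.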
4 Valuable Books for Your Arsenal
- Stevens, J. (2002). Applied Multivariate Statistics for the Social Sciences (4th ed.). Lawrence Erlbaum Associates.
- Blalock, H. (1979). Social Statistics (Revised 2nd ed.). McGraw-Hill.
- Kraemer, H., and Thiemann, S. (1987). How Many Subjects? Statistical Power Analysis in Research. Sage Publications.
- Babbie, E. (2009). The Practice of Social Research (12th ed.). Wadsworth Publishing.
5 What Will Be Considered
- Descriptive vs Inferential Statistics
- Basic Terminology
- Levels of measurement
- Strength of Association
- Hypothesis testing
- Type I and Type II Errors and Statistical Power
6 Subject Matter of Statistics
- Descriptive Statistics: Tools and issues involved in describing collections of statistical observations, whether they are samples or total populations.
- Inferential Statistics (inductive statistics): Deals with the logic and procedures for evaluating risks of inference from descriptions of samples to descriptions of populations (finite or infinite).
Loether, H., and McTavish, D. (1976). Descriptive and Inferential Statistics.
7 Basic Terms
- Variable (an attribute of a person or object that can take on different values).
- Distribution of a variable.
- Continuous or discrete variables
- Central tendency
- Range
- Dispersion/confidence intervals
- Levels of Analysis
8 Measures of Central Tendency
- Central tendency
- The 3 Ms: Mode, Median, Mean.
- Mode: most frequent response.
- Median: midpoint of the distribution.
- Mean: arithmetic average.
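A short illustration of the three measures, using Python's standard `statistics` module. The five scores are made up for the example.

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 5, 7]  # made-up responses, not data from the course

most_frequent = mode(scores)   # value that occurs most often
midpoint = median(scores)      # middle value once the scores are sorted
average = mean(scores)         # sum of the scores divided by their count

print(most_frequent, midpoint, average)
```

Here the mode and median are both 3 while the mean is 4: a single high score pulls the mean upward but leaves the other two measures unchanged.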
9 Standard Deviation
- Normal Distribution: Bell-shaped curve
- 68.26% of the observations are within 1 standard deviation of the mean
- 95% of the observations are within 1.96 standard deviations of the mean
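These percentages can be checked against the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is just an illustrative choice.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


within_one = share_within(1.0)
within_196 = share_within(1.96)
print(f"{within_one:.4f} {within_196:.4f}")
```

The two results come out at roughly 0.68 and 0.95, in line with the figures on the slide.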
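10 Applying the Standard Deviation
- Average test score = 60.
- The standard deviation is 10.
- Therefore, about 95% of the scores are between 40 and 80 (assuming the scores are normally distributed).
- Calculation (1.96 standard deviations rounded to 2, i.e., 20 points):
- 60 + 20 = 80; 60 - 20 = 40.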
11 Exercise: Confidence Intervals
| Variable | N | Range | Min | Max | Mean | SE of Mean | Std. Deviation | Variance |
|---|---|---|---|---|---|---|---|---|
| Age | 825 | 11.93 | 9.49 | 21.43 | 14.5548 | .06925 | 1.98898 | 3.956 |
Calculate and interpret a 95% confidence interval for these data.
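One way to check an answer to this exercise is sketched below. It assumes the usual large-sample formula, mean plus or minus 1.96 standard errors, which is reasonable here because N = 825. The variable names are illustrative.

```python
from math import sqrt

n = 825
mean_age = 14.5548
std_dev = 1.98898

# Standard error of the mean; this should agree with the reported SE (.06925).
std_error = std_dev / sqrt(n)

z = 1.96  # critical value for a 95% interval under the normal approximation
lower = mean_age - z * std_error
upper = mean_age + z * std_error
print(f"SE = {std_error:.5f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Interpretation: if we drew repeated samples of this size, about 95% of the intervals built this way would contain the population mean age, so a mean of roughly 14.4 to 14.7 years is a plausible range.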
12 Variable Types
- Continuous variable: Attributes form a steady progression (income, age). No gaps.
- Discrete variable: Attributes are separate, with gaps between them (gender, religious affiliation, race).
13 Analysis
- Univariate Analysis: Analysis of a single variable
- Bivariate Analysis: Analysis of two variables simultaneously
- Multivariate Analysis: Analysis of the simultaneous relationships among several variables.
14 Level of Measurement
- Nominal Data: Categorical (e.g., gender, race)
- Ordinal Data: Nominal + more/less than (e.g., social class, religiosity)
- Interval Data: Nominal + Ordinal + how much more/less than. Categories have a standard unit of measure (e.g., Fahrenheit).
- Ratio Data: Nominal + Ordinal + Interval + a true zero (e.g., age, height).
15 Levels of Measurement and Statistical Tests
Rows give the level of measurement of the 1st variable. The first column lists single-variable statistics; the remaining columns give the level of the 2nd variable.
| 1st Variable | Single Variable | Dichotomy | Nominal | Ordinal | Interval/Ratio |
|---|---|---|---|---|---|
| Dichotomy | Proportions, percentages, ratios | Difference of proportions, Chi Square, Tau-b | | | |
| Nominal (r categories) | Proportions, percentages, ratios | Chi Square, Contingency C, Phi/Cramer's V, Tau-b, Yule's Q | Chi Square, Contingency C, Cramer's V, Tau-b, Yule's Q | | |
| Ordinal | Medians, quartiles, deciles, quartile deviations | Mann-Whitney, runs, signed ranks | ANOVA with ranks | Gamma, rank-order correlation, Kendall's Tau | |
| Interval/Ratio | Means, medians, SD | Difference of means | ANOVA, Eta², intraclass correlations | | Correlation and regression |
Blalock, Social Statistics
16 Teen Pregnancy Risk Factors
| Risk factor | Grade | YDP Program Participants, % (n) | Healthy Youth Survey, % (n) |
|---|---|---|---|
| Participants getting mostly Ds and Fs in school | 8 | 21.9 (137) | 9.6 (7,923) |
| Participants getting mostly Ds and Fs in school | 10 | 23.1 (78) | 8.8 (7,673) |
| Participants getting mostly Ds and Fs in school | 12 | 25 (56) | 5.3 (5,684) |
| Participants who reported their mother did not finish high school | 8 | 27.7 (94) | 7.3 (7,938) |
| Participants who reported their mother did not finish high school | 10 | 36.6 (71) | 8.6 (7,688) |
| Participants who reported their mother did not finish high school | 12 | 23.1 (52) | 9 (5,695) |
| Participants who used alcohol in the 30 days before pre-test | 8 | 31.7 (145) | 18 (8,223) |
| Participants who used alcohol in the 30 days before pre-test | 10 | 45.2 (84) | 32.6 (7,860) |
| Participants who used alcohol in the 30 days before pre-test | 12 | 53.4 (58) | 42.6 (5,795) |
17 Measures of Association
- A class of statistical tests that are used to show the magnitude or strength of a relationship between variables.
- Significance tests are used to establish whether or not a relationship exists, and measures of association show the size of the relationship (weak, moderate, strong).
- Some also show the direction of the relationship (positive or negative, for ordinal and interval-ratio variables).
18 Measures of Association for Cross Tabulations (examples)
- Lambda: The strength of a relationship between two nominal variables.
- Phi: The strength of a relationship between two dichotomous variables.
- Gamma: The strength of a relationship between two ordinal variables.
- Values range between 0 and +1 or -1.
- Negative and positive values show the direction of the relationship, where applicable.
- The closer the absolute value is to one, the stronger the relationship.
19 Proportional Reduction of Error (PRE)
- PRE = Proportional Reduction of Error: the concept underlying these tests.
- The errors of prediction made when the independent variable is ignored (E1) are compared with the errors of prediction made when the prediction is based on the independent variable (E2).
- If you know information about one variable, to what extent will that information help you predict information about another variable?
20 General PRE Formula
- PRE = (# of errors not knowing ind. var. - # of errors knowing ind. var.) / (# of errors not knowing ind. var.)
- Equivalently, PRE = (E1 - E2) / E1
21 Are homeless people reporting mental health problems more likely to request case management than those who don't?
Exercise: Calculate Lambda and Interpret
| | No Mental Health Problems | Mental Health Problems | Total |
|---|---|---|---|
| Does Not Want Case Mgt | 355 (69%) | 293 (45%) | 648 |
| Wants Case Mgt | 157 (31%) | 359 (55%) | 516 |
| Total | 512 (100%) | 652 (100%) | 1164 |
22 Reading Tables
- Independent variable: Mental health problems
- Dependent variable: Wants case management
- Are those who have mental health problems more likely to say they will want case management?
- For each category of the independent variable, what is the percent distribution across the dependent variable?
- Percent distribution down columns
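Percentaging down the columns can be sketched as follows, using the counts from the case management table. The dictionary layout and the function name are illustrative choices, not part of the original slides.

```python
# Counts from the case management table, one entry per category of the
# independent variable (mental health problems).
counts = {
    "No mental health problems": {"Does not want case mgt": 355, "Wants case mgt": 157},
    "Mental health problems": {"Does not want case mgt": 293, "Wants case mgt": 359},
}


def column_percents(column: dict) -> dict:
    """Percent distribution of the dependent variable within one column."""
    total = sum(column.values())
    return {answer: round(100 * n / total) for answer, n in column.items()}


percents = {group: column_percents(column) for group, column in counts.items()}
for group, distribution in percents.items():
    print(group, distribution)
```

Comparing across the two columns, 55% of those reporting mental health problems want case management versus 31% of those who do not report such problems.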
23 Lambda
- An asymmetrical measure of association: the value varies depending on which variable is independent.
- Ranges from 0 to 1
- Formula:
- Lambda = (E1 - E2) / E1
24 Instructions to Calculate Lambda
- 1. Calculate E1: Find the mode of the dependent variable (the attribute that occurs the most often) and subtract its frequency from N (sample size). E1 = N - f of the mode
- 2. Calculate E2: Find the mode in each category of the independent variable (a row or a column, depending on how the table is laid out). Subtract each modal frequency from its category total and add the results together. E2 = (category total - category mode) + (category total - category mode) + ... for all attributes of the independent variable.
25 Are homeless people reporting mental health problems more likely to request case management than those who don't?
| | No Mental Health | Mental Health | Total |
|---|---|---|---|
| Does Not Want Case Mgt | 355 | 293 | 648 |
| Wants Case Mgt | 157 | 359 | 516 |
| Total | 512 | 652 | 1164 |
- E1 = 1164 - 648 = 516
- E2 = (512 - 355) + (652 - 359) = 157 + 293 = 450
- Lambda = (516 - 450) / 516 = .128
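The same calculation can be written as a small function. This is a sketch that assumes the table is passed as one list of counts per category of the independent variable; the function name is illustrative, and it does not guard against the edge case E1 = 0.

```python
def lambda_asymmetric(columns):
    """Lambda for predicting the dependent variable.

    columns: one list of dependent-variable counts per category of the
    independent variable, with categories of the dependent variable in
    the same order in every list.
    """
    dependent_totals = [sum(cells) for cells in zip(*columns)]
    n = sum(dependent_totals)
    e1 = n - max(dependent_totals)                     # errors ignoring the independent variable
    e2 = sum(sum(col) - max(col) for col in columns)   # errors using the independent variable
    return (e1 - e2) / e1


# Counts ordered as [does not want case mgt, wants case mgt].
table = [
    [355, 157],  # no mental health problems
    [293, 359],  # mental health problems
]
lam = lambda_asymmetric(table)
print(round(lam, 3))
```

A value of about .128 means that knowing whether someone reports mental health problems reduces the errors in predicting their case management preference by roughly 13%.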
26 Gender
| | N | % |
|---|---|---|
| Female | 642 | 73.2 |
| Male | 235 | 26.8 |
| Total | 882 | 100.0 |
27 How likely is it that you will have sexual intercourse in the next year?
| | N | % |
|---|---|---|
| 1 = I definitely will | 123 | 15.3 |
| 2 = I probably will | 149 | 18.5 |
| 3 = I don't know | 183 | 22.7 |
| 4 = I probably will not | 87 | 10.8 |
| 5 = I definitely will not | 263 | 32.7 |
| Total | 805 | 100.0 |
28 How likely is it that you will have sexual intercourse in the next year? By Gender
| | 1 = female (%) | 2 = male (%) | Total (%) |
|---|---|---|---|
| 1 = I definitely will | 11.5 | 25.7 | 15.2 |
| 2 = I probably will | 17.7 | 21.0 | 18.6 |
| 3 = I don't know | 20.6 | 28.6 | 22.7 |
| 4 = I probably will not | 12.2 | 7.1 | 10.8 |
| 5 = I definitely will not | 38.0 | 17.6 | 32.7 |
| Total (N) | 592 | 210 | 802 |
| Total (%) | 100.0 | 100.0 | 100.0 |
χ² = 49, p < .001, Lambda = .04
29 How likely is it that you will have sexual intercourse in the next year? By Drink Alcohol
Columns: Drank alcohol in the last 30 days (No, Yes) and Total.
| | No: N | No: % | Yes: N | Yes: % | Total: N | Total: % |
|---|---|---|---|---|---|---|
| I Will | 127 | 22.4 | 145 | 57.8 | 272 | 30.8 |
| I Don't Know | 128 | 22.6 | 52 | 20.7 | 180 | 20.7 |
| I Won't | 297 | 52.5 | 50 | 19.9 | 347 | 39.7 |
| Total | 552 | 100.0 | 247 | 100.0 | 799 | 100.0 |
χ² = 108, p < .001, Lambda = .21
30 Testing Hypotheses
31 Steps in Conducting Hypothesis Testing
- State the null hypothesis and the alternative.
- Determine if the test will be one- or two-tailed.
- Determine the level of measurement of your variables.
- Set the alpha level (consider the power of the test).
- Identify the statistical test and its assumptions for each relationship.
32 Common Distributions
- Population Distribution
- Sample Frame
- Sample Distribution
- Sampling Distribution
33 Common Sampling Distributions
- Chi Square
- Student's t
- F Distribution
- Normal Distribution
34 Chi Square
- Chi square is computed based on a comparison of the actual frequencies observed in a sample to those that would be expected to occur by chance alone. If there is a large difference between the observed and the expected frequencies, a large value for Chi square will be obtained.
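As a sketch of that comparison, the function below computes the Pearson chi-square statistic from observed counts, with expected counts derived from the row and column totals. The counts are taken from the intercourse-intention by alcohol-use cross tabulation; the function name is illustrative.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected by chance alone, given the marginal totals.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Rows: I will, I don't know, I won't. Columns: did not drink, drank (last 30 days).
observed = [
    [127, 145],
    [128, 52],
    [297, 50],
]
stat, df = chi_square(observed)
print(round(stat, 1), df)
```

The statistic comes out near 108 with 2 degrees of freedom, which agrees with the value reported for that table.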
35 T-test
- Definition: The t-test is used to determine whether the difference between the means of two groups or conditions is due to the independent variable, or if the difference is simply due to chance.
- (independent-samples and paired-samples tests)
36 One-way ANOVA
- Definition: As with the t-test, ANOVA also tests for significant differences between groups. But while the t-test is limited to the comparison of only two groups, one-way ANOVA can be used to test differences in three or more groups.
37 Sexual Behavior Intent Scale Score (5 = High Risk)
| Drank alcohol in the last 30 days | Mean | Std. Deviation | N |
|---|---|---|---|
| no | 2.3862 | .95639 | 563 |
| yes | 3.2723 | .91401 | 251 |
| Total | 2.6595 | 1.02802 | 814 |
p < .001, Eta² = .16
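With only two groups, the same comparison can be run as an independent-samples t-test from the summary statistics. The sketch below assumes equal variances (the pooled-variance form); the function name is illustrative.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with a pooled variance estimate."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the mean difference
    return (mean1 - mean2) / se_diff, df


# Group 1: drank alcohol in the last 30 days; group 2: did not.
t_stat, df = pooled_t(3.2723, 0.91401, 251, 2.3862, 0.95639, 563)
print(round(t_stat, 2), df)
```

The t statistic is about 12.4 on 812 degrees of freedom. Squaring it gives roughly 153, matching the F of 153.101 in the ANOVA table, as expected when a one-way ANOVA compares exactly two groups.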
38 Sexual Behavior Intent Scale Score
ANOVA
| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 136.301 | 1 | 136.301 | 153.101 | .000 |
| Within Groups | 722.901 | 812 | .890 | | |
| Total | 859.203 | 813 | | | |
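The F ratio and Eta² can be recovered from the sums of squares in this table. A brief sketch, with illustrative variable names:

```python
ss_between, df_between = 136.301, 1
ss_within, df_within = 722.901, 812
ss_total = 859.203

ms_between = ss_between / df_between  # mean square between groups
ms_within = ss_within / df_within     # mean square within groups

f_ratio = ms_between / ms_within      # between-group variance relative to within-group variance
eta_squared = ss_between / ss_total   # share of total variation explained by group membership

print(round(f_ratio, 1), round(eta_squared, 2))
```

Eta² of about .16 says that roughly 16% of the variation in the intent scale score is associated with whether the respondent drank alcohol in the last 30 days.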
39 Other Interesting Terms
- Assumption (e.g., normal distribution)
- Assumption robustness (the leeway one has in violating an assumption)
40 Type I and Type II Errors
- Type I: Rejecting a null hypothesis when it is true. Saying groups differ when they do not.
- Type II: Accepting (failing to reject) a null hypothesis when it is false. Saying groups do not differ when they do.
- Power: The probability of rejecting the null hypothesis when it is false, i.e., the probability of making a correct decision when a real difference exists.
41 Setting Type I and Type II Errors
- H0: Drug is unsafe.
- H1: Drug is safe.
- H0: Defendant is innocent.
- H1: Defendant is guilty.
42 Alpha, Beta, and Power (N = 15)
| α | β | 1 - β (power) |
|---|---|---|
| .10 | .37 | .63 |
| .05 | .52 | .48 |
| .01 | .78 | .22 |
Stevens, Applied Multivariate Statistics for the Social Sciences
43 Power and N Size
| n (subjects per group) | power |
|---|---|
| 10 | .18 |
| 20 | .33 |
| 50 | .70 |
| 100 | .94 |
Stevens, Applied Multivariate Statistics for the Social Sciences
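To see why power rises with n, here is a rough sketch based on a normal approximation for a two-group comparison of means. The slide does not state the effect size or alpha behind the table, so the standardized effect size of 0.5 and the two-tailed alpha of .05 used below are assumptions, and the function name is illustrative. The approximation will not exactly reproduce tabled values based on the t distribution.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group: int, effect_size: float = 0.5, alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed, two-group test of means (normal approximation).

    effect_size (standardized mean difference) and alpha are ASSUMED values,
    not taken from the slide.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = effect_size * sqrt(n_per_group / 2)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


powers = {n: approx_power(n) for n in (10, 20, 50, 100)}
for n, power in powers.items():
    print(n, round(power, 2))
```

Under these assumptions the approximate power climbs steadily with group size, showing the same pattern as the table: small groups leave a real difference likely to go undetected.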
45 Exercise: True or False. To achieve the same power:
- More subjects are needed for a 1% level test than for a 5% level test.
- Two-tailed tests require larger sample sizes than one-tailed tests.
- The smaller the critical effect size, the larger the necessary sample size.
- The larger the power required, the larger the necessary sample size.
- The smaller the sample size, the smaller the power (the greater the chance of failure).