1
Basic Descriptive and Inferential Statistics
  • Analytical Techniques for Public Service
  • The Evergreen State College
  • Winter 2010

2
Where are we? You Have
  • Identified your problem/research question
  • Described why the issue is worth studying
  • Conducted a literature review to see what others
    have done and to shed more light on your
    question
  • Identified and operationalized your measures
  • Identified a research design that is capable of
    answering to some degree your research question
  • You will soon be in the field collecting your
    data
  • Now What??

3
Preparing Your Data for Analysis
  • Prepare code categories (e.g., 1 = female, 2 =
    male)
  • Prepare a codebook (this tells you the location
    of data and the meaning of the codes in a data
    file).
  • Create a data file based upon the codebook (e.g.
    Excel, SPSS, SAS, JMP8, ASCII).
  • Once the data are entered and cleaned, they are
    ready for analysis.
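
A minimal sketch of how a codebook can drive data cleaning. The variable names, column positions, and the second variable's codes are hypothetical, chosen only for illustration; only the 1 = female, 2 = male coding comes from the slide.

```python
# Hypothetical codebook: variable name -> column position and value labels.
CODEBOOK = {
    "gender": {"column": 0, "labels": {1: "female", 2: "male"}},
    "drank_alcohol_30d": {"column": 1, "labels": {0: "no", 1: "yes"}},
}


def decode(row):
    """Translate one row of numeric codes into labelled values."""
    decoded = {}
    for name, spec in CODEBOOK.items():
        code = row[spec["column"]]
        # Codes missing from the codebook are flagged, not silently dropped.
        decoded[name] = spec["labels"].get(code, f"INVALID({code})")
    return decoded


# Made-up raw data; the 9 in the last row stands in for a data-entry error.
raw_rows = [(1, 0), (2, 1), (9, 1)]
decoded_rows = [decode(row) for row in raw_rows]
for record in decoded_rows:
    print(record)
```

Flagging invalid codes during decoding is one simple way to find records that need cleaning before analysis.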

4
Valuable Books for Your Arsenal
  • Stevens, J. (2002) Applied Multivariate
    Statistics for the Social Sciences. Lawrence
    Erlbaum Associates 4th Ed.
  • Blalock, H. (1979) Social Statistics McGraw Hill
    Revised 2nd Ed.
  • Kraemer, H and Thiemann, S. (1987) How Many
    Subjects: Statistical Power Analysis in Research.
    Sage Publications.
  • Babbie, E. (2009). The Practice of Social
    Research. Wadsworth Publishing 12th Edition.

5
What will be Considered
  • Descriptive vs Inferential Statistics
  • Basic Terminology
  • Levels of measurement
  • Strength of Association
  • Hypothesis testing
  • Type I and Type II Errors and Statistical Power

6
Subject Matter of Statistics
  • Descriptive Statistics - Tools and issues
    involved in describing collections of statistical
    observations, whether they are samples or total
    populations.
  • Inferential Statistics (inductive statistics) -
    Deals with the logic and procedures for
    evaluating risks of inference from descriptions
    of samples to descriptions of populations (finite
    or infinite).

Loether, H and McTavish, D. (1976) Descriptive
and Inferential Statistics
7
Basic Terms
  • Variable (an attribute of a person or object that
    can take on different values).
  • Distribution of a variable.
  • Continuous or discrete variables
  • Central tendency
  • Range
  • Dispersion/confidence intervals
  • Levels of Analysis

8
Measures of Central Tendency
  • Central tendency
  • The 3 Ms: Mean, Median, Mode.
  • Mode: most frequent response.
  • Median: mid-point of the distribution.
  • Mean: arithmetic average.
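
A short illustration of the three measures using Python's standard library. The scores are made up for the example.

```python
from statistics import mean, median, mode

# Made-up test scores, used only to illustrate the three measures.
scores = [50, 60, 60, 70, 90]

print("Mode:", mode(scores))      # most frequent value
print("Median:", median(scores))  # middle value of the sorted scores
print("Mean:", mean(scores))      # sum of the scores divided by their count
```

The high score of 90 pulls the mean above the median, which is why skewed data are often summarized with the median.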

9
Standard Deviation
  • Normal Distribution: Bell-shaped curve
  • About 68.27% of the observations fall within 1
    standard deviation of the mean
  • 95% of the observations fall within 1.96 standard
    deviations of the mean
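
These coverage figures can be checked against the standard normal distribution. A minimal sketch using `statistics.NormalDist` (Python 3.8 or later); the helper name `coverage` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


within_1_sd = coverage(1.0)
within_196_sd = coverage(1.96)
print(f"Within 1 SD:    {within_1_sd:.4f}")
print(f"Within 1.96 SD: {within_196_sd:.4f}")
```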

10
Applying the Standard Deviation
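  • Average test score: 60.
  • The standard deviation is 10.
  • Therefore, about 95% of the scores are between 40
    and 80 (rounding 1.96 standard deviations up to 2
    and assuming the scores are normally distributed).
  • Calculation:
  • 60 + 20 = 80; 60 - 20 = 40.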
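
The same arithmetic as a sketch, showing both the rounded multiplier (2) used on the slide and the more exact 1.96. The variable names are illustrative.

```python
mean_score = 60
sd = 10

# Slide's shortcut: two standard deviations either side of the mean.
low_2sd, high_2sd = mean_score - 2 * sd, mean_score + 2 * sd

# More exact 95% range under a normal distribution.
low_196, high_196 = mean_score - 1.96 * sd, mean_score + 1.96 * sd

print(f"Using 2 SD:    {low_2sd} to {high_2sd}")
print(f"Using 1.96 SD: {low_196:.1f} to {high_196:.1f}")
```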

11
Exercise: Confidence Intervals
       N     Range   Min    Max     Mean      SE of Mean   Std. Deviation   Variance
Age    825   11.93   9.49   21.43   14.5548   .06925       1.98898          3.956
Calculate and interpret a 95% confidence interval
for these data.
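
One way to work the exercise, sketched under the assumption that the normal multiplier 1.96 is appropriate (reasonable with N = 825). The figures are taken from the table; the variable names are illustrative.

```python
from math import sqrt

n = 825
mean_age = 14.5548
sd_age = 1.98898
se_mean = 0.06925  # reported standard error of the mean

# Sanity check: the standard error should equal SD / sqrt(N).
se_check = sd_age / sqrt(n)

z = 1.96  # multiplier for a 95% interval under the normal distribution
margin = z * se_mean
lower, upper = mean_age - margin, mean_age + margin

print(f"SE recomputed from SD and N: {se_check:.5f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Interpretation: if the sampling were repeated many times, about 95% of intervals built this way would contain the population mean age.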
12
Variable types
  • Continuous variable: Attributes form a steady
    progression (income, age). No gaps.
  • Discrete variable: Attributes are separate
    categories with gaps between them (gender,
    religious affiliation, race).

13
Analysis
  • Univariate Analysis: Analysis of a single variable
  • Bivariate Analysis: Analysis of two variables
    simultaneously
  • Multivariate Analysis: Analysis of the simultaneous
    relationship of several variables.

14
Level of Measurement
  • Nominal Data: Categorical (e.g., gender, race)
  • Ordinal Data: Nominal + more/less than (e.g.,
    social class, religiosity)
  • Interval Data: Nominal + Ordinal + how much
    more/less than. Categories have a standard unit
    of measure (e.g., Fahrenheit).
  • Ratio Data: Nominal + Ordinal + Interval + a true
    zero (e.g., age, height).

15
Levels of Measurement and Statistics Tests
Rows: 1st variable. Columns: 2nd variable.
1st Variable | Single Variable | Dichotomy | Nominal | Ordinal | Interval/Ratio
Dichotomy | Proportions, percentages, ratios | Difference of proportions, Chi square, Tau-b
Nominal (r categories) | Proportions, percentages, ratios | Chi square, Contingency C, Phi/Cramér's V, Tau-b, Yule's Q | Chi square, Contingency C, Cramér's V, Tau-b, Yule's Q
Ordinal | Medians, quartiles, deciles, quartile deviations | Mann-Whitney, runs, signed ranks | ANOVA with ranks | Gamma, rank-order correlation, Kendall's tau
Interval/Ratio | Means, medians, SD | Difference of means | ANOVA, Eta², intraclass correlations | Correlation and regression
Source: Blalock, Social Statistics
16
Teen Pregnancy Risk Factors
Risk factor / Grade                                      YDP Program Participants   Healthy Youth Survey
                                                         % (n)                      % (n)
Getting mostly Ds and Fs in school
  Grade 8                                                21.9 (137)                 9.6 (7,923)
  Grade 10                                               23.1 (78)                  8.8 (7,673)
  Grade 12                                               25 (56)                    5.3 (5,684)
Reported their mother did not finish high school
  Grade 8                                                27.7 (94)                  7.3 (7,938)
  Grade 10                                               36.6 (71)                  8.6 (7,688)
  Grade 12                                               23.1 (52)                  9 (5,695)
Used alcohol in the 30 days before pre-test
  Grade 8                                                31.7 (145)                 18 (8,223)
  Grade 10                                               45.2 (84)                  32.6 (7,860)
  Grade 12                                               53.4 (58)                  42.6 (5,795)
17
Measures of Association
  • A class of statistical tests that are used to
    show the magnitude or strength of a relationship
    between variables.
  • Significance tests are used to establish whether
    or not a relationship exists, and measures of
    association show the size of the relationship
    (weak, moderate, strong).
  • Some also show the direction of the relationship
    (for ordinal and interval-ratio variables).

18
Measures of Association for Cross Tabulations
(examples)
  • Lambda: The strength of a relationship between
    two nominal variables.
  • Phi: The strength of a relationship between two
    dichotomous variables.
  • Gamma: The strength of a relationship between two
    ordinal variables.
  • Values range between 0 and +/-1.
  • Negative and positive values show the direction
    of the relationship, where applicable.
  • The closer the absolute value is to one, the
    stronger the relationship.

19
Proportional Reduction of Error (PRE)
  • PRE (Proportional Reduction of Error) is the
    concept underlying these tests.
  • It compares the errors of prediction made when
    the independent variable is ignored (E1) with the
    errors of prediction made when the prediction is
    based on the independent variable (E2).
  • If you know information about one variable, to
    what extent will that information help you
    predict another variable?

20
General PRE formula
  • of errors not knowing ind var (minus)
  • of errors knowing ind var
  • --------------------------------------------------
    --------------
  • of errors not knowing ind var

21
Are homeless people reporting mental health
problems more likely to request case management
than those who don't?
Exercise: Calculate Lambda and Interpret
                          No Mental Health Problems   Mental Health Problems   TOTAL
Does Not Want Case Mgt    355 (69%)                   293 (45%)                648
Wants Case Mgt            157 (31%)                   359 (55%)                516
Total                     512 (100%)                  652 (100%)               1164
22
Reading Tables
  • Independent variable: Mental health problems
  • Dependent variable: Wants case management
  • Are those who have mental health problems more
    likely to say they will want case management?
  • For each category of the independent variable,
    what is the percent distribution across the
    dependent variable?
  • Percentages are computed down the columns.

23
Lambda
  • An asymmetrical measure of association: the value
    varies depending on which variable is
    independent.
  • Ranges from 0 to 1
  • Formula:

$$\lambda = \frac{E_1 - E_2}{E_1}$$

24
Instructions to Calculate Lambda
  • 1. Calculate E1: Find the mode of the dependent
    variable (the attribute that occurs most often)
    and subtract its frequency from N (sample size).
    E1 = N - f of the mode
  • 2. Calculate E2: Within each category of the
    independent variable (each column when the
    independent variable defines the columns), find
    the modal frequency and subtract it from that
    category's total. Add the results together.
    E2 = (category total - category mode) + (category
    total - category mode) + ... for all attributes
    of the independent variable.

25
Are homeless people reporting mental health
problems more likely to request case management
than those who don't?
                          No Mental Health   Mental Health   TOTAL
Does Not Want Case Mgt    355                293             648
Wants Case Mgt            157                359             516
Total                     512                652             1164
E1 = 1164 - 648 = 516
E2 = (512 - 355) + (652 - 359) = 157 + 293 = 450
Lambda = (516 - 450) / 516 = .128
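
The calculation above can be sketched in code. The function name and the table layout (rows = dependent variable, columns = independent variable) are choices made for this illustration.

```python
def goodman_kruskal_lambda(table):
    """Return (E1, E2, lambda) for a table of counts.

    table[i][j] is the count for dependent-variable category i (row)
    and independent-variable category j (column).
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    columns = list(zip(*table))

    # E1: errors when predicting the overall modal category for everyone.
    e1 = n - max(row_totals)
    # E2: errors when predicting the modal category within each column.
    e2 = sum(sum(col) - max(col) for col in columns)
    return e1, e2, (e1 - e2) / e1


# Rows: does not want / wants case management.
# Columns: no mental health problems / mental health problems.
case_mgt = [[355, 293],
            [157, 359]]
e1, e2, lam = goodman_kruskal_lambda(case_mgt)
print(f"E1 = {e1}, E2 = {e2}, Lambda = {lam:.3f}")
```

Interpretation: knowing whether someone reports mental health problems reduces errors in predicting their case management preference by about 13%.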
26
Gender
          N      %
Female    642    73.2
Male      235    26.8
Total     882    100.0
27
How likely is it that you will have sexual
intercourse in the next year?
                             N      %
1 I definitely will          123    15.3
2 I probably will            149    18.5
3 I don't know               183    22.7
4 I probably will not        87     10.8
5 I definitely will not      263    32.7
Total                        805    100.0
28
How likely is it that you will have sexual
intercourse in the next year? By Gender
(column percentages)
                             1 female   2 male   Total
1 I definitely will          11.5       25.7     15.2
2 I probably will            17.7       21.0     18.6
3 I don't know               20.6       28.6     22.7
4 I probably will not        12.2       7.1      10.8
5 I definitely will not      38.0       17.6     32.7
Total N                      592        210      802
Total %                      100.0      100.0    100.0
χ² = 49, p < .001, Lambda = .04
29
How likely is it that you will have sexual
intercourse in the next year? By Alcohol Use
                 Drank alcohol in the last 30 days
                 No              Yes             Total
                 N      %        N      %        N      %
I Will           127    22.4     145    57.8     272    30.8
I Don't Know     128    22.6     52     20.7     180    20.7
I Won't          297    52.5     50     19.9     347    39.7
Total            552    100.0    247    100.0    799    100.0
χ² = 108, p < .001, Lambda = .21
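
As a check on the reported statistics, both can be recomputed from the cell counts (N columns) in the table. This is a sketch in plain Python with illustrative variable names. It gives the chi-square statistic only, not its p-value, and the results come out close to the reported 108 and .21.

```python
# Observed counts. Rows: I will / I don't know / I won't.
# Columns: did not drink / drank alcohol in the last 30 days.
observed = [[127, 145],
            [128, 52],
            [297, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi square: sum over cells of (observed - expected)^2 / expected.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected
df = (len(observed) - 1) * (len(col_totals) - 1)

# Lambda with intent as the dependent variable (rows).
e1 = n - max(row_totals)
e2 = sum(col_totals[j] - max(row[j] for row in observed)
         for j in range(len(col_totals)))
lam = (e1 - e2) / e1

print(f"Chi square = {chi_square:.0f} with {df} df")
print(f"Lambda = {lam:.2f}")
```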
30
Testing Hypotheses
31
Steps in Conducting Hypothesis Testing
  1. State the null hypothesis and alternative.
  2. Determine if the test will be one- or two-tailed.
  3. Determine the level of measurement of your
    variables
  4. Set the alpha level (consider power of the test).
  5. Identify the statistical test and assumptions for
    each relationship.

32
Common Distributions
  • Population Distribution
  • Sample Frame
  • Sample Distribution
  • Sampling Distribution

33
Common Sampling Distributions
  • Chi Square
  • Student's t
  • F Distribution
  • Normal Distribution

34
Chi Square
  • Chi square compares the frequencies actually
    observed in a sample with the frequencies that
    would be expected to occur by chance alone.
    If there is a large difference between the
    observed and the expected frequencies, a large
    value for Chi square will be obtained.
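
To make "expected by chance alone" concrete, the sketch below computes expected frequencies for the case management table from the Lambda exercise, using expected = row total × column total / N. The chi-square value it prints is an illustration computed here and does not appear in the slides. The function name is hypothetical.

```python
def expected_frequencies(observed):
    """Expected cell counts if the row and column variables were unrelated."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: does not want / wants case management.
# Columns: no mental health problems / mental health problems.
observed = [[355, 293],
            [157, 359]]
expected = expected_frequencies(observed)

chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

for exp_row in expected:
    print([round(e, 1) for e in exp_row])
print(f"Chi square = {chi_square:.1f}")
```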

35
T-test
  • Definition: The t-test is used to determine
    whether the difference between the means of two
    groups or conditions is due to the independent
    variable, or if the difference is simply due to
    chance.
  • (independent-samples and paired-samples tests)

36
One-way ANOVA
  • Definition: As with the t-test, ANOVA also tests
    for significant differences between groups. But
    while the t-test is limited to the comparison of
    only two groups, one-way ANOVA can be used to
    test differences in three or more groups.

37
Sexual Behavior Intent Scale Score (5 = High Risk)
Drank alcohol in the last 30 days   Mean     Std. Deviation   N
no                                  2.3862   .95639           563
yes                                 3.2723   .91401           251
Total                               2.6595   1.02802          814
p < .001, Eta² = .16
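
With only two groups, the same comparison can be made with an independent-samples t-test computed from the summary statistics above. This sketch assumes the equal-variance (pooled) form; the slides do not say which form was used, and the function name is hypothetical.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic assuming equal group variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / se_diff, df


# Group 1: did not drink; group 2: drank alcohol in the last 30 days.
t_stat, df = pooled_t(2.3862, 0.95639, 563, 3.2723, 0.91401, 251)
print(f"t = {t_stat:.2f} with {df} df")
print(f"t squared = {t_stat ** 2:.1f}")
```

For two groups, the square of the pooled t statistic equals the one-way ANOVA F ratio, so the two tests lead to the same conclusion.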
38
Sexual Behavior Intent Scale Score
ANOVA
                 Sum of Squares   df    Mean Square   F         Sig.
Between Groups   136.301          1     136.301       153.101   .000
Within Groups    722.901          812   .890
Total            859.203          813
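
The F ratio and Eta² follow directly from the sums of squares in the table. A sketch with illustrative variable names:

```python
ss_between, df_between = 136.301, 1
ss_within, df_within = 722.901, 812

# Mean square = sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F compares between-group variation to within-group variation.
f_ratio = ms_between / ms_within

# Eta squared: share of total variation accounted for by group membership.
eta_squared = ss_between / (ss_between + ss_within)

print(f"MS within = {ms_within:.3f}")
print(f"F = {f_ratio:.1f}")
print(f"Eta squared = {eta_squared:.2f}")
```

An Eta² of about .16 means that alcohol use in the last 30 days accounts for roughly 16% of the variation in the intent scale score.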
39
Other Interesting Terms
  • Assumption (e.g., normal distribution)
  • Assumption Robustness (Leeway one has in
    violating an assumption)

40
Type I and Type II Errors
  • Type I: Rejecting a null hypothesis when it is
    true. Saying groups differ when they do not.
  • Type II: Failing to reject a null hypothesis
    when it is false. Saying groups do not differ
    when they do.
  • Power: The probability of rejecting a null
    hypothesis when it is false (1 - β), that is,
    the probability of making a correct decision.

41
Setting Type I and Type II Errors
  • H0: Drug is unsafe.
  • H1: Drug is safe.
  • H0: Defendant is innocent.
  • H1: Defendant is guilty.

42
Alpha, Beta, and Power (N = 15)
  α      β      1 - β
  .10    .37    .63
  .05    .52    .48
  .01    .78    .22

Source: Stevens, Applied Multivariate Statistics for
the Social Sciences
43
Power and N size
  n (subjects per group)   power
  10                       .18
  20                       .33
  50                       .70
  100                      .94

Source: Stevens, Applied Multivariate Statistics for
the Social Sciences
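
The pattern in the table (power rising with sample size) can be sketched with a normal approximation for a two-tailed comparison of two group means. The slide does not state the effect size or alpha behind the table, so the standardized effect size of 0.5 and alpha of .05 used here are assumptions, and the function name is hypothetical. The approximation is not expected to reproduce the tabled values exactly.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, effect_size=0.5, alpha=0.05):
    """Normal-approximation power for a two-tailed, two-group test of means.

    effect_size is the standardized mean difference (assumed, not from
    the slides).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability that the test statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


powers = {n: approx_power(n) for n in (10, 20, 50, 100)}
for n, power in powers.items():
    print(f"n per group = {n:3d}  approximate power = {power:.2f}")
```

Lowering alpha or shrinking the effect size in this function lowers the power at every sample size, which matches the trade-offs shown on the alpha, beta, and power slide.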
45
Exercise: True or False? To achieve the same
power:
  • More subjects are needed for a 1% level test than
    for a 5% level test.
  • Two-tailed tests require larger sample sizes than
    one-tailed tests.
  • The smaller the critical effect size, the larger
    the necessary sample size.
  • The larger the power required, the larger the
    necessary sample size.
  • The smaller the sample size, the smaller the
    power (i.e., the greater the chance of failure).