Last week - PowerPoint PPT Presentation

About This Presentation
Title:

Last week

Description:

When the data are scarce, the usual chi-square test can give a misleading result. ... There are countless normal distributions. ... – PowerPoint PPT presentation

Slides: 75
Provided by: colin111

Transcript and Presenter's Notes


1
Last week
  • How to run chi-square tests of association on
    SPSS.
  • When the data are scarce, the usual chi-square
    test can give a misleading result.
  • If there are warnings about low expected
    frequencies, run an EXACT TEST.

2
Regression
  • Regression is the other side of the associative
    coin.
  • We use a correlation coefficient to measure the
    STRENGTH of an association.
  • We can use regression to PREDICT values of one
    variable from those of the other.

3
The regression line of Violence upon Preference
  • The REGRESSION LINE is the line that fits the
    points best according to the LEAST SQUARES
    criterion.
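
A minimal sketch of the least-squares criterion, assuming a small set of made-up scores (these are NOT the lecture's data, and the function name is only illustrative). The slope is the sum of cross-products of deviations divided by the sum of squared deviations of X, and the line passes through the point of means.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys upon xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sum of squared deviations of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical scores, invented for illustration only.
predictor = [1, 2, 3, 4, 5]
violence = [3, 3, 5, 4, 6]

b0, b1 = least_squares_line(predictor, violence)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```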

4
The regression line
[Figure: the regression line of Y (Violence) upon X (Exposure), with the origin (0, 0), an intercept of 2.091, and points P and Q marked.]
5
Errors or residuals
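[Figure: errors or residuals about the regression line. Axes: Y (Violence) against X (Exposure); origin (0, 0); intercept 2.091; the value 9 is marked.]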
6
Variance accounted for
  • An important aspect of regression is accounting
    for VARIANCE in the target or criterion variable
    in terms of regression upon the independent
    variable or predictor.

7
The total variance of the target variable
These are the deviations from the MY line (the
horizontal line at MY, the mean of Y), from which
the total variance of the scores on the target
variable can be calculated.
8
Prediction without regression
  • When the variables show no association, the slope
    of the regression line is zero and the line runs
    horizontally through the mean MY of the criterion
    or dependent variable.
  • The intercept (B0) is MY in this case.

9
Deviations from MY of points on the regression
line
From these deviations, the variance attributable
to regression can be calculated.
10
Regression line residuals
Here are the errors or residuals when regression
is used. A variance estimate can be calculated
from the residuals.
11
Variance accounted for by regression
  • The deviations from MY of points on the line are
    the basis of the variance of the criterion
    variable Y accounted for by regression on X.

12
Coefficient of determination
  • The PROPORTION of the variance of the target or
    criterion variable (Actual Violence) accounted
    for by regression on the regressor or independent
    variable (Exposure) is known as the COEFFICIENT
    OF DETERMINATION and is the square of the Pearson
    correlation.
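
As a rough check on this claim, the sketch below computes the proportion of variance accounted for by regression and compares it with the square of the Pearson correlation. The scores are hypothetical (not the lecture's data) and the variable names are my own.

```python
import math

# Hypothetical scores, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [3, 3, 5, 4, 6]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Least-squares line and the points on it.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in xs]

# Deviations of the points on the line from the mean of Y give the
# regression sum of squares; syy is the total sum of squares.
ss_regression = sum((p - mean_y) ** 2 for p in predicted)
proportion = ss_regression / syy

# Pearson correlation, squared.
r = sxy / math.sqrt(sxx * syy)

print(f"proportion accounted for = {proportion:.4f}")
print(f"r squared                = {r ** 2:.4f}")
```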

13
Coefficient of determination (r²)
14
In our example
15
Theoretical importance of r²
  • Often the predictions of individual cases from
    regression aren't very accurate.
  • Researchers are usually more interested in the
    coefficient of determination, because it provides
    a measure of the extent to which a predictor or
    regressor variable accounts for variance in the
    target variable.
  • If r² is high, the inference is that the
    abilities involved in the regressor are
    implicated in the target variable.

16
Lecture 11: RUNNING SIMPLE REGRESSION ON SPSS
17
Finding regression
18
Saving the predictions and residuals
  • It's useful to save the predicted values and
    residuals (errors) from the regression equation.
  • Click the Save button at the foot of the Linear
    Regression dialog.

Click the Save button to save the predicted
values.
19
The regression coefficients
  • The only entries that concern us just now are
    those under the B column (Unstandardised
    Coefficients).
  • We see that B0 = 2.091 and B = 0.736. These are,
    respectively, the intercept and the slope of the
    regression equation.

20
Coefficient of determination
  • The values given for the Pearson correlation and
    the coefficient of determination r² have been
    ringed.

21
Data View
  • I have renamed the estimates of Y and the
    residuals in the two rightmost columns.
  • Notice that the residuals are Actual Violence
    scores minus the predictions from regression.
  • When a residual has a negative sign, it means
    that the actual Y value was below the regression
    line.
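
To make the saved columns concrete, here is a small sketch that uses the intercept (2.091) and slope (0.736) quoted in the lecture to form predictions and residuals. The three cases are hypothetical, and the function name `predict` is my own; SPSS does this for you when you click Save.

```python
B0 = 2.091  # intercept, as quoted from the SPSS output in the lecture
B = 0.736   # slope, as quoted from the SPSS output in the lecture


def predict(x):
    """Predicted Y from the regression equation Y' = B0 + B * X."""
    return B0 + B * x


# Hypothetical (X, Actual Violence) pairs, invented for illustration.
cases = [(2, 4.0), (5, 5.0), (8, 9.0)]

# Residual = actual score minus the prediction from regression.
residuals = [actual - predict(x) for x, actual in cases]

for (x, actual), res in zip(cases, residuals):
    position = "below" if res < 0 else "on or above"
    print(f"X={x}: predicted={predict(x):.3f}, actual={actual}, "
          f"residual={res:+.3f} ({position} the line)")
```

The second case has a negative residual because its actual score lies below the regression line.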

22
Importance of regression
  • Regression is used extensively in some areas of
    psychological research, such as health
    psychology.
  • You will be hearing much more about it next year.

23
Revision of the normal curve
24
Hypothesis testing
  • In my final lecture, I shall be considering the
    t-testing procedure that you used to analyse the
    data from the inverted faces and landscapes
    experiment.
  • This will involve a more general discussion of
    HYPOTHESIS TESTING.
  • This afternoon, I shall try to lay the
    foundations.

25
Normal distribution
  • A NORMAL DISTRIBUTION is symmetrical and
    bell-shaped.
  • If a variable is normally distributed, 95% of
    values lie within 1.96 standard deviations
    (approximately 2) on EITHER side of the mean.

[Figure: normal curve. The area between mean - 1.96 SD and mean + 1.96 SD is 0.95 (95%); each tail holds 2½% (.025).]
26
Specifying a normal distribution
  • Suppose that a variable X has a normal
    distribution with mean µ and standard deviation
    σ.
  • We write this as X ~ N(µ, σ).

27
The standard normal variable z
  • If X is a normal variable, that is,
  • X ~ N(µ, σ),
  • z = (X - µ)/σ will also be normally distributed.
  • z is known as the STANDARD NORMAL VARIABLE.

28
Standardisation
  • Strictly speaking, z is defined in relation to
    the theoretical normal population mean µ.
  • However, any set of scores X can be STANDARDISED
    by subtracting the sample mean from each score
    and dividing by the sample standard deviation.

29
Effects of standardisation
  • Standardising a set of scores (or a population
    of scores) has two effects
  • The mean becomes zero
  • The standard deviation becomes 1.

30
The standard normal distribution
  • In the notation I introduced earlier, we can
    represent the standard normal distribution as
    follows: z ~ N(0, 1).

31
Distribution of z
  • Standardising a set of scores does NOT make them
    normally distributed.
  • If there's a tail to the right (+ve skew) before
    transforming X to z, there will be one after the
    transformation.
  • Nevertheless, whatever the shape of the original
    distribution, the mean standardised score will be
    zero and the standard deviation will be 1.
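
A quick sketch of these points, assuming a small, deliberately skewed set of made-up scores. The skewness helper is a simple illustrative measure of my own choosing (the mean cubed standard score), not something from the lecture.

```python
import statistics

# Hypothetical scores with a tail to the right (positive skew).
scores = [1, 2, 2, 3, 3, 3, 4, 10]


def skewness(values):
    """Mean cubed standard score: positive when the tail is to the right."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Standardise: subtract the sample mean, divide by the sample SD.
mean = statistics.mean(scores)
sd = statistics.stdev(scores)
z_scores = [(x - mean) / sd for x in scores]

print("mean of z:", round(statistics.mean(z_scores), 10))
print("SD of z:  ", round(statistics.stdev(z_scores), 10))
print("skew before:", round(skewness(scores), 4))
print("skew after: ", round(skewness(z_scores), 4))
```

The mean and SD change to 0 and 1, but the skew is the same before and after, so the shape of the distribution is untouched.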

32
Referring to z
  • A question about the probability of a range of
    values of ANY normally distributed variable X
    with mean µ and standard deviation σ can be
    translated into a question about an equivalent
    range of the standard normal variable z.

33
Using z
Probability that X (IQ) lies between 70 and
130 AND ALSO Probability that z lies between
-1.96 and 1.96.
Probability that X (IQ) is at least 130 AND ALSO
Probability that z is at least 1.96
[Figure: normal curve with central area 0.95. X (IQ) axis: 70, 100, 130 (labelled 100 ± 1.96 SD); z axis: -1.96, 0, 1.96.]
34
Referring questions from X to z
  • What is the probability of an IQ of at least 130?
  • This is to ask about the probability that X is at
    least 130, where X ~ N(100, 15).
  • Transform X to z: z = (130 - 100)/15 = 2.
  • We know from memory that the probability of z
    greater than 2 (actually 1.96) is .025.

35
Probability of an IQ between 100 and 130?
  • Convert these values to values of z.
  • If X = 100, z = 0.
  • If X = 130, z = 2.
  • Pr(z between 0 and 2) = 0.95/2 = 0.475.

[Figure: normal curve. The central area is 0.95 (95%); each tail holds 2½% (.025). X axis: µ - 1.96 SD, µ, µ + 1.96 SD; z axis: -1.96, 0, 1.96.]
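
The two IQ questions above can also be checked without relying on memory. This is a sketch using Python's standard-library `statistics.NormalDist` (the lecture itself uses SPSS and tables). Because z = 2 is used rather than 1.96, the exact answers come out slightly different from the rule-of-thumb values .025 and 0.475.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # X ~ N(100, 15)
standard_normal = NormalDist()     # z ~ N(0, 1)

# Convert X = 130 to z.
z = (130 - iq.mean) / iq.stdev     # (130 - 100)/15 = 2.0

# Probability of an IQ of at least 130.
p_at_least_130 = 1 - standard_normal.cdf(z)

# Probability of an IQ between 100 and 130 (z between 0 and 2).
p_100_to_130 = standard_normal.cdf(z) - standard_normal.cdf(0)

print(f"z = {z}")
print(f"P(IQ >= 130)        = {p_at_least_130:.4f}")  # close to .025
print(f"P(100 <= IQ <= 130) = {p_100_to_130:.4f}")    # close to 0.475
```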
36
Finding the probability of a range of values of
X
  • In the problems we have considered, the value of
    z has always been around 2 (about 1.96), so that
    we can find the probability from memory.
  • Suppose z = 1, 0.5, or any value other than
    1.96?
  • Just standardise the value of X by converting it
    to z: z = (X - mean)/SD.
  • There are tables available in standard statistics
    textbooks which give probabilities of ANY
    specified range of values of z. You can also use
    the SPSS cumulative distribution function CDF to
    find such probabilities.
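
As an alternative to tables or the SPSS CDF function, the same look-up can be sketched in Python with the standard-library `statistics.NormalDist`. The helper name `prob_between` is hypothetical, introduced only for this illustration.

```python
from statistics import NormalDist


def prob_between(low, high, mean, sd):
    """P(low <= X <= high) for a normal X, found by referring to z."""
    standard_normal = NormalDist()
    z_low = (low - mean) / sd
    z_high = (high - mean) / sd
    # Area under the standard normal curve between the two z values.
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)


# IQ between 100 and 115: z runs from 0 to 1.
p_one_sd = prob_between(100, 115, mean=100, sd=15)

# IQ between 92.5 and 107.5: z runs from -0.5 to 0.5.
p_half_sd = prob_between(92.5, 107.5, mean=100, sd=15)

print(f"P(0 <= z <= 1)      = {p_one_sd:.4f}")
print(f"P(-0.5 <= z <= 0.5) = {p_half_sd:.4f}")
```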

37
Tables
  • There are countless normal distributions.
  • But there is only ONE standard normal
    distribution, to which any of the others can be
    transformed by z = (X - mean)/SD.
  • So only the probabilities of ranges of values of
    z need to be tabled.
  • It would not be feasible to table the
    probabilities for ALL possible normal
    distributions.

38
To sum up
  • If we know the DISTRIBUTION of some variable, we
    can assign a probability of obtaining a value
    within a specified range.
  • We can visualise the probability of such a value
    as the area under the curve of the distribution.
  • If the distribution is normal, we can translate
    probability questions in the original units of
    measurement into questions about ranges of z,
    which, provided X is normally distributed, has
    the STANDARD NORMAL DISTRIBUTION.

39
Random variables
  • A RANDOM VARIABLE is defined in the context of an
    experiment of chance, such as rolling a die or
    tossing a coin.
  • Let X be the result of rolling a die. X is a
    random variable.
  • Let Y be the number of heads in 10 tosses of a
    coin. Y is a random variable.

40
Gathering data
  • The gathering of data is an experiment of chance.
  • Essentially, you are SELECTING observations at
    random from POPULATIONS of possible observations.
  • The value of a variable that you are observing is
    a RANDOM variable.
  • In the Caffeine experiment, the value of a score
    under the Caffeine condition is a random
    variable, as is that of a score under the Placebo
    condition.

41
Drawing tickets
  • This barrel contains numbered tickets.
  • I draw a ticket at random. Let X be the number
    on the ticket. X is a random variable.

42
Drawing tickets
  • Early studies of probability involved drawing
    real tickets from real barrels. Nowadays, we use
    the computer.
  • Computers enable us to specify the distribution
    from which we are drawing our ticket.

43
Using the computer
  • The 4000 IQs were sampled by first placing 4000
    numbers in the column named Ones. Any numbers
    will do.
  • Now we are going to draw another sample of 4000
    IQs from the same population.

44
Resampling from the IQ population
45
Descriptives
  • I put some more ones in the first column: 48,000,
    actually!
  • As expected with samples this size, their means
    and SDs are very similar and close to the
    theoretical mean of 100 and standard deviation of
    15.

46
Independent random variables
  • Here are two barrels, each containing numbered
    tickets.
  • Let X be the number of the ticket that I draw
    from the first barrel. Let Y be the number of the
    ticket that I draw from the second barrel.
  • X and Y are INDEPENDENT random variables, because
    the value of X does not determine the value of Y
    or vice versa.

Let X be a value drawn from this barrel.
Let Y be a value drawn from this barrel.
47
The sum of random variables
  • X and Y are random variables.
  • Let SUM = X + Y.
  • SUM is also a random variable.

[Figure labels: X, Y, SUM = X + Y.]
48
The variance of the sum
  • The variance of the sum of INDEPENDENT random
    variables is the sum of their variances.

[Figure labels: X + Y, Y, X.]
49
Create the SUM
  • In a fourth column, the sum of IQ and IQtwo will
    appear.
  • You can see that the variance of SUM is very
    close to twice the variance of either IQ or
    IQtwo.
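
The same demonstration can be sketched outside SPSS. The code below draws two independent samples of 4000 simulated IQs from N(100, 15) and compares the variance of their sum with the sum of their variances. The seed is arbitrary, chosen only to make the run repeatable; the lecture's own samples were generated in SPSS.

```python
import random
import statistics

random.seed(1)  # arbitrary seed, only so the run is repeatable
n = 4000

# Two independent draws from the same IQ population (separate barrels).
iq = [random.gauss(100, 15) for _ in range(n)]
iq_two = [random.gauss(100, 15) for _ in range(n)]
total = [a + b for a, b in zip(iq, iq_two)]

var_iq = statistics.variance(iq)
var_iq_two = statistics.variance(iq_two)
var_sum = statistics.variance(total)

print(f"variance of IQ     = {var_iq:.1f}")
print(f"variance of IQtwo  = {var_iq_two:.1f}")
print(f"sum of variances   = {var_iq + var_iq_two:.1f}")
print(f"variance of SUM    = {var_sum:.1f}")
```

With samples this size the last two figures should be close, and both should be near 2 × 15² = 450.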

50
Taking a sample
51
Variance of ΣX
52
Adding and multiplying by a constant
  • Adding a constant k: the mean M becomes M + k;
    the variance σ² is unchanged.
  • Multiplying by a constant k: the mean M becomes
    kM; the variance σ² becomes k²σ².
53
Effect upon the variance of multiplying by a
constant
  • When you multiply by a constant, you multiply the
    variance by the SQUARE of the constant.
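
A small check of both rules, assuming a handful of made-up scores and an arbitrary constant k = 3.

```python
import math
import statistics

scores = [2, 4, 4, 6, 9]  # hypothetical scores, invented for illustration
k = 3                     # arbitrary constant

added = [x + k for x in scores]
multiplied = [x * k for x in scores]

var_original = statistics.variance(scores)
var_added = statistics.variance(added)
var_multiplied = statistics.variance(multiplied)

print("original:  ", statistics.mean(scores), var_original)
print("added k:   ", statistics.mean(added), var_added)            # mean + k, same variance
print("times k:   ", statistics.mean(multiplied), var_multiplied)  # k * mean, k**2 * variance
```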

54
Variance of the mean
The variance of the means is the original
variance σ² divided by n, the size of the
sample.
55
The variance of the mean
  • The barrel on the right contains means of samples
    of size n drawn from the barrel on the left.
  • The variance of the means is the original
    variance divided by n.

[Figure: barrel X with variance σ²; barrel M with variance σ²/n.]
56
Standard error of the mean
  • The STANDARD ERROR OF THE MEAN σM is the square
    root of the variance of the mean.
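
The result for the variance of the mean follows from the two rules met earlier: variances of independent variables add, and multiplying by a constant multiplies the variance by the square of the constant. The derivation below is my own sketch, assuming the n scores are independent draws with common variance σ²; the subscripted X_i notation is not in the slides.

```latex
\begin{align*}
\operatorname{Var}\Bigl(\sum_{i=1}^{n} X_i\Bigr)
  &= \sum_{i=1}^{n} \operatorname{Var}(X_i) = n\sigma^2 \\
\operatorname{Var}(M)
  &= \operatorname{Var}\Bigl(\frac{1}{n}\sum_{i=1}^{n} X_i\Bigr)
   = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n} \\
\sigma_M
  &= \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```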

57
Sampling distribution
  • The distribution of a STATISTIC such as the mean
    is known as its SAMPLING DISTRIBUTION.
  • If X is normally distributed, then the mean M is
    also normally distributed.
  • The sampling distribution of M is normal and
    centred on the mean µ of the variable X.
  • The variance of the sampling distribution of M is
    σ²/n.
  • The standard deviation of the sampling
    distribution of the mean (the standard error of
    the mean) σM is σ/√n.

58
Sampling distribution of the mean
59
Effect of increasing n
  • IQ ~ N(100, 15)
  • Let M4, M16 and M64 be means of samples of size n
    = 4, 16 and 64 respectively.
  • The values of σM are, respectively, 7.5, 3.75 and
    1.88.
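
These standard errors can be reproduced directly from σ/√n, and checked roughly by simulation. In the sketch below the number of simulated samples (2000) and the seed are arbitrary choices of mine, not values from the lecture.

```python
import math
import random
import statistics

sigma = 15

# Standard error of the mean for each sample size: sigma / sqrt(n).
standard_errors = {n: sigma / math.sqrt(n) for n in (4, 16, 64)}
print(standard_errors)  # {4: 7.5, 16: 3.75, 64: 1.875}

# Rough check by simulation for n = 16: draw many samples from
# N(100, 15), take each sample's mean, and find the SD of the means.
random.seed(2)  # arbitrary seed, only so the run is repeatable
means_16 = [
    statistics.mean([random.gauss(100, sigma) for _ in range(16)])
    for _ in range(2000)
]
sd_of_means = statistics.stdev(means_16)
print(f"SD of 2000 simulated means (n = 16): {sd_of_means:.2f}")
```

The simulated figure should land near 3.75, though it will not match exactly.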

60
Effect of sample size n
61
Effect of increasing the sample size n
[Figure: sampling distributions of the mean, centred on µ.]
62
Using z
Probability that X (IQ) lies between 70 and
130 AND ALSO Probability that z lies between
-1.96 and 1.96.
Probability that X (IQ) is at least 130 AND ALSO
Probability that z is at least 1.96
[Figure: normal curve with central area 0.95, shown on three scales. X: µ - 1.96σ, µ, µ + 1.96σ; M: µ - 1.96σM, µ, µ + 1.96σM; z: -1.96, 0, 1.96.]
63
Referring to z
  • A question about a range of values of ANY
    normally distributed variable can always be
    translated into a question about a range of
    values of the standard normal variable z.
  • Just subtract the mean and divide by the standard
    deviation.
  • If your question is about a range of values for
    the MEAN, you must divide by the STANDARD ERROR,
    not the original population SD.

64
Question
  • If I select 9 IQs at random and take their mean
    M, what is the probability that M is at least
    110?

65
Answer
66
Important
  • If your question is about MEANS, divide by the
    STANDARD ERROR OF THE MEAN σM, not the standard
    deviation of the original population.
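
One way to work the question about the mean of 9 IQs is sketched below. This is my own working, assuming IQ ~ N(100, 15) as elsewhere in the lecture; the content of the Answer slide did not survive in this transcript. The key step is dividing by the standard error, 15/√9 = 5, rather than by 15.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 15, 9

# Standard error of the mean, NOT the population SD.
standard_error = sigma / math.sqrt(n)  # 15 / 3 = 5.0

# Refer M = 110 to z using the standard error.
z = (110 - mu) / standard_error        # (110 - 100) / 5 = 2.0

p_mean_at_least_110 = 1 - NormalDist().cdf(z)

print(f"standard error = {standard_error}")
print(f"z = {z}")
print(f"P(M >= 110) = {p_mean_at_least_110:.4f}")  # close to the rule-of-thumb .025
```

Dividing by 15 instead would give z = 0.67 and a much larger, wrong probability.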

67
The difference between 2 random variables
  • Let D = X - Y.
  • D is also a random variable.

[Figure labels: X, Y, D = X - Y.]
68
The variance of the difference
  • The variance of the difference between 2
    independent random variables is the SUM of their
    variances.

[Figure labels: D = X - Y, Y, X.]
69
Explanation
70
Demonstration
  • The Descriptives procedure shows that the
    variance of the difference is approximately equal
    to the variance of the sum.
  • In the population, they are EXACTLY equal.
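
One way to see why: subtracting Y is the same as adding -Y, and multiplying Y by the constant -1 multiplies its variance by (-1)² = 1. The sketch below repeats the demonstration with simulated IQs; the seed and sample size are arbitrary choices of mine, standing in for the SPSS Descriptives output.

```python
import random
import statistics

random.seed(3)  # arbitrary seed, only so the run is repeatable
n = 4000

# Two independent draws from N(100, 15).
x = [random.gauss(100, 15) for _ in range(n)]
y = [random.gauss(100, 15) for _ in range(n)]

total = [a + b for a, b in zip(x, y)]
difference = [a - b for a, b in zip(x, y)]

var_sum = statistics.variance(total)
var_diff = statistics.variance(difference)

print(f"variance of X + Y = {var_sum:.1f}")
print(f"variance of X - Y = {var_diff:.1f}")
print(f"mean of X - Y     = {statistics.mean(difference):.2f}")
```

In a sample the two variances differ a little because the sample covariance of X and Y is rarely exactly zero; both should be near 450.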

71
Summary
  • Review of regression.
  • The COEFFICIENT OF DETERMINATION r² is the
    proportion of the variance of the target,
    criterion or dependent variable accounted for by
    regression.
  • The gathering of data is an EXPERIMENT OF CHANCE,
    to which the concept of RANDOM VARIABLE is
    applicable.

72
Summary
  • The variance of the sum of INDEPENDENT random
    variables (separate barrels) is the sum of their
    variances.
  • The variance of the difference between 2
    independent random variables is also the SUM of
    their variances.
  • The distribution of means is the SAMPLING
    DISTRIBUTION OF THE MEAN.
  • The STANDARD ERROR OF THE MEAN is the SD of the
    sampling distribution of the mean.

73
Summary
  • If the parent population is normal, so is the
    sampling distribution of the mean.
  • A question about the probability of a range of
    values of ANY normal variable can be referred to
    a question about ranges of z, the STANDARD NORMAL
    VARIABLE.
  • When standardising a value of M, however, use the
    standard ERROR, not the SD of the parent
    population.

74
Question
  • What is the probability that a random sample of
    16 IQs will have a mean between 92.5 and 100?
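
For anyone who wants to check their answer afterwards, here is one possible working, assuming IQ ~ N(100, 15) as before. It is a sketch of mine, not the lecturer's solution.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 15, 16

# The question is about a MEAN, so use the standard error.
standard_error = sigma / math.sqrt(n)  # 15 / 4 = 3.75

z_low = (92.5 - mu) / standard_error   # -2.0
z_high = (100 - mu) / standard_error   # 0.0

standard_normal = NormalDist()
p = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(f"z from {z_low} to {z_high}")
print(f"P(92.5 <= M <= 100) = {p:.4f}")  # close to the rule-of-thumb 0.475
```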