Last week - PowerPoint PPT Presentation

About This Presentation

Title:

Last week

Description:

When the data are scarce, the usual chi-square test can give a misleading result. ... There are countless normal distributions. ... – PowerPoint PPT presentation

Slides: 75

Provided by: colin111

Transcript and Presenter's Notes

Title: Last week

1
Last week

How to run chi-square tests of association on
SPSS.
When the data are scarce, the usual chi-square
test can give a misleading result.
If there are warnings about low expected
frequencies, run an EXACT TEST.

2
Regression

Regression is the other side of the associative
coin.
We use a correlation coefficient to measure the
STRENGTH of an association.
We can use regression to PREDICT values of one
variable from those of the other.

3
The regression line of Violence upon Preference

The REGRESSION LINE is the line that fits the
points best according to the LEAST SQUARES
criterion.

4
The regression line
[Figure: regression line of Y (Violence) on X (Exposure), with points P and Q marked and intercept 2.091.]
5
Errors or residuals
[Figure: errors (residuals) about the regression line of Y (Violence) on X (Exposure); intercept 2.091.]
6
Variance accounted for

An important aspect of regression is accounting
for VARIANCE in the target or criterion variable
in terms of regression upon the independent
variable or predictor.

7
The total variance of the target variable
[Figure: horizontal line at MY, the mean of Y, with the deviations of the points from it.]
These are the deviations from the MY line from
which the total variance of the scores on the
target variable can be calculated.
8
Prediction without regression

When the variables show no association, the slope
of the regression line is zero and the line runs
horizontally through the mean MY of the criterion
or dependent variable.
The intercept (B0) is MY in this case.

9
Deviations from MY of points on the regression
line
From these deviations, the variance attributable
to regression can be calculated.
10
Regression line residuals
Here are the errors or residuals when regression
is used. A variance estimate can be calculated
from the residuals.
11
Variance accounted for by regression

The deviations from MY of points on the line are
the basis of the variance of the criterion
variable Y accounted for by regression on X.

12
Coefficient of determination

The PROPORTION of the variance of the target or
criterion variable (Actual Violence) accounted
for by regression on the regressor or independent
variable (Exposure) is known as the COEFFICIENT
OF DETERMINATION and is the square of the Pearson
correlation.
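
The split of the total variance into a part due to regression and a residual part can be checked numerically. The sketch below is only an illustration: the scores are invented (they are not the lecture's data), Python stands in for SPSS, and the variable names are assumptions.

```python
from math import sqrt

# Hypothetical scores, invented for illustration (not the lecture's data).
exposure = [1, 2, 3, 4, 5, 6, 7, 8]   # predictor X
violence = [3, 3, 5, 4, 6, 7, 7, 9]   # criterion Y

n = len(exposure)
mean_x = sum(exposure) / n
mean_y = sum(violence) / n

# Sums of squares and cross-products about the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(exposure, violence))
sxx = sum((x - mean_x) ** 2 for x in exposure)
syy = sum((y - mean_y) ** 2 for y in violence)

# Least-squares slope and intercept.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in exposure]

# Deviations of the scores from MY (total), of the points on the line
# from MY (regression), and of the scores from the line (residual).
ss_total = syy
ss_regression = sum((p - mean_y) ** 2 for p in predicted)
ss_residual = sum((y - p) ** 2 for y, p in zip(violence, predicted))

r = sxy / sqrt(sxx * syy)               # Pearson correlation
r_squared = ss_regression / ss_total    # coefficient of determination

print(f"SS total = {ss_total:.3f}")
print(f"SS regression + SS residual = {ss_regression + ss_residual:.3f}")
print(f"r squared from sums of squares = {r_squared:.4f}")
print(f"square of the Pearson correlation = {r ** 2:.4f}")
```

The last two printed values agree, which is the sense in which the coefficient of determination is the square of the Pearson correlation.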

13
Coefficient of determination (r²)
14
In our example
15
Theoretical importance of r²

Often the predictions of individual cases from
regression aren't very accurate.
Researchers are usually more interested in the
coefficient of determination, because it provides
a measure of the extent to which a predictor or
regressor variable accounts for variance in the
target variable.
If r² is high, the inference is that the
abilities involved in the regressor are
implicated in the target variable.

16
Lecture 11: RUNNING SIMPLE REGRESSION ON
SPSS
17
Finding regression
18
Saving the predictions and residuals

It's useful to save the predicted values and
residuals (errors) from the regression equation.
Click the Save button at the foot of the Linear
Regression dialog.

Click the Save button to save the predicted
values.
19
The regression coefficients

The only entries that concern us just now are
those under the B column (Unstandardised
Coefficients).
We see that B0 = 2.091 and B = 0.736. These are,
respectively, the intercept and the slope of the
regression equation.

20
Coefficient of determination

The values given for the Pearson correlation and
the coefficient of determination r² have been
ringed.

21
Data View

I have renamed the estimates of Y and the
residuals in the two rightmost columns.
Notice that the residuals are Actual Violence
scores minus the predictions from regression.
When a residual has a negative sign, it means
that the actual Y value was below the regression
line.
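
The regression equation and the residuals described above can be sketched in a few lines. The intercept 2.091 and slope 0.736 are the values reported on the coefficients slide; the function names and the example scores are hypothetical, chosen only for illustration.

```python
B0 = 2.091   # intercept, from the coefficients slide
B = 0.736    # slope, from the coefficients slide


def predict_violence(exposure):
    """Predicted Violence score from the regression equation B0 + B * X."""
    return B0 + B * exposure


def residual(actual_violence, exposure):
    """Residual = actual score minus the prediction from regression."""
    return actual_violence - predict_violence(exposure)


# Hypothetical case: Exposure of 5 with an actual Violence score of 4.
prediction = predict_violence(5)
error = residual(4, 5)

print(f"Predicted Violence: {prediction:.3f}")
print(f"Residual: {error:.3f}")  # negative: the point lies below the line
```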

22
Importance of regression

Regression is used extensively in some areas of
psychological research, such as health
psychology.
You will be hearing much more about it next year.

23
Revision of the normal curve
24
Hypothesis testing

In my final lecture, I shall be considering the
t-testing procedure that you used to analyse the
data from the inverted faces and landscapes
experiment.
This will involve a more general discussion of
HYPOTHESIS TESTING.
This afternoon, I shall try to lay the
foundations.

25
Normal distribution

A NORMAL DISTRIBUTION is symmetrical and
bell-shaped.
If a variable is normally distributed, 95% of
values lie within 1.96 standard deviations (2
approx.) on EITHER side of the mean.

[Figure: normal curve with 0.95 (95%) of the area between mean - 1.96SD and mean + 1.96SD, and .025 (2½%) in each tail.]
26
Specifying a normal distribution

Suppose that a variable X has a normal
distribution with mean µ and standard deviation
σ.
We write this as X ~ N(µ, σ).

27
The standard normal variable z

If X is a normal variable, that is,
X ~ N(µ, σ),
z = (X - µ)/σ will also be normally distributed.
z is known as the STANDARD NORMAL VARIABLE.

28
Standardisation

Strictly speaking, z is defined in relation to
the theoretical normal population mean µ.
However, any set of scores X can be STANDARDISED
by subtracting the sample mean from each score
and dividing by the sample standard deviation.

29
Effects of standardisation

Standardising a set of scores (or a population
of scores) has two effects:
The mean becomes zero.
The standard deviation becomes 1.

30
The standard normal distribution

In the notation I introduced earlier, we can
represent the standard normal distribution as
z ~ N(0, 1).

31
Distribution of z

Standardising a set of scores does NOT make them
normally distributed.
If there's a tail to the right (+ve skew) before
transforming X to z, there will be one after the
transformation.
Nevertheless, whatever the shape of the original
distribution, the mean standardised score will be
zero and the standard deviation will be 1.
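
A quick check of these claims, as a sketch only: the scores below are invented and deliberately skewed to the right, and the skewness measure used (the third standardised moment) is one common choice, assumed here for illustration.

```python
from statistics import mean, stdev


def standardise(scores):
    """Subtract the sample mean and divide by the sample SD."""
    m = mean(scores)
    sd = stdev(scores)
    return [(x - m) / sd for x in scores]


def skewness(scores):
    """Third standardised moment: positive when the tail is to the right."""
    m = mean(scores)
    n = len(scores)
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5


scores = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]   # hypothetical, positively skewed
z_scores = standardise(scores)

print(f"mean of z = {mean(z_scores):.6f}")
print(f"SD of z = {stdev(z_scores):.6f}")
print(f"skewness before = {skewness(scores):.4f}")
print(f"skewness after = {skewness(z_scores):.4f}")
```

The mean and SD change to 0 and 1, but the skewness is the same before and after, so the shape of the distribution is untouched.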

32
Referring to z

A question about the probability of a range of
values of ANY normally distributed variable X
with mean µ and standard deviation σ can be
translated into a question about an equivalent
range of the standard normal variable z.

33
Using z
Probability that X (IQ) lies between 70 and
130 AND ALSO Probability that z lies between
-1.96 and 1.96.
Probability that X (IQ) is at least 130 AND ALSO
Probability that z is at least 1.96
[Figure: normal curve with central area 0.95. X (IQ) axis: 70, 100, 130 (100 ± 1.96SD). z axis: -1.96, 0, 1.96.]
34
Referring questions from X to z

What is the probability of an IQ of at least 130?
This is to ask about the probability that X is at
least 130, where X ~ N(100, 15).
Transform X to z: z = (130 - 100)/15 = 2.
We know from memory that the probability of z
greater than 2 (actually 1.96) is .025.

35
Probability of an IQ between 100 and 130?

Convert these values to values of z.
If X = 100, z = 0.
If X = 130, z = 2.
Pr(z between 0 and 2) = 0.95/2 = 0.475 (approximately).
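
The two IQ questions can also be answered without relying on memory. This is a sketch using Python's standard library rather than SPSS; it treats IQ as normal with mean 100 and SD 15, as in the lecture, and the variable names are assumptions.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # X ~ N(100, 15), second value is the SD
standard_normal = NormalDist()      # z ~ N(0, 1)

# Convert X = 130 to z.
z = (130 - iq.mean) / iq.stdev

# Probability of an IQ of at least 130, asked of X and then of z.
p_at_least_130 = 1 - iq.cdf(130)
p_z_at_least_2 = 1 - standard_normal.cdf(z)

# Probability of an IQ between 100 and 130.
p_between_100_and_130 = iq.cdf(130) - iq.cdf(100)

print(f"z = {z}")
print(f"Pr(X >= 130) = {p_at_least_130:.4f}")
print(f"Pr(z >= 2) = {p_z_at_least_2:.4f}")
print(f"Pr(100 <= X <= 130) = {p_between_100_and_130:.4f}")
```

The exact values are a little different from .025 and 0.475 because those figures belong to z = 1.96 rather than z = 2.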

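[Figure: normal curve with 0.95 (95%) of the area between µ - 1.96SD and µ + 1.96SD, and .025 (2½%) in each tail. X axis: µ - 1.96SD, µ, µ + 1.96SD. z axis: -1.96, 0, 1.96.]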
36
Finding the probability of a range of values of
X

In the problems we have considered, the value of
z has always been around 2 (about 1.96), so that
we can find the probability from memory.
Suppose z = 1, 0.5, or any value other than
1.96?
Just standardise the value of X by converting it
to z: z = (X - mean)/SD.
There are tables available in standard statistics
textbooks which give probabilities of ANY
specified range of values of z. You can also use
the SPSS cumulative distribution function CDF to
find such probabilities.
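
As an alternative to printed tables or the SPSS CDF function, the same lookup can be sketched with a small helper. The function name `prob_between` and the example ranges are hypothetical; only the standard normal distribution is consulted, as the text describes.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0, 1)


def prob_between(low, high, mean, sd):
    """Probability that a normal variable lies between low and high.

    Both limits are standardised with z = (X - mean)/SD, then the
    standard normal cumulative distribution does the rest.
    """
    z_low = (low - mean) / sd
    z_high = (high - mean) / sd
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)


# IQ between 85 and 115, that is, z between -1 and 1.
p_within_one_sd = prob_between(85, 115, mean=100, sd=15)

# IQ between 100 and 107.5, that is, z between 0 and 0.5.
p_zero_to_half = prob_between(100, 107.5, mean=100, sd=15)

print(f"Pr(-1 <= z <= 1) = {p_within_one_sd:.4f}")
print(f"Pr(0 <= z <= 0.5) = {p_zero_to_half:.4f}")
```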

37
Tables

There are countless normal distributions.
But there is only ONE standard normal
distribution, to which any of the others can be
transformed by z = (X - mean)/SD.
So only the probabilities of ranges of values of
z need to be tabled.
It would not be feasible to table the
probabilities for ALL possible normal
distributions.

38
To sum up

If we know the DISTRIBUTION of some variable, we
can assign a probability of obtaining a value
within a specified range.
We can visualise the probability of such a value
as the area under the curve of the distribution.
If the distribution is normal, we can translate
probability questions in the original units of
measurement into questions about ranges of z,
which, provided X is normally distributed, has
the STANDARD NORMAL DISTRIBUTION.

39
Random variables

A RANDOM VARIABLE is defined in the context of an
experiment of chance, such as rolling a die or
tossing a coin.
Let X be the result of rolling a die. X is a
random variable.
Let Y be the number of heads in 10 tosses of a
coin. Y is a random variable.

40
Gathering data

The gathering of data is an experiment of chance.
Essentially, you are SELECTING observations at
random from POPULATIONS of possible observations.
The value of a variable that you are observing is
a RANDOM variable.
In the Caffeine experiment, the value of a score
under the Caffeine condition is a random
variable, as is that of a score under the Placebo
condition.

41
Drawing tickets

This barrel contains numbered tickets.
I draw a ticket at random. Let X be the number
on the ticket. X is a random variable.

42
Drawing tickets

Early studies of probability involved drawing
real tickets from real barrels. Nowadays, we use
the computer.
Computers enable us to specify the distribution
from which we are drawing our ticket.

43
Using the computer

The 4000 IQs were sampled by first placing 4000
numbers in the column named Ones. Any numbers
will do.
Now we are going to draw another sample of 4000
IQs from the same population.

44
Resampling from the IQ population
45
Descriptives

I put some more ones in the first column: 48,000,
actually!
As expected with samples this size, their means
and SDs are very similar and close to the
theoretical mean of 100 and standard deviation of
15.
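
The same kind of draw can be sketched outside SPSS. The example below assumes Python's pseudo-random generator as the "barrel", with a fixed seed so that it is repeatable; the names `draw_iq_sample`, `first_sample` and `second_sample` are hypothetical.

```python
import random
from statistics import fmean, stdev

rng = random.Random(2024)   # fixed seed, chosen arbitrarily


def draw_iq_sample(size, mu=100, sigma=15):
    """Draw `size` values at random from a normal population of IQs."""
    return [rng.gauss(mu, sigma) for _ in range(size)]


first_sample = draw_iq_sample(48_000)
second_sample = draw_iq_sample(48_000)

for label, sample in (("IQ", first_sample), ("IQtwo", second_sample)):
    print(f"{label}: mean = {fmean(sample):.2f}, SD = {stdev(sample):.2f}")
```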

46
Independent random variables

Here are two barrels, each containing numbered
tickets.
Let X be the number of the ticket that I draw
from the first barrel. Let Y be the number of the
ticket that I draw from the second barrel.
X and Y are INDEPENDENT random variables, because
the value of X does not determine the value of Y
or vice versa.

Let X be a value drawn from this barrel.
Let Y be a value drawn from this barrel.
47
The sum of random variables

X and Y are random variables.
Let SUM = X + Y
SUM is also a random variable.

[Figure: barrels X and Y, and their sum SUM = X + Y.]
48
The variance of the sum

The variance of the sum of INDEPENDENT random
variables is the sum of their variances.

[Figure: distributions of X, Y and X + Y.]
49
Create the SUM

In a fourth column, the sum of IQ and IQtwo will
appear.
You can see that the variance of SUM is very
close to twice the variance of either IQ or
IQtwo.
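
A simulated version of this check, offered as a sketch: two independent samples are drawn from N(100, 15) with Python's generator instead of SPSS, and the variance of their sum is compared with the sum of their variances. The seed and the list names are arbitrary assumptions.

```python
import random
from statistics import variance

rng = random.Random(11)   # fixed seed, chosen arbitrarily
n = 48_000

# Two independent draws, standing in for the IQ and IQtwo columns.
iq = [rng.gauss(100, 15) for _ in range(n)]
iq_two = [rng.gauss(100, 15) for _ in range(n)]
total = [x + y for x, y in zip(iq, iq_two)]

var_iq = variance(iq)
var_iq_two = variance(iq_two)
var_sum = variance(total)

print(f"variance of IQ = {var_iq:.1f}")
print(f"variance of IQtwo = {var_iq_two:.1f}")
print(f"variance of SUM = {var_sum:.1f}")
print(f"sum of the two variances = {var_iq + var_iq_two:.1f}")
```

In a sample the agreement is only approximate, because the two columns will show a small chance correlation.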

50
Taking a sample
51
Variance of ΣX
52
Adding and multiplying by a constant
Adding a constant k: the mean M becomes M + k; the variance σ² is unchanged.
Multiplying by a constant k: the mean M becomes kM; the variance σ² becomes k²σ².
53
Effect upon the variance of multiplying by a
constant

When you multiply by a constant, you multiply the
variance by the SQUARE of the constant.

54
Variance of the mean
The variance of the means is the original
variance σ² divided by n, the size of the
sample.
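
This follows from the two rules already stated: the variance of a sum of independent variables is the sum of their variances, and multiplying by a constant multiplies the variance by the square of the constant. A short derivation, assuming the sample consists of n independent observations X_1, ..., X_n each with variance σ² (the subscript notation is introduced here for illustration):

```latex
\begin{aligned}
\operatorname{Var}(M)
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^{2}}\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right) \\
  &= \frac{1}{n^{2}}\, n\sigma^{2}
   = \frac{\sigma^{2}}{n}
\end{aligned}
```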
55
The variance of the mean

The barrel on the right contains means of samples
of size n drawn from the barrel on the left.
The variance of the means is the original
variance divided by n.

[Figure: barrel X with variance σ² and barrel M with variance σ²/n.]
56
Standard error of the mean

The STANDARD ERROR OF THE MEAN σM is the square
root of the variance of the mean.

57
Sampling distribution

The distribution of a STATISTIC such as the mean
is known as its SAMPLING DISTRIBUTION.
If X is normally distributed, then the mean M is
also normally distributed.
The sampling distribution of M is normal and
centred on the mean µ of the variable X.
The variance of the sampling distribution of M is
σ²/n.
The standard deviation of the sampling
distribution of the mean (the standard error of
the mean) σM is σ/√n.

58
Sampling distribution of the mean
59
Effect of increasing n

IQ ~ N(100, 15)
Let M4, M16 and M64 be means of samples of size n =
4, 16 and 64 respectively.
The values of σM are, respectively, 7.5, 3.75 and
1.88.
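
These standard errors can be compared with a simulation. The sketch below draws many samples of each size from N(100, 15) and takes the SD of their means; the number of replications, the seed and the function name are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import fmean, stdev

MU, SIGMA = 100, 15
rng = random.Random(7)   # fixed seed, chosen arbitrarily


def simulated_standard_error(n, replications=4000):
    """SD of the means of many random samples of size n."""
    means = []
    for _ in range(replications):
        sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
        means.append(fmean(sample))
    return stdev(means)


results = {}
for n in (4, 16, 64):
    theoretical = SIGMA / sqrt(n)          # sigma / sqrt(n)
    simulated = simulated_standard_error(n)
    results[n] = (theoretical, simulated)
    print(f"n = {n:2d}: theoretical = {theoretical:.3f}, "
          f"simulated = {simulated:.3f}")
```

Quadrupling the sample size halves the standard error, which is why the sampling distribution narrows as n increases.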

60
Effect of sample size n
61
Effect of increasing the sample size n
[Figure: sampling distributions of the mean for increasing n, centred on µ.]
62
Using z
Probability that X (IQ) lies between 70 and
130 AND ALSO Probability that z lies between
-1.96 and 1.96.
Probability that X (IQ) is at least 130 AND ALSO
Probability that z is at least 1.96
[Figure: normal curve with central area 0.95. X axis: µ - 1.96σ, µ, µ + 1.96σ. M axis: µ - 1.96σM, µ, µ + 1.96σM. z axis: -1.96, 0, 1.96.]
63
Referring to z

A question about a range of values of ANY
normally distributed variable can always be
translated into a question about a range of
values of the standard normal variable z.
Just subtract the mean and divide by the standard
deviation.
If your question is about a range of values for
the MEAN, you must divide by the STANDARD ERROR,
not the original population SD.

64
Question

If I select 9 IQs at random and take their mean
M, what is the probability that M is at least
110?

65
Answer
66
Important

If your question is about MEANS, divide by the
STANDARD ERROR OF THE MEAN σM, not the standard
deviation of the original population.
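
The question about the mean of 9 IQs can be worked through in this way. This is a sketch of one route to the answer, assuming IQ ~ N(100, 15) as elsewhere in the lecture; it also shows what happens if the population SD is used by mistake.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 9

# Standard error of the mean: sigma / sqrt(n) = 15 / 3.
standard_error = sigma / sqrt(n)

# Standardise M = 110 using the standard error.
z = (110 - mu) / standard_error
p_mean_at_least_110 = 1 - NormalDist().cdf(z)

# The mistake to avoid: dividing by the population SD instead.
z_wrong = (110 - mu) / sigma
p_wrong = 1 - NormalDist().cdf(z_wrong)

print(f"standard error = {standard_error}")
print(f"z = {z}")
print(f"Pr(M >= 110) = {p_mean_at_least_110:.4f}")
print(f"Using the SD by mistake gives {p_wrong:.4f}")
```

With z = 2, the probability is about .025 by the rule of thumb used earlier (the exact figure for z = 2 is slightly smaller).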

67
The difference between 2 random variables

Let D = X - Y
D is also a random variable.

[Figure: barrels X and Y, and their difference D = X - Y.]
68
The variance of the difference

The variance of the difference between 2
independent random variables is the SUM of their
variances.

[Figure: distributions of X, Y and D = X - Y.]
69
Explanation
70
Demonstration

The Descriptives procedure shows that the
variance of the difference is approximately equal
to the variance of the sum.
In the population, they are EXACTLY equal.
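
One way to see why the two are equal, using rules stated earlier in the lecture: subtracting Y is the same as adding Y multiplied by the constant -1, and multiplying by a constant multiplies the variance by the square of that constant. A short derivation, assuming X and Y are independent:

```latex
\begin{aligned}
\operatorname{Var}(D)
  &= \operatorname{Var}\bigl(X + (-1)Y\bigr)
   = \operatorname{Var}(X) + (-1)^{2}\operatorname{Var}(Y) \\
  &= \operatorname{Var}(X) + \operatorname{Var}(Y)
   = \operatorname{Var}(X + Y)
\end{aligned}
```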

71
Summary

Review of regression.
The COEFFICIENT OF DETERMINATION r² is the
proportion of the variance of the target,
criterion or dependent variable accounted for by
regression.
The gathering of data is an EXPERIMENT OF CHANCE,
to which the concept of RANDOM VARIABLE is
applicable.

72
Summary

The variance of the sum of INDEPENDENT random
variables (separate barrels) is the sum of their
variances.
The variance of the difference between 2 independent random
variables is also the SUM of their variances.
The distribution of means is the SAMPLING
DISTRIBUTION OF THE MEAN.
The STANDARD ERROR OF THE MEAN is the SD of the
sampling distribution of the mean.

73
Summary

If the parent population is normal, so is the
sampling distribution of the mean.
A question about the probability of a range of
values of ANY normal variable can be referred to
a question about ranges of z, the STANDARD NORMAL
VARIABLE.
When standardising a value of M, however, use the
standard ERROR, not the SD of the parent
population.

74
Question