Title: Last week
1Last week
- How to run chi-square tests of association on SPSS.
- When the data are scarce, the usual chi-square test can give a misleading result.
- If there are warnings about low expected frequencies, run an EXACT TEST.
2Regression
- Regression is the other side of the associative coin.
- We use a correlation coefficient to measure the STRENGTH of an association.
- We can use regression to PREDICT values of one variable from those of the other.
3The regression line of Violence upon Preference
- The REGRESSION LINE is the line that fits the points best according to the LEAST SQUARES criterion.
4The regression line
[Figure: the regression line of Y (Violence) on X (Exposure), with points P and Q marked and intercept 2.091; the axes start at (0, 0).]
5Errors or residuals
[Figure: errors or residuals about the regression line of Y (Violence) on X (Exposure); intercept 2.091; the axes start at (0, 0).]
6Variance accounted for
- An important aspect of regression is accounting
for VARIANCE in the target or criterion variable
in terms of regression upon the independent
variable or predictor.
7The total variance of the target variable
These are the deviations of the scores from the MY line, the horizontal line through the mean MY of the target variable. From them, the total variance of the scores on the target variable can be calculated.
8Prediction without regression
- When the variables show no association, the slope of the regression line is zero and the line runs horizontally through the mean MY of the criterion or dependent variable.
- The intercept (B0) is MY in this case.
9Deviations from MY of points on the regression
line
From these deviations, the variance attributable
to regression can be calculated.
10Regression line residuals
Here are the errors or residuals when regression
is used. A variance estimate can be calculated
from the residuals.
11Variance accounted for by regression
- The deviations from MY of points on the line are
the basis of the variance of the criterion
variable Y accounted for by regression on X.
12Coefficient of determination
- The PROPORTION of the variance of the target or
criterion variable (Actual Violence) accounted
for by regression on the regressor or independent
variable (Exposure) is known as the COEFFICIENT
OF DETERMINATION and is the square of the Pearson
correlation.
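The decomposition described on the last few slides can be sketched in a few lines of Python. This is only an illustration: the eight pairs of scores below are invented, not the lecture data, and all variable names are my own. It fits the least-squares line, splits the total sum of squares into a regression part and a residual part, and checks that the proportion accounted for by regression matches the square of the Pearson correlation.

```python
from math import sqrt

# Hypothetical (exposure, violence) scores, invented for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 3, 5, 4, 6, 8, 7, 9]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sums of squares and cross-products about the means.
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
ss_total = sum((b - my) ** 2 for b in y)  # deviations of scores from MY

# Least-squares slope and intercept.
slope = sxy / sxx
intercept = my - slope * mx
predicted = [intercept + slope * a for a in x]

# Deviations from MY of points on the line, and residuals about the line.
ss_regression = sum((p - my) ** 2 for p in predicted)
ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))

r = sxy / sqrt(sxx * ss_total)          # Pearson correlation
r_squared = ss_regression / ss_total    # coefficient of determination

print("r squared from sums of squares:", round(r_squared, 4))
print("square of Pearson r:           ", round(r ** 2, 4))
```

The two printed values agree, and the regression and residual sums of squares add up to the total.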
13Coefficient of determination (r²)
14In our example
15Theoretical importance of r²
- Often the predictions of individual cases from regression aren't very accurate.
- Researchers are usually more interested in the coefficient of determination, because it provides a measure of the extent to which a predictor or regressor variable accounts for variance in the target variable.
- If r² is high, the inference is that the abilities involved in the regressor are implicated in the target variable.
16Lecture 11: RUNNING SIMPLE REGRESSION ON SPSS
17Finding regression
18Saving the predictions and residuals
- It's useful to save the predicted values and residuals (errors) from the regression equation.
- Click the Save button at the foot of the Linear Regression dialog to save the predicted values.
19The regression coefficients
- The only entries that concern us just now are those under the B column (Unstandardised Coefficients).
- We see that B0 = 2.091 and B = 0.736. These are, respectively, the intercept and the slope of the regression equation.
20Coefficient of determination
- The values given for the Pearson correlation and the coefficient of determination r² have been ringed.
21Data View
- I have renamed the estimates of Y and the residuals in the two rightmost columns.
- Notice that the residuals are Actual Violence scores minus the predictions from regression.
- When a residual has a negative sign, it means that the actual Y value was below the regression line.
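The saved columns can be reproduced by hand. The sketch below uses the intercept 2.091 and slope 0.736 read from the B column; the two (Exposure, Actual Violence) cases are hypothetical values chosen only to show one positive and one negative residual, and the function names are my own.

```python
B0 = 2.091  # intercept, from the B column of the coefficients table
B = 0.736   # slope, from the B column of the coefficients table


def predict(exposure):
    """Predicted Violence score from the regression equation."""
    return B0 + B * exposure


def residual(exposure, actual_violence):
    """Actual score minus the prediction from regression."""
    return actual_violence - predict(exposure)


# Hypothetical cases (Exposure, Actual Violence), for illustration only.
cases = [(4, 7.0), (6, 5.0)]

for exposure, actual in cases:
    res = residual(exposure, actual)
    position = "above" if res > 0 else "below"
    print(f"Exposure {exposure}: predicted {predict(exposure):.3f}, "
          f"residual {res:.3f} ({position} the line)")
```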
22Importance of regression
- Regression is used extensively in some areas of psychological research, such as health psychology.
- You will be hearing much more about it next year.
23Revision of the normal curve
24Hypothesis testing
- In my final lecture, I shall be considering the t-testing procedure that you used to analyse the data from the inverted faces and landscapes experiment.
- This will involve a more general discussion of HYPOTHESIS TESTING.
- This afternoon, I shall try to lay the foundations.
25Normal distribution
- A NORMAL DISTRIBUTION is symmetrical and bell-shaped.
- If a variable is normally distributed, 95% of values lie within 1.96 standard deviations (approximately 2) on EITHER side of the mean.
[Figure: normal curve with 0.95 (95%) of the area between mean - 1.96 SD and mean + 1.96 SD, and .025 (2½%) in each tail.]
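The 95% figure can be checked numerically. This is a small sketch using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); it is not part of the SPSS workflow used in the lecture.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area within 1.96 SD on either side of the mean, and in the upper tail.
central_area = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
upper_tail = 1 - std_normal.cdf(1.96)

# The same central area using the rounder value of 2 SD.
central_area_2sd = std_normal.cdf(2) - std_normal.cdf(-2)

print(round(central_area, 4), round(upper_tail, 4), round(central_area_2sd, 4))
```

Using 2 instead of 1.96 gives a slightly larger central area, which is why 2 is described as an approximation.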
26Specifying a normal distribution
- Suppose that a variable X has a normal distribution with mean µ and standard deviation σ.
- We write this as X ~ N(µ, σ).
27The standard normal variable z
- If X is a normal variable, that is, X ~ N(µ, σ), then z = (X - µ)/σ will also be normally distributed.
- z is known as the STANDARD NORMAL VARIABLE.
28Standardisation
- Strictly speaking, z is defined in relation to the theoretical normal population mean µ.
- However, any set of scores X can be STANDARDISED by subtracting the sample mean from each score and dividing by the sample standard deviation.
29Effects of standardisation
- Standardising a set of scores (or a population of scores) has two effects:
- The mean becomes zero.
- The standard deviation becomes 1.
30The standard normal distribution
- In the notation I introduced earlier, we can represent the standard normal distribution as z ~ N(0, 1).
31Distribution of z
- Standardising a set of scores does NOT make them normally distributed.
- If there's a tail to the right (positive skew) before transforming X to z, there will be one after the transformation.
- Nevertheless, whatever the shape of the original distribution, the mean standardised score will be zero and the standard deviation will be 1.
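A quick simulation illustrates both points. The sketch below draws positively skewed scores from an exponential distribution (my choice, purely for illustration), standardises them with the sample mean and sample SD, and compares a simple skewness index before and after. The helper names are hypothetical.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical positively skewed scores (exponential distribution).
scores = [random.expovariate(1.0) for _ in range(5000)]


def standardise(values):
    """Subtract the sample mean and divide by the sample SD."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def skewness(values):
    """Mean cubed standardised score: positive for a tail to the right."""
    return mean(z ** 3 for z in standardise(values))


z_scores = standardise(scores)

print("mean of z:", round(mean(z_scores), 6))
print("SD of z:  ", round(stdev(z_scores), 6))
print("skew before:", round(skewness(scores), 3))
print("skew after: ", round(skewness(z_scores), 3))
```

The mean and SD change to 0 and 1, but the skewness index is the same before and after: the shape of the distribution is untouched.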
32Referring to z
- A question about the probability of a range of
values of ANY normally distributed variable X
with mean µ and standard deviation σ can be
translated into a question about an equivalent
range of the standard normal variable z.
33Using z
Probability that X (IQ) lies between 70 and
130 AND ALSO Probability that z lies between
-1.96 and 1.96.
Probability that X (IQ) is at least 130 AND ALSO
Probability that z is at least 1.96
[Figure: normal curve with 0.95 of the area between X (IQ) = 70 and 130, that is, 100 ± 1.96 SD; equivalently between z = -1.96 and z = 1.96.]
34Referring questions from X to z
- What is the probability of an IQ of at least 130?
- This is to ask about the probability that X is at least 130, where X ~ N(100, 15).
- Transform X to z: z = (130 - 100)/15 = 2.
- We know from memory that the probability of z greater than 2 (actually 1.96) is .025.
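The same steps can be written out in Python with `statistics.NormalDist`. This is a sketch rather than the method used in the lecture, and it shows what the from-memory value is rounding: .025 belongs to z = 1.96, whereas the exact upper-tail probability for z = 2 is a little smaller, about .023.

```python
from statistics import NormalDist

MEAN_IQ, SD_IQ = 100, 15

# Step 1: transform X to z.
z_value = (130 - MEAN_IQ) / SD_IQ

# Step 2: upper-tail probability for that z on the standard normal curve.
p_from_z = 1 - NormalDist().cdf(z_value)

# Cross-check: the same probability taken directly from N(100, 15).
p_direct = 1 - NormalDist(mu=MEAN_IQ, sigma=SD_IQ).cdf(130)

print("z =", z_value)
print("Pr(IQ >= 130) =", round(p_from_z, 4))
```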
35Probability of an IQ between 100 and 130?
- Convert these values to values of z.
- If X = 100, z = 0.
- If X = 130, z = 2.
- Pr(z between 0 and 2) = 0.95/2 = 0.475.
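[Figure: normal curve with 0.95 (95%) of the area between µ - 1.96 SD and µ + 1.96 SD on the X axis, equivalently between z = -1.96 and z = 1.96, and .025 (2½%) in each tail.]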
36Finding the probability of a range of values of
X
- In the problems we have considered, the value of z has always been around 2 (about 1.96), so that we can find the probability from memory.
- Suppose z = 1, 0.5, or any value other than 1.96?
- Just standardise the value of X by converting it to z: z = (X - mean)/SD.
- There are tables available in standard statistics textbooks which give probabilities of ANY specified range of values of z. You can also use the SPSS cumulative distribution function CDF to find such probabilities.
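For readers without SPSS or tables to hand, the same look-up can be sketched in Python. The function name `prob_between` is hypothetical; it standardises both ends of the range and takes the difference of two cumulative probabilities, which is what a table look-up does.

```python
from statistics import NormalDist


def prob_between(low, high, mean, sd):
    """Probability that a normal variable lies between low and high."""
    z_low = (low - mean) / sd
    z_high = (high - mean) / sd
    std_normal = NormalDist()
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


# IQ between 100 and 115 corresponds to z between 0 and 1.
p_z1 = prob_between(100, 115, mean=100, sd=15)

# IQ between 100 and 107.5 corresponds to z between 0 and 0.5.
p_z_half = prob_between(100, 107.5, mean=100, sd=15)

print(round(p_z1, 4), round(p_z_half, 4))
```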
37Tables
- There are countless normal distributions.
- But there is only ONE standard normal distribution, to which any of the others can be transformed by z = (X - mean)/SD.
- So only the probabilities of ranges of values of z need to be tabled.
- It would not be feasible to table the probabilities for ALL possible normal distributions.
38To sum up
- If we know the DISTRIBUTION of some variable, we can assign a probability of obtaining a value within a specified range.
- We can visualise the probability of such a value as the area under the curve of the distribution.
- If the distribution is normal, we can translate probability questions in the original units of measurement into questions about ranges of z, which, provided X is normally distributed, has the STANDARD NORMAL DISTRIBUTION.
39Random variables
- A RANDOM VARIABLE is defined in the context of an experiment of chance, such as rolling a die or tossing a coin.
- Let X be the result of rolling a die. X is a random variable.
- Let Y be the number of heads in 10 tosses of a coin. Y is a random variable.
40Gathering data
- The gathering of data is an experiment of chance.
- Essentially, you are SELECTING observations at random from POPULATIONS of possible observations.
- The value of a variable that you are observing is a RANDOM variable.
- In the Caffeine experiment, the value of a score under the Caffeine condition is a random variable, as is that of a score under the Placebo condition.
41Drawing tickets
- This barrel contains numbered tickets.
- I draw a ticket at random. Let X be the number
on the ticket. X is a random variable.
42Drawing tickets
- Early studies of probability involved drawing real tickets from real barrels. Nowadays, we use the computer.
- Computers enable us to specify the distribution from which we are drawing our ticket.
43Using the computer
- The 4000 IQs were sampled by first placing 4000 numbers in the column named Ones. Any numbers will do.
- Now we are going to draw another sample of 4000 IQs from the same population.
44Resampling from the IQ population
45Descriptives
- I put some more ones in the first column: 48,000, actually!
- As expected with samples this size, their means and SDs are very similar and close to the theoretical mean of 100 and standard deviation of 15.
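The same experiment of chance can be run outside SPSS. This sketch uses Python's `random.gauss` to draw two samples of 4000 IQs from a normal population with mean 100 and SD 15; the seed, the function name and the variable names are my own, and the exact figures will differ from those in the SPSS output.

```python
import random
from statistics import mean, stdev

random.seed(2024)  # fixed seed so the run is repeatable


def sample_iqs(n, mu=100, sigma=15):
    """Draw n 'tickets' at random from a normal IQ population."""
    return [random.gauss(mu, sigma) for _ in range(n)]


iq = sample_iqs(4000)
iq_two = sample_iqs(4000)

for name, sample in (("IQ", iq), ("IQtwo", iq_two)):
    print(f"{name}: mean {mean(sample):.2f}, SD {stdev(sample):.2f}")
```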
46Independent random variables
- Here are two barrels, each containing numbered tickets.
- Let X be the number of the ticket that I draw from the first barrel. Let Y be the number of the ticket that I draw from the second barrel.
- X and Y are INDEPENDENT random variables, because the value of X does not determine the value of Y or vice versa.
Let X be a value drawn from this barrel.
Let Y be a value drawn from this barrel.
47The sum of random variables
- X and Y are random variables.
- Let SUM = X + Y.
- SUM is also a random variable.
48The variance of the sum
- The variance of the sum of INDEPENDENT random
variables is the sum of their variances.
49Create the SUM
- In a fourth column, the sum of IQ and IQtwo will appear.
- You can see that the variance of SUM is very close to twice the variance of either IQ or IQtwo.
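A simulated version of this check, as a sketch: two independent samples of 4000 IQs are drawn with `random.gauss` (my stand-in for the SPSS columns IQ and IQtwo), added case by case, and the three sample variances compared. The seed is arbitrary.

```python
import random
from statistics import variance

random.seed(7)  # fixed seed so the run is repeatable

n = 4000
iq = [random.gauss(100, 15) for _ in range(n)]
iq_two = [random.gauss(100, 15) for _ in range(n)]

# Fourth column: the case-by-case sum of the two independent variables.
total = [a + b for a, b in zip(iq, iq_two)]

var_iq, var_iq_two, var_total = variance(iq), variance(iq_two), variance(total)

print(f"variance of IQ:    {var_iq:.1f}")
print(f"variance of IQtwo: {var_iq_two:.1f}")
print(f"variance of SUM:   {var_total:.1f}")
print(f"sum of variances:  {var_iq + var_iq_two:.1f}")
```

In a sample the agreement is only approximate; in the population the variance of the sum is exactly 15² + 15² = 450.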
50Taking a sample
51Variance of ΣX
52Adding and multiplying by a constant
Adding a constant k: the mean M becomes M + k; the variance σ² is unchanged.
Multiplying by a constant k: the mean M becomes kM; the variance σ² becomes k²σ².
53Effect upon the variance of multiplying by a
constant
- When you multiply by a constant, you multiply the
variance by the SQUARE of the constant.
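A small numerical check of both rules, using six made-up scores and k = 3 (any scores and any constant would do):

```python
from statistics import mean, pvariance

scores = [2, 4, 4, 5, 7, 8]  # hypothetical scores, for illustration only
k = 3

shifted = [s + k for s in scores]  # adding a constant
scaled = [s * k for s in scores]   # multiplying by a constant

print("original:   mean", mean(scores), "variance", pvariance(scores))
print("added k:    mean", mean(shifted), "variance", pvariance(shifted))
print("times k:    mean", mean(scaled), "variance", pvariance(scaled))
```

Adding k moves the mean but leaves the variance alone; multiplying by k multiplies the mean by k and the variance by k².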
54Variance of the mean
The variance of the means is the original variance σ² divided by n, the size of the sample.
55The variance of the mean
- The barrel on the right contains means of samples of size n drawn from the barrel on the left.
- The variance of the means is the original variance divided by n.
[Figure: left barrel, X with variance σ²; right barrel, M with variance σ²/n.]
56Standard error of the mean
- The STANDARD ERROR OF THE MEAN σM is the square root of the variance of the mean.
57Sampling distribution
- The distribution of a STATISTIC such as the mean is known as its SAMPLING DISTRIBUTION.
- If X is normally distributed, then the mean M is also normally distributed.
- The sampling distribution of M is normal and centred on the mean µ of the variable X.
- The variance of the sampling distribution of M is σ²/n.
- The standard deviation of the sampling distribution of the mean (the standard error of the mean) σM is σ/√n.
58Sampling distribution of the mean
59Effect of increasing n
- IQ ~ N(100, 15)
- Let M4, M16 and M64 be means of samples of size n = 4, 16 and 64 respectively.
- The values of σM are, respectively, 7.5, 3.75 and 1.88.
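These three values follow directly from dividing the population SD by the square root of n. A minimal sketch (the function name is my own):

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of the mean for samples of size n."""
    return sd / sqrt(n)


for n in (4, 16, 64):
    print(f"n = {n:2d}: standard error = {standard_error(15, n)}")
```

Quadrupling the sample size halves the standard error each time; the last value, 1.875, is the 1.88 quoted above after rounding.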
60Effect of sample size n
61Effect of increasing the sample size n
62Using z
Probability that X (IQ) lies between 70 and
130 AND ALSO Probability that z lies between
-1.96 and 1.96.
Probability that X (IQ) is at least 130 AND ALSO
Probability that z is at least 1.96
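[Figure: normal curve with 0.95 of the area in the centre, shown against three aligned axes: X from µ - 1.96σ to µ + 1.96σ; M from µ - 1.96σM to µ + 1.96σM; z from -1.96 to 1.96.]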
63Referring to z
- A question about a range of values of ANY normally distributed variable can always be translated into a question about a range of values of the standard normal variable z.
- Just subtract the mean and divide by the standard deviation.
- If your question is about a range of values for the MEAN, you must divide by the STANDARD ERROR, not the original population SD.
64Question
- If I select 9 IQs at random and take their mean
M, what is the probability that M is at least
110?
65Answer
66Important
- If your question is about MEANS, divide by the STANDARD ERROR OF THE MEAN σM, not the standard deviation of the original population.
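The question about the mean of 9 IQs can be worked through as a sketch in Python, assuming IQ ~ N(100, 15) as elsewhere in the lecture. The only change from the single-score case is that z is formed with the standard error rather than the population SD.

```python
from math import sqrt
from statistics import NormalDist

MEAN_IQ, SD_IQ = 100, 15
n = 9

# Standard error of the mean for samples of 9.
se_mean = SD_IQ / sqrt(n)

# Standardise M = 110 with the standard error, NOT the population SD.
z_value = (110 - MEAN_IQ) / se_mean

# Upper-tail probability on the standard normal curve.
p_mean_at_least_110 = 1 - NormalDist().cdf(z_value)

print("standard error:", se_mean)
print("z:", z_value)
print("Pr(M >= 110):", round(p_mean_at_least_110, 4))
```

Here the standard error is 5 and z is 2, so the probability is about .023, or roughly .025 using the from-memory value for z near 2. Dividing by 15 instead would give z = 0.67 and a far larger, incorrect probability.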
67The difference between 2 random variables
- Let D = X - Y.
- D is also a random variable.
68The variance of the difference
- The variance of the difference between 2
independent random variables is the SUM of their
variances.
69Explanation
70Demonstration
- The Descriptives procedure shows that the variance of the difference is approximately equal to the variance of the sum.
- In the population, they are EXACTLY equal.
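One way to see why the two population variances are exactly equal is to treat the difference as a sum in which Y has been multiplied by the constant -1. The steps below are a sketch of that argument, writing Var for the population variance and assuming X and Y are independent; they use only the two rules already stated (variances of independent variables add, and multiplying by k multiplies the variance by k²).

```latex
\begin{aligned}
\operatorname{Var}(X - Y)
  &= \operatorname{Var}\bigl(X + (-1)Y\bigr) \\
  &= \operatorname{Var}(X) + \operatorname{Var}\bigl((-1)Y\bigr)
     && \text{(independence)} \\
  &= \operatorname{Var}(X) + (-1)^{2}\operatorname{Var}(Y)
     && \text{(multiplying by } k = -1\text{)} \\
  &= \operatorname{Var}(X) + \operatorname{Var}(Y)
   = \operatorname{Var}(X + Y).
\end{aligned}
```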
71Summary
- Review of regression.
- The COEFFICIENT OF DETERMINATION r² is the proportion of the variance of the target, criterion or dependent variable accounted for by regression.
- The gathering of data is an EXPERIMENT OF CHANCE, to which the concept of RANDOM VARIABLE is applicable.
72Summary
- The variance of the sum of INDEPENDENT random variables (separate barrels) is the sum of their variances.
- The variance of the difference between 2 independent random variables is also the SUM of their variances.
- The distribution of means is the SAMPLING DISTRIBUTION OF THE MEAN.
- The STANDARD ERROR OF THE MEAN is the SD of the sampling distribution of the mean.
73Summary
- If the parent population is normal, so is the sampling distribution of the mean.
- A question about the probability of a range of values of ANY normal variable can be referred to a question about ranges of z, the STANDARD NORMAL VARIABLE.
- When standardising a value of M, however, use the standard ERROR, not the SD of the parent population.
74Question
- What is the probability that a random sample of
16 IQs will have a mean between 92.5 and 100?