Title: ARCH 2126/6126
1ARCH 2126/6126
- Session 7: Hypothesis-testing with numbers
2Sequence of topics so far
- Defining variables and values
- Measuring and recording
- Visualizing and summarizing data
- Measures of central tendency
- Measures of dispersion
- In fact, descriptive statistics
- Sampling
- Probability
3Descriptive statistics already begin the search
for pattern
- More explicitly theoretical approaches extend
that search
- A hypothesis is a proposition put forward to
explain observed facts
- To be useful in empirical studies, a hypothesis
must be testable
- Testable means open to disproof
- A theory is a set of hypotheses which have been
tested and not disproved
4M. R. Dawkins
- "Statistical description is the quest for
patterns or rules which permit reduction in the
quantity of data without undue loss of
information. Some reduction is essential if the
description is to be useful. One cannot publish
one's field notebook. Clearly some data must be
thrown away." (1974)
5Inferential statistics
- Depend on a notion of probability
- Sometimes we may have a model where all outcomes
are equally likely, e.g. dice
- Even simple models have non-obvious outcomes when
put together, e.g. 2 dice
- Artificial examples, e.g. dice, are used because
they are simple to understand, but there are
real-world counterparts, e.g. sex ratio
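The two-dice point can be checked by listing every outcome. The sketch below is only an illustration in Python; the function name `two_dice_distribution` is ours and not part of the course material. It assumes two fair dice, so every ordered pair of faces is equally likely, and shows that the totals are nevertheless not equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution(sides=6):
    """Exact probability of each total when two fair dice are rolled."""
    # Every ordered pair (first die, second die) is equally likely.
    counts = Counter(a + b for a, b in product(range(1, sides + 1), repeat=2))
    n_outcomes = sides ** 2
    return {total: Fraction(count, n_outcomes)
            for total, count in sorted(counts.items())}


# Print a rough text histogram: one '#' per way of making each total.
for total, prob in two_dice_distribution().items():
    print(f"{total:2d}  {str(prob):>5}  {'#' * round(prob * 36)}")
```

A total of 7 can be made in six ways out of 36, a total of 2 in only one way, so the combined model already has a peaked shape.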
6More complicated distributions
- What kind of pattern emerges in dice-rolling?
- Resemblance to the normal distribution?
- Any resemblance not coincidental
- Many real-world outcomes which result from
multiple independent processes approximate this
normal distribution
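One way to see the resemblance is to compare the exact probabilities for the total of several dice with a normal curve that has the same mean and SD. This is a sketch under our own assumptions (fair six-sided dice; the function names are hypothetical), not a method taken from the slides.

```python
from statistics import NormalDist


def dice_sum_pmf(n_dice, sides=6):
    """Exact probabilities for the total of n_dice fair dice."""
    pmf = {0: 1.0}  # before any die is rolled the total is 0 with certainty
    for _ in range(n_dice):
        new_pmf = {}
        for total, prob in pmf.items():
            for face in range(1, sides + 1):
                new_pmf[total + face] = new_pmf.get(total + face, 0.0) + prob / sides
        pmf = new_pmf
    return pmf


def max_gap_to_normal(n_dice, sides=6):
    """Largest absolute gap between the exact probabilities and a normal
    curve with the same mean and SD, evaluated at each possible total."""
    pmf = dice_sum_pmf(n_dice, sides)
    mean = sum(t * p for t, p in pmf.items())
    sd = sum((t - mean) ** 2 * p for t, p in pmf.items()) ** 0.5
    curve = NormalDist(mean, sd)
    return max(abs(p - curve.pdf(t)) for t, p in pmf.items())


for n_dice in (1, 2, 5, 10):
    print(n_dice, "dice: largest gap to normal curve =",
          round(max_gap_to_normal(n_dice), 4))
```

The gap shrinks as more independent dice are added together, which is the sense in which the resemblance is not coincidental.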
7Distributions and models
- Different variables may have different
distributions
- Different models may lead us to expect different
distributions
- Thus when testing the fit of a model to data, or
when testing the resemblance of one data-set to
another, our test will, where possible, take
distributions into account, and may make
assumptions
- On many models, not all outcomes are equally
likely, even with random sampling
8A sample statistic estimates a population
parameter
- E.g. a percentage or a mean
- Can we get an idea how minor or major the error
in estimation can be?
- An idea/model of distribution can help
- There are different ways in different cases to
estimate confidence intervals
9Confidence intervals
- Confidence intervals (or error ranges) are ranges
around the sample statistic
- They have defined probabilities of covering the
population parameter
- The higher the confidence the user requires of
covering the population parameter, the wider the
confidence interval will be
10In the case of a mean ...
- We are helped particularly by the finding that
the means of all possible re-samplings from a
distribution have a normal distribution
themselves
- Central limit theorem
- Using this, it can be shown that for large
samples this distribution has a SD of s/√n
(= standard error of the mean)
11Confidence intervals for population mean
- From SE of mean we can get confidence intervals
for mean
- 95% CI is given, in large samples, by mean ±
(SE × 1.96)
- 99% CI given by mean ± (SE × 2.58)
- 99.9% CI given by mean ± (SE × 3.29)
- These state the level of confidence with which we
can say that the population mean lies within the
stated limits
- Conditions apply: SRS (simple random sample) and
large n
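The multipliers 1.96, 2.58 and 3.29 come from the standard normal curve. As a hedged sketch, they can be recovered with Python's standard library; the helper names `z_multiplier` and `confidence_interval`, and the example mean and SE, are ours and purely illustrative.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Two-sided multiplier from the standard normal curve,
    e.g. 0.95 gives about 1.96."""
    tail = (1 - confidence) / 2            # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)


def confidence_interval(mean, se, confidence=0.95):
    """Large-sample interval: mean plus or minus multiplier times SE."""
    half_width = z_multiplier(confidence) * se
    return mean - half_width, mean + half_width


# Hypothetical sample mean of 54.0 with a standard error of 1.2.
for level in (0.95, 0.99, 0.999):
    low, high = confidence_interval(54.0, 1.2, level)
    print(f"{level:.1%} CI: {low:.2f} to {high:.2f} "
          f"(multiplier {z_multiplier(level):.2f})")
```

The printed intervals widen as the required confidence rises, as the confidence-interval slide states.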
13Central limit theorem
- Special batch consists of the means of all
possible samples of a given size that could be
drawn from a given population
- Mean of special batch = mean of population
- SD of special batch = SD of population / square
root of sample size
- Special batch has normal distribution, for large
sample size
- Large in this case means over about 30
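These properties can be checked by simulation. The sketch below does not enumerate all possible samples; it assumes that a large number of random samples (drawn with replacement) stands in for the special batch. The skewed artificial population and all function names are invented for illustration.

```python
import math
import random
import statistics


def make_skewed_population(size=10000, seed=1):
    """Invented, strongly skewed population (exponential, mean about 50)."""
    rng = random.Random(seed)
    return [rng.expovariate(1 / 50) for _ in range(size)]


def sample_means(population, sample_size, n_samples=5000, seed=2):
    """Means of many random samples, standing in for the special batch."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=sample_size))
            for _ in range(n_samples)]


population = make_skewed_population()
means = sample_means(population, sample_size=36)

print("population mean:      ", round(statistics.fmean(population), 2))
print("mean of sample means: ", round(statistics.fmean(means), 2))
print("population SD / sqrt(n):", round(statistics.pstdev(population) / math.sqrt(36), 2))
print("SD of sample means:   ", round(statistics.stdev(means), 2))
```

Even though the population itself is far from normal, the two pairs of printed figures should agree closely for a sample size of 36.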
14So let's use Excel to calculate a standard error
of a mean
- Enter an invented data set: 20 numbers
- Calculate mean and SD as before
- Take square root of N
- Divide SD by √N to get SE
- Multiply SE by 1.96
- Add that number to mean to get upper limit;
subtract it to get lower limit
- From upper to lower limit is the 95% confidence
interval: for large N, the chance that the true
population mean lies between those numbers is
95%
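The same steps can be followed outside Excel. Below is a minimal Python sketch of the procedure; the 20 values are invented (hypothetical scraper lengths in mm), the function name `mean_ci` is ours, and we assume the SD meant is the sample SD (n - 1 in the denominator).

```python
import math
import statistics

# Invented data set of 20 numbers (hypothetical scraper lengths in mm).
lengths = [52, 47, 61, 55, 49, 58, 63, 45, 50, 57,
           54, 60, 48, 53, 59, 51, 56, 62, 46, 55]


def mean_ci(data, multiplier=1.96):
    """Return (mean, SE, lower limit, upper limit) following the slide's steps."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)        # sample SD (assumed)
    se = sd / math.sqrt(n)             # divide SD by square root of N
    margin = multiplier * se           # multiply SE by 1.96
    return mean, se, mean - margin, mean + margin


mean, se, lower, upper = mean_ci(lengths)
print(f"mean = {mean:.2f}, SE = {se:.2f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

With only 20 values the 1.96 multiplier is a rough approximation, since the slides attach the 95% statement to large N.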
15So what kinds of hypothesis can we test?
- A common kind of hypothesis posits average
differences between groups, e.g. lengths of
scrapers, statures of people
- These might be important because they might
suggest different artefact traditions, different
nutrition/health conditions etc.
- Will use these as examples but many of the
points are more general
- E.g. we might have a hypothesis that longer
scrapers are broader, or made in a different stone
16Types of hypothesis
- In inferential statistics, we start from a base
you may find counter-intuitive
- We begin with the simplest possibility (the
so-called Occam's razor)
- This may be that, for example, two samples are
similar enough to have been drawn from the same
population
- Thus we have the null hypothesis or H0: the
proposition of "no difference", "no effect" or
"no association"
17Testing the null hypothesis
- If the null hypothesis can be disproved, then the
(more interesting) alternative hypothesis H1 -
that there is an effect, difference, association
etc. - becomes the simplest available hypothesis
- We cannot actually prove H1
- We also cannot show causes - we may be right for
the wrong reason
- But we can test hypotheses
18Imagine, then, two batches of numbers...
- ... representing, say, lengths of scrapers
excavated from two sites
- Are they the same or different?
- Well, you wouldn't expect them to be exactly the
same, would you?
- Why not? Sampling error, even if sampling from
one population
- So are they basically the same? Or are they
different in more than just sampling error?
19Statistical significance
- Or as a statistician would say, are the lengths
significantly different?
- "Significance" has become a term of art in
statistics and has a specific meaning, not
equivalent to social or biological or
archaeological significance
- Refers to a low probability that a difference
this large would arise if the two batches had
been drawn from the same population (or 2
populations with the same mean)
20Suppose we do re-sample from one population
- We will find that sample means vary
- But in a regular way that depends on sample size
and population distribution
- We can invent an artificial population and
re-sample from it to simulate this
- Can also show the mathematical properties of this
- Sample means cluster around the population mean
with given dispersion
21So the question is ...
- Are the two real sample means only as different
as you would expect by sampling error? Or more
different?
- In reality this is a question of probability: how
likely is it that the difference we observe could
have arisen by re-sampling from the same
population?
- A dramatic difference is obvious by eye, but
often they aren't so dramatic, so we test
statistically
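One simple way to put a number on this question is a permutation test. It is named here as our own illustrative choice, not necessarily the test the course goes on to use. The sketch assumes that, if both batches came from one population, any re-allocation of the pooled values into two batches of the original sizes would be equally likely. The two batches of lengths are invented.

```python
import random
import statistics


def permutation_probability(batch_a, batch_b, n_shuffles=10000, seed=0):
    """Share of random re-allocations of the pooled values whose difference
    in means is at least as large as the observed difference."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(batch_a) - statistics.fmean(batch_b))
    pooled = list(batch_a) + list(batch_b)
    n_a = len(batch_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


# Invented scraper lengths (mm) from two hypothetical sites.
site_1 = [52, 47, 61, 55, 49, 58, 63, 45, 50, 57]
site_2 = [56, 60, 51, 64, 59, 66, 54, 62, 58, 65]
print("probability of a difference this large by re-sampling:",
      permutation_probability(site_1, site_2))
```

A small result means the observed difference would rarely arise from sampling error alone; a large one means it is the kind of difference re-sampling produces routinely.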
22Significance and p-values
- If it is very unlikely that a difference as large
as we observe could have been produced by
re-sampling, we may be inclined to reject the
null hypothesis and say there is a significant
difference
- Convention is that if the probability (p) of
getting a difference at least this large when the
null hypothesis is correct is less than 5%
(p < 0.05), we say the result is "significant at
the 5% level"
23More on p-values
- Significance at the 5% level: p < 0.05
- Similarly for p < 0.01, p < 0.001 etc.
- The smaller the number, the stronger the
rejection of the null hypothesis
- In many publications you will see results
asterisked or bolded if they reach the 5% or
stated significance level, and ignored, omitted
or marked "N.S." if not
24The problem of many tests
- To accept any result where p < 0.05 as
significant also implies that on about 1/20
occasions a result will appear significant even
if H0 is true
- Psychiatrists seeking a difference between
schizophrenics and others did 77 tests and found
significant (p < 0.05) differences in 2 of these
tests
- Corrections can be made, or we can decide to
accept only p < 0.01 or less
- But still beware seeking significance!
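The arithmetic behind this warning is short. The sketch below assumes the tests are independent and that H0 is true in every one of them. The Bonferroni threshold is shown as one common correction; the slides do not say which correction is meant. All function names are ours.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Average number of tests that look significant when H0 is true in all."""
    return n_tests * alpha


def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one test looks significant
    (assumes the tests are independent)."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the overall risk at or below alpha."""
    return alpha / n_tests


n = 77  # the number of tests in the psychiatric example
print("expected false positives:", round(expected_false_positives(n), 2))
print("chance of at least one:  ", round(chance_of_any_false_positive(n), 3))
print("Bonferroni threshold:    ", round(bonferroni_threshold(n), 5))
```

With 77 tests at the 5% level, nearly four "significant" results are expected from chance alone, so finding 2 is no evidence of a real difference.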
25How likely is our test to mislead us?
- Risk of rejecting H0 when it is true is known as
a Type I Error
- It is given by the significance level (the
p-value threshold) we choose
- Risk of accepting H0 when it is false is
known as a Type II Error
- It is sometimes hard to calculate but clearly
rises, the lower the threshold we choose
- Its converse is the power of the test
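The trade-off between the two risks can be made concrete in one simple case. The sketch assumes a two-sided test on a mean whose sampling distribution is normal with a known standard error; the true difference and SE in the example are hypothetical, and `type_ii_risk` is our own name.

```python
from statistics import NormalDist


def type_ii_risk(true_diff, se, alpha=0.05):
    """Probability of failing to reject H0 when the true difference is
    true_diff, for a two-sided normal test at significance level alpha."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for alpha = 0.05
    shift = true_diff / se                       # true difference in SE units
    power = std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)
    return 1 - power


# Hypothetical case: the true difference is twice the standard error.
for alpha in (0.05, 0.01, 0.001):
    risk = type_ii_risk(true_diff=2.0, se=1.0, alpha=alpha)
    print(f"alpha = {alpha}: Type II risk = {risk:.3f}, power = {1 - risk:.3f}")
```

Tightening the threshold lowers the Type I risk but raises the Type II risk for the same data, which is why neither can be driven to zero at once.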
26What is so magic about 5%?
- Nothing
- It's just the number Sir R.A. Fisher thought of,
to represent low probability
- He felt that if that kind of difference only
occurred by chance once in 20 times, it was rare
enough to indicate against H0
- But of course this is arbitrary
27Classical statistics versus exploratory data
analysis
- The classical approach to hypothesis testing
just sketched was developed 1920-1950 by Fisher
and others
- Gives a clear yes/no, accept/reject
- But there is no certainty in statistics
- To force yes/no from analysis which really says
"highly likely" or "probably not" is artificial
- Exploratory data analysis advocates simply
stating the p-values, whatever