Statistics - PowerPoint PPT Presentation

Provided by: ITS2

1
Statistics
2
OUTLINE
  • Why learn statistics?
  • Sampling
  • Measures of Central Tendency
  • Margin of Error, Confidence Levels, and Standard
    Deviation
  • Sample Size
  • Correlation and Causation
  • Systematic and Random Errors
  • Precision and Accuracy

3
WHY LEARN STATISTICS
  • Using statistics can help us make decisions.
  • For example, if you are choosing between two
    careers, you might want to know:
      a) Which one has better prospects for finding a
         job?
      b) Which one pays better?
      c) Which one provides better job satisfaction?

4
Simply asking people who work in those jobs gives
a lot of different answers.
But statistically analyzing the data from a
survey gives you a firm basis for making your
decision.
5
Mars meteorite
O.J. Simpson Trial
Louise Woodward Trial
Did scientists agree in all of these cases?
6
Why do they disagree?
Scientists may believe that other investigators
made mistakes in doing one of the following
7
By learning some basics about how to analyze
data, you will be in a better position to
understand the disagreements between scientists
and to draw your own conclusions.
8
SAMPLING
Let's say you want to find out about a career as
an INFORMATION TECHNOLOGIST.
You cannot question all of the information
technologists in the country! You cannot even
question all of those in Atlanta!
Instead, you must do a survey of a few
information technologists.
The same problem occurs in finding scientific
information.
9
SAMPLING FOR SCIENTIFIC STUDY
  • What proportion of the dogwood trees in Georgia
    are infected with the disease anthracnose?
  • How much pollution do cars produce for each mile
    traveled?
  • How much water does the average person use each
    day?
  • How much carbon dioxide do our power plants
    release into the air each year?

10
It would be very expensive, very time consuming,
or impossible to survey all of the dogwood trees
or measure pollution levels of all cars.
Instead, we take a SAMPLE, checking only a
portion of the dogwood trees or cars. We
assume this sample is representative of all of
the dogwood trees in Georgia, or of all cars.
That is, if 50% of the dogwood trees in our
sample are infected, then 50% of the dogwoods in
Georgia are infected.
11
BUT ...
  • how do we know this is true?
  • If the only trees we examined were in Cobb
    County, could we really extrapolate our findings
    to all of Georgia?
  • No, we must have a RANDOM SAMPLE.
  • That is, every dogwood tree in the state must
    have an equal chance of being part of our sample.
  • If the sample is not random, the data are not
    useful.

12
How do we make sure we have a RANDOM SAMPLE?
  • Ensuring that samples are random is difficult,
    and failing to do so is a common error.
  • For sampling trees, one might divide the state up
    into 10-acre plots, number them, decide which
    ones to check by drawing numbers from a hat, then
    survey only those plots.
  • How might you sample pollution rate of cars?
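
The drawing-numbers-from-a-hat procedure for the tree plots can be sketched in Python. This is a minimal sketch, assuming the state has already been divided into numbered plots; the plot count, the sample size, and the function name `choose_plots` are hypothetical and do not come from the presentation.

```python
import random

def choose_plots(total_plots, sample_size, seed=None):
    """Pick plot numbers so that every plot has an equal chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    # random.sample draws without replacement, like pulling numbers from a hat
    return sorted(rng.sample(range(1, total_plots + 1), sample_size))

# Hypothetical figures: 10,000 numbered plots, 50 of them surveyed
plots_to_survey = choose_plots(total_plots=10_000, sample_size=50, seed=1)
print(plots_to_survey[:5])
```

Because the computer, not the surveyor, picks the plots, convenient locations (such as a single county) get no extra weight.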

13
MEASURES OF CENTRAL TENDENCY
When you take an exam, what is the first piece of
information you look for after you find out your
grade?
Most students want to know the average.
Why? Because they want to know how their grade
compares with that of most of the other students.
14
When scientists do a study, they also want to
know about the average of something - the average
size, the average number, the average whatever.
For example, consider the study of the proportion
of infected dogwood trees in Georgia.
  • Not all of the plots checked will have the same
    percent of trees infected.
  • Graphing the number of plots that show each of
    the different percentages produces a bell-shaped
    curve.

Examination of this curve (which is from
imaginary data) indicates that the percent of
infected trees in most plots is somewhere near
50%.
15
Three measures of this central tendency are
the MEAN, the MEDIAN, and the MODE.
  • MEAN: the average; the sum of the results from
    each sample divided by the number of samples.
  • MEDIAN: the middle value, or middle result; 50%
    of the samples have that value or more and 50%
    have that value or less.
  • MODE: the most common result.

16
Example
  • Average age in this class: 21.7
  • Median age in this class: 20
  • Mode of ages in this class: 19
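
The three measures can be computed with Python's standard `statistics` module. The list of ages below is hypothetical: the presentation does not give the individual ages, so these ten values were chosen only so that the results match the class figures above.

```python
from statistics import mean, median, mode

# Hypothetical ages, chosen only so the results match the slide's figures
ages = [18, 19, 19, 19, 20, 20, 21, 22, 24, 35]

print(mean(ages))    # sum of the ages divided by how many there are
print(median(ages))  # middle value once the ages are sorted
print(mode(ages))    # most common age
```

In this made-up list a single older student pulls the mean above the median, which is one reason the three measures can differ.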
17
MARGIN OF ERROR, CONFIDENCE LEVELS, AND STANDARD
DEVIATION
  • Margin of Error
  • Averaging the results of the samples gives an
    answer that is close to the actual percent of
    infected trees.
  • But because the answer comes only from samples,
    it is imprecise. It is not exact.
  • Yet the real answer is probably close to it.

18
For example, if our sample shows that an average
of 50% of the dogwood trees in our sample are
infected, then we can be fairly sure that the
real answer is between 40% and 60%. This range
is described by the margin of error, which is
usually expressed as the average ± (plus or minus)
some number. Here we would say that 50% ± 10% are
infected.
19
Let's say that you hear that the proportion of the
population favoring a particular candidate is 43%
± 7%. This means that the actual proportion favoring
that candidate lies between what two numbers?
20
Answer: Between 36% and 50%.
If the average number of offspring for the
females of a breed of rabbit was presented as 12
± 5, within what range is the actual average?
Answer: Between 7 and 17.
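
The arithmetic behind both answers is the same: subtract the margin for the low end and add it for the high end. A minimal sketch follows; the function name `error_range` is made up for illustration.

```python
def error_range(average, margin):
    """Return the (low, high) range implied by average ± margin."""
    return average - margin, average + margin

print(error_range(43, 7))   # candidate support, in percent
print(error_range(12, 5))   # rabbit offspring per female
```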
21
CONFIDENCE LEVEL
  • Let's say that, based on our sample, we decided
    that the proportion of dogwood trees that are
    infected with anthracnose is 50% ± 10%.
  • Next we must ask how much we trust these figures.
    A measure of our trust in their reliability is
    called the confidence level.
  • For example, we may be 90% sure, or 60% sure, or
    only half sure.
  • Confidence levels are usually expressed in
    decimals, so 90% sure is a confidence level of
    0.90.
  • Scientists usually require a confidence level of
    0.95.

22
How do we find the confidence level?
One way is to use the standard deviation.
  • Like the margin of error, the standard deviation
    is a certain number of units on each side of the
    mean.
  • The standard deviation is found using a
    mathematical formula.
  • The confidence level can be found by checking how
    many standard deviations there are within the
    margin of error we have chosen.

23
The lines enclosing the blue area represent one
standard deviation above and below the mean.
  • 1 SD (blue): about 68% confidence level
  • 2 SD (green): about 95% confidence level
  • 3 SD (yellow): more than 99% confidence level
24
Normal Distribution
In order to make the connection between
confidence levels and standard deviation, your
data must be close to a normal distribution.
25
How are margin of error and standard deviation
related?
If the margin of error extends to one standard
deviation from the mean, the confidence level is
about 0.68. If the margin of error extends to two
standard deviations from the mean, the confidence
level is about 0.95. If the margin of error extends
to three standard deviations from the mean, the
confidence level is more than 0.99.
26
EXAMPLE
  • If the mean of our samples shows that 50% of the
    trees in Georgia have anthracnose, and the
    standard deviation is found mathematically to be
    5%, then
  • we have a confidence level of about 0.68 that the
    actual proportion of infected trees in Georgia is
    between 45% and 55% (or 50% ± 5%).
  • we have a confidence level of about 0.95 that the
    actual proportion is between 40% and 60% (or 50%
    ± 10%).
  • we have a confidence level of more than 0.99 that
    the actual proportion is between 35% and 65% (or
    50% ± 15%).
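
The link between standard deviations and confidence levels can be checked numerically. This is a sketch that assumes the data follow a normal distribution, as the presentation requires; it uses the standard result that the fraction of a normal distribution within k standard deviations of the mean is erf(k/√2). The function name `confidence_within` is made up for illustration. The results (about 0.683, 0.954, and 0.997) are close to the rounded confidence levels quoted above.

```python
from math import erf, sqrt

def confidence_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

mean_pct, sd_pct = 50, 5  # figures from the dogwood example
for k in (1, 2, 3):
    low, high = mean_pct - k * sd_pct, mean_pct + k * sd_pct
    print(f"{k} SD: {low}% to {high}%, confidence {confidence_within(k):.3f}")
```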

27
SAMPLE SIZE
  • The last section showed us that the size of the
    margin of error is affected by the standard
    deviation. The smaller the standard deviation,
    the smaller the margin of error.
  • To increase precision, we want to have a smaller
    margin of error. (An infection rate of 50% ± 5%
    is much more precise than one of 50% ± 15%.)
  • To have a smaller margin of error, we need to
    have a smaller standard deviation.
  • Is there a way to decrease the size of the
    standard deviation?
  • Yes, we can increase the sample size (if the
    distribution is normal).

28
Sample Size and Standard Deviation
  • The larger the sample size, the smaller the
    standard deviation of the estimated average
    (often called the standard error).
  • Using the infection rate of trees as an example:

    Sample size (number of plots counted)      50     100
    Standard deviation                         7%     2.5%
    Margin of error (0.95 confidence level)   ±14%    ±5%

  • By doubling the sample size, the estimate of the
    infection rate has been tightened from 50% ± 14%
    to 50% ± 5%.
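
How the margin of error shrinks with sample size can be sketched with the textbook formula for the standard error of the mean, the sample standard deviation divided by the square root of the sample size. This is an assumption brought in for illustration, since the presentation does not state its formula. The 20-point plot-to-plot spread and the function names are hypothetical. The figures in the table above are illustrative and are not reproduced by this formula.

```python
from math import sqrt

def standard_error(sample_sd, n):
    """Standard error of the mean: spread of the sample average, not of single plots."""
    return sample_sd / sqrt(n)

def margin_of_error_95(sample_sd, n):
    """Approximate 0.95 margin of error: two standard errors either side of the mean."""
    return 2 * standard_error(sample_sd, n)

# Hypothetical plot-to-plot standard deviation of 20 percentage points
for n in (50, 100, 200):
    print(n, round(margin_of_error_95(20, n), 2))
```

Under this formula the margin of error falls with the square root of the sample size, so quadrupling the number of plots is needed to halve it.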

29
CORRELATION AND CAUSATION
You may have heard that:
  • People who listen to classical music make better
    grades.
  • Plants grow better when people talk to them.
  • People who use cell phones a lot get brain cancer
    more often.
These are all examples of correlations that are
taken to imply causation.
30
CORRELATIONS
  • Two items are correlated when they vary or change
    in synch with each other.

If both increase or decrease together, there is a
positive correlation. For example, during the
summer in Atlanta, temperatures and smog both
increase.
If they move in opposite directions, there is a
negative correlation. For example, more exercise
is associated with a lower rate of death from
heart disease.
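
One common way to put a number on positive and negative correlation is the Pearson correlation coefficient, which runs from +1 (both rise together) through 0 (no linear relationship) to -1 (one rises as the other falls). The presentation does not name a specific measure, so this choice is an assumption, and all of the readings below are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient: +1 perfect positive, -1 perfect negative."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up readings for illustration only
temperature_f = [70, 75, 80, 85, 90, 95]
smog_index = [30, 34, 41, 47, 58, 66]
exercise_hours = [0, 1, 2, 3, 4, 5]
heart_deaths_per_1000 = [9, 8, 8, 6, 5, 4]

print(pearson_r(temperature_f, smog_index))              # positive
print(pearson_r(exercise_hours, heart_deaths_per_1000))  # negative
```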
31
How do we know if two things are correlated?
  • Typically, we set up an investigation or an
    experiment.

We may survey students to see who listens to
classical music and compare the GPAs of those who
do and who do not.
We may talk regularly to one set of plants but
not to another and then measure their growth.
We may compare the smog levels on hot days and on
cold days in various parts of the country.
32
Then we check the results and see how different
the results for the two groups of data are.
If smog only occurs on days above 90 degrees,
then temperature and smog are positively
correlated.
If the growth of the two groups of plants is
exactly the same, then there is clearly no
correlation.
But what if the GPAs of students who listen to
classical music are only a little higher than
those of students who don't? Is there really a
correlation?
33
Statistics to the rescue!
Just as there are ways to measure confidence
levels for finding averages, there are tests to
measure the probability of correlations.
Like confidence levels, these probabilities are
expressed in decimals. A 30% probability, for
example, is given as 0.30.
Again, scientists will accept two variables as
being correlated only if the probability of a
correlation is at least 95%, or 0.95.
34
The t-test
One kind of statistical test used in the
laboratory is called a t-test.
The t-test is a bit backward in that its result
tells us how likely it is that a difference as
large as the one observed would arise by chance
alone if the two variables were not correlated.
For example, if the t-test produces a result of
0.20, then a difference this large would turn up
by chance about 20% of the time even with no real
relationship. That is weak evidence of a
correlation.
For a t-test, the lower the result, the stronger
the evidence that the two variables are correlated.
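
The kind of probability such a test reports can be illustrated without special libraries. Note that the sketch below is not a t-test: it swaps in an exact permutation test, a different technique that asks the same question (how often would a difference this large appear if group membership made no difference?). The GPA values and the function name are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference between group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    at_least_as_large = 0
    splits = 0
    # Try every way of relabelling the pooled values into two groups
    for indices in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in indices)
        diff = abs(a_sum / n_a - (total - a_sum) / n_b)
        splits += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_large += 1
    return at_least_as_large / splits

# Hypothetical GPAs for illustration only
listeners = [3.4, 3.1, 3.6, 3.2, 3.5]
non_listeners = [3.3, 3.0, 3.4, 3.1, 3.2]
print(permutation_p_value(listeners, non_listeners))
```

As with the t-test, a lower result means stronger evidence that the two groups really differ.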
35
DOES CORRELATION MEAN CAUSATION?
If two variables are correlated, does this mean
that one causes the other?
If the students listening to classical music have
higher GPAs, does this mean listening to the
music HELPED the students study better?
Not necessarily. It could be that their parents
value education more, so the students study
harder.
Or maybe they are more intelligent and therefore
appreciate the complexity of classical music.
36
When two variables are correlated, it is always
tempting to assume that changes in one cause
changes in the other, but establishing this
usually requires further investigation.
37
Errors
In every scientific investigation, there are
errors in the data that might obscure the
results. These errors occur due to factors that
are not well controlled.
Ex. Measurement of the height of a ball that is
dropped
Ex. Not enough genetic diversity in a population
38
Random Errors
  • Due to uncontrolled factors that randomly change
    sign and magnitude each time the experiment is
    run
  • Usually due to the inaccuracy of measuring
    devices
  • Effect can be minimized by running the
    experiment many times and averaging the results
Ex. Imprecise reading of thermometer measurement
Ex. The amount of water given to the ferns
39
Systematic Errors
  • Due to factors for which there is no accounting
    in the experiment
  • Always bias the result in a particular direction
  • There is no way to remove their effect; the best
    that can be done is to estimate the effect
Ex. Miscalibrated ruler
Ex. Air resistance in a ballistics experiment
40
Precision and Accuracy
  • Precision - how close the results from various
    runs are to one another
  • Accuracy - how close the average result is to
    the true value
  • Random errors affect precision; systematic
    errors affect accuracy
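
The difference between the two kinds of error can be seen in a small simulation. This is a sketch under stated assumptions: the true height, the size of the bias, and the noise level are all hypothetical, and the random error is assumed to be normally distributed.

```python
import random
from statistics import mean, stdev

def simulate_measurements(true_value, bias, noise_sd, runs, seed=0):
    """Simulate repeated measurements with a systematic bias and random noise."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(runs)]

true_height = 2.00  # metres; hypothetical true value
readings = simulate_measurements(true_height, bias=0.05, noise_sd=0.02, runs=1000)

spread = stdev(readings)               # precision: set by the random error
offset = mean(readings) - true_height  # accuracy: set by the systematic error
print(f"spread {spread:.3f} m, offset {offset:.3f} m")
```

Averaging many runs cancels most of the random noise, but the offset caused by the bias remains however many runs are averaged.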
Pictures courtesy of Dr. Gary Lewis