Qualitative (Categorical) Data - PowerPoint PPT Presentation

1
Practical Applications of Statistics, or: So you've
got data... NOW WHAT?
With special thanks to Herb McGrath and Laura
Simon of the Penn State Stats Dept.
2
What is Statistics?
Statistics is a branch of
mathematics concerned with interpreting data.
  • Statistics guides all aspects of your Science
    Fair project
  • Experiment Design/Data Collection
  • Summarizing Data
  • Interpreting Data
  • Drawing Conclusions from Data

3
Data is a good start. Now for a good finish!
  • Turning data into information
  • Showing that an experimental factor had an effect
  • Inferring relationships between factors
  • Determining how confident you can be in your
    conclusions

4
Segment I: Understanding Error
  • It is important to minimize error and to understand
    it!
  • Some data is inherently biased:
  • Selection bias
  • Measurement bias

5
Example: Sampling Bias
  • Dog Obedience Experiment: 1000 people are invited
    to a public park to demonstrate the ability of
    their dogs to respond to commands. 500 turn out
    with their dogs. The experimenters document the
    number of commands given to each dog and the
    number that it responds to appropriately. They
    find that the dogs respond an astonishing 71% of
    the time! What's wrong with this study?

6
Example: Measurement Bias
  • The dogs are weighed to determine if large dogs
    respond to commands better than small dogs. It is
    later determined that the scale was a postage
    scale and couldn't weigh packages above 10 lbs.
    All dogs weighing more than 10 lbs were recorded
    as weighing 10 lbs.

7
Segment I: Understanding Error
  • Some data is inherently biased:
  • Selection bias
  • Measurement bias
  • All unbiased data still has a STANDARD ERROR OF
    MEASUREMENT
  • This is the precision to which the
    instrument/experimenter can record the
    data
  • Measurements are randomly distributed around the true
    value
  • Note: Most systems also have randomness of their
    own

8
Example: Random Error
  • On the other hand, when have you ever seen a dog
    stand still on a scale? The dogs shake, and it is
    impossible to measure their weight to better than
    an accuracy of 5.

The Bottom Line: Minimize your sources of error
and understand the error that you can't get rid
of.
9
Segment I: Understanding Error (continued)
  • A typical data point differs from the mean of
    the data by about one standard deviation

10
Example: Fish Catches
Vessel A and Vessel B caught the same
average number of fish every day but obviously
have very different typical days. There are
ways to quantify these differences.
11
The Frequency Histogram
[Figure: frequency histogram. x-axis: average fish weight in lbs. (bin width 0.5 lbs.), 0 to 10; y-axis: number of members in bin, 0 to 250.]
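A frequency histogram is built by counting how many measurements fall into each bin. The following is a minimal sketch of that counting step; the function name `bin_counts` and the weights are made up for illustration and are not the data behind the slide.

```python
import math
from collections import Counter

def bin_counts(values, bin_width=0.5):
    """Count how many values fall into each bin [k*w, (k+1)*w).

    Returns a dict that maps each bin's left edge to its count.
    """
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {k * bin_width: counts[k] for k in sorted(counts)}

# Hypothetical fish weights in lbs. (illustration only).
weights_lbs = [1.2, 1.4, 2.0, 2.1, 2.3, 2.4, 3.7]

for left_edge, n in bin_counts(weights_lbs).items():
    print(f"{left_edge:4.1f} to {left_edge + 0.5:4.1f} lbs: {n}")
```

Plotting the counts as bars, one per bin, gives the histogram.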
12
The Gaussian Distribution
[Figure: histogram. x-axis: weight in lbs. (bin width 0.5 lbs.), 0 to 10; y-axis: number of members in bin, 0 to 250.]
13
The Gaussian Distribution
[Figure: smooth Gaussian curve. x-axis: weight in lbs. (no bins), 0 to 10; y-axis: likelihood (arbitrary units), 0 to 0.4.]
14
The Gaussian Distribution
15
The Gaussian Distribution
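[Figure: Gaussian curve divided into bands one standard deviation wide. On each side of the mean, 34.1% of the data lies within one standard deviation, 13.6% between one and two, and 2.14% between two and three.]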
16
Three Types of "Average"
  • The mean: Bill Gates walks into a coffee shop and
    the average person in the shop is rich!
  • The median: Half of all people are taller than me
    and half are shorter. I am of average
    height.
  • The mode: The largest number of golfers shoot an
    88.

Important: In most data sets, most data points are not
at the average.
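The three averages can be compared with Python's standard `statistics` module. This is only a sketch; the net worths and golf scores are invented to mirror the examples above.

```python
import statistics

# Hypothetical net worths (dollars) of five coffee shop customers.
customers = [30_000, 35_000, 40_000, 45_000, 50_000]
# The same shop after one billionaire walks in (made-up figure).
with_billionaire = customers + [100_000_000_000]

# The mean jumps enormously; the median barely moves.
print("mean:  ", statistics.mean(customers), "->", statistics.mean(with_billionaire))
print("median:", statistics.median(customers), "->", statistics.median(with_billionaire))

# Hypothetical golf scores; the mode is the most common value.
golf_scores = [85, 88, 88, 90, 88, 92]
print("mode:  ", statistics.mode(golf_scores))
```

The median resists extreme values, which is why it is often preferred for skewed data such as incomes.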
17
Example: Fish Catches
If the typical data point is not at the
average of the data, then where is it?
Deviation: The distance of a measurement
from the mean.
Variance: The sum of the squared deviations of n
measurements from the mean, divided by n-1.
Standard deviation: The square root of the
variance.
18
What is a standard deviation?
  • It is the typical (standard) difference
    (deviation) of an observation from the mean.
  • Think of it as the average distance a data point
    is from the mean, although this is not strictly
    true.

To calculate the standard deviation (s):
  • Subtract the mean from each data point and square
    the difference.
  • Sum the squares.
  • Divide the sum by the number of samples minus one.
    This gives the variance.
  • Take the square root of the variance.
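The calculation can be sketched step by step as follows. The daily catches for the two vessels are hypothetical numbers chosen so that both have the same mean, and `sample_std` is an illustrative name.

```python
import math
import statistics

def sample_std(data):
    """Sample standard deviation, written out one step at a time."""
    mean = sum(data) / len(data)
    squared_deviations = [(x - mean) ** 2 for x in data]
    variance = sum(squared_deviations) / (len(data) - 1)  # divide by n - 1
    return math.sqrt(variance)                            # square root of the variance

# Hypothetical daily catches: same mean (50), very different spread.
vessel_a = [48, 50, 52, 49, 51]
vessel_b = [10, 90, 30, 70, 50]

print("Vessel A: mean", statistics.mean(vessel_a), "s =", round(sample_std(vessel_a), 2))
print("Vessel B: mean", statistics.mean(vessel_b), "s =", round(sample_std(vessel_b), 2))
```

The standard library function `statistics.stdev` performs the same calculation and can be used as a check.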
19
So how do our fishing boats compare?
20
So you have the mean (µ) and the standard
deviation (s). Now what?
The standard score: z = (data point - mean) /
standard deviation
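As a small illustration, the standard score of every point in a data set could be computed like this. The function name and the fish weights are assumptions made for the example.

```python
import statistics

def standard_scores(data):
    """Return the z score of each data point: (x - mean) / standard deviation."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 in the denominator)
    return [(x - mean) / s for x in data]

# Hypothetical fish weights in lbs.
weights = [4.0, 5.0, 6.0]
print(standard_scores(weights))
```

A z score of 0 means the point sits exactly at the mean; a z score of 1 means it lies one standard deviation above it.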
21
Common Question: What value and error bars should
I report for this data?
  • For each data set, find the mean.
  • For each data set, find the standard deviation.
  • Plot the mean as a dot.
  • Draw error bars extending 2 standard deviations above and below the
    mean.
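The numbers needed for such a plot can be sketched as below. The plant heights and fertilizer labels are hypothetical, and `mean_with_error_bars` is an illustrative helper, not part of any library.

```python
import statistics

def mean_with_error_bars(data, n_sd=2):
    """Return (mean, lower, upper), with bars n_sd standard deviations long."""
    mean = statistics.mean(data)
    half_width = n_sd * statistics.stdev(data)
    return mean, mean - half_width, mean + half_width

# Hypothetical plant heights (cm) for two fertilizer amounts.
data_sets = {
    "0 g": [10.0, 11.0, 12.0],
    "5 g": [14.0, 16.0, 18.0],
}

for label, heights in data_sets.items():
    mean, lower, upper = mean_with_error_bars(heights)
    print(f"{label}: mean {mean:.1f}, error bar from {lower:.1f} to {upper:.1f}")
```

Each mean becomes one dot on the plot, and the lower and upper values mark the ends of its error bar.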

22
Lab 1: Make a Scatter Plot of Data with Error Bars
  • Open c:\StatsLab.xls
  • Sheet 1 contains some data. Use the functions
    average() and stdev() to put the means and
    standard deviations where they are indicated.
  • Make a scatter plot of the means versus the
    independent variable, the quantity of fertilizer.
  • Add error bars of 2s.

23
Common Question: Is there really a difference
between these two data sets?
  • Experimental design: the control versus the
    experimental.
  • Collect data from both samples.
  • Find the mean and standard deviation of each.
  • Determine the standard score (z) of the one
    versus the other.
  • If one mean is 2 standard deviations away from
    the other's mean, then the sets probably show a
    difference.
  • The confidence with which you can say that the
    two are different depends on z
  • z = 1: about 68% confidence; z = 2: about 95%
    confidence; z = 3: about 99.7% confidence
  • To find your confidence to report, look up a
    table of STANDARD NORMAL PROBABILITIES

But what if you have very few data points?
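Instead of a printed table, the two-sided confidence for a given z can be computed from the error function in Python's `math` module. This is a sketch, and `two_sided_confidence` is an illustrative name.

```python
import math

def two_sided_confidence(z):
    """Fraction of a standard normal distribution lying within
    z standard deviations of the mean (both sides)."""
    return math.erf(abs(z) / math.sqrt(2))

for z in (1, 2, 3):
    print(f"z = {z}: {100 * two_sided_confidence(z):.1f}% confidence")
```

This applies only when a Gaussian curve describes the data well, which is why small samples need a different treatment.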
24
Lab 2: Determine whether data set differences are
statistically significant.
  • The second worksheet contains the data from
    Lab 1.
  • Compute the standard score (z) for the
    differences between each pair of data sets.
  • Which of the data sets cannot be shown to be
    different from the others with 95% confidence?

26
  • William Sealy Gosset, who published as "Student" (1876-1937)

27
Student's t-Test
  • There's always a certain chance that you just drew a weird
    sample.
  • That chance goes up as the sample size goes down.
  • With too few samples, the data won't be well described by a Gaussian
    curve.
  • Student published an alternative: the t-distribution.

28
Student's t-Table
29
Lab 3: Student's t-Test
  • In Lab 3, another, much smaller data set from
    another fertilizer is to be analyzed. Can it be
    concluded that the other fertilizer had a
    statistically significant effect on the growth?
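One way to approach a comparison like this is the classic two-sample t statistic with a pooled variance. The sketch below assumes the lab compares a control group against a fertilized group and that both groups share the same variance. The heights are hypothetical (they are not the data in StatsLab.xls), and `pooled_t` is an illustrative name.

```python
import math
import statistics

def pooled_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a two-sample Student's t-test
    that assumes both samples share the same variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variances (n - 1)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err
    return t, df

# Hypothetical plant heights (cm).
control = [1.0, 2.0, 3.0]
fertilized = [2.0, 3.0, 4.0]

t, df = pooled_t(control, fertilized)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The absolute value of t is then compared with the t-table entry for that number of degrees of freedom. With 4 degrees of freedom the two-sided 95% entry is 2.776, so a |t| of about 1.22 would not count as a significant difference.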

30
Now to Change Gears: Tests for Correlation
  • Sometimes it's difficult to perform a controlled
    experiment.
  • Samples collected from nature
  • Studies involving the weather
  • Anything where you ask people to do something
    over a long period of time
  • So what do you do? You look for correlation.

31
Positive Correlation: Two Phenomena Rise and Fall
Together
32
Negative (Inverse) Correlation: One Phenomenon
Rises as the Other Falls
33
Strong vs. Weak Correlations
34
Common Question: Are these phenomena correlated?
  • Pearson's Product-Moment Correlation Coefficient
  • A.K.A. Correlation Coefficient or r
  • Never mind about this formula; it's easy in Excel
  • What is the statistical significance?
  • Use Student's t-table, but with n-2 degrees of
    freedom
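For readers working outside Excel, the coefficient and its significance test can be sketched in a few lines. The study hours and exam scores are hypothetical, and both function names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom.
    Undefined when r is exactly 1 or -1."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical hours studied and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [60, 80, 90, 80, 90]

r = pearson_r(hours, scores)
t = t_from_r(r, len(hours))
print(f"r = {r:.3f}, t = {t:.3f} with {len(hours) - 2} degrees of freedom")
```

The resulting t is looked up in Student's t-table in the row for n-2 degrees of freedom.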

35
Common Problem: Fitting Data
  • Find a trend in the data
  • How much more do plants grow when fertilized?
  • Can be very complicated: requires deep
    understanding of the physical phenomenon
  • Simplest fit: Linear Least Squares
  • Accessible in Microsoft Excel
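A linear least squares fit finds the straight line that minimizes the sum of squared vertical distances to the data points. The following is a minimal sketch with hypothetical fertilizer and growth values; the function names are illustrative.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the best-fit straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variation in ys that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical fertilizer amounts (g) and plant growth (cm).
fertilizer = [1, 2, 3, 4, 5]
growth = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(fertilizer, growth)
print(f"growth = {slope:.2f} * fertilizer + {intercept:.2f}")
print(f"r squared = {r_squared(fertilizer, growth, slope, intercept):.2f}")
```

For a straight-line fit, r squared equals the square of Pearson's correlation coefficient.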

36
Lab 4: Determining Correlation
  • Make a scatter plot of the hours studied versus
    exam score columns.
  • Find the correlation coefficient by using the
    function Insert > Function > Statistical >
    PEARSON
  • Plot the linear least squares fit of the plot and
    the r² value on the graph.
  • Determine how confident we can be that the
    phenomena are linked.

37
Shifting Gears Again: Discrete Data
  • Until now, we assumed that all data was
    measurement data.
  • Now, we deal with nominal data:
  • Home runs hit by player.
  • Ice cream preference by country.

38
Categorical data classified as Nominal, Ordinal,
and/or Binary
[Diagram: categorical data divides into ordinal data and nominal data; each of these is either binary or not binary.]
39
Heart Attacks and Aspirin
Does taking aspirin affect the chances of
suffering a second heart attack? How sure can you
be?
40
χ² Analysis: Expectation vs. Observation
  • Compute the distribution you would expect if the data
    were purely random.

41
χ² Analysis: Expectation vs. Observation
The χ² value can now be looked up in a table to see its
significance. χ² has degrees of freedom, like
Student's t and Pearson's r. The degrees of
freedom = (rows - 1) × (columns - 1). The Excel
command is CHITEST.
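The expectation-versus-observation calculation can be sketched as follows for a table of counts. The counts are hypothetical and the function names are illustrative. The probability helper uses a shortcut that is valid only for 1 degree of freedom (a 2 × 2 table); larger tables need a χ² table or a statistics library.

```python
import math

def chi_square(observed):
    """Return (chi2, degrees_of_freedom) for a table of observed counts.
    Expected counts assume that rows and columns are unrelated."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

def p_value_one_df(chi2):
    """Chance of a chi-square this large or larger arising at random.
    Valid only for 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: rows = rainy / dry days, columns = fish / no fish.
observed = [[10, 20],
            [20, 10]]

chi2, df = chi_square(observed)
print(f"chi-square = {chi2:.2f} with {df} degree(s) of freedom")
print(f"probability of this arising by chance: {p_value_one_df(chi2):.4f}")
```

A small probability suggests that the row and column categories are linked rather than independent.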
42
Lab 5: χ² Analysis
  • Wally likes to fish, rain or shine. Some days he
    catches fish and other days he doesn't. Find out
    if there is a link between rainy days and Wally's
    luck with the rod and reel. What is the
    probability of such a pattern occurring by
    chance?

43
To Recap
  • Statistics is a branch of mathematics that is
    used to guide experimental design.
  • Larger sample sizes are better, provided they are unbiased.
  • Student's t-tests can check whether small data sets
    are significantly different.
  • For larger sets, use Gaussian z-tests.
  • All of this is available in Excel.

44
Stats Resources
  • CliffsQuickReview Statistics (10)
  • Spreadsheet programs
  • Lots of stats built into Excel
  • http://mail.pittstate.edu/winters/tutorial/
  • http://www.statsoft.com/textbook/stathome.html

52
Population
  • The set of data (numerical or otherwise)
    corresponding to the entire collection of units
    about which information is sought

53
Population Examples
  • Unemployment - Status of ALL employable people
    (employed, unemployed) in the U.S.
  • SAT Scores - Math SAT scores of EVERY person that
    took the SAT during 1997
  • Responses of ALL currently enrolled underage
    college students as to whether they have consumed
    alcohol in the last 24 hours

Traits of a Population are called Parameters
54
Sample
  • A subset of the population data that are actually
    collected in the course of a study.

[Diagram: the sample is a subset of the population.]
Example sample: 100 people with heart conditions given aspirin.
55
Sample Examples
  • Unemployment - Status of the 1000 employable
    people interviewed.
  • SAT Scores - Math SAT scores of 20 people that
    took the SAT during 1997
  • Responses of 538 currently enrolled underage
    college students as to whether they have consumed
    alcohol in the last 24 hours

Traits of Samples are called Statistics