Title: Statistics, central tendency, samples and surveys
1Statistics
2- What is statistics
- Why teaching statistics in schools?
- 3 questions of statistics
- Hacking online dating
- Racism and Voting
- The mathematics of Love
- Collecting data on your body
- Don't be fooled by statistics
- Misleading statistics
- Using Statistics to make a TED Talk
3- Statistics: the collection, analysis,
interpretation, and presentation of data
4Measures of Central Tendency: Median, Mean,
Mode
These measures help us make sense of and understand
what data means. We must understand how
they work or we might be misled.
5The median
The median is the middle value of the data
set when the values are arranged in order, lowest
to highest.
If there is an even number of data values, the
median is the mean or average of the two middle
data points.
For example, the median of 3, 6, 7, 9, 9
is 7.
For example, the median of 3, 6, 7, 8, 9, 9
is (7 + 8) / 2, or 7.5.
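The median rule can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `median` is chosen here for convenience (Python's standard `statistics.median` does the same job), and the sketch assumes a non-empty list of numbers.

```python
def median(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)              # arrange lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # odd count: a single middle value
        return ordered[mid]
    # even count: average the two middle data points
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_example = median([3, 6, 7, 9, 9])
even_example = median([3, 6, 7, 8, 9, 9])
print(odd_example, even_example)
```

Sorting first matters: the middle position only means something once the values are in order.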
6The mean
The mean is most commonly referred to as the
average.
To calculate the mean of a set of values we add
together the values and divide by the total
number of values.
For example, the mean of 3, 6, 7, 9 and 9 is
(3 + 6 + 7 + 9 + 9) / 5 = 34 / 5 = 6.8
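The same add-then-divide calculation can be written as a short Python sketch. The function name `mean` is an illustrative choice, and the sketch assumes the list is not empty.

```python
def mean(values):
    """Add the values together and divide by how many there are."""
    return sum(values) / len(values)

example_mean = mean([3, 6, 7, 9, 9])   # (3 + 6 + 7 + 9 + 9) / 5
print(example_mean)
```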
7The mode
The mode is the data value that occurs the most
frequently.
A data set can have no mode, one mode, two modes (bimodal), or
more than two modes.
Mode is usually only used when the data can't be
ranked.
For example, the mode of the M&M
colors red, blue, red, green, yellow, blue,
green, red, yellow
is red (it occurs 3 times).
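Counting how often each value occurs can be sketched with `collections.Counter` from the Python standard library. The helper name `modes` is hypothetical; it returns every value tied for the highest count, so a bimodal data set yields two entries.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s) and how often they occur."""
    counts = Counter(values)                      # value -> number of occurrences
    highest = max(counts.values())
    most_common = [value for value, count in counts.items() if count == highest]
    return most_common, highest

colors = ["red", "blue", "red", "green", "yellow",
          "blue", "green", "red", "yellow"]
color_modes, color_count = modes(colors)
print(color_modes, color_count)
```

Because colors cannot be ranked, the mean and median are not available here, which is why the mode is the measure used.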
8An outlier is an element of data that distinctly
stands out from the rest of the data. Outliers
have more effect on the mean than on the median.
9Outliers, and their effect on the mean
Here are Joe's 1500-meter race results in minutes.
6.26 6.28 6.30 6.39 5.38 4.54
10.59 6.35 7.01
- Are there any extreme values, outliers?
10.59
- Will the mean be increased or reduced by the
extreme value?
Increase it
- the mean with the extreme value is 6.57 min
- the mean without the extreme value is 6.06 min,
- a difference of about 0.51 minutes, or about 30
seconds
10Outliers, and their effect on the median
Here are Joe's 1500-meter race results in minutes.
6.26 6.28 6.30 6.39 5.38 4.54
10.59 6.35 7.01
In order, lowest to highest:
4.54 5.38 6.26 6.28 6.30 6.35
6.39 7.01 10.59
- Will the median be increased or reduced by
the extreme value?
- the median with the extreme value is 6.30
minutes.
- the median without the extreme value is 6.29
minutes, a difference of only 0.01 minutes,
or about 0.6 of a second
Outliers: extreme values, or inconsistencies, in
the data
Outliers will affect the median much less than
the mean
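The comparison can be checked with Python's standard `statistics` module. This is a sketch under one assumption: the race results are treated as decimal minutes, exactly as they are written on the slide. The variable names are illustrative.

```python
from statistics import mean, median

times = [6.26, 6.28, 6.30, 6.39, 5.38, 4.54, 10.59, 6.35, 7.01]
without_outlier = [t for t in times if t != 10.59]   # drop the extreme value

# How far does the outlier move each measure?
mean_shift = mean(times) - mean(without_outlier)
median_shift = median(times) - median(without_outlier)

print("mean:  ", round(mean(times), 2), "vs", round(mean(without_outlier), 2))
print("median:", round(median(times), 2), "vs", round(median(without_outlier), 2))
print("shift in mean is larger:", mean_shift > median_shift)
```

The mean uses the size of every value, so one very slow race drags it upward; the median only depends on which value sits in the middle of the ordered list.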
11Measures of Spread describe the variation in
data values, how consistent they are
- RANGE
- STANDARD DEVIATION
- SKEWNESS
12Measures of Spread
- RANGE
  - highest value - lowest value = range
  - the larger the range, the more the values vary in size
- STANDARD DEVIATION
  - how closely values cluster around the mean value
- SKEWNESS
  - refers to symmetry of the curve
13The range
Here are the high jump scores for two girls in
meters.
Joanna 1.62 1.41 1.35 1.20 1.15
Kirsty 1.59 1.45 1.41 1.30 1.30
Find the range for each girl's results and use
this to find out who is more consistent.
Joanna's range: 1.62 - 1.15 = 0.47
Kirsty's range: 1.59 - 1.30 = 0.29
Kirsty's scores are more consistent; her range is smaller.
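The range calculation for the two jumpers can be sketched in Python. The function name `value_range` is an illustrative choice, and the results are rounded to two decimal places because the jump heights are recorded to the centimeter.

```python
def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

joanna = [1.62, 1.41, 1.35, 1.20, 1.15]
kirsty = [1.59, 1.45, 1.41, 1.30, 1.30]

joanna_range = round(value_range(joanna), 2)
kirsty_range = round(value_range(kirsty), 2)
more_consistent = "Kirsty" if kirsty_range < joanna_range else "Joanna"
print(joanna_range, kirsty_range, more_consistent)
```

Note that the range only looks at the two most extreme jumps, so it says nothing about how the jumps in between are spread.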
15Standard Deviation
Curve A (green): the data is clustered more closely around
the mean, so it has a smaller standard deviation than Curve B.
[Figure: Curve A and Curve B, labeled σA and σB]
16If the Standard Deviation is large, it means the
numbers are spread out from their mean. If the
Standard Deviation is small, it means the numbers
are close to their mean.
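A by-hand style calculation makes the large-versus-small contrast concrete. The sketch below is an illustration only: both data sets are made up for this example, and it uses the population formula (divide by the number of values). Many calculators also report a sample standard deviation, which divides by one less than the number of values.

```python
import math

def population_std_dev(values):
    """Square root of the average squared distance from the mean."""
    m = sum(values) / len(values)                         # step 1: the mean
    squared_distances = [(x - m) ** 2 for x in values]    # step 2: squared distance of each value
    variance = sum(squared_distances) / len(values)       # step 3: average of those squares
    return math.sqrt(variance)                            # step 4: square root

spread_out = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean 5
clustered = [4, 5, 5, 5, 5, 5, 5, 6]    # hypothetical data, mean 5

print(population_std_dev(spread_out), population_std_dev(clustered))
```

Both lists have the same mean, so the difference in the result comes entirely from how far the values sit from it.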
18In a normal distribution, the mean, median and
mode are equal (in real data, very close to equal).
The curve is symmetrical, not skewed at all.
19Skewed to the left: the long tail extends to the left,
so the bulk of the data values are to the right of the mean.
20Skewed to the right: the long tail extends to the right,
so the bulk of the data values are to the left of the mean.
21- Quartiles: ranked data divided into 4 equal parts
by the median, and the median of each side of the median
- Inter-Quartile Range (IQR) is also a measure of spread
- IQR = Q3 - Q1
- Like the median, the Inter-Quartile Range (IQR)
is more resistant to outliers
22Box and Whisker Plot
- Minimum: the lowest data value
- First Quartile (Q1): the median of the smaller
half of the data (bottom 25% point)
- Median
- Third Quartile (Q3): the median of the larger
half of the data (top 25% point)
- Maximum: the highest data value
23Box and Whisker Plot
- 3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7
- First, order your numbers from least to greatest
- 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14
24Median
- 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14
- Then find the median (from the ordered list)
- Cross off one number from each side until you
reach the middle number (or numbers).
- The median is 6.5
25Quartiles
- Split the numbers into the left and right sides of the
median
- Find the median for each half
- Left: 1, 2, 3, 4, 5, 6, 6    Right: 7, 8, 9, 10, 11, 13, 14
- Left median = 4    Right median = 10
- The left median is called the lower quartile or
1st quartile.
- The right median is called the upper quartile or
the 3rd quartile.
26Quartiles (page 3)
- Left: 1, 2, 3, 4, 5, 6, 6    Right: 7, 8, 9, 10, 11, 13, 14
- Left median = 4    Right median = 10
- The interquartile range is
- 3rd (upper) quartile - 1st (lower) quartile
- 10 - 4 = 6
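The split-and-find-the-median procedure can be sketched in Python. This follows the median-of-each-half method used on these slides; other tools (including some software defaults) use slightly different quartile rules and may give different values. The function name `quartiles` is hypothetical, and the sketch assumes at least two data values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-each-half method."""
    ordered = sorted(values)              # order from least to greatest
    half = len(ordered) // 2
    left = ordered[:half]                 # values on the left side of the median
    right = ordered[-half:]               # values on the right side of the median
    return median(left), median(ordered), median(right)

data = [3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7]
q1, med, q3 = quartiles(data)
iqr = q3 - q1                             # interquartile range
print(q1, med, q3, iqr)
```

When the count is odd, the slicing leaves the middle value out of both halves, which is one common convention.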
27Box and Whisker Plot
28Box and Whisker Plot
- 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14
- Put dots at the LOWER and UPPER Quartiles.
[Number line from 1 to 14]
29Box and Whisker Plot
- 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14
- Draw a box connecting the dots at the LOWER and
UPPER Quartiles.
[Number line from 1 to 14]
30Box and Whisker Plot
- 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14
- Put a dot at the median (6.5).
[Number line from 1 to 14]
31Box and Whisker Plot
- 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14
- Draw a line connecting the median to the box.
[Number line from 1 to 14]
32Box and Whisker Plot
- 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14
- Put circles at the high and low points.
[Number line from 1 to 14]
33Box and Whisker Plot
- 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11, 13, 14
- Draw lines that connect the high and low points
to the box.
[Number line from 1 to 14]
34Box and Whisker Plot
- 3, 5, 4, 2, 1, 6, 8, 11, 14, 13, 6, 9, 10, 7
[Number line from 1 to 14]
What is the interquartile range?
Q3 - Q1 = 10 - 4 = 6
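The drawing steps can be mimicked with a one-line text plot. This is only a rough sketch: the function `text_box_plot`, its `scale` parameter, and the characters used for each part are all choices made here for illustration. It takes the five values found in the example (minimum 1, lower quartile 4, median 6.5, upper quartile 10, maximum 14) as input.

```python
def text_box_plot(minimum, q1, med, q3, maximum, scale=4):
    """Draw a one-line box and whisker plot from five summary values."""
    def column(value):
        # Convert a data value to a character position on the line.
        return round((value - minimum) * scale)

    line = ["-"] * (column(maximum) + 1)      # whiskers run from low point to high point
    for i in range(column(q1), column(q3) + 1):
        line[i] = "="                         # box runs from Q1 to Q3
    line[column(minimum)] = "|"               # low point
    line[column(maximum)] = "|"               # high point
    line[column(q1)] = "["                    # lower quartile
    line[column(q3)] = "]"                    # upper quartile
    line[column(med)] = "M"                   # median
    return "".join(line)

plot = text_box_plot(1, 4, 6.5, 10, 14)
print(plot)
```

The width of the `[` to `]` section corresponds to the interquartile range, so a wider box means the middle half of the data is more spread out.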
35- Create a box and whisker plot on your Graphing
Calculator
- 1. Enter the data from water Temp St Petersburg Fla
in STAT L1
- 2. In STAT PLOT select box and whisker, 2nd last
- 3. Enter window values (Window):
  - Xmin = 50
  - Xmax = 90
  - Xscl = 5
  - Ymin = 0
  - Ymax = 1
  - Yscl = 1
  - Xres = 1
- 4. Graph, use trace to find quartiles
36- Find the mean, median, std dev, and IQR for the water temp of
Key West
- Enter the data from water Temp Key West in STAT L1
- STAT, CALC, 1: 1-Var Stats (1-variable statistics)
  - mean
  - standard deviation
  - Med: median
  - Q1: 1st quartile
  - Q3: 3rd quartile
37Finding Standard Deviation by Hand
38Samples and Surveys
- Population: all the members of a set
- Sample: part of the population
- You can get good statistical information about a
population by studying a sample
39Samples and Surveys
- Convenience Sample: select any members of the
population that are conveniently available
  - e.g. survey teachers at the side entrance
- Self-selected Sample: members of the population
must volunteer or self-select
  - e.g. send an email out to all teachers with a
link to an online survey they can take if they
want.
40Samples and Surveys
- Systematic Sample: order the population in some
way, then select from it at regular intervals.
  - e.g. each teacher has a mailbox, and they are
roughly in alphabetical order. Put a survey in
every 4th mailbox
- Random Sample: all members of the population are
equally likely to be chosen
  - e.g. all teachers' names are put in a hat, and 10
names are drawn
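The systematic and random methods can be sketched in Python. Everything here is illustrative: the list of 40 numbered teachers is a made-up population, and the function names, the `start` position, and the `seed` parameter (used only so a run can be repeated) are assumptions of this sketch.

```python
import random

# Hypothetical population: 40 teachers, already in mailbox order.
teachers = [f"Teacher {i:02d}" for i in range(1, 41)]

def systematic_sample(population, step, start=0):
    """Select members at regular intervals from an ordered population."""
    return population[start::step]

def random_sample(population, size, seed=None):
    """Draw names 'from a hat': every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, size)     # sampling without replacement

every_fourth = systematic_sample(teachers, step=4, start=3)   # 4th, 8th, 12th, ... mailbox
hat_draw = random_sample(teachers, size=10, seed=1)

print(every_fourth)
print(hat_draw)
```

Convenience and self-selected samples are not sketched because who ends up in them depends on circumstances (who walks by, who chooses to reply) rather than on a selection rule.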
41Bias
- Bias: a systematic error introduced by the
sampling method
  - e.g. sampling seniors about something that only
affects freshmen and sophomores.
42Bias
- A poorly written survey question can introduce
bias. You should avoid questions such as:
  - You don't think that lazy seniors should get
to leave school early, do you?
  - Do you believe it is right to murder inmates
on death row?
  - What do you think about our incompetent
administrative staff at this school?
43Samples and Surveys
44Samples and Surveys
45Samples and Surveys
46Study Methods