Title: Thinking Mathematically
Section 1: Sampling, Frequency Distributions, and Graphs
Statistics
- Statistics is the science of data. This
involves collecting, classifying, summarizing,
organizing, analyzing, and interpreting numerical
information.
Types of Statistics
- Descriptive statistics uses numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set, and to present that information in a convenient form.
  - Example: analyzing the scores on an exam to see how hard it was and how you might "curve" the grades.
- Inferential statistics uses sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.
  - Example: using census data to estimate what the makeup of the population will be in 2030.
Random Samples
- A random sample is a sample obtained in such a way that every element in the population has an equal chance of being selected for the sample.
- If we want to select a random sample from a large city to determine how the city's citizens feel about casino gambling, we might
  - randomly select neighborhoods of the city, and then
  - randomly survey people within the selected neighborhoods.
- If we only select specific neighborhoods, or the first 200 people we find in the telephone directory, then not everyone has an equal chance of being selected.
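A simple random sample, in which every element of the population has the same chance of being picked, can be sketched in Python with the standard library's `random.sample`. This is a minimal sketch: the population of resident ID numbers, the sample size of 200, and the helper name `simple_random_sample` are all hypothetical.

```python
import random

# Hypothetical population: ID numbers for 10,000 residents of the city.
population = list(range(1, 10_001))


def simple_random_sample(items, k, seed=None):
    """Pick k distinct items; every item has the same chance of being chosen."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(items, k)


survey = simple_random_sample(population, 200, seed=1)
print(len(survey), sorted(survey)[:5])
```

Each resident has a 200/10,000 = 2% chance of ending up in the survey, unlike the first 200 names in a directory, where everyone else has no chance at all.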
Describing Qualitative Data
- A class is one of the categories into which qualitative data can be classified.
- The class frequency is the number of observations in the data set falling in a particular class.
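Finding class frequencies amounts to counting how many observations land in each class. A minimal sketch using Python's `collections.Counter`; the survey responses below are hypothetical.

```python
from collections import Counter

# Hypothetical answers to "How do you feel about casino gambling?"
responses = ["favor", "oppose", "oppose", "undecided", "favor", "oppose"]

# Maps each class to its class frequency (number of observations in it).
class_frequency = Counter(responses)
for cls, freq in class_frequency.most_common():
    print(f"{cls:<10} {freq}")
```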
Histogram
- A histogram is like a bar graph: the horizontal axis is divided into specified intervals of equal width, known as measurement classes, and the vertical axis gives the frequency, or the proportion (relative frequency), for each interval of data.
- However, in a histogram adjacent bars touch, sharing a side, while in a bar graph the bars are separated.
[Figure: histogram]
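Building a histogram starts by tallying the data into equal-width measurement classes. The sketch below does that for hypothetical whole-number exam scores in classes of width 10 and prints a text histogram. The helper `class_counts` and the class boundaries are illustrative assumptions, not part of the original slides.

```python
# Hypothetical exam scores (whole numbers).
scores = [62, 67, 71, 74, 75, 78, 81, 83, 85, 88, 90, 94]


def class_counts(data, start, width, n_classes):
    """Count the values in each class [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_classes:  # values outside the chosen classes are skipped
            counts[i] += 1
    return counts


counts = class_counts(scores, start=60, width=10, n_classes=4)
for i, c in enumerate(counts):
    low = 60 + i * 10
    # Bar of '#' characters = frequency; number in parentheses = relative frequency.
    print(f"{low}-{low + 9}: {'#' * c:<5} ({c / len(scores):.2f})")
```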
Frequency Polygon
- A line graph called a frequency polygon can also be used to visually convey information.
- The axes are labeled just like those in a histogram. Once a histogram has been constructed, put a dot at the midpoint of the top of each rectangle.
- Connect consecutive dots with straight line segments.
- Finally, extend each end of the line down to touch the horizontal axis.
[Figure: frequency polygon]
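The vertices of a frequency polygon are the class midpoints at the height of each bar. In this sketch each end is brought down to the axis at the midpoint of the empty class just beyond it, which is an assumption about where "down to the horizontal axis" lands. The helper name and the class frequencies are hypothetical, and each class is treated as the interval [start, start + width).

```python
def frequency_polygon(start, width, counts):
    """Return (midpoint, frequency) vertices, closed at height 0 on both ends."""
    mids = [start + width * (i + 0.5) for i in range(len(counts))]
    points = list(zip(mids, counts))
    # Bring the line down to the horizontal axis one class width beyond each end.
    return [(mids[0] - width, 0)] + points + [(mids[-1] + width, 0)]


# Hypothetical class frequencies for classes 60-69, 70-79, 80-89, 90-99.
vertices = frequency_polygon(start=60, width=10, counts=[2, 4, 4, 2])
print(vertices)
```

The vertex list can be handed to any plotting tool that draws a line through points in order.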
Stem-and-Leaf Display
- Two columns are created, one for the stems and one for the leaves. Choose a place value for the stems in the left column; the digit in the next place value of each data item is then written as a leaf next to the appropriate stem in the right column.
- For example, if the data value is 32, the 3 may be designated as the stem, in which case the 2 would be the leaf. If the data value is 5.8, the ones place may be the stem and the tenths place the leaf, so the 5 would be in the stem column with the 8 next to it in the leaf column.
[Figure: stem-and-leaf display]
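For whole numbers with the tens digit as the stem, splitting each value into stem and leaf is a single `divmod(x, 10)`. A minimal sketch, assuming non-negative whole-number data; the data set and the helper name `stem_and_leaf` are hypothetical. Data such as 5.8 would first need to be scaled by 10.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Group non-negative whole numbers by tens digit (stem), ones digit as leaf."""
    display = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # e.g. 32 -> stem 3, leaf 2
        display[stem].append(leaf)
    return dict(display)


data = [32, 45, 38, 51, 47, 32, 59, 43]  # hypothetical data
for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(map(str, leaves)))
```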
Section 2: Measures of Central Tendency
The Mean
- The mean of a set of quantitative data is the sum of the measurements divided by the number of measurements contained in the data set. It is commonly called the average.
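The Mean
- The mean is the sum of the data items divided by the number of items: $\bar{x} = \frac{\sum x}{n}$, where $\sum x$ is the sum of the data items and $n$ is the number of items.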
Computing the Mean
- The sum of these numbers is 2907.
- There are 40 numbers.
- The mean is 2907/40 = 72.675.
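The mean is just a sum divided by a count. Here is a minimal Python sketch on a short hypothetical list, cross-checked against the standard library's `statistics.mean`.

```python
import statistics


def mean(data):
    """Sum of the data items divided by the number of items."""
    return sum(data) / len(data)


values = [72, 85, 64, 90, 77]  # hypothetical data items
print(mean(values))             # 388 / 5
print(statistics.mean(values))  # standard-library cross-check
```

With only the totals from the slide, the same division is `2907 / 40`, which gives 72.675.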
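Calculating the Mean from a Frequency Distribution
- Multiply each data value by its frequency and add the products:
  (85)(3) + (75)(5) + (70)(6) + (55)(3) + (25)(1) = 255 + 375 + 420 + 165 + 25 = 1240
- Add the frequencies to get the number of data items: 3 + 5 + 6 + 3 + 1 = 18
- Mean = 1240/18 ≈ 68.9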
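The same calculation can be sketched in Python with the frequency table stored as a dictionary that maps each data value to its frequency. The dictionary layout and the name `mean_from_frequency` are illustrative choices.

```python
def mean_from_frequency(table):
    """table maps each data value x to its frequency f; returns sum(x*f) / sum(f)."""
    total = sum(x * f for x, f in table.items())  # add up the products x * f
    n = sum(table.values())                        # total number of data items
    return total / n


# Data values and frequencies from the frequency-distribution example.
scores = {85: 3, 75: 5, 70: 6, 55: 3, 25: 1}
print(round(mean_from_frequency(scores), 1))
```

The same function computes any weighted average, such as a GPA, if the grade points are the values and the credit hours are the weights.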
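Computing a GPA (weighted average)
- Multiply the grade points for each course by its credit hours and add the products:
  9.99 + 5.33 + 16.00 + 9.00 + 9.32 = 49.64
- Total credit hours: 16
- GPA = 49.64/16 ≈ 3.10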
The Median
- To find the median of a group of data items:
  1. Arrange the data items in order, from smallest to largest.
  2. If the number of data items is odd, the median is the item in the middle of the list.
  3. If the number of data items is even, the median is the mean of the two middle data items.
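The three steps translate directly into a small Python function. This is a minimal sketch; the example lists are hypothetical.

```python
def median(data):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(data)  # step 1: arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:          # step 2: odd count -> the middle item
        return ordered[mid]
    # step 3: even count -> mean of the two middle items
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 2, 4, 7, 8, 7, 10]))  # odd number of items
print(median([50, 80, 60, 70]))        # even number of items
```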
Position of the Median
- If n data items are arranged in order, from smallest to largest, the median is the value in the $\frac{n+1}{2}$ position.
- Example with n = 12 data items: (12 + 1)/2 = 13/2 = 6.5, so the median falls between the 6th and 7th values.
- Median = (70 + 60)/2 = 65
Position of the Median
- If n data items are arranged in order, from smallest to largest, the median is the value in the $\frac{n+1}{2}$ position.
- Example with n = 13 data items: (13 + 1)/2 = 14/2 = 7, so the median is the 7th value.
- Median = 70
The Mode
- The mode is the data value that occurs most often in a data set.
- For example, the mode for the numbers 7, 2, 4, 7, 8, 7, 10 turns out to be 7, because the number 7 occurs three times, more than any other number.
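A minimal Python sketch of the mode using `collections.Counter`. It returns a list because a data set can have more than one mode; reporting every tied value is an assumption about how ties should be handled.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs most often in the data."""
    counts = Counter(data)
    top = max(counts.values())  # the highest number of occurrences
    return [x for x, c in counts.items() if c == top]


print(modes([7, 2, 4, 7, 8, 7, 10]))
```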
The Midrange
- The midrange is found by adding the lowest and highest data values and dividing the sum by 2.
The Midrange
- Midrange = (25 + 95)/2 = 120/2 = 60
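The midrange needs only the smallest and largest values. Here is a one-function Python sketch; the data list is hypothetical apart from sharing the example's lowest value (25) and highest value (95).

```python
def midrange(data):
    """(lowest data value + highest data value) / 2"""
    return (min(data) + max(data)) / 2


print(midrange([25, 40, 61, 95]))  # hypothetical data: lowest 25, highest 95
```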
Section 4: Measures of Dispersion
Dispersion
- The mean and the median are measures of central tendency: they describe the typical value, the one in the middle.
- We also want to know about the other values. Are there a lot of values much higher than the mean? Much lower?
- How spread out the values are is called dispersion.
The Range
- The range, the difference between the highest and lowest data values in a data set, indicates the total spread of the data.
- Range = highest data value - lowest data value
- For example, the ten most expensive markets for new homes in the U.S. have the following mean home costs, in thousands of dollars: 332, 256, 251, 235, 223, 215, 215, 213, 210, 210.
- The range in costs is 332 - 210 = 122, that is, $122,000.
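In Python the range is `max` minus `min`. The function is called `data_range` here only to avoid shadowing Python's built-in `range`.

```python
# Mean home costs in the ten most expensive markets, in thousands of dollars.
home_costs = [332, 256, 251, 235, 223, 215, 215, 213, 210, 210]


def data_range(data):
    """Highest data value minus lowest data value."""
    return max(data) - min(data)


print(data_range(home_costs))  # in thousands of dollars
```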
Deviation
- The range tells us about the highest and lowest values. But what about the others?
- The deviation of a data item is its distance (positive or negative) from the mean: data item - mean.
- If the mean is 62, a value of 98 is a distance of 36 from the mean (98 - 62 = 36), and a value of 55 is a distance of -7 from the mean (55 - 62 = -7).
- The deviation of a set of data is the sum of the deviations of its data items.
Deviation
- Data items: 95, 54, 66, 88, 63, 35, 40, 95, 90, 67
- There are 10 values and their sum is 693, so the mean is 693/10 = 69.3.
Data item   Deviation
   95            25.7
   54           -15.3
   66            -3.3
   88            18.7
   63            -6.3
   35           -34.3
   40           -29.3
   95            25.7
   90            20.7
   67            -2.3
- The deviations add up to 0, so the deviation of this data set is 0.
Deviation
- Unfortunately, the sum of the deviations always works out to be 0.
- It's totally useless as a measure of dispersion.
- For that and other reasons, dispersion is usually measured by what is called the standard deviation.
Computing the Standard Deviation for a Data Set
1. Find the mean of the data items.
2. Find the deviation of each data item from the mean: data item - mean.
3. Square each deviation: (data item - mean)^2.
4. Sum the squared deviations: add up all of the (data item - mean)^2 values.
5. Divide the sum in step 4 by n - 1, where n represents the number of data items.
6. Take the square root of the quotient in step 5. This value is the standard deviation for the data set:
   $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$, where x is a data item, $\bar{x}$ is the mean, and n is the number of data items.
Standard Deviation
- The same 10 data items, with mean 69.3:
Data item   Deviation   Squared deviation
   95            25.7              660.49
   54           -15.3              234.09
   66            -3.3               10.89
   88            18.7              349.69
   35           -34.3             1176.49
   63            -6.3               39.69
   40           -29.3              858.49
   95            25.7              660.49
   90            20.7              428.49
   67            -2.3                5.29
- Sum of the squared deviations: 4424.10
- 4424.10/9 ≈ 491.57
- The square root of 491.57 is about 22.17, so the standard deviation ≈ 22.17.
Standard Deviation
- Therefore we can say that the average value was around 69, with typical values falling around 22 points above or below that average.
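The six steps translate directly into a short Python function, cross-checked here against `statistics.stdev`, which also divides by n - 1. Computed from the ten data items in the example (95, 54, 66, 88, 35, 63, 40, 95, 90, 67), the mean is 693/10 = 69.3, the deviations sum to 0 apart from floating-point rounding, and the standard deviation comes out to about 22.17. The function name is an illustrative choice.

```python
import math
import statistics

# The ten data items from the standard-deviation example.
data = [95, 54, 66, 88, 35, 63, 40, 95, 90, 67]


def sample_standard_deviation(items):
    """Standard deviation with an n - 1 divisor, following steps 1-6."""
    n = len(items)
    mean = sum(items) / n                   # step 1: the mean
    deviations = [x - mean for x in items]  # step 2: data item - mean
    squares = [d * d for d in deviations]   # step 3: square each deviation
    total = sum(squares)                    # step 4: sum the squares
    variance = total / (n - 1)              # step 5: divide by n - 1
    return math.sqrt(variance)              # step 6: square root


m = sum(data) / len(data)
print(m)                                          # the mean
print(sum(x - m for x in data))                   # 0, up to floating-point rounding
print(round(sample_standard_deviation(data), 2))
print(round(statistics.stdev(data), 2))           # standard-library cross-check
```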
Section 5: The Normal Distribution
Normal Distribution
- Many data sets are distributed so that most values cluster around the average (the mean), with fewer and fewer values farther from it, symmetrically on either side.
- This type of distribution is called a normal distribution; its graph is a bell-shaped curve.
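[Figure: normal distribution, a bell-shaped curve marked at 1, 2, and 3 standard deviations on either side of the mean, with 68%, 95%, and 99.7% of the data within those bands]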
The 68-95-99.7 Rule for the Normal Distribution
- Approximately 68% of the measurements will fall within 1 standard deviation of the mean.
- Approximately 95% of the measurements will fall within 2 standard deviations of the mean.
- Approximately 99.7% (essentially all) of the measurements will fall within 3 standard deviations of the mean.
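The rule's percentages are rounded areas under the normal curve. The sketch below checks them with Python's `statistics.NormalDist` (Python 3.8+). It measures the area between -k and +k standard deviations and shows that the exact values are about 68.3%, 95.4%, and 99.7%.

```python
from statistics import NormalDist  # Python 3.8+

standard = NormalDist(mu=0, sigma=1)  # a normal curve measured in standard deviations

# Share of the area lying within k standard deviations of the mean.
shares = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```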
The 68-95-99.7 Rule for the Normal Distribution
[Figure: the 68-95-99.7 rule on a normal curve marked from 3 standard deviations below to 3 standard deviations above the mean]
Normal Distribution
- A good way to solve a normal distribution problem is to:
  - Find the values of the mean and standard deviation.
  - Fill those numbers into the normal distribution graph.
  - Identify the range of values you're interested in.
  - Highlight that region of the graph.
  - Read off the corresponding percentage.
Normal Distribution
- On the 1992 SAT, the mean score on the quantitative portion was 510, with a standard deviation of 90.
- What percentage of the students scored between 510 and 690?
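The 68-95-99.7 Rule for the Normal Distribution
- Mean = 510, standard deviation = 90.
- Percentage between 510 and 690?
[Figure: normal curve centered at 510 with marks at 420 and 600 (1 standard deviation), 330 and 690 (2 standard deviations), and 240 and 780 (3 standard deviations); the region from 510 to 690 is labeled 47.5%]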
Normal Distribution
- 690 is 2 standard deviations above the mean: 510 + 2(90) = 690.
- 95% of the students scored within 2 standard deviations of the mean.
- Because the curve is symmetric, half of those scored above the mean: 95%/2 = 47.5%.
- So about 47.5% of the students scored between 510 and 690.
Computing z-Scores
- A z-score describes how many standard deviations a data item in a normal distribution lies above or below the mean. The z-score can be obtained using
  $z = \frac{\text{data item} - \text{mean}}{\text{standard deviation}}$
- Data items above the mean have positive z-scores. Data items below the mean have negative z-scores. The z-score for the mean is 0.
Computing z-Scores
- In the previous example, a student scored 645. What was his z-score?
- z = (645 - 510)/90 = 135/90 = 1.5
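The z-score formula is a single expression in Python. In this sketch the function name `z_score` is illustrative, and the scores 420 and 510 are extra hypothetical inputs that show a negative z-score and the z-score of the mean.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


print(z_score(645, 510, 90))  # the student's score
print(z_score(420, 510, 90))  # below the mean -> negative
print(z_score(510, 510, 90))  # the mean itself -> 0
```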
Percentiles
- If n% of the items in a distribution are less than a particular data item, we say that the data item is in the nth percentile of the distribution.
- For example, if a student scored in the 93rd percentile on the SAT, the student did better than about 93% of all those who took the exam.
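Under the normal model, a data item's percentile is the percentage of the area under the curve to its left. In place of a printed table, this sketch uses Python's `statistics.NormalDist` to compute that area. It applies the model to the score of 645 from the z-score example (mean 510, standard deviation 90), which lands at about the 93rd percentile. The helper name `percentile_of` is hypothetical.

```python
from statistics import NormalDist  # Python 3.8+

# SAT quantitative scores: mean 510, standard deviation 90.
sat = NormalDist(mu=510, sigma=90)


def percentile_of(score, dist):
    """Percentage of a normal distribution lying below the given score."""
    return dist.cdf(score) * 100


p = percentile_of(645, sat)  # 645 has a z-score of 1.5
print(f"{p:.1f}% of scores fall below 645 (percentile: about {round(p)})")
```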
Finding the Percentage of Data Items between Two Given Items in a Normal Distribution
1. Convert each given data item to a z-score.
2. Use the table to find the percentile corresponding to each z-score in step 1.
3. Subtract the lesser percentile from the greater percentile and attach a % sign.
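The three steps can be sketched in Python, with `statistics.NormalDist().cdf` standing in for the printed table in step 2. That substitution is the one assumption here, along with the helper name `percent_between`.

```python
from statistics import NormalDist  # Python 3.8+


def percent_between(a, b, mean, sd):
    """Percentage of a normal distribution lying between data items a and b."""
    z_a, z_b = (a - mean) / sd, (b - mean) / sd       # step 1: z-scores
    standard = NormalDist()                           # standard normal curve
    p_a = standard.cdf(z_a) * 100                     # step 2: percentiles
    p_b = standard.cdf(z_b) * 100
    return abs(p_b - p_a)                             # step 3: greater minus lesser


print(f"{percent_between(510, 690, 510, 90):.1f}%")
```

For the SAT question (scores between 510 and 690) this gives about 47.7%, close to the 47.5% that the 68-95-99.7 rule estimates.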
Margin of Error in a Survey
- If a statistic is obtained from a random sample of size n, there is a 95% probability that it lies within $\frac{1}{\sqrt{n}}$ of the true population statistic, where $\pm\frac{1}{\sqrt{n}}$ is called the margin of error.
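The margin of error ±1/√n depends only on the sample size n. This Python sketch evaluates it for a few hypothetical sample sizes.

```python
import math


def margin_of_error(n):
    """Approximate 95% margin of error, plus or minus 1/sqrt(n), as a proportion."""
    return 1 / math.sqrt(n)


for n in (400, 1600, 2500):  # hypothetical sample sizes
    print(n, f"±{margin_of_error(n):.1%}")
```

Quadrupling the sample size, from 400 to 1600, only halves the margin of error, from ±5% to ±2.5%.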
Section 6: Scatter Plots, Correlation, and Regression Lines
Correlation
- Often we are less interested in characterizing one set of data than in determining a relationship between two sets of data.
- For example, people were surveyed and asked three questions: What is your gender? What is your age? How strongly do you believe in your religion?
Correlation
- Is there a relationship between gender and religious practice? Between age and religious practice?
- The strength of the relationship is called correlation.
Correlation
- One way to analyze correlation is by drawing a scatter plot and determining from it the correlation coefficient.
- A scatter plot shows two variables plotted against each other: each data point is a pair of values (x, y).
- Let's take a look.
Correlation Coefficient
[Figure: scatter plots showing positive correlations]
- r = .7: moderate positive
- r = .85: strong positive
- r = 1: perfect positive
Correlation Coefficient
[Figure: scatter plots showing negative correlations]
- r = -.5: moderate negative
- r = -.85: strong negative
- r = -1: perfect negative
Computing the Correlation Coefficient by Hand
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$
where n is the number of data points (x, y).
- The formula is used to calculate the correlation coefficient, r.
- You have to be joking! This is MATH1101!!!
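Tedious by hand, the formula r = [nΣxy - (Σx)(Σy)] / (√(nΣx² - (Σx)²) · √(nΣy² - (Σy)²)) takes only a few lines in Python. This is a sketch on five hypothetical data points. The function name is illustrative.

```python
import math


def correlation_coefficient(xs, ys):
    """Correlation coefficient r computed from the sums in the hand formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


x_values = [1, 2, 3, 4, 5]  # hypothetical data points (x, y)
y_values = [2, 4, 5, 4, 5]
print(round(correlation_coefficient(x_values, y_values), 3))
```

For these points r is about .775, a moderate-to-strong positive correlation. Perfectly linear data gives r = 1 or r = -1.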
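Writing the Equation of the Regression Line by Hand
- The equation of the regression line is y = mx + b,
- where
  $m = \frac{n\sum xy - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$ and $b = \frac{\sum y - m\sum x}{n}$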
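A Python sketch of the regression line using the least-squares formulas m = [nΣxy - (Σx)(Σy)] / [nΣx² - (Σx)²] and b = (Σy - mΣx)/n. The five data points are hypothetical and the function name is illustrative.

```python
def regression_line(xs, ys):
    """Slope m and y-intercept b of the least-squares line y = mx + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


m, b = regression_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical points
print(f"y = {m:.2f}x + {b:.2f}")
```

Once m and b are known, the line predicts y for a new x, for example 0.6(6) + 2.2 = 5.8 when x = 6.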
Values for Determining Correlations in a Population
- If |r| for a sample of n data points is greater than the value in the table, we can conclude, at the level of significance α, that a correlation exists in the population.
  n   α = .05   α = .01
  4     .950      .990
  5     .878      .959
  6     .811      .917
  7     .754      .875
  8     .707      .834
  9     .666      .798
 10     .632      .765
 11     .602      .735
 12     .576      .708
 13     .553      .684
 14     .532      .661
 15     .514      .641
 16     .497      .623
 17     .482      .606
 18     .468      .590
 19     .456      .575
 20     .444      .561
 25     .396      .505
 30     .361      .463
 35     .335      .430
 40     .312      .402
 45     .294      .378
 50     .279      .361
 60     .254      .330
 70     .236      .305
 80     .220      .286
 90     .207      .269
100     .196      .256
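Looking up the table can be sketched in Python as a dictionary from n to its two tabulated values. The decision rule assumed here is that a correlation exists in the population when |r| is greater than the table value. Only a subset of rows is transcribed, so other values of n raise a `KeyError`. The names `CRITICAL_R` and `correlation_is_significant` are hypothetical.

```python
# A few rows of the table: n -> (value for alpha = .05, value for alpha = .01).
CRITICAL_R = {
    5: (0.878, 0.959),
    6: (0.811, 0.917),
    7: (0.754, 0.875),
    8: (0.707, 0.834),
    9: (0.666, 0.798),
    10: (0.632, 0.765),
    20: (0.444, 0.561),
    30: (0.361, 0.463),
    50: (0.279, 0.361),
    100: (0.196, 0.256),
}


def correlation_is_significant(r, n, alpha=0.05):
    """True if |r| is greater than the table value for sample size n at level alpha."""
    column = {0.05: 0, 0.01: 1}[alpha]  # only the two tabulated levels are supported
    return abs(r) > CRITICAL_R[n][column]


print(correlation_is_significant(0.775, 5))               # .775 is not above .878
print(correlation_is_significant(0.775, 10))              # .775 is above .632
print(correlation_is_significant(0.775, 10, alpha=0.01))  # .775 is above .765
```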