Title: What is statistics
1What is statistics
- Statistical training is necessary and important
for many reasons. In almost any area of work, you
must be able to read, interpret, and apply the
results of a statistical analysis of research
data.
- What is statistics all about? Statistics
involves the collection, organization,
interpretation, and presentation of numerical
information.
- Studying statistics will help you to understand
the information and to reach correct conclusions.
2Statistics statements
- Statistics proves that cigarette smoking will
cause lung cancer.
- According to statistics, females live longer than
males.
- Statistically speaking, tall parents have tall
children.
3Chapter 1 Population and Sample
- An experimental unit (or subject) is the smallest
entity that is of interest in a statistical
study.
- A variable is any characteristic that can be
measured on each experimental unit in a statistical
study.
- An observation is a value that the variable
assumes for a single unit.
- The collection of observations assumed by the
variables in the study is called a data set.
4Population Versus Sample
- Population: the whole group
- a collection of persons, objects, or items under
study
- Sample: a portion of the whole group
- a subset of the population
5- Example 1.2: Now we are interested in the heights
of MSU students. We measured the heights (and/or
weight, gender) of all the students and recorded
them as X1, X2, ... . We then randomly select 50
students to measure their heights. Explain the
concepts above.
- Population
- All MSU students
- Experimental unit
- An MSU student
- Variables
- height (and/or weight, gender)
- Observations
- any one of X1, X2, ...
- Data set
- X1, X2, ...
- Sample
- the 50 selected students
6- Bias is a systematic tendency of the sample to
misrepresent the population.
- A simple random sample of size n consists of n
elements chosen from the population in such a way
that all samples of that size have the same
chance of being selected.
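A small sketch in Python may make the definition concrete. The five-unit population and the sample size of 2 are made up for illustration; the point is only that every possible sample of size n is equally likely.

```python
import random
from itertools import combinations

# Hypothetical population of five units (labels are made up for illustration).
population = ["A", "B", "C", "D", "E"]
n = 2

# Every possible sample of size n (order within a sample does not matter).
all_samples = list(combinations(population, n))

# Under simple random sampling, each of these samples has the same chance.
chance_per_sample = 1 / len(all_samples)

# random.sample draws n distinct units, so each subset of size n is equally likely.
srs = random.sample(population, n)

print(len(all_samples), chance_per_sample)
print(srs)
```

With 5 units and n = 2 there are 10 possible samples, so each one has a 1-in-10 chance of being the sample drawn.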
7- A census is a sample consisting of the entire
population.
- Why don't we always do a census?
- Time: a census is time consuming.
- Cost: a census costs more.
- Inaccessible population units: studying all the
sunfish in Lake Michigan, say.
- Destructive testing: the measurement destroys the unit.
8For example (lottery sampling):
- 100 balls are mixed thoroughly in a bag. Draw 10
balls randomly from the bag. Try it twice:
once with replacement and once without.
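The lottery draw can be tried on a computer as well. This is a minimal sketch using Python's standard `random` module; numbering the balls 1 to 100 and fixing a seed are assumptions made only so the run is repeatable.

```python
import random

# Balls numbered 1..100 (the numbering is an assumption for illustration).
balls = list(range(1, 101))

# Fixed seed only so that the same draws come out on every run.
rng = random.Random(2024)

# Without replacement: a drawn ball stays out of the bag, so no ball repeats.
without_replacement = rng.sample(balls, 10)

# With replacement: each ball is put back before the next draw, so repeats can occur.
with_replacement = rng.choices(balls, k=10)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

Drawing without replacement is what produces a simple random sample of 10 balls; drawing with replacement may select the same ball more than once.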
9Exercise 1.1
- A stock market investor is interested in oil
stocks. She collects last year's price/earnings
ratios on ten randomly selected oil stocks. The
ratio data are X1, X2, ..., X10.
- What is our population?
- What is the variable? Give one observation.
- What is our sample and sample size?
- What is our data set?
10Exercise 1.2
- We want to know the average height of 2nd grade
students in East Lansing Public Schools. Instead
of measuring all 2nd grade students, we randomly
sample 60 of them. The height data are
X1, X2, ..., X60.
- What is our population?
- What is the variable? Give one observation.
- What is our sample and sample size?
- What is our data set?
11Statistics terms
- experimental unit, subject
- variable
- population
- sample
- census
- SRS: simple random sample
12Chapter 2 Univariate Data
- A univariate data set is a data set in which one
measurement (variable) has been made on each
experimental unit.
- A bivariate data set is a data set in which two
measurements (variables) have been made on each
experimental unit.
- A multivariate data set is a data set in which
several measurements (variables) have been made
on each experimental unit.
13- A numerical variable (also called a quantitative
or measure variable) is a variable whose values
are numbers obtained by a count or measurement.
For example: weight, height.
- A discrete variable is a numerical variable that
can assume a finite number or at most a countably
infinite number of values (countable means you
can associate the values with the counting numbers
1, 2, 3, ...; that is, the values can be counted).
- A continuous variable is a numerical variable
that can take any value in an interval of the
real number line. For example: height, weight.
14Types of Variables
- A categorical variable (also called a qualitative
variable) is a variable whose values are
classifications or categories.
- For example, Gender: Male, Female
- Occupation: Student, doctor, teacher, ...
15Types of Data
Numerical (Quantitative)
- Discrete: Can only take on certain values in an
interval.
- Continuous: Can take on infinitely many values
in an interval.
Categorical (Qualitative)
- Ordinal: There is a sense of ordering.
- Nominal: There is no sense of ordering.
16Discrete Example
- Number of people in a room: This can only be 0,
1, 2, 3, etc. It can't be 0.2, 2.2333, 4.5511,
etc.
- Usually discrete variables are those where you
are counting.
17Continuous Examples
- Age of a car: Usually you will say your car is 1
year old, 2 years old, etc. But the age of a car
isn't just 1, 2, 3, etc. It would be more exact to
say it's 1.32433535 years old.
- Height of an object: We tend to measure in
inches: 52, 43, etc. But an object can take
on a much more precise measurement than this.
- Other examples: weight, volume, etc.
18Ordinal Examples
- Military rank: private, sergeant, lieutenant,
general, etc. There is a sense of ordering.
- Degree: high school, B.S., M.S., Ph.D., etc.
- Classification: freshman, sophomore, junior,
senior
19Nominal
- Color of an object: green, red, white
- Maker of a vehicle: Chevy, Ford, Dodge
20Be cautious!
- Coding a categorical variable does not make it
numerical.
- For example:
- Gender: Male -- 0, Female -- 1
- Zip code, area code, telephone number, social
security number, etc. These are all nominal.
21Exercise 2.1 Classify the following as
categorical or numerical (discrete or
continuous).
- a. Age of freshmen at MSU
- b. Faculty rank
- c. Weight of newborn babies
- d. Murder rate in a major city
- e. Number of children in a family
- f. Brand of television set
22Displaying a categorical variable
- frequency table
- pie chart
- bar chart (bar graph)
23Frequency table
Example 2: Time/CNN telephone poll of 500 adult
Americans: "Has the amount of crime in your
community increased in the past 5 years?"
24Pie Chart
- A circle or pie is divided into pieces
corresponding to the categories of the variable
so that the size of the slice is proportional to
the relative frequency of the category.
25Pie Chart
26Bar Chart
- A bar chart is a picture consisting of horizontal and
vertical axes with rectangles that represent the
frequency (or relative frequency) of the categories
of a variable.
27Bar chart of CNN telephone poll
28Example 3: Sampling 100 students from MSU students
to get their class level, here are the
data: Fr 12, So 24, Jr 32, Sr 24, Gr 8.
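As a sketch of how these counts turn into a frequency table and a pie chart, the Python below computes each level's relative frequency and the matching slice angle (relative frequency times 360 degrees). The function name `frequency_table` is made up for illustration.

```python
# Class-level counts from the sample of 100 students.
counts = {"Fr": 12, "So": 24, "Jr": 32, "Sr": 24, "Gr": 8}

def frequency_table(counts):
    """Return rows of (category, frequency, relative frequency, pie angle in degrees)."""
    total = sum(counts.values())
    rows = []
    for category, freq in counts.items():
        rel = freq / total
        # A pie slice is proportional to the relative frequency of its category.
        rows.append((category, freq, rel, 360 * rel))
    return rows

table = frequency_table(counts)
for category, freq, rel, angle in table:
    print(f"{category:<3}{freq:>5}{rel:>8.2f}{angle:>8.1f}")
```

The same relative frequencies would serve as bar heights in a relative frequency bar chart.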
29Displaying numerical data
- stem-and-leaf plot
- dot plot (OK for small data sets)
- histogram
30Example 1
- A psychologist wishes to test a new method to
improve rote memorization by college students. A
sample of 20 college students was taught by this
method and then asked to memorize a list of 100
word phrases. The following numbers of correct
word phrases were recorded for the 20 students.
- 84 59 82 78 74 96 44 76 85 66
- 77 91 62 54 72 65 84 38 76 70
31Dotplot
84 59 82 78 74 96 44 76 85 66 77 91
62 54 72 65 84 38 76 70
- The distribution of a variable specifies the
distinct values that the variable assumes and
how often these values occur. The distribution
illustrates the pattern of variation in the
data.
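The distribution of the 20 scores can be tabulated directly. The sketch below, using Python's `collections.Counter`, lists each distinct value with one dot per occurrence, which is a rough text version of a dot plot.

```python
from collections import Counter

# Numbers of correct word phrases for the 20 students in Example 1.
scores = [84, 59, 82, 78, 74, 96, 44, 76, 85, 66,
          77, 91, 62, 54, 72, 65, 84, 38, 76, 70]

# Distinct values and how often each one occurs.
distribution = Counter(scores)

# One line per distinct value, one dot per observation.
for value in sorted(distribution):
    print(f"{value:>3} " + "." * distribution[value])
```

A real dot plot places the dots above a number line, so the gaps between values are visible as well; this listing shows only the values and their counts.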
32Stem and leaf Plot.
- 84 59 82 78 74 96 44 76 85 66
- 77 91 62 54 72 65 84 38 76 70
- Step 1. One observation (84) plotted
- 3 |
- 4 |
- 5 |
- 6 |
- 7 |
- 8 | 4
- 9 |
33- 84 59 82 78 74 96 44 76 85 66
- 77 91 62 54 72 65 84 38 76 70
- Step 2. Fill in the stem-and-leaf plot
- 3 | 8
- 4 | 4
- 5 | 9 4
- 6 | 6 2 5
- 7 | 8 4 6 7 2 6 0
- 8 | 4 2 5 4
- 9 | 6 1
34- 84 59 82 78 74 96 44 76 85 66
- 77 91 62 54 72 65 84 38 76 70
- Step 3. Ordered stem-and-leaf plot
- 3 | 8
- 4 | 4
- 5 | 4 9
- 6 | 2 5 6
- 7 | 0 2 4 6 6 7 8
- 8 | 2 4 4 5
- 9 | 1 6
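The three steps can be sketched in Python as follows. The function name `stem_and_leaf` is made up for illustration, and it assumes two-digit whole numbers so that the tens digit is the stem and the ones digit is the leaf.

```python
from collections import defaultdict

# Numbers of correct word phrases for the 20 students in Example 1.
scores = [84, 59, 82, 78, 74, 96, 44, 76, 85, 66,
          77, 91, 62, 54, 72, 65, 84, 38, 76, 70]

def stem_and_leaf(values):
    """Return {stem: ordered leaves}, with tens as stems and ones as leaves."""
    plot = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # e.g. 84 -> stem 8, leaf 4
        plot[stem].append(leaf)
    # List every stem from smallest to largest (even empty ones) and sort the leaves.
    return {s: sorted(plot[s]) for s in range(min(plot), max(plot) + 1)}

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting the leaves corresponds to Step 3; leaving them in the order the data arrive would reproduce Step 2.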
35Example 2
- The following is the concentration of mercury in
25 lake trout caught in a major lake:
- 2.2 3.4 3.0 2.6 3.8 1.8 2.8 3.2 3.7
- 1.4 2.7 3.6 1.9 2.2 3.0 3.3 2.3
- 1.7 2.6 3.5 3.0 2.9 3.4 3.1 2.4
- Exercise: create a stem-and-leaf plot for this data.
36Stem and leaf plot
- Leaf unit 0.1
- 1 | 4 7 8 9
- 2 | 2 2 3 4 6 6 7 8 9
- 3 | 0 0 0 1 2 3 4 4 5 6 7 8
37double-stem stem and leaf plot
- Leaf unit 0.1
- 1 | 4
- 1 | 8 9 7
- 2 | 2 2 3 4
- 2 | 6 8 7 6 9
- 3 | 4 0 2 0 3 0 4 1
- 3 | 8 7 6 5
38Ordered double-stem stem and leaf plot
- Leaf unit 0.1
- 1 | 4
- 1 | 7 8 9
- 2 | 2 2 3 4
- 2 | 6 6 7 8 9
- 3 | 0 0 0 1 2 3 4 4
- 3 | 5 6 7 8
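A double-stem plot splits each stem into two rows: leaves 0 to 4 on the first row and leaves 5 to 9 on the second. The sketch below applies that rule to the 25 mercury readings; the function name `double_stem` is made up for illustration, and rows with no leaves are simply omitted.

```python
# Mercury concentrations in the 25 lake trout of Example 2.
mercury = [2.2, 3.4, 3.0, 2.6, 3.8, 1.8, 2.8, 3.2, 3.7,
           1.4, 2.7, 3.6, 1.9, 2.2, 3.0, 3.3, 2.3,
           1.7, 2.6, 3.5, 3.0, 2.9, 3.4, 3.1, 2.4]

def double_stem(values):
    """Return (stem, ordered leaves) rows, two rows per stem; leaf unit 0.1."""
    rows = {}
    for v in values:
        tenths = round(v * 10)            # integer tenths avoid floating-point surprises
        stem, leaf = divmod(tenths, 10)   # e.g. 2.6 -> stem 2, leaf 6
        half = 0 if leaf <= 4 else 1      # 0-4 on the first row, 5-9 on the second
        rows.setdefault((stem, half), []).append(leaf)
    return [(stem, sorted(leaves)) for (stem, half), leaves in sorted(rows.items())]

for stem, leaves in double_stem(mercury):
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Splitting the stems spreads the same 25 leaves over six rows instead of three, which shows the shape of the distribution in more detail.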
39Histogram
- The histogram is a graphical means of
displaying numerical data. If we slice up the
entire span of values covered by the quantitative
variable into equal-width piles called bins
(classes), a histogram plots the bin counts
(class counts) as the heights of bars.
- It can be constructed from the stem-and-leaf
plot: each stem defines an interval of values as
a class. The class limits are the smallest and
largest possible values for the interval. Now go
back to Example 1.
40 Steps of construction
- find class limits and class boundaries
- find class frequencies and construct a frequency
table
- label the horizontal axis using a continuous scale
- label the vertical axis for (relative) frequency
- draw bars using class boundaries and (relative)
frequency
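The first two steps can be sketched in Python for the Example 1 scores. The class limits 30-39, 40-49, ..., 90-99 are an assumption taken from the stems of the stem-and-leaf plot; each class boundary lies halfway between adjacent limits, for example 39.5 between 39 and 40. The function name and its parameters are made up for illustration.

```python
# Numbers of correct word phrases for the 20 students in Example 1.
scores = [84, 59, 82, 78, 74, 96, 44, 76, 85, 66,
          77, 91, 62, 54, 72, 65, 84, 38, 76, 70]

def grouped_frequency_table(values, lower=30, width=10, n_classes=7):
    """Build one row per class: limits, boundaries, frequency, relative frequency."""
    rows = []
    for k in range(n_classes):
        lo = lower + k * width        # lower class limit, e.g. 30
        hi = lo + width - 1           # upper class limit, e.g. 39
        freq = sum(lo <= v <= hi for v in values)
        rows.append({
            "limits": (lo, hi),
            "boundaries": (lo - 0.5, hi + 0.5),   # bars are drawn between these
            "frequency": freq,
            "relative": freq / len(values),
        })
    return rows

table = grouped_frequency_table(scores)
for row in table:
    print(row["limits"], row["boundaries"], row["frequency"], row["relative"])
```

Because the bars are drawn from boundary to boundary, adjacent bars touch, which is why a histogram has no gaps.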
41Histogram
- Used to represent continuous grouped data
- Does not have any gaps between bars
42Grouped frequency table
of correct word phrases in a rote memorization
study
43Constructed Histogram
44Relative frequency histogram
45Histogram - example
Question: The heights of 325 students were
measured to the nearest cm.
- Draw a histogram to illustrate the above data.
46Histogram - example
Answer (a): find class limits and class boundaries.
E.g., the boundary between 154 and 155 is 154.5.
49Histogram - example
Answer (a): label the horizontal axis using a
continuous scale.
[Figure: "heights of 325 students"; horizontal axis: Height (cm), marked from 140 to 200 in steps of 10]
50Histogram - example
Answer (a): find class widths.
[Figure: histogram of the "heights of 325 students" data]
56Exercise: create a histogram for this data set.
- The following is the concentration of mercury in
30 lake trout caught in a major lake:
- 2.2 3.4 3.0 2.6 3.8 1.8 2.8 3.2 3.7 3.5
- 1.4 2.7 3.6 1.9 2.2 3.0 3.3 2.3 3.3 3.6
- 1.7 2.6 3.5 3.0 2.9 3.4 3.1 2.4 3.4 3.8
- Use limits 1-1.4, 1.5-1.9, 2-2.4, 2.5-2.9,
3-3.4, 3.5-4.
57Frequency Polygons
58Frequency Polygons
59Population Frequency Curve
- When the number of observations is very large
and the class widths are reduced in size, the
distribution takes on the appearance of a
continuous curve similar to what might be
expected if the entire population of values were
graphed. Usually the entire population is not
available. However, a stem-and-leaf plot,
histogram, or frequency polygon obtained from a
representative sample should closely approximate
the shape of the population frequency curve.
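A small simulation can illustrate this idea. The sketch below assumes, purely for illustration, a population that follows the standard normal curve; it draws a large sample, builds a histogram with narrow classes, and compares the bar heights with the curve. Bar heights are scaled as relative frequency divided by class width so that they are on the same scale as the curve; that scaling, the seed, and the function names are choices made for this example.

```python
import math
import random

def density_histogram(values, lower, upper, width):
    """Return (class midpoint, relative frequency / class width) for each class."""
    n_classes = round((upper - lower) / width)
    counts = [0] * n_classes
    for v in values:
        k = int((v - lower) // width)      # index of the class containing v
        if 0 <= k < n_classes:             # ignore the few values outside the range
            counts[k] += 1
    n = len(values)
    return [(lower + (k + 0.5) * width, c / (n * width)) for k, c in enumerate(counts)]

def normal_curve(x):
    """Height of the standard normal curve (the assumed population curve)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

rng = random.Random(1)                     # fixed seed so the run is repeatable
sample = [rng.gauss(0, 1) for _ in range(100_000)]

bars = density_histogram(sample, -3.0, 3.0, 0.2)
largest_gap = max(abs(height - normal_curve(mid)) for mid, height in bars)
print("largest gap between a bar and the curve:", round(largest_gap, 3))
```

With a sample this large and classes this narrow, the tops of the bars stay close to the curve; a smaller sample or wider classes would give a rougher match.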
60Population Frequency Curve