What is statistics - PowerPoint PPT Presentation


Slides: 61
Provided by: mingx

Transcript and Presenter's Notes


1
What is statistics
  • Statistical training is necessary and important
    for many reasons. In almost any area of work, you
    must be able to read, interpret, and apply the
    results of a statistical analysis of research
    data.
  •  
  • What is Statistics all about? Statistics
    involves the collection, organization,
    interpretation and presentation of numerical
    information.
  • Studying statistics will help you to understand
    the information and to reach correct conclusions.

2
Statistics statements
  • Statistics proves that cigarette smoking will
    cause lung cancer.
  • According to statistics, females live longer than
    males.
  • Statistically speaking, tall parents have tall
    children.

3
Chapter 1 Population and Sample
  • An experimental unit (or subject) is the smallest
    entity that is of interest in a statistical
    study.
  • A variable is any characteristic that can be
    measured on each experimental unit in a statistical
    study.
  • An observation is a value that the variable
    assumes for a single unit.
  • The collection of observations assumed by the
    variables in the study is called a data set.

4
Population Versus Sample
  • Population: the whole group
  • a collection of persons, objects, or items under
    study
  • Sample: a portion of the whole group
  • a subset of the population

5
  • Example 1.2: We are interested in the heights
    of MSU students. Suppose we measure the height
    (and/or weight, gender) of every student and record
    the values as X1, X2, .... We then randomly select
    50 students and measure their heights. Explain the
    concepts above.
  • Population: all MSU students
  • Experimental unit: an MSU student
  • Variables: height (and/or weight, gender)
  • Observations: any one of X1, X2, ...
  • Data set: X1, X2, ...
  • Sample: the 50 selected students

6
  • Bias is a systematic tendency of the sample to
    misrepresent the population.
  • A simple random sample of size n consists of n
    elements chosen from the population in such a way
    that all samples of that size have the same
    chance of being selected.

7
  • A census is a sample consisting of the entire
    population.
  • Why don't we always do a census?
  • Time: a census is time consuming.
  • Cost: a census costs more.
  • Inaccessible population units: study all the
    sunfish in Lake Michigan, say.
  • Destructive testing: measuring destroys the unit.

8
For example (lottery sampling):
  • 100 balls are mixed thoroughly in a bag. Draw 10
    balls randomly from the bag. Try it twice: once
    with replacement and once without replacement.
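The lottery draw can be imitated in a few lines of Python. This is only a sketch: the ball labels 1 to 100 and the fixed seed are assumptions made for illustration. random.sample draws without replacement (every set of 10 balls is equally likely, as in a simple random sample), while random.choices draws with replacement.

```python
import random

# Assumption: the 100 balls are labeled 1..100 (the labels are illustrative).
balls = list(range(1, 101))

# Fixed seed only so that the run is repeatable; any seed would do.
rng = random.Random(2024)

# Without replacement: a drawn ball stays out of the bag, so no ball repeats.
without_replacement = rng.sample(balls, k=10)

# With replacement: each ball goes back before the next draw, so repeats can occur.
with_replacement = rng.choices(balls, k=10)

print("without replacement:", sorted(without_replacement))
print("with replacement:   ", sorted(with_replacement))
```

Only the draw without replacement is guaranteed to give 10 different balls.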

9
Exercise 1.1
  • A stock market investor is interested in oil
    stocks. She collects last year's price/earnings
    ratios on ten randomly selected oil stocks. The
    ratios are X1, X2, ..., X10.
  • What is our population?
  • What is the variable? Give one observation.
  • What is our sample and sample size?
  • What is our data set?

10
Exercise 1.2
  • We want to know the average height of 2nd grade
    students in East Lansing public schools. Instead
    of measuring all 2nd grade students, we randomly
    sample 60 2nd grade students. The heights are
    X1, X2, ..., X60.
  • What is our population?
  • What is the variable? Give one observation.
  • What is our sample and sample size?
  • What is our data set?

11
Statistics terms
  • experimental unit, subject
  • variable
  • population
  • sample
  • census
  • SRS: simple random sample

12
Chapter 2 Univariate Data
  • A univariate data set is a data set in which one
    measurement (variable) has been made on each
    experimental unit.
  • A bivariate data set is a data set in which two
    measurements (variables) have been made on each
    experimental unit.
  • A multivariate data set is a data set in which
    several measurements (variables) have been made
    on each experimental unit.

13
  • A numerical variable (also called a quantitative
    or measurement variable) is a variable whose values
    are numbers obtained by a count or measurement.
    For example: weight, height.
  • A discrete variable is a numerical variable that
    can assume a finite number or at most a countably
    infinite number of values. (Countable means you
    can associate the values with the counting numbers
    1, 2, 3, ...; that is, the values can be counted.)
  • A continuous variable is a numerical variable
    that can take any value in an interval of the
    real number line. For example: height, weight.

14
Types of Variables
  • A categorical variable (also called a qualitative
    variable) is a variable whose values are
    classifications or categories.
  • For example, Gender: male, female
  • Occupation: student, doctor, teacher, ...

15
Types of Data
Numerical (Quantitative)
  • Discrete: Can only take on certain values in an
    interval.
  • Continuous: Can take on infinitely many values
    in an interval.
Categorical (Qualitative)
  • Ordinal: There is a sense of ordering.
  • Nominal: There is no sense of ordering.
16
Discrete Example
  • number of people in a room: This can only be 0,
    1, 2, 3, etc. It can't be 0.2, 2.2333, 4.5511,
    etc.
  • Usually discrete variables are those where you
    are counting.

17
Continuous Examples
  • age of a car: Usually you will say your car is 1
    year old, 2 years old, etc. But the age of a car
    isn't just 1, 2, 3, etc. It would be more exact to
    say it's 1.32433535 years old.
  • height of an object: We tend to measure in
    inches: 52, 43, etc. But an object can take
    on a much more precise measurement than this.
  • Other examples: weights, volume, etc.

18
Ordinal Examples
  • military rank: private, sergeant, lieutenant,
    general, etc. There is a sense of ordering.
  • degree: high school, B.S., M.S., Ph.D., etc.
  • classification: freshman, sophomore, junior,
    senior

19
Nominal
  • color of an object: green, red, white
  • maker of a vehicle: Chevy, Ford, Dodge

20
Be cautious!
  • Coding a categorical variable as numbers does not
    make it numerical.
  • For example:
  • Gender: Male = 0, Female = 1
  • zip code, area code, telephone number, social
    security number, etc. These are all nominal.
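A short Python sketch of why a numeric code is still a label. The zip codes below are hypothetical values made up for illustration. Counting how often each code occurs is meaningful; averaging the codes runs without error but produces a number that describes nothing.

```python
from collections import Counter

# Hypothetical zip codes recorded for five people (illustrative values only).
zip_codes = [48823, 48824, 48823, 48910, 48823]

# Meaningful for nominal data: how often does each category occur?
counts = Counter(zip_codes)
most_common_zip, frequency = counts.most_common(1)[0]
print("most common zip code:", most_common_zip, "seen", frequency, "times")

# The arithmetic works, but the result is not a zip code of anyone in the data
# and has no interpretation: the codes are names, not measurements.
mean_zip = sum(zip_codes) / len(zip_codes)
print("mean of the codes (meaningless):", mean_zip)
```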

21
Exercise 2.1: Classify the following as
categorical or numerical (discrete or
continuous).
  • a. Age of freshmen at MSU
  • b. Faculty rank
  • c. Weight of newborn babies
  • d. Murder rate in a major city
  • e. Number of children in a family
  • f. Brand of television set

22
Displaying a categorical variable
  • frequency table
  • pie chart
  • bar chart (bar graph)

23
Frequency table
Example 2: Time/CNN telephone poll of 500 adult
Americans: Has the amount of crime in your
community increased in the past 5 years?

24
Pie Chart
  • A circle or pie is divided into pieces
    corresponding to the categories of the variable
    so that the size of the slice is proportional to
    the relative frequency of the category.

25
Pie Chart
26
Bar Chart
  • A bar chart is a picture consisting of horizontal
    and vertical axes with rectangles that represent
    the frequency (or relative frequency) of the
    categories of a variable.

27
Bar chart of CNN telephone poll
28
Example 3: Sampling 100 students from MSU students
to get their class level gives the following
counts: Fr 12, So 24, Jr 32, Sr 24, Gr 8.
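As a sketch, the frequency table for Example 3 can be built in Python from the counts above. The relative frequency of a level is its count divided by the sample size, and the angle of its pie slice is the relative frequency times 360 degrees. The variable names are illustrative.

```python
# Counts from Example 3 (100 sampled MSU students by class level).
counts = {"Fr": 12, "So": 24, "Jr": 32, "Sr": 24, "Gr": 8}

total = sum(counts.values())  # sample size

# Relative frequency = class count / sample size.
relative = {level: n / total for level, n in counts.items()}

# Pie-chart slice angle = relative frequency * 360 degrees.
angles = {level: 360 * n / total for level, n in counts.items()}

print(f"{'Level':<6}{'Freq':>6}{'Rel. freq':>11}{'Angle':>8}")
for level, n in counts.items():
    print(f"{level:<6}{n:>6}{relative[level]:>11.2f}{angles[level]:>8.1f}")
```

The bars of a bar chart would have heights equal to the Freq (or Rel. freq) column.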
29
Displaying numerical data
  • stem and leaf plot
  • dot plot (OK for small data sets)
  • histogram

30
Example 1
  • A psychologist wishes to test a new method to
    improve rote memorization by college students. A
    sample of 20 college students was taught by this
    method and then asked to memorize a list of 100
    word phrases. The following numbers of correct
    word phrases were recorded for the 20 students.
  • 84 59 82 78 74 96 44 76 85 66
  • 77 91 62 54 72 65 84 38 76 70

31
Dotplot
84 59 82 78 74 96 44 76 85 66 77 91
62 54 72 65 84 38 76 70
  • The distribution of a variable specifies the
    distinct values that the variable assumes and
    how often these values occur. The distribution
    illustrates the pattern of the variation in the
    data.
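A minimal Python sketch of this idea for the 20 scores of Example 1: the distribution is the list of distinct values together with how often each occurs. The text dot plot is a simplification that prints only the values that occur, so gaps between values are not drawn to scale.

```python
from collections import Counter

# Numbers of correct word phrases for the 20 students (Example 1).
scores = [84, 59, 82, 78, 74, 96, 44, 76, 85, 66,
          77, 91, 62, 54, 72, 65, 84, 38, 76, 70]

# The distribution: each distinct value and how often it occurs.
distribution = Counter(scores)

# Text dot plot: one dot per observation, values in increasing order.
for value in sorted(distribution):
    print(f"{value:3d} " + "." * distribution[value])
```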

32
Stem and leaf Plot.
  • 84 59 82 78 74 96 44 76 85 66
  • 77 91 62 54 72 65 84 38 76 70
  • Step 1. One observation (84) plotted
  • 3 |
  • 4 |
  • 5 |
  • 6 |
  • 7 |
  • 8 | 4
  • 9 |

33
  • 84 59 82 78 74 96 44 76 85 66
  • 77 91 62 54 72 65 84 38 76 70
  • Step 2. Fill stem and leaf plot
  • 3 | 8
  • 4 | 4
  • 5 | 9 4
  • 6 | 6 2 5
  • 7 | 8 4 6 7 2 6 0
  • 8 | 4 2 5 4
  • 9 | 6 1

34
  • 84 59 82 78 74 96 44 76 85 66
  • 77 91 62 54 72 65 84 38 76 70
  • Step 3. Ordered stem and leaf plot
  • 3 | 8
  • 4 | 4
  • 5 | 4 9
  • 6 | 2 5 6
  • 7 | 0 2 4 6 6 7 8
  • 8 | 2 4 4 5
  • 9 | 1 6
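The three steps can be sketched in Python for two-digit whole numbers: the tens digit is the stem and the units digit is the leaf. The function name stem_and_leaf is illustrative. Note one simplification: a stem with no observations would be omitted, which does not happen for these data.

```python
from collections import defaultdict

# Numbers of correct word phrases for the 20 students (Example 1).
scores = [84, 59, 82, 78, 74, 96, 44, 76, 85, 66,
          77, 91, 62, 54, 72, 65, 84, 38, 76, 70]

def stem_and_leaf(data):
    """Return {stem: ordered leaves} for two-digit whole numbers."""
    plot = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)   # e.g. 84 -> stem 8, leaf 4
        plot[stem].append(leaf)      # Step 2: fill in data order
    # Step 3: order the stems and the leaves within each stem.
    return {stem: sorted(plot[stem]) for stem in sorted(plot)}

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(map(str, leaves)))
```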

35
Example 2
  • The following are the concentrations of mercury in
    25 lake trout caught in a major lake:
  • 2.2 3.4 3.0 2.6 3.8 1.8 2.8 3.2 3.7
  • 1.4 2.7 3.6 1.9 2.2 3.0 3.3 2.3
  • 1.7 2.6 3.5 3.0 2.9 3.4 3.1 2.4
  • Exercise: create a stem and leaf plot for these data.

36
Stem and leaf plot
  • Leaf unit: 0.1
  • 1 | 4 7 8 9
  • 2 | 2 2 3 4 6 6 7 8 9
  • 3 | 0 0 0 1 2 3 4 4 5 6 7 8

37
Double-stem stem and leaf plot
  • Leaf unit: 0.1
  • 1 | 4
  • 1 | 8 9 7
  • 2 | 2 2 3 4
  • 2 | 6 8 7 6 9
  • 3 | 4 0 2 0 3 0 4 1
  • 3 | 8 7 6 5

38
Ordered double-stem stem and leaf plot
  • Leaf unit: 0.1
  • 1 | 4
  • 1 | 7 8 9
  • 2 | 2 2 3 4
  • 2 | 6 6 7 8 9
  • 3 | 0 0 0 1 2 3 4 4
  • 3 | 5 6 7 8
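A Python sketch of the double-stem construction for the 25 mercury concentrations, assuming the usual convention that each stem is split into a low row (leaves 0 to 4) and a high row (leaves 5 to 9). The function name double_stem is illustrative.

```python
# Mercury concentrations in the 25 lake trout (Example 2).
mercury = [2.2, 3.4, 3.0, 2.6, 3.8, 1.8, 2.8, 3.2, 3.7,
           1.4, 2.7, 3.6, 1.9, 2.2, 3.0, 3.3, 2.3,
           1.7, 2.6, 3.5, 3.0, 2.9, 3.4, 3.1, 2.4]

def double_stem(data):
    """Return {(stem, is_high_row): ordered leaves} with leaf unit 0.1."""
    rows = {}
    for x in data:
        tenths = round(x * 10)            # 2.2 -> 22; round() guards against float error
        stem, leaf = divmod(tenths, 10)   # 22 -> stem 2, leaf 2
        is_high_row = leaf >= 5           # leaves 0-4 go low, 5-9 go high
        rows.setdefault((stem, is_high_row), []).append(leaf)
    # Sorting the keys puts each low row (False) before its high row (True).
    return {key: sorted(rows[key]) for key in sorted(rows)}

rows = double_stem(mercury)
for (stem, _), leaves in rows.items():
    print(stem, "|", " ".join(map(str, leaves)))
```

Every observation contributes exactly one leaf, so the rows together hold 25 leaves.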

39
Histogram
  • The histogram is a graphical means of
    displaying numerical data. If we slice up the
    entire span of values covered by the quantitative
    variable into equal-width piles called bins
    (classes), a histogram plots the bin counts
    (class counts) as the heights of bars.
  • It can be constructed from the stem and leaf
    plot: each stem defines an interval of values as
    a class. The class limits are the smallest and
    largest possible values for the interval. Now go
    back to Example 1.

40
Steps of construction
  • find class limits and class boundaries
  • find class frequency and construct frequency
    table
  • label horizontal axis using continuous scale
  • label vertical axis for (relative) frequency
  • draw bars using class boundaries and (relative)
    frequency
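The steps above can be sketched in Python for the Example 1 scores. The sketch assumes each stem defines a class with limits 30-39, 40-49, ..., 90-99, and that each class boundary lies halfway between adjacent class limits (29.5, 39.5, ...). The rows of # marks are a rough text stand-in for the bars.

```python
# Numbers of correct word phrases for the 20 students (Example 1).
scores = [84, 59, 82, 78, 74, 96, 44, 76, 85, 66,
          77, 91, 62, 54, 72, 65, 84, 38, 76, 70]

# Class limits follow the stems: 30-39, 40-49, ..., 90-99.
class_limits = [(low, low + 9) for low in range(30, 100, 10)]

table = []
for low, high in class_limits:
    # Class boundaries sit halfway between adjacent class limits.
    lower_b, upper_b = low - 0.5, high + 0.5
    # Whole-number scores never fall exactly on a boundary.
    freq = sum(lower_b < x < upper_b for x in scores)
    table.append((low, high, lower_b, upper_b, freq, freq / len(scores)))

print("limits  boundaries    freq  rel.freq")
for low, high, lower_b, upper_b, freq, rel in table:
    print(f"{low}-{high}   {lower_b}-{upper_b}   {freq:4d}  {rel:8.2f}  " + "#" * freq)
```

Because the bars are drawn from boundary to boundary, neighboring bars touch and the histogram has no gaps.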

41
Histogram
  • Used to represent continuous grouped data
  • Does not have any gaps between bars

42
Grouped frequency table of correct word phrases
in a rote memorization study
43
Constructed Histogram
44
Relative frequency histogram
45
Histogram - example
Question: The heights of 325 students were
measured to the nearest cm.
  • Draw a histogram to illustrate the above data.

46-55
Histogram - example
Answer (a):
  • Find class limits and class boundaries. E.g., the
    boundary between 154 and 155 is 154.5.
  • Label the horizontal axis using a continuous
    scale (Height in cm, marked at 140, 150, ..., 200).
  • Find class widths.
[Figure: histogram of the heights of 325 students; horizontal axis: Height (cm)]
56
Exercise: create a histogram for this data set.
  • The following are the concentrations of mercury in
    30 lake trout caught in a major lake:
  • 2.2 3.4 3.0 2.6 3.8 1.8 2.8 3.2 3.7 3.5
  • 1.4 2.7 3.6 1.9 2.2 3.0 3.3 2.3 3.3 3.6
  • 1.7 2.6 3.5 3.0 2.9 3.4 3.1 2.4 3.4 3.8
  • Use class limits 1-1.4, 1.5-1.9, 2-2.4, 2.5-2.9,
    3-3.4, 3.5-4.

57
Frequency Polygons
58
Frequency Polygons
59
Population Frequency Curve
  • When the number of observations is very large
    and the class intervals are reduced in size, the
    distribution takes on the appearance of a
    continuous curve similar to what might be
    expected if the entire population of values were
    graphed. Usually the entire population is not
    available. However, a stem and leaf plot,
    histogram, or frequency polygon obtained from a
    representative sample should closely approximate
    the shape of the population frequency curve.

60
Population Frequency Curve