Title: Some definitions
A sample
- Is a subset of the population
In statistics
- One draws conclusions about the population based on data collected from a sample.
Reasons for sampling
- It is less costly to collect data from a sample than from the entire population.
- Accuracy
Accuracy
- Data from a sample sometimes lead to more accurate conclusions than data from the entire population.
- Costs saved by using a sample can be directed to obtaining more accurate observations on each case in the sample.
Types of Samples
- Different types of samples are determined by how the sample is selected.
Convenience Samples
- In a convenience sample, the subjects that are most convenient to the researcher are selected as objects in the sample.
- This is not a very good procedure for inferential statistical analysis, but it is useful for exploratory, preliminary work.
Quota Samples
- In quota samples, subjects are chosen conveniently until quotas are met for different subgroups of the population.
- This also is useful for exploratory, preliminary work.
Random Samples
- Random samples of a given size are selected in such a way that all possible samples of that size have the same probability of being selected.
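A minimal Python sketch of drawing a simple random sample, assuming the population can be listed in full (here, hypothetically, as student ID numbers 1 to 23). `random.Random.sample` selects n distinct cases without replacement, so every subset of size n is equally likely; the helper name `simple_random_sample` and the seed are illustrative choices, not part of the notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct cases; every subset of size n is equally likely."""
    rng = random.Random(seed)          # seed only makes the demo reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: ID numbers of 23 students
student_ids = list(range(1, 24))
sample = simple_random_sample(student_ids, n=5, seed=244)
print(sorted(sample))
```

Leaving `seed=None` draws a different random sample on each run.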
- Convenience samples and quota samples are useful for preliminary studies. It is, however, difficult to assess the accuracy of estimates based on these types of sampling schemes.
- Sometimes, however, one has to be satisfied with a convenience sample and assume that it is equivalent to a random sampling procedure.
Some Other Definitions
A population statistic (parameter)
- Any quantity computed from the values of variables for the entire population.
A sample statistic
- Any quantity computed from the values of variables for the cases in the sample.
- Since only cases from the sample are observed, only sample statistics are computed.
- These are used to make inferences about population statistics.
- It is important to be able to assess the accuracy of these inferences.
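To make the parameter/statistic distinction concrete, here is a small Python sketch. Purely for illustration it treats the 23 Verbal IQ scores of Data Set 3 as if they were a whole population, computes the population mean (a parameter), then computes the mean of a random sample of 8 cases (a statistic) and the error of that estimate. The sample size of 8 and the seed are arbitrary assumptions.

```python
import random
from statistics import mean

# Verbal IQ from Data Set 3, treated here (for illustration only) as a whole population
verbal_iq = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
             82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]

population_mean = mean(verbal_iq)     # parameter: uses every case

rng = random.Random(1)                # fixed seed so the run is repeatable
sample = rng.sample(verbal_iq, 8)     # random sample of 8 cases
sample_mean = mean(sample)            # statistic: uses only the sampled cases

print(f"population mean (parameter) = {population_mean:.2f}")
print(f"sample mean (statistic)     = {sample_mean:.2f}")
print(f"error of this estimate      = {sample_mean - population_mean:+.2f}")
```

In practice only `sample_mean` would be observable; assessing how large the error is likely to be is exactly the accuracy question raised above.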
To download lectures
- Go to the Stats 244 web site
- through PAWS, or
- by going to the website of the Department of Mathematics and Statistics > People > Faculty > W.H. Laverty > Stats 244 > Lectures.
- Then select the lecture.
- Right-click and choose Save As.
To print lectures
- Open the lecture using MS PowerPoint.
- Select the menu item File > Print.
- The following dialogue box appears.
- In the "Print what" box, select Handouts.
- Set "Slides per page" to 6 or 3.
- 6 slides per page will result in the least amount of paper being printed.
- 3 slides per page leaves room for notes.
Organizing and Describing Data
Techniques for continuous variables
The Grouped Frequency Table
The Histogram
To Construct
- A grouped frequency table
- A histogram
- Find the maximum and minimum of the observations.
- Choose non-overlapping intervals of equal width (the class intervals) that cover the range between the minimum and the maximum.
- The endpoints of the intervals are called the class boundaries.
- Count the number of observations in each interval (the cell frequency, f).
- Calculate the relative frequency:
- relative frequency = f/N
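The four steps above can be sketched in Python as follows. This is a minimal illustration, not the course's procedure: the function name `grouped_frequency_table`, the starting boundary 70 and the width 10 are assumptions, and intervals are taken as (lower, upper], i.e. the upper endpoint included and the lower one not, matching the convention stated for the Data Set 3 example. The data are the Verbal IQ scores from Data Set 3.

```python
def grouped_frequency_table(data, start, width):
    """Rows (lower, upper, f, f/N) for equal-width intervals (lower, upper].

    `start` must lie below the minimum so that every observation is covered."""
    lo, hi = min(data), max(data)            # step 1: minimum and maximum
    if start >= lo:
        raise ValueError("start must be below the minimum observation")
    boundaries = [start]                     # step 2: class boundaries
    while boundaries[-1] < hi:
        boundaries.append(boundaries[-1] + width)
    n = len(data)
    table = []
    for lower, upper in zip(boundaries, boundaries[1:]):
        f = sum(1 for x in data if lower < x <= upper)   # step 3: cell frequency
        table.append((lower, upper, f, f / n))           # step 4: relative frequency
    return table

verbal_iq = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
             82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]

table = grouped_frequency_table(verbal_iq, start=70, width=10)
for lower, upper, f, rel in table:
    print(f"({lower}, {upper}]  f = {f:2d}  f/N = {rel:.3f}")
```

With these choices the cell frequencies are 1, 6, 7, 6 and 3, and the relative frequencies add up to 1.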
Data Set 3: The following table gives data on Verbal IQ, Math IQ, Initial Reading Achievement Score, and Final Reading Achievement Score for 23 students who have recently completed a reading improvement program (RA = Reading Achievement Score).

Student  Verbal IQ  Math IQ  Initial RA  Final RA
      1         86       94         1.1       1.7
      2        104      103         1.5       1.7
      3         86       92         1.5       1.9
      4        105      100         2.0       2.0
      5        118      115         1.9       3.5
      6         96      102         1.4       2.4
      7         90       87         1.5       1.8
      8         95      100         1.4       2.0
      9        105       96         1.7       1.7
     10         84       80         1.6       1.7
     11         94       87         1.6       1.7
     12        119      116         1.7       3.1
     13         82       91         1.2       1.8
     14         80       93         1.0       1.7
     15        109      124         1.8       2.5
     16        111      119         1.4       3.0
     17         89       94         1.6       1.8
     18         99      117         1.6       2.6
     19         94       93         1.4       1.4
     20         99      110         1.4       2.0
     21         95       97         1.5       1.3
     22        102      104         1.7       3.1
     23        102       93         1.6       1.9
In this example, the upper endpoint of each class interval is included in the interval; the lower endpoint is not.
Histogram: Verbal IQ
Histogram: Math IQ
Example
- In this example we compare two drugs, A and B, with respect to the time needed to metabolize the drug.
- 120 cases were given drug A.
- 120 cases were given drug B.
- Data on the time to metabolize each drug are given on the next two slides.
Drug A
Drug B
Grouped frequency tables
Histogram: drug A (time to metabolize)
Histogram: drug B (time to metabolize)
Some comments about histograms
- The width of the class intervals should be chosen so that the number of intervals with a frequency less than 5 is small.
- This means that the width of the class intervals can decrease as the sample size increases.
- If the width of the class intervals is too small, the frequency in each interval will be either 0 or 1.
- The histogram will look like this.
- If the width of the class intervals is too large, one class interval will contain all of the observations.
- The histogram will look like this.
- Ideally, one wants the histogram to appear as seen below.
- This will be achieved by making the width of the class intervals as small as possible while allowing only a few intervals to have a frequency less than 5.
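The width rule of thumb above can be checked numerically. The Python sketch below (the helper `class_frequencies` and the candidate widths 5, 10 and 20 are illustrative assumptions) counts, for the 23 Verbal IQ scores of Data Set 3, how many intervals (lower, upper] end up with a frequency below 5.

```python
def class_frequencies(data, start, width):
    """Frequencies of equal-width intervals (lower, upper]; `start` must be below min(data)."""
    boundaries = [start]
    while boundaries[-1] < max(data):
        boundaries.append(boundaries[-1] + width)
    return [sum(1 for x in data if a < x <= b)
            for a, b in zip(boundaries, boundaries[1:])]

verbal_iq = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
             82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]

for start, width in [(75, 5), (70, 10), (70, 20)]:
    freqs = class_frequencies(verbal_iq, start, width)
    sparse = sum(1 for f in freqs if f < 5)   # intervals with frequency below 5
    print(f"width {width:2d}: {len(freqs)} intervals, {sparse} with f < 5, frequencies {freqs}")
```

For this small sample, width 5 leaves 8 of 9 intervals below 5, width 20 leaves only 1 of 3 but produces just three bars, and width 10 (2 of 5) is one reasonable compromise. With more data, narrower intervals would pass the same test.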
- As the sample size increases, the histogram will approach a smooth curve.
- This is the histogram of the population.
[Figures: histograms for N = 25, N = 100, N = 500, N = 2000 and N = ∞]
Comment: The proportion of area under a histogram between two points estimates the proportion of cases in the sample (and the population) between those two values.
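A Python sketch of this area calculation, assuming each bar's area is proportional to its frequency and is spread evenly across the bar. Because the birth-weight frequencies are not reproduced in these notes, the example uses the Verbal IQ frequencies of Data Set 3 grouped with assumed class boundaries 70, 80, ..., 120; the function names are hypothetical.

```python
def proportion_below(boundaries, freqs, x):
    """Proportion of total histogram area lying to the left of x."""
    n = sum(freqs)
    area = 0.0
    for (a, b), f in zip(zip(boundaries, boundaries[1:]), freqs):
        if x >= b:
            area += f                        # the whole bar lies left of x
        elif x > a:
            area += f * (x - a) / (b - a)    # only part of the bar lies left of x
    return area / n

def proportion_between(boundaries, freqs, lo, hi):
    """Proportion of histogram area between lo and hi."""
    return proportion_below(boundaries, freqs, hi) - proportion_below(boundaries, freqs, lo)

# Verbal IQ grouped as (70, 80], (80, 90], ..., (110, 120]  (assumed boundaries)
boundaries = [70, 80, 90, 100, 110, 120]
freqs = [1, 6, 7, 6, 3]

print(proportion_between(boundaries, freqs, 90, 110))   # (7 + 6) / 23
print(proportion_below(boundaries, freqs, 95))          # (1 + 6 + 7/2) / 23
```

When the cut point falls on a class boundary, as in the birth-weight example, the answer is simply the sum of the frequencies of the whole bars to its left divided by n.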
Example: The following histogram displays the birth weight (in kg) of n = 100 births.
Find the proportion of births that have a birth weight less than 0.34 kg.
Proportion = (1 + 1 + 3 + 10 + 11 + 19 + 17)/100 = 62/100 = 0.62
The Characteristics of a Histogram
- Central location (average)
- Spread (variability, dispersion)
- Shape
Central Location
Spread, Dispersion, Variability
Shape: Bell Shaped (Normal)
Shape: Positively Skewed
Shape: Negatively Skewed
Shape: Platykurtic
Shape: Leptokurtic
Shape: Bimodal
The Stem-Leaf Plot
- An alternative to the histogram
- Each number in a data set can be broken into two parts:
- a stem
- a leaf
Example
- Verbal IQ = 84
- Stem = tens digit = 8
- Leaf = unit digit = 4
Example
- Verbal IQ = 104
- Stem = tens digit(s) = 10
- Leaf = unit digit = 4
To Construct a Stem-Leaf Diagram
- Make a vertical list of all stems.
- Then, beside each stem, make a horizontal list of its leaves.
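The construction above can be sketched in Python for non-negative whole numbers such as IQ scores, taking stem = x // 10 and leaf = x % 10. The function name `stem_leaf` and the `ordered` flag are illustrative assumptions; with `ordered=False` the leaves appear in the order the cases occur in the data, and with `ordered=True` they are sorted along each stem. Every stem between the smallest and largest is listed, even when it has no leaves.

```python
from collections import defaultdict

def stem_leaf(data, ordered=False):
    """Stem-leaf lines for non-negative integers: stem = x // 10, leaf = x % 10."""
    leaves = defaultdict(list)
    for x in data:                              # keep the order in which cases appear
        leaves[x // 10].append(x % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):   # vertical list of all stems
        row = sorted(leaves[stem]) if ordered else leaves[stem]
        lines.append(f"{stem} | {' '.join(map(str, row))}".rstrip())
    return lines

# Verbal IQ from Data Set 3, in student order
verbal_iq = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
             82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]

print("\n".join(stem_leaf(verbal_iq)))
print()
print("\n".join(stem_leaf(verbal_iq, ordered=True)))
```

The largest Verbal IQ is 119, so this sketch stops at stem 11.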
Example
- The data on N = 23 students
- Variables
- Verbal IQ
- Math IQ
- Initial Reading Achievement Score
- Final Reading Achievement Score
- We now construct a stem-leaf diagram of Verbal IQ.
- A vertical list of the stems:
8
9
10
11
12
We now list the leaves beside their stems.
- Taking the students in order (86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119, 82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102), each leaf is written beside its stem.
8 | 6 6 4 2 0 9
9 | 6 0 5 4 9 4 9 5
10 | 4 5 5 9 2 2
11 | 8 9 1
12 |
The leaves may be arranged in order:
8 | 0 2 4 6 6 9
9 | 0 4 4 5 5 6 9 9
10 | 2 2 4 5 5 9
11 | 1 8 9
12 |
The stem-leaf diagram is equivalent to a histogram
8 | 0 2 4 6 6 9
9 | 0 4 4 5 5 6 9 9
10 | 2 2 4 5 5 9
11 | 1 8 9
12 |
Rotating the stem-leaf diagram, we have:
[Figure: the Verbal IQ stem-leaf diagram rotated into a histogram; horizontal axis marked 80, 90, 100, 110, 120]
The Two-Part Stem-Leaf Diagram
- Sometimes you want to break each stem into two parts:
- * for leaves 0, 1, 2, 3, 4
- • for leaves 5, 6, 7, 8, 9
Stem-leaf diagram for Initial Reading Achievement
1. 01234444455556666677789
2. 0
- As it stands, this diagram does not give an accurate picture of the distribution.
- We try breaking the stems into two parts:
1.* 012344444
1.• 55556666677789
2.* 0
2.•
The Five-Part Stem-Leaf Diagram
- If the two-part stem-leaf diagram is not adequate, you can break the stems into five parts:
- * for leaves 0, 1
- t for leaves 2, 3
- f for leaves 4, 5
- s for leaves 6, 7
- • for leaves 8, 9
- We try breaking the stems into five parts:
1.* 01
1.t 23
1.f 444445555
1.s 66666777
1.• 89
2.* 0
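A Python sketch of the two-part and five-part splits for data recorded to one decimal place (e.g. 1.4 has stem 1 and leaf 4). The part labels `*`, `t`, `f`, `s`, `•` follow the t/f/s naming used above; `*` and `•` for the first and last parts are assumed here following Tukey's usual convention. The helper name `split_stem_leaf` is hypothetical. Because it reads the Initial RA column exactly as printed in Data Set 3, its leaf counts may differ slightly from the diagrams above.

```python
from collections import defaultdict

TWO_PART = [("*", range(0, 5)), ("•", range(5, 10))]
FIVE_PART = [("*", (0, 1)), ("t", (2, 3)), ("f", (4, 5)), ("s", (6, 7)), ("•", (8, 9))]

def split_stem_leaf(data, parts):
    """Stem-leaf lines for one-decimal data, each stem split into the given parts."""
    leaves = defaultdict(list)
    for x in data:
        tenths = round(x * 10)                 # 1.4 -> 14, avoiding float truncation
        leaves[tenths // 10].append(tenths % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        for label, digits in parts:            # one line per part of the stem
            row = sorted(d for d in leaves[stem] if d in digits)
            lines.append(f"{stem}.{label} {''.join(map(str, row))}".rstrip())
    return lines

# Initial Reading Achievement from Data Set 3, in student order
initial_ra = [1.1, 1.5, 1.5, 2.0, 1.9, 1.4, 1.5, 1.4, 1.7, 1.6, 1.6, 1.7,
              1.2, 1.0, 1.8, 1.4, 1.6, 1.6, 1.4, 1.4, 1.5, 1.7, 1.6]

print("\n".join(split_stem_leaf(initial_ra, TWO_PART)))
print()
print("\n".join(split_stem_leaf(initial_ra, FIVE_PART)))
```

Empty parts of the last stem (for example `2.t`) are printed as bare labels.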
Stem-leaf diagrams: Verbal IQ, Math IQ, Initial RA, Final RA
Some Conclusions
- Math IQ and Verbal IQ seem to have approximately the same distribution: bell shaped and centered about 100.
- Final RA seems to be larger than Initial RA and more spread out.
- There is improvement in RA.
- The amount of improvement is quite variable.
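These conclusions were read off the stem-leaf diagrams; the Python sketch below checks them numerically on Data Set 3, using the mean and the range (both among the numerical measures listed next). The per-student improvement is taken as Final RA minus Initial RA, which is an assumption about how "improvement" is measured.

```python
from statistics import mean

# Reading Achievement scores from Data Set 3, in student order
initial_ra = [1.1, 1.5, 1.5, 2.0, 1.9, 1.4, 1.5, 1.4, 1.7, 1.6, 1.6, 1.7,
              1.2, 1.0, 1.8, 1.4, 1.6, 1.6, 1.4, 1.4, 1.5, 1.7, 1.6]
final_ra = [1.7, 1.7, 1.9, 2.0, 3.5, 2.4, 1.8, 2.0, 1.7, 1.7, 1.7, 3.1,
            1.8, 1.7, 2.5, 3.0, 1.8, 2.6, 1.4, 2.0, 1.3, 3.1, 1.9]

# Improvement for each student, rounded to remove floating-point noise
improvement = [round(f - i, 1) for i, f in zip(initial_ra, final_ra)]

print(f"Initial RA: mean = {mean(initial_ra):.2f}, range = {max(initial_ra) - min(initial_ra):.1f}")
print(f"Final RA:   mean = {mean(final_ra):.2f}, range = {max(final_ra) - min(final_ra):.1f}")
print(f"Improvement: mean = {mean(improvement):.2f}, "
      f"smallest = {min(improvement)}, largest = {max(improvement)}")
```

The improvements run from a small decline (-0.2) to a gain of 1.6, which is the variability the last conclusion refers to.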
Numerical Measures
- Measures of central tendency (location)
- Measures of non-central location
- Measures of variability (dispersion, spread)
- Measures of shape
Measures of Central Tendency (Location)
Measures of Non-central Location
- Quartiles, mid-hinges
- Percentiles
Measures of Variability (Dispersion, Spread)
- Variance, standard deviation
- Range
- Inter-quartile range
Measures of Shape
Measures of Central Location (Mean)
- Summation notation
- Let $x_1, x_2, x_3, \ldots, x_n$ denote a set of n numbers.
- Then the symbol $\sum_{i=1}^{n} x_i$ denotes the sum of these n numbers:
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$$
Example
- Let $x_1, x_2, x_3, x_4, x_5$ denote the set of 5 numbers in the following table:
i      1   2   3   4   5
x_i   10  15  21   7  13
- Then the symbol $\sum_{i=1}^{5} x_i$ denotes the sum of these 5 numbers:
$$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 10 + 15 + 21 + 7 + 13 = 66$$
Meaning of the parts of the summation notation $\sum_{i=1}^{n} x_i$:
- n (above the $\Sigma$): final value for i
- $x_i$: each term of the sum
- i: quantity changing in each term of the sum
- i = 1 (below the $\Sigma$): starting value for i
Example
- Again let $x_1, x_2, x_3, x_4, x_5$ denote the same 5 numbers (10, 15, 21, 7, 13).
- Then the symbol $\sum_{i=2}^{4} x_i^3$ denotes the sum of these 3 numbers:
$$\sum_{i=2}^{4} x_i^3 = x_2^3 + x_3^3 + x_4^3 = 15^3 + 21^3 + 7^3 = 3375 + 9261 + 343 = 12979$$
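Both summation examples can be checked with a short Python sketch. The helper `summation` is hypothetical; it keeps the notes' 1-based index i, so the term $x_i$ is `x[i - 1]` in Python's 0-based lists, and `term` is the function applied to each $x_i$ (identity by default, cubing for the second example).

```python
def summation(x, start, stop, term=lambda v: v):
    """Sum of term(x_i) for i = start, ..., stop, using 1-based indices as in the notes."""
    return sum(term(x[i - 1]) for i in range(start, stop + 1))  # x_i is x[i - 1]

x = [10, 15, 21, 7, 13]   # x_1, ..., x_5

print(summation(x, 1, 5))                    # 66
print(summation(x, 2, 4, lambda v: v ** 3))  # 12979
```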
Mean
- Let $x_1, x_2, x_3, \ldots, x_n$ denote a set of n numbers.
- Then the mean of the n numbers is defined as
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Example
- Again let $x_1, x_2, x_3, x_4, x_5$ denote the same 5 numbers (10, 15, 21, 7, 13).
- Then the mean of the 5 numbers is
$$\bar{x} = \frac{10 + 15 + 21 + 7 + 13}{5} = \frac{66}{5} = 13.2$$
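The definition of the mean translates directly into Python; this is a minimal sketch (the function name `mean` is our own, and an empty list is rejected because n = 0 would make the formula undefined).

```python
def mean(xs):
    """Arithmetic mean: (x_1 + x_2 + ... + x_n) / n."""
    if not xs:
        raise ValueError("the mean of an empty set of numbers is undefined")
    return sum(xs) / len(xs)

x = [10, 15, 21, 7, 13]
print(mean(x))  # 13.2
```

Python's standard library also provides `statistics.mean`, which gives the same result here.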
Interpretation of the Mean
- Let $x_1, x_2, x_3, \ldots, x_n$ denote a set of n numbers.
- Then the mean, $\bar{x}$, is the centre of gravity of the n numbers.
- That is, if we drew a horizontal line and placed a weight of one at each value of $x_i$, then the balancing point of that system of masses would be at the point $\bar{x}$.
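The balancing-point interpretation can be checked numerically: with a unit weight at each $x_i$, the net turning effect about a pivot p is $\sum_{i=1}^{n} (x_i - p)$, which is zero exactly when $p = \bar{x}$. In the Python sketch below, the name `net_moment` and the trial pivots 10 and 15 are illustrative assumptions.

```python
def net_moment(xs, pivot):
    """Sum of (x_i - pivot): net turning effect of unit weights about the pivot."""
    return sum(x - pivot for x in xs)

x = [10, 15, 21, 7, 13]
x_bar = sum(x) / len(x)      # 13.2

print(net_moment(x, x_bar))  # essentially 0 (up to floating-point rounding): the system balances
print(net_moment(x, 10))     # 16: more weight to the right, so it tips right
print(net_moment(x, 15))     # -9: more weight to the left, so it tips left
```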
[Figure: unit weights at $x_1, x_2, x_3, x_4, \ldots, x_n$ on a horizontal line, balancing at $\bar{x}$]
In the example:
[Figure: unit weights at 7, 10, 13, 15 and 21 on a horizontal axis marked 0, 10, 20]
The mean, $\bar{x}$, is also approximately the center of gravity of a histogram.
The Median
- Let $x_1, x_2, x_3, \ldots, x_n$ denote a set of n numbers.
- Then the median of the n numbers is defined as the number that splits the numbers into two equal parts.
- To evaluate the median, we arrange the numbers in increasing order.
- If the number of observations is odd, there will be one observation in the middle. This number is the median.
- If the number of observations is even, there will be two middle observations. The median is the average of these two observations.
Example
- Again let $x_1, x_2, x_3, x_4, x_5$ denote the same 5 numbers (10, 15, 21, 7, 13).
- The numbers arranged in order are
7 10 13 15 21
- The unique middle observation, 13, is the median.
Example 2
- Let $x_1, x_2, x_3, x_4, x_5, x_6$ denote the 6 numbers
23 41 12 19 64 8
- Arranged in increasing order, these observations would be
8 12 19 23 41 64
- There are two middle observations: 19 and 23.
- Median = average of the two middle observations = (19 + 23)/2 = 21
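Both cases of the median rule (odd and even n) fit in one short Python function; this is a sketch with our own function name, checked against the two examples above.

```python
def median(xs):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    s = sorted(xs)                        # arrange in increasing order
    n = len(s)
    if n == 0:
        raise ValueError("the median of an empty set of numbers is undefined")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                     # unique middle observation
    return (s[mid - 1] + s[mid]) / 2      # average of the two middle observations

print(median([10, 15, 21, 7, 13]))        # 13
print(median([23, 41, 12, 19, 64, 8]))    # 21.0
```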
Example
- The data on N = 23 students
- Variables
- Verbal IQ
- Math IQ
- Initial Reading Achievement Score
- Final Reading Achievement Score
Computing the Median
- Stem-leaf diagrams
- With N = 23 observations, the median is the middle observation, i.e. the 12th observation in increasing order.
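Reading the 12th ordered value is easy to automate. The Python sketch below sorts each variable of Data Set 3, takes the 12th value (index 11, since Python counts from 0) and compares it with `statistics.median` from the standard library; the dictionary layout is our own choice.

```python
from statistics import median

# Data Set 3, each list in student order
data_set_3 = {
    "Verbal IQ":  [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
                   82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102],
    "Math IQ":    [94, 103, 92, 100, 115, 102, 87, 100, 96, 80, 87, 116,
                   91, 93, 124, 119, 94, 117, 93, 110, 97, 104, 93],
    "Initial RA": [1.1, 1.5, 1.5, 2.0, 1.9, 1.4, 1.5, 1.4, 1.7, 1.6, 1.6, 1.7,
                   1.2, 1.0, 1.8, 1.4, 1.6, 1.6, 1.4, 1.4, 1.5, 1.7, 1.6],
    "Final RA":   [1.7, 1.7, 1.9, 2.0, 3.5, 2.4, 1.8, 2.0, 1.7, 1.7, 1.7, 3.1,
                   1.8, 1.7, 2.5, 3.0, 1.8, 2.6, 1.4, 2.0, 1.3, 3.1, 1.9],
}

for name, values in data_set_3.items():
    twelfth = sorted(values)[11]          # 12th observation in increasing order
    print(f"{name}: 12th ordered value = {twelfth}, statistics.median = {median(values)}")
```

The medians come out as 96 (Verbal IQ), 97 (Math IQ), 1.5 (Initial RA) and 1.9 (Final RA).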
Summary
Some Comments
- The mean is the centre of gravity of a set of observations: the balancing point.
- The median splits the observations into two parts of approximately 50% each.
- The median splits the area under a histogram into two parts of 50% each.
- The mean is the balancing point of a histogram.
[Figure: histogram divided at the median into two areas of 50%]
- For symmetric distributions, the mean and the median will be approximately the same value.
[Figure: symmetric histogram with 50% of the area on each side of the median]
- For positively skewed distributions, the mean exceeds the median.
- For negatively skewed distributions, the median exceeds the mean.
[Figure: skewed histograms with 50% of the area on each side of the median]
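The Final RA scores of Data Set 3 give a small real illustration: most values lie between 1.7 and 2.0, with a few as high as 3.5, i.e. a long right tail. The Python sketch below (using the standard `statistics` module) shows that, as expected for a positively skewed distribution, the mean lies above the median.

```python
from statistics import mean, median

# Final Reading Achievement scores from Data Set 3, in student order
final_ra = [1.7, 1.7, 1.9, 2.0, 3.5, 2.4, 1.8, 2.0, 1.7, 1.7, 1.7, 3.1,
            1.8, 1.7, 2.5, 3.0, 1.8, 2.6, 1.4, 2.0, 1.3, 3.1, 1.9]

print(f"mean   = {mean(final_ra):.2f}")
print(f"median = {median(final_ra):.2f}")
print("mean exceeds median:", mean(final_ra) > median(final_ra))
```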
- An outlier is a wild observation in the data.
- Outliers occur because of
- errors (typographical and computational)
- extreme cases in the population
- The mean is altered to a significant degree by the presence of outliers.
- Outliers have little effect on the value of the median.
- This is a reason for using the median in place of the mean as a measure of central location.
- Alternatively, the mean is the best measure of central location when the data are normally distributed (bell-shaped).
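A Python sketch of how a single typographical error affects the two measures. The error is hypothetical: student 6's Verbal IQ of 96 is assumed to have been keyed in as 960. The mean jumps by more than 37 points, while the median moves only from 96 to 99.

```python
from statistics import mean, median

# Verbal IQ from Data Set 3, in student order
verbal_iq = [86, 104, 86, 105, 118, 96, 90, 95, 105, 84, 94, 119,
             82, 80, 109, 111, 89, 99, 94, 99, 95, 102, 102]

with_typo = verbal_iq.copy()
with_typo[5] = 960        # hypothetical typing error: student 6's 96 entered as 960

print(f"mean:   {mean(verbal_iq):.2f} -> {mean(with_typo):.2f}")
print(f"median: {median(verbal_iq)} -> {median(with_typo)}")
```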