Title: Data observation and Descriptive Statistics
1Data observation and Descriptive Statistics
2Organizing Data
- Frequency distribution
- Table that contains all the scores along with the frequency (number of times) each score occurs.
- Relative frequency: proportion of the total observations included in each score.
3Frequency distribution
Amount f(frequency) rf(relative frequency)
0.00 2 0.125
0.13 1 0.0625
0.93 1 0.0625
1.00 1 0.0625
10.00 1 0.0625
32.00 1 0.0625
45.53 1 0.0625
56.00 1 0.0625
60.00 1 0.0625
63.25 1 0.0625
74.93 1 0.0625
80.00 1 0.0625
85.28 1 0.0625
115.35 1 0.0625
120.00 1 0.0625
n = 16 1.00
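The table above can be reproduced with a short sketch. It assumes the 16 amounts listed on the slide are the full data set; the function name `frequency_distribution` is illustrative and not part of the original slides.

```python
from collections import Counter

# The 16 amounts from the slide's table (0.00 occurs twice).
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]


def frequency_distribution(scores):
    """Return {score: (f, rf)} ordered by score, where rf = f / n."""
    n = len(scores)
    counts = Counter(scores)
    return {score: (f, f / n) for score, f in sorted(counts.items())}


freq_table = frequency_distribution(amounts)
for score, (f, rf) in freq_table.items():
    print(f"{score:7.2f} {f:2d} {rf:.4f}")
```

The relative frequencies always sum to 1 because each one is a count divided by n.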
4Organizing data
- Class interval frequency distribution
- Scores are grouped into intervals and presented along with the frequency of scores in each interval.
- Appears more organized, but does not show the exact scores within the interval.
- To calculate the range or width of the interval:
- (Highest score - lowest score) / number of intervals
- Ex: (120 - 0) / 5 = 24
5Class interval frequency distribution
Class interval f (frequency) rf ( relative frequency)
0-24 6 .375
25-48 2 .125
49-73 3 .1875
74-98 3 .1875
99-124 2 .125
n = 16 1.00
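As a rough sketch, the class interval table can be rebuilt by counting the scores that fall inside each interval. The interval bounds are copied from the table as printed; the helper name `interval_frequencies` is an assumption made for illustration.

```python
# The 16 amounts from the frequency distribution slide.
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]

# Interval bounds as printed on the slide, inclusive at both ends.
# A score such as 24.5 would fall between two intervals; this data set has none.
intervals = [(0, 24), (25, 48), (49, 73), (74, 98), (99, 124)]


def interval_frequencies(scores, bounds):
    """Return a list of (label, f, rf) rows, one per class interval."""
    n = len(scores)
    rows = []
    for low, high in bounds:
        f = sum(1 for x in scores if low <= x <= high)
        rows.append((f"{low}-{high}", f, f / n))
    return rows


interval_rows = interval_frequencies(amounts, intervals)
for label, f, rf in interval_rows:
    print(f"{label:>7} {f:2d} {rf:.4f}")
```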
6Graphs
- Bar graphs
- Data that are collected on a nominal scale.
- Qualitative variables or categorical variables.
- Each bar represents a separate (discrete) category, and therefore the bars do not touch.
- The bars on the x-axis can be placed in any order.
7Bar Graph
8Graphs
- Histograms
- To illustrate quantitative variables
- Scores represent changes in quantity.
- Bars touch each other and represent a variable with increasing values.
- The values of the variable being measured have a specific order that cannot be changed.
9Histogram
10Frequency polygon
- Line graph for quantitative variables
- Represents continuous data (time, age, weight)
11Frequency Polygon
- AGE: 22.06, 24.05, 25.04, 25.04, 25.07, 25.07, 26.03, 26.11, 27.03, 27.11, 29.03, 29.05, 29.05, 34, 37.1, 53
12Descriptive Statistics
- Numerical measures that describe
- Central tendency of distribution
- Width of distribution
- Shape of distribution
13Central tendency
- Describe the middleness of a data set
- Mean
- Median
- Mode
14Mean
- Arithmetic average
- Used for interval and ratio data
- Formula for population mean (µ, pronounced "mu"):
- $\mu = \frac{\sum X}{N}$
- Formula for sample mean:
- $M = \bar{X} = \frac{\sum X}{N}$
15Mean
Amount f(frequency) rf(relative frequency)
0.00 2 0.125
0.13 1 0.0625
0.93 1 0.0625
1.00 1 0.0625
10.00 1 0.0625
32.00 1 0.0625
45.53 1 0.0625
56.00 1 0.0625
60.00 1 0.0625
63.25 1 0.0625
74.93 1 0.0625
80.00 1 0.0625
85.28 1 0.0625
115.35 1 0.0625
120.00 1 0.0625
M = 46.53 n = 16 1.00
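A minimal sketch of the mean calculation for this table, assuming the 16 amounts are the complete sample. The slide reports the result rounded to 46.53.

```python
# The 16 amounts from the table above.
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]

# Sample mean: sum of the scores divided by the number of scores.
total = sum(amounts)
n = len(amounts)
mean_amount = total / n

print(f"sum = {total:.2f}, n = {n}, M = {mean_amount:.3f}")
```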
16Mean
- Not a good indicator of central tendency if the distribution has extreme scores (high or low).
- High scores pull the mean higher
- Low scores pull the mean lower
17Median
- Middle score of a distribution once the scores are arranged in increasing or decreasing order.
- Used when the mean might not be a good indicator of central tendency.
- Used with ratio, interval and ordinal data.
18Median
0.00 0.00
0.13
0.93
1.00
10.00
32.00
45.53
56.00
60.00
63.25
74.93
80.00
85.28
115.35
120.00
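The slide lists the ordered scores without stating the result. As a sketch, with 16 scores (an even count) the median is the average of the 8th and 9th ordered scores; Python's standard `statistics.median` applies the same rule.

```python
import statistics

# The 16 amounts, already in increasing order (0.00 occurs twice).
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]

ordered = sorted(amounts)
middle = len(ordered) // 2

# Even number of scores: average the two middle scores.
manual_median = (ordered[middle - 1] + ordered[middle]) / 2
library_median = statistics.median(amounts)

print(f"median = {manual_median:.3f}")
```

Here the two middle scores are 45.53 and 56.00, so the median lies halfway between them.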
19Mode
- The score that occurs in the distribution with the greatest frequency.
- 0 modes: no mode
- 1 mode: unimodal distribution
- 2 modes: bimodal distribution
- 3 modes: trimodal distribution
20Mode
Amount f(frequency) rf(relative frequency)
0.00 2 0.125
0.13 1 0.0625
0.93 1 0.0625
1.00 1 0.0625
10.00 1 0.0625
32.00 1 0.0625
45.53 1 0.0625
56.00 1 0.0625
60.00 1 0.0625
63.25 1 0.0625
74.93 1 0.0625
80.00 1 0.0625
85.28 1 0.0625
115.35 1 0.0625
120.00 1 0.0625
M = 46.53 n = 16 1.00
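A short sketch for finding the mode of this table. It uses `statistics.multimode` (Python 3.8+), which returns every score tied for the greatest frequency, so the length of the result also gives the number of modes.

```python
import statistics

# The 16 amounts from the table above.
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]

# All scores that share the highest frequency.
modes = statistics.multimode(amounts)

print(f"modes = {modes}, number of modes = {len(modes)}")
```

Only 0.00 occurs more than once, so this distribution is unimodal.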
21Measures of Variability
- Range
- From the lowest to the highest score
- Variance
- Average squared deviation from the mean
- Standard deviation
- Variation from the sample mean
- Square root of the variance
22Measures of Variability
- Indicate the degree to which the scores are clustered or spread out in a distribution.
- Ex: Two distributions of teacher-to-student ratio.
- Which college has more variation?
College A College B
4 16
12 19
41 22
Sum 57 Sum 57
Mean 19 Mean 19
23Range
- The difference between the highest and lowest scores.
- Provides limited information about variation.
- Influenced by high and low scores.
- Does not inform about variations of scores not at the extremes.
- Examples
- Range = X(highest) - X(lowest)
- College A: range = 41 - 4 = 37
- College B: range = 22 - 16 = 6
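The two college examples can be checked with a small sketch. The function name `score_range` is illustrative; the data are the teacher-to-student ratios from the previous slide.

```python
# Teacher-to-student ratios from the "Measures of Variability" slide.
college_a = [4, 12, 41]
college_b = [16, 19, 22]


def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)


range_a = score_range(college_a)
range_b = score_range(college_b)

print(f"College A range = {range_a}, College B range = {range_b}")
```

Both colleges have the same mean (19), yet the ranges differ sharply, which is why a measure of variability is needed alongside the mean.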
24Variance
- Limitations of range require a more precise way to measure variability.
- Deviation: the degree to which the scores in a distribution vary from the mean.
- Typical measure of variability: standard deviation (SD)
- Variance
- The first step in calculating standard deviation
25Variance
- X = number of therapy sessions each student attended.
- M = 4.2
- Deviation = X - M
- Sum of deviations = 0
26Variance
- In order to eliminate negative signs, we square the deviations.
- Sum the squared deviations: sum of squares, or SS
27Variance
- Take the average of the SS
- Ex: SS = 48.80
- $SD^2 = \frac{\sum (X - M)^2}{N}$
- That is, the average of the squared deviations from the mean
- $SD^2 = 48.80 / 5 = 9.76$
28Standard Deviation
- Standard deviation
- Typical amount that the scores vary or deviate from the sample mean
- $SD = \sqrt{\frac{\sum (X - M)^2}{N}}$
- That is, the square root of the variance
- Because we take the square root, this value is expressed in the original units of the scores and is more representative of their distribution.
29Standard Deviation
- X = 1, 2, 4, 4, 10
- M = 4.2
- SD = 3.12 (standard deviation)
- SD² = 9.76 (variance)
- Always ask yourself: do these data (mean and SD) make sense based on the raw scores?
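As a sanity check on these numbers, the sketch below takes the square root of the variance and compares it with `statistics.pstdev`, which also divides by N. This is an illustration, not the method used to produce the slide.

```python
import math
import statistics

sessions = [1, 2, 4, 4, 10]

n = len(sessions)
mean_sessions = sum(sessions) / n

# Variance with N in the denominator, then its square root.
variance = sum((x - mean_sessions) ** 2 for x in sessions) / n
sd = math.sqrt(variance)

# Standard-library equivalent (population formula, divides by N).
sd_library = statistics.pstdev(sessions)

print(f"M = {mean_sessions:.1f}, SD^2 = {variance:.2f}, SD = {sd:.2f}")
```

An SD of about 3 sessions around a mean of 4.2 is plausible for scores that run from 1 to 10.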
30Population Standard Deviation
- The average amount that the scores in a distribution vary from the mean.
- Population standard deviation (σ, pronounced "sigma"):
- $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
32Sample Standard Deviation
- Sample is a subset of the population.
- Use sample SD to estimate population SD.
- Because samples are smaller than populations, there may be less variability in a sample.
- To correct for this, we divide the sum of squared deviations by N - 1 instead of N.
- This increases the computed standard deviation of the sample.
- Provides a better estimate of the population standard deviation.
- $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$
- s: sample standard deviation (with the N - 1 correction, the sample variance s² is an unbiased estimator of the population variance)
- σ: population standard deviation
33Sample Standard Deviation
X X - mean X - mean squared
0.00 -46.53 2,165.04
0.00 -46.53 2,165.04
0.13 -46.40 2,152.96
0.93 -45.60 2,079.36
1.00 -45.53 2,072.98
10.00 -36.53 1,334.44
32.00 -14.53 211.12
45.53 -1.00 1.00
56.00 9.47 89.68
60.00 13.47 181.44
63.25 16.72 279.56
74.93 28.40 806.56
80.00 33.47 1,120.24
85.28 38.75 1,501.56
115.35 68.82 4,736.19
120.00 73.47 5,397.84
M = 46.53 N = 16 SS = 26,295.02
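The table stops at the sum of squares. A sketch of the remaining step, dividing SS by N - 1 and taking the square root, is shown below; `statistics.stdev` uses the same N - 1 formula. The result agrees with the SD of 41.87 quoted for Class 1 on the later z-score slide.

```python
import math
import statistics

# The 16 amounts from the table above.
amounts = [0.00, 0.00, 0.13, 0.93, 1.00, 10.00, 32.00, 45.53,
           56.00, 60.00, 63.25, 74.93, 80.00, 85.28, 115.35, 120.00]

n = len(amounts)
mean_amount = sum(amounts) / n

# Sum of squared deviations from the sample mean.
ss = sum((x - mean_amount) ** 2 for x in amounts)

# Sample standard deviation: divide by N - 1, then take the square root.
s = math.sqrt(ss / (n - 1))
s_library = statistics.stdev(amounts)

print(f"SS = {ss:,.2f}, s = {s:.2f}")
```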
34Types of Distributions
- Refers to the shape of the distribution.
- 3 types
- Normal distribution
- Positively skewed distribution
- Negatively skewed distribution
35Normal Distribution
- Normal distribution: a specific frequency distribution
- Bell shaped
- Symmetrical
- Unimodal
- Many distributions of variables found in nature (when samples are large) are approximately normal.
36Normal Distribution
Mean, median and mode are equal and located in the center.
37Normal Distribution
38Skewed distributions
- When our data are not symmetrical
- Positively skewed distribution
- Negatively skewed distribution
- Memory hint: the skew is where the tail is. The tail also looks like a skewer, and it points toward the skew (in either the positive or the negative direction).
39Skewed Distributions
40Kurtosis
- Kurtosis: how flat or peaked a distribution is.
- Tall and skinny versus short and wide
- Mesokurtic: normal
- Leptokurtic: tall and thin
- Platykurtic: short and fat (squatty like a platypus!)
41Kurtosis
Curves shown: leptokurtic, platykurtic, mesokurtic
42Skewness, Number of Modes, and Kurtosis in
Distribution of Housing Prices
43z - Scores
- In which country (US vs. England) is Homer Simpson considered overweight?
- How can we make this comparison?
- Need to convert weight in pounds and kilograms to a standardized scale.
- z-scores allow for scores from different distributions to be compared under standardized conditions.
- The need for standardization
- Putting two different variables on the same scale
- z-score: transforming raw scores into standardized scores
- $z = \frac{X - \mu}{\sigma}$
- Tells us the number of standard deviations a score is from the mean.
44z- Scores
- Class 1: M = 46.53, SD = 41.87, X = 54.76
- Class 2: M = 53.67, SD = 18.23, X = 89.07
- In which class did I have more money in comparison to the distribution of the other students?
- Sample z-score: $z = \frac{X - M}{s}$
- When we convert raw scores from different distributions to z-scores, these scores become part of the same z distribution and we can compare scores from different distributions.
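The question on this slide can be answered with a small sketch of the sample z-score formula. The function name `z_score` is illustrative; the means, SDs and raw scores are the ones given for the two classes.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Class 1: M = 46.53, SD = 41.87, X = 54.76
z_class_1 = z_score(54.76, 46.53, 41.87)

# Class 2: M = 53.67, SD = 18.23, X = 89.07
z_class_2 = z_score(89.07, 53.67, 18.23)

print(f"Class 1 z = {z_class_1:.2f}, Class 2 z = {z_class_2:.2f}")
```

The Class 2 score is almost two standard deviations above its class mean, while the Class 1 score is only about a fifth of a standard deviation above, so the amount in Class 2 is larger relative to the other students.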
45z Distribution
- Characteristics (regardless of the original distributions)
- z score at the mean equals 0
- Standard deviation equals 1
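These two characteristics can be verified numerically. The sketch below standardizes the therapy-session scores from the earlier slides (X = 1, 2, 4, 4, 10), using the N-denominator standard deviation; any other data set would behave the same way.

```python
import statistics

sessions = [1, 2, 4, 4, 10]

mean_sessions = statistics.fmean(sessions)
sd_sessions = statistics.pstdev(sessions)   # divides by N

# Convert every raw score to a z-score.
z_scores = [(x - mean_sessions) / sd_sessions for x in sessions]

z_mean = statistics.fmean(z_scores)
z_sd = statistics.pstdev(z_scores)

print(f"mean of z = {z_mean:.2f}, SD of z = {z_sd:.2f}")
```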
46z distribution of exam scores
M = 70, s = 10
47Standard normal distribution
- If a z-distribution is normal, then we refer to it as a standard normal distribution.
- Provides information about the proportion of scores that are higher or lower than any other score in the distribution.
48Standard Normal Curve Table
- Standard normal curve table (Appendix A)
- Statisticians have provided the proportion of scores that fall between any two z-scores.
- What is the percentile rank of a z score of 1?
- Percentile rank: proportion of scores at or below a given raw score.
- Ex: SAT score = 1350, M = 1120, s = 340
- z = (1350 - 1120) / 340 ≈ 0.68, which is about the 75th percentile
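In place of a printed table, the proportions can be looked up with `statistics.NormalDist` from the Python standard library (3.8+). This sketch assumes the scores are normally distributed, as the standard normal curve table does.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of scores at or below z = 1.
proportion_below_z1 = standard_normal.cdf(1.0)

# SAT example from the slide: X = 1350, M = 1120, s = 340.
z_sat = (1350 - 1120) / 340
proportion_below_sat = standard_normal.cdf(z_sat)

print(f"z = 1.00 -> {proportion_below_z1 * 100:.1f}th percentile")
print(f"z = {z_sat:.2f} -> {proportion_below_sat * 100:.1f}th percentile")
```

A z-score of 1 sits at roughly the 84th percentile, and the SAT example lands at about the 75th percentile, matching the slide.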
49Percentile Rank
- The percentage of scores that your score is higher than.
- 89th percentile rank for height
- You are taller than 89% of the students in the class. (You are tall!)
- Homer Simpson: 4th percentile rank for intelligence.
- He is smarter than 4% of the population (or 96% of the population is smarter than Homer).
- GRE score: 88th percentile rank
- Reading scores of grammar school: 18th percentile rank
50Review
- Data organization
- Frequency distribution, bar graph, histogram and frequency polygon.
- Descriptive statistics
- Central tendency: middleness of a distribution
- Mean, median and mode
- Measures of variation: the spread of a distribution
- Range, standard deviation
- Distributions can be normal or skewed (positively or negatively).
- z-scores
- Method of transforming raw scores into standard scores for comparisons.
- Standard normal distribution: mean z-score = 0 and standard deviation = 1
- Normal curve table shows the proportion of scores under the curve below a given z-score.