Title: Types of Distributions
Chapter 1
Statistics
- the science of collecting, analyzing, and drawing
conclusions from data
Descriptive statistics
- the methods of organizing and summarizing data
Inferential statistics
- involves making generalizations from a sample to
a population
Population
- The entire collection of individuals or objects
about which information is desired
Sample
- A subset of the population, selected for study in
some prescribed manner
Variable
- any characteristic whose value may change from
one individual to another
Data
- observations on a single variable or simultaneously
on two or more variables
Types of variables
Categorical variables
- or qualitative
- identifies basic differentiating characteristics
of the population
Numerical variables
- or quantitative
- observations or measurements take on numerical
values
- makes sense to average these values
- two types: discrete and continuous
Discrete (numerical)
- listable set of values
- usually counts of items
Continuous (numerical)
- data can take on any value in the domain of the
variable
- usually measurements of something
Classification by the number of variables
- Univariate: data that describes a single
characteristic of the population
- Bivariate: data that describes two
characteristics of the population
- Multivariate: data that describes more than two
characteristics (beyond the scope of this course)
Identify the following variables
- the income of adults in your city: Numerical
- the color of M&M candies selected at random from
a bag: Categorical
- the number of speeding tickets each student in AP
Statistics has received: Numerical
- the area code of an individual: Categorical
- the birth weights of female babies born at a
large hospital over the course of a year: Numerical
Graphs for categorical data
Bar Graph
- Used for categorical data
- Bars do not touch
- Categorical variable is typically on the
horizontal axis
- To describe: comment on which occurred the most
often or least often
- May make a double bar graph or segmented bar
graph for bivariate categorical data sets
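As a rough sketch of what a bar graph summarizes, the snippet below tallies a categorical variable and reports which category occurred most and least often. The birth-month responses are invented for illustration and are not the class survey data.

```python
from collections import Counter

# Hypothetical birth months, invented for illustration only.
birth_months = ["Jan", "Mar", "Mar", "Jul", "Oct", "Mar", "Jul", "Dec", "Jan", "Mar"]

counts = Counter(birth_months)

# One row per category; rows are separate, like bars that do not touch.
for month, count in counts.items():
    print(f"{month:>3} | {'#' * count}")

# Describe the graph: which category occurred most and least often?
most_common_month, most_common_count = counts.most_common(1)[0]
least_common_count = min(counts.values())
least_common_months = [m for m, c in counts.items() if c == least_common_count]

print("Most often:", most_common_month)
print("Least often:", ", ".join(least_common_months))
```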
Using class survey data:
- graph birth month
- graph gender and handedness
Pie (Circle) graph
- Used for categorical data
- To make:
- Proportion × 360° gives the angle for each category
- Using a protractor, mark off each part
- To describe: comment on which occurred the most
often or least often
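The "proportion × 360" step can be sketched in a few lines. The function name `pie_angles` and the handedness responses are assumptions made up for this illustration.

```python
from collections import Counter

def pie_angles(observations):
    """Return the central angle (in degrees) for each category."""
    counts = Counter(observations)
    total = sum(counts.values())
    # angle = proportion of the whole * 360 degrees
    return {category: count / total * 360 for category, count in counts.items()}

# Hypothetical handedness responses, invented for illustration only.
responses = ["right"] * 15 + ["left"] * 4 + ["ambidextrous"] * 1

angles = pie_angles(responses)
for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

The angles always add up to 360 degrees, which is a quick check before picking up the protractor.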
Graphs for numerical data
Dotplot
- Used with numerical data (either discrete or
continuous)
- Made by putting dots (or Xs) on a number line
- Can make comparative dotplots by using the same
axis for multiple groups
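A minimal text version of a dotplot, assuming whole-number data so that each value gets its own column. The helper `text_dotplot` and the ticket counts are hypothetical.

```python
from collections import Counter

def text_dotplot(values):
    """Return the lines of a simple text dotplot for integer-valued data."""
    counts = Counter(values)
    low, high = min(values), max(values)
    tallest = max(counts.values())
    lines = []
    # Build the rows from the top of the tallest stack down to the axis.
    for level in range(tallest, 0, -1):
        row = "".join(
            " X " if counts[v] >= level else "   " for v in range(low, high + 1)
        )
        lines.append(row.rstrip())
    # Number line underneath the stacks of Xs.
    lines.append("".join(f"{v:^3}" for v in range(low, high + 1)).rstrip())
    return lines

# Hypothetical speeding-ticket counts, invented for illustration only.
tickets = [0, 0, 0, 1, 1, 2, 0, 3, 1, 0]

lines = text_dotplot(tickets)
for line in lines:
    print(line)
```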
Distribution Activity . . .
Types (shapes) of Distributions
Symmetrical
- refers to data in which both sides are (more or
less) the same when the graph is folded
vertically down the middle
- bell-shaped is a special type
- has a center mound with two sloping tails
Uniform
- refers to data in which every class has equal or
approximately equal frequency
Skewed (left or right)
- refers to data in which one side (tail) is longer
than the other side
- the direction of skewness is on the side of the
longer tail
Bimodal (multi-modal)
- refers to data in which two (or more) classes
have the largest frequency and are separated by at
least one other class
How to describe a numerical, univariate graph
What strikes you as the most distinctive
difference among the distributions of exam scores
in classes A, B, and C?
1. Center
- discuss where the middle of the data falls
- three types of central tendency
- mean, median, mode
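The three measures of center can be computed with Python's standard `statistics` module. The exam scores below are invented for illustration and are not the class A, B, or C data.

```python
import statistics

# Hypothetical exam scores, invented for illustration only.
scores = [62, 70, 70, 75, 81, 88, 93]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
```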
What strikes you as the most distinctive
difference among the distributions of scores in
classes D, E, and F?
2. Spread
- discuss how spread out the data is
- refers to the variability of the data
- Range, standard deviation, IQR
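A sketch of the three measures of spread, using invented scores. Two assumptions to note: `statistics.stdev` is the sample standard deviation, and the quartiles are found as the medians of the lower and upper halves, which is one common textbook convention (software may use slightly different rules). The helper name `iqr_by_halves` is hypothetical.

```python
import statistics

def iqr_by_halves(values):
    """IQR as (median of upper half) - (median of lower half).

    Assumed convention: when the number of values is odd, the overall
    median is left out of both halves.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]
    return statistics.median(upper) - statistics.median(lower)

# Hypothetical exam scores, invented for illustration only.
scores = [62, 70, 70, 75, 81, 88, 93]

data_range = max(scores) - min(scores)  # largest value minus smallest value
sample_sd = statistics.stdev(scores)    # sample standard deviation
iqr = iqr_by_halves(scores)             # spread of the middle half

print("range:", data_range)
print("standard deviation:", round(sample_sd, 2))
print("IQR:", iqr)
```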
What strikes you as the most distinctive
difference among the distributions of exam scores
in classes G, H, and I?
3. Shape
- refers to the overall shape of the distribution
- symmetrical, uniform, skewed, or bimodal
What strikes you as the most distinctive
feature of the distribution of exam scores
in class K?
4. Unusual occurrences
- outliers: values that lie away from the rest of
the data
- gaps
- clusters
- anything else unusual
5. In context
- You must write your answer in reference to the
specifics in the problem, using correct
statistical vocabulary and using complete
sentences!
More graphs for numerical data
Stemplots (stem-and-leaf plots)
- Used with univariate, numerical data
- Must have a key so that we know how to read the numbers
- Can split stems when you have a long list of leaves
- Can have a comparative stemplot with two groups
Would a stemplot be a good graph for the number
of pieces of gum chewed per day by AP Stat
students? Why or why not?
Would a stemplot be a good graph for the number
of pairs of shoes owned by AP Stat students? Why
or why not?
Example: The following data are prices per ounce
for various brands of dandruff shampoo at a local
grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23
Can you make a stemplot with this data?
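One possible way to build this stemplot, assuming the tenths digit is the stem and the hundredths digit is the leaf. The helper `stemplot` is a hypothetical name, and the prices are converted to whole cents first so the digits split cleanly.

```python
from collections import defaultdict

def stemplot(values):
    """Group two-digit whole numbers by tens digit (stem) and ones digit (leaf)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    # Include empty stems so gaps in the data stay visible.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}

prices = [0.32, 0.21, 0.29, 0.54, 0.17, 0.28, 0.36, 0.23]
cents = [round(p * 100) for p in prices]  # whole cents avoid floating-point noise

plot = stemplot(cents)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 2 | 1 means $0.21 per ounce")
```

The empty stem 4 is kept on purpose: it shows the gap between most brands and the one priced at $0.54 per ounce.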
Example: Tobacco use in G-rated movies
Total tobacco exposure time (in seconds) for Disney movies:
223 176 548 37 158 51 299 37 11 165 74 9 2 6 23 206 9
Total tobacco exposure time (in seconds) for other studios' movies:
205 162 6 1 117 5 91 155 24 55 17
Make a comparative stemplot.
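A sketch of a back-to-back stemplot for these two groups. The choice of stems is an assumption, not something the slide specifies: here the hundreds digit is the stem, the tens digit is the leaf, and the ones digit is dropped (truncated). The helper `split_leaves` is a hypothetical name.

```python
def split_leaves(values):
    """Map hundreds digit (stem) to sorted tens digits (leaves); ones digit is dropped."""
    stems = {}
    for value in sorted(values):
        stem, rest = divmod(value, 100)
        stems.setdefault(stem, []).append(rest // 10)
    return stems

disney = [223, 176, 548, 37, 158, 51, 299, 37, 11, 165, 74, 9, 2, 6, 23, 206, 9]
other = [205, 162, 6, 1, 117, 5, 91, 155, 24, 55, 17]

left, right = split_leaves(disney), split_leaves(other)
top = max(max(left), max(right))

rows = []
for stem in range(top + 1):
    # Left-hand leaves are reversed so they increase outward from the stem.
    left_leaves = "".join(str(d) for d in reversed(left.get(stem, [])))
    right_leaves = "".join(str(d) for d in right.get(stem, []))
    rows.append(f"{left_leaves:>12} | {stem} | {right_leaves}".rstrip())

print("      Disney |   | Other studios")
for row in rows:
    print(row)
print("Key: 1 | 5 means about 150 seconds (ones digit truncated)")
```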
Histograms
- Used with numerical data
- Bars touch on histograms
- Two types
- Discrete
- Bars are centered over discrete values
- Continuous
- Bars cover a class (interval) of values
- For comparative histograms use two separate
graphs with the same scale on the horizontal axis
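For a continuous histogram, each bar's height is the count of values in its class. The sketch below assumes classes that include the lower boundary and exclude the upper one. The helper `class_counts` and the speeds are invented for illustration.

```python
def class_counts(values, start, width, n_classes):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

# Hypothetical fastest speeds (mph), invented for illustration only.
speeds = [62, 68, 71, 75, 75, 79, 80, 84, 88, 93, 97, 104]

counts = class_counts(speeds, start=60, width=10, n_classes=5)
for i, count in enumerate(counts):
    lower = 60 + 10 * i
    print(f"{lower:>3} to <{lower + 10:<3} | {'#' * count}")
```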
Would a histogram be a good graph for the fastest
speed driven by AP Stat students? Why or why not?
Would a histogram be a good graph for the number
of pieces of gum chewed per day by AP Stat
students? Why or why not?
Cumulative Relative Frequency Plot (Ogive)
- . . . is used to answer questions about
percentiles.
- A percentile gives the percent of individuals that
are at or below a certain value.
- Quartiles are located every 25% of the data. The
first quartile (Q1) is the 25th percentile, while
the third quartile (Q3) is the 75th percentile.
What is the special name for Q2?
- Interquartile Range (IQR) is the range of the
middle half (50%) of the data.
- IQR = Q3 - Q1
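The heights plotted on an ogive are running totals of the class counts divided by the overall total. The sketch below computes them for invented class counts and then reads off, roughly, the class in which Q1 and Q3 fall. The names `cumulative_relative_frequency` and `first_class_reaching` are hypothetical, and returning a class boundary is a simplification: on an actual ogive you would read between the plotted points.

```python
from itertools import accumulate

def cumulative_relative_frequency(counts):
    """Running total of class counts divided by the overall total."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]

# Hypothetical classes, invented for illustration: upper boundaries and counts.
upper_bounds = [70, 80, 90, 100, 110]
counts = [2, 4, 3, 2, 1]

cum_rel = cumulative_relative_frequency(counts)

def first_class_reaching(p):
    """Upper boundary of the first class whose cumulative relative frequency is at least p."""
    for bound, c in zip(upper_bounds, cum_rel):
        if c >= p:
            return bound
    return upper_bounds[-1]

q1_class = first_class_reaching(0.25)  # class containing the 25th percentile
q3_class = first_class_reaching(0.75)  # class containing the 75th percentile

for bound, c in zip(upper_bounds, cum_rel):
    print(f"below {bound}: {c:.0%}")
print("Q1 is reached by", q1_class, "and Q3 by", q3_class)
```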