Review of Midterm - PowerPoint PPT Presentation

About This Presentation
Title:

Review of Midterm

Description:

Review of Midterm math2200 – PowerPoint PPT presentation

Slides: 26
Provided by: nanl7
Transcript and Presenter's Notes

1
Review of Midterm
  • math2200

2
Data
  • W's
  • Subjects
  • Variables
  • Categorical versus quantitative

3
One categorical variable
  • Graphs
  • Bar chart
  • Pie chart
  • Numerical summary
  • Frequency table
  • Relative frequency table

4
Two categorical variables
  • Conditional and marginal distribution
  • Graphs
  • Segmented bar charts
  • Side-by-side bar charts
  • Side-by-side pie charts
  • Numerical summary
  • Contingency table
  • table percentage, row percentage, column
    percentage

5
Problem 28 (page 148)
  • Birth order related to major?
  • What percent of these students are oldest or only
    children? (113/223)
  • What percent of Humanities majors are oldest
    children? (15/43)
  • What percent of oldest children are Humanities
    students? (15/113)
  • What percent of the students are oldest children
    majoring in the Humanities? (15/223)

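| Major | Birth order 1 | Birth order 2 | Birth order 3 | Birth order 4 | Total |
|---|---|---|---|---|---|
| Math/Science | 34 | 14 | 6 | 3 | 57 |
| Agriculture | 52 | 27 | 5 | 9 | 93 |
| Humanities | 15 | 17 | 8 | 3 | 43 |
| Other | 12 | 11 | 1 | 6 | 30 |
| Total | 113 | 69 | 20 | 21 | 223 |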
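
The four percentages asked for in Problem 28 differ only in which total goes in the denominator. A minimal Python sketch of that arithmetic, using the counts from the table (the helper `pct` and the variable names are illustrative, not from the slides):

```python
# Counts from the Problem 28 table: one row per major, columns are birth orders 1-4.
counts = {
    "Math/Science": [34, 14, 6, 3],
    "Agriculture": [52, 27, 5, 9],
    "Humanities": [15, 17, 8, 3],
    "Other": [12, 11, 1, 6],
}

grand_total = sum(sum(row) for row in counts.values())   # all students
oldest_total = sum(row[0] for row in counts.values())    # column total, birth order 1
humanities_total = sum(counts["Humanities"])             # row total, Humanities
humanities_oldest = counts["Humanities"][0]              # single cell


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


answers = {
    "oldest among all students": pct(oldest_total, grand_total),
    "oldest among Humanities majors": pct(humanities_oldest, humanities_total),
    "Humanities among oldest children": pct(humanities_oldest, oldest_total),
    "oldest Humanities among all students": pct(humanities_oldest, grand_total),
}

for label, value in answers.items():
    print(f"{label}: {value}%")
```

The second and third answers use the same cell (15) but condition on a row total and a column total respectively, which is why they differ.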
6
Problem 30 (page 148)
  • What is the marginal distribution of majors?
  • What is the conditional distribution of majors
    for the oldest children?

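| Distribution of majors, count (%) | Math/Science | Agriculture | Humanities | Other | Total |
|---|---|---|---|---|---|
| Marginal (all students) | 57 (25.6%) | 93 (41.7%) | 43 (19.3%) | 30 (13.5%) | 223 |
| Conditional (oldest children) | 34 (30.1%) | 52 (46.0%) | 15 (13.3%) | 12 (10.6%) | 113 |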
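
Both distributions come from dividing a set of counts by its own total: the row totals for the marginal distribution, and the birth-order-1 column for the conditional one. A short Python sketch under that reading (the function name `distribution` is illustrative):

```python
majors = ["Math/Science", "Agriculture", "Humanities", "Other"]
row_totals = [57, 93, 43, 30]   # each major, all birth orders combined
oldest = [34, 52, 15, 12]       # each major, birth order 1 only


def distribution(counts):
    """Convert counts to percentages of their own total, one decimal place."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


marginal = distribution(row_totals)
conditional_oldest = distribution(oldest)

for major, m, c in zip(majors, marginal, conditional_oldest):
    print(f"{major}: marginal {m}%, conditional on oldest {c}%")
```

If the two distributions were nearly identical, that would suggest major is independent of being an oldest child.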
7
Simpson's Paradox
  • Problem 3.38: Two delivery services

| Delivery service | Type of service | Number of deliveries | Number of late packages (%) | Overall percentage of late deliveries |
|---|---|---|---|---|
| Pack Rats | Regular | 400 | 12 (3%) | 5.60% |
| Pack Rats | Overnight | 100 | 16 (16%) | 5.60% |
| Boxes R Us | Regular | 100 | 2 (2%) | 6% |
| Boxes R Us | Overnight | 400 | 28 (7%) | 6% |
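
The reversal in the table can be checked directly: Pack Rats has the higher late rate within each service type, yet the lower rate overall, because the two companies carry very different mixes of regular and overnight packages. A small Python sketch of the check (variable and function names are illustrative):

```python
# (deliveries, late packages) for each company and service type, from the table.
records = {
    "Pack Rats": {"Regular": (400, 12), "Overnight": (100, 16)},
    "Boxes R Us": {"Regular": (100, 2), "Overnight": (400, 28)},
}


def late_rate(deliveries, late):
    """Percentage of deliveries that were late."""
    return 100 * late / deliveries


by_type = {
    company: {kind: late_rate(*cell) for kind, cell in kinds.items()}
    for company, kinds in records.items()
}
overall = {
    company: late_rate(
        sum(n for n, _ in kinds.values()),
        sum(late for _, late in kinds.values()),
    )
    for company, kinds in records.items()
}

print(by_type)
print(overall)
```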
8
One quantitative variable
  • Graphs
  • Histogram
  • Boxplot
  • Qualitative summary
  • Number of modes
  • Symmetric? Transformation?
  • Outliers?
  • Numerical summary
  • Five-number summary
  • Center: mean versus median
  • Spread: SD versus IQR

9
Problem 32: Pay
  • The 1999 National Occupational Employment and
    Wage Estimates for Management Occupations
  • For chief executives
  • Mean: $48.67/hour
  • Median: $52.08/hour
  • For general and operations managers
  • Mean: $31.69/hour
  • Median: $27.23/hour
  • Are these wage distributions likely to be
    symmetric, skewed to the left or skewed to the
    right?
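
One common rule of thumb for this question is that the mean is pulled toward the longer tail, so comparing mean and median hints at the direction of skew. A hedged Python sketch of that heuristic (the function `likely_skew` is illustrative, and the rule is only a guide, not a guarantee):

```python
def likely_skew(mean, median):
    """Rule of thumb: the mean is pulled toward the longer tail."""
    if mean > median:
        return "skewed to the right"
    if mean < median:
        return "skewed to the left"
    return "roughly symmetric"


# Hourly wages as quoted on the slide.
chief_executives = likely_skew(mean=48.67, median=52.08)
managers = likely_skew(mean=31.69, median=27.23)

print("Chief executives:", chief_executives)
print("General and operations managers:", managers)
```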

10
Shifting and rescaling
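| Statistic | Shift | Rescale |
|---|---|---|
| Location | | |
| min | x | x |
| Q1 | x | x |
| median | x | x |
| Q3 | x | x |
| max | x | x |
| mean | x | x |
| Spread | | |
| variance | | x |
| standard deviation | | x |
| IQR | | x |
| range | | x |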
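
Reading each x as "this statistic changes", the table can be checked numerically. The Python sketch below uses a small made-up data set and arbitrary shift and scale constants (all hypothetical) and compares the summaries before and after each transformation:

```python
import statistics as st


def summarize(values):
    """Location and spread summaries for a list of numbers."""
    q1, median, q3 = st.quantiles(values, n=4)
    return {
        "min": min(values), "q1": q1, "median": median, "q3": q3,
        "max": max(values), "mean": st.mean(values),
        "variance": st.variance(values), "sd": st.stdev(values),
        "iqr": q3 - q1, "range": max(values) - min(values),
    }


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]  # hypothetical data
SHIFT, SCALE = 10.0, 3.0                      # arbitrary constants

base = summarize(data)
shifted = summarize([v + SHIFT for v in data])
rescaled = summarize([v * SCALE for v in data])

print(f"{'statistic':>10} {'original':>9} {'shifted':>9} {'rescaled':>9}")
for name in base:
    print(f"{name:>10} {base[name]:9.3f} {shifted[name]:9.3f} {rescaled[name]:9.3f}")
```

Shifting moves every location measure by the constant and leaves the spread measures alone. Rescaling multiplies location measures, SD, IQR and range by the constant, and the variance by its square.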
11
Problem 4.42: Job Growth
  • Job growth rates for 20 cities, predicted by
    Standard & Poor's DRI in 1996
  • Are the mean and median very different?
  • Which one is more appropriate?
  • Mean (2.37%) or median (2.235%)?
  • SD (0.425%) or IQR (0.515%)?
  • If we subtract from these growth rates the
    predicted U.S. average growth rate of 1.20%, how
    would this change the above summary statistics?
  • If we omit Las Vegas (growth rate = 3.72%) from
    the data, how would you expect the above summary
    statistics to change?
  • How to summarize the distribution of the data?

12
One quantitative variable and one categorical
variable
  • Comparing groups
  • with histogram, boxplot, stem-and-leaf plot
  • Transformation when spread is too different
    across groups

13
Normal model
  • Z-score and standard normal
  • Nearly normal condition
  • Normal probability plot
  • Four types of problems
  • Given parameters and data values (or z-score),
    ask for probabilities
  • Given parameters and probabilities, ask for data
    values (or z-score)
  • Given probabilities and data values (or z-score),
    ask for parameters
  • Given probabilities, data values (or z-score) and
    one parameter, ask for the other parameter

14
Problem 22: 2002 Winter Olympics speed skating
  • Top 25 men's and 25 women's 500-m speed skating
    times
  • Mean: 73.46 seconds
  • SD: 3.33 seconds
  • If the Normal model is appropriate, what percent
    of the times should be within 1.67 seconds of
    73.46?
  • Solution 1: 1.67 ≈ 0.5 SD, so Normcdf(-0.5, 0.5, 0, 1)
  • Solution 2: Normcdf(71.79, 75.13, 73.46, 3.33)
  • In the data, only 6 are within that range. Why
    are the percentages so different?
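
Both solutions compute the same Normal-model probability, once on the z-scale and once on the original scale. A minimal Python sketch using the standard library's `statistics.NormalDist` in place of the calculator function (variable names are illustrative):

```python
from statistics import NormalDist

mean, sd = 73.46, 3.33   # seconds, from the slide
half_width = 1.67        # "within 1.67 seconds of the mean"

# Solution 1: standardize first, then use the standard Normal model.
z = half_width / sd
standard = NormalDist()  # mean 0, SD 1
p_standardized = standard.cdf(z) - standard.cdf(-z)

# Solution 2: work directly on the original scale.
model = NormalDist(mu=mean, sigma=sd)
p_direct = model.cdf(mean + half_width) - model.cdf(mean - half_width)

print(f"z = {z:.4f}")
print(f"P(within 1.67 s of the mean) = {p_direct:.4f}")
```

Under the Normal model roughly 38% of times should fall in this interval, which is the figure to compare against what the data actually show.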

15
(No Transcript)
16
Problem 39: Assembly time
  • Only 25% of the company's customers succeeded in
    building the desk in under an hour
  • 5% said it took them over 2 hours
  • Assume that consumer assembly time follows a
    Normal model
  • Mean = ?, SD = ?
  • Z-score corresponding to 25%
  • (1 - mean)/SD = invNorm(0.25, 0, 1) = -0.6744897495
  • Z-score corresponding to 95%
  • (2 - mean)/SD = invNorm(0.95, 0, 1) = 1.644853626
  • Solving the two equations, we have
  • Mean = 1.29 hours
  • SD = 0.43 hours
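
The two z-score equations are linear in the mean and SD, so subtracting one from the other isolates the SD. A Python sketch of that algebra, with `NormalDist().inv_cdf` standing in for the calculator's invNorm (variable names are illustrative):

```python
from statistics import NormalDist

standard = NormalDist()          # standard Normal model
z_25 = standard.inv_cdf(0.25)    # 25% finish in under 1 hour
z_95 = standard.inv_cdf(0.95)    # 5% take over 2 hours, so 95% are under 2 hours

# (1 - mean) / sd = z_25   and   (2 - mean) / sd = z_95
# Subtracting the first equation from the second: (2 - 1) / sd = z_95 - z_25
sd = (2 - 1) / (z_95 - z_25)
mean = 1 - z_25 * sd

print(f"z_25 = {z_25:.4f}, z_95 = {z_95:.4f}")
print(f"mean = {mean:.2f} hours, sd = {sd:.2f} hours")
```

Note that the 25th-percentile z-score is negative: one hour lies below the mean assembly time.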

17
Problem 39: Assembly time (cont.)
  • Mean = 1.29, SD = 0.43
  • What assembly time should the company quote in
    order that 60% of customers succeed in finishing
    the desk by then?
  • invNorm(0.6, 1.29, 0.43)
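
The quoted time is the 60th percentile of the fitted Normal model. A one-step Python sketch, again using `statistics.NormalDist` as a stand-in for invNorm and the rounded parameters from the slide:

```python
from statistics import NormalDist

assembly_time = NormalDist(mu=1.29, sigma=0.43)  # hours, rounded values from the slide
quoted_time = assembly_time.inv_cdf(0.60)        # time by which 60% have finished

print(f"Quote about {quoted_time:.2f} hours")
```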

18
Problem 39: Assembly time (cont.)
  • Mean = 1.29, SD = 0.43
  • The company wishes to improve the one-hour
    success rate to 60%. If the SD stays the same,
    what new lower mean time does the company need to
    achieve?
  • Z-score = invNorm(0.6, 0, 1)
  • Z-score = (1 - mean)/SD
  • Mean = 0.89
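
Here the data value (1 hour), the probability (60%) and the SD are known, and the mean is the unknown. Rearranging the z-score formula gives it directly, as in this short Python sketch (variable names are illustrative):

```python
from statistics import NormalDist

sd = 0.43                           # hours, assumed unchanged
z_60 = NormalDist().inv_cdf(0.60)   # z-score with 60% of the model below it

# z_60 = (1 - new_mean) / sd, rearranged for the unknown mean.
new_mean = 1 - z_60 * sd

print(f"z = {z_60:.4f}, new mean = {new_mean:.2f} hours")
```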

19
Correlation
  • Sign of r means?
  • The range of r?
  • X and Y are called uncorrelated if and only if
    r = 0
  • r(x, y) = r(y, x)
  • No units
  • Affected by shifting or rescaling X, Y or both?
  • Uncorrelated does NOT imply no association
  • Sensitive to outliers (does removing a point
    close to the line fitted through the scatterplot
    increase or decrease r?)
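
Several of these properties can be checked numerically. The Python sketch below computes r from its definition on a small made-up data set (all data values and transformation constants are hypothetical) and tests symmetry, the effect of shifting and rescaling, and a case that is uncorrelated yet clearly associated:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                             # swap the roles of x and y
r_transformed = pearson_r([10 + 2.5 * v for v in x],    # shift and rescale x
                          [(v - 3) / 7 for v in y])     # shift and rescale y
r_flipped = pearson_r([-v for v in x], y)               # negative scale factor flips the sign

# A perfect but non-linear relationship: y = x^2 on symmetric x values.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_parabola = pearson_r(xs, [v * v for v in xs])

print(r, r_swapped, r_transformed, r_flipped, r_parabola)
```

The parabola case has r = 0 even though y is completely determined by x, which illustrates why uncorrelated does not mean unassociated.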

20
Correlation: Review II 13 (page 264)
  • What factor most explains differences in Fuel
    Efficiency among cars? Here's a correlation
    matrix exploring that relationship for the car's
    Weight, Horsepower, engine size (Displacement),
    and number of Cylinders.

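| | MPG | Weight | Horsepower | Displacement | Cylinders |
|---|---|---|---|---|---|
| MPG | 1.000 | | | | |
| Weight | -0.903 | 1.000 | | | |
| Horsepower | -0.871 | 0.917 | 1.000 | | |
| Displacement | -0.786 | 0.951 | 0.872 | 1.000 | |
| Cylinders | -0.806 | 0.917 | 0.864 | 0.940 | 1.000 |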
  1. Which factor seems most strongly associated with
    Fuel Efficiency?
  2. What does the negative correlation indicate?
  3. Explain the meaning of R² for that relationship.
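
The strength of association is judged by the absolute value of r, and R² is its square. A short Python sketch that reads the MPG column of the matrix and works out both (variable names are illustrative):

```python
# Correlations with MPG, taken from the first column of the matrix.
r_with_mpg = {
    "Weight": -0.903,
    "Horsepower": -0.871,
    "Displacement": -0.786,
    "Cylinders": -0.806,
}

# Strength is measured by |r|; the sign only gives the direction.
strongest = max(r_with_mpg, key=lambda name: abs(r_with_mpg[name]))
r_squared = r_with_mpg[strongest] ** 2

print(f"Strongest association with MPG: {strongest} (r = {r_with_mpg[strongest]})")
print(f"R^2 = {r_squared:.3f}")
```

So about 81.5% of the variation in MPG is accounted for by a linear model on Weight, and the negative sign says heavier cars tend to get lower MPG.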

21
Matching r and scatterplots
  • Here are several scatterplots. The calculated
    correlations are 0.85, -0.87, 0.04 and 0.53.
    Which is which?

22
Linear regression (least squares)
  • How to calculate the slope?
  • Given the slope, and standard deviations, how to
    calculate the correlation?
  • The line always goes through (x̄, ȳ), the point
    of means
  • Residual
  • Overestimation
  • Underestimation
  • Causal relationship?
  • How to interpret?
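
These review points can be tied together with the standard least-squares relationships: slope = r × s_y / s_x, intercept = ȳ − slope × x̄, and residual = observed − predicted. A Python sketch on a small made-up data set (the data values are hypothetical):

```python
from statistics import mean, stdev

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.0, 4.1, 5.9, 8.2, 9.8]

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)

# Correlation from the sum of cross-deviations.
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
r = sxy / ((len(x) - 1) * s_x * s_y)

# Least-squares slope and intercept.
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar


def predict(value):
    """Predicted y for a given x on the fitted line."""
    return intercept + slope * value


# Going the other way: recover r from the slope and the two SDs.
r_from_slope = slope * s_x / s_y

# Residual = observed - predicted. A positive residual means the line
# underestimates that point; a negative one means it overestimates.
residuals = [b - predict(a) for a, b in zip(x, y)]

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.4f}")
print("residuals:", [round(e, 3) for e in residuals])
```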

23
Diagnostics of a Linear Model
  • Visual inspection: does the scatterplot satisfy
    the Straight Enough Condition? If it looks okay,
    go on to the regression.
  • Regression: calculate the regression equation, r
    and R². (R² = r × r gives the percentage of
    variation in the data explained by the model.)
    If R² is tiny, say < 0.2, a linear model may not
    be a good choice.
  • Residuals: check the residual plot even when R²
    is large. It is a bad sign if we see some pattern.
    The spread of the residuals is supposed to be
    about the same across the x-axis if the linear
    model is appropriate. (You can put either the
    predicted value or the x-variable on the x-axis.)
  • Re-expression: if a linear model is not
    appropriate for the data, consider re-expressing
    the data. Remember to repeat the diagnostics
    every time after fitting a new linear model to
    the transformed data.

24
Randomness Simulation
  • Simulation component?
  • Response variable?
  • Trial?
  • Example 11.20
  • Suppose the chance of passing the driver's test
    is 34% the first time and 72% for subsequent
    retests. Estimate the percentage of those tested
    who still do not have a driver's license after
    two attempts.
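
One way to set up this simulation: the component is a single test attempt, a trial is one person taking up to two attempts, and the response variable is whether that person is still unlicensed afterwards. The Python sketch below assumes the pass rates are 34% and 72% and that attempts are independent (both readings of the slide are assumptions, and the function names are illustrative):

```python
import random

P_PASS_FIRST = 0.34    # assumed pass rate on the first attempt
P_PASS_RETEST = 0.72   # assumed pass rate on a retest


def still_unlicensed_after_two(rng):
    """One trial: simulate up to two attempts for a single person."""
    if rng.random() < P_PASS_FIRST:
        return False                    # passed the first time
    if rng.random() < P_PASS_RETEST:
        return False                    # passed on the retest
    return True                         # failed both attempts


def estimate(trials, seed=0):
    """Fraction of simulated people still unlicensed after two attempts."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    failures = sum(still_unlicensed_after_two(rng) for _ in range(trials))
    return failures / trials


estimated = estimate(100_000)
exact = (1 - P_PASS_FIRST) * (1 - P_PASS_RETEST)   # check: 0.66 * 0.28

print(f"simulated: {100 * estimated:.1f}%, exact: {100 * exact:.2f}%")
```

With many trials the simulated percentage should settle near the exact value of about 18.5%.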

25
Check list
  • Graphs and plots: bar chart, pie chart,
    histogram, boxplot (modified boxplot on TI-83),
    normal probability plot, scatterplot, residual
    plot. How to make? How to interpret?
  • Statistics: mean, median, min, max, range,
    quartiles, standard deviation, IQR, correlation
    coefficient. How to calculate? How to interpret?
  • Model: normal distribution, linear regression.
    How to get the parameters?