Analysis of new car data - PowerPoint PPT Presentation

Provided by: distdell4


1
Analysis of new car data
  • Roy Clemons
  • Emily Hollister
  • Stat 653
  • November 2004

2
Data Set
  • 2004 New Car and Truck data
  • Obtained from www.amstat.org
  • Based on the Kiplinger's Personal Finance report,
    December 2003

3
Design
  • Experimental unit: Vehicle Type
  • Number of vehicles: 271
  • Analyzed a subset of vehicles based on a price
    range of 30,000 to 40,000 (US dollars)
  • Variables used:
      • Dependent: city mileage (mpg)
      • Independent: horsepower, vehicle weight (lbs),
        and engine size (liters)
      • Factor: vehicle type (Sedan vs. SUV)

4
Working Model
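  • $Y_{\text{mileage}} = \mu_i + \beta_{1i} X_{\text{horse}} + \beta_{2i} X_{\text{wt}} + \beta_{3i} X_{\text{eng}}$,
    with i = 1 for Sedan and i = 2 for SUV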
  • Unknown parameters:
      • Intercepts for Sedan and SUV ($\mu_1$, $\mu_2$)
      • Slopes of vehicle weight, horsepower, and engine
        size with respect to city mileage, for each
        vehicle type
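Because every parameter in the working model carries a vehicle-type subscript, the model can be fit as a single regression with one intercept column and one slope column per vehicle type. A minimal sketch in Python with NumPy, assuming a 0/1 indicator `is_suv` and the coefficient order [µ1, µ2, β11, β12, β21, β22, β31, β32]; the function names are hypothetical, and the slides' own fits were produced with a GLM procedure, not this code.

```python
import numpy as np

# Hypothetical coefficient order used in this sketch:
# [mu1, mu2, b11, b12, b21, b22, b31, b32]
# index 1 = Sedan, 2 = SUV; b1* = horsepower, b2* = weight, b3* = engine size.

def design_matrix(is_suv, horse, wt, eng):
    """Separate-intercepts, separate-slopes design matrix."""
    suv = np.asarray(is_suv, dtype=float)
    sedan = 1.0 - suv
    columns = [sedan, suv]                     # mu1, mu2
    for x in (horse, wt, eng):
        x = np.asarray(x, dtype=float)
        columns.extend([sedan * x, suv * x])   # Sedan slope, SUV slope
    return np.column_stack(columns)

def fit(X, y):
    """Ordinary least squares estimate of the coefficient vector."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta
```

Each Sedan row has zeros in the SUV columns and vice versa, so the fit is equivalent to two separate regressions, but keeping them in one design matrix is what makes the between-type contrasts on the next slides possible.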

5
Hypotheses
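Sedan = 1, SUV = 2
  • $H_0: \mu_1 = \mu_2$ (intercepts)
  • $H_0: \beta_{11} = \beta_{12}$ (slope of $X_{\text{horse}}$)
  • $H_0: \beta_{21} = \beta_{22}$ (slope of $X_{\text{wt}}$)
  • $H_0: \beta_{31} = \beta_{32}$ (slope of $X_{\text{eng}}$)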
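Each hypothesis above is a single linear contrast: one row of an L matrix with +1 on the Sedan parameter and -1 on the matching SUV parameter. A minimal sketch, assuming the hypothetical coefficient order [µ1, µ2, β11, β12, β21, β22, β31, β32]; the names `NAMES`, `contrast_row`, and `HYPOTHESES` are illustrative only.

```python
# Hypothetical coefficient order (Sedan = 1, SUV = 2)
NAMES = ["mu1", "mu2", "b11", "b12", "b21", "b22", "b31", "b32"]

def contrast_row(first, second, names=NAMES):
    """L-matrix row encoding H0: first - second = 0."""
    row = [0] * len(names)
    row[names.index(first)] = 1
    row[names.index(second)] = -1
    return row

HYPOTHESES = {
    "intercepts (mu1 = mu2)": contrast_row("mu1", "mu2"),
    "horsepower slopes (b11 = b12)": contrast_row("b11", "b12"),
    "weight slopes (b21 = b22)": contrast_row("b21", "b22"),
    "engine-size slopes (b31 = b32)": contrast_row("b31", "b32"),
}

for label, row in HYPOTHESES.items():
    print(f"{label:32s} L = {row}")
```

If the software uses a different parameter ordering, only `NAMES` needs to change; the rows are rebuilt from it.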

6
Analysis
  • 1. Removed outliers based on standard deviation:
      • Hybrid and diesel-powered vehicles: Honda Insight,
        Toyota Prius, Honda Civic Hybrid, and Volkswagen
        Jetta GLS TDI
      • Other mileage outliers: Land Rover Disco,
        Hummer H2, and Audi S4 Quattro

N = 258 with outliers removed
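The slides do not state the standard-deviation cutoff used for outlier removal. A minimal sketch of an SD rule, assuming a hypothetical cutoff `k` (default 3) applied to city mileage; the function name and the sample values are made up for illustration.

```python
from statistics import mean, stdev

def sd_outliers(values, k=3.0):
    """Indices of values more than k sample standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [i for i, v in enumerate(values) if abs(v - m) > k * s]

# Hypothetical city-mpg values with one hybrid-like vehicle at the end
city_mpg = [20, 21, 19, 20, 22, 18, 20, 21, 19, 60]
print(sd_outliers(city_mpg, k=2.5))  # [9]
print(sd_outliers(city_mpg))         # []
```

With k = 3 the extreme value is not flagged, because it inflates the standard deviation it is measured against; this masking effect is one reason a cutoff should be chosen with care, or outliers identified partly by domain knowledge (as with the hybrid and diesel vehicles).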
7
Analysis continued
  • 2. Set up the model in GLM to test the normality of
    the residuals.
  • 3. Saved the design matrix, checked for
    multicollinearity, and used the Breusch-Pagan (BP)
    test to check equality of variance.
  • Residuals were not normal
  • Variance was not equal between vehicle types
  • Collinearity diagnostics were high (multiple
    values > 10)
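The slide does not say which collinearity diagnostic exceeded 10. Variance inflation factors (VIF) are one common choice, with VIF > 10 a frequent rule of thumb; condition indices are another. A minimal VIF sketch in Python with NumPy, assuming a predictor matrix without an intercept column; the function name is hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        # Regress column j on the remaining columns plus an intercept
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```

In a design with type-specific slopes, a column such as Sedan indicator × horsepower is close to a multiple of the Sedan indicator when horsepower is far from zero, which inflates these diagnostics; centering or standardizing the predictors reduces that kind of collinearity.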

8
Transformation and Standardization
  • Box-Cox indicated that $y^{-1}$ ($\lambda = -1$, the
    reciprocal of city mileage) produced the most
    appropriate fit for our data
  • The independent variables were standardized
    (converted to z-scores)
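Box-Cox chooses λ by maximizing the profile log-likelihood $-\tfrac{n}{2}\log(\mathrm{SSR}_\lambda/n) + (\lambda - 1)\sum \log y_i$ of the regression fitted to the transformed response. A minimal grid-search sketch in Python with NumPy, assuming a hypothetical half-step grid from -2 to 2 and a synthetic example; the slides do not say how their λ search was done, and the function names are illustrative.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform; lam = 0 is the log transform."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def boxcox_loglik(y, X, lam):
    """Profile log-likelihood of lam for the linear model boxcox(y, lam) ~ X."""
    y = np.asarray(y, dtype=float)
    z = boxcox(y, lam)
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    ssr = np.sum((z - X @ coef) ** 2)
    n = len(y)
    return -0.5 * n * np.log(ssr / n) + (lam - 1.0) * np.sum(np.log(y))

def best_lambda(y, X, grid=(-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0)):
    """Grid value of lam with the largest profile log-likelihood."""
    scores = [boxcox_loglik(y, X, lam) for lam in grid]
    return grid[int(np.argmax(scores))]

def standardize(x):
    """Convert a predictor to z-scores (sample standard deviation)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)
```

When 1/mpg is close to linear in the predictors, as in the synthetic data used to check this sketch, the search lands on λ = -1, the same reciprocal transformation the slides report.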

9
Analysis
  • Transformations corrected the problems with normality
    and equality of variance

[Table: Tests of Normality]
Breusch-Pagan test for heteroscedasticity:
  Chi-square (df = P): 8.846
  Significance level ($H_0$: homoscedasticity): .3554
  (p > 0.05, so homoscedasticity is not rejected)
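The Breusch-Pagan statistic comes from regressing the scaled squared residuals on the predictors: half of that regression's explained sum of squares is compared with a chi-square distribution whose degrees of freedom equal the number of non-intercept regressors. A minimal sketch in Python with NumPy, assuming the variance equation uses the same regressors as the mean model (a common choice, not confirmed by the slides); `chi2_sf` is a hypothetical helper using the standard series and continued-fraction forms of the incomplete gamma function, so SciPy is not required.

```python
import math
import numpy as np

def chi2_sf(x, df):
    """Upper-tail chi-square probability Q(df/2, x/2)."""
    a, t = df / 2.0, x / 2.0
    if t <= 0.0:
        return 1.0
    log_prefactor = -t + a * math.log(t) - math.lgamma(a)
    if t < a + 1.0:
        # Series for the regularized lower incomplete gamma P(a, t)
        term = total = 1.0 / a
        n = 1
        while abs(term) > abs(total) * 1e-15:
            term *= t / (a + n)
            total += term
            n += 1
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the upper gamma Q(a, t)
    tiny = 1e-300
    b, c = t + 1.0 - a, 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h

def breusch_pagan(X, y):
    """Original Breusch-Pagan LM test; X must include an intercept column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ coef) ** 2
    g = e2 / (e2.sum() / n)                 # squared residuals / ML variance
    gamma, *_ = np.linalg.lstsq(X, g, rcond=None)
    ess = np.sum((X @ gamma - g.mean()) ** 2)
    stat = ess / 2.0
    df = X.shape[1] - 1
    return stat, df, chi2_sf(stat, df)
```

`chi2_sf` also converts a reported chi-square value into its significance level once the degrees of freedom are known.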
10
and multicollinearity!
[Table: Coefficients (dependent variable: inv_cmpg)]
11
Model Fit
12
Testing Hypotheses
  • Used GLM L-matrix contrasts to test our hypotheses
    concerning the intercepts and slopes
  • Significant differences (p < 0.05) were found for:
      • the intercepts
      • the slopes of $X_{\text{horse}}$ and $X_{\text{eng}}$
  • No significant difference was found between the
    Sedan and SUV slopes of $X_{\text{wt}}$
      • Significance value: 0.149
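An L-matrix test is a general linear hypothesis test of $H_0: L\beta = 0$, with $F = (L\hat\beta)^\top [L (X^\top X)^{-1} L^\top]^{-1} (L\hat\beta) / (q\, s^2)$ on $q$ and $n - p$ degrees of freedom. A minimal sketch in Python with NumPy; the function name is hypothetical and the tiny data set is made up. A p-value can then be obtained from the F distribution, for example with `scipy.stats.f.sf(F, q, df_resid)`.

```python
import numpy as np

def lmatrix_f_test(X, y, L):
    """F test of H0: L @ beta = 0 in the OLS model y = X @ beta + error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    L = np.atleast_2d(np.asarray(L, dtype=float))
    n, p = X.shape
    q = L.shape[0]                                 # number of contrasts
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)               # residual mean square
    Lb = L @ beta
    middle = np.linalg.inv(L @ XtX_inv @ L.T)
    F = float(Lb @ middle @ Lb) / (q * sigma2)
    return F, q, n - p

# Hypothetical example: two groups, test H0: mu1 = mu2
d = np.array([0, 0, 0, 1, 1, 1], dtype=float)
X = np.column_stack([1 - d, d])
F, q, df_resid = lmatrix_f_test(X, [1, 2, 3, 4, 5, 6], [1, -1])
print(F, q, df_resid)  # 13.5 1 4
```

For a single contrast row this F equals the square of the usual t statistic for the difference, so a weight-slope contrast with significance 0.149 corresponds to a small F relative to its critical value.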

13
Subset Selection
  • Horsepower appeared to be important in the model,
    but not necessarily for both vehicle types.
  • Subset selection showed that horsepower is an
    important factor for Sedan mileage but not for
    SUV mileage.
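The slides do not name the subset-selection criterion. A minimal sketch of exhaustive best-subset selection for one vehicle type, assuming BIC as the criterion (a swapped-in choice for illustration) and a hypothetical dictionary of standardized predictors; it would be run once on the Sedan rows and once on the SUV rows.

```python
import itertools
import numpy as np

def bic(X, y):
    """Gaussian BIC (up to an additive constant) of an OLS fit of y on X."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ coef) ** 2)
    return n * np.log(ssr / n) + k * np.log(n)

def best_subset(predictors, y):
    """Search all predictor subsets (intercept always kept); return the lowest-BIC subset."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    names = list(predictors)
    best_names, best_score = (), np.inf
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            cols = [np.ones(n)] + [np.asarray(predictors[c], dtype=float) for c in subset]
            score = bic(np.column_stack(cols), y)
            if score < best_score:
                best_names, best_score = subset, score
    return best_names
```

With only three predictors the exhaustive search fits 8 models per vehicle type, so there is no need for stepwise shortcuts.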

14
Subset selection models
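  • Sedan
      • $Y^{-1}_{\text{mileage}} = \mu_1 + \beta_1 z_{\text{horse}} + \beta_{21} z_{\text{wt}} + \beta_{31} z_{\text{eng}}$
  • SUV
      • $Y^{-1}_{\text{mileage}} = \mu_2 + \beta_{22} z_{\text{wt}} + \beta_{32} z_{\text{eng}}$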

Although $\beta_{21}$ and $\beta_{22}$ were not found to be
significantly different from each other (via the
L-matrix test), both were still included in the
subset selection.
15
Final models
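  • Sedan
      • $Y^{-1}_{\text{mileage}} = \mu_1 + \beta_1 z_{\text{horse}} + \beta_2 z_{\text{wt}} + \beta_{31} z_{\text{eng}}$
  • SUV
      • $Y^{-1}_{\text{mileage}} = \mu_2 + \beta_2 z_{\text{wt}} + \beta_{32} z_{\text{eng}}$
  • We used a common slope, $\beta_2$, for $z_{\text{wt}}$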
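The final models can be fit together as one regression in which the Sedan and SUV rows share a single weight column. A minimal sketch in Python with NumPy, assuming a 0/1 indicator `is_suv`, standardized predictors, and the hypothetical column order [µ1, µ2, β1, β2, β31, β32]; predictions on the 1/mpg scale are back-transformed to mpg by taking reciprocals. No coefficient values from the slides are reproduced here.

```python
import numpy as np

def final_design(is_suv, z_horse, z_wt, z_eng):
    """Columns: [mu1, mu2, b1 (Sedan z_horse), b2 (common z_wt), b31, b32]."""
    suv = np.asarray(is_suv, dtype=float)
    sedan = 1.0 - suv
    z_horse, z_wt, z_eng = (np.asarray(v, dtype=float) for v in (z_horse, z_wt, z_eng))
    return np.column_stack([sedan, suv, sedan * z_horse, z_wt,
                            sedan * z_eng, suv * z_eng])

def fit_and_r2(X, inv_mpg):
    """OLS fit on the 1/mpg scale; returns coefficients and R^2 on that scale."""
    y = np.asarray(inv_mpg, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, r2

def predicted_mpg(X, coef):
    """Back-transform fitted 1/mpg values to miles per gallon."""
    return 1.0 / (X @ coef)
```

The Sedan and SUV indicator columns add up to a column of ones, so the usual centered R² is valid even though there is no separate intercept column. Note that this R² describes the 1/mpg scale, not mpg itself.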

16
Conclusions
  • Estimating the city fuel economy of 2004 sedans
    and SUVs requires two different models.
  • The models explain 88% of the variance in city fuel
    economy.
  • Removal of outliers, transformation of Y, and
    standardization of the independent variables were
    necessary to meet the statistical assumptions.