Lecture 16 - PowerPoint PPT Presentation

Provided by: dsma3

1
Lecture 16 Thurs, Oct. 30
  • Inference for Regression (Sections 7.3-7.4)
  • Hypothesis Tests and Confidence Intervals for
    Intercept and Slope
  • Confidence Intervals for mean response
  • Prediction Intervals
  • Next time: Robustness of least squares
    inferences, graphical tools for model assessment
    (8.1-8.3)

2
Regression
  • Goal of regression: Estimate the mean response of Y
    for subpopulations X = x, $\mu\{Y|X=x\}$
  • Example: Y = neuron activity index, X = years
    playing stringed instrument
  • Simple linear regression model:
    $\mu\{Y|X\} = \beta_0 + \beta_1 X$
  • Estimate $\beta_0$ and $\beta_1$ by least squares:
    choose $\hat\beta_0$, $\hat\beta_1$ to minimize the sum of squared
    residuals (prediction errors)
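
A minimal sketch of the least squares estimates in plain Python. The data below are a made-up toy set, not the brain activity data from the lecture, and the function names are illustrative only.

```python
# Hypothetical toy data (NOT the lecture's brain activity study).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def least_squares(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_sq_resid(x, y, b0, b1):
    """Sum of squared residuals for the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


b0, b1 = least_squares(x, y)
print("intercept:", b0, "slope:", b1)
print("sum of squared residuals:", sum_sq_resid(x, y, b0, b1))
```

Nudging either coefficient away from the returned values should only increase the sum of squared residuals, which is a quick way to sanity-check the fit.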

3
Ideal Model
  • Assumptions of the ideal simple linear regression
    model:
  • There is a normally distributed subpopulation of
    responses for each value of the explanatory
    variable
  • The means of the subpopulations fall on a
    straight-line function of the explanatory
    variable.
  • The subpopulation standard deviations are all
    equal (to $\sigma$)
  • The selection of an observation from any of the
    subpopulations is independent of the selection of
    any other observation.

4
The standard deviation $\sigma$
  • $\sigma$ is the standard deviation in each
    subpopulation.
  • $\sigma$ measures the accuracy of predictions from the
    regression.
  • If the simple linear regression model holds, then
    approximately
  • 68% of the observations will fall within $\sigma$ of
    the regression line
  • 95% of the observations will fall within $2\sigma$ of
    the regression line

5
Estimating $\sigma$
  • Residuals provide the basis for an estimate of $\sigma$:
    $\hat\sigma = \sqrt{\text{sum of squared residuals} / (n-2)}$
  • Degrees of freedom for simple linear regression:
    n-2
  • If the simple linear regression model holds,
    then approximately
  • 68% of the observations will fall within $\hat\sigma$ of
    the least squares line
  • 95% of the observations will fall within $2\hat\sigma$ of
    the least squares line
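
The estimate of the subpopulation standard deviation can be sketched as follows: fit the line, square and sum the residuals, and divide by the n-2 degrees of freedom. The data are hypothetical and the helper names are not from the lecture.

```python
import math

# Hypothetical toy data (NOT the lecture's brain activity study).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(x, y):
    """Least squares (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


def sigma_hat(x, y):
    """Residual standard deviation with n - 2 degrees of freedom."""
    b0, b1 = fit_line(x, y)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    ssr = sum(r ** 2 for r in residuals)
    return math.sqrt(ssr / (len(x) - 2))


s = sigma_hat(x, y)
print("sigma_hat:", s)
```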

6
Inference for Simple Linear Regression
  • Inference is based on the ideal simple linear
    regression model holding.
  • Inference is based on taking repeated random samples
    $(Y_1, \ldots, Y_n)$ from the same subpopulations
    $(X_1, \ldots, X_n)$ as in the observed data.
  • Types of inference
  • Hypothesis tests for intercept and slope
  • Confidence intervals for intercept and slope
  • Confidence interval for mean of Y at X = X0
  • Prediction interval for future Y for which X = X0

7
Hypothesis tests for $\beta_0$ and $\beta_1$
  • Hypothesis test of $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$
  • Based on t-test statistic, $t = \hat\beta_1 / SE(\hat\beta_1)$,
    with n-2 degrees of freedom
  • p-value has the usual interpretation: the probability
    under the null hypothesis that $|t|$ would be at
    least as large as its observed value. A small
    p-value is evidence against the null hypothesis.
  • Hypothesis test for $H_0: \beta_0 = 0$ vs. $H_a: \beta_0 \neq 0$
    is based on an analogous test statistic.
  • Test statistics and p-values can be found on JMP
    output under parameter estimates, obtained by
    using Fit Line after Fit Y by X.
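
As a rough illustration of the test statistics, the sketch below computes t for the intercept and slope on hypothetical data, using the standard simple-regression standard error formulas. It compares |t| with a critical value read from a t-table (3.182 for 3 degrees of freedom) rather than computing a p-value, since the Python standard library has no t-distribution. All names are illustrative.

```python
import math

# Hypothetical toy data (NOT the lecture's brain activity study).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # t_{n-2}(.975) for n - 2 = 3 df, taken from a t-table


def t_statistics(x, y):
    """Return (t for intercept, t for slope) for tests against zero."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # equals (n - 1) * s_X^2
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(ssr / (n - 2))
    se_b1 = sigma_hat / math.sqrt(sxx)
    se_b0 = sigma_hat * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    return b0 / se_b0, b1 / se_b1


t0, t1 = t_statistics(x, y)
reject_slope = abs(t1) > T_CRIT  # two-sided test at the 5% level
print("t (intercept):", t0, "t (slope):", t1, "reject slope = 0:", reject_slope)
```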

8
JMP output for example

9
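Confidence Intervals for $\beta_0$ and $\beta_1$
  • Confidence intervals provide a range of plausible
    values for $\beta_0$ and $\beta_1$
  • 95% Confidence Intervals:
    $\hat\beta_0 \pm t_{n-2}(.975)\, SE(\hat\beta_0)$,
    $\hat\beta_1 \pm t_{n-2}(.975)\, SE(\hat\beta_1)$
  • Finding CIs in JMP: Can find the estimates
    $\hat\beta_0$, $\hat\beta_1$ and their standard errors
    under parameter estimates after fitting line.
    Can find $t_{n-2}(.975)$ in Table A.2.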
  • For brain activity study, CIs
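
A sketch of the interval arithmetic, estimate plus or minus the t multiplier times the standard error, on hypothetical data (these are not the brain activity study's intervals). The multiplier 3.182 is the table value for 3 degrees of freedom, and the function name is made up for illustration.

```python
import math

# Hypothetical toy data (NOT the lecture's brain activity study).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # t_{n-2}(.975) for n - 2 = 3 df, taken from a t-table


def coefficient_cis(x, y, t_crit):
    """Return ((lo, hi) for intercept, (lo, hi) for slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # equals (n - 1) * s_X^2
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(ssr / (n - 2))
    se_b1 = sigma_hat / math.sqrt(sxx)
    se_b0 = sigma_hat * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
    ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return ci_b0, ci_b1


ci_intercept, ci_slope = coefficient_cis(x, y, T_CRIT)
print("95% CI for intercept:", ci_intercept)
print("95% CI for slope:", ci_slope)
```

With so few observations the slope interval includes zero, which agrees with a two-sided test that does not reject a zero slope at the 5% level.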

10
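Confidence Intervals for Mean of Y at X = X0
  • What is a plausible range of values for
    $\mu\{Y|X_0\}$?
  • 95% CI for $\mu\{Y|X_0\}$:
    $\hat\mu\{Y|X_0\} \pm t_{n-2}(.975)\, SE[\hat\mu\{Y|X_0\}]$,
    where $\hat\mu\{Y|X_0\} = \hat\beta_0 + \hat\beta_1 X_0$ and
    $SE[\hat\mu\{Y|X_0\}] = \hat\sigma \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{(n-1) s_X^2}}$
    ($\hat\sigma$ is the estimate of $\sigma$; $s_X$ is the sample
    standard deviation of the Xs)
  • Note about formula
  • Precision in estimating $\mu\{Y|X_0\}$ is not
    constant for all values of X. Precision
    decreases as X0 gets farther away from the sample
    average of the Xs
  • JMP implementation: Use Confid Curves Fit command
    under red triangle next to Linear Fit after using
    Fit Y by X, Fit Line. Use the crosshair tool to
    find the exact values of the confidence interval
    endpoints for a given X0.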
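
The note about precision can be checked numerically. The sketch below, on hypothetical data and with illustrative function names, computes the standard error of the estimated mean response at several values of X0 and a 95% interval at one of them, using a t-table multiplier for 3 degrees of freedom.

```python
import math

# Hypothetical toy data (NOT the lecture's brain activity study).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # t_{n-2}(.975) for n - 2 = 3 df, taken from a t-table


def mean_response(x, y, x0):
    """Return (estimated mean of Y at x0, its standard error)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # equals (n - 1) * s_X^2
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(ssr / (n - 2))
    se = sigma_hat * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / sxx)
    return b0 + b1 * x0, se


def mean_ci(x, y, x0, t_crit):
    """95% confidence interval for the mean of Y at x0."""
    est, se = mean_response(x, y, x0)
    return est - t_crit * se, est + t_crit * se


# The standard error is smallest at the sample average of X (here 3.0).
se_at_mean = mean_response(x, y, 3.0)[1]
se_at_four = mean_response(x, y, 4.0)[1]
se_at_five = mean_response(x, y, 5.0)[1]
print("SE at X0 = 3, 4, 5:", se_at_mean, se_at_four, se_at_five)
print("95% CI for mean at X0 = 4:", mean_ci(x, y, 4.0, T_CRIT))
```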

11
Prediction Intervals
  • What are likely values for a future value Y0 at
    some specified value of X (X0)?
  • The best single prediction of a future response
    at X0 is the estimated mean response,
    $\hat\mu\{Y|X_0\} = \hat\beta_0 + \hat\beta_1 X_0$
  • A prediction interval is an interval of likely
    values along with a measure of the likelihood
    that the interval will contain the response.
  • 95% prediction interval for X0: If repeated
    samples $(Y_1, \ldots, Y_n)$ are obtained from the
    subpopulations $(X_1, \ldots, X_n)$ and a prediction
    interval is formed, the prediction interval will
    contain the value of Y0 for a future observation
    from the subpopulation X0 95% of the time.

12
Prediction Intervals Cont.
  • Prediction interval must account for two sources
    of uncertainty:
  • Uncertainty about the location of the
    subpopulation mean
  • Uncertainty about where the future value will be
    in relation to its mean
  • Prediction Error = Random Sampling Error +
    Estimation Error
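
Written out in symbols, the decomposition above looks roughly like this. The notation $\mu\{Y|X_0\}$ for the subpopulation mean and the hat for its estimate are assumed here; the variance line relies on the future observation being independent of the data used to fit the line.

```latex
\begin{align*}
Y_0 - \hat\mu\{Y|X_0\}
  &= \underbrace{\bigl(Y_0 - \mu\{Y|X_0\}\bigr)}_{\text{random sampling error}}
   + \underbrace{\bigl(\mu\{Y|X_0\} - \hat\mu\{Y|X_0\}\bigr)}_{\text{estimation error}} \\
\operatorname{Var}\bigl(Y_0 - \hat\mu\{Y|X_0\}\bigr)
  &= \sigma^2 + \operatorname{Var}\bigl(\hat\mu\{Y|X_0\}\bigr)
\end{align*}
```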

13
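Prediction Interval Formula
  • 95% prediction interval at X0:
    $\hat\mu\{Y|X_0\} \pm t_{n-2}(.975)\, \hat\sigma \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{(n-1) s_X^2}}$
  • Compare to 95% CI for mean at X0:
    $\hat\mu\{Y|X_0\} \pm t_{n-2}(.975)\, \hat\sigma \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{(n-1) s_X^2}}$
    ($\hat\sigma$ is the estimate of $\sigma$; $s_X$ is the sample
    standard deviation of the Xs)
  • Prediction interval is wider due to random
    sampling error in the future response
  • As sample size n becomes large, the margin of error
    of the CI for the mean goes to zero but the margin of
    error of the PI doesn't.
  • JMP implementation: Use Confid Curves Indiv
    command under red triangle next to Linear Fit
    after using Fit Y by X, Fit Line. Use the
    crosshair tool to find the exact values of the
    prediction interval endpoints for a given X0.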
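
To make the comparison concrete, here is a small sketch that computes both intervals at the same X0 on hypothetical data. The function names are illustrative, and the multiplier is a t-table value for 3 degrees of freedom.

```python
import math

# Hypothetical toy data (NOT the lecture's cleaning or brain activity data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # t_{n-2}(.975) for n - 2 = 3 df, taken from a t-table


def fit_summary(x, y):
    """Return (n, x_bar, sxx, b0, b1, sigma_hat) for the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # equals (n - 1) * s_X^2
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return n, x_bar, sxx, b0, b1, math.sqrt(ssr / (n - 2))


def mean_ci(x, y, x0, t_crit):
    """95% confidence interval for the mean of Y at x0."""
    n, x_bar, sxx, b0, b1, s = fit_summary(x, y)
    se = s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / sxx)
    est = b0 + b1 * x0
    return est - t_crit * se, est + t_crit * se


def prediction_interval(x, y, x0, t_crit):
    """95% prediction interval for a single future Y at x0."""
    n, x_bar, sxx, b0, b1, s = fit_summary(x, y)
    # The extra "1 +" accounts for random sampling error in the future response.
    se = s * math.sqrt(1.0 + 1.0 / n + (x0 - x_bar) ** 2 / sxx)
    est = b0 + b1 * x0
    return est - t_crit * se, est + t_crit * se


ci = mean_ci(x, y, 4.0, T_CRIT)
pi = prediction_interval(x, y, 4.0, T_CRIT)
print("95% CI for mean at X0 = 4:   ", ci)
print("95% PI for future Y at X0 = 4:", pi)
```

Both intervals share the same center; only the prediction interval carries the extra term that does not shrink as n grows.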

14
Example
  • A building maintenance company is planning to
    submit a bid on a contract to clean 40 corporate
    offices scattered throughout an office complex.
    The costs incurred by the maintenance company are
    proportional to the number of crews needed for
    this task. Currently the company has 11 crews.
    Will 11 crews be enough?
  • Recent data are available for the number of rooms
    that were cleaned by varying number of crews.
    The data are in cleaning.jmp.
  • Assuming a simple linear regression model holds,
    which is more relevant for answering the question
    of interest: a confidence interval for the mean
    number of rooms cleaned by 11 crews, or a
    prediction interval for the number of rooms
    cleaned on a particular day by 11 crews?

15
Correlation
  • Section 7.5.4
  • Correlation is a measure of the degree of linear
    association between two variables X and Y. For
    each unit in population, both X and Y are
    measured.
  • Population correlation:
    $\rho = \text{Mean}[(X - \mu_X)(Y - \mu_Y)] / (\sigma_X \sigma_Y)$
  • Correlation is between -1 and 1. Correlation of
    0 indicates no linear association. Correlations
    near 1 indicate strong positive linear
    association; correlations near -1 indicate strong
    negative linear association.
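
A brief sketch of the sample version of the correlation coefficient, computed on hypothetical data. The function name is illustrative; the two extra calls show the boundary cases of an exact increasing and an exact decreasing linear relationship.

```python
import math


def correlation(x, y):
    """Sample correlation coefficient of paired lists x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(x, y)
r_pos = correlation(x, [2.0 * xi + 1.0 for xi in x])  # exact increasing line
r_neg = correlation(x, [-xi for xi in x])              # exact decreasing line
print("r:", r, "exact positive:", r_pos, "exact negative:", r_neg)
```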

16
Correlation and Regression
  • Features of correlation
  • Dimension-free. Units of X and Y don't matter.
  • Symmetric in X and Y. There is no response and
    explanatory variable.
  • Correlation only measures degree of linear
    association. It is possible for there to be an
    exact relationship between X and Y and yet for the
    sample correlation coefficient to be zero.
  • Correlation in JMP: Click Multivariate and put
    variables in Y, Columns.
  • Connection to regression
  • Test of slope $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$
    is identical to test of $H_0: \rho = 0$ vs.
    $H_a: \rho \neq 0$. Test of correlation coefficient
    only makes sense if the pairs (X,Y) are randomly
    sampled from the population.
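
Two of the points above can be checked with a short sketch on hypothetical data. First, Y = X^2 over X values symmetric about zero is an exact relationship with sample correlation zero. Second, the slope t-statistic matches the statistic computed from the correlation alone, using the standard identity t = r * sqrt(n-2) / sqrt(1 - r^2). Function names are illustrative.

```python
import math


def correlation(x, y):
    """Sample correlation coefficient of paired lists x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def slope_t(x, y):
    """t-statistic for testing slope = 0 in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(ssr / (n - 2))
    return b1 / (sigma_hat / math.sqrt(sxx))


def correlation_t(x, y):
    """t-statistic for testing correlation = 0, computed from r alone."""
    r = correlation(x, y)
    return r * math.sqrt(len(x) - 2) / math.sqrt(1.0 - r ** 2)


# Exact (quadratic) relationship, yet zero linear association.
x_sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_sq = [xi ** 2 for xi in x_sym]
r_zero = correlation(x_sym, y_sq)

# Hypothetical toy data: both routes give the same t-statistic.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
t_from_slope = slope_t(x, y)
t_from_r = correlation_t(x, y)
print("r for Y = X^2:", r_zero)
print("t from slope:", t_from_slope, "t from r:", t_from_r)
```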

17
Correlation in JMP