Transforming Relationships - PowerPoint PPT Presentation

About This Presentation
Title:

Transforming Relationships

Description:

Transforming Relationships AP Statistics Practice of Statistics Section 4.1 – PowerPoint PPT presentation

Slides: 29
Provided by: Susan725
Transcript and Presenter's Notes

Title: Transforming Relationships


1
Transforming Relationships
  • AP Statistics
  • Practice of Statistics
  • Section 4.1

2
What You'll Learn
  • Recognize when the relationship between two
    variables is either an exponential relationship
    or a power relationship
  • Perform the appropriate transformation to
    linearize the data, find the LSRL on the
    transformed points, untransform to find a model
    for the original data

3
Not Everything Is Linear!
  • We've looked at several sets of data in which the
    relationships are linear in nature
  • What about those relationships that exhibit a
    different nonlinear pattern?
  • Consider for a moment gypsy moths.
  • An outbreak of gypsy moths in Massachusetts from
    1978 to 1981 resulted in many acres of defoliated
    land. The acreages are listed in the following
    table.

4
Gypsy Moths
  • The data and graph depict the number of acres
    defoliated by gypsy moths in Massachusetts
    between 1978 and 1981.

Year                      1978    1979    1980    1981
Acres of defoliated land  63042   226260  907075  2826095
5
  • So, this doesn't look too bad! Let's try a
    linear regression on the data, remembering to
    check both the correlation coefficient and the
    residual plot.

Simple linear regression results
Dependent Variable: Acres
Independent Variable: Year
Acres = -1.7746007E9 + 896997.4 (Year)
Sample size: 4
R (correlation coefficient) = 0.9136
R-sq = 0.8347045
Estimate of error standard deviation: 631139.44
Well, a visual of the line doesn't look too bad,
and that's a great correlation coefficient.
(Remember, though, that r is sometimes
deceptive, so be sure to check the residuals!)
6
The Residuals
  • A check of the residuals indicates that a linear
    model is not appropriate! (Notice the parabolic
    pattern in the plot that even with only 4 data
    points can be seen!)
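
The linear fit and its residuals can be reproduced with a short script. This is a minimal sketch in plain Python, not the software used for the slides; the helper name lsrl is made up for illustration. The signs of the residuals (positive, negative, negative, positive) are the curved pattern the slide describes.

```python
# Gypsy moth data from the slides.
years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]


def lsrl(xs, ys):
    """Return (intercept, slope, r) of the least-squares line of ys on xs.

    Hypothetical helper written for this example.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / (sxx * syy) ** 0.5


intercept, slope, r = lsrl(years, acres)

# Residual = observed value minus the value predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(years, acres)]

print(f"slope = {slope:.1f}, r = {r:.4f}")
print("residuals:", [round(e) for e in residuals])
```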

7
So, what type of relationship is this?
  • Remember from linear regression that when the
    relationship is linear, the response variable
    increases (or decreases) by a constant amount.

Years since 1977          1       2       3       4
Acres of defoliated land  63042   226260  907075  2826095
Difference in acres               163218  680815  1919020
  • Notice that the difference between number of
    acres is not constant
  • With this in mind and the problem with the
    residual plot, let's consider another type of
    relationship.

8
Exponential Relationships
  • In an exponential relationship, the response
    variable increases by a fixed percentage of the
    previous total. In other words, we should be able
    to multiply the previous value by some constant
    to get the next one.
  • So, let's check out this possibility. (Every
    interval in our table is one year long, so we
    can compare each value with the one before it.)

Years since 1977          1       2       3       4
Acres of defoliated land  63042   226260  907075  2826095
Ratio (next/prev)                 3.5890  4.0090  3.1156
  • Notice that although the ratio is not exactly the
    same (we wouldn't expect it to be exact with
    real data), there does appear to be a
    pretty consistent ratio value.
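
The two checks used so far (constant differences for a linear pattern, roughly constant ratios for an exponential one) can be sketched in a few lines of Python. The variable names are illustrative only.

```python
# Acres defoliated in 1978..1981, from the slides.
acres = [63042, 226260, 907075, 2826095]

# Pair each value with the one that follows it.
pairs = list(zip(acres, acres[1:]))

# Linear growth would give roughly constant differences.
differences = [nxt - prev for prev, nxt in pairs]

# Exponential growth would give roughly constant ratios.
ratios = [nxt / prev for prev, nxt in pairs]

print(differences)  # [163218, 680815, 1919020]
print([f"{q:.4f}" for q in ratios])
```

The differences grow by an order of magnitude, while the ratios all stay between 3 and about 4, which is what points toward an exponential model.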

9
So How Do We Create the Model?
  • If the relationship is an exponential one, we can
    use a mathematical transformation to linearize
    the data, find the LSRL of the transformed data,
    then untransform to find the model that will
    fit the original data.
  • OK, so let's take all of that step by step.

10
Finding the Model
  • Step 1: Use a mathematical transformation to
    linearize the data (create a new data set whose
    relationship is linear)
  • If the original data is exponential, find the
    logarithm (either common log or natural log) of
    each of the response values.
  • When working with years it is also helpful to
    code the year data so our calculators can
    handle the values (most computer programs are
    capable of creating models using the full year).
    To do this we will take each year and subtract
    1977 (this way all of our values are > 0).

Year                      1978    1979    1980    1981
Acres of defoliated land  63042   226260  907075  2826095
Years since 1977          1       2       3       4
Log10 (acres)             4.7996  5.3546  5.9576  6.4512
11
Finding the Model
  • Now, let's check a scatterplot of the transformed
    data

Notice the change in the pattern from our
original data to the transformed data. The
logarithm transformation really straightened our
data. (Using the natural logarithm would have
had the same effect; only the transformed values
would have been different.)
12
Finding the Model
  • Step 2: Find the LSRL for the transformed data
    (remember to check the r and the residuals!)

Simple linear regression results
Dependent Variable: log10(Acres)
Independent Variable: Year-1977
log10(Acres) = 4.2513404 + 0.5557706 (Year-1977)
Sample size: 4
R (correlation coefficient) = 0.9993
R-sq = 0.9985874
Estimate of error standard deviation: 0.033050213
This model looks promising, but remember to check
the residuals.
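
Steps 1 and 2 can be sketched in Python as follows: code the years, take the common log of the response, then fit the LSRL to the transformed points. This is an illustration under the same assumptions as the slides (base-10 logs, years coded as years since 1977); lsrl is a hypothetical helper, not a library function.

```python
import math

years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]

# Step 1: code the years and take the common log of the response.
years_since_1977 = [year - 1977 for year in years]
log_acres = [math.log10(a) for a in acres]


def lsrl(xs, ys):
    """Return (intercept, slope, r) of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / (sxx * syy) ** 0.5


# Step 2: fit the LSRL to the transformed data and look at the residuals.
intercept, slope, r = lsrl(years_since_1977, log_acres)
residuals = [y - (intercept + slope * x)
             for x, y in zip(years_since_1977, log_acres)]

print(f"log10(Acres) = {intercept:.4f} + {slope:.4f} (Year-1977), r = {r:.4f}")
print("residuals:", [round(e, 4) for e in residuals])
```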
13
Finding the Model
A check of the residuals confirms that an
exponential model is appropriate.
14
Untransforming to find the model for our
original data
  • Remember that our goal was to find a model that
    we could use for prediction of the number of
    defoliated acres of land for a given year.
  • The linear model we have would predict the common
    logarithm of acres. In order for our model to be
    useful, we need to reverse the transformation to
    create the model that fits the original data.
  • Although with many transformations it is easier
    to evaluate the linear model first and then
    untransform the result, with both exponential
    and power models (we'll look at power models
    next) we can use the properties of logarithms to
    find the model for our original data.

15
Properties of Logarithms
  • Before we try to untransform, let's review the
    properties of logarithms you learned in Algebra
    (yes, you really did learn these!)
  • $\log_b(xy) = \log_b x + \log_b y$ (Addition rule)
  • $\log_b(x^m) = m \log_b x$ (Power rule)
  • $\log_b(b^n) = n$ (Same base)
  • $\log_b(x/y) = \log_b x - \log_b y$ (Subtraction rule)
  • Since any subtraction can be changed to an
    addition equation, we will not use this last rule
    much!

16
Untransforming exponential expressions
  • An exponential function takes the form
  • $y = ab^x$, where a and b are constants
  • (This is the form we want to end up with)
  • So, let's get started

$\log_{10}(\text{Acres}) = 4.2513404 + 0.5557706\,(\text{Year} - 1977)$
(Linear regression of the transformed data)
$10^{\log_{10}(\text{Acres})} = 10^{4.2513404 + 0.5557706\,(\text{Year} - 1977)}$
(Raise both sides using a power of 10, so the bases match)
$\text{Acres} = 10^{4.2513404} \cdot \left(10^{0.5557706}\right)^{(\text{Year} - 1977)}$
(Same base law and multiplication law for exponents)
$\text{Acres} = 17837.7634 \cdot (3.5956)^{(\text{Year} - 1977)}$
(Simplify the constants)
This is now in the form of $y = ab^x$, where
$a = 17837.7634$ and $b = 3.5956$. Notice that b is
approximately the average of the ratios
(next/prev) we calculated when we began looking
for a model.
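
The same untransformation can be checked numerically. The sketch below starts from the intercept and slope reported on the regression slide and raises 10 to each; the function name predict_acres is made up for this example.

```python
# Coefficients of the LSRL on the transformed data (from the slide).
intercept = 4.2513404
slope = 0.5557706

# log10(Acres) = intercept + slope * (Year - 1977)
# => Acres = 10**intercept * (10**slope) ** (Year - 1977)
a = 10 ** intercept
b = 10 ** slope


def predict_acres(year):
    """Exponential model y = a * b**x with x = years since 1977."""
    return a * b ** (year - 1977)


print(f"a = {a:.2f}, b = {b:.4f}")
print(f"predicted acres for 1979: {predict_acres(1979):.0f}")
```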
17
So, does it fit our original data?
  • Since our original goal was to find a model that
    would allow us to predict the number of acres of
    defoliated land if we knew the year, we need to
    check to see if our model actually fits the data.

The model looks pretty good, but as with any
model we need to use caution when predicting
outside our original data range.
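
One way to make "does it fit?" concrete without the plot is to compare the model's predictions with the observed acreages. This is a rough sketch using the rounded constants from the previous slide; the relative-error summary is an added illustration, not something the slides compute.

```python
# Constants of the untransformed model, as rounded on the slide.
a, b = 17837.7634, 3.5956

years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]

predicted = [a * b ** (year - 1977) for year in years]

# Relative error = (predicted - observed) / observed.
relative_errors = [(p - y) / y for p, y in zip(predicted, acres)]

for year, y, p, e in zip(years, acres, predicted, relative_errors):
    print(f"{year}: observed {y:>8}, predicted {p:>10.0f}, error {e:+.1%}")
```

Within 1978 to 1981 every prediction lands within about 10% of the observed value. Because each extra year multiplies the prediction by b, small errors in b compound quickly outside that range.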
18
Power Models
  • Another important transformation used in modeling
    is the power model.
  • Power models have the form
  • $y = ax^b$, where a and b are constants
  • We can find an appropriate power model by taking
    the logarithms for both the response and
    explanatory variables, finding the linear
    regression for the transformed data, then using
    the laws of logarithms and exponents to
    untransform
  • Let's look at an example

19
Fishing Tournament
  • In a fishing tournament that you are in charge of
    you need to find a way to record the weight of
    each fish caught without destroying or killing
    the fish.
  • Since it is easier to measure the length of the
    fish rather than its weight, we must find a way
    to convert the length to weight.
  • The local marine research lab has been gracious
    enough to provide you with the data for the
    average length and weight at different ages for
    Atlantic Ocean rockfish which model most fish
    species growing under normal feeding conditions.

20
The Data
Age (yr) Length (cm) Weight (g)
1 5.2 2
2 8.5 8
3 11.5 21
4 14.3 38
5 16.8 69
6 19.2 117
7 21.3 148
8 23.3 190
9 25.0 264
10 26.7 293
11 28.2 318
12 29.6 371
13 30.8 455
14 32.0 504
15 33.0 518
16 34.0 537
17 34.9 651
18 36.4 719
19 37.1 726
20 37.7 810
  • Since length is one dimensional and weight
    depends on volume, which is three dimensional,
    we should be able to find a reasonable model
    using a power model (the residuals for a
    regression on the original data confirm that
    the variables are NOT linearly related, but
    we already knew that!)
  • As before, we need to first transform our data,
    but this time we have to perform transformations
    on both length and weight

21
Transforming the Data
Age (yr) Length (cm) Log 10 (length) Weight (g) Log10 (weight)
1 5.2 .7160 2 .3010
2 8.5 .9294 8 .9031
3 11.5 1.0607 21 1.3222
4 14.3 1.1553 38 1.5798
5 16.8 1.2253 69 1.8388
6 19.2 1.2833 117 2.0682
7 21.3 1.3284 148 2.1703
8 23.3 1.3674 190 2.2788
9 25.0 1.3979 264 2.4216
10 26.7 1.4265 293 2.4669
11 28.2 1.4502 318 2.5024
12 29.6 1.4713 371 2.5694
13 30.8 1.4886 455 2.6580
14 32.0 1.5052 504 2.7024
15 33.0 1.5185 518 2.7143
16 34.0 1.5315 537 2.7300
17 34.9 1.5428 651 2.8136
18 36.4 1.5611 719 2.8567
19 37.1 1.5694 726 2.8609
20 37.7 1.5763 810 2.9085
This scatterplot indicates that a linear
regression on the logarithms of both variables is
certainly one to consider.
22
Linear Regression on the transformed data
  • Simple linear regression results
    Dependent Variable: log10(Weight(g))
    Independent Variable: log10(Length(cm))
    log10(Weight(g)) = -1.8993973 + 3.049418 log10(Length(cm))
    Sample size: 20
    R (correlation coefficient) = 0.9993
    R-sq = 0.9985228

A check of the correlation coefficient is
certainly promising (r = 0.9993), the scatterplot of
the transformed data indicates the line fits very
well, and most importantly, look at those
residuals! Yes, statisticians get very excited
when they see residuals that look that good!
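
For the power model, both variables are logged before fitting. The sketch below repeats that on the rockfish table in plain Python; lsrl is a hypothetical helper written for this example, and the printed coefficients should land close to the regression output on this slide.

```python
import math

# Rockfish data from the slides (ages 1 to 20).
lengths_cm = [5.2, 8.5, 11.5, 14.3, 16.8, 19.2, 21.3, 23.3, 25.0, 26.7,
              28.2, 29.6, 30.8, 32.0, 33.0, 34.0, 34.9, 36.4, 37.1, 37.7]
weights_g = [2, 8, 21, 38, 69, 117, 148, 190, 264, 293,
             318, 371, 455, 504, 518, 537, 651, 719, 726, 810]

# Transform BOTH the explanatory and the response variable.
log_length = [math.log10(x) for x in lengths_cm]
log_weight = [math.log10(y) for y in weights_g]


def lsrl(xs, ys):
    """Return (intercept, slope, r) of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / (sxx * syy) ** 0.5


intercept, slope, r = lsrl(log_length, log_weight)
print(f"log10(Weight) = {intercept:.4f} + {slope:.4f} log10(Length), r = {r:.4f}")
```

A slope near 3 is what the one-dimensional versus three-dimensional argument suggests.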
23
Untransforming a power model
  • $\log_{10}(\text{Weight}) = -1.8993973 + 3.049418\,\log_{10}(\text{Length})$
    (Linear equation of the transformed data; weight in g, length in cm)
  • $10^{\log_{10}(\text{Weight})} = 10^{-1.8993973 + 3.049418\,\log_{10}(\text{Length})}$
    (Raise both sides using a base of 10)
  • $\text{Weight} = 10^{-1.8993973} \cdot 10^{3.049418\,\log_{10}(\text{Length})}$
    (Same base and multiplication law for exponents)
  • $\text{Weight} = 10^{-1.8993973} \cdot 10^{\log_{10}\left(\text{Length}^{3.049418}\right)}$
    (Power rule for logarithms)
  • $\text{Weight} = 10^{-1.8993973} \cdot \text{Length}^{3.049418}$
    (Same base)
  • $\text{Weight} = 0.01261 \cdot \text{Length}^{3.049418}$
    (Simplify constants)

Last check: plot the new model on the original
data. Looks like we've got a model that will be
very useful for estimating the weight of a fish
if we know its length!
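
As a numerical version of that last check, the untransformed power model can be evaluated at a few lengths from the table. This is a small sketch; estimate_weight is an illustrative name, and the coefficients are the ones reported on the regression slide.

```python
# Coefficients of the LSRL on the log-log data (from the slide).
intercept = -1.8993973
slope = 3.049418

# log10(Weight) = intercept + slope * log10(Length)
# => Weight = 10**intercept * Length**slope
a = 10 ** intercept
b = slope


def estimate_weight(length_cm):
    """Power model y = a * x**b: weight in grams from length in cm."""
    return a * length_cm ** b


print(f"a = {a:.5f}")  # a = 0.01261
for length_cm, observed_g in [(5.2, 2), (25.0, 264), (37.7, 810)]:
    print(f"{length_cm} cm: observed {observed_g} g, "
          f"estimated {estimate_weight(length_cm):.0f} g")
```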
24
Are there Other Possibilities?
  • There are many other possibilities to transform
    data in order to find a model.
  • If neither an exponential nor a power model is
    appropriate, you may try:
  • Squaring the response or explanatory variable
  • Taking the square root of either variable
  • Taking the reciprocal of either variable
  • The possibilities are endless, but for now we
    will concentrate mostly on either an exponential
    or power model.

25
Transforming on the TI
  • There are a couple of different ways to find both
    an exponential and power regression model on your
    TI-calculator
  • Using lists to transform
  • Using the built in regression models

26
Using lists to transform
  • We'll use the Gypsy Moth data first.

Enter the data in lists 1 and 2:
L1 = years since 1977
L2 = acres of defoliated land
Take the common log of the values in list 2 and
put the new values in list 3: L3 = log(L2)
Now do a linear regression on lists 1 and 3. You
can check residuals just like we did before to
verify this regression. Now untransform as we did
before to get the exponential model.
Note: for a power model, create another list for
the logarithm of the explanatory variable and do
the linear regression on the two lists of logarithms.
27
Using the Regression Models
  • The TI family of calculators has both an
    exponential and power model built into the stat
    calc menus.
  • Create a list for the explanatory variable and
    one for the response variable
  • From the home screen
  • STAT
  • CALC
  • 0: ExpReg
  • (A: PwrReg)
  • L1, L2
  • The model does not need untransforming
  • The residuals created are the residuals from the
    linear regression on the transformed data
    (yes, your calculator actually transforms the
    data, does a linear regression, then untransforms)
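
The transform, regress, untransform sequence described on this slide can be imitated in Python. The functions exp_reg and pwr_reg below are hypothetical stand-ins named after the calculator commands; they follow the slide's description and are not a claim about how the calculator is implemented internally.

```python
import math


def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def exp_reg(xs, ys):
    """Fit y = a * b**x: regress log10(y) on x, then untransform."""
    intercept, slope = lsrl(xs, [math.log10(y) for y in ys])
    return 10 ** intercept, 10 ** slope


def pwr_reg(xs, ys):
    """Fit y = a * x**b: regress log10(y) on log10(x), then untransform."""
    intercept, slope = lsrl([math.log10(x) for x in xs],
                            [math.log10(y) for y in ys])
    return 10 ** intercept, slope


# Gypsy moth data: L1 = years since 1977, L2 = acres defoliated.
l1 = [1, 2, 3, 4]
l2 = [63042, 226260, 907075, 2826095]

a, b = exp_reg(l1, l2)
print(f"ExpReg-style fit: y = {a:.2f} * {b:.4f}^x")
```

The result should agree with the model found by hand on the earlier slides (a near 17838, b near 3.5956).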

28
How to decide which model
  • Creating mathematical models for real data
    involves a lot of trial and error.
  • One strategy
  • Try a linear model first (check residuals)
  • Then try an exponential model (check residuals)
  • Then try a power model (check residuals)
  • If all residuals show a pattern, you can continue
    to try different transformations or choose the
    one with the best correlation
  • Remember, no model is perfect, but some models
    are useful. We wish to find a useful model.
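
The tie-breaker in this strategy (compare correlations once the residual plots have been checked) can be sketched as below for the gypsy moth data. The dictionary keys and the helper correlation are names chosen for this illustration, and r alone is not a substitute for looking at the residuals.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


years_since_1977 = [1, 2, 3, 4]
acres = [63042, 226260, 907075, 2826095]

log_x = [math.log10(x) for x in years_since_1977]
log_y = [math.log10(y) for y in acres]

# Each model is a straight-line fit on a different pair of scales.
r_by_model = {
    "linear": correlation(years_since_1977, acres),   # y vs x
    "exponential": correlation(years_since_1977, log_y),  # log y vs x
    "power": correlation(log_x, log_y),               # log y vs log x
}

best = max(r_by_model, key=lambda name: abs(r_by_model[name]))
for name, r in r_by_model.items():
    print(f"{name:<12} r = {r:.4f}")
print("strongest linear pattern:", best)
```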