Title: Transforming Relationships
1 Transforming Relationships
- AP Statistics
- Practice of Statistics
- Section 4.1
2 What You'll Learn
- Recognize when the relationship between two
variables is either an exponential relationship
or a power relationship
- Perform the appropriate transformation to
linearize the data, find the LSRL on the
transformed points, and untransform to find a model
for the original data
3 Not Everything Is Linear!
- We've looked at several sets of data in which the
relationships are linear in nature
- What about relationships that exhibit a
nonlinear pattern?
- Consider for a moment gypsy moths.
- An outbreak of gypsy moths in Massachusetts from
1978 to 1981 resulted in many acres of defoliated
land. The acreages are listed in the following
table.
4 Gypsy Moths
- The data and graph depict the number of acres
defoliated by gypsy moths in Massachusetts
between 1978 and 1981.
Year                       1978    1979     1980     1981
Acres of defoliated land   63042   226260   907075   2826095
5
- So, this doesn't look too bad! Let's try a
linear regression on the data, remembering to
check both the correlation coefficient and the
residual plot.
Simple linear regression results:
Dependent Variable: Acres
Independent Variable: Year
Acres = -1.7746007E9 + 896997.4 (Year)
Sample size: 4
R (correlation coefficient) = 0.9136
R-sq = 0.8347045
Estimate of error standard deviation: 631139.44
Well, a visual of the line doesn't look too bad,
and that's a high correlation coefficient.
(Remember, though, that r is sometimes
deceptive, so be sure to check the residuals!)
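As a rough cross-check of the printout above, the least-squares line and the correlation coefficient can be recomputed directly from the four data points. This is a minimal sketch in plain Python; the helper name `lsrl` and the variable names are illustrative assumptions, not part of the slides or of any statistics package.

```python
import math

# Gypsy moth data from the table above
years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]

def lsrl(xs, ys):
    """Return (intercept, slope, r) for the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

intercept, slope, r = lsrl(years, acres)
print(f"Acres = {intercept:.7e} + {slope:.1f} (Year)")
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

The printed slope, intercept, and r should agree with the regression results above to the precision shown there.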
6 The Residuals
- A check of the residuals indicates that a linear
model is not appropriate! (Notice the parabolic
pattern in the plot, which can be seen even with
only 4 data points!)
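The curved pattern can also be seen numerically by listing the residuals (actual minus predicted) from the linear fit. The sketch below refits the line from the four data points; all variable names are illustrative assumptions.

```python
# Gypsy moth data
years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]

# Least-squares slope from the usual formulas
n = len(years)
x_bar = sum(years) / n
y_bar = sum(acres) / n
sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, acres))
slope = sxy / sxx

# Residual = observed acres - acres predicted by the line
# (the line is written in centered form to limit rounding error)
residuals = [y - (y_bar + slope * (x - x_bar)) for x, y in zip(years, acres)]
signs = ["+" if e > 0 else "-" for e in residuals]

for year, e, s in zip(years, residuals, signs):
    print(f"{year}: residual = {e:12.1f} ({s})")
```

Residuals that are positive at both ends and negative in the middle are the numeric counterpart of the curved residual plot.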
7 So, what type of relationship is this?
- Remember from linear regression that when the
relationship is linear, the response variable
increases (or decreases) by a constant amount
for each one-unit increase in the explanatory variable.
Years since 1977            1       2        3        4
Acres of defoliated land    63042   226260   907075   2826095
Difference in acres                 163218   680815   1919020
- Notice that the difference between consecutive
acreages is not constant
- With this in mind, along with the problem in the
residual plot, let's consider another type of
relationship.
8 Exponential Relationships
- In an exponential relationship, the response
variable increases by a fixed percentage of the
previous total. In other words, we should be able
to multiply the previous value by some constant
to get the next one.
- So, let's check out this possibility (all of our
data are at 1-year intervals, so we can compare
each value with the one before it).
Years since 1977            1       2        3        4
Acres of defoliated land    63042   226260   907075   2826095
Ratio (next/prev)                   3.5890   4.0090   3.1156
- Notice that although the ratios are not exactly the
same (we wouldn't expect them to be exact with
real data), there does appear to be a
fairly consistent ratio.
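The two checks, constant differences for a linear pattern and constant ratios for an exponential pattern, can be sketched in a few lines of Python. The variable names are illustrative assumptions.

```python
# Acres of defoliated land, 1978 to 1981
acres = [63042, 226260, 907075, 2826095]

# Linear growth would give roughly constant differences
differences = [nxt - prev for prev, nxt in zip(acres, acres[1:])]

# Exponential growth would give roughly constant ratios
ratios = [nxt / prev for prev, nxt in zip(acres, acres[1:])]

print("Differences:", differences)
print("Ratios:     ", [round(q, 4) for q in ratios])
print("Mean ratio: ", round(sum(ratios) / len(ratios), 4))
```

The differences grow by more than a factor of ten, while the ratios all stay between about 3 and 4, which is why an exponential model is the next one to try.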
9 So How Do We Create the Model?
- If the relationship is an exponential one, we can
use a mathematical transformation to linearize
the data, find the LSRL of the transformed data,
then untransform to find the model that will
fit the original data.
- OK, so let's take all of that step by step
10 Finding the Model
- Step 1: Use a mathematical transformation to linearize
the data (create a new data set whose relationship is
linear)
- If the original data is exponential, find the
logarithm (either common log or natural log) of
each of the response values.
- When working with years it is also helpful to
code the year data so our calculators can
handle the values (most computer programs are
capable of creating models using the full year).
To do this we will take each year and subtract
1977 (this way all of our values are > 0).
Year                       1978     1979     1980     1981
Acres of defoliated land   63042    226260   907075   2826095
Years since 1977           1        2        3        4
Log10 (acres)              4.7996   5.3546   5.9576   6.4512
11 Finding the Model
- Now, let's check a scatterplot of the transformed
data
Notice the change in the pattern from our
original data to the transformed data. The
logarithm transformation really straightened our
data. (Using the natural logarithm would have
had the same effect; our values would have just
been different.)
12 Finding the Model
- Step 2: Find the LSRL for the transformed data
(remember to check r and the residuals!)
Simple linear regression results:
Dependent Variable: log10(Acres)
Independent Variable: Year-1977
log10(Acres) = 4.2513404 + 0.5557706 (Year-1977)
Sample size: 4
R (correlation coefficient) = 0.9993
R-sq = 0.9985874
Estimate of error standard deviation: 0.033050213
This model looks promising, but remember to check
the residuals.
13 Finding the Model
A check of the residuals confirms that an
exponential model is appropriate.
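Steps 1 and 2 can be sketched together in plain Python: code the years, take the common log of the acres, and fit the least-squares line to the transformed points. The helper name `lsrl` and the variable names are illustrative assumptions, not part of the slides.

```python
import math

# Coded explanatory variable and original response
years_since_1977 = [1, 2, 3, 4]
acres = [63042, 226260, 907075, 2826095]

# Step 1: transform the response with the common logarithm
log_acres = [math.log10(a) for a in acres]

def lsrl(xs, ys):
    """Return (intercept, slope, r) for the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

# Step 2: LSRL on the transformed points, then its residuals
intercept, slope, r = lsrl(years_since_1977, log_acres)
residuals = [y - (intercept + slope * x)
             for x, y in zip(years_since_1977, log_acres)]

print(f"log10(Acres) = {intercept:.7f} + {slope:.7f} (Year-1977)")
print(f"r = {r:.4f}")
print("Residuals:", [round(e, 4) for e in residuals])
```

The residuals here are on the log scale, so they are small numbers; what matters is that they are small relative to the spread of the log values.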
14 Untransforming to find the model for our
original data
- Remember that our goal was to find a model that
we could use to predict the number of
defoliated acres of land for a given year.
- The linear model we have would predict the common
logarithm of acres. In order for our model to be
useful, we need to reverse the transformation to
create the model that fits the original data.
- Although it is often easier to evaluate the
transformed model first and then untransform the
result, we can use the properties of logarithms
with both exponential and power models (we'll look
at those next) to find the model for our original data.
15 Properties of Logarithms
- Before we try to untransform, let's review the
properties of logarithms you learned in Algebra
(yes, you really did learn these!)
- $\log_b(xy) = \log_b x + \log_b y$ (Addition rule)
- $\log_b(x^m) = m \log_b x$ (Power rule)
- $\log_b(b^n) = n$ (Same base)
- $\log_b(x/y) = \log_b x - \log_b y$ (Subtraction rule)
- Since any subtraction can be rewritten as an
addition, we will not use this last rule
much!
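For anyone who wants to convince themselves numerically, the four rules can be spot-checked in Python. This is only a sketch with arbitrarily chosen test values; the helper `log_b` is an illustrative assumption.

```python
import math

# Arbitrary positive test values (two of the acreages) and exponents
b, x, y, m, n = 10, 907075.0, 63042.0, 3, 4

def log_b(value, base=b):
    """Logarithm of value in the given base, via the change-of-base formula."""
    return math.log(value) / math.log(base)

checks = {
    "addition": math.isclose(log_b(x * y), log_b(x) + log_b(y)),
    "power": math.isclose(log_b(x ** m), m * log_b(x)),
    "same base": math.isclose(log_b(b ** n), n),
    "subtraction": math.isclose(log_b(x / y), log_b(x) - log_b(y)),
}

for rule, holds in checks.items():
    print(f"{rule} rule holds: {holds}")
```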
16 Untransforming exponential expressions
- An exponential function takes the form
- $y = ab^x$, where a and b are constants
- (This is the form we want to end up with)
- So, let's get started
$\log_{10}(\text{Acres}) = 4.2513404 + 0.5557706\,(\text{Year} - 1977)$
Linear regression of the transformed data
$10^{\log_{10}(\text{Acres})} = 10^{4.2513404 + 0.5557706\,(\text{Year} - 1977)}$
Raise 10 to the power of each side (same base)
$\text{Acres} = 10^{4.2513404} \cdot \left(10^{0.5557706}\right)^{(\text{Year} - 1977)}$
Same base law and multiplication law for exponents
$\text{Acres} = 17837.7634 \cdot (3.5956)^{(\text{Year} - 1977)}$
Simplify the constants
This is now in the form $y = ab^x$, where
$a = 17837.7634$ and $b = 3.5956$. Notice that b is
approximately the average of the ratios
(next/prev) we calculated when we began looking
for a model.
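The "simplify the constants" step amounts to raising 10 to the intercept and to the slope of the transformed regression line. A minimal sketch, using the coefficients reported in the regression results (variable names are illustrative assumptions):

```python
# Coefficients of the LSRL fitted to (Year-1977, log10(Acres))
intercept = 4.2513404
slope = 0.5557706

# Untransform: log10(y) = intercept + slope * x  becomes  y = a * b**x
a = 10 ** intercept   # value of the model at x = 0 (the coded year 1977)
b = 10 ** slope       # yearly growth factor

# Compare b with the average of the year-to-year ratios
acres = [63042, 226260, 907075, 2826095]
ratios = [nxt / prev for prev, nxt in zip(acres, acres[1:])]
mean_ratio = sum(ratios) / len(ratios)

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"mean of ratios = {mean_ratio:.4f}")
```

Had the natural log been used in the transformation, the same step would use `math.exp` in place of `10 **`.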
17 So, does it fit our original data?
- Since our original goal was to find a model that
would allow us to predict the number of acres of
defoliated land if we knew the year, we need to
check to see if our model actually fits the data.
The model looks pretty good, but as with any
model we need to use caution when predicting
outside our original data range.
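One way to check the fit without a plot is to evaluate the untransformed model at each observed year and compare it with the actual acreage. The sketch below uses the rounded constants from the previous slide; the function name `predicted_acres` and the choice of 1985 as an out-of-range year are illustrative assumptions.

```python
# Untransformed exponential model: Acres = a * b**(Year - 1977)
a = 17837.7634
b = 3.5956

def predicted_acres(year):
    """Predicted acres of defoliated land for a given calendar year."""
    return a * b ** (year - 1977)

years = [1978, 1979, 1980, 1981]
acres = [63042, 226260, 907075, 2826095]

for year, actual in zip(years, acres):
    fitted = predicted_acres(year)
    pct_error = 100 * (fitted - actual) / actual
    print(f"{year}: actual {actual:>9}, model {fitted:>11.0f} ({pct_error:+.1f}%)")

# Extrapolation (hypothetical year, well outside the data range)
print(f"1985: model {predicted_acres(1985):.0f}")
```

Within the data range the model stays within roughly 10% of each observed value. The 1985 figure shows why extrapolation needs caution: the model keeps multiplying by about 3.6 every year, which no real outbreak can sustain.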
18 Power Models
- Another important transformation used in modeling
is the power model.
- Power models have the form
- $y = ax^b$, where a and b are constants
- We can find an appropriate power model by taking
the logarithms of both the response and
explanatory variables, finding the linear
regression for the transformed data, then using
the laws of logarithms and exponents to
untransform
- Let's look at an example
19 Fishing Tournament
- In a fishing tournament that you are in charge of,
you need to find a way to record the weight of
each fish caught without injuring or killing
the fish.
- Since it is easier to measure the length of a
fish than its weight, we must find a way
to convert length to weight.
- The local marine research lab has been gracious
enough to provide you with data on the
average length and weight at different ages for
Atlantic Ocean rockfish, which serve as a model for
most fish species growing under normal feeding conditions.
20 The Data
Age (yr) Length (cm) Weight (g)
1 5.2 2
2 8.5 8
3 11.5 21
4 14.3 38
5 16.8 69
6 19.2 117
7 21.3 148
8 23.3 190
9 25.0 264
10 26.7 293
11 28.2 318
12 29.6 371
13 30.8 455
14 32.0 504
15 33.0 518
16 34.0 537
17 34.9 651
18 36.4 719
19 37.1 726
20 37.7 810
- Since length is one dimensional and weight (like
volume) is three dimensional, we should be able to
find a reasonable model using a power model (the
residuals for a regression on the original data
confirm that the variables are NOT linearly related,
but we already knew that!)
- As before, we need to first transform our data, but
this time we have to transform both length
and weight
21 Transforming the Data
Age (yr) Length (cm) Log10 (length) Weight (g) Log10 (weight)
1 5.2 .7160 2 .3010
2 8.5 .9294 8 .9031
3 11.5 1.0607 21 1.3222
4 14.3 1.1553 38 1.5798
5 16.8 1.2253 69 1.8388
6 19.2 1.2833 117 2.0682
7 21.3 1.3284 148 2.1703
8 23.3 1.3674 190 2.2788
9 25.0 1.3979 264 2.4216
10 26.7 1.4265 293 2.4669
11 28.2 1.4502 318 2.5024
12 29.6 1.4713 371 2.5694
13 30.8 1.4886 455 2.6580
14 32.0 1.5051 504 2.7024
15 33.0 1.5185 518 2.7143
16 34.0 1.5315 537 2.7300
17 34.9 1.5428 651 2.8136
18 36.4 1.5611 719 2.8567
19 37.1 1.5694 726 2.8609
20 37.7 1.5763 810 2.9085
This scatterplot indicates that a linear
regression on the logarithms of both variables is
certainly one to consider.
22 Linear Regression on the transformed data
- Simple linear regression results:
Dependent Variable: log10(Weight(g))
Independent Variable: log10(Length(cm))
log10(Weight(g)) = -1.8993973 + 3.049418 log10(Length(cm))
Sample size: 20
R (correlation coefficient) = 0.9993
R-sq = 0.9985228
A check of the correlation coefficient is
certainly promising (r = 0.9993), the scatterplot of
the transformed data indicates the line fits very
well, and most importantly, look at those
residuals!!! Yes, statisticians get very excited
when they see residuals that look that good!
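The same regression can be reproduced from the length and weight columns by taking common logs of both variables and fitting a least-squares line. This is a sketch in plain Python; the helper name `lsrl` and the variable names are illustrative assumptions.

```python
import math

# Average length (cm) and weight (g) for ages 1 to 20, from the data table
length_cm = [5.2, 8.5, 11.5, 14.3, 16.8, 19.2, 21.3, 23.3, 25.0, 26.7,
             28.2, 29.6, 30.8, 32.0, 33.0, 34.0, 34.9, 36.4, 37.1, 37.7]
weight_g = [2, 8, 21, 38, 69, 117, 148, 190, 264, 293,
            318, 371, 455, 504, 518, 537, 651, 719, 726, 810]

# For a power model, transform BOTH variables
log_length = [math.log10(v) for v in length_cm]
log_weight = [math.log10(v) for v in weight_g]

def lsrl(xs, ys):
    """Return (intercept, slope, r) for the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

intercept, slope, r = lsrl(log_length, log_weight)
print(f"log10(Weight) = {intercept:.7f} + {slope:.6f} log10(Length)")
print(f"r = {r:.4f}, r^2 = {r * r:.7f}")
```

A slope close to 3 is what the "one dimensional versus three dimensional" argument suggests for weight as a function of length.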
23 Untransforming a power model
(Weight is in grams and length is in centimeters throughout.)
- $\log_{10}(\text{Weight}) = -1.8993973 + 3.049418 \log_{10}(\text{Length})$
Linear equation of the transformed data
- $10^{\log_{10}(\text{Weight})} = 10^{-1.8993973 + 3.049418 \log_{10}(\text{Length})}$
Raise both sides using a base of 10
- $\text{Weight} = 10^{-1.8993973} \cdot 10^{3.049418 \log_{10}(\text{Length})}$
Same base and multiplication law for exponents
- $\text{Weight} = 10^{-1.8993973} \cdot \left(10^{\log_{10}(\text{Length})}\right)^{3.049418}$
Power rule for logarithms
- $\text{Weight} = 10^{-1.8993973} \cdot (\text{Length})^{3.049418}$
Same base
- $\text{Weight} = 0.01261 \cdot (\text{Length})^{3.049418}$
Simplify constants
Last check: plot the new model on the original
data. Looks like we've got a model that will be
very useful for estimating the weight of a fish
if we know its length!
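In code, untransforming the power model means raising 10 to the intercept to get a and using the slope directly as the exponent b. The sketch below then estimates weight from length; the function name `estimated_weight_g` is an illustrative assumption.

```python
# Coefficients of the LSRL fitted to (log10(Length), log10(Weight))
intercept = -1.8993973
slope = 3.049418

# Untransform: log10(y) = intercept + slope * log10(x)  becomes  y = a * x**b
a = 10 ** intercept
b = slope

def estimated_weight_g(length_cm):
    """Estimated weight in grams for a fish of the given length in cm."""
    return a * length_cm ** b

print(f"a = {a:.5f}, b = {b}")
for length_cm, actual_g in [(5.2, 2), (25.0, 264), (37.7, 810)]:
    print(f"{length_cm:5.1f} cm: model {estimated_weight_g(length_cm):6.1f} g, "
          f"table {actual_g} g")
```

Note the difference from the exponential case: there both the intercept and the slope are exponentiated, while here only the intercept is.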
24 Are there Other Possibilities?
- There are many other ways to transform
data in order to find a model.
- If neither an exponential nor a power model is
appropriate, you may try to
- Square the response or explanatory variable
- Take the square root of either variable
- Take the reciprocal of either variable
- The possibilities are endless, but for now we
will concentrate mostly on exponential
and power models.
25 Transforming on the TI
- There are a couple of different ways to find both
an exponential and a power regression model on your
TI calculator
- Using lists to transform
- Using the built-in regression models
26 Using lists to transform
- We'll use the Gypsy Moth data first.
Enter the data in lists 1 and 2:
L1 = years since 1977
L2 = acres of defoliated land
Take the common log of the values in list 2 and
put the new values in list 3:
L3 = log(L2)
Now do a linear regression on lists 1 and 3. You
can check residuals just like we did before to
verify this regression. Now untransform as we did
before to get the exponential model.
Note: for a power model, create another list for
the logarithm of the explanatory variable and do
the linear regression on the two lists of logarithms.
27 Using the Regression Models
- The TI family of calculators has both an
exponential and a power model built into the STAT
CALC menu.
- Create a list for the explanatory variable and
one for the response variable
- From the home screen
- STAT
- CALC
- 0:ExpReg
- (A:PwrReg)
- L1, L2
- The model does not need untransforming
- The residuals created are the residuals from the
linear regression on the transformed data
(yes, your calculator actually transforms the
data, does a linear regression, then untransforms)
28 How to decide which model
- Creating mathematical models for real data
involves a lot of trial and error.
- One strategy
- Try a linear model first (check residuals)
- Then try an exponential model (check residuals)
- Then try a power model (check residuals)
- If all residuals show a pattern, you can continue
to try different transformations or choose the
one with the best correlation
- Remember, no model is perfect, but some models are
useful. We wish to find a useful model.
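The strategy above can be sketched as a small routine that fits all three candidate models and reports r and the residuals for each. This is only an illustration under stated assumptions: the names `lsrl` and `try_models` are hypothetical, both variables must be positive for the logarithms to exist, and r is a rough guide that does not replace looking at the residual plots.

```python
import math

def lsrl(xs, ys):
    """Return (intercept, slope, r) for the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / math.sqrt(sxx * syy)

def try_models(xs, ys):
    """Fit linear, exponential, and power models (requires xs, ys > 0)."""
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    candidates = {
        "linear": (xs, ys),            # y vs x
        "exponential": (xs, log_y),    # log y vs x
        "power": (log_x, log_y),       # log y vs log x
    }
    results = {}
    for name, (u, v) in candidates.items():
        intercept, slope, r = lsrl(u, v)
        residuals = [vi - (intercept + slope * ui) for ui, vi in zip(u, v)]
        results[name] = {"r": r, "residuals": residuals}
    return results

# Gypsy moth data with coded years
results = try_models([1, 2, 3, 4], [63042, 226260, 907075, 2826095])
for name, fit in results.items():
    signs = "".join("+" if e > 0 else "-" for e in fit["residuals"])
    print(f"{name:12s} r = {fit['r']:.4f}  residual signs: {signs}")
```

For the gypsy moth data the exponential fit has the highest r of the three, consistent with the model chosen earlier.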