Regression analysis - PowerPoint PPT Presentation
Slides: 26 · Provided by: stweb
Transcript and Presenter's Notes

Title: Regression analysis


1
  • Regression analysis
  • Contd.

2
  • Model selection and equations in regression
    analysis (Univariate)

Example of chicken manure and NFY (Practicum 10, Ex. 1)

MODEL:  MOD_1.   Independent:  CM

Dep  Model  Rsq   d.f.  F       Sigf  b0        b1        b2      b3
NFY  LIN    .654  25    47.30   .000  2321.18   5.0595
NFY  LOG    .832  25    123.73  .000  -4179.0   1582.98
NFY  QUA    .914  24    127.92  .000  1029.14   18.0385   -.0135
NFY  CUB    .919  23    86.45   .000  774.282   22.1247   -.0241  6.9E-06
NFY  EXP    .652  25    46.75   .000  2207.95   .0013

(Rsq = R square (R²); Sigf = P-value; b0 = intercept; b1, b2, b3 = other coefficients)
3
Model formulation

Linear:       Y = 2321.18 + 5.06 X                              (n = 27, p = 0.00, r² = 0.654)
Quadratic:    Y = 1029.14 + 18.04 X - 0.0135 X²                 (n = 27, p = 0.00, r² = 0.914)
Cubic:        Y = 774.28 + 22.12 X - 0.0241 X² + 0.0000069 X³   (n = 27, p = 0.00, r² = 0.919)
Exponential:  Y = a·e^(bX) = 2207.95 e^(0.0013 X), or Ln Y = Ln(2207.95) + 0.0013 X   (n = 27, p = 0.00, r² = 0.652)
Logarithmic:  Y = a + b Log X: Y = -4179.0 + 1582.98 Log X      (n = 27, p = 0.00, r² = 0.832)
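A sketch of the quadratic fit from the table above, in plain Python: ordinary least squares via the normal equations (X'X)b = X'y, solved by Gaussian elimination. The practicum data themselves are not in the slides, so the points below are simulated from the quadratic equation above; the data and noise level are assumptions for illustration only.

```python
import random

def ols(rows, y):
    """Least-squares coefficients for the given design-matrix rows."""
    k = len(rows[0])
    # Build the normal equations (X'X) beta = X'y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for col in range(k):                       # forward elimination with pivoting
        piv = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for i in range(col + 1, k):
            f = A[i][col] / A[col][col]
            A[i] = [a - f * b for a, b in zip(A[i], A[col])]
            c[i] -= f * c[col]
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):             # back substitution
        beta[i] = (c[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta

random.seed(1)
xs = [50 + 30 * i for i in range(27)]          # manure rates, kg/ha/wk (made up)
ys = [1029.14 + 18.04 * x - 0.0135 * x**2 + random.gauss(0, 200) for x in xs]
design = [[1.0, x, x * x] for x in xs]         # columns: 1, X, X²
b0, b1, b2 = ols(design, ys)
print(f"Y = {b0:.1f} + {b1:.2f} X {b2:+.4f} X²")
```

With low noise the recovered coefficients land close to the slide's b0 = 1029.14, b1 = 18.04, b2 = -0.0135.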
4
  • Model selection in regression analysis
  • Model selection principles:
  • Select significant models only (i.e. F Sigf or p < 0.05)
  • If there is more than one significant model, select the one with the higher R²
  • If the R² values are very close, select the simplest model, which is easier to describe or justify based on the constant and the trend line:
  • Linear > Quadratic, Exponential or all others (prefer the simpler)
  • Quadratic > Cubic
  • Check the significance of all the coefficients of the selected model, formulate the equation, calculate the expected values and prepare a graph
  • Here we select the quadratic model

5
Dependent variable.. NFY        Method.. QUADRATIC

Multiple R           .95616
R Square             .91424
Adjusted R Square    .90709
Standard Error       644.72240

Analysis of Variance:

              DF    Sum of Squares    Mean Square
Regression     2    106346711.2       53173355.6
Residuals     24    9976007.3         415667.0

F = 127.92297       Signif F = .0000

-------------------- Variables in the Equation --------------------
Variable      B              SE B          Beta          T         Sig T
CM            18.038453      1.566838      2.883755      11.513    .0000
CM2           -.013453       .001577       -2.136642     -8.530    .0000
(Constant)    1029.144461    229.359372                  4.487     .0002
6
Presentation of the results
(Note: all the statistical output should be in the appendix, not in the main text of the thesis or report.)
Y = 1029.14 + 18.04 X - 0.0135 X²   (n = 27, p = 0.000, r² = 0.91)
  • Results
  • About 1.0 ton of fish can be produced without chicken manure.
  • Fish yield increases by about 18 kg/ha/year (P < 0.05) for each additional 1 kg/ha/wk of chicken manure, up to 600 kg/ha/week.
  • Use of excess chicken manure (>600 kg/ha/wk) reduces the fish yield, probably because of high dry matter loading, etc.
  • Maximum production level: x = -b/(2c), i.e. -18.04/(2 × -0.0135) ≈ 668 kg/ha/wk
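The maximum-production calculation above is a one-liner once the quadratic coefficients are known; a minimal sketch:

```python
# The selected quadratic model (coefficients from the SPSS output above).
b0, b1, b2 = 1029.14, 18.04, -0.0135

def nfy(x):
    """Predicted net fish yield at chicken manure rate x (kg/ha/wk)."""
    return b0 + b1 * x + b2 * x**2

x_max = -b1 / (2 * b2)   # vertex of the parabola: the yield-maximising rate
print(f"maximum at {x_max:.0f} kg/ha/wk, predicted yield {nfy(x_max):.0f}")
```

This reproduces the slide's ≈668 kg/ha/wk; yields on either side of the vertex are lower, matching the "excess manure reduces yield" interpretation.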

7
Multiple linear regression
  • In reality, dependent variables are affected by many independent variables simultaneously, so multiple regression analysis is necessary!
  • Example:
  • Fish growth is affected by pond fertilization (N & P), feeding rate, temperature, DO, etc.
  • Model: Y = a + b1X1 + b2X2 + ........... + bnXn

8
Multiple linear regression
  • Stepwise regression method
  • Initial model identification
  • Iteratively "stepping", i.e. repeatedly altering the model at the previous step by adding or removing a predictor variable based on the "stepping criteria"
  • Terminating the search when stepping is no longer possible given the stepping criteria, or when a specified maximum number of steps has been reached
  • If the ANOVA is significant, at least one factor has a significant effect, but it does not point out which factors have significant effects; therefore we have to see the table of coefficients for each factor
  • The best-fitted or appropriate model is the one which includes all the factors whose coefficients are significant

9
Multiple linear regression
  • Analysis methods
  • Method 1: Forward selection method
  • Selects the most important variables serially
  • Possible to identify/rank variables based on their importance, as it quickly finds the most important variable, then the others follow serially
  • For example, if there are six variables, x1 to x6, the forward selection method would show the following results:
  • Model 1: Y = a + b2x2
  • Model 2: Y = a + b2x2 + b1x1
  • Model 3: Y = a + b2x2 + b1x1 + b5x5
  • Variables x3, x4 and x6 were discarded as their coefficients had p > 0.05.
  • The final selected model is Model 3.
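A minimal sketch of the forward-selection idea, using the drop in residual sum of squares (RSS) as the stepping criterion. Real packages such as SPSS test each candidate with an F- or t-test; the simple fractional-improvement threshold and the synthetic data below are assumptions for illustration.

```python
import random

def ols_rss(rows, y):
    """Residual sum of squares of an OLS fit (normal equations + elimination)."""
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for i in range(col + 1, k):
            f = A[i][col] / A[col][col]
            A[i] = [a - f * b for a, b in zip(A[i], A[col])]
            c[i] -= f * c[col]
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        beta[i] = (c[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def forward_select(X, y, tol=0.05):
    """Greedily add the predictor that most reduces the RSS; stop when the
    best candidate improves the RSS by less than the fraction tol."""
    chosen = []
    best = ols_rss([[1.0]] * len(y), y)            # intercept-only model
    while len(chosen) < len(X[0]):
        remaining = [j for j in range(len(X[0])) if j not in chosen]
        scores = {j: ols_rss([[1.0] + [row[c] for c in chosen + [j]] for row in X], y)
                  for j in remaining}
        j_best = min(scores, key=scores.get)
        if best - scores[j_best] < tol * best:     # negligible gain: stop
            break
        chosen.append(j_best)
        best = scores[j_best]
    return chosen

# Synthetic example: y truly depends only on predictors 1 and 0 (0-indexed).
random.seed(42)
X = [[random.uniform(0, 10) for _ in range(6)] for _ in range(40)]
y = [5 + 4 * row[1] + 2 * row[0] + random.gauss(0, 0.5) for row in X]
print("selected predictors:", forward_select(X, y))
```

The strongest predictor enters first, then the next strongest, mirroring Model 1 → Model 2 → Model 3 on this slide. Backward elimination is the mirror image: start with all predictors and repeatedly drop the one whose removal increases the RSS least.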

10
Multiple linear regression
  • Method 2: Backward elimination method
  • Discards insignificant variables step by step, keeping only the significant ones in the final model
  • This method quickly finds the least important factor first, then the others follow
  • But if you have too many variables, this method is cumbersome; use forward selection instead
  • For example, if there are six variables, x1 to x6, the backward elimination method would show the following results:
  • Model 1: Y = a + b2x2 + b1x1 + b5x5 + b3x3 + b4x4 + b6x6
  • Model 2: Y = a + b2x2 + b1x1 + b5x5 + b4x4 + b3x3
  • Model 3: Y = a + b2x2 + b1x1 + b5x5 + b4x4
  • Model 4: Y = a + b2x2 + b1x1 + b5x5
  • Variables x3, x4 and x6 were discarded as their coefficients had p > 0.05.
  • The final model is Model 4.

11
Multiple regression (Practicum 10, Ex. 2)
Y - SO2 in air (µg/m³); X1 - temperature (ºF); X2 - no. of enterprises (>20 workers); X3 - population (000s); X4 - wind speed (m/hr); X5 - precipitation/rainfall (inch); X6 - no. of rainy days/year
Stepwise or forward selection method
Other factors are kept constant (partial correlation)
12
Multiple regression (Practicum 10, Ex. 2)
Y - SO2 in air (µg/m³); factors X1, X2, X3, X4, X5 and X6
Backward elimination method
Other factors are kept constant (partial correlation)
13
Multiple regression
Forward selection or backward elimination: which method?
If you expect that only a few variables have significant effects, use the forward selection method. If you expect that only a few need to be discarded, then the backward elimination method is suitable.
For example, suppose there are 100 variables/factors. If you think only 10 factors will have effects, then start from the front; but if you think 80 factors will have effects (i.e. only 20 factors need to be discarded), then start from the back. Which way will you reach the answer faster?
(Figure: forward selection starts from variable 1 and works up; backward elimination starts from variable 100 and works down, along an axis showing the no. of variables 1, 2, 3, ... 100.)
14
Multiple regression
Model/Equation: Y = 83.963 - 1.823X1 + 0.02715X2 + 0.854X5
(n = 20, p = 0.000, r² = 0.793)
  • Model description
  • The model/result shows that:
  • Each unit increase in temperature (X1) decreases SO2 by 1.823 µg/m³
  • Each additional enterprise (X2) increases SO2 by 0.02715 µg/m³
  • Each additional inch of rainfall/year (X5) increases SO2 by 0.854 µg/m³

15
Multiple regression: Prediction
Problem: What would be the minimum and maximum SO2 levels in a city where the annual temperature ranges from 45 to 75 ºF, if there are 2000 enterprises and average annual precipitation is 50 inches?
Solution:
For the minimum temperature, 45 ºF:
Y = 83.963 - 1.823X1 + 0.02715X2 + 0.854X5 = 83.963 - 1.823(45) + 0.02715(2000) + 0.854(50) = 99 µg SO2/m³
For the maximum temperature, 75 ºF:
Y = 83.963 - 1.823X1 + 0.02715X2 + 0.854X5 = 83.963 - 1.823(75) + 0.02715(2000) + 0.854(50) = 44 µg SO2/m³
The range: 44-99 µg SO2/m³
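The two predictions can be checked directly; a minimal sketch of the calculation:

```python
# Fitted model: Y = 83.963 - 1.823*X1 + 0.02715*X2 + 0.854*X5
def so2(temp_f, enterprises, rain_inch):
    """Predicted SO2 (µg/m³) from temperature (ºF), no. of enterprises,
    and annual precipitation (inches)."""
    return 83.963 - 1.823 * temp_f + 0.02715 * enterprises + 0.854 * rain_inch

high = so2(45, 2000, 50)   # coolest year -> highest SO2
low = so2(75, 2000, 50)    # warmest year -> lowest SO2
print(f"range: {low:.0f} to {high:.0f} µg SO2/m³")   # range: 44 to 99
```

Note that because the temperature coefficient is negative, the minimum SO2 occurs at the maximum temperature and vice versa.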
16
  • Correlation
  • Degree of association of two variables, or how closely they vary together
  • No dependent factor(s), no cause and effect (both go together)
  • Can be positive or negative
  • Examples:
  • Radius and perimeter of a circle (?)
  • Fish weight and length (condition factor?)
  • Fish survival and yield, etc.
  • Height and weight, etc.

17
Correlation coefficient

r = Σ(X - X̄)(Y - Ȳ) / √[ Σ(X - X̄)² · Σ(Y - Ȳ)² ]

Correlation coefficient: -1 ≤ r ≤ 1, while regression coefficient: -∞ ≤ b ≤ +∞
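The formula above, written out in plain Python. The circle example from the previous slide is used as data: perimeter is an exact linear function of radius, so r comes out at 1.

```python
import math

def pearson_r(xs, ys):
    """r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² · Σ(y - ȳ)²]"""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

radius = [1, 2, 3, 4, 5]
perimeter = [2 * math.pi * r for r in radius]   # exact linear relationship
print(pearson_r(radius, perimeter))             # ≈ 1.0
```

Negating one variable flips the sign of r, illustrating the -1 ≤ r ≤ 1 bound.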
18
- - -  P A R T I A L   C O R R E L A T I O N   C O E F F I C I E N T S  - - -
Controlling for.. Y

          X1        X2        X3        X4        X5        X6
X1      1.0000     .2500     .2729    -.1677     .6968    -.2953
       (    0)   (   17)   (   17)   (   17)   (   17)   (   17)
       P= .      P= .302   P= .258   P= .493   P= .001   P= .220
X2       .2500    1.0000     .9456     .2759    -.1219    -.3298
       (   17)   (    0)   (   17)   (   17)   (   17)   (   17)
       P= .302   P= .      P= .000   P= .253   P= .619   P= .168
X3       .2729     .9456    1.0000     .2957    -.1140    -.3524
       (   17)   (   17)   (    0)   (   17)   (   17)   (   17)
       P= .258   P= .000   P= .      P= .219   P= .642   P= .139
X4      -.1677     .2759     .2957    1.0000    -.1416     .2209
       (   17)   (   17)   (   17)   (    0)   (   17)   (   17)
       P= .493   P= .253   P= .219   P= .      P= .563   P= .363
X5       .6968    -.1219    -.1140    -.1416    1.0000     .2681
       (   17)   (   17)   (   17)   (   17)   (    0)   (   17)
       P= .001   P= .619   P= .642   P= .563   P= .      P= .267
X6      -.2953    -.3298    -.3524     .2209     .2681    1.0000
       (   17)   (   17)   (   17)   (   17)   (   17)   (    0)
       P= .220   P= .168   P= .139   P= .363   P= .267   P= .

(Coefficient / (D.F.) / 2-tailed Significance)  " . " is printed if a coefficient cannot be computed

Partial correlation
  • Bi-variate

19
  • Advanced topics
  • Data mining
  • - Large volumes of data: data acquisition, exploratory analysis, model building and deployment
  • - Modeling
  • - Neural networks

20
  • Non-parametric tests - rank correlation
  • 1. Spearman's rank correlation
  • Bi-variate correlation
  • Spearman's rank correlation coefficient:
  • rs = 1 - (6 Σd²) / (n³ - n)
  • 2. Kendall's rank correlation, or Kendall's Coefficient of Concordance
  • - Multivariate correlation

21
  • Spearman's rank correlation: Example

H0: rs = 0
Spearman's rank correlation coefficient:
rs = 1 - 6 Σd² / (n³ - n) = 1 - 6(42)/(12³ - 12) = 1 - 0.147 = 0.853
From the table: rs(0.05, 12) = 0.587
Since 0.853 > 0.587, reject H0.
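The same calculation in plain Python. The helper below computes the ranks itself and assumes there are no ties, as the simple Σd² formula requires; the length/weight data are made up for illustration, and the slide's own numbers (Σd² = 42, n = 12) are plugged in at the end.

```python
def ranks(values):
    """1-based ranks (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rs(xs, ys):
    """rs = 1 - 6Σd²/(n³ - n), where d are the rank differences."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n**3 - n)

length = [3, 1, 4, 15, 9, 2, 6, 5]              # made-up data
weight = [30, 12, 41, 150, 95, 20, 63, 50]      # same ordering as length
print(spearman_rs(length, weight))              # perfectly monotonic: 1.0

# The slide's example: Σd² = 42, n = 12
print(1 - 6 * 42 / (12**3 - 12))                # ≈ 0.853
```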
22
  • 2. Kendall's Coefficient of Concordance

(Rank table omitted; totals: ΣR = 234, ΣR² = 5738.5)
23
  • 2. Kendall's Coefficient of Concordance
  • H0: There is no association among the three variables
  • Here:
  • M = 3, n = 12, ΣR = 234, ΣR² = 5738.5
  • W = [ΣR² - (ΣR)²/n] / [M²(n³ - n)/12]
  •    = [5738.5 - (234²)/12] / [3²(12³ - 12)/12] = 1175.5/1287 = 0.913
  • χ² = MW(n - 1) = 3 × 0.913 × (12 - 1) = 30.13
  • χ²(0.05, 11) = 19.675 (from the table); therefore, reject H0
  • There is a significant (P < 0.05) association among the 3 variables.
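The W and χ² arithmetic on this slide, checked in plain Python from the rank totals on the previous slide:

```python
# Kendall's coefficient of concordance from the rank totals
# (M = 3 variables, n = 12 objects, ΣR = 234, ΣR² = 5738.5).
M, n = 3, 12
sum_R, sum_R2 = 234, 5738.5

# W = [ΣR² - (ΣR)²/n] / [M²(n³ - n)/12]
W = (sum_R2 - sum_R**2 / n) / (M**2 * (n**3 - n) / 12)
chi2 = M * W * (n - 1)        # chi-square approximation, d.f. = n - 1
print(f"W = {W:.3f}, chi2 = {chi2:.2f}")
```

This gives W = 0.913 and χ² ≈ 30.14 (the slide's 30.13 comes from rounding W to 0.913 before multiplying); both exceed the critical value 19.675, so H0 is rejected.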

24
  • Result from SPSS

25
No lab session - the course is completed! Some useful websites: http://www.psychstat.smsu.edu/introbook/sbk27.htm
  • Thank you!