Association Between Two Variables


Slides: 34
Provided by: chadsp

Transcript and Presenter's Notes


1
Chapter 3
  • Association Between Two Variables

2
True Interest of Statistics
  • Examine relationship between two or more
    variables.
  • Does an association exist? Does the value of one
    variable relate to the value of another?

3
Terms
  • Dependent (or response) variable
  • Independent (or explanatory) variable

4
I) Two Categorical Variables
  • Describe relationship using Contingency Table
  • Each cell records the number of observations
    meeting criteria set by variables.

5
Example: Contingency Table
  • Question: Are pesticides less likely in organic
    food (p. 95)?
  • 1) Identify explanatory and response variables.
  • 2) Make a table.
  • What do cell values represent?
  • 3) Sum the totals for each category of the
    explanatory variable. WHY?

6
Example: Contingency Table
  • 4) Conditional upon being a certain type of
    produce (Type X), what is the chance of finding
    pesticides?
  • 5) Conclude
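
Steps 3 and 4 can be sketched in a few lines of Python. The counts below are invented purely for illustration (they are not the textbook's data), and the helper name `conditional_proportions` is hypothetical. The point is only that each cell is divided by the total for its explanatory group, which is why those totals are needed.

```python
# Hypothetical counts, invented for illustration (NOT the textbook's data).
# Rows: explanatory variable (food type); columns: response (pesticide status).
table = {
    "organic":      {"present": 30,  "absent": 70},
    "conventional": {"present": 150, "absent": 50},
}

def conditional_proportions(table):
    """Return P(response | explanatory group) for every cell, row by row."""
    result = {}
    for group, counts in table.items():
        row_total = sum(counts.values())  # step 3: total for each explanatory group
        result[group] = {k: v / row_total for k, v in counts.items()}
    return result

props = conditional_proportions(table)
for group, p in props.items():
    # step 4: chance of finding pesticides, conditional on food type
    print(f"P(pesticides present | {group}) = {p['present']:.2f}")
```

Comparing the two conditional proportions, rather than the raw counts, is what allows a conclusion even when the groups have different sizes.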

7
Excel Example: Contingency Table

8
Other Examples of Two Categorical Variables
  • Condition of Home vs. Home Ownership
  • Gender vs. Smoking
  • Others?

9
Identifying Explanatory and Response Variables
  • THINK of cause and effect: the response variable
    is the effect.
  • Direction of arrow: explanatory → response
  • Income → Political ID
  • Greenhouse Gas → Global Temperature

10
But
  • CORRELATION (ASSOCIATION) DOES NOT IMPLY
    CAUSATION.
  • Reverse Causality (Feedback Loop)
  • Omitted Variables
  • This course: Association/Correlation
  • Advanced course: Causation
  • Be careful about language
  • Side note: A time component helps establish
    causation

11
II) Comparing a Categorical and a Quantitative
Variable
  • Easier to analyze if the categorical variable is
    the explanatory variable.
  • Conditional upon being in group X, what are the
    characteristics of Y?
  • Gender and height
  • Race and income

12
Options for Comparing
  • 1) Compare summary statistics for two groups.
  • 2) Recode categorical data as zeros and ones,
    then draw a scatterplot.

13
Excel Example: Ch3-1.xls
  • 1) Sort data
  • 2) Create table
  • 3) Since women's heights are in Column B, Rows
    2-263, type =AVERAGE(B2:B263) in the table.
    For men, type =AVERAGE(B264:B382).

14
Excel Example: Ch3-1.xls
  • 4) Standard deviation (of a sample) is
    =STDEV(B2:B263). To find this formula:
  • Excel 2007: Formulas → More Functions →
    Statistical → STDEV
  • All Excel versions: Go to Function Wizard → More
    Functions → Standard Deviation, then select one
    of the possibilities.
  • 5) Coefficient of Variation?
  • Advantage of CV over s?
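
The same group comparison can be sketched outside Excel. The records below are hypothetical (they are not the Ch3-1.xls data), and `summarize` is an illustrative helper name. `statistics.stdev` is the sample standard deviation, which is the quantity the slide's STDEV formula computes.

```python
from statistics import mean, stdev

# Hypothetical (sex, height in inches) records; NOT the Ch3-1.xls data.
records = [
    ("Female", 62.0), ("Female", 64.0), ("Female", 66.0),
    ("Male", 68.0), ("Male", 70.0), ("Male", 72.0),
]

def summarize(records):
    """Group values by category, then compute mean, sample sd, and CV."""
    groups = {}
    for label, value in records:
        groups.setdefault(label, []).append(value)
    summary = {}
    for label, values in groups.items():
        m = mean(values)
        s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
        summary[label] = {"mean": m, "sd": s, "cv": s / m}
    return summary

summary = summarize(records)
for label, stats in summary.items():
    print(label, stats)
```

Because the CV divides s by the mean, it has no units, so it can be compared across groups or variables measured on different scales.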

15
Excel Example: Ch3-1.xls
  • 6) Scatterplot
  • i) Type =IF(A2="Female",1,0) in the Female
    column. Why?
  • ii) Double-click the crosshair at the lower-right
    corner of the cell. Why?
  • iii) Make Chart
  • Excel 2007: Highlight Female and Height data, go
    to Insert → Chart → Scatter
  • Old Excel: Chart Wizard → XY(Scatter) → Finish
  • What is true of all observations at X = 0?
  • What can we learn from this chart?
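
The recoding step can be sketched as follows, assuming a list of hypothetical (sex, height) records rather than the actual spreadsheet. The comprehension plays the role of the IF formula filled down the column.

```python
# Hypothetical (sex, height in inches) records; NOT the Ch3-1.xls data.
records = [("Female", 62.0), ("Male", 70.0), ("Female", 65.0), ("Male", 68.0)]

# Recode the categorical variable as 1 (Female) or 0 (otherwise),
# giving (x, y) pairs that can be drawn on a scatterplot.
points = [(1 if sex == "Female" else 0, height) for sex, height in records]

# Every observation at X = 0 belongs to the non-female group.
heights_at_zero = [height for x, height in points if x == 0]

print(points)
print(heights_at_zero)
```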

16
III) Comparing Two Quantitative Variables
  • Examples?

17
Options for Comparing
  • Correlation, r
  • A single number (numerical summary) describing
    the strength of a linear relationship between X
    and Y.
  • Range of values: -1 ≤ r ≤ 1
  • Strong Positive Relationship: r close to +1
  • Strong Negative Relationship: r close to -1
  • Insensitive to which of the two variables is X
    and which is Y.
  • Unit-Free (Insensitive to units of measurement)

18
Options for Comparing
  • Correlation, r
  • Where Z_X = deviations of X from its mean, in
    standard-deviation units (likewise Z_Y)
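
The slide's formula did not survive in this transcript. A minimal sketch, assuming it is the usual sample correlation r = sum(Z_X * Z_Y) / (n - 1), is shown below. The data and the function name `correlation` are hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation computed from standardized deviations (z-scores)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)          # sample standard deviations
    zx = [(x - mx) / sx for x in xs]       # Z_X for each observation
    zy = [(y - my) / sy for y in ys]       # Z_Y for each observation
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(xs, ys)
r_swapped = correlation(ys, xs)                     # swap the roles of X and Y
r_rescaled = correlation([100 * x for x in xs], ys) # change the units of X
print(r, r_swapped, r_rescaled)
```

The three printed values agree up to rounding, which illustrates that r does not depend on which variable is called X or on the units of measurement.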

19
Options for Comparing
  • Correlation, r
  • Correlation does not imply causation
  • Reverse Causality
  • Omitted (Lurking) Variables
  • Scatterplot
  • Does a relationship exist?
  • Is the relationship linear?
  • Is the relationship positive or negative?

20
Options for Comparing
  • Scatterplot
  • Example

21
Options for Comparing
  • Scatterplot
  • Example r

22
Options for Comparing
  • Scatterplot
  • Example r

23
Options for Comparing
  • Scatterplot
  • Example r

24
Options for Comparing
  • Scatterplot
  • Example: Most of the time, when X is above its
    mean, Y is above its mean, so Z_X Z_Y > 0 most of
    the time.
  • This implies that r > 0.
  • See Correlation by Eye applet for more
    correlation examples.

25
Options for Comparing
  • REGRESSION (Line of Best Fit)
  • Suppose we believe a linear relationship between
    X and Y exists such that
  • Y = α + βX + error
  • We can develop statistics (a, b) to estimate
    population parameters (α, β) that predict Y
    values given particular X values.

26
Options for Comparing
  • REGRESSION (Line of Best Fit)

27
Excel Example: Ch3-3.xls
  • 3rd Party Votes in Florida, 1996 vs. 2000.
  • What is X? What is Y?
  • Why is this interesting?
  • 1) Correlation
  • i) =CORREL(B2:B68, C2:C68)
  • ii) Tools → Data Analysis → Correlation

28
Excel Example: Ch3-3.xls
  • 2) Scatterplot
  • i) Highlight columns
  • ii) Choose Insert → Chart → Scatter (or Chart
    Wizard → XY(Scatter))
  • 3) Regression / Line of Best Fit (all XLS
    versions)
  • i) Right-click the scatterplot data
  • ii) Choose "Add Trendline", then select the
    "Display equation on chart" option.
  • iii) Interpretation of coefficients?
  • iv) Much more on this later

29
Regression Intuition
  • We observe Y, but develop statistics to predict Ŷ
    for given X values.
  • The difference (Y - Ŷ) is called a residual.
  • A regression line of best fit minimizes the sum
    of squared residuals. That is, it makes the
    errors associated with our predictions as small
    as possible.

30
Regression Intuition
  • What do residuals look like?
  • Least squares regressions find the a and b values that
    minimize the sum of squared residuals.
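
A minimal sketch of the least squares idea for one explanatory variable, using the standard closed-form solution. The data are hypothetical, and `least_squares` and `sum_squared_residuals` are illustrative names, not Excel functions.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return intercept a and slope b minimizing the sum of squared residuals."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx            # the fitted line passes through (mean X, mean Y)
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of (observed Y - predicted Y)^2 for the line a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
worse = sum_squared_residuals(xs, ys, a + 0.5, b)   # any other line does worse
print(a, b, best, worse)
```

Nudging either coefficient away from the fitted values increases the sum of squared residuals, which is what "line of best fit" means here.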

31
Regression Notes
  • Size of b says nothing about the strength of the
    statistical relationship. If you change the units
    of X, you will change the size of b, but not r.
  • Although b = r (S_Y / S_X), you can have r = 1 but
    still find b to be meaningless.
  • A regression of Annual Wages on Number of
    Children for men earning $20,000 to $40,000
    reveals
  • Wage = 29600 + 178 Children
  • What does this mean?
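
The first two notes on this slide can be checked numerically. The sketch below uses hypothetical data and illustrative function names. It computes the slope through the identity b = r (S_Y / S_X) and then repeats the calculation with X measured in different units.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation from standardized deviations."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

def slope(xs, ys):
    """Least squares slope via b = r * (S_Y / S_X)."""
    return correlation(xs, ys) * stdev(ys) / stdev(xs)

# Hypothetical data: X in years, then the same X expressed in months.
x_years = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_months = [12.0 * x for x in x_years]

r_years, r_months = correlation(x_years, y), correlation(x_months, y)
b_years, b_months = slope(x_years, y), slope(x_months, y)
print(r_years, r_months)   # same correlation
print(b_years, b_months)   # slope shrinks by a factor of 12
```

The slope changes only because the units of X changed, so its size alone cannot measure the strength of the relationship.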

32
Regression: Potential Problems
  • Extrapolation
  • Previous example: What about men who earn more
    than $100,000 per year?
  • Time Series: ESPN's Sports Figures episode
    starring Marion Jones. Let the Y-variable
    represent Ms. Jones's fastest time in the 100m
    Dash. Let X = Years Since 1989, the year she
    started competing. What will happen 120 years
    after her first race?
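
The danger of extrapolation can be illustrated with a made-up fitted line. The intercept and slope below are hypothetical values chosen only for illustration; they are not estimates from the episode or from any real race data.

```python
# Hypothetical fitted line, invented for illustration only:
# predicted 100m time (seconds) = A_SECONDS + B_SECONDS_PER_YEAR * years since 1989
A_SECONDS = 11.0
B_SECONDS_PER_YEAR = -0.1

def predicted_time(years_since_1989):
    """Prediction from the hypothetical linear trend."""
    return A_SECONDS + B_SECONDS_PER_YEAR * years_since_1989

for years in (0, 10, 120):
    print(years, predicted_time(years))
```

Far outside the range of observed X values, the line predicts a negative race time, which is impossible. A linear fit says nothing reliable about X values it never saw.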

33
Regression: Potential Problems
  • Correlation does not imply causation.
  • Reverse Causality
  • Omitted Variables
  • Book Example: A study found that, over a
    three-year period, the proportion of smokers who
    died was less than the proportion of non-smokers
    who died. Is smoking good for you?