Discrete and Categorical Data

Slides: 60
Provided by: mprc8
Learn more at: https://www3.nd.edu

1
Discrete and Categorical Data
  • William N. Evans
  • Department of Economics
  • University of Maryland

2
Introduction
  • Workhorse statistical model in social sciences is
    the multivariate regression model
  • Ordinary least squares (OLS)
  • $y_i = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k + \varepsilon_i$
  • $y_i = x_i\beta + \varepsilon_i$

3
Linear model: $y_i = \alpha + \beta x_i + \varepsilon_i$
  • $\alpha$ and $\beta$ are population values that represent the
    true relationship between x and y
  • Unfortunately these values are unknown
  • The job of the researcher is to estimate these
    values
  • Notice that if we differentiate y with respect to
    x, we obtain
  • $dy/dx = \beta$

4
  • $\beta$ represents how much y will change for a fixed
    change in x
  • Increase in income for more education
  • Change in crime or bankruptcy when slots are
    legalized
  • Increase in test score if you study more

5
Put some concreteness on the problem
  • State of Maryland budget problems
  • Drop in revenues
  • Expensive k-12 school spending initiatives
  • Short-term solution: raise tax on cigarettes by
    34 cents/pack
  • Problem: a tax hike will reduce consumption of
    the taxable product
  • Question for the state: as taxes are raised, how
    much will cigarette consumption fall?

6
  • Simple model: $y_i = \alpha + \beta x_i + \varepsilon_i$
  • Suppose y is a state's per capita consumption of
    cigarettes
  • x represents taxes on cigarettes
  • Question: how much will y fall if x is increased
    by 34 cents/pack?
  • Problem: there are many reasons why people smoke; cost is
    but one of them

7
  • Data
  • (Y) State per capita cigarette consumption for
    the years 1980-1997
  • (X) tax (State + Federal) in real cents per pack
  • Scatter plot of the data
  • Negative covariance between variables
  • When $x > \bar{x}$, more likely that $y < \bar{y}$
  • When $x < \bar{x}$, more likely that $y > \bar{y}$
  • Goal: pick values of $\alpha$ and $\beta$ that best fit the
    data
  • Define best fit in a moment

8
Notation
  • True model
  • $y_i = \alpha + \beta x_i + \varepsilon_i$
  • We observe data points $(y_i, x_i)$
  • The parameters $\alpha$ and $\beta$ are unknown
  • The actual error ($\varepsilon_i$) is unknown
  • Estimated model
  • (a, b) are estimates for the parameters $(\alpha, \beta)$
  • $e_i$ is an estimate of $\varepsilon_i$ where
  • $e_i = y_i - a - b x_i$
  • How do you estimate a and b?

9
Objective: Minimize sum of squared errors
  • $\min_{a,b} \sum_i e_i^2 = \sum_i (y_i - a - b x_i)^2$
  • Minimize the sum of squared errors (SSE)
  • Treat positive and negative errors equally
  • Over- or under-predicting by 5 is the same
    magnitude of error
  • Quadratic form
  • The optimal values of a and b are those that make
    the first derivatives equal to zero
  • Functions reach min or max values when
    derivatives are zero
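
Setting the two first derivatives of the SSE to zero gives the familiar closed-form estimates b = S_xy / S_xx and a = ȳ - b·x̄. The following is a minimal Python sketch of that calculation. The function names and the five data points are invented for illustration; they are not the cigarette data from the talk.

```python
def ols_fit(x, y):
    """Return (a, b) that minimize sum((y_i - a - b*x_i)**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope from the first-order conditions
    a = y_bar - b * x_bar    # intercept from the first-order conditions
    return a, b


def sse(x, y, a, b):
    """Sum of squared errors for a candidate pair (a, b)."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


# Made-up numbers for illustration only (hypothetical tax, consumption pairs).
x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [130.0, 118.0, 112.0, 98.0, 92.0]

a, b = ols_fit(x, y)
residuals = [yi - a - b * xi for xi, yi in zip(x, y)]

# Both first derivatives of the SSE should be (numerically) zero at (a, b).
foc_a = sum(residuals)
foc_b = sum(xi * ei for xi, ei in zip(x, residuals))

print("a =", a, "b =", b)
print("first-order conditions:", foc_a, foc_b)
print("SSE at optimum:", sse(x, y, a, b))
```

Moving a or b away from the fitted values in either direction should raise the SSE, which is one quick way to check that the stationary point is a minimum rather than a maximum.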

12
  • The model has a lot of nice features
  • Statistical properties easy to establish
  • Optimal estimates easy to obtain
  • Parameter estimates are easy to interpret
  • Model maximizes prediction
  • If you minimize SSE you maximize $R^2$
  • The model does well as a first order
    approximation to lots of problems
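
The link between the SSE and the R-squared can be written out in one line. The symbol SST (total sum of squares) is notation introduced here, not taken from the slides; it depends only on the data, not on a or b.

```latex
\[
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}},
\qquad
\mathrm{SSE} = \sum_i e_i^2 = \sum_i (y_i - a - b x_i)^2,
\qquad
\mathrm{SST} = \sum_i (y_i - \bar{y})^2 .
\]
```

Because SST is fixed once the sample is given, the pair (a, b) that makes SSE as small as possible also makes the R-squared as large as possible.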

13
Discrete and Qualitative Data
  • The OLS model works well when y is a continuous
    variable
  • Income, wages, test scores, weight, GDP
  • Does not have as many nice properties when y is
    not continuous
  • Example: doctor visits
  • Integer values
  • Low counts for most people
  • Mass of observations at zero

14
Downside of forcing non-standard outcomes into
OLS world?
  • Can predict outside the allowable range
  • e.g., negative MD visits
  • Does not describe the data generating process
    well
  • e.g., mass of observations at zero
  • Violates many assumptions of OLS
  • e.g., constant error variance (the errors are heteroskedastic)
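
The first downside is easy to see with a toy example. The sketch below fits a straight line to a handful of count outcomes with a mass at zero and lists the fitted values that fall below zero. Both variables are hypothetical numbers made up for this illustration.

```python
# Hypothetical toy data, invented for illustration only.
health_index = [0, 1, 2, 3, 4, 5]   # made-up covariate x
visits = [0, 0, 0, 1, 3, 6]         # made-up doctor-visit counts y (mass at zero)

n = len(visits)
x_bar = sum(health_index) / n
y_bar = sum(visits) / n

# Closed-form OLS slope and intercept.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(health_index, visits))
s_xx = sum((x - x_bar) ** 2 for x in health_index)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Fitted values from the linear model; counts can never be negative,
# but nothing stops the fitted line from going below zero.
fitted = [a + b * x for x in health_index]
negative = [(x, round(f, 2)) for x, f in zip(health_index, fitted) if f < 0]

print("intercept:", round(a, 3), "slope:", round(b, 3))
print("fitted values below zero (x, prediction):", negative)
```

Every observed count is zero or more, yet the line predicts a negative number of visits at the low end of x, which is the "outside the allowable range" problem in miniature.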

15
This talk
  • Look at situations when the data generating
    process does not lend itself well to OLS models
  • Mathematically describe the data generating
    process
  • Show how we use a different optimization procedure
    to obtain estimates
  • Describe the statistical properties

16
  • Show how to interpret parameters
  • Illustrate how to estimate the models with
    the popular program STATA

17
Types of data generating processes we will
consider
  • Dichotomous events (yes or no)
  • 1 = yes, 0 = no
  • Graduate high school? Work? Are obese? Smoke?
  • Ordinal data
  • Self-reported health (poor, fair, good, excellent)
  • Strongly disagree, disagree, agree, strongly
    agree

18
  • Count data
  • Doctor visits, lost workdays, fatality counts
  • Duration data
  • Time to failure, time to death, time to
    re-employment

19
Econometric Resources
  • Recommended textbook
  • Jeffrey Wooldridge, undergraduate and graduate texts
  • Lots of insight and mathematical/statistical
    detail
  • Very good examples
  • Helpful web sites
  • My graduate class
  • Jeff Smith's class

20
STATA
  • Very fast, convenient, well-documented, cheap and
    flexible statistical package
  • Excellent for cross-section/panel data projects,
    not as great for time series
  • Not as easy to manipulate large data sets from
    flat files as SAS
  • I usually clean data in SAS, estimate models in
    STATA

21
STATA Resources - Specific
  • Regression Models for Categorical Dependent
    Variables Using STATA
  • J. Scott Long and Jeremy Freese
  • Available for sale from STATA website for $52
    (www.stata.com)
  • Post-estimation subroutines that translate
    results
  • Do not need to buy the book to use the subroutines

22
  • In the STATA command line, type
  • net search spost
  • Will give you a list of available programs to
    download
  • One is
  • Spostado from http://www.indiana.edu/~jslsoc/stata
  • Click on the link and install the files

23
Continuous Distributions
  • Random variables with infinite number of possible
    values
  • Examples -- units of measure (time, weight,
    distance)
  • Many discrete outcomes can be treated as
    continuous, e.g., SAT scores

24
How to describe a continuous random variable
  • The Probability Density Function (PDF)
  • The PDF for a random variable x is defined as
    f(x), where
  • $f(x) \ge 0$
  • $\int f(x)\,dx = 1$
  • Calculus review: the integral of a function
    gives the area under the curve

26
Cumulative Distribution Function (CDF)
  • Suppose x is a measure like distance or time
  • $0 \le x \le 4$
  • We may be interested in $\Pr(x \le a)$

27
CDF
  • What if we consider all values?

28
Properties of CDF
  • Note that $\Pr(x \le b) + \Pr(x > b) = 1$
  • $\Pr(x > b) = 1 - \Pr(x \le b)$
  • Many times, it is easier to work with complements

29
General notation for continuous distributions
  • The PDF is described by lower case such as f(x)
  • The CDF is defined as upper case such as F(a)
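
In this notation the two functions are tied together by integration and differentiation. The block below states the standard relationships for a continuous random variable x; the use of negative infinity as the lower limit is the general case and would be replaced by the lower bound of the support when x is bounded.

```latex
\[
F(a) = \Pr(x \le a) = \int_{-\infty}^{a} f(x)\,dx,
\qquad
f(x) = \frac{dF(x)}{dx},
\qquad
\Pr(a < x \le b) = F(b) - F(a).
\]
```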

30
Standard Normal Distribution
  • Most frequently used continuous distribution
  • Symmetric bell-shaped distribution
  • As we will show, the normal has useful properties
  • Many variables we observe in the real world look
    normally distributed.
  • Can translate normal into standard normal

31
Examples of variables that look normally
distributed
  • IQ scores
  • SAT scores
  • Heights of females
  • Log income
  • Average gestation (weeks of pregnancy)
  • As we will show in a few weeks sample means are
    normally distributed!!!

32
Standard Normal Distribution
  • PDF: $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$
  • For $-\infty < z < \infty$

33
Notation
  • $\phi(z)$ is the standard normal PDF evaluated at z
  • $\Phi(a) = \Pr(z \le a)$

35
Standard Normal
  • Notice that
  • Normal is symmetric: $\phi(a) = \phi(-a)$
  • Normal is unimodal
  • Median = mean
  • Area under curve = 1
  • Almost all area is between (-3, 3)
  • Evaluations of the CDF are done with
  • Statistical functions (Excel, SAS, etc.)
  • Tables

36
Standard Normal CDF
  • $\Pr(z \le -0.98) = \Phi(-0.98) = 0.1635$

38
  • $\Pr(z \le 1.41) = \Phi(1.41) = 0.9207$

40
  • $\Pr(z > 1.17) = 1 - \Pr(z \le 1.17) = 1 - \Phi(1.17)$
  • $= 1 - 0.8790 = 0.1210$

42
  • $\Pr(0.1 \le z \le 1.9)$
  • $= \Pr(z \le 1.9) - \Pr(z \le 0.1)$
  • $= \Phi(1.9) - \Phi(0.1) = 0.9713 - 0.5398$
  • $= 0.4315$
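
The table lookups on the last few slides (at -0.98, 1.41, 1.17, and the interval from 0.1 to 1.9) can also be checked with a statistical function instead of a printed table. This is a minimal Python sketch that builds the standard normal CDF from the error function in the standard library; the helper name `std_normal_cdf` is made up for this example.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, Pr(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# The four calculations worked by hand on the slides.
p_below_neg = std_normal_cdf(-0.98)                     # Pr(z <= -0.98)
p_below_pos = std_normal_cdf(1.41)                      # Pr(z <= 1.41)
p_above = 1.0 - std_normal_cdf(1.17)                    # Pr(z > 1.17), by complement
p_between = std_normal_cdf(1.9) - std_normal_cdf(0.1)   # Pr(0.1 <= z <= 1.9)

for label, p in [
    ("Pr(z <= -0.98)", p_below_neg),
    ("Pr(z <= 1.41)", p_below_pos),
    ("Pr(z > 1.17)", p_above),
    ("Pr(0.1 <= z <= 1.9)", p_between),
]:
    print(f"{label} = {p:.4f}")
```

The printed values should agree with the table values on the slides to about four decimal places, up to rounding in the printed table.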

46
Important Properties of Normal Distribution
  • $\Pr(z \le A) = \Phi(A)$
  • $\Pr(z > A) = 1 - \Phi(A)$
  • $\Pr(z \le -A) = \Phi(-A)$
  • $\Pr(z > -A) = 1 - \Phi(-A) = \Phi(A)$

47
Maximum likelihood estimation
  • Observe n independent outcomes, all drawn from
    the same distribution
  • $(y_1, y_2, y_3, \dots, y_n)$
  • $y_i$ is drawn from $f(y_i; \theta)$ where $\theta$ is an unknown
    parameter for the distribution f
  • Recall the definition of independence. If a and b are
    independent, Pr(a and b) = Pr(a)Pr(b)

48
  • Because all the draws are independent, the
    probability these particular n values of Y would
    be drawn at random is called the likelihood
    function and it equals
  • $L = \Pr(y_1)\Pr(y_2)\cdots\Pr(y_n)$
  • $L = f(y_1; \theta) f(y_2; \theta) \cdots f(y_n; \theta)$

49
  • MLE: pick the value of $\theta$ that makes it most likely
    that these n values of y would have been
    generated randomly
  • To maximize L, maximize a monotonic function of L
  • Recall $\ln(abcd) = \ln(a) + \ln(b) + \ln(c) + \ln(d)$

50
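  • Max $\mathcal{L} = \ln(L) = \ln f(y_1; \theta) + \ln f(y_2; \theta) + \dots + \ln f(y_n; \theta) = \sum_i \ln f(y_i; \theta)$
  • Pick $\theta$ so that $\mathcal{L}$ is maximized
  • $d\mathcal{L}/d\theta = 0$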

51
[Figure: $\mathcal{L}$ plotted against $\theta$, with the values $\theta_1$ and $\theta_2$ marked]

52
Example: Poisson
  • Suppose y measures counts such as doctor
    visits.
  • $y_i$ is drawn from a Poisson distribution
  • $f(y_i; \lambda) = e^{-\lambda} \lambda^{y_i} / y_i!$ for $\lambda > 0$
  • $E[y_i] = \mathrm{Var}[y_i] = \lambda$

53
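  • Given n observations, $(y_1, y_2, y_3, \dots, y_n)$
  • Pick the value of $\lambda$ that maximizes $\mathcal{L}$
  • Max $\mathcal{L} = \sum_i \ln f(y_i; \lambda) = \sum_i \ln[e^{-\lambda} \lambda^{y_i} / y_i!]$
  • $= \sum_i [-\lambda + y_i \ln(\lambda) - \ln(y_i!)]$
  • $= -n\lambda + \ln(\lambda) \sum_i y_i - \sum_i \ln(y_i!)$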

54
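  • $\mathcal{L} = -n\lambda + \ln(\lambda) \sum_i y_i - \sum_i \ln(y_i!)$
  • $d\mathcal{L}/d\lambda = -n + (1/\lambda) \sum_i y_i = 0$
  • Solve for $\lambda$
  • $\hat{\lambda} = \sum_i y_i / n = \bar{y}$, the sample mean of y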
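
The closed-form answer can be checked numerically. The sketch below evaluates the Poisson log likelihood on a grid of candidate values and compares the grid maximizer with the sample mean. The ten counts are hypothetical, and the function name and grid spacing are choices made for this illustration.

```python
import math


def poisson_loglik(lam, ys):
    """Log likelihood: -n*lam + ln(lam)*sum(y) - sum(ln(y!))."""
    n = len(ys)
    # math.lgamma(y + 1) equals ln(y!) for a non-negative integer y.
    return -n * lam + math.log(lam) * sum(ys) - sum(math.lgamma(y + 1) for y in ys)


# Hypothetical doctor-visit counts, invented for illustration only.
ys = [0, 0, 1, 0, 2, 0, 3, 1, 0, 1]

# Closed-form MLE from the first-order condition: the sample mean.
lam_hat = sum(ys) / len(ys)

# Brute-force check: evaluate the log likelihood at 0.01, 0.02, ..., 5.00.
grid = [k / 100 for k in range(1, 501)]
lam_grid = max(grid, key=lambda lam: poisson_loglik(lam, ys))

print("sample mean:", lam_hat)
print("grid maximizer:", lam_grid)
```

A grid search like this only works when there is a single parameter with a known plausible range; it is shown here purely to confirm that the sample mean sits at the top of the log likelihood.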

55
  • In most cases, however, we cannot find a closed-form
    solution for the parameter in $\ln f(y_i; \theta)$
  • Must search over all possible solutions
  • How does the search work?
  • Start with a candidate value of $\theta$.
  • Calculate $d\mathcal{L}/d\theta$

56
  • If $d\mathcal{L}/d\theta > 0$, increasing $\theta$ will increase $\mathcal{L}$, so we
    increase $\theta$ some
  • If $d\mathcal{L}/d\theta < 0$, decreasing $\theta$ will increase $\mathcal{L}$, so we
    decrease $\theta$ some
  • Keep changing $\theta$ until $d\mathcal{L}/d\theta = 0$
  • How far you step when you change $\theta$ is
    determined by a number of different factors
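
The search described above can be sketched in a few lines of Python, using the Poisson log likelihood as the example because its derivative is simple. The fixed step size, starting value, and tolerance are assumptions made for this illustration, and the counts are hypothetical. Statistical packages typically choose the step from the curvature of the log likelihood (for example, Newton-type methods) rather than using a constant.

```python
def score(lam, ys):
    """Derivative of the Poisson log likelihood: -n + sum(y) / lam."""
    return -len(ys) + sum(ys) / lam


def search_for_mle(ys, start=2.0, step=0.01, tol=1e-8, max_iter=10_000):
    """Move lam in the direction of the derivative until it is (nearly) zero."""
    lam = start
    for iteration in range(max_iter):
        slope = score(lam, ys)
        if abs(slope) < tol:
            break
        # Positive slope: increase lam. Negative slope: decrease lam.
        lam += step * slope
    return lam, iteration


# Hypothetical doctor-visit counts, invented for illustration only.
ys = [0, 0, 1, 0, 2, 0, 3, 1, 0, 1]

lam_search, n_iter = search_for_mle(ys)
print("search result:", lam_search, "after", n_iter, "iterations")
print("sample mean:  ", sum(ys) / len(ys))
```

With a step that is too large the update can overshoot and bounce around the maximum, and with one that is too small the search takes many iterations, which is one reason the step length matters in practice.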

57
[Figure: $\mathcal{L}$ plotted against $\theta$; at $\theta_1$, $d\mathcal{L}/d\theta > 0$]

58
[Figure: $\mathcal{L}$ plotted against $\theta$; at $\theta_3$, $d\mathcal{L}/d\theta < 0$]

59
Properties of MLE estimates
  • Sometimes called efficient estimation. In large samples,
    no other consistent estimator has a smaller variance than
    the one obtained by MLE
  • Parameter estimates are approximately normally
    distributed when sample sizes are large
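
Both properties can be illustrated with a small simulation for the Poisson case, where the MLE is the sample mean and its large-sample variance is λ/n. The sketch below is only an illustration under stated assumptions: the true λ, the sample size, the number of replications, and the seed are arbitrary choices, and the Poisson sampler is a simple textbook method (Knuth's multiplication method) written out so the example needs only the standard library.

```python
import math
import random


def draw_poisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


rng = random.Random(12345)            # fixed seed so the run is repeatable
true_lam, n, reps = 3.0, 100, 1000    # arbitrary illustrative settings

# Each replication: draw n counts and compute the MLE (the sample mean).
estimates = []
for _ in range(reps):
    sample = [draw_poisson(true_lam, rng) for _ in range(n)]
    estimates.append(sum(sample) / n)

mean_est = sum(estimates) / reps
var_est = sum((e - mean_est) ** 2 for e in estimates) / reps
theory_var = true_lam / n             # large-sample variance of the MLE

# Under approximate normality, about 95% of estimates should fall
# within 1.96 standard errors of the true value.
se = math.sqrt(theory_var)
share_within = sum(abs(e - true_lam) <= 1.96 * se for e in estimates) / reps

print("mean of estimates:", round(mean_est, 4))
print("variance of estimates:", round(var_est, 5), "theory:", theory_var)
print("share within 1.96 SE:", round(share_within, 3))
```

The simulated variance should land close to λ/n, and the share of estimates within 1.96 standard errors should be near 0.95, which is what the normal approximation predicts.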