Discrete and Categorical Data

About This Presentation

Slides: 60

Provided by: mprc8

Learn more at: https://www3.nd.edu


Transcript and Presenter's Notes

Discrete and Categorical Data

William N. Evans
Department of Economics
University of Maryland

Introduction

The workhorse statistical model in the social sciences is
the multivariate regression model
Ordinary least squares (OLS)
$y_i = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k + \varepsilon_i$
$y_i = x_i\beta + \varepsilon_i$

Linear model: $y_i = \alpha + \beta x_i + \varepsilon_i$

α and β are population values that represent the
true relationship between x and y
Unfortunately, these values are unknown
The job of the researcher is to estimate these
values
Notice that if we differentiate y with respect to
x, we obtain
$dy/dx = \beta$

β represents how much y will change for a fixed
change in x
Increase in income from more education
Change in crime or bankruptcy when slot machines are
legalized
Increase in test score if you study more

Put some concreteness on the problem

State of Maryland budget problems
Drop in revenues
Expensive K-12 school spending initiatives
Short-term solution: raise the tax on cigarettes by
34 cents/pack
Problem: a tax hike will reduce consumption of the
taxed product
Question for the state: as taxes are raised, how
much will cigarette consumption fall?

Simple model: $y_i = \alpha + \beta x_i + \varepsilon_i$
Suppose y is a state's per capita consumption of
cigarettes
x represents taxes on cigarettes
Question: how much will y fall if x is increased
by 34 cents/pack?
Problem: there are many reasons why people smoke; cost is
but one of them

Data
(Y) State per capita cigarette consumption for
the years 1980-1997
(X) Tax (state + federal) in real cents per pack
Scatter plot of the data
Negative covariance between the variables
When $x > \bar{x}$, it is more likely that $y < \bar{y}$
When $x < \bar{x}$, it is more likely that $y > \bar{y}$
Goal: pick values of α and β that best fit the
data
We define "best fit" in a moment

Notation

True model
$y_i = \alpha + \beta x_i + \varepsilon_i$
We observe data points $(y_i, x_i)$
The parameters α and β are unknown
The actual error ($\varepsilon_i$) is unknown
Estimated model
(a, b) are estimates of the parameters (α, β)
$e_i$ is an estimate of $\varepsilon_i$, where
$e_i = y_i - a - bx_i$
How do you estimate a and b?

Objective: minimize the sum of squared errors

$\min_{a,b} \sum_i e_i^2 = \sum_i (y_i - a - bx_i)^2$
Minimize the sum of squared errors (SSE)
Treat positive and negative errors equally
Over- or under-predicting by 5 is the same
magnitude of error
Quadratic form: SSE is a quadratic function of a and b
The optimal values of a and b are those that set
the first derivatives equal to zero
Functions reach their min or max values where their
derivatives are zero
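Setting the two first derivatives of SSE to zero gives the familiar closed-form estimates. The derivation below is the standard textbook result, added here for reference rather than transcribed from the slides; $\bar{x}$ and $\bar{y}$ denote sample means, and it assumes the $x_i$ are not all equal.

```latex
\frac{\partial SSE}{\partial a} = -2\sum_i (y_i - a - b x_i) = 0
  \;\Rightarrow\; a = \bar{y} - b\bar{x}

\frac{\partial SSE}{\partial b} = -2\sum_i x_i (y_i - a - b x_i) = 0
  \;\Rightarrow\; b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
```

The second result follows by substituting $a = \bar{y} - b\bar{x}$ into the second condition and using $\sum_i \bar{x}(y_i - \bar{y}) = 0$.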


The model has a lot of nice features
Statistical properties easy to establish
Optimal estimates easy to obtain
Parameter estimates are easy to interpret
Model maximizes in-sample predictive fit
If you minimize SSE, you maximize $R^2 = 1 - SSE/SST$,
because the total sum of squares (SST) does not depend on a or b
The model does well as a first-order
approximation to lots of problems
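As a rough check of these claims, the sketch below fits a and b with the closed-form OLS formulas, computes $R^2 = 1 - SSE/SST$, and confirms that nudging either coefficient raises SSE and lowers $R^2$. The data are made-up numbers chosen only for illustration (they are not the Maryland cigarette data), and the helper names `ols_fit`, `sse`, and `r_squared` are hypothetical.

```python
# Hypothetical illustrative data (NOT the Maryland cigarette data from the slides)
xs = [20.0, 35.0, 40.0, 55.0, 60.0, 75.0]        # e.g. tax in cents per pack
ys = [130.0, 121.0, 118.0, 104.0, 103.0, 92.0]   # e.g. packs per capita


def ols_fit(xs, ys):
    """Closed-form OLS estimates (a, b) for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared residuals e_i = y_i - a - b*x_i."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


def r_squared(a, b, xs, ys):
    """R^2 = 1 - SSE/SST; SST depends only on the data, not on (a, b)."""
    y_bar = sum(ys) / len(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - sse(a, b, xs, ys) / sst


a, b = ols_fit(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r_squared(a, b, xs, ys):.4f}")

# Any other (a, b) pair has a larger SSE, hence a smaller R^2
for da, db in [(1.0, 0.0), (0.0, 0.05), (-2.0, 0.02)]:
    assert sse(a + da, b + db, xs, ys) > sse(a, b, xs, ys)
    assert r_squared(a + da, b + db, xs, ys) < r_squared(a, b, xs, ys)
```

Because SST is fixed by the data, any change that lowers SSE must raise $R^2$, which is why the two assertions always move together.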

Discrete and Qualitative Data

The OLS model works well when y is a continuous
variable
Income, wages, test scores, weight, GDP
It does not have as many nice properties when y is
not continuous
Example: doctor visits
Integer values
Low counts for most people
Mass of observations at zero

Downside of forcing non-standard outcomes into
OLS world?

Can predict outside the allowable range
e.g., negative MD visits
Does not describe the data generating process
well
e.g., mass of observations at zero
Violates many of the assumptions behind OLS
e.g., errors are heteroskedastic

This talk

Look at situations when the data generating
process does not lend itself well to OLS models
Mathematically describe the data generating
process
Show how we use different optimization procedures
to obtain estimates
Describe the statistical properties

Show how to interpret parameters
Illustrate how to estimate the models with the
popular program STATA

Types of data generating processes we will
consider

Dichotomous events (yes or no)
1 = yes, 0 = no
Graduate high school? Work? Obese? Smoke?
Ordinal data
Self-reported health (poor, fair, good, excellent)
Strongly disagree, disagree, agree, strongly
agree

Count data
Doctor visits, lost workdays, fatality counts
Duration data
Time to failure, time to death, time to
re-employment

Econometric Resources

Recommended textbook
Jeffrey Wooldridge, undergraduate and graduate texts
Lots of insight and mathematical/statistical
detail
Very good examples
Helpful web sites
My graduate class
Jeff Smith's class

STATA

Very fast, convenient, well-documented, cheap and
flexible statistical package
Excellent for cross-section/panel data projects,
not as great for time series
Manipulating large data sets from flat files is
not as easy as in SAS
I usually clean data in SAS, estimate models in
STATA

STATA Resources - Specific

Regression Models for Categorical Dependent
Variables Using STATA
J. Scott Long and Jeremy Freese
Available for sale from the STATA website for $52
(www.stata.com)
Post-estimation subroutines that translate
results
You do not need to buy the book to use the subroutines

In the STATA command line, type
net search spost
This will give you a list of available programs to
download
One is
Spostado from http://www.indiana.edu/~jslsoc/stata
Click on the link and install the files

Continuous Distributions

Random variables with an infinite number of possible
values
Examples: units of measure (time, weight,
distance)
Many discrete outcomes can be treated as
continuous, e.g., SAT scores

How to describe a continuous random variable

The Probability Density Function (PDF)
The PDF for a random variable x is defined as
f(x), where
$f(x) \ge 0$
$\int f(x)\,dx = 1$
Calculus review: the integral of a function
gives the area under the curve
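To make the two PDF conditions concrete, here is a small numerical check using a hypothetical density, $f(x) = x/8$ on $0 \le x \le 4$ (chosen only for illustration; it is not from the slides). The midpoint rule approximates the integral as a sum of thin rectangles, which is the "area under the curve" idea in discrete form.

```python
def f(x):
    """Hypothetical PDF: f(x) = x/8 on [0, 4], zero elsewhere."""
    return x / 8.0 if 0.0 <= x <= 4.0 else 0.0


def integrate(func, lo, hi, n=100_000):
    """Midpoint-rule approximation of the area under func on [lo, hi]."""
    width = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * width) for i in range(n)) * width


# Condition 1: the density is never negative (spot-check a few points)
assert all(f(x) >= 0.0 for x in (-1.0, 0.0, 2.0, 4.0, 5.0))

# Condition 2: the total area under the density is 1
total_area = integrate(f, 0.0, 4.0)
print(total_area)
```

Since f is linear on [0, 4], the midpoint rule is exact here apart from floating-point rounding.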

Cumulative Distribution Function (CDF)

Suppose x is a measure like distance or time
$0 \le x \le 4$
We may be interested in $\Pr(x \le a)$

CDF

What if we consider all values of a?
$F(a) = \Pr(x \le a) = \int_{-\infty}^{a} f(x)\,dx$

Properties of CDF

Note that $\Pr(x \le b) + \Pr(x > b) = 1$
So $\Pr(x > b) = 1 - \Pr(x \le b)$
Many times, it is easier to work with complements
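Continuing with a hypothetical density $f(x) = x/8$ on $[0, 4]$ (an assumption for illustration only), integrating gives the CDF $F(a) = a^2/16$. The sketch computes $\Pr(x \le 3)$ from the CDF, gets $\Pr(x > 3)$ from the complement rule, and checks the CDF against the area under the PDF.

```python
def pdf(x):
    """Hypothetical PDF: f(x) = x/8 on [0, 4], zero elsewhere."""
    return x / 8.0 if 0.0 <= x <= 4.0 else 0.0


def cdf(a):
    """CDF of the hypothetical density: F(a) = a^2/16 on [0, 4]."""
    if a <= 0.0:
        return 0.0
    if a >= 4.0:
        return 1.0
    return a * a / 16.0


def integrate(func, lo, hi, n=100_000):
    """Midpoint-rule approximation of the area under func on [lo, hi]."""
    width = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * width) for i in range(n)) * width


b = 3.0
pr_le_b = cdf(b)          # Pr(x <= 3) = 9/16 = 0.5625
pr_gt_b = 1.0 - cdf(b)    # complement rule: Pr(x > 3) = 7/16 = 0.4375

# The CDF at b equals the area under the PDF from 0 up to b
assert abs(integrate(pdf, 0.0, b) - pr_le_b) < 1e-9
print(pr_le_b, pr_gt_b)
```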

General notation for continuous distributions

The PDF is denoted by a lowercase letter, such as f(x)
The CDF is denoted by an uppercase letter, such as F(a)

Standard Normal Distribution

Most frequently used continuous distribution
Symmetric bell-shaped distribution
As we will show, the normal has useful properties
Many variables we observe in the real world look
normally distributed.
Any normal variable x with mean μ and standard deviation σ can be translated into a standard normal: $z = (x - \mu)/\sigma$

Examples of variables that look normally
distributed

IQ scores
SAT scores
Heights of females
Log income
Average gestation (weeks of pregnancy)
As we will show in a few weeks, sample means are
approximately normally distributed in large samples!

Standard Normal Distribution

PDF: $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$
for $-\infty < z < \infty$

Notation

$\phi(z)$ is the standard normal PDF evaluated at z
$\Phi(a) = \Pr(z \le a)$ is the standard normal CDF evaluated at a
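In code, φ can be evaluated directly from the standard normal density formula. Φ has no elementary closed form, but it can be written with the error function, $\Phi(a) = \tfrac{1}{2}[1 + \mathrm{erf}(a/\sqrt{2})]$. A minimal Python sketch using the standard library's `math.erf` (the function names `phi` and `Phi` are our own choice):

```python
import math


def phi(z: float) -> float:
    """Standard normal PDF: exp(-z^2/2) / sqrt(2*pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(a: float) -> float:
    """Standard normal CDF Pr(z <= a), written via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


print(phi(0.0))   # peak of the density: 1/sqrt(2*pi), about 0.3989
print(Phi(0.0))   # half of the area lies below zero
```

This is one convenient way to evaluate the CDF in place of a printed table; Excel, SAS, and other packages provide their own built-in functions.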

Standard Normal

Notice that
The normal is symmetric: $\phi(a) = \phi(-a)$
The normal is unimodal
Median = mean
Area under the curve = 1
Almost all of the area (about 99.7%) lies between -3 and 3
Evaluations of the CDF are done with
Statistical functions (Excel, SAS, etc.)
Tables
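The listed properties can be checked numerically. This sketch uses an erf-based Φ (an implementation choice, not something the slides specify) and a simple midpoint sum for the total area; the integration range [-8, 8] is an assumption that ignores a negligible tail.

```python
import math


def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(a):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


# Symmetry: the density takes the same value at a and -a
for a in (0.5, 1.0, 2.3):
    assert phi(a) == phi(-a)

# Median = mean = 0: exactly half of the area lies below zero
assert Phi(0.0) == 0.5

# Total area under the curve, approximated by a midpoint sum over [-8, 8]
width = 0.001
area = sum(phi(-8.0 + (i + 0.5) * width) for i in range(16_000)) * width

# Share of the area between -3 and 3
within_3 = Phi(3.0) - Phi(-3.0)
print(round(area, 6), round(within_3, 4))
```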

Standard Normal CDF

$\Pr(z \le -0.98) = \Phi(-0.98) = 0.1635$


$\Pr(z \le 1.41) = \Phi(1.41) = 0.9207$


$\Pr(z > 1.17) = 1 - \Pr(z \le 1.17) = 1 - \Phi(1.17)$
$= 1 - 0.8790 = 0.1210$


$\Pr(0.1 \le z \le 1.9)$
$= \Pr(z \le 1.9) - \Pr(z \le 0.1)$
$= \Phi(1.9) - \Phi(0.1) = 0.9713 - 0.5398$
$= 0.4315$
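The four worked examples can be reproduced without tables. This sketch assumes an erf-based Φ; the printed values should match the table values above to about four decimal places.

```python
import math


def Phi(a):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


examples = {
    "Pr(z <= -0.98)": Phi(-0.98),
    "Pr(z <= 1.41)": Phi(1.41),
    "Pr(z > 1.17)": 1.0 - Phi(1.17),              # complement rule
    "Pr(0.1 <= z <= 1.9)": Phi(1.9) - Phi(0.1),   # difference of two CDF values
}
for label, prob in examples.items():
    print(f"{label} = {prob:.4f}")
```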

Important Properties of Normal Distribution

$\Pr(z \le A) = \Phi(A)$
$\Pr(z > A) = 1 - \Phi(A)$
$\Pr(z \le -A) = \Phi(-A)$
$\Pr(z > -A) = 1 - \Phi(-A) = \Phi(A)$
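A quick numerical check of these identities for a few arbitrary values of A (chosen only for illustration), again using an erf-based Φ:

```python
import math


def Phi(a):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


for A in (0.25, 1.0, 1.645, 2.5):
    # Pr(z > A) is the complement of Pr(z <= A)
    upper_tail = 1.0 - Phi(A)
    # Pr(z > -A) = 1 - Phi(-A) equals Phi(A) by symmetry
    assert math.isclose(1.0 - Phi(-A), Phi(A), rel_tol=1e-12)
    # Symmetry again: the area below -A equals the area above A
    assert math.isclose(Phi(-A), upper_tail, rel_tol=1e-9)
    print(f"A = {A}: Phi(A) = {Phi(A):.4f}, Pr(z > A) = {upper_tail:.4f}")
```

The last identity is the useful one in practice: a table that lists only Φ for positive arguments is enough to evaluate probabilities for negative ones.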

Maximum likelihood estimation

Observe n independent outcomes, all drawn from
the same distribution:
$(y_1, y_2, y_3, \dots, y_n)$
$y_i$ is drawn from $f(y_i; \theta)$, where θ is an unknown
parameter of the distribution f
Recall the definition of independence: if a and b are
independent, Pr(a and b) = Pr(a)Pr(b)

Because all the draws are independent, the
probability that these particular n values of y would
be drawn at random is called the likelihood
function, and it equals
$L = \Pr(y_1)\Pr(y_2)\cdots\Pr(y_n)$
$L = f(y_1; \theta) f(y_2; \theta) \cdots f(y_n; \theta)$

MLE: pick the value of θ that makes it most likely
that these n values of y would have been
generated at random
To maximize L, we can maximize any monotonic (increasing) function of L, such as ln(L)
Recall $\ln(abcd) = \ln(a) + \ln(b) + \ln(c) + \ln(d)$

Max $\ln L = \ln f(y_1; \theta) + \ln f(y_2; \theta)$
$+ \dots + \ln f(y_n; \theta) = \sum_i \ln f(y_i; \theta)$
Pick θ so that ln L is maximized:
$d \ln L / d\theta = 0$
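To see why maximizing ln L is equivalent to maximizing L, consider a hypothetical yes/no (Bernoulli) sample, where $f(y; p) = p^y (1-p)^{1-y}$. The sketch evaluates both L and ln L on a grid of candidate values of p and checks that they pick the same one. The data and the grid are illustrative assumptions.

```python
import math

# Hypothetical yes/no outcomes (1 = yes, 0 = no); illustrative only
ys = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]


def likelihood(p, ys):
    """L = product of Bernoulli probabilities f(y; p) = p^y (1 - p)^(1 - y)."""
    result = 1.0
    for y in ys:
        result *= p if y == 1 else 1.0 - p
    return result


def log_likelihood(p, ys):
    """ln L = sum of ln f(y_i; p): the log turns the product into a sum."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in ys)


grid = [k / 100 for k in range(1, 100)]  # candidate values of p in (0, 1)
best_by_L = max(grid, key=lambda p: likelihood(p, ys))
best_by_lnL = max(grid, key=lambda p: log_likelihood(p, ys))
print(best_by_L, best_by_lnL)
```

With many observations the product L can underflow to zero in floating point, while the sum of logs stays well behaved, which is a practical reason to work with ln L.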

(Figure: L plotted against θ, with values θ1 and θ2 marked on the θ axis)
Example: Poisson

Suppose y measures counts, such as doctor
visits
$y_i$ is drawn from a Poisson distribution
$f(y_i; \lambda) = e^{-\lambda} \lambda^{y_i} / y_i!$ for $\lambda > 0$
$E[y_i] = \mathrm{Var}[y_i] = \lambda$

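Given n observations $(y_1, y_2, y_3, \dots, y_n)$
Pick the value of λ that maximizes ln L
Max $\ln L = \sum_i \ln f(y_i; \lambda) = \sum_i \ln\left[e^{-\lambda}\lambda^{y_i}/y_i!\right]$
$= \sum_i \left[-\lambda + y_i \ln(\lambda) - \ln(y_i!)\right]$
$= -n\lambda + \ln(\lambda)\sum_i y_i - \sum_i \ln(y_i!)$

$\ln L = -n\lambda + \ln(\lambda)\sum_i y_i - \sum_i \ln(y_i!)$
$d \ln L / d\lambda = -n + (1/\lambda)\sum_i y_i = 0$
Solve for λ:
$\hat{\lambda} = \sum_i y_i / n = \bar{y}$, the sample mean of y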
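A small numerical check of this result: using hypothetical doctor-visit counts (made up for illustration), the log likelihood evaluated at the sample mean is higher than at nearby and distant candidate values of λ. The sketch uses `math.lgamma(y + 1)` for $\ln(y!)$.

```python
import math

# Hypothetical doctor-visit counts (illustrative only, not real survey data)
ys = [0, 1, 1, 2, 3, 0, 4, 2]


def poisson_loglik(lam, ys):
    """ln L = -n*lam + ln(lam)*sum(y) - sum(ln(y!)), with lgamma(y+1) = ln(y!)."""
    n = len(ys)
    return -n * lam + math.log(lam) * sum(ys) - sum(math.lgamma(y + 1) for y in ys)


lam_hat = sum(ys) / len(ys)   # closed-form MLE: the sample mean
print(lam_hat)                # 13 / 8 = 1.625 for these counts

# Every other candidate value gives a lower log likelihood
for lam in (0.5, 1.0, lam_hat - 0.01, lam_hat + 0.01, 3.0):
    assert poisson_loglik(lam, ys) < poisson_loglik(lam_hat, ys)
```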

In most cases, however, we cannot find a closed-form
solution for the parameter that maximizes $\sum_i \ln f(y_i; \theta)$
We must search numerically over possible values
How does the search work?
Start with a candidate value of θ
Calculate $d \ln L / d\theta$

If $d \ln L / d\theta > 0$, increasing θ will increase ln L, so we
increase θ some
If $d \ln L / d\theta < 0$, decreasing θ will increase ln L, so we
decrease θ some
Keep changing θ until $d \ln L / d\theta = 0$
How far you step when you change θ is
determined by a number of different factors
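A minimal sketch of this search, applied to the Poisson example so the answer can be compared with the closed-form sample mean. The step rule here is an assumption chosen for simplicity: move in the direction the derivative points, and halve the step each time the sign of the derivative flips (meaning the search stepped past the peak). Statistical packages generally use more sophisticated derivative-based updates, so treat this only as an illustration of the idea. The data and the names `score` and `search_mle` are hypothetical.

```python
# Hypothetical doctor-visit counts (illustrative only)
ys = [0, 1, 1, 2, 3, 0, 4, 2]


def score(lam, ys):
    """d lnL / d lam for the Poisson model: -n + sum(y) / lam."""
    return -len(ys) + sum(ys) / lam


def search_mle(ys, lam=1.0, step=0.5, min_step=1e-12):
    """Step uphill on ln L; halve the step whenever the derivative changes sign."""
    if sum(ys) == 0:
        raise ValueError("all counts are zero: the MLE lies on the boundary lam = 0")
    direction = 1.0 if score(lam, ys) > 0 else -1.0
    while step > min_step:
        g = score(lam, ys)
        if g == 0.0:
            break  # derivative is exactly zero: we are at the maximum
        new_direction = 1.0 if g > 0 else -1.0
        if new_direction != direction:
            step /= 2.0  # we stepped past the peak, so reverse with a smaller step
            direction = new_direction
        lam = max(lam + direction * step, min_step)  # keep lam > 0
    return lam


lam_search = search_mle(ys)
print(lam_search, sum(ys) / len(ys))  # numerical search vs closed-form sample mean
```

The two printed values should agree up to the stopping tolerance. The step-halving rule is one simple answer to "how far you step"; it trades speed for robustness, since it never needs second derivatives.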

(Figure: L plotted against θ; at θ1 the slope dL/dθ > 0)
(Figure: L plotted against θ; at θ3 the slope dL/dθ < 0)
Properties of MLE estimates

Sometimes called efficient estimation: in large samples, no other
consistent estimator has a smaller variance than the one obtained by
MLE
Parameter estimates are approximately normally distributed
when sample sizes are large
