Title: Discrete and Categorical Data
1Discrete and Categorical Data
- William N. Evans
- Department of Economics
- University of Maryland
2Introduction
- The workhorse statistical model in the social sciences is the multivariate regression model
- Ordinary least squares (OLS)
- $y_i = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k + \varepsilon_i$
- $y_i = x_i\beta + \varepsilon_i$
3Linear model $y_i = \alpha + \beta x_i + \varepsilon_i$
- $\alpha$ and $\beta$ are population values that represent the true relationship between x and y
- Unfortunately, these values are unknown
- The job of the researcher is to estimate these values
- Notice that if we differentiate y with respect to x, we obtain
- $dy/dx = \beta$
4- $\beta$ represents how much y will change for a fixed change in x
- Increase in income for more education
- Change in crime or bankruptcy when slots are legalized
- Increase in test score if you study more
5Put some concreteness on the problem
- State of Maryland budget problems
- Drop in revenues
- Expensive K-12 school spending initiatives
- Short-term solution: raise the tax on cigarettes by 34 cents/pack
- Problem: a tax hike will reduce consumption of the taxable product
- Question for the state: as taxes are raised, how much will cigarette consumption fall?
6- Simple model $y_i = \alpha + \beta x_i + \varepsilon_i$
- Suppose y is a state's per capita consumption of cigarettes
- x represents taxes on cigarettes
- Question: how much will y fall if x is increased by 34 cents/pack?
- Problem: there are many reasons why people smoke; cost is but one of them
7- Data
- (Y) State per capita cigarette consumption for the years 1980-1997
- (X) Tax (state + federal) in real cents per pack
- Scatter plot of the data
- Negative covariance between variables
- When $x > \bar{x}$, more likely that $y < \bar{y}$
- When $x < \bar{x}$, more likely that $y > \bar{y}$
- Goal: pick values of $\alpha$ and $\beta$ that best fit the data
- Define "best fit" in a moment
8Notation
- True model
- $y_i = \alpha + \beta x_i + \varepsilon_i$
- We observe data points $(y_i, x_i)$
- The parameters $\alpha$ and $\beta$ are unknown
- The actual error ($\varepsilon_i$) is unknown
- Estimated model
- (a, b) are estimates for the parameters ($\alpha$, $\beta$)
- $e_i$ is an estimate of $\varepsilon_i$, where
- $e_i = y_i - a - b x_i$
- How do you estimate a and b?
9Objective: Minimize sum of squared errors
- $\min_{a,b} \sum_i e_i^2 = \sum_i (y_i - a - b x_i)^2$
- Minimize the sum of squared errors (SSE)
- Treat positive and negative errors equally
- Over- or under-predicting by 5 is the same magnitude of error
- Quadratic form
- The optimal values for a and b are those that make the first derivatives equal zero
- Functions reach min or max values when derivatives are zero
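A minimal sketch of this minimization in Python. Setting the two first derivatives of the SSE to zero gives the standard closed-form answer: the slope b is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept a is the mean of y minus b times the mean of x. The data below are made up for illustration (they are not the cigarette data from the talk), and the helper names `ols_fit` and `sse` are hypothetical.

```python
# Illustrative only: made-up data, hypothetical helper names.

def ols_fit(x, y):
    """Return (a, b) that minimize sum((y_i - a - b*x_i)**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    # Intercept: forces the fitted line through the point of means.
    a = y_bar - b * x_bar
    return a, b


def sse(x, y, a, b):
    """Sum of squared errors for the line a + b*x."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


x = [10.0, 20.0, 30.0, 40.0, 50.0]       # e.g. tax in cents per pack (made up)
y = [130.0, 118.0, 112.0, 98.0, 92.0]    # e.g. packs per capita (made up)

a_hat, b_hat = ols_fit(x, y)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}, SSE = {sse(x, y, a_hat, b_hat):.2f}")

# Nudging the slope away from b_hat makes the SSE larger.
print(sse(x, y, a_hat, b_hat) < sse(x, y, a_hat, b_hat + 0.1))
```

Because the SSE is quadratic in a and b, the point where both derivatives are zero is the unique minimum whenever x is not constant.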
12- The model has a lot of nice features
- Statistical properties easy to establish
- Optimal estimates easy to obtain
- Parameter estimates are easy to interpret
- Model maximizes prediction
- If you minimize SSE, you maximize $R^2$
- The model does well as a first-order approximation to lots of problems
13Discrete and Qualitative Data
- The OLS model works well when y is a continuous variable
- Income, wages, test scores, weight, GDP
- Does not have as many nice properties when y is not continuous
- Example: doctor visits
- Integer values
- Low counts for most people
- Mass of observations at zero
14Downside of forcing non-standard outcomes into the OLS world?
- Can predict outside the allowable range
- e.g., negative MD visits
- Does not describe the data generating process well
- e.g., mass of observations at zero
- Violates many properties of OLS
- e.g., heteroskedasticity
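The first downside can be seen in a small sketch: fit a straight line by least squares to a 0/1 outcome and look at the fitted values. The data are made up, and the helper name `ols_fit` is hypothetical. This illustrates only the out-of-range prediction problem and is not a recommended estimator.

```python
# Illustrative only: made-up 0/1 outcome, hypothetical helper name.

def ols_fit(x, y):
    """Closed-form least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]   # 1 = yes, 0 = no

a_hat, b_hat = ols_fit(x, y)
fitted = [a_hat + b_hat * xi for xi in x]

# Fitted values are meant to be probabilities, so they should lie in [0, 1].
for xi, yi, fi in zip(x, y, fitted):
    flag = "" if 0.0 <= fi <= 1.0 else "  <- outside [0, 1]"
    print(f"x={xi}  y={yi}  fitted={fi:+.3f}{flag}")
```

With these numbers, the fitted values at the smallest and largest x fall below 0 and above 1, which cannot be probabilities.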
15This talk
- Look at situations when the data generating process does not lend itself well to OLS models
- Mathematically describe the data generating process
- Show how we use different optimization procedures to obtain estimates
- Describe the statistical properties
16- Show how to interpret parameters
- Illustrate how to estimate the models with the popular program STATA
17Types of data generating processes we will consider
- Dichotomous events (yes or no)
- 1 = yes, 0 = no
- Graduate high school? Work? Are obese? Smoke?
- Ordinal data
- Self-reported health (poor, fair, good, excellent)
- Strongly disagree, disagree, agree, strongly agree
18- Count data
- Doctor visits, lost workdays, fatality counts
- Duration data
- Time to failure, time to death, time to re-employment
19Econometric Resources
- Recommended textbook
- Jeffrey Wooldridge, undergraduate and graduate texts
- Lots of insight and mathematical/statistical detail
- Very good examples
- Helpful web sites
- My graduate class
- Jeff Smith's class
20STATA
- Very fast, convenient, well-documented, cheap, and flexible statistical package
- Excellent for cross-section/panel data projects, not as great for time series
- Not as easy to manipulate large data sets from flat files as SAS
- I usually clean data in SAS, estimate models in STATA
21STATA Resources - Specific
- Regression Models for Categorical Dependent Variables Using STATA
- J. Scott Long and Jeremy Freese
- Available for sale from the STATA website for $52 (www.stata.com)
- Post-estimation subroutines that translate results
- Do not need to buy the book to use the subroutines
22- In the STATA command line, type
- net search spost
- Will give you a list of available programs to download
- One is
- spostado from http://www.indiana.edu/~jslsoc/stata
- Click on the link and install the files
23Continuous Distributions
- Random variables with an infinite number of possible values
- Examples: units of measure (time, weight, distance)
- Many discrete outcomes can be treated as continuous, e.g., SAT scores
24How to describe a continuous random variable
- The Probability Density Function (PDF)
- The PDF for a random variable x is defined as f(x), where
- $f(x) \ge 0$
- $\int f(x)\,dx = 1$
- Calculus review: the integral of a function gives the area under the curve
26Cumulative Distribution Function (CDF)
- Suppose x is a measure like distance or time
- $0 \le x \le 4$
- We may be interested in $\Pr(x \le a)$
27CDF
- What if we consider all values?
28Properties of CDF
- Note that $\Pr(x \le b) + \Pr(x > b) = 1$
- $\Pr(x > b) = 1 - \Pr(x \le b)$
- Many times, it is easier to work with complements
29General notation for continuous distributions
- The PDF is described by lower case such as f(x)
- The CDF is defined as upper case such as F(a)
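A small numerical sketch of the PDF and CDF ideas, under an assumed example density. The function f(x) = x/8 on the interval from 0 to 4 is not from the talk; it is chosen only because it is non-negative and its area is exactly 1. The helper names `integrate` and `F` are hypothetical.

```python
# Illustrative only: f(x) = x/8 on [0, 4] is an assumed example density.

def f(x):
    """Assumed PDF: f(x) = x/8 for 0 <= x <= 4, and 0 elsewhere."""
    return x / 8.0 if 0.0 <= x <= 4.0 else 0.0


def integrate(func, lo, hi, steps=10_000):
    """Trapezoid-rule approximation to the integral of func from lo to hi."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h


def F(a):
    """CDF: F(a) = Pr(x <= a), the area under f to the left of a."""
    return integrate(f, 0.0, min(max(a, 0.0), 4.0))


print(integrate(f, 0.0, 4.0))   # total area under the PDF, close to 1
print(F(2.0))                   # Pr(x <= 2); exact value is 2**2 / 16 = 0.25
print(1.0 - F(2.0))             # Pr(x > 2), using the complement
```

The last line uses the complement rule: the probability above a point is one minus the CDF at that point.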
30Standard Normal Distribution
- Most frequently used continuous distribution
- Symmetric bell-shaped distribution
- As we will show, the normal has useful properties
- Many variables we observe in the real world look normally distributed
- Can translate normal into standard normal
31Examples of variables that look normally distributed
- IQ scores
- SAT scores
- Heights of females
- Log income
- Average gestation (weeks of pregnancy)
- As we will show in a few weeks, sample means are approximately normally distributed in large samples!
32Standard Normal Distribution
33Notation
- $\phi(z)$ is the standard normal PDF evaluated at z
- $\Phi(a) = \Pr(z \le a)$
35Standard Normal
- Notice that
- Normal is symmetric: $\phi(a) = \phi(-a)$
- Normal is unimodal
- Median = mean
- Area under curve = 1
- Almost all area is between (-3, 3)
- Evaluations of the CDF are done with
- Statistical functions (Excel, SAS, etc.)
- Tables
36Standard Normal CDF
- $\Pr(z \le -0.98) = \Phi(-0.98) = 0.1635$
38- $\Pr(z \le 1.41) = \Phi(1.41) = 0.9207$
40- $\Pr(z > 1.17) = 1 - \Pr(z \le 1.17) = 1 - \Phi(1.17)$
- $= 1 - 0.8790 = 0.1210$
42- $\Pr(0.1 \le z \le 1.9)$
- $= \Pr(z \le 1.9) - \Pr(z \le 0.1)$
- $= \Phi(1.9) - \Phi(0.1) = 0.9713 - 0.5398$
- $= 0.4315$
46Important Properties of Normal Distribution
- $\Pr(z \le A) = \Phi(A)$
- $\Pr(z > A) = 1 - \Phi(A)$
- $\Pr(z \le -A) = \Phi(-A)$
- $\Pr(z > -A) = 1 - \Phi(-A) = \Phi(A)$
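The table lookups in the preceding slides can be checked with a statistical function instead of a printed table. A minimal sketch in Python, using the standard identity that the standard normal CDF equals one half of one plus the error function evaluated at z divided by the square root of 2. The function name `Phi` and the value A = 0.75 are choices made for this example.

```python
import math


def Phi(z):
    """Standard normal CDF, Pr(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(round(Phi(-0.98), 4))              # Pr(z <= -0.98)
print(round(Phi(1.41), 4))               # Pr(z <= 1.41)
print(round(1.0 - Phi(1.17), 4))         # Pr(z > 1.17), via the complement
print(round(Phi(1.9) - Phi(0.1), 4))     # Pr(0.1 <= z <= 1.9)

# Symmetry property: Pr(z > -A) = 1 - Phi(-A) = Phi(A).
A = 0.75                                 # arbitrary value for the check
print(abs((1.0 - Phi(-A)) - Phi(A)) < 1e-12)
```

The four printed probabilities should agree with the table values quoted in the slides to about four decimal places.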
47Maximum likelihood estimation
- Observe n independent outcomes, all drawn from the same distribution
- $(y_1, y_2, y_3, \dots, y_n)$
- $y_i$ is drawn from $f(y_i; \theta)$, where $\theta$ is an unknown parameter for the distribution f
- Recall the definition of independence: if a and b are independent, $\Pr(a \text{ and } b) = \Pr(a)\Pr(b)$
48- Because all the draws are independent, the probability that these particular n values of y would be drawn at random is called the likelihood function, and it equals
- $L = \Pr(y_1)\Pr(y_2)\cdots\Pr(y_n)$
- $L = f(y_1; \theta) f(y_2; \theta) \cdots f(y_n; \theta)$
49- MLE: pick the value of $\theta$ that makes it most likely that these n values of y would have been generated randomly
- To maximize L, maximize a monotonic function of L
- Recall $\ln(abcd) = \ln(a) + \ln(b) + \ln(c) + \ln(d)$
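50- $\max \mathcal{L} = \ln(L) = \ln f(y_1; \theta) + \ln f(y_2; \theta) + \dots + \ln f(y_n; \theta) = \sum_i \ln f(y_i; \theta)$
- Pick $\theta$ so that $\mathcal{L}$ is maximized
- $d\mathcal{L}/d\theta = 0$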
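51[Figure: log-likelihood $\mathcal{L}$ plotted against $\theta$, with points $\theta_1$ and $\theta_2$ marked on the horizontal axis]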
52Example: Poisson
- Suppose y measures counts, such as doctor visits
- $y_i$ is drawn from a Poisson distribution
- $f(y_i; \lambda) = e^{-\lambda} \lambda^{y_i} / y_i!$ for $\lambda > 0$
- $E[y_i] = \mathrm{Var}[y_i] = \lambda$
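53- Given n observations $(y_1, y_2, y_3, \dots, y_n)$
- Pick the value of $\lambda$ that maximizes $\mathcal{L}$
- $\max \mathcal{L} = \sum_i \ln f(y_i; \lambda) = \sum_i \ln\left[e^{-\lambda} \lambda^{y_i} / y_i!\right]$
- $= \sum_i \left[-\lambda + y_i \ln(\lambda) - \ln(y_i!)\right]$
- $= -n\lambda + \ln(\lambda) \sum_i y_i - \sum_i \ln(y_i!)$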
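54- $\mathcal{L} = -n\lambda + \ln(\lambda) \sum_i y_i - \sum_i \ln(y_i!)$
- $d\mathcal{L}/d\lambda = -n + (1/\lambda) \sum_i y_i = 0$
- Solve for $\lambda$
- $\hat{\lambda} = \sum_i y_i / n = \bar{y}$, the sample mean of y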
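A quick numerical check of the Poisson result, as a sketch: evaluate the log-likelihood at several candidate values of the parameter and confirm that the sample mean gives the largest value and a zero derivative. The counts are made up, and the function names `poisson_loglik` and `poisson_score` are hypothetical.

```python
import math


def poisson_loglik(lam, ys):
    """Log-likelihood: sum_i [ -lam + y_i*ln(lam) - ln(y_i!) ]."""
    # math.lgamma(y + 1) equals ln(y!) for a non-negative integer y.
    return sum(-lam + y * math.log(lam) - math.lgamma(y + 1) for y in ys)


def poisson_score(lam, ys):
    """First derivative of the log-likelihood: -n + (1/lam) * sum_i y_i."""
    return -len(ys) + sum(ys) / lam


ys = [0, 1, 1, 2, 3, 0, 4, 1]        # made-up doctor-visit counts
lam_hat = sum(ys) / len(ys)          # closed-form MLE: the sample mean

print(lam_hat, poisson_score(lam_hat, ys))
for lam in (1.0, 1.4, lam_hat, 1.6, 2.0):
    print(f"lambda = {lam:.2f}  log-likelihood = {poisson_loglik(lam, ys):.4f}")
```

The log-likelihood rises as the candidate value approaches the sample mean from either side and is highest at the sample mean itself.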
55- In most cases, however, we cannot find a closed-form solution for the parameter in $\ln f(y_i; \theta)$
- Must search numerically over possible solutions
- How does the search work?
- Start with a candidate value of $\theta$
- Calculate $d\mathcal{L}/d\theta$
56- If $d\mathcal{L}/d\theta > 0$, increasing $\theta$ will increase $\mathcal{L}$, so we increase $\theta$ some
- If $d\mathcal{L}/d\theta < 0$, decreasing $\theta$ will increase $\mathcal{L}$, so we decrease $\theta$ some
- Keep changing $\theta$ until $d\mathcal{L}/d\theta = 0$
- How far you step when you change $\theta$ is determined by a number of different factors
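The search just described can be sketched as a simple derivative-following loop. The Poisson log-likelihood is used only because its answer (the sample mean) is known, so the search can be checked. The fixed step size, the starting value, the tolerance, and the function name `climb` are all assumptions made for this example. Statistical packages typically choose the step more carefully, for instance by also using the second derivative.

```python
def poisson_score(lam, ys):
    """dL/d(lambda) for the Poisson log-likelihood: -n + sum(y)/lambda."""
    return -len(ys) + sum(ys) / lam


def climb(ys, start=0.5, step=0.05, tol=1e-8, max_iter=10_000):
    """Move lambda in the direction of the derivative until it is ~0."""
    lam = start
    for iteration in range(max_iter):
        slope = poisson_score(lam, ys)
        if abs(slope) < tol:          # derivative is (numerically) zero: stop
            return lam, iteration
        # slope > 0 -> increase lambda; slope < 0 -> decrease lambda.
        lam += step * slope
    raise RuntimeError("search did not converge")


ys = [0, 1, 1, 2, 3, 0, 4, 1]         # made-up counts
lam_hat, n_iter = climb(ys)
print(lam_hat, n_iter)
print(sum(ys) / len(ys))              # closed-form answer for comparison
```

A step that is too large can overshoot the maximum (or push the parameter out of its allowed range), while a step that is too small makes the search slow. This is one reason the step length matters.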
57[Figure: log-likelihood $\mathcal{L}$ plotted against $\theta$; at $\theta_1$, $d\mathcal{L}/d\theta > 0$]
58[Figure: log-likelihood $\mathcal{L}$ plotted against $\theta$; at $\theta_3$, $d\mathcal{L}/d\theta < 0$]
59Properties of MLE estimates
- Sometimes called efficient estimation: in large samples, and under standard regularity conditions, no other consistent estimator has a smaller variance than the one obtained by MLE
- Parameter estimates are approximately normally distributed when sample sizes are large
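The large-sample normality claim can be illustrated with a small simulation, sketched here for the Poisson case where the MLE is the sample mean. The true parameter value, sample size, number of repetitions, and random seed are assumed values, and `draw_poisson` is a hypothetical helper that uses Knuth's multiplication method to generate draws with the standard library only.

```python
import math
import random


def draw_poisson(lam, rng):
    """One Poisson(lam) draw using Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(12345)            # fixed seed so the run is repeatable
true_lam, n, reps = 2.0, 200, 1000    # assumed values for the experiment

# The Poisson MLE is the sample mean; compute it in many repeated samples.
estimates = [
    sum(draw_poisson(true_lam, rng) for _ in range(n)) / n
    for _ in range(reps)
]

mean_est = sum(estimates) / reps
sd_est = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (reps - 1))
sd_theory = math.sqrt(true_lam / n)   # standard deviation of the sample mean

# Share of estimates within 1.96 theoretical standard errors of the truth;
# roughly 0.95 if the estimates are approximately normal.
share = sum(abs(e - true_lam) <= 1.96 * sd_theory for e in estimates) / reps

print(mean_est, sd_est, sd_theory, share)
```

If the estimates behave like draws from a normal distribution centered on the true value, the simulated standard deviation should be close to the theoretical one and the share should be near 0.95.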