PS601 Quantitative Methods - PowerPoint PPT Presentation

1
(No Transcript)
2
PS601 Quantitative Methods
  • Dr. Robert D. Duval
  • Course Introduction
  • Presentation Notes and Slides
  • Version of January 15, 2008

3
Overview of Course
  • Syllabus
  • Texts
  • Grading
  • Assignments
  • To be turned in as a single Web page
  • Built in stages. Keep up.
  • Instruction provided
  • Software
  • Excel
  • Stata
  • NVu

4
Prerequisites
  • A fundamental understanding of calculus
  • An informal but intuitive understanding of the
    mathematics of Probability
  • A sense of humor

5
Statistics is an innate cognitive skill
  • We all possess the ability to do rudimentary
    statistical analysis
  • in our heads
  • intuitively.
  • The cognitive machinery for stats is built in to
    us, just like it is for calculus.
  • This is part of how we process information about
    the world
  • It is not simply mysterious arcane jargon
  • It is simply the mysterious arcane way you
    already think

6
The First Two Weeks
  • Review and Setting
  • The Logic of Research
  • Logic
  • Microcomputers
  • Statistics

7
Overview of Statistics
  • Descriptive Statistics
  • Frequency Distributions
  • Probability
  • Statistical Inference
  • Statistical tests
  • Contingency Tables
  • Regression Analysis

8
The Logic of Research
  • A quick review of the research process
  • Theory
  • Hypothesis
  • Observation
  • Analysis

9
Sample Theories
  • IR - Balance of Power
  • Wars erupt when there are shifts in the balance
    of power
  • Domestic Policy
  • The crime rate is affected by the economy
  • Democratic Peace
  • Nations with democratic regimes engage in war
    less than authoritarian regimes.

10
Examples of Research
  • Sewage treatment plants
  • Energy conservation goals
  • Converting coal-fired power plants to oil
  • Air Quality
  • Sex discrimination in raises

11
(No Transcript)
12
  • Theory
  • many are normatively driven.

13
Theory Hypothesis
14
Theory Hypothesis Observation
15
Theory Analysis Hypothesis
Observation
16
Theory Analysis Hypothesis
Observation
17
Theory Deduction Analysis Hypothesis Induction Operationalization
Observation Confirmation/rejection
18
Statistics
A Philosophical Overview
  • Methods as Theory
  • Doesn't see statistics as a tool
  • It is the embodiment of the ideas we express
  • Methods as Language
  • We articulate an implicit structure when we
    ascribe causation or systematic patterns.
  • This structure may be
  • Logical, hence mathematical
  • Relational: equality/inequality
  • Algebraic or geometric

19
Principal organizing concepts
  • The Nature of the Problem
  • What are we asking?
  • Measurement
  • How do we have awareness of the phenomenon?
  • Standards for comparison
  • How can we infer if what we are seeing is what we
    expect to see?

20
Mathematical notation
  • Important mathematical notation the student needs
    to know.
  • Summation
  • For instance, \( \sum_{i=1}^{n} X_i \), the sum of all
    \( X_i \) from i = 1 to n, means beginning with the
    first number in your data set, add together all
    n numbers.
  • The Σ is a symbolic representation of the process
    of adding up a specified series or collection of
    numbers
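
As a minimal sketch of what the summation sign asks for, the snippet below adds up a small data set with an explicit loop and checks it against Python's built-in `sum`. The data values are made up for illustration and are not from the course materials.

```python
# Hypothetical data set of n = 5 values (illustration only).
x = [1, 2, 3, 4, 5]

# Sum of X_i for i = 1 to n, written as an explicit loop:
# start with the first number and add each of the n numbers in turn.
total = 0
for x_i in x:
    total += x_i

# The built-in sum() performs the same operation.
assert total == sum(x)
print(total)
```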

21
Mathematical notation (cont.)
  • Square Roots and Exponents
  • e - the base of natural logarithms
  • Exponential and Logarithmic Equations

22
The Base of Natural Logarithms
  • Where does e come from?
  • e is the base of natural logarithms
  • Invented by John Napier in 1618 as a concept,
    but actually calculated/derived by Jacob
    Bernoulli
  • It is the number a such that the derivative of a^x
    at x = 0 equals 1.0
  • It is derived from \( e = \lim_{n \to \infty} (1 + 1/n)^n \)

23
Compound Interest and e
  • If $1.00 is put in the bank at 100% interest,
    compounded annually, its future value is $2.00.
  • What about compounded semi-annually?

24
Demystifying e (sort of)
  • So how does this translate to real life?
  • Compound interest: \( FV = PV \, (1 + i/k)^{kn} \)
  • Where
  • PV = present value (amount deposited)
  • FV = future value (amount accrued)
  • i = interest rate (e.g., .06 for 6% interest)
  • k = number of periods/year
  • n = number of years
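
A short sketch of how compounding more and more often leads to e, assuming the standard compound interest formula FV = PV (1 + i/k)^(kn) with the variable names defined on this slide. The function name `future_value` is only illustrative.

```python
import math

def future_value(pv, i, k, n):
    """Future value of pv at annual rate i, compounded k times a year for n years."""
    return pv * (1 + i / k) ** (k * n)

# $1.00 at 100% interest for one year, compounded k times per year.
for k in (1, 2, 12, 365, 1_000_000):
    print(k, round(future_value(1.0, 1.0, k, 1), 6))

# The limit as k grows without bound is e.
print("e =", round(math.e, 6))
```

Annual compounding gives 2.00, semi-annual compounding gives 2.25, and the values creep up toward e (about 2.71828) as k grows.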

25
Levels of Measurement
  • Nominal
  • Dichotomous (two values)
  • Ordinal
  • Interval
  • Ratio (many times handled like interval)
  • For instance: Levels of Measurement

26
Nominal Measurement
  • Nominal variables are those which can be named,
    but not quantified
  • Religion (Protestant, Catholic, Jewish, Buddhist,
    etc)
  • Race (Caucasian, African-American, Hispanic,
    Asian, etc)
  • Linguistic Group
  • Marital Status (Married, Single, Divorced)

27
Ordinal Measurement
  • With ordinal variables, there is a rough
    quantitative sense to their measurement, but the
    differences between scores are not necessarily
    equal.
  • They are thus in order, but not fixed

28
Examples of Ordinal Measures
  • Rankings (1st, 2nd, 3rd, etc)
  • Grades (A, B, C, D, F)
  • Education (High School, College, Adv degree)
  • Evaluations
  • High, Medium, Low
  • Likert Scales
  • 5 pt (Strongly Agree, Agree, Neither Agree nor
    Disagree, Disagree, Strongly Disagree)
  • 7 pt liberalism scale (Strongly Liberal, Liberal,
    Weakly Liberal, Moderate, Weakly Conservative,
    Conservative, Strongly Conservative)

29
Interval Measurement
  • Variables or measurements where the difference
    between values is measured by a fixed scale.
  • Money
  • People
  • Education (in years)
  • Age
  • Constructed Scales

30
Dichotomous Measurement
  • Variables that only have two values.
  • May be treated as nominal, however, sometimes an
    ordinal quality may exist.
  • Gender - male, female
  • Race - black, white
  • Agreement - yes, no
  • T/F - true, false
  • Value - high, low
  • and others less easy to name
  • war, no war
  • vote, no vote

31
Ratio Measurement
  • Ratio Variables have fixed zero points.
  • The percentage is a ratio variable
  • part/whole
  • Feeling Thermometers
  • We treat ratio and interval variables the same
  • Also need an upper bound
  • Although not an absolute constraint

32
Statistics
  • Induction about the Observable World
  • A statistic is a number that provides information
    about some variable of interest.
  • Descriptive Statistics
  • Numbers that describe some aspect of the world
  • Inferential Statistics
  • We use inferential statistics to take information
    from a sample and make some inference about a
    population.

33
Descriptive Statistics
  • There are two main ways we describe collections
    of data.
  • Measures of Central Tendency
  • Measures of Dispersion
  • These two approaches give us the ability to
    describe the distribution of the data: what the
    data looks like.

34
Statistical Tools for Describing the World -
Distributions
  • Intuitive Definition
  • A bunch of numbers that measure a characteristic
    for a group of cases.
  • May be represented by a set of numbers, a graph
    or picture, or even a mathematical equation.

35
Measures of Central Tendency
  • Measures which provide some indication of the
    typical value or the 'middle' of the distribution

36
Measures of Central Tendency: The Arithmetic Mean
(or Average)
  • The sum of all of the numbers in a set, divided
    by the number in the set
  • Most appropriate for symmetric distributions
  • Influenced by extreme values

37
Measures of Central Tendency: The Median
  • The middle number in the data set.
  • (Sort the Data...)
  • The Median is the middle value if there is an
    odd number of cases.
  • The Median is the average of the two middle
    values if there is an even number of cases.
  • Best measure for skewed distributions
  • Not very tractable mathematically!

38
Measures of Central Tendency: The Mode
  • The most frequently occurring value.
  • Used primarily for nominal data.
  • The peak value of a frequency distribution is
    also referred to as the mode.
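
The three measures of central tendency can be compared on one small data set. This is a sketch using Python's standard `statistics` module. The income figures are hypothetical and chosen to be right-skewed, so the single extreme value pulls the mean well above the median.

```python
import statistics

# Hypothetical, right-skewed data: one extreme value at the top.
incomes = [20, 25, 25, 30, 35, 40, 300]

mean_value = statistics.mean(incomes)      # sum of the values / number of values
median_value = statistics.median(incomes)  # middle value of the sorted data
mode_value = statistics.mode(incomes)      # most frequently occurring value

print("mean:", round(mean_value, 2))
print("median:", median_value)
print("mode:", mode_value)
```

Here the median (30) stays near the bulk of the data while the mean (about 67.86) is dragged toward the outlier, which is why the median is preferred for skewed distributions.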

39
Common terms for Measures of Central Tendency
  • We use the idea of measures of central tendency a
    great deal in everyday language.
  • Average, accordance, bread-and-butter,
    commonplace, Commensurate, congruent, consistent,
    conventional, customary, day-to-day, everyday,
    frequent, garden variety, general, habitual,
    humdrum, invariably, likeness, mean, median,
    medium, mediocrity, middle, middling,
    nondescript, normal, ordinary, popular,
    prevailing, regular, the same, standard,
    stereotypical, stock, typical, unexceptional,
    uniform, usual
  • (From The Elementary Forms of Statistical Reason
    by R. P. Cuzzort and James S. Vrettos)

40
Measures of Dispersion
  • The Range
  • Range = highest value - lowest value
  • Uses only two pieces of information
  • Strongly influenced by the particular
    observations used.
  • A single outlier gives a very misleading view
  • For instance
  • The range in the length of term in office for a
    President of the United States is 30 days to 12
    years.

41
Measures of Dispersion
  • Percentiles: the point on a distribution below
    which that percent of cases fall
  • The 95th percentile means that you are in the top
    5%
  • The interquartile range is between the 25th and
    the 75th percentiles - hence the middle 50% of the data.

42
The Deviation about the Mean
  • The Deviation about the Mean
  • Indicates how far a value is from the center.
  • Note that in looking at how a distribution
    spreads out, we are using the measure of the
    center as our conceptual foundation.

43
The average of the deviations
  • So it would seem to make sense to calculate all
    of the deviations and find their average.
  • This would seem to give us a measure of the
    typical amount any given data point might vary.

44
The Average Deviation
  • Does the average of the deviations make sense?

45
Calculating the Average Deviation
Xi    Xi - mean
1     1 - 3 = -2
2     2 - 3 = -1
3     3 - 3 =  0
4     4 - 3 =  1
5     5 - 3 =  2
Σ = 15, mean = 3.0, Σ(Xi - mean) = 0
46
The average absolute deviation.
  • Can we find the average of the absolute value of
    the deviations?
  • Yes, but difficult to use.

47
Calculating the average absolute deviation
Xi    |Xi - mean|
1     |1 - 3| = 2
2     |2 - 3| = 1
3     |3 - 3| = 0
4     |4 - 3| = 1
5     |5 - 3| = 2
Σ = 15, mean = 3.0, Σ|Xi - mean| = 6, ABD = 6/5 = 1.2
48
Fixing these deviant measures
  • In order to represent variation about the mean,
    we must get rid of the minus signs in a
    mathematically acceptable manner.

49
The standard deviation
  • Square the deviations to remove minus signs
  • Take the square root to return to the original
    scale

50
Calculating the standard deviation
Xi    Xi - mean      (Xi - mean)^2
1     1 - 3 = -2     4
2     2 - 3 = -1     1
3     3 - 3 =  0     0
4     4 - 3 =  1     1
5     5 - 3 =  2     4
Σ = 15, mean = 3.0, Σ|Xi - mean| = 6, ABD = 6/5 = 1.2
Σ(Xi - mean)^2 = 10, s = √(10/5) = 1.414
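
The table method for the values 1 through 5 can be sketched in a few lines of Python. This version divides by n, as the slides have done so far. Variable names are illustrative only.

```python
import math

x = [1, 2, 3, 4, 5]
n = len(x)
mean = sum(x) / n                                        # 15 / 5 = 3.0

deviations = [x_i - mean for x_i in x]                   # -2, -1, 0, 1, 2
avg_deviation = sum(deviations) / n                      # deviations about the mean cancel
avg_abs_deviation = sum(abs(d) for d in deviations) / n  # 6 / 5
variance = sum(d ** 2 for d in deviations) / n           # 10 / 5
std_dev = math.sqrt(variance)                            # square root of the variance

print(avg_deviation, avg_abs_deviation, variance, round(std_dev, 3))
```

The plain average deviation is zero, so it says nothing about spread. The absolute and squared versions both remove the minus signs, and the square root puts the squared version back on the original scale.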
51
The Variance
  • The mean of the squared deviations has some
    utility as well.
  • Variance is what we seek to explain!

52
Calculating the Standard Deviation
  • The best way to calculate the standard deviation
    is to use a computer.
  • If one is not available, try the table method.
  • StDevdemo.xls (Excel)

53
Population measures
  • OK, I lied. The formula for the standard
    deviation is not quite as I described.
  • It turns out that the Standard Deviation is
    biased in small samples.
  • The estimate is a little too small in small
    samples.
  • Thus we designate whether we are using population
    or sample data.

54
Population vs. Sample Means
55
Population vs. Sample Standard Deviations
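
A minimal sketch of the distinction, assuming the usual convention that the population standard deviation divides the sum of squared deviations by N while the sample standard deviation divides by n - 1. That convention is an assumption here rather than something taken from the slides. Python's `statistics` module provides both.

```python
import statistics

x = [1, 2, 3, 4, 5]

# Treating x as the whole population: divide by N.
population_sd = statistics.pstdev(x)

# Treating x as a sample from a larger population: divide by n - 1,
# which makes the estimate slightly larger in small samples.
sample_sd = statistics.stdev(x)

print(round(population_sd, 4), round(sample_sd, 4))
```

The means are computed the same way in both cases. Only the notation differs (commonly μ for a population and X-bar for a sample).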
56
Frequency Distributions
  • A frequency distribution is a graph or chart that
    shows the number of observations of a given
    value, or class interval.

57
The Frequency Histogram
  • To create a frequency histogram
  • Determine the class interval width.
  • Determine the number of intervals desired.
  • Tally number of observations in each range.
  • Create bar chart from class totals.
  • Note that
  • The X-axis represents the class interval values
  • The Y-axis represents the number of cases
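
The steps above can be sketched without any charting software. The values, the interval width, and the number of intervals below are hypothetical (this is not the class crime rate data). The "bar chart" is printed as rows of # characters, one row per class interval.

```python
# Hypothetical values (illustration only).
data = [3, 7, 8, 12, 13, 14, 18, 21, 22, 29]

width = 10       # class interval width
start = 0        # lower edge of the first interval
n_classes = 3    # number of intervals desired

# Tally the number of observations in each class interval.
counts = [0] * n_classes
for value in data:
    index = int((value - start) // width)
    counts[index] += 1

# Text bar chart: each row is a class interval, bar length is the count.
for k, count in enumerate(counts):
    low = start + k * width
    print(f"{low:>2}-{low + width - 1:<2} {'#' * count}")
```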

58
Example Frequency Distribution
  • Develop a frequency histogram for the following
    crime rate data for the 50 states.
  • Use the data provided in class from the US
    Statistical Abstract on (US Crime Rate p.5)
  • A brief aside on following computer
    demonstrations in class
  • Follow the general conceptual process; the
    details will come later.
  • Most of the detail is provided on the screen;
    scan it and all menu items.

59
Frequency Polygon
  • Same as a frequency histogram except the
    midpoints of the class intervals are used
  • Points are connected with a line graph
  • A large number of classes will make the
    distribution a smooth curve if there is a large
    sample size.

60
Frequency Distributions: Shape
  • Modality
  • The number of peaks in the curve
  • Skewness
  • An asymmetry in a distribution where values are
    shifted to one extreme or the other.
  • Kurtosis
  • The degree of Peakedness in the curve
  • Continuity
  • Discrete versus continuous

61
Frequency Distributions - Modality
  • Unimodal
  • Bimodal
  • See ADA scores
  • Multimodal

62
Frequency Distributions - Skewness
  • The Third Moment about the Mean
  • Right Skew (Positive Skew)
  • Left Skew (Negative Skew)

63
Frequency Distributions: Measuring Skewness
  • Measuring skewness: alternate formula
  • Normal distribution has skewness = 0.0
  • (Normal ranges between ±3.0)

64
Frequency Distributions - Kurtosis
  • The Fourth Moment about the Mean
  • Platykurtic
  • Leptokurtic
  • Mesokurtic

65
Frequency Distributions: Measuring Kurtosis
  • Alternate measure of kurtosis
  • Normal distributions have kurtosis = 3.0
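
A sketch of skewness and kurtosis as standardized third and fourth moments about the mean. These are the common moment-based definitions, used here as an assumption. They are not necessarily the alternate formulas referred to on the slides. Under these definitions a normal distribution has skewness 0 and kurtosis 3. The function names and data are illustrative.

```python
def central_moment(data, k):
    """k-th moment about the mean, dividing by n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skewness(data):
    """Third moment about the mean, scaled by the variance to the 3/2 power."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Fourth moment about the mean, scaled by the squared variance."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), skewness(right_skewed))
print(kurtosis(symmetric))
```

A symmetric data set has skewness of zero, and a long right tail gives a positive value.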

66
Continuity
  • Discrete distribution
  • Values can take on only discrete specific values
  • e.g., roll of a die: x ∈ {1, 2, 3, 4, 5, 6}
  • Continuous distributions
  • Takes on infinitely fine values.
  • Sometimes the distinction is meaningless
  • i.e.

67
Frequency Distributions - Types
  • The Normal
  • Characterized by Mean and SD
  • Developed by Abraham de Moivre as an approximation
    to the binomial distribution; later applied to
    errors in observations in astronomy
  • Also called the Gaussian distribution, since
    Gauss discovered the use of two-parameter
    exponential functions for distributions
  • The normal curve is the most ubiquitous of this
    class
  • Describes the distribution of an infinite sum or
    mean of independent randomly generated variables
  • You should think of the normal curve as a
    fundamental law of the universe!

68
Frequency Distributions - Types
  • The Uniform

69
Frequency Distributions - Types
  • The Normal
  • The Uniform
  • The Log-normal
  • The Exponential
  • Statistical Distributions
  • t
  • Chi-Square (χ²)
  • F

70
Frequency Distributions - Types (cont.)
  • Hyper-geometric
  • Poisson
  • Binomial
  • Gamma
  • Weibull
  • Logarithmic
  • Benford

71
A partial list of distributions
  • Discrete
  • Univariate
  • Benford, Bernoulli, binomial, Boltzmann,
    categorical, compound Poisson, discrete
    phase-type, degenerate, Gauss-Kuzmin,
    geometric, hypergeometric, logarithmic,
    negative binomial, parabolic fractal, Poisson,
    Rademacher, Skellam, uniform, Yule-Simon,
    zeta, Zipf, Zipf-Mandelbrot
  • Multivariate
  • Ewens, multinomial, multivariate Polya
  • Continuous
  • Univariate
  • Beta, Beta prime, Cauchy, chi-square, Dirac
    delta function, Coxian, Erlang, exponential,
    exponential power, F, Fermi-Dirac, Fisher's
    z, Fisher-Tippett, Gamma, generalized extreme
    value, generalized hyperbolic, generalized
    inverse Gaussian, Half-logistic, Hotelling's
    T-square, hyperbolic secant, hyper-exponential,
    hypoexponential, inverse chi-square (scaled
    inverse chi-square), inverse Gaussian, inverse
    gamma (scaled inverse gamma), Kumaraswamy,
    Landau, Laplace, Lévy, Lévy skew
    alpha-stable, logistic, log-normal,
    log-logistic, Maxwell-Boltzmann, Maxwell
    speed, Nakagami, normal (Gaussian),
    normal-gamma, normal inverse Gaussian, Pareto,
    Pearson, phase-type, polar, raised cosine,
    Rayleigh, relativistic Breit-Wigner, Rice,
    Rosin-Rammler, shifted Gompertz, Student's t,
    triangular, truncated normal, Tweedie, type-1
    Gumbel, type-2 Gumbel, uniform,
    Variance-Gamma, Voigt, von Mises, Weibull,
    Wigner semicircle, Wilks' lambda
  • Multivariate
  • Dirichlet, Generalized Dirichlet,
    inverse-Wishart, Kent, matrix normal,
    multivariate normal, multivariate Student, von
    Mises-Fisher, Wigner quasi, Wishart
  • Miscellaneous
  • bimodal, Cantor, conditional, equilibrium,
    exponential family, Infinite divisibility
    (probability), location-scale family,
    marginal, maximum entropy, posterior, prior,
    quasi, sampling, singular, unimodal

72
Charts and Graphs
  • A picture is worth
  • Charts convey a large amount of specialized
    information in a compact way
  • They do not require the same type of cognitive
    processing that words and numbers do
  • Learn to use them!

73
Graphs & Charts - Types
  • Descriptive Graphs
  • Bar Chart
  • Pie Graph
  • Line Graph
  • Distributions
  • Histogram
  • Box Plot
  • Stem and Leaf

74
Bar Charts
  • Best for displaying actual values.
  • Can handle a moderate number of cases (bars)
  • Excel calls it a column chart

75
Bar Chart: An Example
76
Pie Charts
  • Best used with small number of categories or
    cases to display
  • Good for showing relative distribution
  • Percentages, proportions
  • Use only one column of data
  • Plus one column of labels

77
Pie Chart Examples
78
Line graphs
  • Best for showing data across time
  • Always give dates
  • Label X axis
  • Indicate units on Y axis
  • Use legend for multiple lines

79
Line Graphs - Example
80
Box Plot
  • Quick picture of a distribution
  • Parts of box give distribution characteristics
  • Your Text is not quite accurate!

81
Stem and Leaf Plot
  • Good for showing distribution while preserving
    data
  • Figuring out stems can be tricky

82
Probability
  • Before starting a discussion of the normal curve,
    a couple of brief points about probability are in
    order
  • Probability is essentially "How likely is it that
    some event x will occur?"
  • All probabilities range between 0.0 and 1.0
  • Probabilities outside that range are impossible;
    they are not probabilities
  • We symbolize probability as P(x)
  • The probability of some event x is P
  • 0.0 ≤ P(x) ≤ 1.0

83
Probability Density Functions
  • A probability density function is a frequency
    distribution whose area is set equal to 1.0.
  • Most distributions are PDFs.
  • They let us assess the likelihood or probability
    of cases taking on particular values.

84
The Normal Distribution
  • The normal distribution is one of the most
  • Popular
  • Ubiquitous
  • Useful
  • distributions that we have.
  • It gives great predictive ability when we can
    apply it to data.

85
The Normal Distribution: the Formula
  • The normal curve is described by the following
    formula:
    \( f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / (2\sigma^2)} \)
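
As a sketch, the standard normal density with mean mu and standard deviation sigma can be evaluated directly in Python. The function name `normal_pdf` and its default arguments are illustrative.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Heights of the standard normal curve at a few points.
for x in (-2, -1, 0, 1, 2):
    print(x, round(normal_pdf(x), 4))
```

The curve peaks at the mean and is symmetric about it, so the heights at -1 and 1 match.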

86
The Normal Distribution (cont.)
  • This formula will give us the following
    distribution

87
Using the Normal Distribution
  • The normal distribution is a tool for examining
    how values are distributed.
  • Think of the normal distribution as an underlying
    physical process that generates values according
    to a general pattern.
  • Because these values have this pattern, we have
    information about them that translates to
    probabilities.

88
Standard Normal Distribution
  • The standard normal distribution is a specialized
    example of the Normal Distribution.
  • The Standard Normal distribution has μ = 0.0 and
    σ = 1.0
  • We say this symbolically as
  • Z ~ N(0,1)
  • (or Z is normally distributed with a mean of zero
    and a standard deviation of one)

89
The Normal PDF
  • Because the standard normal curve is a PDF, we
    can use it to make probability assessments about
    values in the distribution.
  • We can, in essence, convert our regular normal
    distribution to the Standard Normal, and examine
    areas under the curve to calculate probabilities.

90
Using the Normal PDF
  • We know the following facts
  • Area under the curve = 1.0
  • It's symmetric, so the probability of Xi being
    greater than 0.0 is .5
  • Symbolically,
  • P(Xi > 0.0) = .5

91
Using the Normal PDF
  • We can use this information in the following
    fashion
  • P(0.0 ≤ Xi ≤ 1.0) = .3413
  • Thus 68% of the Xi's will fall between ±1σ
  • Thus 95% of the Xi's will fall between ±2σ
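
These areas can be checked numerically. The sketch below computes the standard normal cumulative probability from the error function in Python's `math` module. The helper name `standard_normal_cdf` is illustrative.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1), computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area between the mean and one standard deviation above it.
p_0_to_1 = standard_normal_cdf(1.0) - standard_normal_cdf(0.0)

# Areas within one and two standard deviations of the mean.
within_1_sd = standard_normal_cdf(1.0) - standard_normal_cdf(-1.0)
within_2_sd = standard_normal_cdf(2.0) - standard_normal_cdf(-2.0)

print(round(p_0_to_1, 4), round(within_1_sd, 4), round(within_2_sd, 4))
```

The results are about .3413, .6827, and .9545.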

92
Standard Normal Variables
  • A Standard Normal Variable is one that has been
    transformed by the following formula:
    \( Z_i = (X_i - \bar{X}) / s \)
  • All Z-scores, as they are called, will have a
    mean = 0.0 and s = 1.0
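
A minimal sketch of the Z-score transformation: subtract the mean from each value and divide by the standard deviation. The population form of the standard deviation (dividing by n) is used here as an assumption. The function name and data are illustrative.

```python
import statistics

def z_scores(data):
    """Subtract the mean from each value and divide by the standard deviation."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # assumption: population form, divides by n
    return [(x - mean) / sd for x in data]

z = z_scores([1, 2, 3, 4, 5])
print([round(value, 3) for value in z])

# The transformed values have mean 0 and standard deviation 1.
print(round(statistics.mean(z), 6), round(statistics.pstdev(z), 6))
```

Because every variable ends up on the same scale after this transformation, Z-scores from different variables can be added together into an index.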

93
Creating indices
  • Z-score transformations are useful for adding
    apples and oranges
  • They transform similar variables to the same
    scale.
  • As a result, we can add together things that
    would otherwise be impossible to combine.

94
The assumption of normality
  • As a result of the application of the standard
    normal distribution, we can make probability
    statements about data.
  • (A probability statement is one that is of the
    general "How likely is it that...?" format.)
  • If we can assume that the data is normally
    distributed, then we know how likely it is that
    individual cases are above or below selected
    values expressed in standard deviation units.

95
The Central Limit Theorem
  • The Normal Distribution pops up in one very
    important context
  • The Central Limit Theorem
  • This is a fundamental concept that allows us to
    infer the characteristics of a population based
    upon a sample.

96
Sampling Distributions
  • The probability distribution of a statistic is
    called its sampling distribution.
  • If we collect a sample and calculate the mean,
    that is one data point in the sampling
    distribution of the sample mean.
  • If we do this many times, we have a sampling
    distribution, which we can then describe.

97
The Central Limit Theorem
  • The CLT tells us
  • As the sample size n gets larger, the sampling
    distribution of the sample mean can be
    approximated by a normal distribution with mean
    of μ, and a standard deviation equal to
    \( \sigma / \sqrt{n} \),
  • where μ and σ are the population
    characteristics.
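
The theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is uniform on [0, 1), which is clearly not normal, with mean 0.5 and standard deviation sqrt(1/12). The sample size, number of samples, and random seed are arbitrary choices.

```python
import random
import statistics

random.seed(601)  # arbitrary seed, only so the run is repeatable

# Population: uniform on [0, 1). Mean 0.5, standard deviation sqrt(1/12).
mu = 0.5
sigma = (1 / 12) ** 0.5

n = 50            # size of each sample
n_samples = 5000  # number of samples drawn

# Each sample mean is one data point in the sampling distribution.
sample_means = [
    sum(random.random() for _ in range(n)) / n
    for _ in range(n_samples)
]

observed_mean = statistics.mean(sample_means)
observed_sd = statistics.pstdev(sample_means)
predicted_sd = sigma / n ** 0.5  # sigma divided by the square root of n

print("mean of sample means:", round(observed_mean, 4), "vs mu =", mu)
print("sd of sample means:  ", round(observed_sd, 4),
      "vs predicted", round(predicted_sd, 4))
```

The mean of the sample means lands close to the population mean, and their spread lands close to sigma divided by the square root of n, even though the individual observations are uniform rather than normal.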

98
The Implications of the Central Limit Theorem
  • We can use the CLT to make probability statements
    about the sample mean because we know its
    distributional characteristics.
  • Even if the original variable X is not normally
    distributed, the sampling distribution of the
    sample mean is (approximately, for large n)!

99
Statistical inference
  • We can use information about the way variables
    are distributed to make assessments of
    probability about them.
  • Many of these questions are phrased as "Is A
    greater (or less) than B?"
  • This may also be phrased
  • Does A belong to the same population as B?

100
Assessing probabilities
  • Take income
  • Would you expect a doctor to have a higher income
    than the population at large?
  • Would dog-catchers be lower?
  • Would you expect males to have a higher income
    than females?
  • Is WV income lower than the national average?
    What about Oklahoma?

101
Statistical Decision-making
  • Many of these questions are best answered with a
    statement of statistical confidence: a
    probability assessment.
  • This statistical confidence places a decision
    within an objective framework.
  • If we define the criteria for making decisions
    according to some reasonable standards, then we
    can remove (or certainly reduce) the subjectivity
    of the researcher.

102
Statistical D-M (cont.)
  • If you collected the following information, would
    you conclude that males had a higher income than
    females?
  • Mean(Males) = 55.5K, Mean(Females) = 54.9K
  • Mean(Males) = 55.5K, Mean(Females) = 50.9K
  • Mean(Males) = 55.5K, Mean(Females) = 34.9K
  • Where would you draw the line?
  • Does sample size matter?
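
One way to see why sample size matters is to compute a z statistic for each difference in means at two sample sizes. This is only a sketch. The standard deviation of 20K and the sample sizes of 25 and 2,500 per group are hypothetical values chosen for illustration, and the function name is not from the course.

```python
import math

def z_for_difference(mean_a, mean_b, sd, n):
    """z statistic for a difference of two sample means, assuming both
    groups share the same standard deviation sd and sample size n."""
    standard_error = sd * math.sqrt(2.0 / n)
    return (mean_a - mean_b) / standard_error

ASSUMED_SD = 20.0  # hypothetical standard deviation, in thousands of dollars

for females in (54.9, 50.9, 34.9):
    for n in (25, 2500):  # hypothetical sample sizes per group
        z = z_for_difference(55.5, females, ASSUMED_SD, n)
        print(f"males 55.5K, females {females}K, n = {n}: z = {z:.2f}")
```

Under these assumptions the 4.6K gap gives a z below 1 with 25 cases per group but a z above 8 with 2,500 per group, so the same difference in means can look like noise or like a clear difference depending on sample size.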

103
Statistical Decision-making problem setup
  • The Mon river