Types of units and variables

Provided by: Vanessa145
1
Types of units and variables
2
Examples of variables. What are the possible
units?
  • Murder rate
  • Litigation rate
  • Support for freedom of speech
  • Income
  • Party identification
  • Liberalism

3
What is not a variable?
  • Speed of light
  • Parameters
  • Statistics
  • These are called constants.

4
How do we describe variables?
  • Measures of central tendency (mean, median, mode)
  • Dispersion around the mean

5
The Arithmetic Mean (or Average)
  • The sum of all of the numbers in a set, divided
    by the number of values in the set
  • Most appropriate for symmetric distributions
  • Influenced by extreme values

6
The Median
  • The middle number in the data set once the data
    are sorted in order from lowest to highest.
  • The median is the middle value if there is an
    odd number of cases.
  • The median is the average of the two middle
    values if there is an even number of cases.
  • Best measure for skewed distributions

7
The Mode
  • The most frequently occurring value.
  • Used primarily for nominal data.
  • The peak value of a frequency distribution is
    also referred to as the mode.
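The three measures of central tendency can be compared on a small example. The sketch below uses Python's standard library purely for illustration (the slides themselves use Stata), and the income figures are hypothetical values chosen so that one extreme case pulls the mean away from the median.

```python
import statistics

# Hypothetical incomes (in thousands); the last value is an extreme case.
incomes = [20, 25, 25, 30, 35, 40, 500]

mean_income = statistics.mean(incomes)      # pulled upward by the extreme value
median_income = statistics.median(incomes)  # middle value of the sorted data
mode_income = statistics.mode(incomes)      # most frequently occurring value

# With an even number of cases, the median is the average of the two middle values.
even_median = statistics.median([1, 2, 3, 4])

print(mean_income, median_income, mode_income, even_median)
```

Here the mean is far above the median, which is why the median is usually preferred for skewed distributions.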

8
Types of Variables
  • Nominal
  • Dichotomous
  • Dummy
  • Ordinal
  • Ratio
  • Interval
  • Continuous
  • Independent
  • Dependent

Note that this list of variables is neither
exhaustive nor mutually exclusive.
9
Dependent variable
  • This is the political or social phenomenon we are
    interested in explaining.
  • It should be important.
  • And its explanation should matter to us.
  • There is only ONE dependent variable in any
    research project.
  • Dependent variables are also called endogenous
    variables.

10
Independent variables
  • These are the political or social phenomena we
    use to explain our dependent variable.
  • Logic can be used to defend why you believe that
    the independent variable causes the dependent
    variable.
  • There should be non-obvious, interesting and
    important implications from your conclusion.
  • Independent variables are also called exogenous
    variables. Sometimes this categorization is
    incorrect (more on that later).

11
Causal model
X → Y: the independent variable causes the dependent
variable.
For example: the percentage of people living in urban
areas causes female literacy.
What could be the units of analysis in this
example?
12
Predicting Perceptions of Fairness of a Supreme
Court Decision
[Diagram with two boxes: "Perception that the Process Should be Legalistic" and "Support for Bush v. Gore"]
13
Data
Country       pop  density  Urban  Religion
Austria      8000     94.0     58  Catholic
Belgium     10100    329.0     96  Catholic
Bosnia       4600     87.0     36  Muslim
Bulgaria     8900     79.0     68  Orthodox
Canada      29100      2.8     77  Catholic
Croatia      4900     85.0     51  Catholic
Czech Rep.  10400    132.0      .  Catholic
Denmark      5200    120.0     85  Protstnt
Finland      5100     39.0     60  Protstnt
France      58000    105.0     73  Catholic
Germany     81200    227.0     85  Protstnt
Iceland       263      2.5     91  Protstnt
14
Female literacy and urban density
[Plot of female literacy against people living in cities]
15
Nominal Variables
16
Frequency distribution of a nominal variable:
example from the survey of Beslan victims
. tab q11

The most important problem that caused the tragedy in Beslan
                                          Freq.   Percent      Cum.
-------------------------------------------------------------------
Corruption among border guards              152     13.84     13.84
Corruption among state officials            561     51.09     64.94
War in Chechnya                             100      9.11     74.04
Lack of consistency in actions of diffe      51      4.64     78.69
Refusal of federal authorities to negot      53      4.83     83.52
Inhuman actions of hostage takers            79      7.19     90.71
Mistakes of FSB and police forces            53      4.83     95.54
Refused                                      14      1.28     96.81
Unsure                                       35      3.19    100.00
-------------------------------------------------------------------
Total                                     1,098    100.00
17
Mean, median, mode
  • Is the mean of the table interesting?
  • What about the median?
  • What about the mode?

18
When you have nominal variables, create a dummy
variable
. generate mimpcorr = 0 if q11 < 97
. replace mimpcorr = 1 if (q11 == 1 | q11 == 2)
. tab mimpcorr

   mimpcorr      Freq.   Percent      Cum.
------------------------------------------
          0        336     32.03     32.03
          1        713     67.97    100.00
------------------------------------------
      Total      1,049    100.00

Why is the n 1,049?
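The same recoding can be sketched outside Stata. The Python below is an illustration only. It rebuilds the q11 responses from the frequencies in the table, and it assumes numeric codes 1 to 7 for the substantive answers in table order, 97 for "Refused" and 98 for "Unsure". The slides only imply that the last two codes are 97 or higher, so those code values are hypothetical.

```python
from collections import Counter

# Frequencies from the q11 table. The numeric codes are ASSUMED:
# 1-7 for the substantive answers in table order, 97 = Refused, 98 = Unsure.
q11_counts = {1: 152, 2: 561, 3: 100, 4: 51, 5: 53, 6: 79, 7: 53, 97: 14, 98: 35}
q11 = [code for code, n in q11_counts.items() for _ in range(n)]


def corruption_dummy(code):
    """1 for either corruption answer, 0 for other substantive answers,
    None (missing) for refused or unsure."""
    if code >= 97:
        return None
    return 1 if code in (1, 2) else 0


mimpcorr = [corruption_dummy(code) for code in q11]
valid = [value for value in mimpcorr if value is not None]
freq = Counter(valid)

print(freq[0], freq[1], len(valid))              # 336 713 1049
print(round(100 * sum(valid) / len(valid), 2))   # 67.97
```

The 49 refused and unsure responses never receive a value, so they drop out of the tabulation. Note also that the mean of a 0/1 dummy is the proportion of cases coded 1.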
19
Dichotomous Variables
20
Dichotomous Variables
  • Variables that only have two values.
  • Gender - male, female
  • Race - black, white
  • Agreement - yes, no
  • true, false
  • Value - high, low
  • war, no war
  • vote, no vote

21
Frequency: Are you a man or a woman?
. tab d1

     Gender      Freq.   Percent      Cum.
------------------------------------------
       Male        372     33.88     33.88
     Female        726     66.12    100.00
------------------------------------------
      Total      1,098    100.00

Note the n: 1,098
22
Mean, median, mode
  • Is the mean of the table interesting?
  • What about the median?
  • What about the mode?

23
Ordinal Measurement
  • With ordinal variables, there is a rough
    quantitative sense to their measurement, but the
    differences between scores are not necessarily
    equal.
  • The values are in order, but the distances between them are not fixed

24
Examples of Ordinal Measures
  • Rankings (1st, 2nd, 3rd, etc)
  • Grades (A, B, C, D, F)
  • Education (High School, College, Advanced degree)
  • Evaluations
  • High, medium, low
  • Likert scales
  • 5-point (strongly agree, agree, neither agree nor
    disagree, disagree, strongly disagree)
  • 7-point liberalism scale (strongly liberal,
    liberal, weakly liberal, moderate, weakly
    conservative, conservative, strongly conservative)

25
Naming concepts
  • You should name concepts so that the reader knows
    what is high and low

26
Acceptable concept names?
  • Racism
  • Perceived inequality
  • Support for equality
  • Culture
  • Institutions
  • Germany

27
Concept name?
  • Given what happened in Beslan, some people think
    that violence against Ingush is justifiable.
    Other people think that, despite the tragedy in
    Beslan, there is no justification for violence
    against Ingush. Which view is closer to your own?
    Do you feel this way strongly or only somewhat?
  • yes (strongly)
  • yes (somewhat)
  • no (somewhat)
  • no (strongly)

28
Concept name?
  • How proud are you to be a Russian citizen? very
    proud, rather proud, not very proud, not at all
    proud
  • For the following statement, do you strongly
    agree, somewhat agree, somewhat disagree, or
    strongly disagree: "I would rather be a citizen of
    Russia than of any other country in the world."

29
Concept name?
  • If you had some complaint about a national
    government activity and took that complaint to a
    member of the national government, do you think
    that he or she would pay a lot of attention, some
    attention, very little attention, no attention at
    all?

30
Frequency distribution: ordinal variable

Whether violence against Ingush is justifiable or not
                                          Freq.   Percent      Cum.
-------------------------------------------------------------------
Violence against Ingush is justifiable      191     17.40     17.40
Violence against Ingush is justifiable      126     11.48     28.87
There is no justification for violence      319     29.05     57.92
There is no justification for violence      155     14.12     72.04
Refused                                      99      9.02     81.06
Unsure                                      208     18.94    100.00
-------------------------------------------------------------------
Total                                     1,098    100.00
31
Mean, median, mode
  • Is the mean of the table interesting?
  • What about the median?
  • What about the mode?

32
Ratio Measurement
  • Ratio variables have fixed zero points.
  • A percentage is a ratio variable.
  • Ratio variables are usually continuous, but they
    need not be measured continuously; they may be
    measured discretely

33
Interval Measurement
  • Variables or measurements where the difference
    between values is measured by a fixed scale. Can
    be continuous or discrete.
  • Money
  • Number of people (population)
  • Age

34
What about income?
  • Income increases a dollar at a time
  • Distance between points seems fixed

[Diagram: a salary scale labeled low, medium, high and highest, with "graduate student salary" and "professor salary" marked on it]
35
Variables can be categorized based on their
relationship with another variable
  • If the impact of a variable on another variable
    is interval, then we say that its relationship is
    interval level
  • This means that the effect is the same,
    regardless of the value of the independent
    variable
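One way to write this down, assuming a straight-line form with a hypothetical intercept a and slope b (neither symbol appears in the slides), is:

```latex
y = a + b\,x
\quad\Longrightarrow\quad
\frac{\Delta y}{\Delta x} = b \ \text{ for every value of } x
```

A one-unit increase in the independent variable changes the dependent variable by the same amount b whether x starts low or high.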

36
Interval relationship
[Plot: luxury spending (axis from 0 to 40,000) against income (axis from 0 to 120,000)]
37
Income and luxury spending
38
Central Tendency, Variance and Standard Deviation
39
Units of analysis, populations, samples
  • Units of analysis are usually people, time or
    places, such as countries, cities or states
    (provinces). The number of units is called the
    number of observations, or n for short.
  • Population: the whole population of the states
    in the U.S., the population of people in the
    U.S., all the countries in the world
  • Sample: a sample drawn from the above kinds of populations

40
Statistics versus parameters
  • Parameters indicate attributes of populations
  • Statistics indicate attributes of samples
  • When we have a sample, we use statistics to make
    inferences about the population and therefore its
    parameters
  • Usually we do not know parameters
  • The study of statistics is the study of
    making inferences from sample statistics to
    population parameters
  • Greek symbols are usually used for parameters and
    Roman (Latin) letters are used for statistics

41
Expected Values and Probabilities
  • If you have a set of numbers called x =
    {1, 1, 2, 2, 3, 3}, what is the expected value?
  • What is P(2)? What is P(1)? What is P(3)?
  • If our x is {1, 1, 3, 3, 17}, then the expected value
    is 5, even though P(5) = 0.
  • Suppose we know that E(X) = 5 with the equation y
    = 5 + 7x.
  • What is E(Y)?
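The expected value of a finite set of numbers is each distinct value weighted by its probability (its relative frequency). A minimal Python sketch, for illustration only, with helper names that are hypothetical and not from the slides:

```python
from collections import Counter
from fractions import Fraction


def probabilities(values):
    """Relative frequency of each distinct value, as exact fractions."""
    counts = Counter(values)
    return {value: Fraction(count, len(values)) for value, count in counts.items()}


def expected_value(values):
    """Sum of each distinct value times its probability."""
    return sum(value * p for value, p in probabilities(values).items())


x1 = [1, 1, 2, 2, 3, 3]
x2 = [1, 1, 3, 3, 17]

print(probabilities(x1))            # each of 1, 2 and 3 has probability 1/3
print(expected_value(x1))           # 2
print(expected_value(x2))           # 5
print(probabilities(x2).get(5, 0))  # 0: the value 5 never occurs in x2
```

The second set shows that the expected value does not have to be a value that can actually occur.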

42
Variance or Dispersion
  • Variance is the spread about the mean
  • Why do we care about variance?
  • Variance in rights protections
  • Variance in election outcomes
  • Variance in the presence of genocide across
    countries
  • Variance in income inequality
  • Variance in economic growth
  • Variance in revolution

43
Measures of Dispersion
  • The Range
  • Range = highest value - lowest value
  • If the temperature on a day around the middle of
    September runs from 40F to 85F, the range is
    45 degrees
  • Uses only two pieces of information

44
The Deviation about the Mean
  • Indicates how far a value is from the center.

45
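Two sets of numbers and notation
Place in the set    First set    Second set
1                   24           31
2                   27           7
3                   21           12
4                   22           18
Notation: Xi is one number in the set, the subscript i is its place in the set, and Xmean is the mean of the set.
First set: X1 = 24, X2 = 27, Xmean = 23.5
Second set: X1 = 31, X2 = 7, Xmean = 17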
46
The average of the deviations
  • So does it make sense to calculate all of the
    deviations and find their average?
  • This would seem to give us a measure of the
    typical amount any given data point might vary.

47
The Average Deviation
  • Does the average of the deviations make sense?

48
Calculating the Average Deviation
Xi    Xi - Xmean
1     1 - 3 = -2
2     2 - 3 = -1
3     3 - 3 = 0
4     4 - 3 = 1
5     5 - 3 = 2
Sum of Xi = 15; Xmean = 15/5 = 3.0; sum of the deviations = ?
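The question raised by this table can be checked numerically. The short Python sketch below (illustrative only; the slides use Stata) computes the deviations for the values 1 to 5 and averages them.

```python
values = [1, 2, 3, 4, 5]
mean = sum(values) / len(values)

# Deviation of each value from the mean: negative below it, positive above it.
deviations = [x - mean for x in values]
average_deviation = sum(deviations) / len(deviations)

print(deviations)         # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sum(deviations))    # 0.0
print(average_deviation)  # 0.0
```

The negative and positive deviations cancel exactly, so the plain average of the deviations is zero for any data set and says nothing about spread.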
49
Fixing these deviant measures
  • To represent variation about the mean, we have to
    calculate deviations as positive numbers
  • We must get rid of the minus signs in a
    mathematically acceptable manner.

50
Variance
  • Square the deviations to remove minus signs, then
    sum them
  • Read above as the sum of squared deviations from
    the mean
  • The units of variance are the squares of the original units
  • Note that the sigma indicates a population
    parameter, not a statistic

51
The standard deviation
  • Take the square root to return to the original
    scale
  • Read above as the square root of the sum of
    squared deviations from the mean
  • Note that the sigma indicates population
    parameter, not a statistic
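The formula images for the variance and standard deviation slides did not survive in this transcript. A reconstruction consistent with the bullet text and with the worked example (which divides by the number of cases) would be the following, where N (the number of cases) and \mu (the population mean, written Xmean in the worked example) are notation assumed here:

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N},
\qquad
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}
```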

52
Calculating the standard deviation
Xi    Xi - Xmean    (Xi - Xmean)^2
1     1 - 3 = -2    4
2     2 - 3 = -1    1
3     3 - 3 = 0     0
4     4 - 3 = 1     1
5     5 - 3 = 2     4
Sum (Σ) of Xi = 15; Mean = 15/5 = 3.0
Sum (Σ) of the deviations = 0
Sum (Σ) of the squared deviations = 10
√(10/5) = σ = 1.414
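The calculation in this table can be reproduced step by step. The Python below is an illustrative sketch (not the Stata used elsewhere in the slides); it divides by the number of cases, so it is the population form of the standard deviation.

```python
import math
import statistics

values = [1, 2, 3, 4, 5]
mean = sum(values) / len(values)

# Squaring removes the minus signs; the square root returns to the original scale.
squared_deviations = [(x - mean) ** 2 for x in values]
population_variance = sum(squared_deviations) / len(values)
population_sd = math.sqrt(population_variance)

print(sum(squared_deviations))  # 10.0
print(population_variance)      # 2.0
print(round(population_sd, 3))  # 1.414

# Cross-check against the standard library's population standard deviation.
print(round(statistics.pstdev(values), 3))
```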
53
The Variance
  • Variance, the average of the squared deviations,
    has some utility as well.
  • Variance is what we seek to explain!

54
Population measures
  • The formula for the standard deviation is not
    quite as I described for samples.
  • It turns out that the standard deviation is
    biased in small samples.
  • The estimate is a little too small in small
    samples.
  • Thus we designate whether we are using population
    or sample data.

55
Population vs. Sample Standard Deviations
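The formulas for this slide are missing from the transcript. The standard textbook pair, offered here as a likely reconstruction and not as the slide's exact content, divides by N for a population and by n - 1 for a sample (the symbols \mu, \bar{X}, N and n are assumed notation):

```latex
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}
\qquad\text{versus}\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}
```

Dividing by n - 1 makes the sample estimate slightly larger, which offsets the tendency of the estimate to be too small in small samples.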
56
Calculating variance: An example
57
Observation LOVAR HIVAR
X1 1 -100
X2 1 -100
X3 2 10
X4 3 100
X5 3 100
X6 1 -100
X7 1 -100
X8 2 10
X9 3 100
X10 3 100
X11 1 -100
X12 1 -100
X13 2 10
X14 3 100
X15 3 100
X16 1 -100
X17 2 10
X18 3 100
X19 3 100
X20 1 -100
Two variables: LOVAR and HIVAR. Which varies more?
58
Stata syntax: summarize hivar, detail

                            hivar
-------------------------------------------------------------
      Percentiles      Smallest
 1%         -100          -100
 5%         -100          -100
10%         -100          -100       Obs                  20
25%         -100          -100       Sum of Wgt.          20

50%           10                     Mean                  2
                        Largest      Std. Dev.      91.85801
75%          100           100
90%          100           100       Variance       8437.895
95%          100           100       Skewness      -.0667475
99%          100           100       Kurtosis       1.248016

59
Stata syntax: summarize lovar, detail

                            lovar
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1             1
 5%            1             1
10%            1             1       Obs                  20
25%            1             1       Sum of Wgt.          20

50%            2                     Mean                  2
                        Largest      Std. Dev.      .9176629
75%            3             3
90%            3             3       Variance       .8421053
95%            3             3       Skewness              0
99%            3             3       Kurtosis           1.25

60
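Variance of sample LOVAR
  • Sum of squared deviations from the mean / (n - 1), with one term for each of the n = 20 observations

\[
\begin{aligned}
s^2 = \frac{1}{19}\Bigl[\; & (1-2)^2 + (1-2)^2 + (2-2)^2 + (3-2)^2 + (3-2)^2 \\
 + {} & (1-2)^2 + (1-2)^2 + (2-2)^2 + (3-2)^2 + (3-2)^2 \\
 + {} & (1-2)^2 + (1-2)^2 + (2-2)^2 + (3-2)^2 + (3-2)^2 \\
 + {} & (1-2)^2 + (2-2)^2 + (3-2)^2 + (3-2)^2 + (1-2)^2 \;\Bigr]
 = \frac{16}{19} \approx 0.8421
\end{aligned}
\]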
61
Standard Deviation
Square root of the sum of squared deviations from
the mean / (n - 1)

\[
s = \sqrt{\frac{\sum_{i=1}^{20} (X_i - 2)^2}{19}} = \sqrt{\frac{16}{19}} \approx 0.9177
\]
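The hand calculation for LOVAR, and the same calculation for HIVAR, can be checked against the Stata output. The Python below is an illustrative sketch with the 20 observations copied from the data slide; `sample_variance` is a hypothetical helper name.

```python
import statistics

# LOVAR and HIVAR for observations X1..X20, copied from the data slide.
lovar = [1, 1, 2, 3, 3, 1, 1, 2, 3, 3, 1, 1, 2, 3, 3, 1, 2, 3, 3, 1]
hivar = [-100, -100, 10, 100, 100, -100, -100, 10, 100, 100,
         -100, -100, 10, 100, 100, -100, 10, 100, 100, -100]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


for name, data in (("lovar", lovar), ("hivar", hivar)):
    variance = sample_variance(data)
    print(name, statistics.mean(data), round(variance, 4), round(variance ** 0.5, 4))
```

Both variables have a mean of 2, but the variance of HIVAR is about 8437.9 against about 0.84 for LOVAR, matching the Variance and Std. Dev. lines in the two summarize outputs.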
62
Plot of HIVAR and LOVAR
. plot hivar lovar
[Text-mode plot: hivar on the vertical axis (-100 to 100) against lovar on the horizontal axis (1 to 3)]