Title: Data Analysis (Quantitative Methods)
1Data Analysis (Quantitative Methods)
- Lecture 1
- Fundamental Statistics
2What is statistics
- Statistics is the science of data. This involves
collecting, classifying, summarizing, organizing,
analyzing, and interpreting numerical information.
3Dealing with Data
- Measurement Scales
- Descriptive Statistics
- Inferential Statistics
4Dealing with Data
- Quantitative data are measurements that are
recorded on a naturally occurring numerical
scale.
- Qualitative data are measurements that cannot be
measured on a natural numerical scale; they can
only be classified into one of a group of
categories.
5Levels of Measurement.
- Nominal Scale (qualitative category membership,
e.g. gender, eye colour, nationality).
- Ordinal Scale (ranks or positions in a group,
e.g. 1st, 2nd, 3rd).
- Interval and Ratio Scales (measured on an
independent scale with units, e.g. the I.Q. scale.
A ratio scale also has an absolute zero point, e.g.
distance, the Kelvin scale).
6Variables
- A variable is a characteristic or property of an
individual population unit (the population being the
set of units we are interested in studying).
- Discrete Variables: there are no possible values
between adjacent units on the scale. For
example, the number of children in a family. X1, X2,
..., Xn
- Continuous Variables: a continuous variable
theoretically can have an infinite number of
values between adjacent units on the scale. For
example, time, height, weight. X ∈ [0, 100],
[0, 30), (12, 80], (1, 2)
7Descriptive Statistics
Descriptive statistics utilizes numerical and
graphical methods to look for patterns in a data
set, to summarize the information revealed in a
data set, and to represent that information in a
convenient form.
- Graphical Representation of Data
- Measures of Central Tendency
- Measures of Dispersion
8Representing Data Graphically
- Bar Charts
- Histograms
- Pie Charts
- Scattergrams
9The Bar Chart
- Used for Discrete variables
- Bars are separated
10Histogram
- Columns can only represent frequencies.
- All categories represented.
- Columns are not spaced apart.
11Pie Chart
- Used to illustrate percentages
12Scattergrams - Positive Relationships
13Negative Correlation
14No Relationship
15Measures of Central Tendency
- The Mean
- The Median
- The Mode
16The Mean
Mean = sum of all values in a group divided by
the number of values in that group. So if 5
people took 135, 109, 95, 121, 140 seconds to
solve an anagram, the mean time taken is
(135 + 109 + 95 + 121 + 140) / 5 = 600 / 5 = 120 seconds
17The Mean Pros & Cons
- Advantages
- Very Sensitive Measure.
- Forms the basis of most tests used in inferential
statistics.
- Disadvantages
- Can be affected by outlying scores, e.g.
- 135, 109, 95, 121, 140, 480: mean = 1080/6 = 180
seconds.
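The effect of the outlier can be checked with a short Python sketch. The variable names are illustrative, and the standard-library `statistics` module is an assumed tool rather than part of the lecture.

```python
from statistics import mean

# Anagram solution times in seconds, as on the slides
times = [135, 109, 95, 121, 140]
mean_times = mean(times)                      # 600 / 5

# Add one outlying score of 480 seconds
times_with_outlier = times + [480]
mean_with_outlier = mean(times_with_outlier)  # 1080 / 6

print("Mean without outlier:", mean_times)
print("Mean with outlier:   ", mean_with_outlier)
```

A single extreme score moves the mean from 120 to 180 seconds, well above five of the six observations.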
18The Median
The median is the central value of a set of
numbers that are placed in numerical order.
For an odd set of numbers, 95, 109, 121, 135, 140,
the median is 121.
For an even set of numbers, 95, 109, 121, 135,
140, 480, the median is the sum of the two central
scores divided by 2: (121 + 135)/2 = 128
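A minimal sketch of the odd and even cases, assuming Python's standard `statistics.median` (which sorts the data and averages the two central values when the count is even):

```python
from statistics import median

odd_set = [135, 109, 95, 121, 140]   # five scores, unsorted on purpose
even_set = odd_set + [480]           # six scores, including the outlier

median_odd = median(odd_set)         # central value of the sorted scores
median_even = median(even_set)       # average of the two central values

print("Median (odd set): ", median_odd)
print("Median (even set):", median_even)
```

Unlike the mean, the median moves only from 121 to 128 when the outlying score of 480 is added.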
19The Median Pros & Cons
- Advantages
- Easier and quicker to calculate than the mean.
- Unaffected by extreme values.
- Disadvantages
- Doesn't take into account the exact values of
each item.
- If values are few it can be unrepresentative, e.g.
- 2, 3, 5, 98, 112: the median is 5
20The Mode
The Mode = the most frequently occurring value.
1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6,
7, 7, 7, 8
The Mode = 5
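One way to find the mode is to count how often each value occurs. The sketch below uses `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

values = [1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 7, 7, 7, 8]

# Count occurrences of each value, then take the most frequent one
counts = Counter(values)
mode_value, mode_count = counts.most_common(1)[0]

print("Mode:", mode_value, "occurring", mode_count, "times")
```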
21The Mode Pros & Cons
- Disadvantages
- Doesn't take into account the value of each item.
- Not useful for small sets of data
- Advantages
- Shows the most common (most typical) value of a set.
- Unaffected by extreme values
22Data Types and Central Tendency Measures.
The Mode may also be used on Ordinal and Interval
Data. The median may also be used on Interval
Data.
23Why look at dispersion?
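- 17, 32, 34, 58, 69, 70, 98, 142
- Mean = 65
- 61, 62, 64, 65, 65, 66, 68, 69
- Mean = 65
- Both sets have the same mean, but the first is far
more spread out, so the mean alone does not describe the data.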
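The two data sets can be compared numerically. This sketch assumes the population standard deviation (defined later in the lecture) as the measure of spread and uses Python's `statistics` module; the names `wide` and `narrow` are illustrative.

```python
from statistics import mean, pstdev

wide = [17, 32, 34, 58, 69, 70, 98, 142]
narrow = [61, 62, 64, 65, 65, 66, 68, 69]

# Same central tendency ...
mean_wide, mean_narrow = mean(wide), mean(narrow)

# ... but very different spread around the mean
sd_wide, sd_narrow = pstdev(wide), pstdev(narrow)

print("Means:", mean_wide, mean_narrow)
print("Standard deviations:", round(sd_wide, 2), round(sd_narrow, 2))
```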
24Measures of Dispersion
- The Range
- Variance
- The Standard Deviation
25The Range
The Range is the difference between the highest
and the lowest scores.
Range = highest score - lowest score
4, 10, 5, 12, 6, 14: Range = 14 - 4 = 10
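As a quick check of the arithmetic, the range can be computed with Python's built-in `max` and `min` (a sketch; the variable names are illustrative):

```python
scores = [4, 10, 5, 12, 6, 14]

# Range = highest score - lowest score
score_range = max(scores) - min(scores)

print("Range:", score_range)
```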
26Population
- A population is a set of units (usually people,
objects, transactions, or events) that we are
interested in studying.
27Sample
- A sample is a subset of the units of a
population.
- A statistical inference is an estimate,
prediction, or some other generalization about a
population based on information contained in a
sample.
28Variance
Population Variance
Sample Variance
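In standard textbook notation (assumed here: x_i are the observations, μ and N the population mean and size, x̄ and n the sample mean and size), the two variances are usually written as follows. Note the n - 1 divisor in the sample version.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2
\qquad\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
```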
29Calculating the Standard Deviation
Population standard deviation
Sample standard deviation
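The standard deviation is the square root of the corresponding variance. The sketch below assumes the usual definitions (divide by N for a population, by n - 1 for a sample) and reuses the scores from the range slide; the function names are hypothetical.

```python
import math

def population_variance(xs):
    """Mean squared deviation from the mean, dividing by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Squared deviations from the sample mean, dividing by n - 1."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

scores = [4, 10, 5, 12, 6, 14]

pop_var = population_variance(scores)
samp_var = sample_variance(scores)

# Standard deviation = square root of the variance
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print("Population variance and SD:", round(pop_var, 3), round(pop_sd, 3))
print("Sample variance and SD:    ", round(samp_var, 3), round(samp_sd, 3))
```

The sample variance is always the larger of the two for the same data, because the same sum of squares is divided by a smaller number.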
30Inferential Statistics
- Inferential statistics utilizes sample data to
make estimates, decisions, predictions, or other
generalizations about a larger set of data.
- Inferential statistics allows us to draw
conclusions about populations, and to test
research hypotheses.
- Inferential statistics involves:
- Probability
- Statistical tests, e.g. the t test.
31Summary
- All data are measured on either Nominal, Ordinal,
Interval or Ratio Scales.
- Variables can be discrete or continuous.
- Descriptive Statistics such as measures of
central tendency and dispersion are used to
describe or characterise data.
- Inferential Statistics is used to make inferences
from sample data about the population at large.
32Data Analysis(Quantitative Methods)
33Probability
- An experiment is an act or process of observation
that leads to a single outcome that cannot be
predicted with certainty.
- A sample point is the most basic outcome of an
experiment: Ob1, Ob2, ..., Obn.
34Sample Space Venn Diagram
- The sample space of an experiment is the
collection of all its sample points.
- S = {Ob1, Ob2, ..., Obn}
- Venn diagrams.
- Graphical representations.
[Venn diagram: sample points Ob1, Ob2, ..., Obn inside the sample space S]
35Examples
- Experiment: Observe the up face on a coin.
- Sample Space: 1. Observe a head (H).
- 2. Observe a tail (T).
- S = {H, T}
[Venn diagram: sample points H and T inside S]
36Examples
- Experiment: Observe the up face on a die.
- Sample Space: 1. Observe a 1.
- 2. Observe a 2.
- 3. Observe a 3.
- 4. Observe a 4.
- 5. Observe a 5.
- 6. Observe a 6.
- S = {1, 2, 3, 4, 5, 6}
[Venn diagram: sample points 1, 2, 3, 4, 5, 6 inside S]
37Examples
- Experiment: Observe the up faces on two coins.
- Sample Space: 1. Observe HH.
- 2. Observe HT.
- 3. Observe TH.
- 4. Observe TT.
- S = {HH, HT, TH, TT}
[Venn diagram: sample points HH, HT, TH, TT inside S]
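For larger experiments the sample space is tedious to list by hand. A minimal sketch, assuming Python's `itertools.product` as the enumeration tool, reproduces the two-coin sample space and extends to any number of tosses by changing `repeat`:

```python
from itertools import product

faces = ["H", "T"]

# Every ordered combination of faces over two coins
sample_space = ["".join(outcome) for outcome in product(faces, repeat=2)]

print("S =", sample_space)
```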
38Probability Rules for Sample Points
- All sample point probabilities must lie between
0 and 1.
- The probabilities of all the sample points within
a sample space must sum to 1.
39Probability
- An event is a specific collection of sample
points.
- Example: Consider the experiment of tossing two
balanced coins.
- Events:
- A: {Observe exactly one head}
- B: {Observe at least one head}
40Probability
- Sample point: Probability
- HH: 1/4
- HT: 1/4
- TH: 1/4
- TT: 1/4
- P(A) = P(HT) + P(TH) = 1/2
- P(B) = P(HH) + P(TH) + P(HT) = 3/4
41Probability of an Event
- The probability of an event A is calculated by
summing the probabilities of the sample points in
the sample space for A.
42Steps for Calculating Probabilities of Events
- Define the experiment.
- List the sample points: Ob1, Ob2, ..., Obn.
- Assign probabilities to the sample points:
- P(Ob1), ..., P(Obn).
- Determine the collection of sample points
contained in the event of interest.
- Sum the sample point probabilities to get the
event probability.
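The five steps can be sketched in Python for the two-coin experiment. The helper `event_probability` and the variable names are hypothetical, and equal sample point probabilities are assumed because the coins are balanced.

```python
from fractions import Fraction
from itertools import product

# Steps 1-2: define the experiment (toss two coins) and list the sample points
sample_points = ["".join(p) for p in product("HT", repeat=2)]

# Step 3: assign probabilities (balanced coins, so all points are equally likely)
prob = {pt: Fraction(1, len(sample_points)) for pt in sample_points}

# Step 4: determine the sample points contained in each event
event_a = {pt for pt in sample_points if pt.count("H") == 1}   # exactly one head
event_b = {pt for pt in sample_points if pt.count("H") >= 1}   # at least one head

# Step 5: sum the sample point probabilities
def event_probability(event, prob):
    return sum(prob[pt] for pt in event)

p_a = event_probability(event_a, prob)
p_b = event_probability(event_b, prob)

print("P(A) =", p_a)
print("P(B) =", p_b)
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand calculation.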
43Unions and intersections
- The union of two events A and B is the event that
occurs if either A or B or both occur on a single
performance of the experiment, denoted by the
symbol A ∪ B.
44Unions and intersections
- The intersection of two events A and B is the
event that occurs if both A and B occur on a
single performance of the experiment, denoted by
the symbol A ∩ B.
- Its probability is written P(A ∩ B).
45Unions and intersections
[Venn diagrams: events A and B]
46Unions and intersections
- Example 1.
- Consider the die-toss experiment. Define the
following events:
- A: {Toss an even number}
- B: {Toss a number less than or equal to 3}
- Find P(A ∪ B) and P(A ∩ B).
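For the die-toss events A (even number) and B (number less than or equal to 3), the union and intersection can be worked out with Python sets. This is a sketch that assumes a fair die, so every face has probability 1/6; the helper `prob` is hypothetical.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

A = {x for x in sample_space if x % 2 == 0}   # toss an even number
B = {x for x in sample_space if x <= 3}       # toss a number <= 3

union = A | B          # points in A or B or both
intersection = A & B   # points in both A and B

def prob(event):
    # Assumes a fair die: each sample point has probability 1/6
    return Fraction(len(event), len(sample_space))

p_union = prob(union)
p_intersection = prob(intersection)

print("A ∪ B =", sorted(union), " P =", p_union)
print("A ∩ B =", sorted(intersection), " P =", p_intersection)
```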
47Unions and intersections
48Complementary Events
- The complement of an event A is the event that A
does not occur, that is, the event consisting
of all sample points that are not in event A. It is
denoted by the symbol A^c.
- P(A) + P(A^c) = 1
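The complement is often the easier event to count. As a sketch for two balanced coins, "at least one head" is the complement of "no heads", so its probability is 1 minus the probability of TT. The names below are illustrative.

```python
from fractions import Fraction

sample_space = {"HH", "HT", "TH", "TT"}

B = {pt for pt in sample_space if "H" in pt}   # at least one head
B_c = sample_space - B                         # complement: no heads

def prob(event):
    # Assumes balanced coins: four equally likely sample points
    return Fraction(len(event), len(sample_space))

p_b = prob(B)
p_b_complement = prob(B_c)

print("P(B) =", p_b, " P(B^c) =", p_b_complement)
print("P(B) via complement:", 1 - p_b_complement)
```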
49Probability
- Additive Rule of Probability
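In its standard form (stated here from general probability theory, since the slide gives only the title), the additive rule subtracts the intersection so that sample points in both events are not counted twice. For the die-toss events A (even) and B (at most 3), assuming a fair die, it gives 1/2 + 1/2 - 1/6 = 5/6.

```latex
P(A \cup B) = P(A) + P(B) - P(A \cap B)
% Special case: if A and B are mutually exclusive, P(A \cap B) = 0, so
P(A \cup B) = P(A) + P(B)
```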
50Conditional Probability
- To find the conditional probability that event A
occurs given that event B occurs, divide the
probability that both A and B occur by the
probability that B occurs, that is,
P(A | B) = P(A ∩ B) / P(B)
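A minimal sketch of the conditional probability calculation, using the die-toss events A (even number) and B (number at most 3) and assuming a fair die. The helpers `prob` and `conditional` are hypothetical names.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # toss an even number
B = {1, 2, 3}   # toss a number <= 3

def prob(event):
    # Assumes a fair die: equally likely sample points
    return Fraction(len(event), len(sample_space))

def conditional(a, b):
    """P(a | b): probability of both a and b, divided by probability of b."""
    return prob(a & b) / prob(b)

p_a_given_b = conditional(A, B)

print("P(A | B) =", p_a_given_b)
```

Knowing that B occurred restricts attention to {1, 2, 3}, of which only the 2 is even, so the result is 1/3 rather than the unconditional P(A) = 1/2.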
51Probability
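[Tree diagram for tossing two coins: the first toss branches to H or T, and each branch splits again into H or T on the second toss, giving the outcomes HH, HT, TH, TT]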
52Independent Events
- Events A and B are independent events if the
occurrence of B does not alter the probability
that A has occurred; that is, events A and B are
independent if
- P(A | B) = P(A)
- When events A and B are independent, it is also
true that P(B | A) = P(B).
- Events that are not independent are said to be
dependent.
53Probability
- Probability of Intersection of Two Independent
Events
- If events A and B are independent, the
probability of the intersection of A and B equals
the product of the probabilities of A and B, that
is, P(A ∩ B) = P(A) P(B).
- The converse is also true:
- if P(A ∩ B) = P(A) P(B), then A and B are
independent.
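The product rule gives a simple test for independence. The sketch below applies it to two balanced coins; the helper names and the choice of events are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample points are tuples such as ("H", "T"): (first coin, second coin)
sample_space = set(product("HT", repeat=2))

def prob(event):
    # Assumes balanced coins: equally likely sample points
    return Fraction(len(event), len(sample_space))

def independent(a, b):
    """True when P(a ∩ b) equals P(a) P(b)."""
    return prob(a & b) == prob(a) * prob(b)

first_head = {pt for pt in sample_space if pt[0] == "H"}
second_head = {pt for pt in sample_space if pt[1] == "H"}
exactly_one = {pt for pt in sample_space if pt.count("H") == 1}
at_least_one = {pt for pt in sample_space if pt.count("H") >= 1}

print("First head vs second head independent?", independent(first_head, second_head))
print("Exactly one vs at least one head independent?", independent(exactly_one, at_least_one))
```

The two separate coins pass the test (1/4 = 1/2 × 1/2), while "exactly one head" and "at least one head" fail it (1/2 ≠ 1/2 × 3/4), so those two events are dependent.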
54Exercise
- Calculate the mode, mean, and median of the
following data:
- (1) 12, 13, 15, 18, 12, 56, 13, 17, 19, 20, 35,
36
- (2) 35, 23, 18, 26, 35, 23, 39, 45, 47, 37, 23,
35, 19
55Exercise
- Calculate the range, variance and standard
deviation of the following data:
- (1) 2, 3, 1, 6, 8, 5, 9, 4, 5
- (2) 2, 0, 8, 4, 7, 5, 3, 2, 100
56Exercise
- Two fair coins are tossed and the following
events are defined:
- A: {Observed at least one head}
- B: {Observed exactly one head}
- C: {Observed exactly one tail}
- D: {Observed at most one head}
- Find P(A), P(B ? D), P(A | D)
57Exercise
- Use a tree diagram to obtain the sample space of an
experiment that consists of a fair coin being
tossed four times. Consider the following
events:
- A: All four results are the same.
- B: Exactly one head occurs.
- C: At least two heads occur.
- Find P(A), P(B), P(C), P(A)P(B)P(C), P(A? C),
P(A? B)
- Hence, explain why the events A, B and C are
not all mutually exclusive.
58Exercise
- Let P(A) = 0.7, P(B) = 0.5 and P(A ∪ B) = 0.8.
- Find (1) P(A ∩ B) (2) P(B | A)
(3) Is event A independent of event B?
59References
- Statistics, 8th Edition
- McClave and Sincich
- Prentice Hall, 2000.