Title: STAT 551 PROBABILITY AND STATISTICS I
1. STAT 551 PROBABILITY AND STATISTICS I
2. WHAT IS STATISTICS?
- Statistics is the science of collecting, organizing, and describing data and drawing conclusions from it. That is, statistics is a way to get information from data. It is the science of uncertainty.
3. WHAT IS STATISTICS?
- A pharmaceutical CEO wants to know whether a new drug is superior to existing drugs and what side effects it may have.
- How fuel efficient is a certain car model?
- Is there any relationship between your GPA and employment opportunities?
- Actuaries want to identify risky customers for insurance companies.
4. STEPS OF STATISTICAL PRACTICE
- Preparation: Set clearly defined goals and questions of interest for the investigation.
- Data collection: Make a plan of which data to collect and how to collect it.
- Data analysis: Apply appropriate statistical methods to extract information from the data.
- Data interpretation: Interpret the information and draw conclusions.
5. STATISTICAL METHODS
- Descriptive statistics include the collection, presentation, and description of numerical data.
- Inferential statistics include making inferences and decisions from the collected data by using appropriate statistical methods.
- Model building includes developing prediction equations to understand a complex system.
6. BASIC DEFINITIONS
- POPULATION: The collection of all items of interest in a particular study.
- SAMPLE: A set of data drawn from the population; a subset of the population available for observation.
- PARAMETER: A descriptive measure of the population, e.g., the mean.
- STATISTIC: A descriptive measure of a sample.
- VARIABLE: A characteristic of interest about each element of a population or sample.
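The distinction between a parameter and a statistic can be illustrated with a minimal sketch. The population below is synthetic (hypothetical GPA values generated for illustration only), and the names `population`, `sample`, `population_mean`, and `sample_mean` are assumptions, not part of the course material.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: GPAs of 1,000 students (synthetic values).
population = [round(random.uniform(2.0, 4.0), 2) for _ in range(1000)]

# PARAMETER: a descriptive measure of the whole population.
population_mean = statistics.mean(population)

# SAMPLE: a subset of the population that we actually observe.
sample = random.sample(population, 50)

# STATISTIC: the same descriptive measure, computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.3f}")
print(f"statistic (sample mean):     {sample_mean:.3f}")
```

In practice the parameter is usually unknown because the whole population cannot be observed; the statistic is what we can actually compute.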
7. EXAMPLE
Population | Unit | Sample | Variable
All students currently enrolled in school | Student | Any department | GPA; Hours of work per week
All books in library | Book | Statistics books | Replacement cost; Frequency of check out; Repair needs
All campus fast food restaurants | Restaurant | Burger King | Number of employees; Seating capacity; Hiring/Not hiring
Note that some samples are not representative of the population and shouldn't be used to draw conclusions about the population. In the first example, some students from all (or almost all) departments would constitute a better sample.
8. How not to run a presidential poll
- For the 1936 election, the Literary Digest picked names at random out of telephone books in some cities and sent these people some ballots, attempting to predict the election results, Roosevelt versus Landon, by the returns. Now, even if 100% returned the ballots, even if all told how they really felt, even if all would vote, even if none would change their minds by election day, still this method could be (and was) in trouble: they estimated a conditional probability, using the part of the American population that had phones, and that part was not typical of the total population. (Dudewicz & Mishra, 1988)
9. STATISTIC
- A statistic (or estimator) is any function of the random variables (r.v.s) of a random sample (r.s.) that does not contain any unknown quantity.
- E.g., [formulas missing] are statistics; [formulas missing] are NOT.
- Any observed or particular value of an estimator is an estimate.
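As an illustration (standard textbook examples, assumed here rather than taken from the slide), let $X_1, \dots, X_n$ be a random sample from a distribution whose mean $\mu$ and standard deviation $\sigma$ are unknown. Then:

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\sum_{i=1}^{n} X_i, \qquad
\min(X_1, \dots, X_n)
\quad \text{are statistics;}

\frac{\bar{X} - \mu}{\sigma}, \qquad
\sum_{i=1}^{n} (X_i - \mu)^2
\quad \text{are not, because they involve the unknown } \mu \text{ and } \sigma.
```

If a sample gives $\bar{x} = 3.2$, then 3.2 is an estimate, while $\bar{X}$ itself is the estimator.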
10. RANDOM VARIABLES
- Variables whose observed value is determined by chance.
- A r.v. is a function defined on the sample space S that associates a real number with each outcome in S.
- R.v.s are denoted by uppercase letters, and their observed values by lowercase letters.
- Example: Consider the random variable X, the number of brown-eyed children born to a couple heterozygous for eye color (each with genes for both brown and blue eyes). If the couple is assumed to have 2 children, X can assume any of the values 0, 1, or 2. The variable is random in that brown eyes depend on the chance inheritance of a dominant gene at conception. If for a particular couple there are two brown-eyed children, we have x = 2.
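The eye-color example can be made concrete with a short sketch that lists the possible values of X together with their probabilities. It rests on assumptions the slide does not state: each parent passes on either allele with probability 1/2, brown is dominant, and the two children are independent. The names `alleles`, `p_brown`, and `pmf` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumption (not stated on the slide): each heterozygous parent passes on the
# brown allele "B" or the blue allele "b" with probability 1/2, and brown is
# dominant, so a child is brown-eyed unless it inherits "b" from both parents.
alleles = ["B", "b"]
genotypes = list(product(alleles, repeat=2))  # 4 equally likely genotypes
p_brown = Fraction(sum("B" in g for g in genotypes), len(genotypes))

# X = number of brown-eyed children among 2 children (assumed independent).
pmf = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
for child1, child2 in product([True, False], repeat=2):  # True = brown-eyed
    prob = (p_brown if child1 else 1 - p_brown) * (p_brown if child2 else 1 - p_brown)
    pmf[child1 + child2] += prob

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```

Under these assumptions the uppercase X denotes the random variable and each lowercase x in {0, 1, 2} is one of its possible observed values.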
11. COLLECTING DATA
- Target Population: The population about which we want to draw inferences.
- Sampled Population: The actual population from which the sample has been taken.
12. SAMPLING PLAN
- Simple Random Sample (SRS): All possible samples of the same size are equally likely to be selected, so every member of the population has the same chance of selection.
- Stratified Sampling: The population is separated into mutually exclusive sets (strata), and then the sample is drawn by taking a simple random sample from each stratum.
- Convenience Sample: It is obtained by selecting individuals or objects without systematic randomization.
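The three sampling plans can be sketched as follows. The population of 12 students and the department labels are hypothetical, and the sample sizes (6 overall, 2 per stratum) are arbitrary choices for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 12 students tagged with a department (the stratum).
population = [(f"student{i}", dept)
              for i, dept in enumerate(["STAT", "MATH", "ECON"] * 4)]

# Simple random sample: every subset of size 6 is equally likely.
srs = random.sample(population, 6)

# Stratified sample: split into mutually exclusive strata, then draw a
# simple random sample of size 2 from each stratum.
strata = {}
for unit in population:
    strata.setdefault(unit[1], []).append(unit)
stratified = [unit for members in strata.values()
              for unit in random.sample(members, 2)]

# Convenience sample: no randomization at all, e.g. the first 6 units listed.
convenience = population[:6]

print(srs, stratified, convenience, sep="\n")
```

The stratified sample is guaranteed to contain two students from every department, whereas the simple random sample may by chance over- or under-represent a department.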
14. EXAMPLE
- A politician who is running for the office of mayor of a city with 25,000 registered voters runs a survey. In the survey, 48% of the 200 registered voters interviewed say they plan to vote for her.
- What is the population of interest?
  Answer: The political choices of the 25,000 registered voters.
- What is the sample?
  Answer: The political choices of the 200 voters interviewed.
- Is the value 48% a parameter or a statistic?
  Answer: Statistic.
15. EXAMPLE
- A manufacturer of computer chips claims that less than 10% of his products are defective. When 1000 chips were drawn from a large production run, 7.5% were found to be defective.
- What is the population of interest?
  Answer: The complete production run for the computer chips.
- What is the sample?
  Answer: The 1000 chips.
- What is the parameter?
  Answer: The proportion of all the chips that are defective.
- What is the statistic?
  Answer: The proportion of sample chips that are defective.
- Does the value 10% refer to a parameter or a statistic?
  Answer: Parameter.
- Explain briefly how the statistic can be used to make inferences about the parameter to test the claim.
  Answer: Because the sample proportion is less than 10%, we can conclude that the claim may be true.
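One common way to quantify how far the sample proportion lies below the claimed 10% is a z-score for a proportion. The slide does not name a method, so this is an assumption offered as a sketch only; the variable names are hypothetical.

```python
import math

n = 1000          # sample size (from the example)
p_claimed = 0.10  # boundary of the manufacturer's claim (the parameter value)
p_hat = 0.075     # sample proportion of defective chips (the statistic)

# Standard error of the sample proportion if the true proportion were 10%.
standard_error = math.sqrt(p_claimed * (1 - p_claimed) / n)

# How many standard errors the statistic lies from the claimed boundary.
z = (p_hat - p_claimed) / standard_error

print(f"sample proportion = {p_hat:.3f}")
print(f"z = {z:.2f}")
```

A clearly negative z means the observed proportion sits well below 10% relative to sampling variability, which is consistent with the answer that the claim may be true.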
16. DESCRIPTIVE STATISTICS
- Descriptive statistics involves the arrangement, summary, and presentation of data to enable meaningful interpretation and to support decision making.
- Descriptive statistics methods make use of
  - graphical techniques
  - numerical descriptive measures.
- The methods presented apply both to
  - the entire population
  - the sample
17. Types of data and information
- A variable: a characteristic of a population or sample that is of interest to us.
  - Cereal choice
  - Expenditure
  - The waiting time for medical services
- Data: the observed values of variables.
  - Interval and ratio data are numerical observations. In ratio data, the ratio of two observations is meaningful and the value 0 has a clear "none" interpretation. Example of ratio data: weight. Example of interval data: temperature.
  - Nominal data are categorical observations.
  - Ordinal data are ordered categorical observations.
18. Types of data examples
Examples of types of data

Quantitative
Continuous | Discrete
Blood pressure, height, weight, age | Number of children; Number of attacks of asthma per week

Categorical (Qualitative)
Ordinal (Ordered categories) | Nominal (Unordered categories)
Grade of breast cancer; Better, same, worse; Disagree, neutral, agree | Sex (Male/female); Alive or dead; Blood group O, A, B, AB
19. Types of data analysis
- Knowing the type of data is necessary to properly select the technique to be used when analyzing data.
- Types of descriptive analysis allowed for each type of data:
  - Numerical data: arithmetic calculations
  - Nominal data: counting the number of observations in each category
  - Ordinal data: computations based on an ordering process
20. Types of data - examples
Numerical data
Age | Income
55 | 75000
42 | 68000
. | .

Weight gain
10
5
.

Nominal data
Person | Marital status
1 | married
2 | single
3 | single
. | .

Computer | Brand
1 | IBM
2 | Dell
3 | IBM
. | .
21. Types of data - examples
Numerical data
Age | Income
55 | 75000
42 | 68000
. | .

Weight gain
10
5
.

Nominal data
A descriptive statistic for nominal data is the proportion of data that falls into each category.
Brand | IBM | Dell | Compaq | Other | Total
Count | 25 | 11 | 8 | 6 | 50
Percent | 50% | 22% | 16% | 12% |
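The proportions for the computer-brand data can be reproduced with a short sketch. The counts (25, 11, 8, 6 out of 50) come from the slide; the list `observations` is a hypothetical expansion of those counts into individual records.

```python
from collections import Counter

# Brand counts taken from the slide: 50 computers in total.
observations = ["IBM"] * 25 + ["Dell"] * 11 + ["Compaq"] * 8 + ["Other"] * 6

# For nominal data the permitted calculation is counting per category.
counts = Counter(observations)
total = sum(counts.values())

# Proportion of observations falling into each category.
proportions = {brand: count / total for brand, count in counts.items()}

for brand, share in proportions.items():
    print(f"{brand:7} {counts[brand]:3d} {share:.0%}")
```

Arithmetic such as averaging the brand labels would be meaningless here, which is why counting and proportions are the natural summaries for nominal data.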
22. Cross-Sectional/Time-Series/Panel Data
- Cross-sectional data is collected at a certain point in time.
  - Test scores in a statistics course
  - Starting salaries of an MBA program's graduates
- Time series data is collected over successive points in time.
  - Weekly closing price of gold
  - Amount of crude oil imported monthly
- Panel data is collected over successive points in time as well.
23. Differences
Feature | Cross-sectional | Time series | Panel
Change in time | Cannot measure | Can measure | Can measure
Properties of the series | No series | Long; usually just one or a few series | Short; hundreds of series
Measurement time | Measurement only at one time point; even if more than one time point, samples are independent from each other | Usually at regular time points (all series are taken at the same time points and time points are equally spaced) | Varies
Measurements | Response(s); time-independent covariates | Response(s); time; usually no covariate | Response(s); time; time-dependent and independent covariates
24. GAMES OF CHANCE
25. COUNTING TECHNIQUES
- Methods to determine how many subsets can be obtained from a set of objects are called counting techniques.
FUNDAMENTAL THEOREM OF COUNTING: If a job consists of k separate tasks, the i-th of which can be done in $n_i$ ways, $i = 1, 2, \ldots, k$, then the entire job can be done in $n_1 \times n_2 \times \cdots \times n_k$ ways.
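The theorem can be checked by brute force on a small case. The "job" below (choosing a three-course meal) and its options are hypothetical; the sketch compares the product $n_1 n_2 n_3$ with an explicit listing of every way to do the job.

```python
from itertools import product
from math import prod

# Hypothetical job with k = 3 tasks: choose a starter, a main course, a dessert.
tasks = [
    ["soup", "salad"],             # n_1 = 2 ways
    ["fish", "chicken", "pasta"],  # n_2 = 3 ways
    ["cake", "fruit"],             # n_3 = 2 ways
]

# Count predicted by the theorem: n_1 x n_2 x ... x n_k.
predicted = prod(len(options) for options in tasks)

# Brute-force check: list every way of doing the whole job.
all_jobs = list(product(*tasks))

print(predicted, len(all_jobs))
```

Both numbers agree (2 x 3 x 2 = 12), since each complete job corresponds to exactly one choice per task.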
26. THE FACTORIAL
- $n!$ is the number of ways in which n distinct objects can be permuted.
- $n! = n(n-1)(n-2) \cdots 2 \cdot 1$
- $0! = 1$, $1! = 1$
- Example: The possible permutations of {1,2,3} are {1,2,3}, {1,3,2}, {3,1,2}, {2,1,3}, {2,3,1}, {3,2,1}. So, there are $3! = 6$ different permutations.
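The example can be reproduced with the Python standard library; this is only a quick check of the count, with `objects` and `arrangements` as illustrative names.

```python
from itertools import permutations
from math import factorial

objects = [1, 2, 3]

# Every ordering of the three objects.
arrangements = list(permutations(objects))

for arrangement in arrangements:
    print(arrangement)

# The number of orderings matches n! for n = 3.
print(len(arrangements), factorial(len(objects)))
```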
27. COUNTING
- Partition Rule: There exists a single set of N distinctly different elements which is partitioned into k sets, the first set containing $n_1$ elements, ..., the k-th set containing $n_k$ elements, with $n_1 + n_2 + \cdots + n_k = N$. The number of different partitions is $\frac{N!}{n_1!\, n_2! \cdots n_k!}$.
28. COUNTING
- Example: Let's partition {1,2,3} into two sets: the first with 1 element, the second with 2 elements.
- Solution:
  - Partition 1: {1} {2,3}
  - Partition 2: {2} {1,3}
  - Partition 3: {3} {1,2}
- $3!/(1!\,2!) = 3$ different partitions
29. Example
- How many different arrangements can be made of the letters ISI?
1st letter | 2nd letter | 3rd letter
I | I | S
I | S | I
S | I | I
$N = 3$, $n_1 = 2$, $n_2 = 1$: $3!/(2!\,1!) = 3$
30. Example
- How many different arrangements can be made of the letters statistics?
- $N = 10$, $n_1 = 3$ (s), $n_2 = 3$ (t), $n_3 = 1$ (a), $n_4 = 2$ (i), $n_5 = 1$ (c): $10!/(3!\,3!\,1!\,2!\,1!) = 50400$
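The partition rule for arrangements of letters can be sketched as a small function. `distinct_arrangements` is a hypothetical helper name; the brute-force enumeration is included only to confirm the small ISI case.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_arrangements(word):
    """Partition-rule count N! / (n_1! n_2! ... n_k!) for the letters of word."""
    count = factorial(len(word))
    for multiplicity in Counter(word).values():
        count //= factorial(multiplicity)
    return count

# Brute-force check on the small example: distinct orderings of I, S, I.
isi_by_enumeration = {"".join(p) for p in permutations("ISI")}

print(distinct_arrangements("ISI"), sorted(isi_by_enumeration))
print(distinct_arrangements("statistics"))
```

Enumeration is feasible for ISI but not a sensible approach for longer words, where the formula gives the count directly.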
31. COUNTING
- 1. Ordered, without replacement (e.g., picking the first 3 winners of a competition)
- 2. Ordered, with replacement (e.g., tossing a coin and observing a Head in the k-th toss)
- 3. Unordered, without replacement (e.g., 6/49 lottery)
- 4. Unordered, with replacement (e.g., picking up red balls from an urn that has both red and green balls and putting them back)
32. PERMUTATIONS
- Any ordered sequence of r objects taken from a set of n distinct objects is called a permutation of size r of the objects. The number of such permutations is $n!/(n-r)!$.
33. COMBINATION
- Given a set of n distinct objects, any unordered subset of size r of the objects is called a combination.
Properties
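As a hedged illustration of the kind of properties meant here (standard identities, assumed rather than recovered from the slide), the number of combinations of size r from n distinct objects and some of its basic properties can be written as:

```latex
\binom{n}{r} = \frac{n!}{r!\,(n-r)!}, \qquad
\binom{n}{0} = \binom{n}{n} = 1, \qquad
\binom{n}{r} = \binom{n}{n-r}
```

The last identity says that choosing the r objects to include is equivalent to choosing the n - r objects to leave out.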
34. COUNTING
Number of possible arrangements of size r from n objects
 | Without Replacement | With Replacement
Ordered | $\frac{n!}{(n-r)!}$ | $n^r$
Unordered | $\binom{n}{r}$ | $\binom{n+r-1}{r}$
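The four ordered/unordered, with/without replacement cases can be checked by enumeration on a small case. The formulas in the comments are the standard results for these four cases; the values n = 4 and r = 2 are a hypothetical choice made only to keep the enumeration small.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, perm

n, r = 4, 2          # hypothetical small case, easy to enumerate
objects = range(n)

# Count each case by listing every arrangement explicitly.
enumerated = {
    ("ordered", "without"): len(list(permutations(objects, r))),
    ("ordered", "with"): len(list(product(objects, repeat=r))),
    ("unordered", "without"): len(list(combinations(objects, r))),
    ("unordered", "with"): len(list(combinations_with_replacement(objects, r))),
}

# Count each case with the closed-form formula.
formulas = {
    ("ordered", "without"): perm(n, r),         # n! / (n - r)!
    ("ordered", "with"): n ** r,                # n^r
    ("unordered", "without"): comb(n, r),       # n! / (r! (n - r)!)
    ("unordered", "with"): comb(n + r - 1, r),  # (n + r - 1)! / (r! (n - 1)!)
}

for case in enumerated:
    print(case, enumerated[case], formulas[case])
```

For n = 4 and r = 2 the enumeration and the formulas agree in all four cases.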
35. EXAMPLE
- How many different ways can we arrange 3 books (A, B and C) on a shelf?
- Order is important; without replacement.
- $n = 3$, $r = 3$: $n!/(n-r)! = 3!/0! = 6$, or
Possible number of books for 1st place on the shelf | Possible number of books for 2nd place on the shelf | Possible number of books for 3rd place on the shelf
3 | 2 | 1
3 x 2 x 1 = 6
36. EXAMPLE, cont.
- How many different ways can we arrange 3 books (A, B and C) on a shelf?
1st book | 2nd book | 3rd book
A | B | C
A | C | B
B | A | C
B | C | A
C | A | B
C | B | A
37. EXAMPLE
- Lotto games: Suppose that you pick 6 numbers out of 49.
- What is the number of possible choices
  - if the order does not matter and no repetition is allowed?
  - if the order matters and no repetition is allowed?
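Both lotto counts follow from the counting rules for sampling without replacement. The sketch below is one way to compute them with Python's `math.comb` and `math.perm`; the variable names are illustrative.

```python
from math import comb, factorial, perm

n, r = 49, 6

# Order does not matter, no repetition: combinations, n! / (r! (n - r)!).
unordered_choices = comb(n, r)

# Order matters, no repetition: permutations, n! / (n - r)!.
ordered_choices = perm(n, r)

print(f"unordered: {unordered_choices:,}")
print(f"ordered:   {ordered_choices:,}")

# Each unordered choice of 6 numbers can be ordered in 6! ways.
print(ordered_choices == unordered_choices * factorial(r))
```

The ordered count is 6! = 720 times the unordered count, because every set of 6 numbers corresponds to 720 different orderings.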