Title: STA 4322/STA 5328 Introduction to Statistical Theory
1. STA 4322/STA 5328 Introduction to Statistical Theory
2. A key question of statistics
- Key questions are easy to understand but not easy to answer without knowledge of a special field.
- Example 1. A key question of mathematics: The three angles of a triangle sum to 180°.
- Example 2. A key question of calculus: The volume of a ball is 4πr³/3.
3. A key question of chemistry: Why O2, not O?
- A key question in physics: What did Newton see in the law of motion that people had not seen before him?
- Speed (velocity) is a vector
- A key question in computer science: How would electricity in wires understand sentences such as "IF A?B then C2, otherwise C4"?
4. Key Concept in Statistics
Natural cure rate of a disease: 50%. A drug is invented and is to be tested.
Where does the "no" jump to "yes"?
5. There is no 100% correct statistical decision
Risk: the risk of making a wrong decision. Accidental death rate: 10^-6/day in the USA.
How many patients should we recruit in the beginning?
6. Another key question
- Suppose I wish to know the percentage of voters who support public health insurance.
- I have no hypothesis to test, but I am interested in estimating this proportion.
- Suppose I asked 100 randomly selected persons and 65 answered yes. Or suppose I asked 1000 persons and 650 answered yes. In both cases my estimate would be 65%, but I know their accuracies differ. By how much? Do we need a larger sample to make the estimate even more accurate?
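One common way to put a number on "by how much" is the standard error of a sample proportion. The sketch below assumes a simple random sample and the usual normal approximation; the function names and the 95% level are illustrative choices, not something the slides specify.

```python
import math
from statistics import NormalDist


def proportion_se(yes: int, n: int) -> float:
    """Estimated standard error of the sample proportion yes/n."""
    p_hat = yes / n
    return math.sqrt(p_hat * (1 - p_hat) / n)


def approx_interval(yes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate interval for the true proportion (normal approximation, assumed)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = yes / n
    half_width = z * proportion_se(yes, n)
    return p_hat - half_width, p_hat + half_width


# The two surveys from the slide: same estimate, different accuracy.
for yes, n in [(65, 100), (650, 1000)]:
    low, high = approx_interval(yes, n)
    print(f"n={n}: estimate={yes / n:.2f}, SE={proportion_se(yes, n):.4f}, "
          f"interval=({low:.3f}, {high:.3f})")
```

Both surveys give the estimate 0.65, but under these assumptions the standard error shrinks by a factor of sqrt(10), about 3.2, when the sample grows from 100 to 1000.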
7. Beyond cure rates
- Survival time improved by a drug
- Patient differences in age, gender, tumor size and/or genetic markers.
- Cure rate in medicine corresponds to affected rate in plants and accident death rate in car insurance
- Survival time in medicine corresponds to fruit weight in plants and accident payment in car insurance
8. The Purpose of Statistics
- To make inferences about unknown quantities from samples of data
- For example: You want to know something about the age distribution of graduate students at the University of Florida. How many ages are <22, <23, <24, etc.? Or, what is the average age?
9. Populations and Samples
- In either case you want information about the set of ages of all UF graduate students. These ages would be the population of interest.
- If it is infeasible to get the ages of all UF graduate students, i.e., you cannot observe the entire population, you may get the ages of a subset of the population. The subset is called a sample. Then you use the data in the sample to estimate what you want to know about the population.
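The population/sample idea can be illustrated with a small simulation. Everything in this sketch is hypothetical: the ages are synthetic numbers generated for illustration, not UF data, and the population size, sample size and seed are arbitrary choices.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: synthetic ages for 10,000 graduate students.
# These values are made up for illustration only.
population = [22 + int(rng.expovariate(1 / 5)) for _ in range(10_000)]

# In practice the whole population is not observed; we only see a sample.
sample = rng.sample(population, 200)

population_mean = mean(population)  # unknown in a real study
sample_mean = mean(sample)          # what we can actually compute

# Sample-based estimate of the share of ages below 24.
sample_share_under_24 = sum(age < 24 for age in sample) / len(sample)

print(f"population mean age: {population_mean:.2f}")
print(f"sample mean age:     {sample_mean:.2f}")
print(f"sample share of ages < 24: {sample_share_under_24:.2f}")
```

The sample mean will not equal the population mean exactly; how far off it tends to be is the kind of question statistical theory answers.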
10. Examples of Populations
- Amounts of grapefruit on all trees in Florida
- Serum zinc levels in dogs in the Gainesville area
- Strengths of concrete from a given mix of sand, cement and gravel
11. Samples
- Amounts of grapefruit on trees in plots drawn from the state of Florida
- Serum zinc levels in dogs entering the UF College of Veterinary Medicine Small Animal Clinic
- Measurements from samples of concrete with known ingredients in the concrete mix
12. Applications of Statistics
Key: When there is uncertainty in the response
- Effectiveness of new drugs or treatments
- DNA evidence in court
- Estimating the bowhead whale population
- Corn yield by different fertilizers
- Quality control of light bulbs
- Public opinion by polls
13. Key Exercise
- If someone asks you what statistics is, can you point out a key question to him/her?
14. Success stories of polls
1992 US Presidential election predictions
Source: newspapers, a few days before the election.
15. More on polls
Source: Nov. 5 (Election day morning) USA Today
In both 2000 and 2004, the candidates (Bush vs. Gore, Bush vs. Kerry) were too close to call (within ±3%). The actual results showed the same.
It is difficult to reduce ±3% by sample size alone. From mathematics to practice: random sampling, voters changing their minds, voters not revealing their minds.
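A rough calculation shows why sample size alone is an expensive way to shrink the margin. This is a minimal sketch assuming a simple random sample, the normal approximation, a 95% level, and the worst case p = 0.5; none of these settings are stated on the slide, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def margin_of_error(n: int, p: float = 0.5, level: float = 0.95) -> float:
    """Half-width of the approximate interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin: float, p: float = 0.5, level: float = 0.95) -> int:
    """Smallest n whose margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((z / margin) ** 2 * p * (1 - p))


for margin in (0.03, 0.02, 0.01):
    print(f"margin +/-{margin:.0%}: n >= {required_sample_size(margin)}")
```

Under these assumptions, going from ±3% to ±1% takes roughly nine times as many respondents, and a larger sample does nothing about non-random sampling or voters who change or conceal their minds.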
16. The next two elections, 2000 (Bush vs. Gore) and 2004 (Bush vs. Kerry), were too close to call before the election. The final results confirmed this. Now the 2008 election.
- This map was drawn by the New York Times 1 to 3 days before the election. All the state projections were correct. Toss-up states were extremely close.
- It also predicted that Obama would get 52±2% and McCain 41±2%, with 7% undecided.
- The actual result was Obama 52.5% and McCain 46%.
- The total number of votes was 124,471,000.
17. Solution to the key question
What do you need to know beforehand?
- What risk you can take on a wrong claim (claiming an ineffective drug is effective).
- What you consider a good drug, one that needs to be detected with high probability.
- Let the first answer be α = 0.05.
- Let the second answer be: if the cure rate becomes larger than 0.6 (p1), I want at least 0.9 (1-β) probability of detecting it.
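With these two answers fixed, a sample size can be worked out. The sketch below uses a standard textbook normal approximation for a one-sided test of a single proportion; it assumes the null cure rate is the 50% natural cure rate mentioned earlier and that the test is one-sided. The lecture's own solution slides have no transcript, so this is not necessarily the method used there, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def sample_size_one_sided(p0: float, p1: float, alpha: float, power: float) -> int:
    """Approximate n for testing H0: p = p0 against p = p1 > p0 (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # controls the risk of a wrong claim
    z_beta = NormalDist().inv_cdf(power)       # controls the chance of detecting p1
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(p1 * (1 - p1)))
    return math.ceil((numerator / (p1 - p0)) ** 2)


# Assumed inputs: p0 = 0.5 (natural cure rate), p1 = 0.6, alpha = 0.05, power = 0.9.
n = sample_size_one_sided(p0=0.5, p1=0.6, alpha=0.05, power=0.9)

# Observed cure proportion above which the drug would be declared effective.
cutoff = 0.5 + NormalDist().inv_cdf(0.95) * math.sqrt(0.5 * 0.5 / n)

print(f"patients needed: {n}")
print(f"declare effective if the observed cure rate exceeds {cutoff:.3f}")
```

Under these assumptions the answer is about 211 patients, with the decision switching from "no" to "yes" at an observed cure rate of roughly 0.557.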
18. Danger of treatment based on screening (I)
- Source: New England Journal of Medicine, Sep. 12, 2002, pp. 781-789.
- Randomized clinical trial in early prostate cancer: radical prostatectomy group (n = 347), watchful waiting group (n = 348). (Duration 1989-1999, median follow-up time 6.2 years)
- It is obvious that there were fewer deaths due to prostate cancer in the surgical group, because the prostate had been removed. To claim effectiveness based on 62 vs. 53 is unreasonable.
- No expense or quality-of-life change is reflected in this table.
19. Danger of treatment based on screening (II)
- Source: The Lancet, 2000, 355: 129-43. The Lancet, 2001, 358: 1340-42.
- Randomized clinical trials in mammography for breast cancer.
- Malmö (Sweden) study (1988-97; screened 21,088, control 21,195)
- Canada study (1981-97; screened 44,925, control 44,910)
20. The solution (1)
21. The solution (2)
22. The solution (3)
23. Solution to another key question
- 65 yes in a sample of 100: we feel the real percentage is 65%.
- 650 yes in a sample of 1000: we feel the real percentage is 65%.
- Which one is more accurate?
- Idea: In a single observation we do not know, but on the whole, the larger sample gives a more accurate estimate.
- How do we quantify this concept?
- Let Y be the number of yes answers in a sample of size n, and let p be the true proportion of yes answers in the population.
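One standard way to quantify this is through the mean and variance of the sample proportion Y/n. The derivation below assumes the n answers are independent draws, so that Y follows a binomial distribution; that modeling assumption is not spelled out in the transcript.

```latex
\begin{align*}
Y &\sim \mathrm{Binomial}(n, p), \\
E\!\left(\frac{Y}{n}\right) &= \frac{E(Y)}{n} = \frac{np}{n} = p, \\
\operatorname{Var}\!\left(\frac{Y}{n}\right) &= \frac{\operatorname{Var}(Y)}{n^{2}}
  = \frac{np(1-p)}{n^{2}} = \frac{p(1-p)}{n}, \\
\operatorname{SD}\!\left(\frac{Y}{n}\right) &= \sqrt{\frac{p(1-p)}{n}}.
\end{align*}
```

The estimate Y/n is centered on p for any n, while its standard deviation falls in proportion to 1/sqrt(n), which is the sense in which the larger sample is more accurate "on the whole".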