Lecture 1: Mon, Jan 13 - PowerPoint PPT Presentation

Provided by: str2


1
Lecture 1: Mon, Jan 13
  • Introduction/Syllabus (web page)
  • Some useful guidelines
  • Course Outline
  • Review (Ch 9, 10)
  • Key Statistical Concepts
  • Sampling Distributions
  • Confidence Intervals & Inference
  • JMP-IN

2
Guidelines
  • Lectures
  • Daily Reading & Even-Numbered Problems.
  • Always try to relate new concepts to existing
    examples
  • Solutions to examples will be provided in class.
  • All lectures can be downloaded off the course
    website. (Click on the lecture schedule and click
    on the date to obtain the lecture)
  • Printouts of lectures will not be provided by
    the instructor.
  • All necessary JMP instructions for the
    assignments will be provided in class, or
    available through the JMP help link in the course
    website.

3
Guidelines (Contd.)
  • Assignments and Exams
  • A typical exam and/or assignment will have 60-70%
    straightforward material, along with 30-40%
    harder material.
  • Each assignment will be worth 10 pts and all
    problems will be graded.
  • Total of 7-8 assignments
  • All assignments will typically be due a week from
    the date they are assigned.

4
Guidelines (Contd.)
  • All assignments and exams are cumulative in some
    sense (for example, you could be asked to
    construct confidence intervals (Ch 10, 12, 13) for
    the estimate of the slope (Ch 18, 19, 20)).
  • Assignments and exams not collected will be kept
    outside the instructor's office.
  • The instructor's office hours are primarily meant
    for addressing conceptual issues. For
    homework-related questions, students are
    encouraged to use the TAs' office hours and
    Statlab hours.

5
Guidelines (Contd.)
  • JMP-IN
  • Used extensively for assignments
  • Familiarity with outputs for exams
  • The recommended JMP-IN text is a good reference
  • Other general guidelines
  • Feedback on lecture style, assignments, and office
    hours is encouraged.
  • Constant interaction is encouraged to better
    understand the material.

6
Guidelines (Contd.)
  • All re-grade requests (neatly written) should be
    handed to the instructor on or before the due
    date.
  • Email should be used only for emergencies,
    appointments, and short questions.
  • Only a tentative guideline of the exam format
    will be provided before the exams.

7
Guidelines (Contd.)
  • The final grade is determined based on the
    assignments, midterms, and the final. No other
    special work/projects can be used as
    supplements.
  • These guidelines and course rules can be
    changed at any time by the instructor.

8
Guidelines (Contd.)
  • Preparation for exams
  • Work on lectures
  • The book (remember, you are required to have one:
    the red thing)
  • Work on assignments
  • Lastly, work on the practice exams (without
    looking at the solutions)

9
Course Outline
  • Inference (Use a sample to infer population
    characteristics)
  • Confidence Intervals
  • Tests of Hypotheses (Test a variety of claims)
  • Analysis of Variance (compare more than two
    groups)
  • Regression (Relationships among variables)
  • Simple Linear (Single independent variable)
  • Multiple Linear (More than one independent
    variable)
  • Polynomial (Models that allow powers of Indep.
    Vars.)
  • Assorted Topics
  • Chi-Squared Tests (Tests for Qualitative Data)
  • Time Series Models (Detect patterns over time)
  • Forecasting (Predict outcome for future time
    periods)

10
Key Statistical Concepts
  • Statistics: the art of data analysis. Involves
    classifying, summarizing, organizing, and
    interpreting numerical information.
  • Population: the set of all items of interest in a
    statistical problem.
  • Sample: a subset of items in the population.
  • Descriptive Statistics: a body of methods used to
    summarize and organize the characteristics of
    sample data.
  • Inferential Statistics: a body of methods used to
    draw inferences about characteristics of
    populations based on sample data.

11
  • Variable: a characteristic or property of an
    individual item of a population or sample.
  • Observation: the value assigned to a variable.
  • Parameter: a descriptive measure of a population.
  • Statistic: a descriptive measure of a sample.
  • Statistical Inference: the process of making an
    estimate, prediction or decision about a
    population based on information contained in a
    sample.
  • Measure of Reliability: a statement about the
    degree of uncertainty.

12
Example: Cola Wars
  • Cola wars is the popular term for the intense
    competition between Coca-Cola and Pepsi displayed
    in their marketing campaigns. Their campaigns
    have featured movie and television stars, rock
    videos, athletic endorsements, and claims of
    consumer preference based on taste tests.
    Suppose, as part of a Pepsi marketing campaign,
    1,000 cola consumers are given a blind taste test
    (i.e., a taste test in which the two brand names
    are disguised). Each consumer is asked to state
    their gender, age and a preference for brand A or
    brand B.

13
Answers to Key Questions
  • a. Population of interest: the collection or set
    of all cola consumers.
  • b. Variables of interest: gender, age and cola
    preference.
  • c. Sample: 1,000 cola consumers selected from the
    population of all cola consumers.
  • d. Inference of interest: generalization of the
    cola preferences of the 1,000 sampled consumers
    to the population of all cola consumers. In
    particular, the preferences of the consumers in
    the sample can be used to estimate the percentage
    of all cola consumers who prefer each brand.

14
  • e. When the preferences of the 1,000 sampled
    consumers are used to estimate the preference of
    all consumers in the region, the estimate will not
    exactly mirror the preferences of the population.
    For example, if the taste test shows that 56% of
    the 1,000 consumers chose Pepsi, it does not
    follow (nor is it likely) that exactly 56% of all
    cola drinkers in the region prefer Pepsi.
  • Nevertheless, we can use sound statistical
    reasoning to ensure that our sampling procedure
    will generate estimates that are almost certainly
    within a specified limit of the true percentage
    of all consumers who prefer Pepsi.
  • For example, such reasoning might assure us
    that the estimate of the preference for Pepsi
    from the sample is almost certainly within 5
    percentage points of the actual population
    preference. The implication is that the actual
    preference for Pepsi is between 51% (i.e., 56 - 5)
    and 61% (i.e., 56 + 5), that is, 56% ± 5%. This
    interval represents a measure of reliability for
    the inference.
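The ±5 points above is only an illustration. As a sketch, assuming a 95% confidence level and the standard large-sample interval for a proportion (p-hat ± z times its estimated SD), the reliability statement for 56% of 1,000 consumers can be computed directly. The function name `proportion_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat: float, n: int, level: float = 0.95):
    """Large-sample z interval for a population proportion:
    p_hat ± z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # critical value, about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)      # z times the estimated SD of p_hat
    return p_hat - margin, p_hat + margin, margin

# 56% of the 1,000 sampled consumers chose Pepsi (the slide's illustrative figure)
low, high, margin = proportion_interval(0.56, 1000)
print(f"margin = {margin:.3f}, interval = ({low:.3f}, {high:.3f})")
# prints: margin = 0.031, interval = (0.529, 0.591)
```

Under these assumptions the margin is closer to ±3.1 percentage points than ±5; a smaller sample or a higher confidence level would widen it.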

15
Sampling Distributions
  • Two widely used formulas from Stat101 are

16
Central Limit Theorem for the Sample Mean
  • If a random sample is drawn from any
    population (with a finite standard deviation):
  • 1) The sampling distribution of the sample
    mean $\bar{x}$ is approximately normal for a
    sufficiently large sample size.
  • 2) The larger the sample size, the more closely
    the sampling distribution of $\bar{x}$ will
    resemble a normal distribution.
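Both statements can be checked by simulation. This sketch assumes an Exponential(rate = 1) population (an arbitrary, strongly right-skewed choice with mean 1 and SD 1) and draws 5,000 samples at each sample size; the helper names are hypothetical.

```python
import random
import statistics

def sample_means(n: int, reps: int, seed: int = 1) -> list[float]:
    """Means of `reps` samples of size n drawn from an Exponential(rate=1)
    population, which is strongly right-skewed with mean 1 and SD 1."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

def skewness(xs: list[float]) -> float:
    """Sample skewness: average cubed z-score (0 for a symmetric distribution)."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

for n in (2, 30, 100):
    means = sample_means(n, reps=5000)
    # CLT: centre near mu = 1, spread near sigma/sqrt(n), skewness shrinking toward 0
    print(f"n={n:3d}  mean={statistics.fmean(means):.3f}  "
          f"sd={statistics.stdev(means):.3f} (theory {1 / n ** 0.5:.3f})  "
          f"skew={skewness(means):.2f}")
```

For this particular population the skewness of the sample mean is 2/√n in theory, so the printed skewness should drop from roughly 1.4 at n = 2 to roughly 0.2 at n = 100.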

17
The Sampling Distribution of the Mean of Random Variables
18
The Sampling Distribution of the Sum of Random Variables
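The bodies of the two slides titled above did not survive extraction. Assuming independent draws $X_1, \dots, X_n$, each with mean $\mu$ and standard deviation $\sigma$, the standard results for the mean and the sum are sketched below; by the Central Limit Theorem both are approximately normal for large $n$.

```latex
% Assumes X_1, ..., X_n independent, each with mean \mu and SD \sigma
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i:\quad
  E(\bar{X}) = \mu,\qquad V(\bar{X}) = \frac{\sigma^2}{n},\qquad SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}

S = \sum_{i=1}^{n} X_i:\quad
  E(S) = n\mu,\qquad V(S) = n\sigma^2,\qquad SD(S) = \sqrt{n}\,\sigma

% Standardized versions, approximately standard normal for large n
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{S - n\mu}{\sqrt{n}\,\sigma}
```

The two standardized forms are the same random variable, since $S = n\bar{X}$.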
19
Sampling Distribution of a Proportion
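  • The mean, variance, and SD of $\hat{p}$ are
    $E(\hat{p}) = p$, $V(\hat{p}) = p(1-p)/n$, and
    $SD(\hat{p}) = \sqrt{p(1-p)/n}$.
  • So, the variable
    $Z = (\hat{p} - p)/\sqrt{p(1-p)/n}$
    is approximately a standard normal RV.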

20
How large should n be?
  • In general, the sample size required to apply
    the Central Limit Theorem depends on the
    population distribution: the more skewed the
    population, the larger n must be.
  • But, as a rule of thumb, many people
    (including the book) use a minimum sample size
    of n = 30.

21
From Sampling Distributions to Statistical Inference
  • Classical inference
  • 1) Assume a value of the parameter
  • 2) Find a test statistic and its distribution
  • 3) Calculate the value of the test statistic from
    the data.
  • 4) If the probability of seeing a value at least
    as extreme as what you observed is small, then
    you can reject the hypothesized parameter value.

22
Inference for the Mean
  • To optimize service, a bus company is trying to
    estimate the true rate at which customers arrive
    at a particular stop within a 10-minute period
    during rush hour.
  • The company collected data from 5pm to 6pm every
    day for a month (n = 180), and found a sample mean
    of $\bar{x} = 8.1$ arrivals per 10 minutes. Does
    this support the manager's hypothesis that the
    true arrival rate is $\mu = 9$ arrivals per 10
    minutes?

23
  • Compute the probability of seeing a sample
    average that is more extreme than 8.1, assuming
    the population mean is 9.
  • Conclusion: it is very unlikely we would have
    seen data this extreme if the true mean were 9,
    so the data do not support the manager's
    hypothesis.
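The four-step classical procedure applied to the bus example can be sketched as a z-test. The population SD from the original slide is not recoverable, so this sketch ASSUMES σ = 3, which is what a Poisson arrival process with mean 9 would give (SD = √9); the function name `z_test_mean` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar: float, mu0: float, sigma: float, n: int):
    """z statistic and p-values for H0: mu = mu0 when sigma is known."""
    se = sigma / sqrt(n)                          # SD of the sample mean under H0
    z = (xbar - mu0) / se                         # step 3: test statistic from the data
    p_lower = NormalDist().cdf(z)                 # P(xbar this low or lower | mu = mu0)
    p_two_sided = 2 * NormalDist().cdf(-abs(z))   # "more extreme" in either direction
    return z, p_lower, p_two_sided

# ASSUMPTION: sigma = 3 (Poisson arrivals with mean 9); not given in the source
z, p_lower, p_two = z_test_mean(xbar=8.1, mu0=9.0, sigma=3.0, n=180)
print(f"z = {z:.2f}, P(xbar <= 8.1) = {p_lower:.1e}, two-sided p = {p_two:.1e}")
```

With this σ the sample mean sits about 4 standard errors below 9, so either p-value is far below any usual cutoff, matching the slide's conclusion. A different σ would change the numbers, not the method.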

24
Confidence Intervals
  • Confidence interval estimates provide a range of
    plausible values for the unknown parameter.
  • Before the experiment, the probability that the
    confidence interval will cover the true parameter
    value is $1 - \alpha$.
  • After the experiment, we say that, with
    $100(1 - \alpha)\%$ confidence,
    the interval covers the true parameter value.
    Equivalently, if we repeat our experiment over
    and over, and construct a 95% confidence interval
    each time, we would expect about 95% of the
    intervals to cover the true value of the mean.
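The repeated-experiment interpretation can be checked by simulation. This sketch assumes a hypothetical normal population with μ = 50 and σ = 10 (σ known), samples of size 25, and 10,000 repetitions; `coverage` is a hypothetical helper name.

```python
import random
import statistics
from math import sqrt

def coverage(mu: float, sigma: float, n: int, reps: int = 10_000, seed: int = 7) -> float:
    """Fraction of 95% z intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)   # about 1.96
    half_width = z * sigma / sqrt(n)             # same width every repetition
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

print(coverage(mu=50.0, sigma=10.0, n=25))   # should be close to 0.95
```

Any single interval either covers μ or does not; the 95% describes the long-run proportion of intervals that do.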

25
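  • The general form of a $100(1-\alpha)\%$
    confidence interval for a parameter $\theta$ is
    $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$,
  • where $\hat{\theta}$ is a point estimate for
    $\theta$ and $\sigma_{\hat{\theta}}$ is its standard
    deviation.
  • For example, a $100(1-\alpha)\%$ C.I. for the
    mean (σ known) is given by
    $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.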

26
Information and Confidence Intervals
  • Small interval → more information.
  • Larger interval → less information.
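For the σ-known interval for the mean, the width is 2 z σ/√n, so it shrinks as n grows and widens as the confidence level rises. This sketch uses a hypothetical σ = 10; `z_interval_width` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_width(sigma: float, n: int, level: float) -> float:
    """Width (UCL - LCL) of a z interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sigma / sqrt(n)

# Hypothetical sigma = 10: quadrupling n halves the width; more confidence widens it
for n in (25, 100, 400):
    print(n, [round(z_interval_width(10, n, lvl), 2) for lvl in (0.90, 0.95, 0.99)])
```

Because the width depends on √n, getting twice the precision (half the width) costs four times the sample size.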

27
Inference using Confidence Intervals
  • 1) Assume a particular value for μ.
  • 2) Collect data and construct a confidence
    interval.
  • 3) If the hypothesized value of μ is not
    contained in the interval → evidence that the
    value is incorrect.
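The three steps can be sketched with the bus example from the earlier slide (x̄ = 8.1, n = 180, hypothesized μ = 9). As before, the population SD is not recoverable from the source, so σ = 3 is an ASSUMPTION (Poisson arrivals with mean 9), and `z_interval` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar: float, sigma: float, n: int, level: float = 0.95):
    """(LCL, UCL) of a z interval for mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

mu0 = 9.0                                            # step 1: hypothesized value of mu
lcl, ucl = z_interval(xbar=8.1, sigma=3.0, n=180)    # step 2: sigma = 3 is an ASSUMPTION
print(f"95% CI = ({lcl:.2f}, {ucl:.2f}); contains mu0? {lcl <= mu0 <= ucl}")
# prints: 95% CI = (7.66, 8.54); contains mu0? False
```

Since 9 lies outside the interval (step 3), this reaches the same conclusion as the probability calculation for the bus example.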

28
Exercise 10.24
  • A statistics professor is investigating how many
    classes university students miss each semester.
    To answer this question, she took a random sample
    of 100 students and asked them how many classes
    they had missed in the previous semester.
  • Estimate the mean number of classes missed by all
    students at the university. Use a 99% confidence
    level and assume that the population SD is known
    to be 2.2 classes.

29
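  • Given: $n = 100$, $\sigma = 2.2$, and
    $1 - \alpha = 0.99$, so $z_{\alpha/2} = z_{0.005} = 2.575$.
  • The 99% confidence interval is
    $\bar{x} \pm z_{0.005}\,\sigma/\sqrt{n} = \bar{x} \pm 2.575(2.2/\sqrt{100}) = \bar{x} \pm 0.57$.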
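The half-width of the Exercise 10.24 interval depends only on σ, n, and the confidence level, so it can be computed without the sample data. The sample mean itself comes from the exercise's data file, which is not reproduced here, so this sketch stops at the ± term.

```python
from math import sqrt
from statistics import NormalDist

n, sigma, level = 100, 2.2, 0.99                 # from the exercise statement
z = NormalDist().inv_cdf(1 - (1 - level) / 2)    # z_{0.005}, about 2.576
half_width = z * sigma / sqrt(n)                 # z * sigma / sqrt(n)
print(f"z = {z:.3f}, interval = xbar ± {half_width:.3f}")
# prints: z = 2.576, interval = xbar ± 0.567
```

The table value 2.575 gives 0.5665, which rounds to the same ±0.57; LCL and UCL follow by subtracting and adding this from the sample mean.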

30
Components of a Confidence Interval
(Figure: the components of a confidence interval, labeled LCL, UCL, and width.)
31
Take Away
  • Be comfortable with topics from Stat101
  • Sampling Distributions
  • Confidence Intervals (sigma known)
  • Inference using sampling distributions and CI
  • Use of z, t, F, and chi-squared tables.
  • Basic JMP-IN (opening data files, descriptive
    statistics)
  • Reading Ch 11, 12.