P1252122010NtFbp - PowerPoint PPT Presentation

About This Presentation
Title:

P1252122010NtFbp

Description:

Mix of second-year and bridging-programme students. Attendance list ... Schedule. Course website: Book. Literature: Mario Triola: Essentials of Statistics, 3rd edition ...

Slides: 84
Provided by: wouterja

Transcript and Presenter's Notes


1
Statistics for Information Science
Lecturer: Frits de Vries. Assistant: Joris Stegeman
Textbook: Essentials of Statistics, 3rd Edition, Mario F. Triola
2
Today's programme
  • First hour
  • Welcome and introductions
  • Organisation and structure of the course
  • Second hour
  • Why statistics?
  • Preview of the material in chapters 1, 2, and 3

3
1. Welcome and introductions
  • Lecturer and assistant
  • Mix of second-year students and bridging-programme students
  • Attendance list
  • Housekeeping announcement: directive from the Executive Board (CvB)

4
2. Organisation and structure (1)
  • Working groups - attendance list
  • Course website
  • Introduction
  • Literature
  • Assessment and deadlines
  • Links
  • Practice exam
  • Schedule

5
Course website
6
Book
  • Literature
  • Mario Triola
  • Essentials of Statistics, 3rd edition
  • Addison-Wesley Higher Education, 2007

7
2. Organisation and structure (2)
  • Course website (continued)
  • Rules!
  • Schedule of the exercises
  • Exam material
  • Assignments for week 2: chapters 1, 2, and 3
  • Book: copy of the material for chapters 1, 2, and 3

8
Organisation
  • No lectures
  • question hour based on submitted questions
  • a great deal of practice material
  • Mandatory tutorials
  • Working the exercises is essential and therefore
    mandatory.
  • Always hand in your solutions to the assigned
    exercises, in duplicate, before the tutorial.
  • Working groups and supervision
  • group 1: Wednesday 1
  • group 2: Wednesday 2
  • group 3: Friday
  • Computer lab?

9
3. Why statistics?
  • Reading and writing articles in the field of information science (IK)
  • Example: article from MIS Quarterly
  • Reading and writing in everyday life
  • Example: table from a neighbourhood action committee
  • Basic prerequisite: logical thinking and reasoning
  • Example: the Monty Hall problem
  • Example: doping use

10
Table (1) from the MIS Quarterly article
11
Table (2) from the MIS Quarterly article
12
Table from the neighbourhood committee
13
Intuition is hard
  • Quiz with a grand prize: you may choose one of
    3 doors. You choose a door. What is your chance
    of winning the grand prize?
  • 1/3

14
But
  • Suppose the quizmaster, AFTER YOUR CHOICE, opens
    one of the two remaining doors and shows that
    there is nothing behind it.
  • You may now still switch doors.
  • Do you?
  • Yes! Switching increases your chance of winning.

15
Analysis
  • Suppose the grand prize is behind door 1
  • You chose door 1 (car). The quizmaster opens
    another door with nothing behind it. Switching
    loses.
  • You chose door 2 (empty). The quizmaster opens door 3,
    which has nothing behind it. Switching wins the grand prize!
  • You chose door 3 (empty). The quizmaster opens door 2,
    which has nothing behind it. Switching wins the grand prize!
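
The case analysis above can be checked by brute force. The sketch below is an illustration only, not part of the original slides; it assumes the quizmaster knows where the prize is and always opens an empty door that the player did not pick. It enumerates every equally likely combination of prize door and first pick and counts how often staying and switching win.

```python
from fractions import Fraction
from itertools import product

DOORS = (1, 2, 3)

def win_probabilities():
    """Return (P(win if staying), P(win if switching)) by exact enumeration."""
    stay_wins = 0
    switch_wins = 0
    # Every (prize door, first pick) pair is equally likely: 9 cases.
    cases = list(product(DOORS, DOORS))
    for prize, pick in cases:
        if pick == prize:
            # The first pick was right, so switching gives the prize away.
            stay_wins += 1
        else:
            # The host must open the only other empty door,
            # so the remaining closed door hides the prize.
            switch_wins += 1
    total = len(cases)
    return Fraction(stay_wins, total), Fraction(switch_wins, total)

p_stay, p_switch = win_probabilities()
print("stay:", p_stay, "switch:", p_switch)
```

Under these assumptions, staying wins in 3 of the 9 cases and switching in the other 6.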

16
Break
17
Triola, chapter 1
  • Important definitions for use in
    statistics

18
Section 1.1 Important definitions
  • Data
  • Statistics
  • Population
  • Census
  • Sample

19
Definition of statistics
  • a collection of methods for planning studies
    and experiments, obtaining data, and then
    organizing, summarizing, presenting, analyzing,
    interpreting, and drawing conclusions based on
    the data

20
Chapter Key Concepts
  • Sample data must be collected in an
    appropriate way, such as through a process
    of random selection.
  • If sample data are not collected in an
    appropriate way, the data may be so completely
    useless that no amount of statistical torturing
    can salvage them.

21
Section 1.2 Types of data
  • Definitions
  • Population parameter versus sample statistic
  • Quantitative versus qualitative data
  • Discrete versus continuous data
  • Levels of measurement: nominal, ordinal, interval, ratio

22
Levels of Measurement
  1. Nominal - categories only
  2. Ordinal - categories with some order
  3. Interval - differences but no natural starting
    point
  4. Ratio - differences and a natural starting point

23
Section 1.3 Critical thinking
  • Misuse, inexpert use, and incorrect use
    of statistics

24
Misuse 1- Bad Samples
  • Voluntary response sample
  • (or self-selected sample): one in which the
    respondents themselves decide whether to be
    included. In this case, valid conclusions can be
    made only about the specific group of people who
    agree to participate.

25
Misuse 2- Small Samples
Conclusions should not be based on samples that
are far too small. Example: basing a school
suspension rate on a sample of only three students.
26
Misuse 3- Graphs
To correctly interpret a graph, you must analyze
the numerical information given in the graph, so
as not to be misled by the graph's shape.
27
Misuse 4- Pictographs
Part (b) is designed to exaggerate the difference
by increasing each dimension in proportion to the
actual amounts of oil consumption.
28
Misuse 5- Percentages
Misleading or unclear percentages are sometimes
used. For example, if you take 100% of a
quantity, you take it all. 110% of an effort
does not make sense.
29
Other Misuses of Statistics
  • Loaded Questions
  • Order of Questions
  • Refusals
  • Correlation and Causality
  • Self-Interest Study
  • Precise Numbers
  • Partial Pictures
  • Deliberate Distortions

30
Section 1.4 Design of experiments
  • Types of studies
  • Observational
  • Experimental
  • Retrospective
  • Prospective (longitudinal, cohort)

31
Definition
  • Confounding
  • occurs in an experiment when the experimenter
    is not able to distinguish between the effects
    of different factors

32
Controlling Effects of Variables
  • Blinding
  • subject does not know he or she is receiving a
    treatment or placebo
  • Blocks
  • groups of subjects with similar characteristics
  • Completely Randomized Experimental Design
  • subjects are put into blocks through a process
    of random selection
  • Rigorously Controlled Design
  • subjects are very carefully chosen

33
Samples
34
Definitions
  • Random Sample
  • members of the population are selected in such
    a way that each individual member has an equal
    chance of being selected
  • Simple Random Sample (of size n)
  • subjects selected in such a way that
    every possible sample of the same size n has the
    same chance of being chosen

35
Methods of Sampling
  • Random
  • Systematic
  • Convenience
  • Stratified
  • Cluster
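
Four of the sampling methods listed above can be sketched in a few lines of code. This is a minimal illustration, not taken from the slides: the population of 40 numbered members, the two groups "A" and "B", and all function names are made up for the example. Convenience sampling is left out because it simply takes whatever members are easiest to reach.

```python
import random

def simple_random_sample(population, n, rng):
    # Every possible sample of size n has the same chance of being chosen.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Pick a random starting point, then take every k-th member.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # Draw a random sample from every subgroup (stratum).
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters and keep all of their members.
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)                 # fixed seed so the run is repeatable
population = list(range(1, 41))        # hypothetical members numbered 1..40
groups = {"A": population[:20], "B": population[20:]}  # hypothetical subgroups

srs = simple_random_sample(population, 8, rng)
systematic = systematic_sample(population, 5, rng)
stratified = stratified_sample(groups, 4, rng)
clustered = cluster_sample(groups, 1, rng)
print(srs, systematic, stratified, clustered, sep="\n")
```

The same dictionary serves as strata and as clusters only to keep the example short; the difference is that stratified sampling takes some members from every group, while cluster sampling takes all members from some groups.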

36
Saunders, chapter 6
37
Triola, chapter 2
  • Statistics for summarizing and displaying
    data

38
Section 2.1 Overview: Important Characteristics of
Data (CVDOT)
  1. Center: a representative or average value
    that indicates where the middle of the data set
    is located.
  2. Variation: a measure of the amount that the
    values vary among themselves.
  3. Distribution: the nature or shape of the
    distribution of data (such as bell-shaped,
    uniform, or skewed).
  4. Outliers: sample values that lie very far away
    from the vast majority of other sample values.
  5. Time: changing characteristics of the data
    over time.

39
Section 2.2 Frequency distributions
  • Simple (straight) tally of values in a table
  • Grouping values into categories (classes)

40
Frequency Distribution: Ages of Best Actresses
(Tables not reproduced in the transcript: Original Data, Frequency Distribution)
41
Related definitions
  • Lower class limits
  • Upper class limits
  • Class boundaries
  • Class midpoints
  • Class width
  • Relative frequencies
  • Cumulative frequencies
  • (cumulative percentages)
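
The terms above are easiest to see on a small table. The sketch below is an illustration with made-up integer ages, not the data from the slides; the function name and the choice of 21 as the first lower limit with a class width of 10 are assumptions. It builds the class limits, boundaries, midpoints, and the three kinds of frequency.

```python
def frequency_table(values, first_lower_limit, class_width, num_classes):
    """Build a frequency table for integer data as a list of row dicts."""
    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = first_lower_limit + i * class_width
        # For integer data the upper limit is 1 below the next lower limit.
        upper = lower + class_width - 1
        freq = sum(lower <= v <= upper for v in values)
        cumulative += freq
        rows.append({
            "limits": (lower, upper),
            # Boundaries sit halfway between adjacent class limits.
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "frequency": freq,
            "relative": freq / len(values),
            "cumulative": cumulative,
        })
    return rows

# Hypothetical data, invented for this example.
ages = [22, 25, 27, 31, 33, 34, 35, 38, 41, 41, 45, 49, 54, 60, 61, 74]
table = frequency_table(ages, first_lower_limit=21, class_width=10, num_classes=6)
for row in table:
    print(row)
```

The class width here is the difference between consecutive lower class limits (31 - 21 = 10), not the difference between the limits of one class.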

42
Frequency Tables
43
Section 2.3 Histograms
  • Graphical representation of distributions

44
Histogram
A bar graph in which the horizontal scale
represents the classes of data values and the
vertical scale represents the frequencies
45
Relative Frequency Histogram
Has the same shape and horizontal scale as a
histogram, but the vertical scale is marked with
relative frequencies instead of actual frequencies
46
Critical Thinking: Interpreting Histograms
One key characteristic of a normal distribution
is that it has a bell shape. The histogram
below illustrates this.
47
Section 2.4 Statistical graphics
  • Other forms of visual representation
  • Polygon
  • Ogive
  • Dot plot
  • Stemplot
  • Pareto chart
  • Pie chart
  • Scatter plot
  • Time series

48
Ogive
A line graph that depicts cumulative frequencies
Insert figure 2-6 from page 58
49
Dot Plot
Consists of a graph in which each data value is
plotted as a point (or dot) along a scale of
values
50
Other Graphs
51
Triola, chapter 3
  • Statistics for describing, exploring, and
    comparing data

52
Section 3.1 Overview
  • Descriptive Statistics
  • summarize or describe the important
    characteristics of a known set of data
  • Inferential Statistics
  • use sample data to make inferences (or
    generalizations) about a population

53
Section 3.2 Measures of center
  • Mean
  • Of a sample and of a population (mu)
  • Median (x-tilde)
  • Mode
  • Midrange
  • Weighted mean
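
As a quick illustration of the measures of center listed above, the following sketch computes each one for a small made-up data set. The data, the weights, and the helper names `midrange` and `weighted_mean` are assumptions for the example; mean, median, and mode come from Python's standard `statistics` module.

```python
import statistics

def midrange(values):
    # Halfway between the smallest and the largest value.
    return (max(values) + min(values)) / 2

def weighted_mean(values, weights):
    # Sum of (weight * value) divided by the sum of the weights.
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

data = [2, 3, 3, 5, 7, 10]            # hypothetical sample values

mean = statistics.mean(data)          # sum of values divided by their count
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequent value
mid = midrange(data)

# Hypothetical grades: a 6.0 that counts once and an 8.0 that counts three times.
grade = weighted_mean([6.0, 8.0], [1, 3])

print(mean, median, mode, mid, grade)
```

With an even number of values the median is the mean of the two middle values, which is why it need not be one of the data values.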

54
Notation
  • x̄ is pronounced x-bar and denotes the mean of a
    set of sample values
  • µ is pronounced mu and denotes the mean of all
    values in a population

55
Round-off Rule for Measures of Center
  • Carry one more decimal place than is present in
    the original set of values.

56
Mean from a Frequency Distribution
  • use class midpoint of classes for variable x
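
A minimal sketch of this calculation, assuming the usual form x̄ ≈ Σ(f · x) / Σf, where x is a class midpoint and f the class frequency. The classes and frequencies below are made up for the example, and the function name is hypothetical.

```python
def mean_from_frequency_table(classes):
    """Approximate the mean from (lower_limit, upper_limit, frequency) rows."""
    total_f = sum(f for _, _, f in classes)
    # Each class is represented by its midpoint, weighted by its frequency.
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f

# Hypothetical frequency distribution.
classes = [(21, 30, 3), (31, 40, 5), (41, 50, 4), (51, 60, 2)]
approx_mean = mean_from_frequency_table(classes)
print(round(approx_mean, 1))
```

The result is an approximation, because the midpoint stands in for every original value in its class.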

57
Best Measure of Center
58
Skewness
59
Section 3.3 Measures of variation
  • Range
  • Standard deviation
  • sample and population (sigma)
  • Variance
  • Coefficient of variation (CV)

60
Key Concept
Because this section introduces the concept of
variation, which is something so important in
statistics, this is one of the most important
sections in the entire book.
Place a high priority on how to interpret values
of standard deviation.
61
Definition
The standard deviation of a set of sample values
is a measure of variation of values about the
mean.
62
Sample Standard Deviation Formula
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
63
Rationale for using n - 1 versus n
The end of Section 3-3 has a detailed explanation
of why n - 1 rather than n is used. The student
should study it carefully.
64
Standard Deviation - Important Properties
  • The standard deviation is a measure of
    variation of all values from the mean.
  • The value of the standard deviation s is
    usually positive.
  • The value of the standard deviation s can
    increase dramatically with the inclusion of one
    or more outliers (data values far away from all
    others).
  • The units of the standard deviation s are the
    same as the units of the
    original data values.

65
Population Standard Deviation

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
This formula is similar to the previous formula,
but instead, the population mean and population
size are used.
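
The difference between the two formulas can be made concrete with a small sketch. It assumes the standard definitions: the sample standard deviation s divides the sum of squared deviations by n - 1, the population standard deviation σ divides by N. The data values and function names are made up; the results are cross-checked against `statistics.stdev` and `statistics.pstdev` from the standard library.

```python
import math
import statistics

def sample_std(values):
    # s: divide the sum of squared deviations from the mean by n - 1.
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def population_std(values):
    # sigma: same idea, but divide by the population size N.
    size = len(values)
    mean = sum(values) / size
    return math.sqrt(sum((x - mean) ** 2 for x in values) / size)

data = [1, 3, 14]                     # hypothetical values, mean 6
s = sample_std(data)                  # sqrt((25 + 9 + 64) / 2)
sigma = population_std(data)          # sqrt((25 + 9 + 64) / 3)

print(s, statistics.stdev(data))
print(sigma, statistics.pstdev(data))
```

For the same data, dividing by n - 1 always gives a slightly larger value than dividing by N.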
66
Variance - Notation
The variance is the standard deviation squared.
  • $s^2$: sample variance
  • $\sigma^2$: population variance
67
Estimation of Standard Deviation: Range Rule of
Thumb
For estimating a value of the standard deviation
s, use $s \approx \frac{\text{range}}{4}$,
where range = (maximum value) - (minimum value).
68
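Estimation of Standard Deviation: Range Rule of
Thumb
For interpreting a known value of the standard
deviation s, find rough estimates of the minimum
and maximum usual sample values by using
minimum usual value = (mean) - 2 × (standard deviation)
maximum usual value = (mean) + 2 × (standard deviation)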
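
Both uses of the range rule of thumb can be sketched as follows, assuming the rule in its common form: s is roughly the range divided by 4, and usual values lie within 2 standard deviations of the mean. The function names and the numbers are made up for illustration.

```python
def estimate_std_from_range(values):
    # Rough estimate only: range divided by 4.
    return (max(values) - min(values)) / 4

def usual_value_limits(mean, s):
    # Rough minimum and maximum "usual" values: mean minus/plus 2 s.
    return mean - 2 * s, mean + 2 * s

data = [1, 3, 14]                                   # hypothetical values
s_estimate = estimate_std_from_range(data)          # (14 - 1) / 4

low, high = usual_value_limits(mean=100, s=15)      # hypothetical mean and s
print(s_estimate, low, high)
```

The estimate is deliberately crude; it is a quick check on a computed value, not a replacement for the formula.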
69
The Empirical Rule
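
The figure for this slide did not survive in the transcript. As the rule is commonly stated, for data with an approximately bell-shaped distribution about 68% of the values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. The sketch below checks those percentages against an exact normal distribution using `statistics.NormalDist`; the function name `share_within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    # Proportion of a normal distribution within k standard deviations of the mean.
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

Real data are only approximately normal, so the rule gives rough guidance rather than exact percentages.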
70
Definition
The coefficient of variation (or CV) for a set of
sample or population data, expressed as a
percent, describes the standard deviation
relative to the mean.
Sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$
Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$
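
A short sketch of why the coefficient of variation is useful, assuming the sample form CV = s / x̄ × 100%. The two data sets are made up: they have the same standard deviation but different means, so their relative variation differs. The function name is hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    # Sample standard deviation relative to the sample mean, as a percent.
    return statistics.stdev(values) / statistics.mean(values) * 100

heights_cm = [150, 160, 170, 180, 190]   # hypothetical heights
weights_kg = [50, 60, 70, 80, 90]        # hypothetical weights

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(f"heights: {cv_heights:.1f}%  weights: {cv_weights:.1f}%")
```

Because the CV has no units, it allows a comparison between centimetres and kilograms that the standard deviations alone do not.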
71
Section 3.4 Measures of relative standing
  • z scores
  • Quartiles
  • Percentiles

72
Key Concept
This section introduces measures that can be used
to compare values from different data sets, or to
compare values within the same data set. The
most important of these is the concept of the z
score.
73
Definition
  • z Score (or standardized value)
  • the number of standard deviations that a given
    value x is above or below the mean

74
Measures of Position: z score
  • Sample: $z = \frac{x - \bar{x}}{s}$

Round z to 2 decimal places
75
Interpreting Z Scores
Whenever a value is less than the mean, its
corresponding z score is negative.
  • Ordinary values: -2 ≤ z score ≤ 2
  • Unusual values: z score < -2 or z score > 2
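
A minimal sketch of computing and interpreting z scores, assuming z = (x - mean) / standard deviation, rounded to 2 decimal places, with values beyond 2 standard deviations from the mean counted as unusual. The mean of 100 and standard deviation of 15 are made-up numbers, and the function names are hypothetical.

```python
def z_score(x, mean, std):
    # Number of standard deviations that x lies above or below the mean.
    return round((x - mean) / std, 2)

def is_unusual(z):
    # Unusual values have a z score below -2 or above 2.
    return z < -2 or z > 2

z_high = z_score(140, mean=100, std=15)
z_edge = z_score(130, mean=100, std=15)
z_low = z_score(85, mean=100, std=15)

for z in (z_high, z_edge, z_low):
    print(z, "unusual" if is_unusual(z) else "ordinary")
```

Because z scores have no units, they also allow comparisons between values from different data sets.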
76
Quartiles
Q1, Q2, Q3 divide ranked scores into four
equal parts
77
Percentiles
Just as there are three quartiles separating data
into four parts, there are 99 percentiles denoted
P1, P2, . . . P99, which partition the data into
100 groups.
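
Quartiles and percentiles can be computed with `statistics.quantiles` from Python's standard library, as in the sketch below. This is an illustration with made-up data. Several conventions exist for locating quartiles and percentiles, so the values from this function (which uses its default "exclusive" method here) may differ slightly from those obtained with the textbook's procedure.

```python
import statistics

data = list(range(1, 12))            # hypothetical ranked scores 1..11

# n=4 returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# n=100 returns the 99 cut points P1 .. P99.
percentiles = statistics.quantiles(data, n=100)

print(q1, q2, q3)
print(len(percentiles))
```

Q2 is the same as the median and as the 50th percentile, P50.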
78
Section 3.5 EDA
  • Outliers
  • Boxplot

79
Important Principles
  • An outlier can have a dramatic effect on the
    mean.
  • An outlier can have a dramatic effect on the
    standard deviation.
  • An outlier can have a dramatic effect on the
    scale of the histogram so that the true nature
    of the distribution is totally obscured.
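
The first two principles are easy to demonstrate numerically. In the sketch below the data are made up: five ordinary values, and the same values with one outlier added. The median is included only as a contrast, since it barely moves.

```python
import statistics

base = [10, 11, 12, 13, 14]          # hypothetical values
with_outlier = base + [100]          # same values plus one outlier

for label, values in (("without outlier", base), ("with outlier", with_outlier)):
    print(
        label,
        "mean:", round(statistics.mean(values), 2),
        "std:", round(statistics.stdev(values), 2),
        "median:", statistics.median(values),
    )
```

A single extreme value more than doubles the mean and multiplies the standard deviation many times over.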

80
Definitions
  • For a set of data, the 5-number summary consists
    of the minimum value; the first quartile, Q1; the
    median (or second quartile, Q2); the third
    quartile, Q3; and the maximum value.
  • A boxplot (or box-and-whisker diagram) is a
    graph of a data set that consists of a line
    extending from the minimum value to the maximum
    value, and a box with lines drawn at the first
    quartile, Q1; the median; and the third quartile,
    Q3.
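
The 5-number summary, which supplies every value needed to draw a boxplot, can be sketched as follows. The data are made up and the function name is hypothetical. The quartiles come from `statistics.quantiles` with its default method, which may give slightly different quartile values than the textbook's procedure.

```python
import statistics

def five_number_summary(values):
    # Minimum, Q1, median (Q2), Q3, maximum.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

data = list(range(1, 12))            # hypothetical values 1..11
summary = five_number_summary(data)
print(summary)
```

In a boxplot the line runs from the first to the last of these five values, and the box spans Q1 to Q3 with a line at the median.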

81
Boxplots
82
Boxplots - cont
83
End of the preview of chapters 1, 2, and 3