ARCH 2126/6126 - PowerPoint PPT Presentation

Provided by: anu9

1
ARCH 2126/6126
  • Session 2a Measurement

2
To recap, the purpose of statistics ...
  • To provide insight into situations and problems
    by means of numbers
  • How is this provided?
  • Data are available or are collected
  • Data are organized, summarized, analysed and
    results presented
  • Conclusions are drawn, in context
  • Whole process is often guided by critical
    appraisal of similar work already done

3
From variation to variables
  • Variables that can be analysed numerically are of
    several different sorts
  • Categorical/qualitative/nominal variables
  • Ranked/ordered/ordinal variables
  • Numerical/quantitative/metric variables
  • Different kinds of variables allow different
    kinds of numerical analysis
  • This applies to the method of description or
    measurement, not the basic property

4
Data sets
  • Usually data do not come singly: they come in,
    or are collected in, sets
  • We collect them because we want to test some idea
    fairly against them
  • E.g. we might want to test whether the stone
    artefacts from one site differ in size from stone
    artefacts from another
  • For this, we measure artefact sizes
    systematically & consistently

5
What belongs in a data-set?
  • "We have considered it prudent to adopt the years
    1919-1925, excluding the drought year of 1926, as
    a fair standard for the future" (Queensland Land
    Settlement Advisory Board, 1927)
  • "The tacit assumption that drought is an
    exceptional visitation to the inland country has
    shaped and infected public thought and official
    policy alike" (Francis Ratcliffe, 1937)

6
Making a measurement
  • A variable is a measured property of a case;
    measuring assigns numbers representing each
    case's value for that variable
  • Variables must be exactly defined, measurements
    reliably carried out
  • Some variables are relatively simple but still
    need explicit specification, e.g. length
  • Some are more complex and/or depend on
    non-obvious definitions, e.g. unemployment

7
Measurement is never perfectly accurate, but ...
  • Our measurement of scraper length is valid, to
    the extent that it measures what it is supposed
    to measure
  • Our measurement is reliable, to the extent that
    repetitions of the same measurement give the same
    result
  • Our measurement is unbiased, to the extent that
    it does not tend to under-state or over-state the
    true value of the variable
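
Reliability and bias can be illustrated with repeated measurements of one object. The sketch below is a minimal example under loud assumptions: the true length, the two recorders and all readings are hypothetical, and the true value is treated as known only for the sake of illustration. Validity is a matter of definition and cannot be computed this way.

```python
from statistics import mean, stdev

TRUE_LENGTH = 50.0  # mm; assumed known here only for illustration

# Hypothetical repeated measurements of the same scraper by two recorders.
recorder_a = [50.1, 49.9, 50.0, 50.2, 49.8]  # centred on the true value
recorder_b = [51.0, 51.1, 50.9, 51.0, 51.0]  # consistent, but reads high

def summarize(measurements, true_value):
    """Return (bias, spread): mean error and SD of the repeated readings."""
    bias = mean(measurements) - true_value   # tendency to over/under-state
    spread = stdev(measurements)             # small spread = more reliable
    return bias, spread

bias_a, spread_a = summarize(recorder_a, TRUE_LENGTH)
bias_b, spread_b = summarize(recorder_b, TRUE_LENGTH)
print(f"A: bias {bias_a:+.2f} mm, spread {spread_a:.2f} mm")
print(f"B: bias {bias_b:+.2f} mm, spread {spread_b:.2f} mm")
```

In this made-up case recorder B is the more repeatable of the two but over-states the length by about 1 mm, which shows that a measurement can be reliable and still biased.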

8
Recording a measurement
  • Rare important observations deserve recording as
    insight-giving anecdotes
  • But in many fields the bread & butter of research
    is common observations, where the issue is
    varying frequency
  • Importance of a recording system
  • Unsystematic recording is likely to lead to
    omissions or inconsistencies
  • Limits to the benefits of precision

9
Recording technology
  • Pen & paper still have their place
  • Complex technology has its traps: its
    vulnerability, your dependence
  • But early, direct or automatic data entry into
    computers can bring big benefits in efficient
    use of time & labour, error reduction,
    cross-checks
  • Importance of duplicates & back-ups

10
How much data to collect?
  • Limits to the benefit from measuring variables to
    many significant figures
  • Limits to the benefit from increasing sample size
    indefinitely
  • Limits to the benefit from increasing number of
    variables: how many will you analyse?
  • Attention to limits can save lots of time
  • Limits not fixed, but depend on the situation
    under study & the ideas under test

11
Spreadsheets (e.g. Excel) & databases (e.g.
Access)
  • End point of data collection is often a matrix or
    table: a column for each variable, a row for each
    case
  • Often convenient to enter these into a
    spreadsheet or database (linkable, searchable)
  • These can store, check, transform, calculate,
    apply conditions, select, test statistically,
    output to statpack
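
The cases-by-variables table can be sketched in a few lines of code. This is an illustration only: the artefact records, the column names and the helper load_cases are hypothetical, and a spreadsheet or database would do the same job interactively.

```python
import csv
import io

# Hypothetical data: one row per case (artefact), one column per variable.
RAW = """id,site,material,length_mm
1,A,chert,42.5
2,A,quartz,38.0
3,B,chert,51.2
4,B,chert,47.9
"""

def load_cases(text):
    """Parse CSV text into a list of dicts, converting length to a number."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["length_mm"] = float(row["length_mm"])
    return rows

cases = load_cases(RAW)

# Select: apply a condition to pick out the cases from one site.
site_b = [c for c in cases if c["site"] == "B"]

# Transform: add a derived variable (length in cm) to every case.
for c in cases:
    c["length_cm"] = c["length_mm"] / 10

print(len(cases), "cases;", len(site_b), "from site B")
```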

12
Study design: experiment versus observation
  • How do we define? Variously, but element of
    control often the key
  • For practical, ethical etc. reasons, experiments
    rare in our subjects
  • But experimental design important
  • Dependent variable = response variable under study
  • Independent variable = explanatory variable or
    factor

13
Contexts and confounds
  • Treatment = a combination of specific conditions
    (levels of experimental factors)
  • Extraneous variables = ones not being studied but
    which may influence dependent variable, thus
    part of relevant context
  • Effects of different (independent or extraneous)
    variables are said to be confounded if they
    cannot be distinguished
  • Good study design requires data on context

14
Observational studies: the risks of confounding
  • Well designed experiments minimize confounding by
    appropriate choice of variables, cases and
    treatments; random sequence of treatments;
    random allocation of cases to groups
  • Observational surveys lack this control
  • Groups may be self-selected
  • Differences in groups may have causes other than
    the variables under study
  • But much can be done despite limitations
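
Random allocation of cases to groups can be sketched as a shuffle followed by dealing the cases out in turn. The function name allocate, the case labels and the seed are assumptions made for this example, not part of the course material.

```python
import random

def allocate(cases, n_groups=2, seed=None):
    """Randomly allocate cases to n_groups groups of near-equal size."""
    rng = random.Random(seed)      # a seed makes the allocation repeatable
    shuffled = list(cases)         # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    # Deal the shuffled cases out in turn, like cards.
    return [shuffled[i::n_groups] for i in range(n_groups)]

cases = [f"case-{i}" for i in range(1, 11)]   # hypothetical case labels
treatment, control = allocate(cases, n_groups=2, seed=1)
print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides group membership, extraneous variables are not systematically tied to either group, which is the control that self-selected groups lack.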

15
ARCH 2126/6126
  • Session 2b Description

16
Examples of presentation
  • Even the simplest forms of stating findings
    (percentages, averages) and the simplest
    graphical presentations emphasize selected
    aspects
  • This can be legitimate, can also be misleading:
    much depends on honesty & clarity with which
    procedure is described
  • What as a percentage of what?
  • Does the graph have linear scales? A zero?
  • Please bring in examples yourselves

17
How can we see patterns inherent in our data-set?
  • Start simple, e.g. frequency tables
  • Frequency or
  • Relative frequency
  • Value of mental arithmetic & cross-checks: do
    figures make sense?
  • Note rounding errors
  • Keep an eye on sample sizes: do they change?

18
Frequency: absolute & relative
  • Frequency of any value of a variable is the
    number of times that value is found, i.e. it is a
    count, a whole number
  • Relative frequency of any value is its frequency,
    expressed as a proportion of all observations
    (often a percent)
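
A frequency table can be built directly from these two definitions. The sketch below uses hypothetical raw-material records; it also shows how percentages rounded to whole numbers may fail to add up to 100.

```python
from collections import Counter

# Hypothetical raw-material records for 8 artefacts.
materials = ["chert", "quartz", "chert", "silcrete",
             "chert", "quartz", "chert", "chert"]

freq = Counter(materials)              # absolute frequencies (counts)
n = sum(freq.values())                 # total number of observations
rel_freq = {value: count / n for value, count in freq.items()}

for value, count in freq.most_common():
    print(f"{value:10s} {count:3d} {100 * rel_freq[value]:5.1f}%")

# Cross-check: relative frequencies sum to 1, but rounded percentages
# need not sum to 100 (a rounding error worth noting in a table).
rounded_total = sum(round(100 * p) for p in rel_freq.values())
print("sum of rounded percentages:", rounded_total)
```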

19
Rates and ratios
  • Ratio = the size of a number relative to another
    number
  • Proportion = a ratio in which the second number
    includes the first
  • Percentage = a proportion multiplied by 100
  • Rate = a ratio of the number of events to the
    number of cases at risk of experiencing that event
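
The four definitions differ only in what is divided by what. A short sketch with hypothetical counts (all numbers below are invented for illustration):

```python
# Hypothetical counts from a survey of 200 burials.
males, females = 120, 80
total = males + females

ratio = males / females             # one number relative to another
proportion = males / total          # the second number includes the first
percentage = 100 * males / total    # the proportion multiplied by 100

# Rate: events divided by the cases at risk of experiencing the event.
deaths, at_risk = 12, 400           # hypothetical events and cases at risk
rate = deaths / at_risk
rate_per_1000 = 1000 * deaths / at_risk

print(f"ratio {ratio}, proportion {proportion}, {percentage}%")
print(f"rate {rate} ({rate_per_1000} per 1000 at risk)")
```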

20
Good to graph results, but graphs can also mislead
  • Graphical depictions include line graphs, bar
    charts, histograms, pie charts, stem-and-leaf
    plots, scatterplots
  • Line graphs usually used to plot variable against
    time (on horizontal axis); show seasons & trends
  • Different scales give different impressions, e.g.
    non-zero base to vertical axis, unequal units,
    log scale

21
Bar charts and histograms
  • Bar charts compare the values of different
    variables, often categorical
  • Histograms display frequency or relative
    frequency distributions of one variable at a time
  • Width of histogram bars has meaning
  • Eyes respond to impressions of area: symbols,
    unequal widths, pseudo-3D can give a misleading
    impression

22
Scatterplots
  • Show the distributions of two variables at once
    (i.e. bivariate data)
  • If one variable is independent and one dependent,
    independent goes on horizontal axis
  • Essence of any relationship between them is
    apparent visually: +ve or -ve, strong or weak,
    simple or complex
  • This can affect future statistical testing

23
Measures of central tendency
  • The arithmetic mean (average): add all
    observations together, divide the total by the
    number of observations
  • (Also geometric & harmonic mean)
  • The median: arrange all observations in order,
    find the middle one or the mid-point of the
    middle two
  • The mode: find the commonest value
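
The recipes for mean and median can be written out by hand and checked against Python's standard statistics module. The lengths are hypothetical, and the helper names arithmetic_mean and middle_value are made up for this sketch.

```python
from statistics import mean, median, mode

lengths = [35, 41, 32, 44, 35, 38]   # hypothetical scraper lengths, mm

def arithmetic_mean(values):
    """Add all observations together, divide by their number."""
    return sum(values) / len(values)

def middle_value(values):
    """Median: middle observation, or mid-point of the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print("mean:  ", arithmetic_mean(lengths), mean(lengths))
print("median:", middle_value(lengths), median(lengths))
print("mode:  ", mode(lengths))      # commonest value
```

With six observations there is no single middle value, so the median here is the mid-point of the third and fourth values in sorted order.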

24
Central tendencies and distributions
  • In a normal distribution, graph is symmetrical;
    mean, median & mode are similar
  • But distributions may be different, e.g. may be
    skewed to left or right
  • Mean is often convenient but is strongly affected
    by outliers
  • To avoid this, can use median: less affected by
    outliers and skews
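
The different sensitivity of mean and median is easy to see by adding one extreme value to a small sample. The data are hypothetical.

```python
from statistics import mean, median

typical = [32, 35, 35, 38, 41, 44]   # hypothetical lengths, mm
with_outlier = typical + [90]        # the same sample plus one extreme value

mean_shift = mean(with_outlier) - mean(typical)
median_shift = median(with_outlier) - median(typical)

print("mean:  ", mean(typical), "->", mean(with_outlier))
print("median:", median(typical), "->", median(with_outlier))
print("shift in mean:", mean_shift, " shift in median:", median_shift)
```

In this example a single outlier moves the mean by 7.5 mm but the median by only 1.5 mm.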

25
Measures of central tendency are useful, but ...
  • If average income is above the poverty line, is
    poverty abolished?
  • If the average child is at a weight/age which is
    thought to indicate healthy growth, are all
    growing healthily?
  • If first agriculturalists diffused into Europe at
    an average rate of 1 km / year, does that imply
    rate was constant?
  • Variation is ubiquitous: sample is not fully
    characterized by its mean/median

26
So we also need measures of dispersion
(variability)
  • Range = maximum - minimum; sensitive to outliers
  • Percentiles: median is 50th percentile; we can
    also find 25th & 75th percentiles (dividing sample
    into quartiles) or 20th, 40th, 60th & 80th
    (quintiles) or 3rd & 97th percentiles etc.
  • Interquartile range (75th percentile - 25th) is
    more stable than range
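
Range and interquartile range can be compared on a sample with and without an extreme value. This sketch relies on statistics.quantiles with its default method; other software may interpolate percentiles slightly differently, so the exact quartile values are an assumption of that choice. The data are hypothetical.

```python
from statistics import quantiles

def spread(values):
    """Return (range, interquartile range) for a list of numbers."""
    q1, q2, q3 = quantiles(values, n=4)   # 25th, 50th, 75th percentiles
    return max(values) - min(values), q3 - q1

typical = [32, 35, 35, 38, 41, 44]        # hypothetical lengths, mm
with_outlier = typical + [90]

range_a, iqr_a = spread(typical)
range_b, iqr_b = spread(with_outlier)

print("range:", range_a, "->", range_b)
print("IQR:  ", iqr_a, "->", iqr_b)
```

One added outlier changes the range far more than it changes the interquartile range, which is the sense in which the latter is more stable.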

27
Box plots
  • A simple box-and-whisker plot consists of a box
    (interquartile range) with a central line
    (median) and a further line each side of the box
    (to the extremes)
  • More elaborate versions represent outliers (> 1½ x
    box length from box) by dots not joined to the
    main whisker, far outliers (> 3 x box length)
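
The outlier rule for the more elaborate box plots can be sketched as a classification by distance from the box, measured in box lengths. The function classify and the data are hypothetical, and the quartiles come from statistics.quantiles with its default method, which is only one of several conventions.

```python
from statistics import quantiles

def classify(value, q1, q3):
    """Label a value by its distance from the box, in box lengths."""
    box_length = q3 - q1                 # the interquartile range
    if value < q1:
        distance = q1 - value
    elif value > q3:
        distance = value - q3
    else:
        return "inside box"
    if distance > 3 * box_length:
        return "far outlier"
    if distance > 1.5 * box_length:
        return "outlier"
    return "within whiskers"

lengths = [32, 35, 35, 38, 41, 44, 90]   # hypothetical lengths, mm
q1, _, q3 = quantiles(lengths, n=4)      # quartile method is an assumption
labels = {x: classify(x, q1, q3) for x in lengths}
for x, label in labels.items():
    print(x, label)
```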

28
Dispersion around the mean
  • Variance and standard deviation
  • Standard deviation = √variance
  • A little more complex to calculate but has some
    very useful properties
  • Main component of variance is sum of squares
    (SoS), i.e. subtract mean from each observation,
    square result, add them up; then divide SoS by
    sample size - 1
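
The calculation can be followed step by step in code and checked against the standard library. The data and the helper names are hypothetical; the divisor is sample size minus 1, as on the slide.

```python
from math import isclose, sqrt
from statistics import stdev, variance

def sum_of_squares(values):
    """Subtract the mean from each observation, square, and add up."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def sample_variance(values):
    """Sum of squares divided by (sample size - 1)."""
    return sum_of_squares(values) / (len(values) - 1)

def sample_sd(values):
    """Standard deviation is the square root of the variance."""
    return sqrt(sample_variance(values))

lengths = [32, 35, 35, 38, 41, 44]   # hypothetical lengths, mm
print("sum of squares:", sum_of_squares(lengths))
print("variance:", sample_variance(lengths), variance(lengths))
print("SD:", sample_sd(lengths), stdev(lengths))
```

Unlike the variance, the standard deviation is in the same units as the observations (here mm), which is one of its useful properties.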

29
Means, standard deviations & normal
distributions
  • Normal curves are symmetric, bell-shaped, drop
    off quickly, few outliers
  • Mean = median = mode
  • There are many normal curves: mean & standard
    deviation specify shape
  • In a normal curve, point where tails flatten
    out is 1 SD from mean
  • SD/mean = coefficient of variation
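
Because the coefficient of variation divides one quantity by another in the same units, it does not depend on the unit of measurement. A minimal sketch with hypothetical lengths recorded in mm and again in cm:

```python
from math import isclose
from statistics import mean, stdev

def coefficient_of_variation(values):
    """SD divided by mean; undefined when the mean is zero."""
    return stdev(values) / mean(values)

lengths_mm = [32, 35, 35, 38, 41, 44]        # hypothetical lengths
lengths_cm = [x / 10 for x in lengths_mm]    # same artefacts, other unit

cv_mm = coefficient_of_variation(lengths_mm)
cv_cm = coefficient_of_variation(lengths_cm)
print(f"CV in mm: {cv_mm:.4f}, CV in cm: {cv_cm:.4f}")
print("same apart from rounding:", isclose(cv_mm, cv_cm))
```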

30
Properties of normal curve
  • 68% of observations fall within 1 SD of mean
  • 95% fall within 2 SDs of mean
  • 99.7% fall within 3 SDs of mean
  • A transformation may help to make a distribution
    approximately normal
  • A raw observation can be converted into a
    standardized (z) score, on a scale with mean 0 &
    SD 1, to find the probability of its occurrence
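
The three percentages and the use of z-scores can be checked with the standard library's NormalDist. The mean of 38 mm, SD of 4 mm and observation of 44 mm are hypothetical values chosen for illustration, and the sketch assumes the variable really is normally distributed.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # the standardized (z) scale

def within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def z_score(x, mean, sd):
    """Distance of x from the mean, in units of SD."""
    return (x - mean) / sd

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * within(k):.1f}%")

# Hypothetical example: lengths with mean 38 mm and SD 4 mm.
z = z_score(44, mean=38, sd=4)
p_larger = 1 - std_normal.cdf(z)         # probability of a larger value
print(f"z = {z}, P(larger) = {p_larger:.3f}")
```

The familiar 95% is a rounded figure: the proportion within exactly 2 SDs is slightly above 95%.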