Anna Bombak, Chuck Humphrey, Lindsay Johnston and Leah Vanderjagt - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
The Winter Institute on Statistical Literacy for
Librarians
  • Demystifying statistics for the practitioner

2
Outline
  • Introductions
  • Statistics and data: what are we talking about?
  • Definitions, standards and metadata
  • Official statistics: national
  • Official statistics: international
  • Census geography and small area statistics
  • Non-official statistics

Day 1
Day 2
Day 3
3
Introductions: your backgrounds
  • Please introduce yourself
  • Your name
  • Your institutional affiliation
  • Your librarian responsibilities
  • Is there anything in particular that you are
    hoping will be covered in this workshop?

4
Introductions: your backgrounds
  • You are equally split between non-academic and
    academic libraries.
  • The largest group, with 13, is from universities
    other than the U of A.
  • The second largest group, with 10, is from
    government libraries.

5
Introductions: your backgrounds
  • Geographically, 21 of you are from Alberta and
    nine are from other provinces.
  • We have representation from Ontario, Manitoba,
    Saskatchewan and Alberta.
  • Thirteen are from the Edmonton region.

6
Statistics: what are we talking about?
7
Statistics are ubiquitous
  • Statistics are generated today about nearly
    every activity on the planet. Never before have
    we had so much statistical information about the
    world in which we live. Why is this type of
    information so abundant? For one thing,
    statistics have become a form of currency in
    today's information society. Through computing
    technology, society has become very proficient in
    calculating statistics from the vast quantities
    of data that are collected. As a result, our
    lives involve daily transactions revolving around
    some use of statistical information.

Data Basics, page 1.1
8
Numeric information
  • Statistics
      • numeric facts/figures
      • created from data, i.e., already processed
      • presentation-ready
  • Data
      • numeric files created and organized for
        analysis/processing
      • requires processing
      • not display-ready
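
The distinction can be sketched in a few lines of Python. The records below are invented for illustration (they are not survey data) and the field names are assumptions: the list of records plays the role of data, and the percentage computed from it is a statistic.

```python
# Hypothetical unit records ("data"): one dict per respondent.
# All values are invented for illustration only.
records = [
    {"id": 1, "sex": "F", "smoker": True},
    {"id": 2, "sex": "M", "smoker": False},
    {"id": 3, "sex": "F", "smoker": False},
    {"id": 4, "sex": "M", "smoker": True},
    {"id": 5, "sex": "M", "smoker": False},
]


def percent_smokers(rows):
    """Process the data into a single presentation-ready figure."""
    if not rows:
        raise ValueError("no records to summarize")
    smokers = sum(1 for r in rows if r["smoker"])
    return 100.0 * smokers / len(rows)


# The statistic is derived from the data; the data themselves are not
# display-ready until they have been processed like this.
statistic = percent_smokers(records)
print(f"{statistic:.0f}% of respondents smoke")
```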

9
Numeric information
[Annotated table. Labels: Geography (Region); Time (Periods); Unit of Observation; Attributes (Smokers, Education, Age, Sex).]
The cells in the table are the estimated number
of smokers.
10
Statistics are about definitions!
Definitions: Sex (Total, Male, Female); Periods (1994-1995, 1996-1997)
11
Statistics are about definitions!
Some definitions are based on standards while
others are based on convention or practice. For
example, Standard Geography classifications
12
(No Transcript)
13
Numeric information
14
Stories are told through statistics
  • The National Population Health Survey in the
    previous example had over 80,000 respondents in
    its 1996-97 sample, and the Canadian Community
    Health Survey in 2005 had over 130,000 cases. How
    do we tell the stories about each of these
    respondents?
  • We create summaries of these life experiences
    using statistics.

15
Summary
  • Statistics are derived from observational,
    experimental or simulated data.
  • A table is a format for displaying statistics and
    presents a summary or one view of the data.
  • Tables are structured around geography, time and
    attributes of the unit of observation.
  • Statistics are dependent on definitions.
  • Statistics summarize individual stories into
    common or general stories.
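
As a rough illustration of a table being "one view of the data", the sketch below tabulates the same made-up records in two different ways. The records, region names and field names are assumptions chosen to echo the smoking example; they are not drawn from any survey.

```python
from collections import Counter

# Hypothetical unit records; every value here is invented.
records = [
    {"region": "Alberta", "period": "1994-1995", "sex": "Female", "smoker": True},
    {"region": "Alberta", "period": "1994-1995", "sex": "Male", "smoker": True},
    {"region": "Alberta", "period": "1996-1997", "sex": "Male", "smoker": False},
    {"region": "Ontario", "period": "1994-1995", "sex": "Female", "smoker": False},
    {"region": "Ontario", "period": "1996-1997", "sex": "Female", "smoker": True},
    {"region": "Ontario", "period": "1996-1997", "sex": "Male", "smoker": True},
]


def count_smokers(rows, *dimensions):
    """Count smokers in each cell defined by the chosen dimensions."""
    table = Counter()
    for r in rows:
        if r["smoker"]:
            cell = tuple(r[d] for d in dimensions)
            table[cell] += 1
    return table


# View 1: geography by time. View 2: an attribute of the unit of observation.
by_region_period = count_smokers(records, "region", "period")
by_sex = count_smokers(records, "sex")

for cell, n in sorted(by_region_period.items()):
    print(cell, n)
```

Both tables summarize the same respondents; choosing the dimensions is one of the decisions made when a statistic is created.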

16
Methods producing data
17
Methods producing data
  • A particular discipline or field will tend to be
    dominated by one of these three methods, although
    outputs may also exist from the other two
    methods.
  • Consequently, the knowledge disseminated within a
    field is often fairly homogeneous in how
    statistical information is used and reported.
  • Knowing this and the life cycle in which
    statistics are produced can help in the search
    for statistics.

18
Life cycle of survey statistics
19
Life cycle of survey statistics
20
Life cycle applied to health statistics
Health Information Roadmap Initiative
21
Life cycle applied to health statistics
Health Information Roadmap Initiative
22
Reconstructing statistics
  • One way to see the relationship between
    statistics and the data from which they were
    derived is to reconstruct statistics that someone
    else has produced from data that are publicly
    accessible.

23
Reconstructing statistics
[Diagram: Health Information Roadmap Initiative, with numbered steps 1 to 9.]
24
Reconstructing statistics
  • The statistics that we will reconstruct are
    reported in "Health Facts from the 1994 National
    Population Health Survey", Canadian Social
    Trends, Spring 1996, pp. 24-27.
  • The steps we will follow are:
      • identify the characteristics of the respondents
        in the article;
      • identify the data source;
      • locate these characteristics in the data
        documentation;
      • find the original questions used to collect the
        data;
      • retrieve the data; and
      • run an analysis to reproduce the statistics.

25
The findings to be replicated
  • Page 26

26
Summary of variables identified
  • Findings apply to Canadian adults
  • Likely need age of respondents
  • Men and women
  • Look for the sex of respondents
  • Type of drinkers
  • Look for frequency of drinking or a variable
    categorizing types of drinkers
  • Age
  • Look for actual age or age in categories
  • Smokers
  • Look for smoking status
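
Whether a file holds actual age or age in categories, and drinking frequency or a ready-made drinker type, determines how much recoding is needed before an article's categories can be matched. The sketch below shows the idea only: the cut points, labels and function names are hypothetical, and the survey's own documentation defines the categories that should actually be used.

```python
# Hypothetical recodes for illustration. The cut points and labels are
# assumptions, not the definitions used by any particular survey.

def age_group(age):
    """Collapse actual age into (invented) age categories."""
    if age < 18:
        return None  # treated here as outside the adult population
    if age < 45:
        return "18-44"
    if age < 65:
        return "45-64"
    return "65+"


def drinker_type(drinks_per_month):
    """Classify drinkers from a frequency, using an invented rule."""
    if drinks_per_month == 0:
        return "non-drinker"
    if drinks_per_month < 4:
        return "occasional"
    return "regular"


for age, drinks in [(17, 0), (30, 2), (70, 12)]:
    print(age, age_group(age), drinker_type(drinks))
```

Changing a cut point changes every count built on it, which is why the definitions behind a published figure need to be found before trying to reproduce it.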

27
Identify the data source
  • Survey title is identified: National Population
    Health Survey, 1994-95
  • Public-use microdata file is announced
  • Page 25 of the article

28
Locate the variables
  • Examine the data documentation for the National
    Population Health Survey, 1994-95
  • PDF version is on-line
  • Use TOC and link to Data Dictionary for Health
  • Identify the variables from their content
  • NOTE: check how missing data were handled
  • Trace the variables back to the questionnaire
  • Did the sampling method require weighting cases?
  • NOTE: in addition to the other variables, is a
    weight variable needed to adjust for the sampling
    method?
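
To see why the two NOTEs matter, here is a minimal sketch comparing an unweighted and a weighted proportion. The records and weights are invented, the field names are assumptions, and missing answers are represented as None; a real file documents its own missing-data codes and weight variable.

```python
# Hypothetical records: "weight" stands for the number of people in the
# population that each respondent represents; "smoker" is None when the
# answer is missing. All values are invented.
records = [
    {"smoker": True, "weight": 100.0},
    {"smoker": False, "weight": 300.0},
    {"smoker": True, "weight": 100.0},
    {"smoker": None, "weight": 250.0},
    {"smoker": False, "weight": 500.0},
]


def smoking_rate(rows, weighted):
    """Proportion of smokers among valid answers, with or without weights."""
    valid = [r for r in rows if r["smoker"] is not None]  # drop missing
    if not valid:
        raise ValueError("no valid answers")
    total = sum(r["weight"] if weighted else 1.0 for r in valid)
    smokers = sum(
        r["weight"] if weighted else 1.0 for r in valid if r["smoker"]
    )
    return smokers / total


unweighted = smoking_rate(records, weighted=False)
weighted = smoking_rate(records, weighted=True)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

In this made-up case the two answers differ considerably, so a reconstruction that ignores the weight variable would not match the published figure.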

29
Retrieve and analyze the data
  • For universities subscribed to the Statistics
    Canada Data Liberation Initiative (DLI), the
    public use microdata from the NPHS can be
    downloaded without additional cost. See the
    Statistics Canada Online Catalogue for further
    cost details.
  • Make use of local data services to retrieve data
    from the NPHS.

30
Lessons from the NPHS example
  • This example demonstrates the distinction between
    creating statistics and interpreting statistics
    that have been created by others.
  • This is an important distinction because:
      • Choices are made in creating statistics.
      • Interpreting statistics requires an ability to
        understand the choices that were made.
  • Searching for statistics that others have created
    can be facilitated by understanding these points.

31
Provide a different perspective
  • Building on the previous example using the NPHS,
    compare the statistics from an article about
    young adults giving and receiving help to their
    parents' age cohort.

32
Statistics are about definitions
33
Statistics are about definitions
34
Statistics are about definitions
  • Look at the Census definitions
      • Definitions are in the Census Handbook (2001)
        and the Census Dictionary (2006)
      • Search by Census Variable under Topic-Based
        Tabulations (2006) for value categorizations
  • Look at some standard classifications used in
    statistics
      • SIC, NAICS, NOC, Standard Classification of
        Goods (SCG), Standard Geographic Classification
        (SGC), Classification of Instructional Programs
        (CIP), ICD-10

35
Statistics in the News
  • Three recent newspaper articles that include
    statistics in them have been selected for this
    exercise. For each of the articles, answer the
    following questions.
  • What is the concept represented by the statistic
    or statistics in this story?
  • Is a definition for this concept provided? If it
    is, what is it? Or is the definition implicit?
  • Are any classifications identifiable? What are
    they?
  • Are the data from which this statistic was
    derived identified in the article?

36
Metadata for describing tables
  • As we have discussed, tables are a typical
    display format for statistics. Because tables
    are often published within an article, they don't
    get indexed. Therefore, finding published tables
    requires a connection between characteristics of
    the table and other indexed content.
  • Two indices of tables that exist are Statistical
    Universe and Tablebase. They use traditional
    elements to index tables without defining unique
    properties of tables.

37
Metadata for describing tables
  • What are the properties of a table that we might
    use to develop useful descriptors for describing
    their content?
  • What is the motivation for doing this exercise?
  • Searching for tables that were indexed using such
    descriptors would make finding statistics much
    easier.
  • The movement toward open access journals and
    publishing lends an opportunity to introduce
    metadata elements for statistical tables.
  • Once we have statistical tables described more
    comprehensively, opportunities will exist to link
    tables to the data sources from which the
    statistics in the table were derived.

38
[Annotated example table. Metadata elements labelled:]
  • Title
  • Variables: Average Tuition, Discipline, Academic
    Year, Province
  • Statistical Metric: Dollars
  • Footnote
  • Producer
  • Date
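
One way to picture these elements is as fields in a simple descriptive record. The sketch below is an assumption-laden illustration, not an existing metadata standard: the class name, field names and example title are hypothetical, and only the variable names and the metric come from the example above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class TableMetadata:
    """Hypothetical descriptive record for a published statistical table."""
    title: str = ""
    variables: list = field(default_factory=list)
    statistical_metric: str = ""
    footnote: str = ""
    producer: str = ""
    date: str = ""


def missing_elements(meta):
    """Names of the metadata elements that have been left empty."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name)]


example = TableMetadata(
    title="Average tuition by discipline",  # invented title
    variables=["Average Tuition", "Discipline", "Academic Year", "Province"],
    statistical_metric="Dollars",
)

print(missing_elements(example))
```

A record like this could be searched by variable name or metric, and the list of empty elements shows what a reader would still have to look for in the table itself.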
39
What are the metadata characteristics of tables
and graphs?
  • Is a title provided?
  • Is an author, producer or agency identifiable?
  • Is there a date of creation or publication?
  • What is the entity that has been observed to make
    this statistic? That is, what is the unit of
    observation?
  • Are the characteristics of the unit of observation
    (i.e., variables) and their categories clearly
    identified and defined?
  • Is there a key to explain the use of colours or
    lines in the graph?
  • Is the type of statistic clearly identified?
    That is, does the table or graph contain
    percentages, counts, averages, etc.?
  • Is there a scale for the numbers presented in the
    table or graph?
  • Is there an overall figure or number (N)
    presented upon which the table or graph was
    calculated?
  • Are there footnotes?
  • Are geography, time and social content clearly
    expressed in the table or graph?

40
Summary
  • If statistical tables and graphs were described
    and indexed by rich metadata, our ability to
    locate statistics would be greatly enhanced.
  • In the absence of such metadata, we use elements
    of this metadata structure to search our existing
    databases.
  • The next generation of metadata in the field of
    data will work to integrate the description of
    both data and statistics.