Title: Anna Bombak, Chuck Humphrey, Lindsay Johnston and Leah Vanderjagt
1The Winter Institute on Statistical Literacy for
Librarians
- Demystifying statistics for the practitioner
2Outline
- Introductions
- Statistics and data: what are we talking about?
- Definitions, standards and metadata
- Official statistics: national
- Official statistics: international
- Census geography and small area statistics
- Non-official statistics
Day 1
Day 2
Day 3
3Introductions: your backgrounds
- Please introduce yourself
- Your name
- Your institutional affiliation
- Your librarian responsibilities
- Is there anything in particular that you are
hoping will be covered in this workshop?
4Introductions: your backgrounds
- You are equally split between non-academic and
academic libraries.
- The largest group, with 13, is from universities
other than the U of A.
- The second largest group, with 10, is from
government libraries.
5Introductions: your backgrounds
- Geographically, 21 of you are from Alberta and
nine are from other provinces.
- We have representation from Ontario, Manitoba,
Saskatchewan and Alberta.
- Thirteen are from the Edmonton region.
6Statistics: what are we talking about?
7Statistics are ubiquitous
- Statistics are generated today about nearly
every activity on the planet. Never before have
we had so much statistical information about the
world in which we live. Why is this type of
information so abundant? For one thing,
statistics have become a form of currency in
today's information society. Through computing
technology, society has become very proficient in
calculating statistics from the vast quantities
of data that are collected. As a result, our
lives involve daily transactions revolving around
some use of statistical information.
Data Basics, page 1.1
8Numeric information
- Statistics
- numeric facts/figures
- created from data, i.e., already processed
- presentation-ready
- Data
- numeric files created and organized for
analysis/processing
- requires processing
- not display-ready
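The distinction can be sketched in a few lines of Python. The records and field names below are invented for illustration and do not come from any survey. The rows are the data; the percentage computed from them is the statistic.

```python
# Hypothetical respondent-level data: one row per unit of observation.
# These values are made up for illustration only.
records = [
    {"region": "West", "sex": "Female", "smoker": True},
    {"region": "West", "sex": "Male", "smoker": False},
    {"region": "East", "sex": "Female", "smoker": False},
    {"region": "East", "sex": "Male", "smoker": True},
    {"region": "East", "sex": "Male", "smoker": False},
]


def percent_smokers(rows):
    """Process the data into one presentation-ready statistic."""
    if not rows:
        return None  # no data, no statistic
    smokers = sum(1 for row in rows if row["smoker"])
    return round(100 * smokers / len(rows), 1)


statistic = percent_smokers(records)
print(f"Smokers: {statistic}% of {len(records)} respondents")
```

The list of records is not display-ready; the single percentage is.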
9Numeric information
Geography: Region
Time: Periods
Unit of Observation
Attributes: Smokers, Education, Age, Sex
The cells in the table are the estimated number
of smokers.
10Statistics are about definitions!
Definitions
Sex: Total, Male, Female
Periods: 1994-1995, 1996-1997
11Statistics are about definitions!
Some definitions are based on standards while
others are based on convention or practice. For
example, Standard Geography classifications
13Numeric information
14Stories are told through statistics
- The National Population Health Survey in the previous
example had over 80,000 respondents in its 1996-97
sample, and the Canadian Community Health Survey
in 2005 had over 130,000 cases. How do we tell
the stories about each of these respondents?
- We create summaries of these life experiences
using statistics.
15Summary
- Statistics are derived from observational,
experimental or simulated data.
- A table is a format for displaying statistics and
presents a summary or one view of the data.
- Tables are structured around geography, time and
attributes of the unit of observation.
- Statistics are dependent on definitions.
- Statistics summarize individual stories into
common or general stories.
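As a rough sketch of how a table is structured around geography, time and an attribute, the Python below counts smokers in each region-by-period cell. The regions and records are hypothetical; only the two period labels echo the earlier example.

```python
from collections import Counter

# Hypothetical records: (region, period, smoker). Invented values.
records = [
    ("Region A", "1994-1995", True),
    ("Region A", "1994-1995", True),
    ("Region A", "1996-1997", False),
    ("Region A", "1996-1997", True),
    ("Region B", "1994-1995", True),
    ("Region B", "1996-1997", False),
]


def smoker_table(rows):
    """Count smokers in each (geography, time) cell."""
    cells = Counter()
    for region, period, smoker in rows:
        if smoker:
            cells[(region, period)] += 1
    return cells


table = smoker_table(records)
regions = sorted({region for region, _, _ in records})
periods = sorted({period for _, period, _ in records})

# Rows are geography, columns are time, cells are counts of smokers.
print("Region".ljust(10) + "".join(p.rjust(12) for p in periods))
for region in regions:
    counts = "".join(str(table[(region, p)]).rjust(12) for p in periods)
    print(region.ljust(10) + counts)
```

Each printed cell is one view of the data; a different attribute or a different grouping would give a different table from the same records.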
16Methods producing data
17Methods producing data
- A particular discipline or field will tend to be
dominated by one of these three methods, although
outputs may also exist from the other two
methods.
- Consequently, the knowledge disseminated within a
field is often fairly homogeneous in how
statistical information is used and reported.
- Knowing this and the life cycle in which
statistics are produced can help in the search
for statistics.
18Life cycle of survey statistics
19Life cycle of survey statistics
20Life cycle applied to health statistics
Health Information Roadmap Initiative
21Life cycle applied to health statistics
Health Information Roadmap Initiative
22Reconstructing statistics
- One way to see the relationship between
statistics and the data from which they were
derived is to reconstruct statistics that someone
else has produced from data that are publicly
accessible.
23Reconstructing statistics
[Figure: Health Information Roadmap Initiative, with steps numbered 1 to 9]
24Reconstructing statistics
- The statistics that we will reconstruct are
reported in "Health Facts from the 1994 National
Population Health Survey," Canadian Social
Trends, Spring 1996, pp. 24-27.
- The steps we will follow are:
- identify the characteristics of the respondents
in the article
- identify the data source
- locate these characteristics in the data
documentation
- find the original questions used to collect the
data
- retrieve the data and
- run an analysis to reproduce the statistics.
25The findings to be replicated
26Summary of variables identified
- Findings apply to Canadian adults
- Likely need age of respondents
- Men and women
- Look for the sex of respondents
- Type of drinkers
- Look for frequency of drinking or a variable
categorizing types of drinkers
- Age
- Look for actual age or age in categories
- Smokers
- Look for smoking status
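If the file carries actual age rather than age in categories, the categories have to be built before the article's figures can be matched. A minimal sketch, assuming adults means 18 and over and using hypothetical age bands; the real cut points must come from the article and the survey documentation.

```python
# Hypothetical age bands, chosen only for illustration.
# The final band is open-ended: 65 and over.
AGE_BANDS = [(18, 24), (25, 44), (45, 64)]


def age_group(age):
    """Return the band label for an adult's age, or None for non-adults."""
    if age is None or age < 18:
        return None  # missing age or not an adult: outside the findings
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "65+"


for age in (17, 18, 44, 70):
    print(age, "->", age_group(age))
```

Changing the bands changes the resulting statistics, which is one of the choices made in creating them.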
27Identify the data source
- Survey title is identified: National Population
Health Survey, 1994-95
- Public-use microdata file is announced
- Page 25 of the article
28Locate the variables
- Examine the data documentation for the National
Population Health Survey, 1994-95
- PDF version is on-line
- Use TOC and link to Data Dictionary for Health
- Identify the variables from their content
- NOTE: check how missing data were handled
- Trace the variables back to the questionnaire
- Did the sampling method require weighting cases?
- NOTE: in addition to the other variables, is a
weight variable needed to adjust for the sampling
method?
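Why the two notes matter can be shown with a small sketch. The rows and weights below are invented, and the rule of dropping missing answers is an assumption made for the example; the survey documentation states how missing data and weights should actually be handled.

```python
# Hypothetical rows: (smoker, weight). smoker is None when the answer
# is missing; weight is the number of people the respondent represents.
rows = [
    (True, 500.0),
    (False, 1500.0),
    (False, 1000.0),
    (None, 800.0),  # missing answer: excluded from both estimates
    (True, 1000.0),
]


def smoking_rates(data):
    """Return (unweighted, weighted) percent of smokers among valid answers."""
    valid = [(smoker, w) for smoker, w in data if smoker is not None]
    if not valid:
        return None, None
    unweighted = 100 * sum(1 for smoker, _ in valid if smoker) / len(valid)
    total_weight = sum(w for _, w in valid)  # assumes positive weights
    weighted = 100 * sum(w for smoker, w in valid if smoker) / total_weight
    return unweighted, weighted


unweighted, weighted = smoking_rates(rows)
print(f"Unweighted: {unweighted:.1f}%  Weighted: {weighted:.1f}%")
```

With these made-up rows the two percentages differ, so a reconstruction that ignores the weight variable would not match a published weighted figure.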
29Retrieve and analyze the data
- For universities subscribed to the Statistics
Canada Data Liberation Initiative (DLI), the
public use microdata from the NPHS can be
downloaded without additional cost. See the
Statistics Canada Online Catalogue for further
cost details.
- Make use of local data services to retrieve data
from the NPHS.
30Lessons from the NPHS example
- This example demonstrates the distinction between
creating statistics and interpreting statistics
that have been created by others.
- This is an important distinction because:
- Choices are made in creating statistics.
- Interpreting statistics requires an ability to
understand the choices that were made.
- Searching for statistics that others have created
can be facilitated by understanding these points.
31Provide a different perspective
- Building on the previous example using the NPHS,
compare the statistics from an article about
young adults giving and receiving help to their
parents' age cohort.
32Statistics are about definitions
33Statistics are about definitions
34Statistics are about definitions
- Look at the Census definitions
- Definitions are in the Census Handbook (2001) and
the Census Dictionary (2006)
- Search by Census Variable under Topic-Based
Tabulations (2006) for value categorizations
- Look at some standard classifications used in
statistics
- SIC, NAICS, NOC, Standard Classification of Goods
(SCG), Standard Geographic Classification (SGC),
Classification of Instructional Programs (CIP),
ICD-10
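Many standard classifications are hierarchical, so a detailed code can be rolled up to broader categories. The sketch below uses the usual NAICS digit levels (two digits for the sector, up to six for the most detailed level). The function name and the example code string are illustrative; the label attached to any code should be checked against the published classification.

```python
# Digit length of each level in the NAICS hierarchy.
NAICS_LEVELS = {
    2: "sector",
    3: "subsector",
    4: "industry group",
    5: "industry",
    6: "national industry",
}


def naics_rollup(code):
    """Return the code truncated to each level it is long enough to reach."""
    if not (code.isdigit() and 2 <= len(code) <= 6):
        raise ValueError(f"not a NAICS-style code: {code!r}")
    return {
        name: code[:digits]
        for digits, name in NAICS_LEVELS.items()
        if digits <= len(code)
    }


# Illustrative six-digit string; no label is asserted for it here.
for level, prefix in naics_rollup("611310").items():
    print(f"{level}: {prefix}")
```

Statistics published at the sector level and at the industry level use the same definitions at different levels of detail, which is worth checking before comparing two tables.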
35Statistics in the News
- Three recent newspaper articles that include
statistics have been selected for this
exercise. For each of the articles, answer the
following questions.
- What is the concept represented by the statistic
or statistics in this story?
- Is a definition for this concept provided? If it
is, what is it? Or is the definition implicit?
- Are any classifications identifiable? What are
they?
- Are the data from which this statistic was
derived identified in the article?
36Metadata for describing tables
- As we have discussed, tables are a typical
display format for statistics. Because tables
are often published within an article, they don't
get indexed. Therefore, finding published tables
requires connecting characteristics of
the table with other indexed content.
- Two indices of tables that exist are Statistical
Universe and Tablebase. They use traditional
elements to index tables without defining unique
properties of tables.
37Metadata for describing tables
- What are the properties of a table that we might
use to develop useful descriptors for describing
table content?
- What is the motivation for doing this exercise?
- Searching for tables that were indexed using such
descriptors would make finding statistics much
easier.
- The movement toward open access journals and
publishing offers an opportunity to introduce
metadata elements for statistical tables.
- Once we have statistical tables described more
comprehensively, opportunities will exist to link
tables to the data sources from which the
statistics in the table were derived.
38Title
Variables: Average Tuition, Discipline, Academic Year, Province
Statistical Metric: Dollars
Footnote
Producer
Date
39What are the metadata characteristics of tables
and graphs?
- Is a title provided?
- Is an author, producer or agency identifiable?
- Is there a date of creation or publication?
- What is the entity that has been observed to make
this statistic? That is, what is the unit of
observation?
- Are the characteristics of the unit of observation
(i.e., variables) and their categories clearly
identified and defined?
- Is there a key to explain the use of colours or
lines in the graph?
- Is the type of statistic clearly identified?
That is, does the table or graph contain
percentages, counts, averages, etc.?
- Is there a scale for the numbers presented in the
table or graph?
- Is there an overall figure or number (N)
presented upon which the table or graph was
calculated?
- Are there footnotes?
- Are geography, time and social content clearly
expressed in the table or graph?
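The checklist above could be captured as a simple descriptor record. This is a hypothetical sketch, not an existing metadata standard: the class, its field names and the example title are all assumptions, with the variables borrowed from the tuition table example.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class TableMetadata:
    """Hypothetical descriptor record for a published table or graph."""
    title: str = ""
    producer: str = ""
    date: str = ""
    unit_of_observation: str = ""
    variables: List[str] = field(default_factory=list)
    statistic_type: str = ""  # e.g. percentages, counts, averages
    scale: str = ""           # e.g. dollars
    base_n: Optional[int] = None
    footnotes: List[str] = field(default_factory=list)
    geography: str = ""
    time: str = ""


def missing_elements(meta):
    """List the descriptor fields that are still empty for this table."""
    return [f.name for f in fields(meta)
            if getattr(meta, f.name) in ("", None, [])]


# Partly described table; the title is invented for the example.
tuition = TableMetadata(
    title="Average tuition by discipline and province",
    variables=["Average Tuition", "Discipline", "Academic Year", "Province"],
    statistic_type="averages",
    scale="dollars",
    geography="province",
    time="academic year",
)
print("Still to describe:", missing_elements(tuition))
```

A record like this answers each question in the checklist with a field, and the empty fields show what a searcher could not rely on when looking for the table.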
40Summary
- If statistical tables and graphs were described
and indexed by rich metadata, our ability to
locate statistics would be greatly enhanced.
- In the absence of such metadata, we use elements
of this metadata structure to search our existing
databases.
- The next generation of metadata in the field of
data will work to integrate the description of
both data and statistics.