Secondary Data and Sources

About This Presentation

Title:

Secondary Data and Sources

Slides: 49

Provided by: brianj80

Transcript and Presenter's Notes

1
Chapter 4

Secondary Data and Sources

2
Secondary Data Defined

Secondary data is information that has been
collected previously for a purpose other than the
need at hand

3
Reasons for Secondary Research

Secondary research may solve the problem
Secondary information costs less
Supplementary uses
Defining the research problem
Planning collection of primary data
Defining the population and selecting the sample

4
Weighing Evidence

There are dangers in relying entirely on
secondary data
Relevant: Information pertains to the problem at
hand
Accurate: Information reflects reality
Current: Information isn't dated. Data is
perishable
Sufficient: Enough information and detail
Available: Information is easy to find
Units of measurement: Are they the same as in your
analysis (sales, profit, employees, sales/sq.
ft., sq. ft.)?
Classification categories: 50,000-64,999;
65,000-79,999; 80,000 and over. What if you need
150,000 and over?
Knowing how information was collected helps
determine how credible the data is.

5
Credibility of Secondary Data

What was the purpose of the study?
Is it relevant? Is it biased? Political and civic
groups may overstate information to make positions
look more or less attractive
Who collected the information?
Is the organization competent to do quality
research?
What information was collected?
How was the information collected?
Are there errors in research design, sampling,
analysis, non-response, etc.? Assess reliability
and validity.
Are the findings consistent with other
studies/information?
Is it inexpensive?
How does its cost compare to the cost of primary
data collection?

6
Internal Sources

Internal secondary information is available
within the company
Prior research reports
Documents and databases (sales reports, warranty
information)
The key is knowing that the information exists and
how to access it

7
Is Secondary Data Appropriate?

Define the Purpose
What do you want to know? Be as specific as
possible.
Industry Analysis
Analyze the structure of the market: who are the
major players?
Literature search (library/Internet)
This is the starting point for data collection.
Who has the information you need most?
Start with the most likely sources and work to
the least likely.
Share and discuss your information with other
team members.

8
External Sources

External data comes from outside sources
Huge amounts are available
Typically cover non-controllable (environmental)
factors
Market size
Competitive information
Market characteristics

9
Common External Sources

Government
Trade associations and trade press
Periodicals and professional journals
Institutions (universities)
Commercial services
Government data is the largest source
Experienced researchers rely on government and
trade sources

10
Consumer and Economic Data

Market and Consumer Information
U.S. Bureau of the Census
Bureau of Labor Statistics
County and City Databook (Census Bureau)
Demographics USA Reference Book
Lifestyle Market Analyst
Woods and Poole MSA Profile Reference Book
MediaMark Reporter
Sales and Marketing Management Survey of Buying
Power
General Economic Information
Survey of Current Business
Federal Reserve Bulletin
Statistical Abstract of the U.S.

11
Government Publications

Statistical Abstract of the United States
Demographic data from census reports
The State and Metropolitan Area Data Book
Same as above, broken down by state, county, and
metropolitan area
Census of Population and Census of Housing
Conducted every ten years
Available online
www.firstgov.gov (official government portal)
www.census.gov

12
Other Sources

Encyclopedia of Associations
Listings of trade groups, contact information,
and publications
Online databases
Periodical indexes like EBSCO and InfoTrac
Subject-specific resources, like LexisNexis
The books are available in academic, public, and
corporate libraries; the online databases
typically require a subscription and are often
available through university and public libraries

13
Online Sources

LEXIS-NEXIS
Business News: contains magazine and newspaper
articles, broadcast transcripts
Compare Companies: shows which companies fit
certain size characteristics
Patent searches: product information can be
found by selecting Business, then Patents
Journalist Express: provides a complete listing
of
Wire services
News services
Search engines
International news

14
Financial Information

Hoovers
Financial information for publicly traded
companies:
company capsules, key competitors, and links to
SEC 10-K and 10-Q reports, links to articles on
companies, and tracking of insider trades,
including who purchased/sold what
Public Registers Annual Report Service
Annual report information for over two thousand
companies.

15
Syndicated Services

Specialist research companies that generate
updated reports on a regular, ongoing basis and
provide them to a number of clients
Examples include IDC, Mediamark, ACNielsen

16
Database Marketing/Data Mining

Data mining is the process of analyzing vast
computer databases looking for patterns and
relationships between variables that will help
marketing efforts
Data explosion problem
Automated data collection tools and mature
database technology have produced huge amounts of
data stored in databases and data warehouses
These data contain potentially valuable
information
Solution: data warehousing and data mining
Data warehousing and on-line analytical
processing
Extraction of interesting knowledge (rules,
regularities, patterns, constraints) from data
in large databases

17
Data Mining History

1960s
Data collection, database creation, DBMS
1970s
Relational database management systems
1980s
Advanced RDBMS models (extended-relational,
deductive) and application-oriented DBMS
(spatial, scientific, engineering, etc.)
1990s-2000s
Data mining and data warehousing, multimedia
databases, and Web databases

18
Data Mining

What is a database?
Internal database: a database developed from data
within an organization.
Where does the data come from?
Sales Invoices
Salespersons' Call Reports
Warranty Cards
Customer Registration/Sign-in

19
Data Mining of Internal Secondary Data

Database mining/marketing: micromarketing to
customers based on customers' and potential
customers' profiles and purchasing patterns.
Internal database mining of secondary data
enables firms to
evaluate sales territories
identify most and least profitable customers
identify potential market segments
identify which products, services, and segments
need the most marketing support
evaluate opportunities for offering new products
or services
identify most and least profitable products and
services
evaluate existing marketing programs
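A minimal Python sketch of one item in the list above, identifying the most and least profitable customers, assuming internal invoice records with hypothetical `customer`, `revenue`, and `cost` fields (the field names and figures are invented for illustration, not taken from any real system):

```python
from collections import defaultdict

# Hypothetical invoice records pulled from an internal sales database.
invoices = [
    {"customer": "A", "revenue": 1200.0, "cost": 700.0},
    {"customer": "B", "revenue": 300.0, "cost": 350.0},
    {"customer": "A", "revenue": 800.0, "cost": 500.0},
    {"customer": "C", "revenue": 950.0, "cost": 600.0},
]


def profit_by_customer(records):
    """Sum profit (revenue minus cost) for each customer."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["customer"]] += rec["revenue"] - rec["cost"]
    return dict(totals)


def rank_customers(records):
    """Return (customer, profit) pairs from most to least profitable."""
    totals = profit_by_customer(records)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_customers(invoices)
print("Most profitable:", ranking[0])
print("Least profitable:", ranking[-1])
```

The same grouping idea extends to sales territories or products by changing the key that records are summed under.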

20
Database Applications

Database analysis and decision support
Market analysis and management
target marketing, customer relationship management,
market basket analysis, cross-selling, market
segmentation
Customer acquisition
discover attributes that predict customer
responses to marketing programs
Customer retention
target customers who are on the verge of
switching to a competitor.
Customer abandonment
are some customers too costly to maintain?
Risk analysis and management
Forecasting, customer retention, quality control,
competitive analysis
Fraud detection and management
Other Applications
Text mining (news group, email, documents) and
Web analysis.
Intelligent query answering

21
Market Analysis Examples

Where are the data sources for analysis?
Credit card transactions, loyalty cards, discount
coupons, customer complaint calls, plus (public)
lifestyle studies
Target marketing
Find clusters of model customers who share the
same characteristics: interests, income level,
spending habits, etc.
Determine customer purchasing patterns over time
Conversion of a single to a joint bank account:
marriage, etc.
Cross-market analysis
Associations/correlations between product sales
Prediction based on the association information

22
Market Analysis Examples

Customer profiling
data mining can tell you what types of customers
buy what products (clustering or classification)
Identifying customer requirements
identifying the best products for different
customers
use prediction to find what factors will attract
new customers
Provides summary information
various multidimensional summary reports
statistical summary information (data central
tendency and variation)

23
Finance/Risk Examples

Finance planning and asset evaluation
cash flow analysis and prediction
contingent claim analysis to evaluate assets
cross-sectional and time series analysis
(financial-ratio, trend analysis, etc.)
Resource planning
summarize and compare the resources and spending
Competition
monitor competitors and market directions
group customers into classes and apply
class-based pricing
set pricing strategy in a highly competitive
market

24
Fraud Detection Examples

Applications
widely used in health care, retail, credit card
services, telecommunications (phone card fraud),
etc.
Approach
use historical data to build models of fraudulent
behavior and use data mining to help identify
similar instances
Examples
Auto insurance: detect a group of people who
stage accidents to collect on insurance
Money laundering: detect suspicious money
transactions (US Treasury's Financial Crimes
Enforcement Network)
Medical insurance: detect professional patients
and rings of doctors and rings of referrals

25
Other Applications

Sports
IBM Advanced Scout analyzed NBA game statistics
(shots blocked, assists, and fouls) to gain
competitive advantage for New York Knicks and
Miami Heat
Astronomy
JPL and the Palomar Observatory discovered 22
quasars with the help of data mining
Internet Web Surf-Aid
IBM Surf-Aid applies data mining algorithms to
Web access logs for market-related pages to
discover customer preferences and behavior,
analyze the effectiveness of Web marketing,
improve Web site organization, etc.

26
Case Example: BellSouth

BellSouth used data mining to eliminate the
prospects least likely to purchase
Used the attributes of existing customers to
identify, model, and predict potential customers

27
The Data Mining Process
Steps, from raw data to knowledge:
Databases
Data Cleaning and Data Integration
Data Warehouse
Selection
Task-relevant Data
Data Mining
Pattern Evaluation
Insight and Knowledge

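The stages above can be sketched as a chain of small Python functions. Everything here is hypothetical: two toy source tables stand in for the databases, and the "mining" step is plain frequency counting, chosen only to keep the example short.

```python
from collections import Counter

# Two hypothetical source databases (e.g., two store systems).
store_a = [{"id": 1, "item": "bread"}, {"id": 2, "item": None}, {"id": 3, "item": "milk"}]
store_b = [{"id": 4, "item": "Bread "}, {"id": 5, "item": "eggs"}]


def clean(records):
    """Data cleaning: drop records with a missing item and normalize spelling."""
    return [{**r, "item": r["item"].strip().lower()} for r in records if r["item"] is not None]


def integrate(*sources):
    """Data integration: merge the cleaned sources into one warehouse table."""
    warehouse = []
    for source in sources:
        warehouse.extend(source)
    return warehouse


def select(warehouse, relevant_items):
    """Selection: keep only the task-relevant records."""
    return [r for r in warehouse if r["item"] in relevant_items]


def mine(records):
    """Data mining (stand-in): count how often each item occurs."""
    return Counter(r["item"] for r in records)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns frequent enough to matter."""
    return {item: n for item, n in patterns.items() if n >= min_count}


warehouse = integrate(clean(store_a), clean(store_b))
task_data = select(warehouse, {"bread", "milk"})
knowledge = evaluate(mine(task_data), min_count=2)
print(knowledge)
```

Note that the cleaning step matters: without normalizing "Bread " to "bread", the two stores' records would not combine and the pattern would fall below the threshold.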
28
Data Mining: On What Kinds of Data?

Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
WWW

29
Data Mining Analysis

Concept description: characterization and
discrimination
Generalize, summarize, and contrast data
characteristics, e.g., dry vs. wet regions
Association (correlation and causality)
Multi-dimensional vs. single-dimensional
association
$\text{age}(X, \text{20..29}) \wedge \text{income}(X, \text{20..29K}) \Rightarrow \text{buys}(X, \text{PC})$
[support = 2%, confidence = 60%]
$\text{contains}(T, \text{computer}) \Rightarrow \text{contains}(T, \text{software})$
[support = 1%, confidence = 75%]
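The support and confidence figures attached to rules like these can be computed directly from transaction data. The sketch below assumes a handful of made-up market-basket transactions: support is the share of all transactions containing every item in the rule, and confidence is the share of transactions containing the left-hand side that also contain the right-hand side.

```python
# Hypothetical market-basket transactions, one set of items per purchase.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer", "paper"},
    {"computer", "software"},
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Share of all transactions that contain the whole itemset."""
    return count_containing(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also have the consequent."""
    both = count_containing(antecedent | consequent, transactions)
    return both / count_containing(antecedent, transactions)


rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
print(f"computer => software: support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

On this toy data the rule computer => software holds in 3 of 5 baskets (support 60%) and in 3 of the 4 baskets that contain a computer (confidence 75%).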

30
Data Mining Analysis

Classification and Prediction
Finding models (functions) that describe and
distinguish classes or concepts for future
prediction
E.g., classify countries based on climate, or
classify cars based on gas mileage
Presentation: decision tree, classification rule,
neural network
Prediction: predict some unknown or missing
numerical values
Cluster analysis
Class label is unknown: group data to form new
classes, e.g., cluster houses to find
distribution patterns
Clustering is based on the principle of maximizing
intra-class similarity and minimizing
inter-class similarity
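Cluster analysis can take many forms; below is a minimal sketch of one common algorithm, k-means, which the slide does not name. The house coordinates and the two starting centroids are hypothetical.

```python
import math


def kmeans(points, centroids, max_iter=20):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it. Stops when the
    centroids no longer change or after max_iter rounds."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its old centroid instead of dividing by zero.
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical house locations (x, y) and two starting guesses.
houses = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(houses, [(0, 0), (10, 10)])
for center, members in zip(centroids, clusters):
    print(center, members)
```

Minimizing each point's distance to its own cluster center is how k-means pursues high intra-class similarity; results can depend on the starting centroids, so in practice it is often run from several starts.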

31
Data Mining Analysis

Outlier analysis
Outlier: a data object that does not comply with
the general behavior of the data
It can be considered noise or an exception, but is
quite useful in fraud detection and rare-event
analysis
Trend and evolution analysis
Trend and deviation: regression analysis
Sequential pattern mining, periodicity analysis
Similarity-based analysis
Other pattern-directed or statistical analyses
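A minimal outlier-analysis sketch follows. It uses the modified z-score built from the median and the median absolute deviation (MAD), a robust technique swapped in for illustration; the 0.6745 constant and 3.5 cutoff follow the commonly cited Iglewicz and Hoaglin convention, and the transaction amounts are hypothetical.

```python
import statistics


def modified_z_outliers(values, threshold=3.5):
    """Return values whose modified z-score, 0.6745 * (x - median) / MAD,
    exceeds threshold in absolute value."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values equal the median; the score is undefined.
        return []
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical daily card transaction amounts for one customer.
amounts = [52, 48, 50, 51, 49, 47, 53, 50, 500]
print(modified_z_outliers(amounts))
```

A plain mean-and-standard-deviation z-score is the more familiar option, but with only nine values a single extreme amount inflates the standard deviation so much that its own z-score cannot exceed about 2.67, so a cutoff of 3 would miss it. The median and MAD are barely moved by that one value.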

32
Discovering Interesting Patterns

A data mining system/query may generate thousands
of patterns; not all of them are interesting.
Suggested approach: human-centered, query-based,
focused mining
Interestingness measures: a pattern is
interesting if it is easily understood by humans,
valid on new or test data with some degree of
certainty, potentially useful, novel, or
validates some hypothesis that a user seeks to
confirm
Objective vs. subjective interestingness
measures
Objective: based on statistics and structures of
patterns, e.g., support, confidence, etc.
Subjective: based on the user's beliefs about the
data, e.g., unexpectedness, novelty, actionability, etc.

33
Can We Find Interesting Patterns?

Find all the interesting patterns: completeness
Can a data mining system find all the interesting
patterns?
Association vs. classification vs. clustering
Search for only interesting patterns:
optimization
Can a data mining system find only the
interesting patterns?
Approaches
First generate all the patterns and then filter
out the uninteresting ones.
Generate only the interesting patterns: mining
query optimization

34
Major Issues in Data Mining

Mining methodology and user interaction
Mining different kinds of knowledge in databases
Interactive mining of knowledge at multiple
levels of abstraction
Incorporation of background knowledge
Data mining query languages and ad-hoc data
mining
Expression and visualization of data mining
results
Handling noise and incomplete data
Pattern evaluation: the interestingness problem
Performance and scalability
Efficiency and scalability of data mining
algorithms
Parallel, distributed and incremental mining
methods

35
Major Issues in Data Mining

Issues relating to the diversity of data types
Handling relational and complex types of data
Mining information from heterogeneous databases
and global information systems (WWW)
Issues related to applications and social impacts
Application of discovered knowledge
Domain-specific data mining tools
Intelligent query answering
Process control and decision making
Integration of the discovered knowledge with
existing knowledge: a knowledge fusion problem
Protection of data security, integrity, and
privacy

36
Meta-Analysis

1952: Hans J. Eysenck concluded that there were
no favorable effects of psychotherapy, starting a
raging debate
20 years of evaluation research and hundreds of
studies failed to resolve the debate
1978: To prove Eysenck wrong, Gene V. Glass
statistically aggregated the findings of 375
psychotherapy outcome studies
Glass (and colleague Smith) concluded that
psychotherapy did indeed work
Glass called his method "meta-analysis"

37
The Emergence of Meta-Analysis

Ideas behind meta-analysis predate Glass's work by
several decades
R. A. Fisher (1944)
"When a number of quite independent tests of
significance have been made, it sometimes happens
that although few or none can be claimed
individually as significant, yet the aggregate
gives an impression that the probabilities are on
the whole lower than would often have been
obtained by chance" (p. 99).
Source of the idea of cumulating probability
values
W. G. Cochran (1953)
Discusses a method of averaging means across
independent studies
Laid out much of the statistical foundation that
modern meta-analysis is built upon (e.g., inverse
variance weighting and homogeneity testing)
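Fisher's observation underlies his method for combining p-values: under the joint null hypothesis, -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom for k independent tests. The minimal sketch below uses the closed-form chi-square upper tail that holds for even degrees of freedom, so no statistics library is needed; the five p-values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method for k independent tests.
    Statistic: X2 = -2 * sum(ln p_i), chi-square with 2k df under the joint null.
    For even df = 2k the upper tail is exp(-x/2) * sum_{i<k} (x/2)**i / i!."""
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term = 1.0  # (x/2)**0 / 0!
    tail = 1.0
    for i in range(1, k):
        term *= half / i  # builds (x/2)**i / i! incrementally
        tail += term
    return statistic, math.exp(-half) * tail


# Hypothetical: five studies, none significant on its own at the .05 level.
statistic, combined_p = fisher_combined_p([0.10] * 5)
print(f"X2 = {statistic:.2f} on 10 df, combined p = {combined_p:.4f}")
```

With five studies at p = 0.10 each, the statistic is about 23.03 on 10 degrees of freedom and the combined p-value is roughly 0.011: individually unremarkable results that are, in aggregate, lower than chance would usually produce.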

38
The Logic of Meta-Analysis

Traditional methods of review focus on
statistical significance testing
Significance testing is not well suited to this
task
highly dependent on sample size
a null finding does not carry the same weight as a
significant finding
Meta-analysis changes the focus to the direction
and magnitude of the effects across studies
Isn't this what we are interested in anyway?
Direction and magnitude represented by the effect
size
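For a two-group contrast, one common effect size is the standardized mean difference. The minimal sketch below computes Cohen's d with a pooled standard deviation, Hedges' small-sample correction, and the usual large-sample variance of d; the group summaries are hypothetical.

```python
import math


def standardized_mean_difference(m1, s1, n1, m2, s2, n2):
    """Return (d, g, var_d) for two independent groups.
    d: Cohen's d using the pooled standard deviation.
    g: Hedges' g, d multiplied by a small-sample bias correction.
    var_d: large-sample approximation to the sampling variance of d."""
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    g = d * correction
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return d, g, var_d


# Hypothetical treatment vs. comparison group summaries (mean, SD, n).
d, g, var_d = standardized_mean_difference(12.0, 4.0, 30, 10.0, 4.0, 30)
print(f"d = {d:.3f}, g = {g:.3f}, var(d) = {var_d:.4f}")
```

The sign of d carries the direction of the effect and its size the magnitude, which is exactly what the slide asks the analysis to focus on; the variance is what later weights each study when results are pooled.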

39
When Can You Do Meta-Analysis?

Meta-analysis is applicable to collections of
research that
are empirical, rather than theoretical
produce quantitative results, rather than
qualitative findings
examine the same constructs and relationships
have findings that can be configured in a
comparable statistical form (e.g., as effect
sizes, correlation coefficients, odds-ratios,
etc.)
are comparable given the question at hand
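When studies report different statistics, they must be put on a common metric before pooling. The minimal sketch below shows three widely used approximate conversions: from an odds ratio to d by the logit method, d = ln(OR) * sqrt(3) / pi (which assumes an underlying logistic distribution), and between a correlation r and d (which assumes roughly equal group sizes). The input values are hypothetical.

```python
import math


def d_from_odds_ratio(odds_ratio):
    """Logit method: d = ln(OR) * sqrt(3) / pi."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def d_from_r(r):
    """Correlation to d, assuming roughly equal group sizes."""
    return 2 * r / math.sqrt(1 - r ** 2)


def r_from_d(d):
    """d back to a correlation, under the same equal-n assumption."""
    return d / math.sqrt(d ** 2 + 4)


# Hypothetical results reported in different forms by two studies.
print(round(d_from_odds_ratio(2.0), 3))
print(round(d_from_r(0.3), 3))
```

The r-to-d and d-to-r formulas are exact inverses of each other, so converting back and forth returns the original value.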

40
Effect Size: The Key to Meta-Analysis

Central Tendency Research
prevalence rates
Pre-Post Contrasts
growth rates
Group Contrasts
experimentally created groups
comparison of outcomes between treatment and
comparison groups
naturally occurring groups
comparison of spatial abilities between boys and
girls
Association Between Variables
measurement research
validity generalization
individual differences research
correlation between personality constructs

41
The Replication Continuum

You must be able to argue that the collection of
studies you are meta-analyzing examines the same
relationship. This may be at a broad level of
abstraction, such as the relationship between
criminal justice interventions and recidivism or
between school-based prevention programs and
problem behavior. Alternatively, it may be at a
narrow level of abstraction and represent pure
replications. The closer your collection of
studies is to pure replications, the easier it is
to argue comparability.

42
Which Studies to Include?

It is critical to have explicit inclusion and
exclusion criteria
the broader the research domain, the more
detailed they tend to become
they are developed iteratively as you interact
with the literature
To include or exclude low-quality studies?
the findings of all studies are potentially in
error (methodological quality is a continuum, not
a dichotomy)
being too restrictive may limit the ability to
generalize
being too inclusive may weaken the confidence
that can be placed in the findings
must strike a balance that is appropriate to your
research question

43
Searching for Studies to Include

Argument: "We only included published studies
because they have been peer-reviewed."
Counterargument: significant findings are more
likely to be published than non-significant
findings
Critical to try to identify and retrieve all
studies that meet your eligibility criteria
Potential sources for identification of documents
computerized bibliographic databases
authors working in the research domain
conference programs
dissertations
review articles
hand searching relevant journals
government reports, bibliographies, clearinghouses

44
Strengths of Meta-Analysis

Imposes a discipline on the process of summing up
research findings
Represents findings in a more differentiated and
sophisticated manner than conventional reviews
Capable of finding relationships across studies
that are obscured in other approaches
Protects against over-interpreting differences
across studies
Can handle a large number of studies (this would
overwhelm traditional approaches to review)

45
Weaknesses of Meta-Analysis

Requires a good deal of effort
Mechanical aspects don't lend themselves to
capturing more qualitative distinctions between
studies
Apples and oranges: comparability of studies is
often in the eye of the beholder
Most meta-analyses include blemished studies
Selection bias poses a continual threat
negative and null finding studies that you were
unable to find
outcomes for which there were negative or null
findings that were not reported
Analysis of between study differences is
fundamentally correlational

46
Steps in Meta-Analysis

Define the research question and specific
hypotheses
Define the criteria for including and excluding
studies
Study designs (randomized vs. observational)
Publication and date thereof (vs. unpublished)
Language of publication
Multiple publications from the same sample
Sample size (large vs. small)
Method and length of follow-up/ascertainment
Population characteristics (high vs. low risk)
Treatment or exposure (drug name, dose)
Missing information about key effect sizes
Locate research studies
Determine which studies are eligible for
inclusion
Maintain log of reasons for ineligibility
Independent review by 2 or more abstractors
Blind abstractors to results of study (and
authors if possible)

47
Steps in Meta-Analysis

Classify and code important study characteristics
(e.g., sample size, length of follow-up,
definition of outcome, drug brand and dose)
Develop and pilot test abstraction form
Develop abstracting instructions and rules
Train abstractors and monitor their reliability
Consider using a quality rating system
Select or translate results from each study using
a common metric
Intention to treat vs. treatment received
Adjusted vs. unadjusted
Entire sample vs. subgroup
Truncate follow-up time if necessary

48
Steps in Meta-Analysis

Aggregate findings across studies, generating
weighted pooled estimates of effect size.
Fixed effects: did the treatment produce
benefit, on average, in the studies reported to
date?
Random effects: will the treatment produce
benefit, on average?
(Assumes that the reported studies are a sample
of some hypothetical population of studies)
Evaluate the statistical homogeneity of pooled
studies
Use stratification or modeling (meta-regression)
techniques to explain variation in findings
across studies
Perform sensitivity analyses to assess the impact
of excluding or down-weighting unpublished
studies, studies of lower quality, out-of-date
studies, etc.
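The aggregation, homogeneity, and sensitivity steps above can be sketched in a few lines under stated assumptions: each study supplies an effect size and its sampling variance; weights are inverse variances, as in the Cochran weighting mentioned earlier in this deck; the random-effects model uses the DerSimonian-Laird estimator of between-study variance (one common choice, not prescribed by the slides); and leave-one-out re-pooling stands in for the broader sensitivity analyses listed. The data are hypothetical.

```python
import math


def pool(effects, variances):
    """Inverse-variance pooling of k study effect sizes.
    Returns the fixed-effect estimate, a DerSimonian-Laird random-effects
    estimate, their standard errors, Cochran's Q, I^2, and tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Homogeneity: Cochran's Q is the weighted sum of squared deviations
    # from the fixed-effect mean; compare it with its df = k - 1.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird moment estimate of between-study variance tau^2.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    random_effect = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sum_w),
        "random": random_effect,
        "random_se": math.sqrt(1.0 / sum(w_re)),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


def leave_one_out(effects, variances):
    """Sensitivity analysis: random-effects estimate with each study dropped in turn."""
    return [
        pool(effects[:i] + effects[i + 1:], variances[:i] + variances[i + 1:])["random"]
        for i in range(len(effects))
    ]


# Hypothetical effect sizes (e.g., Hedges' g) and their sampling variances.
effects = [0.30, 0.50, 0.10, 0.80]
variances = [0.04, 0.02, 0.05, 0.03]
result = pool(effects, variances)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
print("leave-one-out:", [round(x, 3) for x in leave_one_out(effects, variances)])
```

With these made-up inputs, Q (about 7.13 on 3 df) exceeds its degrees of freedom, so tau^2 is positive and the random-effects estimate carries a wider standard error than the fixed-effect one, reflecting the extra between-study variance.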