Secondary Data and Sources - PowerPoint PPT Presentation

Slides: 49
Provided by: brianj80

Transcript and Presenter's Notes

1
Chapter 4
  • Secondary Data and Sources

2
Secondary Data Defined
  • Secondary data is information that has been
    collected previously for a purpose other than the
    need at hand

3
Reasons for Secondary Research
  • Secondary research may solve the problem
  • Secondary information costs less
  • Supplementary uses
  • Defining the research problem
  • Planning collection of primary data
  • Defining the population and selecting the sample

4
Weighing evidence
  • There are dangers in relying entirely on
    secondary data
  • Relevant: information pertains to the problem at
    hand
  • Accurate: information reflects reality
  • Current: information isn't dated; data is
    perishable
  • Sufficient: enough information and detail
  • Available: information is easy to find
  • Measurement units: are they the same as in your
    analysis (sales, profit, employees, sales/sq.
    ft., sq. ft.)?
  • Classification: categories such as 50,000-64,999,
    65,000-79,999, and 80,000 and over; what if you
    want 150,000 and over?
  • Knowing how information was collected helps
    determine how credible the data is.

5
Credibility of Secondary Data
  • What was the purpose of the study?
  • Is it relevant? Is it biased? Political and civic
    groups may overstate information to make positions
    look more or less attractive
  • Who collected the information?
  • Competence of the organization to do quality
    research?
  • What information was collected?
  • How was the information collected?
  • Are there errors in research design, sampling,
    analysis, non-response, etc.? Measure reliability
    and validity.
  • Are the findings consistent with other
    studies/information?
  • Is it inexpensive?
  • How does the cost compare to the cost of primary
    data collection?

6
Internal Sources
  • Internal secondary information is available
    within the company
  • Prior research reports
  • Documents and databases (sales reports, warranty
    information)
  • Key is knowing the information exists and how to
    access it

7
Is Secondary Data Appropriate?
  • Define the Purpose
  • What do you want to know? Be as specific as
    possible.
  • Industry Analysis
  • Analyze the structure of the market: who are the
    major players?
  • Literature search (library/Internet)
  • This is the starting point for data collection.
  • Who has the information you need most?
  • Start with the most likely sources and work to
    the least likely.
  • Share and discuss your information with other
    team members.

8
External Sources
  • External data comes from outside sources
  • Huge amounts are available
  • Typically cover non-controllable (environmental)
    factors
  • Market size
  • Competitive information
  • Market characteristics

9
Common External Sources
  • Government
  • Trade associations and trade press
  • Periodicals and professional journals
  • Institutions (universities)
  • Commercial services
  • Government data is the largest source
  • Experienced researchers rely on government and
    trade sources

10
Consumer and Economic Data
  • Market and Consumer Information
  • U.S. Bureau of the Census
  • Bureau of Labor Statistics
  • County and City Databook (Census Bureau)
  • Demographics USA Reference Book
  • Lifestyle Market Analyst
  • Woods and Poole MSA Profile Reference Book
  • MediaMark Reporter
  • Sales and Marketing Management Survey of Buying
    Power
  • General Economic Information
  • Survey of Current Business
  • Federal Reserve Bulletin
  • Statistical Abstract of the U.S.

11
Government Publications
  • Statistical Abstract of the United States
  • Demographic data from census reports
  • The State and Metropolitan Area Data Book
  • Same as above, broken down by state, county, and
    metropolitan area
  • Census of Population and Census of Housing
  • Conducted every ten years
  • Available online
  • www.firstgov.gov (official government portal)
  • www.census.gov

12
Other Sources
  • Encyclopedia of Associations
  • Listings of trade groups, contact information,
    and publications
  • Online databases
  • Periodicals indexes like EBSCO and Infotrac
  • Subject specific resources, like LexisNexis
  • The books are available in academic, public, and
    corporate libraries; the online databases
    typically require a subscription and are often
    available through university and public libraries

13
Online Sources
  • LEXIS-NEXIS
  • Business News contains magazine and newspaper
    articles, broadcast transcripts.
  • Compare Companies shows what companies fit
    certain size characteristics.
  • Patent searches for product information can be
    found by selecting business, then patents.
  • Journalist Express provides a complete listing
    of:
  • Wire services
  • News services
  • Search engines
  • International news

14
Financial Information
  • Hoovers
  • Financial information for publicly traded
    companies.
  • Company capsules, key competitors, links to
    SEC 10-K and 10-Q reports, links to articles on
    companies, and tracking of insider trades,
    including who purchased or sold what
  • Public Register's Annual Report Service
  • Annual report information for over two thousand
    companies.

15
Syndicated Services
  • Specialist research companies that generate
    updated reports on a regular, ongoing basis and
    provide them to a number of clients
  • Examples include IDC, Mediamark, ACNielsen

16
Database Marketing/Data Mining
  • Data mining is the process of analyzing vast
    computer databases looking for patterns and
    relationships between variables that will help
    marketing efforts
  • Data explosion problem
  • Automated data collection tools and mature
    database technology have produced huge amounts of
    data stored in databases and data warehouses
  • Valuable data containing potentially valuable
    information
  • Solution: data warehousing and data mining
  • Data warehousing and on-line analytical
    processing
  • Extraction of interesting knowledge (rules,
    regularities, patterns, constraints) from data
    in large databases

17
Data Mining History
  • 1960s
  • Data collection, database creation, DBMS
  • 1970s
  • Relational database management systems
  • 1980s
  • Advanced RDBMS models (extended-relational,
    deductive) and application-oriented DBMS
    (spatial, scientific, engineering, etc.)
  • 1990s-2000s
  • Data mining and data warehousing, multimedia
    databases, and Web databases

18
Data Mining
  • What is a database?
  • Internal database: a database developed from data
    within an organization.
  • Where does the data come from?
  • Sales Invoices
  • Salespersons' Call Reports
  • Warranty Cards
  • Customer Registration/Sign-in

19
Data Mining of Internal Secondary Data
  • Database mining/marketing: micromarketing to
    customers based on the profiles and purchasing
    patterns of current and potential customers.
  • Internal database mining of secondary data
    enables firms to
  • evaluate sales territories
  • identify most and least profitable customers
  • identify potential market segments
  • identify which products, services, and segments
    need the most marketing support
  • evaluate opportunities for offering new products
    or services
  • identify most and least profitable products and
    services
  • evaluate existing marketing programs
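
As a rough illustration of "identify most and least profitable customers", the sketch below totals profit per customer from invoice-style rows. The customer names, amounts, and the `profit_by_customer` helper are hypothetical; a real internal database would supply the rows and a more careful cost allocation.

```python
from collections import defaultdict

# Hypothetical invoice rows: (customer, revenue, cost to serve).
invoices = [
    ("Acme", 1200.0, 700.0),
    ("Birch", 300.0, 350.0),
    ("Acme", 800.0, 500.0),
    ("Cedar", 950.0, 400.0),
    ("Birch", 200.0, 260.0),
]


def profit_by_customer(rows):
    """Sum revenue minus cost per customer, most profitable first."""
    totals = defaultdict(float)
    for customer, revenue, cost in rows:
        totals[customer] += revenue - cost
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = profit_by_customer(invoices)
most_profitable, least_profitable = ranking[0], ranking[-1]
print("Most profitable:", most_profitable)
print("Least profitable:", least_profitable)
```

The same grouping idea extends to sales territories or products by swapping the key used for the totals.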

20
Database Applications
  • Database analysis and decision support
  • Market analysis and management
  • target marketing, customer relationship
    management, market basket analysis, cross
    selling, market segmentation
  • Customer acquisition
  • discover attributes that predict customer
    responses to marketing programs
  • Customer retention
  • target customers who are on the verge of
    switching to a competitor.
  • Customer abandonment
  • are some customers too costly to maintain?
  • Risk analysis and management
  • Forecasting, customer retention, quality control,
    competitive analysis
  • Fraud detection and management
  • Other Applications
  • Text mining (news group, email, documents) and
    Web analysis.
  • Intelligent query answering

21
Market Analysis Examples
  • Where are the data sources for analysis?
  • Credit card transactions, loyalty cards, discount
    coupons, customer complaint calls, plus (public)
    lifestyle studies
  • Target marketing
  • Find clusters of model customers who share the
    same characteristics: interests, income level,
    spending habits, etc.
  • Determine customer purchasing patterns over time
  • Conversion of a single to a joint bank account:
    marriage, etc.
  • Cross-market analysis
  • Associations/correlations between product sales
  • Prediction based on the association information

22
Market Analysis Examples
  • Customer profiling
  • data mining can tell you what types of customers
    buy what products (clustering or classification)
  • Identifying customer requirements
  • identifying the best products for different
    customers
  • use prediction to find what factors will attract
    new customers
  • Provides summary information
  • various multidimensional summary reports
  • statistical summary information (data central
    tendency and variation)
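
A minimal sketch of the "central tendency and variation" summary mentioned above, using Python's standard `statistics` module. The order values are made-up numbers chosen to show how one large order pulls the mean away from the median.

```python
import statistics

# Hypothetical order values in dollars; one large order skews the mean.
order_values = [20, 25, 25, 30, 100]

summary = {
    "mean": statistics.mean(order_values),      # central tendency
    "median": statistics.median(order_values),  # robust central tendency
    "mode": statistics.mode(order_values),      # most frequent value
    "stdev": statistics.stdev(order_values),    # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```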

23
Finance/Risk Examples
  • Finance planning and asset evaluation
  • cash flow analysis and prediction
  • contingent claim analysis to evaluate assets
  • cross-sectional and time series analysis
    (financial-ratio, trend analysis, etc.)
  • Resource planning
  • summarize and compare the resources and spending
  • Competition
  • monitor competitors and market directions
  • group customers into classes and set up a
    class-based pricing procedure
  • set pricing strategy in a highly competitive
    market

24
Fraud Detection Examples
  • Applications
  • widely used in health care, retail, credit card
    services, telecommunications (phone card fraud),
    etc.
  • Approach
  • use historical data to build models of fraudulent
    behavior and use data mining to help identify
    similar instances
  • Examples
  • auto insurance: detect a group of people who
    stage accidents to collect on insurance
  • money laundering: detect suspicious money
    transactions (US Treasury's Financial Crimes
    Enforcement Network)
  • medical insurance: detect professional patients,
    rings of doctors, and rings of references

25
Other Applications
  • Sports
  • IBM Advanced Scout analyzed NBA game statistics
    (shots blocked, assists, and fouls) to gain
    competitive advantage for New York Knicks and
    Miami Heat
  • Astronomy
  • JPL and the Palomar Observatory discovered 22
    quasars with the help of data mining
  • Internet Web Surf-Aid
  • IBM Surf-Aid applies data mining algorithms to
    Web access logs for market-related pages to
    discover customer preference and behavior pages,
    analyzing effectiveness of Web marketing,
    improving Web site organization, etc.

26
Case Example: BellSouth
  • BellSouth used data mining to eliminate the
    prospects least likely to purchase
  • Used the attributes of existing customers to
    identify, model and predict potential customers

27
The Data Mining Process
(Diagram: stages in order, from source data to knowledge)
  • Databases
  • Data cleaning and data integration
  • Data warehouse
  • Selection of task-relevant data
  • Data mining
  • Pattern evaluation
  • Insight and knowledge

28
Data Mining of What Kind of Data?
  • Relational databases
  • Data warehouses
  • Transactional databases
  • Advanced DB and information repositories
  • Object-oriented and object-relational databases
  • Spatial databases
  • Time-series data and temporal data
  • Text databases and multimedia databases
  • Heterogeneous and legacy databases
  • WWW

29
Data Mining Analysis
  • Concept description: characterization and
    discrimination
  • Generalize, summarize, and contrast data
    characteristics, e.g., dry vs. wet regions
  • Association (correlation and causality)
  • Multi-dimensional vs. single-dimensional
    association
  • age(X, "20..29") ∧ income(X, "20..29K") → buys(X,
    "PC") [support = 2%, confidence = 60%]
  • contains(T, "computer") → contains(T, "software")
    [support = 1%, confidence = 75%]
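
The support and confidence figures attached to an association rule can be computed directly from transaction data. The sketch below assumes the usual definitions (support is the share of transactions containing all items in the rule; confidence is that share divided by the share containing the antecedent) and uses a small, invented set of transactions.

```python
# Hypothetical toy transactions; each set holds the items in one purchase.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
    {"computer", "software"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# Rule: contains(T, "computer") -> contains(T, "software")
rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In this toy data, three of five transactions contain both items and four contain a computer, which is where the two ratios come from.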

30
Data Mining Analysis
  • Classification and Prediction
  • Finding models (functions) that describe and
    distinguish classes or concepts for future
    prediction
  • E.g., classify countries based on climate, or
    classify cars based on gas mileage
  • Presentation: decision tree, classification rule,
    neural network
  • Prediction: predict some unknown or missing
    numerical values
  • Cluster analysis
  • Class label is unknown: group data to form new
    classes, e.g., cluster houses to find
    distribution patterns
  • Clustering is based on the principle of
    maximizing the intra-class similarity and
    minimizing the inter-class similarity
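
One common way to group unlabeled data is k-means, sketched here in one dimension. The slides do not name a specific algorithm, so k-means is an assumption, as are the house prices and the starting centers. It keeps points within a cluster close to their center, which is one reading of "maximize intra-class similarity".

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain k-means on one-dimensional data, starting from given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # Converged: assignments can no longer change.
        centers = new_centers
    return centers, clusters


# Hypothetical house prices in thousands of dollars.
prices = [95, 100, 105, 240, 250, 260]
centers, clusters = kmeans_1d(prices, centers=[90, 270])
print(centers, clusters)
```

Results depend on the starting centers and on the number of clusters chosen, so in practice several starts are usually compared.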

31
Data Mining Analysis
  • Outlier analysis
  • Outlier: a data object that does not comply with
    the general behavior of the data
  • It can be considered as noise or exception but is
    quite useful in fraud detection, rare events
    analysis
  • Trend and evolution analysis
  • Trend and deviation: regression analysis
  • Sequential pattern mining, periodicity analysis
  • Similarity-based analysis
  • Other pattern-directed or statistical analyses
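
A simple way to flag objects that "do not comply with the general behavior of the data" is a z-score rule, sketched below. The threshold of 2 standard deviations and the transaction amounts are assumptions for illustration; the slides do not prescribe a particular test.

```python
import statistics


def flag_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` sample SDs from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical card transaction amounts; the last one is unusually large.
amounts = [42, 38, 45, 40, 41, 39, 44, 43, 400]
outliers = flag_outliers(amounts)
print(outliers)
```

Because an extreme value inflates both the mean and the standard deviation, median-based rules are often preferred when several outliers may be present.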

32
Discovering Interesting Patterns
  • A data mining system/query may generate thousands
    of patterns; not all of them are interesting.
  • Suggested approach: human-centered, query-based,
    focused mining
  • Interestingness measures: a pattern is
    interesting if it is easily understood by humans,
    valid on new or test data with some degree of
    certainty, potentially useful, novel, or
    validates some hypothesis that a user seeks to
    confirm
  • Objective vs. subjective interestingness
    measures
  • Objective: based on statistics and structures of
    patterns, e.g., support, confidence, etc.
  • Subjective: based on the user's belief in the
    data, e.g., unexpectedness, novelty,
    actionability, etc.

33
Can We Find Interesting Patterns?
  • Find all the interesting patterns: completeness
  • Can a data mining system find all the interesting
    patterns?
  • Association vs. classification vs. clustering
  • Search for only interesting patterns:
    optimization
  • Can a data mining system find only the
    interesting patterns?
  • Approaches
  • First generate all the patterns and then filter
    out the uninteresting ones.
  • Generate only the interesting patterns: mining
    query optimization

34
Major Issues in Data Mining
  • Mining methodology and user interaction
  • Mining different kinds of knowledge in databases
  • Interactive mining of knowledge at multiple
    levels of abstraction
  • Incorporation of background knowledge
  • Data mining query languages and ad-hoc data
    mining
  • Expression and visualization of data mining
    results
  • Handling noise and incomplete data
  • Pattern evaluation: the interestingness problem
  • Performance and scalability
  • Efficiency and scalability of data mining
    algorithms
  • Parallel, distributed and incremental mining
    methods

35
Major Issues in Data Mining
  • Issues relating to the diversity of data types
  • Handling relational and complex types of data
  • Mining information from heterogeneous databases
    and global information systems (WWW)
  • Issues related to applications and social impacts
  • Application of discovered knowledge
  • Domain-specific data mining tools
  • Intelligent query answering
  • Process control and decision making
  • Integration of the discovered knowledge with
    existing knowledge: a knowledge fusion problem
  • Protection of data security, integrity, and
    privacy

36
Meta-Analysis
  • 1952: Hans J. Eysenck concluded that there were
    no favorable effects of psychotherapy, starting a
    raging debate
  • 20 years of evaluation research and hundreds of
    studies failed to resolve the debate
  • 1978: To prove Eysenck wrong, Gene V. Glass
    statistically aggregated the findings of 375
    psychotherapy outcome studies
  • Glass (and colleague Smith) concluded that
    psychotherapy did indeed work
  • Glass called his method meta-analysis

37
The Emergence of Meta-Analysis
  • Ideas behind meta-analysis predate Glass's work by
    several decades
  • R. A. Fisher (1944)
  • "When a number of quite independent tests of
    significance have been made, it sometimes happens
    that although few or none can be claimed
    individually as significant, yet the aggregate
    gives an impression that the probabilities are on
    the whole lower than would often have been
    obtained by chance" (p. 99).
  • Source of the idea of cumulating probability
    values
  • W. G. Cochran (1953)
  • Discusses a method of averaging means across
    independent studies
  • Laid out much of the statistical foundation that
    modern meta-analysis is built upon (e.g., inverse
    variance weighting and homogeneity testing)
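
The idea of cumulating probability values is usually illustrated with Fisher's combined probability test: under the null hypothesis, -2 times the sum of the log p-values follows a chi-square distribution with 2k degrees of freedom for k independent tests. The sketch below assumes that method; the three p-values are invented, and the chi-square tail is computed with the closed form that holds for even degrees of freedom.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns the chi-square statistic (2k degrees of freedom) and its
    upper-tail probability.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2k the upper tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    tail = math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))
    return statistic, tail


# Hypothetical p-values: none is below 0.05 on its own.
p_values = [0.08, 0.10, 0.07]
statistic, combined_p = fisher_combined_p(p_values)
print(f"chi-square={statistic:.2f} combined p={combined_p:.3f}")
```

With these inputs the combined p-value falls below 0.05 even though no single test does, which is the situation Fisher describes.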

38
The Logic of Meta-Analysis
  • Traditional methods of review focus on
    statistical significance testing
  • Significance testing is not well suited to this
    task
  • highly dependent on sample size
  • a null finding does not carry the same weight as
    a significant finding
  • Meta-analysis changes the focus to the direction
    and magnitude of the effects across studies
  • Isn't this what we are interested in anyway?
  • Direction and magnitude represented by the effect
    size

39
When Can You Do Meta-Analysis?
  • Meta-analysis is applicable to collections of
    research that
  • are empirical, rather than theoretical
  • produce quantitative results, rather than
    qualitative findings
  • examine the same constructs and relationships
  • have findings that can be configured in a
    comparable statistical form (e.g., as effect
    sizes, correlation coefficients, odds-ratios,
    etc.)
  • are comparable given the question at hand

40
Effect Size: The Key to Meta-Analysis
  • Central Tendency Research
  • prevalence rates
  • Pre-Post Contrasts
  • growth rates
  • Group Contrasts
  • experimentally created groups
  • comparison of outcomes between treatment and
    comparison groups
  • naturally occurring groups
  • comparison of spatial abilities between boys and
    girls
  • Association Between Variables
  • measurement research
  • validity generalization
  • individual differences research
  • correlation between personality constructs
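
For group contrasts, one widely used effect size is the standardized mean difference (often called Cohen's d): the difference in group means divided by the pooled standard deviation. The slides do not name a specific index, so this choice is an assumption, and the outcome scores below are invented.

```python
import math
import statistics


def standardized_mean_difference(treatment, comparison):
    """Difference in means divided by the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(comparison)
    var1 = statistics.variance(treatment)
    var2 = statistics.variance(comparison)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(treatment) - statistics.mean(comparison)) / pooled_sd


# Hypothetical outcome scores for two experimentally created groups.
treatment = [12, 14, 16, 18, 20]
comparison = [10, 12, 14, 16, 18]
d = standardized_mean_difference(treatment, comparison)
print(f"d={d:.3f}")
```

Expressing each study's result on a scale like this is what makes findings from different studies comparable.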

41
The Replication Continuum
You must be able to argue that the collection of
studies you are meta-analyzing examines the same
relationship. This may be at a broad level of
abstraction, such as the relationship between
criminal justice interventions and recidivism or
between school-based prevention programs and
problem behavior. Alternatively, it may be at a
narrow level of abstraction and represent pure
replications. The closer your collection of studies
is to pure replications, the easier it is to
argue comparability.

42
Which Studies to Include?
  • It is critical to have explicit inclusion and
    exclusion criteria; the broader the research
    domain, the more detailed they tend to become
  • developed iteratively as you interact with the
    literature
  • To include or exclude low quality studies
  • the findings of all studies are potentially in
    error (methodological quality is a continuum, not
    a dichotomy)
  • being too restrictive may restrict ability to
    generalize
  • being too inclusive may weaken the confidence
    that can be placed in the findings
  • must strike a balance that is appropriate to your
    research question

43
Searching for Studies to Include
  • Argument: "We only included published studies
    because they have been peer-reviewed."
    Counterpoint: significant findings are more
    likely to be published than non-significant
    findings (publication bias)
  • Critical to try to identify and retrieve all
    studies that meet your eligibility criteria
  • Potential sources for identification of documents
  • computerized bibliographic databases
  • authors working in the research domain
  • conference programs
  • dissertations
  • review articles
  • hand searching relevant journals
  • government reports, bibliographies, clearinghouses

44
Strengths of Meta-Analysis
  • Imposes a discipline on the process of summing up
    research findings
  • Represents findings in a more differentiated and
    sophisticated manner than conventional reviews
  • Capable of finding relationships across studies
    that are obscured in other approaches
  • Protects against over-interpreting differences
    across studies
  • Can handle a large number of studies (this would
    overwhelm traditional approaches to review)

45
Weaknesses of Meta-Analysis
  • Requires a good deal of effort
  • Mechanical aspects don't lend themselves to
    capturing more qualitative distinctions between
    studies
  • Apples and oranges: comparability of studies is
    often in the eye of the beholder
  • Most meta-analyses include blemished studies
  • Selection bias poses a continual threat
  • negative and null finding studies that you were
    unable to find
  • outcomes for which there were negative or null
    findings that were not reported
  • Analysis of between study differences is
    fundamentally correlational

46
Steps in Meta-Analysis
  • Define the research question and specific
    hypotheses
  • Define the criteria for including and excluding
    studies
  • Study designs (randomized vs. observational)
  • Publication and date thereof (vs. unpublished)
  • Language of publication
  • Multiple publications from the same sample
  • Sample size (large vs. small)
  • Method and length of follow-up/ascertainment
  • Population characteristics (high vs. low risk)
  • Treatment or exposure (drug name, dose)
  • Missing information about key effect sizes
  • Locate research studies
  • Determine which studies are eligible for
    inclusion
  • Maintain log of reasons for ineligibility
  • Independent review by 2 or more abstractors
  • Blind abstractors to results of study (and
    authors if possible)

47
Steps in Meta-Analysis
  • Classify and code important study characteristics
    (e.g., sample size, length of follow-up,
    definition of outcome, drug brand and dose)
  • Develop and pilot test abstraction form
  • Develop abstracting instructions and rules
  • Train abstractors and monitor their reliability
  • Consider using a quality rating system
  • Select or translate results from each study using
    a common metric
  • Intention to treat vs. treatment received
  • Adjusted vs. unadjusted
  • Entire sample vs. subgroup
  • Truncate follow-up time if necessary

48
Steps in Meta-Analysis
  • Aggregate findings across studies, generating
    weighted pooled estimates of effect size.
  • Fixed effects: did the treatment produce
    benefit, on average, in the studies reported to
    date?
  • Random effects: will the treatment produce
    benefit, on average?
  • (Assumes that the reported studies are a sample
    of some hypothetical population of studies)
  • Evaluate the statistical homogeneity of pooled
    studies
  • Use stratification or modeling (meta-regression)
    techniques to explain variation in findings
    across studies
  • Perform sensitivity analyses to assess the impact
    of excluding or down-weighting unpublished
    studies, studies of lower quality, out-of-date
    studies, etc.
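
The pooling and homogeneity steps can be sketched with inverse-variance weights. The code below assumes each study reports an effect size and its variance, uses Cochran's Q as the homogeneity statistic, and uses the DerSimonian-Laird estimator for the between-study variance in the random-effects model. The slides do not specify an estimator, so that choice and the three studies shown are assumptions.

```python
import math


def pool_effects(effects, variances):
    """Fixed- and random-effects pooled estimates from per-study results."""
    # Fixed effects: weight each study by the inverse of its variance.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)

    # Cochran's Q: weighted squared deviations from the fixed estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random effects: add tau2 to each study's variance before weighting.
    w_star = [1.0 / (v + tau2) for v in variances]
    random_est = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)

    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sum(w)),
        "q": q,
        "df": df,
        "tau2": tau2,
        "random": random_est,
        "random_se": math.sqrt(1.0 / sum(w_star)),
    }


# Hypothetical effect sizes and variances from three studies.
result = pool_effects([0.10, 0.30, 0.50], [0.01, 0.01, 0.01])
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With equal variances both models give the same pooled estimate, but the random-effects standard error is larger because it also reflects variation between studies. Rerunning `pool_effects` with selected studies left out is one simple form of sensitivity analysis.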