DATA ACQUISITION: FOCUSING ON THE CHALLENGE


1
DATA ACQUISITION: FOCUSING ON THE CHALLENGE
  • Gerald J. Hahn and Necip Doganaksoy
  • Adjunct Faculty, RPI; GE Global Research
  • Presentation to
  • 2003 Quality and Productivity Research
    Conference
  • IBM Watson Research Center
  • May 2003

2
THE OBVIOUS, THE EXPECTATION, THE REALITY AND THE
CHALLENGES
  • The Obvious
  • Statistical analyses are based upon data (and
    assumptions about sampled populations, etc.)
  • Such analyses are only as good as the data upon
    which they are based
  • Bad data lead to more complex, less powerful or
    invalid analyses
  • The Expectation: Much attention is given to the
    data acquisition process in training
    practitioners and statisticians
  • The Reality: Little attention is generally given
    to the data acquisition process in training
    practitioners and statisticians
  • The Challenge: Move data acquisition to the front
    burner
  • Understand limitations of observational data
  • Develop disciplined process for data acquisition
  • Emphasize data acquisition at all levels of
    training

3
PROBLEMS WITH OBSERVATIONAL DATA
  • Problems with available databases
  • Data obtained for purposes other than
    statistical analysis
  • Data reside in different databases
  • Data purging
  • Problems with observational data
  • Missing variables, values and events
  • Unrepresentative (non-random) observations
  • Loss of traceability
  • Loss of timeliness
  • Inconsistent or imprecise measurements
  • Correlated variables and limited variability
  • Observation from the trenches (Kati Illouz, GE):
    Data owners tend to be overly optimistic about
    their data
  • Key point (not always recognized by
    practitioners): The quality, rather than the
    quantity, of the data is what counts
  • Data inadequacies help define future information
    needs
  • Two types of situations
  • Routine operations (e.g., process monitoring)
  • Special investigations (e.g., process
    optimization)

4
PROCESS FOR DATA ACQUISITION (DEUPM) (in spirit
of Six Sigma)
  • Proposed process
  • D: Define the problem
  • E: Evaluate the available data
  • U: Understand data acquisition opportunities and
    limitations
  • P: Plan data acquisition and implement
  • M: Monitor, clean data, analyze and validate
  • Example: Demonstrate, in 6 months, ten-year
    reliability for new washing machine design
  • Basic idea: Disciplined, targeted plan for data
    acquisition

5
D: DEFINE THE PROBLEM
  • Steps: Define
  • Specific questions to be answered and resulting
    actions
  • Population or process of interest
  • Data we would ideally like to have (Wayne Nelson)
  • Washing machine design example
  • Stated objective: Show within 6 months and with
    95% confidence that the following goals can be met
  • 95% reliability in first year of operation
  • 90% reliability after five years
  • 80% reliability after ten years
  • (reliability defined as no repair or servicing
    need)
  • Resulting actions: Proceed with full-scale
    production (if validated)
  • Further question: How can we improve?
  • Population: 6 million machines to be built in
    next 5 years
  • Ideal data: Field repair and servicing needs for
    6 million future machines
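
As a rough cross-check on the three reliability goals above, the sketch below asks which goal would be the most demanding if time to first repair followed a Weibull distribution. The shape parameter of 2.5 is borrowed from the planning assumption stated later in the presentation; the function and variable names are illustrative only, and this is not the authors' own calculation.

```python
import math

SHAPE = 2.5  # assumed Weibull shape parameter (planning value quoted in the talk)

# Reliability goals from the stated objective: years in service -> reliability
GOALS = {1: 0.95, 5: 0.90, 10: 0.80}


def scale_for_goal(years, reliability, shape=SHAPE):
    """Weibull scale (characteristic life, in years) that just meets one goal."""
    # R(t) = exp(-(t / scale) ** shape)  =>  scale = t / (-ln R) ** (1 / shape)
    return years / (-math.log(reliability)) ** (1.0 / shape)


def reliability_at(years, scale, shape=SHAPE):
    """Weibull reliability (survival probability) at a given age."""
    return math.exp(-((years / scale) ** shape))


# The goal that needs the largest characteristic life is the binding one.
required_scale = {t: scale_for_goal(t, r) for t, r in GOALS.items()}
binding_year = max(required_scale, key=required_scale.get)

for t, eta in sorted(required_scale.items()):
    print(f"goal at {t:>2} yr needs characteristic life >= {eta:.1f} yr")
print(f"most demanding goal: {binding_year} years")
```

Under this assumed shape, a design that just meets the ten-year goal would also exceed the one-year and five-year goals. A different shape value, or early failure modes that the Weibull model does not describe, could change that ordering.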

6
E: EVALUATE THE AVAILABLE DATA
  • Steps
  • Understand the process and its physical basis
  • Analyze existing data
  • Ask: Are the available data sufficient (and if
    not, what is needed)?
  • Washing machine design example
  • Participate in design reviews, FMEAs (Failure
    Mode and Effects Analysis), etc.
  • Analyze
  • In-house test results on previous designs
  • Field data on previous designs
  • Component and sub-assembly test results (e.g.,
    motor testing)
  • Conclusions
  • Previous design does not meet current reliability
    goals
  • Proposed new design corrects many past problems
  • Possible concern: Introduction of new failure
    modes
  • Component and sub-assembly test results look
    promising
  • No information about system performance in
    realistic environment; need such information

7
U: UNDERSTAND DATA ACQUISITION OPPORTUNITIES AND
LIMITATIONS
  • Steps: Gain understanding of
  • Data acquisition process, measurement error, etc.
  • Limitations in data acquisition
  • Limitations in inferencing
  • Washing machine design example
  • Data acquisition: Conduct in-house accelerated
    cycling of washing machines
  • Simulate 3.5 years of operation per month
  • Evaluate weekly for failures
  • Take apart at end of test and measure degradation
  • Limitations in data acquisition
  • 6 months of testing
  • 36 available test stands
  • 3 prototype lots
  • Limitations in inferencing
  • Assume prototype lots are from same population
    as high volume production
  • Assume failures, etc. are cycle dependent
  • Assume realistic simulation of field environment
  • Conclusion: This is an analytic (not enumerative)
    study; statistical confidence bounds only
    partially capture the uncertainty
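
The test-capacity figures on this slide can be turned into simulated field exposure with a few lines of arithmetic. The sketch below uses only the numbers quoted above, plus one labeled assumption about the length of a month; the variable names are illustrative.

```python
YEARS_SIMULATED_PER_MONTH = 3.5  # from the slide
TEST_MONTHS = 6                  # from the slide
TEST_STANDS = 36                 # from the slide
WEEKS_PER_MONTH = 52 / 12        # assumption: average calendar month


def simulated_years(months_on_test):
    """Equivalent years of field operation after a given time on test."""
    return months_on_test * YEARS_SIMULATED_PER_MONTH


# Longest exposure any single unit can reach within the test window
full_test_years = simulated_years(TEST_MONTHS)

# Simulated field time that passes between two weekly failure checks
years_per_weekly_check = YEARS_SIMULATED_PER_MONTH / WEEKS_PER_MONTH

# Upper bound on total exposure if every stand ran one unit the whole time
max_machine_years = TEST_STANDS * full_test_years

print(f"simulated years per unit after {TEST_MONTHS} months: {full_test_years}")
print(f"simulated years between weekly checks: {years_per_weekly_check:.2f}")
print(f"maximum simulated machine-years: {max_machine_years}")
```

Two points follow from the arithmetic: six months on test covers more simulated operation than the ten-year goal, and a weekly inspection only locates a failure to within most of a simulated year. Both rest on the slide's own assumptions that failures are cycle dependent and that the simulation is realistic.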

8
P: PLAN DATA ACQUISITION AND IMPLEMENT
  • Steps: Develop and evaluate specific strategy,
    including
  • Testing conditions or operational environment
  • Sample size and selection process
  • Assessment of sampling plan
  • Testing protocol
  • Pilot study
  • Washing machine example
  • Testing conditions: Run washing machines with
    full load of soiled towels, mixed with sand,
    wrapped in plastic bag
  • Sample size: 12 units each from 3 prototype lots
  • After 3 months
  • Remove 4 units from each of 3 lots and measure
    degradation
  • Replace with 12 units from 4th lot
  • After 6 months: To have 95% probability of
    demonstrating 80% reliability after 10 years in
    field with 95% confidence requires actual
    reliability to be 95%, or sample size of 96 if
    actual reliability is 90%
  • (assuming Weibull distribution with shape
    parameter of 2.5)
  • Specify protocol, including high-precision
    measurements, definition of failure, data
    recording requirements, replacements of failed
    units, etc.
  • Pilot study: Three washing machines run for one
    week
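
For orientation on the sample-size assessment above, the sketch below works through a textbook zero-failure Weibull demonstration test. It is a simpler benchmark than the plan on the slide and does not reproduce the authors' figures, which come from their own analysis. It assumes a known shape parameter of 2.5 and 21 simulated years on test (6 months at 3.5 years per month); the function names are illustrative.

```python
import math


def zero_failure_sample_size(target_reliability, target_years, test_years,
                             shape, confidence):
    """Units that must all survive test_years to demonstrate the target."""
    # With a known Weibull shape, R(test_years) = R(target_years) ** factor,
    # so each unit on test counts as `factor` units observed at target_years.
    factor = (test_years / target_years) ** shape
    n = math.log(1.0 - confidence) / (factor * math.log(target_reliability))
    return math.ceil(n)


def prob_all_survive(true_reliability, target_years, test_years, shape, n):
    """Chance that n units all survive test_years, given the true reliability
    at target_years (again assuming the Weibull shape is known)."""
    factor = (test_years / target_years) ** shape
    return true_reliability ** (factor * n)


n_units = zero_failure_sample_size(
    target_reliability=0.80, target_years=10, test_years=21,
    shape=2.5, confidence=0.95)

p_pass = prob_all_survive(
    true_reliability=0.95, target_years=10, test_years=21,
    shape=2.5, n=n_units)

print(f"units needed with zero failures allowed: {n_units}")
print(f"chance of passing if true 10-year reliability is 95%: {p_pass:.2f}")
```

Under these assumptions the zero-failure test needs only 3 units, but its chance of passing stays well under one half even when the true ten-year reliability is 95%. That gap between demonstrating a goal and having a good chance of demonstrating it is what the probability-of-demonstration statement on the slide addresses.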

9
M: MONITOR, CLEAN DATA, ANALYZE AND VALIDATE
  • Steps
  • Clean data as gathered
  • Monitor to ensure that process is being followed
  • Conduct preliminary analyses; determine whether
    process needs to be changed
  • Conduct final analysis
  • Validate: Propose appropriate validation testing
  • Washing machine design example
  • Clean data: Develop proactive checks for missing
    or inconsistent data that automatically query
    data provider
  • Monitor: Continued involvement
  • Analyze failure data after 1 week, 1 month and 3
    months; identify problems for correction
  • Do final analyses after 6 months (failure and
    degradation data)
  • Validate: Propose added programs
  • Continue 6 of 36 units on test beyond 6 months
  • Beta test 100 machines with company employees and
    60 in laundromats
  • Audit sample: 6 production units each week; test
    five for 1 week, one for 3 months
  • Develop system for capturing and analyzing field
    reliability data
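
The proactive checks for missing or inconsistent data mentioned above might look something like the following minimal sketch. The record layout (`unit_id`, `week`, `cycles`, `failed`) and the specific rules are hypothetical, chosen only to illustrate the idea of generating queries for the data provider as the data arrive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WeeklyRecord:
    """One weekly inspection entry (hypothetical layout)."""
    unit_id: str
    week: int
    cycles: Optional[int]   # cumulative cycles; None if not recorded
    failed: Optional[bool]  # inspection result; None if not recorded


def check_records(records: List[WeeklyRecord]) -> List[str]:
    """Return one query per missing or inconsistent item found."""
    queries = []
    last_cycles = {}      # unit_id -> last recorded cumulative cycles
    failed_units = set()  # units already reported as failed
    for rec in sorted(records, key=lambda r: (r.unit_id, r.week)):
        tag = f"unit {rec.unit_id}, week {rec.week}"
        if rec.cycles is None:
            queries.append(f"{tag}: cycle count missing")
        elif rec.cycles < last_cycles.get(rec.unit_id, 0):
            queries.append(f"{tag}: cumulative cycles decreased")
        if rec.failed is None:
            queries.append(f"{tag}: inspection result missing")
        if rec.unit_id in failed_units:
            queries.append(f"{tag}: data recorded after unit reported failed")
        if rec.cycles is not None:
            last_cycles[rec.unit_id] = rec.cycles
        if rec.failed:
            failed_units.add(rec.unit_id)
    return queries


records = [
    WeeklyRecord("A01", 1, 120, False),
    WeeklyRecord("A01", 2, 100, False),  # cycles went down
    WeeklyRecord("A02", 1, None, False),  # cycle count missing
    WeeklyRecord("A03", 1, 110, True),
    WeeklyRecord("A03", 2, 230, None),    # entry after a reported failure
]
queries = check_records(records)
for q in queries:
    print(q)
```

Running checks like these each week, rather than at the end of the test, gives the data provider a chance to correct a record while the unit is still on the stand.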

DISCIPLINED, TARGETED DATA ACQUISITION PROCESS
10
TEACHING DATA ACQUISITION: PROPOSAL
  • Preferred: Course in data acquisition as second
    required course in statistics for practitioners
    and aspiring statisticians
  • Compromise: Devote one third of one-semester
    introductory course to data acquisition
  • Industrial: Devote one third of short courses to
    data acquisition
  • In addition: Discuss data acquisition process and
    challenges in all data analysis examples
  • P.S. Most courses on design of experiments and
    survey sampling cover only tip of iceberg and are
    offered to limited audience

11
PROPOSED COURSE IN DATA ACQUISITION: OUTLINE
  • Motivation: Need for good data and limitations of
    observational studies
  • Key concepts
  • Populations, sampling frames, processes, random
    (and other) samples
  • Analytic versus enumerative studies
  • Measurement error
  • Disciplined, targeted process for data
    acquisition (and examples)
  • Some formal approaches
  • Design of experiments (including factorial,
    fractional factorial, response surface)
  • Survey sampling (including questionnaire
    construction, non-response problems)
  • Data acquisition systems (e.g., for SPC, field
    reliability, student performance assessment)
  • Some special studies and situations (e.g., life
    testing, dosage studies, attribute y's)
  • Data acquisition as a learning process (Box et
    al.)
  • Graphical data analyses
  • Sample size determination: Analytical and
    simulation approaches
  • In-process data cleaning
  • Statistics in the news: Data acquisition
    considerations (Source: Laurie Snell, Chance
    News, www.dartmouth.edu/chance/)
  • Student-generated studies and critiques (Source:
    Bill Hunter, 1977 American Statistician article,
    "Some Ideas about Teaching Design of
    Experiments")

12
ELEVATOR SPEECH
  • We need to put the horse (data acquisition)
    before the CART (data analysis)
  • Specific proposals
  • Formal process for data acquisition
  • High focus on training, including required course
    on data acquisition
  • To analyze data is human--to plan to gather the
    right data is divine
  • P.S.
  • For copies of slides, contact gerryhahn_at_yahoo.com
  • Comments based upon chapter from Statistics in
    the Corporate World: Connecting the Dots
    (tentative title), to be published in 2004 (we
    hope) by Wiley; your inputs invited!