Title: DATA ACQUISITION: FOCUSING ON THE CHALLENGE
- Gerald J. Hahn and Necip Doganaksoy
- Adjunct Faculty, RPI; GE Global Research
- Presentation to the 2003 Quality and Productivity Research Conference, IBM Watson Research Center
- May 2003
THE OBVIOUS, THE EXPECTATION, THE REALITY AND THE CHALLENGES
- The Obvious:
  - Statistical analyses are based upon data (and assumptions about sampled populations, etc.)
  - Such analyses are only as good as the data upon which they are based
  - Bad data lead to more complex, less powerful or invalid analyses
- The Expectation: Much attention is given to the data acquisition process in training practitioners and statisticians
- The Reality: Little attention is generally given to the data acquisition process in training practitioners and statisticians
- The Challenge: Move data acquisition to the front burner
  - Understand limitations of observational data
  - Develop a disciplined process for data acquisition
  - Emphasize data acquisition at all levels of training
PROBLEMS WITH OBSERVATIONAL DATA
- Problems with available databases:
  - Data obtained for purposes other than statistical analysis
  - Data reside in different databases
  - Data purging
- Problems with observational data:
  - Missing variables, values and events
  - Unrepresentative (non-random) observations
  - Loss of traceability
  - Loss of timeliness
  - Inconsistent or imprecise measurements
  - Correlated variables and limited variability
- Observation from the trenches (Kati Illouz, GE): Data owners tend to be overly optimistic about their data
- Key point (not always recognized by practitioners): The quality, rather than the quantity, of the data is what counts
- Data inadequacies help define future information needs
- Two types of situations:
  - Routine operations (e.g., process monitoring)
  - Special investigations (e.g., process optimization)
PROCESS FOR DATA ACQUISITION (DEUPM) (in the spirit of Six Sigma)
- Proposed process:
  - D: Define the problem
  - E: Evaluate the available data
  - U: Understand data acquisition opportunities and limitations
  - P: Plan data acquisition and implement
  - M: Monitor, clean data, analyze and validate
- Example: Demonstrate, in 6 months, ten-year reliability for a new washing machine design
- Basic idea: Disciplined, targeted plan for data acquisition
D: DEFINE THE PROBLEM
- Steps: Define
  - Specific questions to be answered and resulting actions
  - Population or process of interest
  - Data we would ideally like to have (Wayne Nelson)
- Washing machine design example:
  - Stated objective: Show within 6 months and with 95% confidence that the following goals can be met:
    - 95% reliability in first year of operation
    - 90% reliability after five years
    - 80% reliability after ten years
    - (reliability defined as no repair or servicing needed)
  - Resulting actions: Proceed with full-scale production (if validated)
  - Further question: How can we improve?
  - Population: 6 million machines to be built in the next 5 years
  - Ideal data: Field repair and servicing needs for 6 million future machines
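As a worked check on the stated goals, the sketch below asks which of the three reliability goals is hardest to meet. It assumes the Weibull model with shape parameter 2.5 that appears on the sample-size slide; the function name `required_eta` is hypothetical. Under R(t) = exp(-(t/eta)^beta), meeting reliability R at time t requires a characteristic life eta of at least t / (-ln R)^(1/beta), so the goal demanding the largest eta is the binding one.

```python
import math

BETA = 2.5  # assumed Weibull shape, taken from the sample-size slide

# (years in service, required reliability) from the stated objective
GOALS = [(1.0, 0.95), (5.0, 0.90), (10.0, 0.80)]


def required_eta(years: float, reliability: float, beta: float = BETA) -> float:
    """Smallest Weibull characteristic life eta with R(years) >= reliability.

    Solves R(t) = exp(-(t / eta) ** beta) for eta.
    """
    return years / (-math.log(reliability)) ** (1.0 / beta)


if __name__ == "__main__":
    for years, rel in GOALS:
        print(f"{rel:.0%} at {years:g} years needs eta >= {required_eta(years, rel):.2f} years")
    binding = max(GOALS, key=lambda g: required_eta(*g))
    print("binding goal:", binding)
```

Under this assumed shape, the ten-year goal requires the largest characteristic life (about 18.2 years, versus about 12.3 and 3.3 years for the five- and one-year goals), so a design meeting it also meets the other two. A different shape parameter could change that ordering.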
E: EVALUATE THE AVAILABLE DATA
- Steps:
  - Understand the process and its physical basis
  - Analyze existing data
  - Ask: Is the available data sufficient (and if not, what is needed)?
- Washing machine design example:
  - Participate in design reviews, FMEAs (Failure Mode and Effects Analyses), etc.
  - Analyze:
    - In-house test results on previous designs
    - Field data on previous designs
    - Component and sub-assembly test results (e.g., motor testing)
  - Conclusions:
    - Previous design does not meet current reliability goals
    - Proposed new design corrects many past problems
    - Possible concern: Introduction of new failure modes
    - Component and sub-assembly test results look promising
    - No information about system performance in a realistic environment; such information is needed
U: UNDERSTAND DATA ACQUISITION OPPORTUNITIES AND LIMITATIONS
- Steps: Gain understanding of
  - Data acquisition process, measurement error, etc.
  - Limitations in data acquisition
  - Limitations in inferencing
- Washing machine design example:
  - Data acquisition: Conduct in-house accelerated cycling of washing machines
    - Simulate 3.5 years of operation per month
    - Evaluate weekly for failures
    - Take apart at end of test and measure degradation
  - Limitations in data acquisition:
    - 6 months of testing
    - 36 available test stands
    - 3 prototype lots
  - Limitations in inferencing:
    - Assume prototype lots are from the same population as high-volume production
    - Assume failures, etc. are cycle dependent
    - Assume realistic simulation of field environment
  - Conclusion: This is an analytic (not enumerative) study; statistical confidence bounds capture only part of the uncertainty
P: PLAN DATA ACQUISITION AND IMPLEMENT
- Steps: Develop and evaluate a specific strategy, including
  - Testing conditions or operational environment
  - Sample size and selection process
  - Assessment of sampling plan
  - Testing protocol
  - Pilot study
- Washing machine example:
  - Testing conditions: Run washing machines with a full load of soiled towels, mixed with sand, wrapped in a plastic bag
  - Sample size: 12 units each from 3 prototype lots
    - After 3 months:
      - Remove 4 units from each of the 3 lots and measure degradation
      - Replace with 12 units from a 4th lot
    - After 6 months: To have 95% probability of demonstrating 80% reliability after 10 years in the field with 95% confidence requires an actual reliability of 95% (or a sample size of 96 if actual reliability is 90%), assuming a Weibull distribution with shape parameter 2.5
  - Protocol: Specify, including high-precision measurements, definition of failure, data recording requirements, replacement of failed units, etc.
  - Pilot study: Three washing machines run for one week
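The sample-size statement above is a power calculation: the chance that the test will demonstrate the goal, given some true reliability, as distinct from the 95% confidence of the demonstration itself. The sketch below estimates that chance by Monte Carlo simulation for the staged plan: 24 units run the full 6 months (21 field-years), while the 12 units pulled at 3 months and the 12 fourth-lot units each accumulate 10.5 field-years. It assumes the shape is known to be exactly 2.5, uses the standard known-shape result that t**beta is exponential (a chi-square lower bound, evaluated through its equivalent Poisson form), and treats lots as identical, failure times as exactly observed, and degradation data as unused. These simplifications are likely to make it considerably more optimistic than the analysis behind the slide's 95% and 96-unit figures, so its output illustrates the method rather than checking those figures. All function names are hypothetical.

```python
import math
import random

BETA = 2.5             # assumed known Weibull shape (from the slide)
YEARS_PER_MONTH = 3.5  # acceleration from the test description
GOAL_YEARS, GOAL_REL, CONF = 10.0, 0.80, 0.95


def staged_plan_end_times() -> list[float]:
    """Field-years each of the 48 units can accumulate under the staged plan."""
    full = 6 * YEARS_PER_MONTH   # 8 units per lot x 3 lots stay the full 6 months
    half = 3 * YEARS_PER_MONTH   # 4 per lot pulled at 3 months, plus 12 from lot 4
    return [full] * 24 + [half] * 12 + [half] * 12


def poisson_cdf(k: int, mean: float) -> float:
    """P(N <= k) for N ~ Poisson(mean), summed term by term."""
    term = math.exp(-mean)
    total = term
    for j in range(1, k + 1):
        term *= mean / j
        total += term
    return total


def demonstrated(times, ends, beta=BETA) -> bool:
    """True if the one-sided CONF lower bound on R(GOAL_YEARS) is >= GOAL_REL.

    With beta known, T**beta is exponential. A time-censored test with r
    failures and total transformed exposure s gives the bound
    theta_L = 2s / chi2(CONF; 2r + 2); requiring theta_L >= theta_goal is
    equivalent to the Poisson inequality returned below.
    """
    r = sum(1 for t, end in zip(times, ends) if t <= end)
    s = sum(min(t, end) ** beta for t, end in zip(times, ends))
    theta_goal = GOAL_YEARS ** beta / -math.log(GOAL_REL)
    return poisson_cdf(r, s / theta_goal) <= 1.0 - CONF


def power(true_rel: float, ends: list[float], n_sims: int = 2000, seed: int = 1) -> float:
    """Fraction of simulated tests that demonstrate the goal."""
    rng = random.Random(seed)
    # Weibull scale giving reliability true_rel at GOAL_YEARS
    eta = GOAL_YEARS / (-math.log(true_rel)) ** (1.0 / BETA)
    hits = 0
    for _ in range(n_sims):
        times = [rng.weibullvariate(eta, BETA) for _ in ends]
        hits += demonstrated(times, ends)
    return hits / n_sims


if __name__ == "__main__":
    plan = staged_plan_end_times()
    for rel in (0.85, 0.90, 0.95):
        print(f"true R(10 yr) = {rel:.2f}: estimated power {power(rel, plan):.2f}")
```

Extending the sketch toward the actual plan would mean adding lot-to-lot variation, interval-censored (weekly) failure times and uncertainty in the shape parameter, each of which tends to reduce the estimated power.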
M: MONITOR, CLEAN DATA, ANALYZE AND VALIDATE
- Steps:
  - Clean data as gathered
  - Monitor to ensure that the process is being followed
  - Conduct preliminary analyses to determine whether the process needs to be changed
  - Conduct final analysis
  - Validate: Propose appropriate validation testing
- Washing machine design example:
  - Clean data: Develop proactive checks for missing or inconsistent data that automatically query the data provider
  - Monitor: Continued involvement
  - Analyze failure data after 1 week, 1 month and 3 months; identify problems for correction
  - Do final analyses after 6 months (failure and degradation data)
  - Validate: Propose added programs:
    - Continue 6 of 36 units on test beyond 6 months
    - Beta test 100 machines with company employees and 60 in laundromats
    - Audit sample 6 production units each week: Test five for 1 week and one for 3 months
    - Develop a system for capturing and analyzing field reliability data
DISCIPLINED, TARGETED DATA ACQUISITION PROCESS
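The "proactive checks" item under Clean data can be made concrete with a small rule-based validator. The record layout (unit ID, lot, week, status), the allowed status codes and the 26-week window below are hypothetical, chosen only for illustration; in practice the rules would follow the protocol's data recording requirements. Each problem found is returned as a query to send back to the data provider.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record layout and status codes, for illustration only.
ALLOWED_STATUS = {"running", "failed", "removed"}
TEST_WEEKS = 26  # assumed: about 6 months of weekly evaluations


@dataclass
class WeeklyRecord:
    unit_id: str
    lot: Optional[int]
    week: int
    status: Optional[str]


def check_records(records: list[WeeklyRecord]) -> list[tuple[str, int, str]]:
    """Return (unit_id, week, problem) queries for missing or inconsistent data."""
    queries: list[tuple[str, int, str]] = []
    seen: set[tuple[str, int]] = set()
    failed_week: dict[str, int] = {}
    last_week: dict[str, int] = {}
    for rec in sorted(records, key=lambda r: (r.unit_id, r.week)):
        key = (rec.unit_id, rec.week)
        if key in seen:
            queries.append((*key, "duplicate record"))
            continue
        seen.add(key)
        if rec.lot is None:
            queries.append((*key, "missing lot"))
        if rec.status is None:
            queries.append((*key, "missing status"))
        elif rec.status not in ALLOWED_STATUS:
            queries.append((*key, f"unknown status {rec.status!r}"))
        if not 1 <= rec.week <= TEST_WEEKS:
            queries.append((*key, "week outside test window"))
        # A gap in the weekly sequence means an evaluation was not recorded
        prev = last_week.get(rec.unit_id)
        if prev is not None and rec.week > prev + 1:
            queries.append((*key, "missing earlier weekly record(s)"))
        last_week[rec.unit_id] = rec.week
        # A unit reported as running after a recorded failure is inconsistent
        if rec.unit_id in failed_week and rec.status == "running":
            queries.append((*key, f"running after failure in week {failed_week[rec.unit_id]}"))
        if rec.status == "failed":
            failed_week.setdefault(rec.unit_id, rec.week)
    return queries
```

Running the checks as each batch of records arrives, rather than at the end of the test, is what makes them proactive: the provider can still recover the missing or conflicting information.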
TEACHING DATA ACQUISITION: PROPOSAL
- Preferred: Course in data acquisition as the second required course in statistics for practitioners and aspiring statisticians
- Compromise: Devote one third of a one-semester introductory course to data acquisition
- Industrial: Devote one third of short courses to data acquisition
- In addition: Discuss the data acquisition process and its challenges in all data analysis examples
- P.S. Most courses on design of experiments and survey sampling cover only the tip of the iceberg and are offered to a limited audience
PROPOSED COURSE IN DATA ACQUISITION: OUTLINE
- Motivation: Need for good data and limitations of observational studies
- Key concepts:
  - Populations, sampling frames, processes, random (and other) samples
  - Analytic versus enumerative studies
  - Measurement error
- Disciplined, targeted process for data acquisition (and examples)
- Some formal approaches:
  - Design of experiments (including factorial, fractional factorial, response surface)
  - Survey sampling (including questionnaire construction, non-response problems)
  - Data acquisition systems (e.g., for SPC, field reliability, student performance assessment)
  - Some special studies and situations (e.g., life testing, dosage studies, attribute y's)
- Data acquisition as a learning process (Box et al.)
- Graphical data analyses
- Sample size determination: Analytical and simulation approaches
- In-process data cleaning
- Statistics in the news: Data acquisition considerations (Source: Laurie Snell, Chance News, www.dartmouth.edu/chance/)
- Student-generated studies and critiques (Source: Bill Hunter, 1977 American Statistician article "Some Ideas about Teaching Design of Experiments")
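For the analytical side of sample size determination, a common textbook starting point is the zero-failure (success-run) demonstration extended to a Weibull model with known shape. The sketch below implements that textbook formula, not a method taken from these slides, and uses the washing machine figures only as assumed inputs (80% at 10 years, 95% confidence, shape 2.5, 21 equivalent field-years on test); the function names are hypothetical. It also computes the probability of passing, which shows why a minimum sample size is not enough on its own: with these inputs three units demonstrate the goal if none fail, yet a design with 95% true ten-year reliability would pass only about 37% of the time.

```python
import math


def zero_failure_sample_size(goal_rel: float, conf: float, test_years: float,
                             goal_years: float, beta: float) -> int:
    """Units needed so that zero failures by test_years demonstrates
    R(goal_years) >= goal_rel at the given confidence (Weibull, known beta)."""
    ratio = (test_years / goal_years) ** beta
    return math.ceil(math.log(1.0 - conf) / (ratio * math.log(goal_rel)))


def pass_probability(true_rel: float, n: int, test_years: float,
                     goal_years: float, beta: float) -> float:
    """Chance that all n units survive test_years when R(goal_years) = true_rel.

    Under a Weibull model with shape beta, R(test) = R(goal) ** ratio.
    """
    ratio = (test_years / goal_years) ** beta
    return math.exp(n * ratio * math.log(true_rel))


if __name__ == "__main__":
    n = zero_failure_sample_size(0.80, 0.95, 21.0, 10.0, 2.5)
    print("units needed (zero failures allowed):", n)
    for rel in (0.90, 0.95, 0.99):
        p = pass_probability(rel, n, 21.0, 10.0, 2.5)
        print(f"true R(10 yr) = {rel:.2f}: pass probability {p:.2f}")
```

With no extrapolation (test time equal to goal time) the formula reduces to the familiar 14-unit success-run plan for 80% reliability at 95% confidence; the simulation approach is what lets one trade sample size against power when failures are allowed.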
ELEVATOR SPEECH
- We need to put the horse (data acquisition) before the CART (data analysis)
- Specific proposals:
  - Formal process for data acquisition
  - High focus on training, including a required course on data acquisition
- To analyze data is human; to plan to gather the right data is divine
- P.S.
  - For copies of slides, contact gerryhahn_at_yahoo.com
  - Comments based upon a chapter from Statistics in the Corporate World: Connecting the Dots (tentative title), to be published in 2004 (we hope) by Wiley; your inputs invited!