Title: Research Overview of ICAMS Laboratory
1. Research Overview of ICAMS Laboratory
- Presented to Dean Robinson from GE Global Research and Jim Dolle from GE Aircraft Engine
2. Overview
Samuel H. Huang, Associate Professor of Industrial Engineering
3. Intelligent CAM Systems Laboratory
- Established in September 1998 at the University of Toledo; relocated to the University of Cincinnati in September 2001
- Current team members:
  - 1 Post-Doc research associate
  - 2 Ph.D. students (1 at the University of Toledo)
  - 8 M.S. students
  - 1 undergraduate student
- Alumni: 1 Post-Doc, 17 graduate students (2 Ph.D. dissertations, 6 M.S. theses, 9 M.S. projects), and 1 undergraduate student
4. Research Overview
5. Knowledge-based Engineering
- John J. Shi, Post-Doc Fellow
6. KBE Framework Project Progress
7. Raw Data
- Determine parameters to collect
- Raw data processing
  - Normalization
  - Transformation
  - Discretization (see the sketch after this list)
    - Equal width
    - Greedy Chi-merge
  - Data split
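A minimal sketch of equal-width discretization followed by a simple data split, assuming a generic numeric array; the parameter name, bin count, and split ratio are illustrative, not from the slides:

```python
import numpy as np

def equal_width_discretize(values, n_bins=5):
    """Map continuous values to integer bin labels using equal-width bins."""
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Using only the interior edges makes np.digitize return labels 0..n_bins-1
    labels = np.digitize(values, edges[1:-1])
    return labels, edges

rng = np.random.default_rng(0)
temperature = rng.normal(500.0, 25.0, size=200)   # illustrative raw process parameter
labels, edges = equal_width_discretize(temperature)

# Simple data split: 70% of the patterns for training, the rest held out
split = int(0.7 * len(temperature))
train_labels, test_labels = labels[:split], labels[split:]
print(edges, np.bincount(labels))
```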
8. Data Cleansing
- Noise/error in data
  - Filtering and correcting
- Missing data manipulation
  - General methods (imputation): mean, closest-fit, and regression (a baseline sketch follows below)
  - A combined ML (Maximum Likelihood)/EM (Expectation Maximization) approach
  - Reversed neural computing technique
Niharika
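As a baseline for the imputation methods listed above, here is a minimal sketch of mean imputation on a column with missing entries; the data and column name are made up for illustration, and the combined ML/EM and reversed neural computing approaches are not shown:

```python
import numpy as np

def mean_impute(column):
    """Replace NaN entries with the mean of the observed entries."""
    observed = column[~np.isnan(column)]
    filled = column.copy()
    filled[np.isnan(column)] = observed.mean()
    return filled

pressure = np.array([101.2, np.nan, 99.8, 100.5, np.nan, 102.1])  # illustrative sensor readings
print(mean_impute(pressure))
```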
9. Dimensionality Reduction
- PCA with clustering technique (see the ranking sketch below)
  - Rank input parameters using PCA
  - Stop/reduce criterion
- Chi-square testing of the independence of categorical data
  - Criterion: inconsistency rate
- Neural network pruning
- Discretization
Saurabh
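A hedged sketch of how input parameters might be ranked with PCA, scoring each parameter by its loading magnitudes weighted by explained variance; the scoring rule and parameter names are assumptions for illustration, not necessarily the lab's exact criterion:

```python
import numpy as np
from sklearn.decomposition import PCA

def rank_parameters(X, names):
    """Rank columns of X by variance-weighted PCA loading magnitude."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)        # normalize inputs
    pca = PCA().fit(Xs)
    # components_[k, j] is the loading of parameter j on principal component k
    scores = np.abs(pca.components_).T @ pca.explained_variance_ratio_
    order = np.argsort(scores)[::-1]
    return [(names[i], float(scores[i])) for i in order]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
print(rank_parameters(X, ["speed", "feed", "temp", "load"]))  # illustrative names
```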
10. Rule Extraction
- Decision tree (a rule-extraction sketch follows below)
  - ID3, C4.5, etc.
  - Chi-square statistic test
- Clustering
  - Subtractive clustering
  - Two linguistic terms that are not statistically different can be merged
- Neural networks
  - Classify continuous-valued inputs into linguistic term sets
  - Represent sets using a binary scheme
  - Dynamic depth-first searching
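A minimal sketch of extracting readable IF-THEN rules from a decision tree; scikit-learn's CART is used as a stand-in for ID3/C4.5, and the data is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each root-to-leaf path in the printed tree is one IF-THEN rule over the inputs
print(export_text(tree, feature_names=[f"x{i}" for i in range(4)]))
```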
11. Rule Refinement
- Transfer compound rules
- Remove redundant rules
- Remove overlapping rules
  - Similar rule
  - Mergeable similar rule group
- Combine rules (see the merging sketch below)
  - Accuracy of prediction
  - Possibility of merging
- Rule pruning according to weights
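To illustrate the idea of combining similar rules, here is a toy sketch that merges rules sharing the same consequent whose antecedent intervals touch or overlap; the rule representation is invented for illustration and is not the lab's actual data structure:

```python
def merge_similar_rules(rules):
    """Merge rules with identical consequents whose antecedent intervals overlap or touch."""
    merged = []
    for lo, hi, label in sorted(rules):
        if merged and merged[-1][2] == label and lo <= merged[-1][1]:
            prev_lo, prev_hi, _ = merged.pop()
            merged.append((prev_lo, max(prev_hi, hi), label))
        else:
            merged.append((lo, hi, label))
    return merged

# Rules of the form: IF x in [lo, hi] THEN class = label
rules = [(0.0, 0.4, "low"), (0.3, 0.6, "low"), (0.6, 1.0, "high")]
print(merge_similar_rules(rules))   # [(0.0, 0.6, 'low'), (0.6, 1.0, 'high')]
```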
12. AMFM Rule Tuning/Adaptation
- Construction
- Tuning
- Adaptation
Ranga
13. Model Validation
- Validation criteria (a computation sketch follows below)
  - Traditional statistical criteria: MSE/RMSE, R², F, etc.
  - PRESS (Prediction Sum of Squares)
  - Akaike Information Criterion (AIC)
- Preliminary conclusions
  - Some criteria cannot be applied to evaluate soft-computing techniques
  - AIC is a good criterion
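A small sketch of computing the criteria named above on held-out predictions; the AIC form shown is the common least-squares version, AIC = n·ln(SSE/n) + 2k, and PRESS is approximated from held-out residuals rather than true leave-one-out (both simplifications are assumptions for illustration):

```python
import numpy as np

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def aic_least_squares(y, yhat, n_params):
    """AIC for a least-squares fit: n*ln(SSE/n) + 2k."""
    n = len(y)
    sse = float(np.sum((y - yhat) ** 2))
    return n * np.log(sse / n) + 2 * n_params

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
press = float(np.sum((y_true - y_pred) ** 2))   # squared residuals on held-out data
print(rmse(y_true, y_pred), press, aic_least_squares(y_true, y_pred, n_params=2))
```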
14. Applications: When to Use KBE?
- Drop hammer forming
- Atomizer performance
- Thermal paint calibration
- Seamless tubing process
- KBE can be used to predict, simulate, analyze, report, etc.
15. Data Cleansing: Dealing with Missing Data
16. Introduction
- Incomplete data can arise in a number of cases:
  - Insufficient samples of data
  - Incorrect data collection
  - Sensor failure in time-series data
  - Samples of data that are impossible to obtain when modeling exploratory data
  - Calibration transfer (from master to slave instruments)
17. Types of Missing Data
- Missing data has been characterized into three main types based on the patterns of missing values that occur in data sets (formal statements follow below):
  - MCAR (Missing Completely At Random): the probability that an element is missing is independent of both observed and missing values
  - MAR (Missing At Random): the probability that an element is missing depends only on observed values
  - Non-ignorable: the probability that an element is missing depends only on missing values
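These mechanisms can be stated compactly; writing Y_obs and Y_mis for the observed and missing parts of the data and R for the missingness indicator (this notation is added for clarity and is not from the slides):

```latex
\text{MCAR:}\quad P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(R)
\qquad
\text{MAR:}\quad P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(R \mid Y_{\mathrm{obs}})
\qquad
\text{Non-ignorable:}\quad P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) \text{ depends on } Y_{\mathrm{mis}}
```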
18. Characteristics of Methods to Deal with Missing Data
- Missing-data algorithms have been proposed based on the assumption that data is missing at random
- These methods have incorporated different techniques of imputation and multivariate data analysis
- A single efficient algorithm that can be universally applied is yet to be proposed
- Combinations of the existing methods are being evaluated to achieve better convergence between predicted and actual values in real data sets
19. Methods Proposed to Deal with Missing Data
20. Methods Proposed to Deal with Missing Data (continued)
- Current methods of data analysis are designed to preserve the partial knowledge that was ignored by earlier methods
- Data variability is taken into consideration
- They perform better because they are designed to reproduce the data within experimental error
- These methods are built on a set of assumptions which may not hold in real data sets
- They give better performance than the previously mentioned methods
21. Current Methods
- Multivariate analysis
  - Principal Components Analysis
- Statistical methods
  - Partial Least Squares
  - Principal Component Regression
- Combination methods (an iterative PCA imputation sketch follows below)
  - Expectation Maximization PCA
  - Maximum Likelihood PCA
  - Multiple Imputation
- Neural networks
- Clustering-based techniques
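A hedged sketch of the combination idea: an iterative, EM-style PCA imputation that alternates between fitting PCA on filled-in data and replacing the missing entries with the PCA reconstruction. This is a simplified stand-in for the EM PCA / ML PCA methods listed above, not their exact algorithms:

```python
import numpy as np
from sklearn.decomposition import PCA

def iterative_pca_impute(X, n_components=2, n_iter=50):
    """Fill NaNs by alternating PCA fits and reconstructions (EM-style sketch)."""
    X = X.copy()
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])    # start from mean imputation
    for _ in range(n_iter):
        pca = PCA(n_components=n_components).fit(X)
        X_hat = pca.inverse_transform(pca.transform(X))
        X[mask] = X_hat[mask]                           # update only the missing cells
    return X

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))
X[rng.random(X.shape) < 0.1] = np.nan                   # knock out ~10% of entries
print(np.isnan(iterative_pca_impute(X)).sum())          # 0 remaining NaNs
```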
22. Comparison of Current Methods Based on Performance
- The comparison of the current methods was done on the basis of two factors:
- Similarity factor (a computation sketch follows below)
  - Used to judge the convergence of the predicted values with the actual values of the reference data set
  - Similarity = 100 - average(Sdiff)
  - Sdiff = sum of percentage differences between actual and predicted values
- Number of iterations
  - Determines the length of the run and thus the complexity of the algorithm
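A small sketch of the similarity factor as defined above, taking Sdiff to be the percentage difference between actual and predicted values; the exact averaging convention is assumed for illustration:

```python
import numpy as np

def similarity(actual, predicted):
    """Similarity = 100 - average percentage difference between actual and predicted."""
    s_diff = 100.0 * np.abs(actual - predicted) / np.abs(actual)
    return 100.0 - float(np.mean(s_diff))

actual = np.array([10.0, 20.0, 30.0])
predicted = np.array([9.5, 21.0, 29.0])
print(similarity(actual, predicted))   # closer to 100 means better convergence
```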
23. Comparison Statistics of Current Methods
24. Scope of Current Research
- Based on the comparison statistics of the methods tested, EM PCA has the best results from tests done so far, both in terms of similarity and number of iterations
- Algorithms incorporating cluster analysis and combinations of the previously tested methods are being analyzed to achieve better convergence in less time
- A solution that not only fills in missing values but also handles outliers and noise in data sets is being worked on
25. Dimensionality Reduction
26. Introduction
- What is dimensionality reduction?
  - The selection of a subset of process parameters that are necessary and sufficient to represent the system under consideration (without significantly affecting system accuracy) is referred to as 'dimensionality reduction'.
- Why dimensionality reduction?
  - To select sufficient parameters representing the system
  - To discard redundant information
  - To reduce the time and cost of any further data collection and analysis for system monitoring and control
27. Basic Steps in Dimensionality Reduction
- Generation procedure: generates a subset of features for evaluation
- Evaluation function: measures the goodness of the subset produced by the generation procedure
- Stopping criterion: a criterion to avoid an exhaustive run of the dimensionality reduction procedure
- Validation: tests the validity of the selected subset of parameters through tests on artificial and real-world datasets
28. Dimensionality Reduction Procedure
[Flowchart: Original Data → Generation → Subset → Evaluation → Goodness of Subset → Stopping Criterion (No: back to Generation; Yes: Validation)]
29. Characteristics of the Developed Technique for Dimensionality Reduction
- The technique addresses two problems:
  - Classification problem: the output of interest (response variable) is discrete
  - Function approximation problem: the output of interest takes continuous values
- The technique can deal with parameters (both independent and dependent) that are discrete, continuous, or both
- The subset of features (parameters) is generated using a guided sequential backward search mechanism
- Principal Components Analysis is used for the ranking (guided search) of features
- Clustering is used to measure the goodness of any subset
- For the classification problem, classification error is used as the evaluation function, whereas for the approximation problem it is the ratio of inter-cluster variance to intra-cluster variance (see the sketch below)
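A minimal sketch of the approximation-problem evaluation function described above: cluster the candidate feature subset and score it by the ratio of inter-cluster to intra-cluster variance. The use of k-means and the exact variance definitions are illustrative assumptions, not the lab's specific algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans

def subset_goodness(X_subset, n_clusters=3):
    """Ratio of inter-cluster to intra-cluster variance for a candidate feature subset."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_subset)
    centers = km.cluster_centers_[km.labels_]
    intra = np.mean(np.sum((X_subset - centers) ** 2, axis=1))
    inter = np.mean(np.sum((km.cluster_centers_ - X_subset.mean(axis=0)) ** 2, axis=1))
    return inter / intra      # larger values mean the subset separates the data better

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
print(subset_goodness(X, n_clusters=2))
```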
30. Flowchart for the Developed Technique
[Flowchart: Complete Dataset → Normalization of the Inputs → Clustering → Calculation of the Evaluation Function → Ranking of Parameters using PCA → Decision Making → Reduced Dataset]
31. Advantages of the Developed Technique over Existing Methods
- The developed technique can be used with datasets comprising both discrete and continuous parameters
- The technique can be used for both classification and approximation problems
32. Industrial Case Study: Lorain Pipe Mills
- Lorain Pipe Mills, a division of United States Steel (USS), is located in the city of Lorain, west of Cleveland, Ohio
- Its rotary rolling process can produce seamless pipes with lengths exceeding 40 feet and diameters up to 26 inches
- Lorain Pipe Mills was experiencing a 'low yield' problem; to address this, process data was acquired and analyzed
33. Dimensionality Reduction for Lorain Data
- Process data was collected on 8 input parameters and a prediction model (using a feed-forward neural network) was developed
- The developed dimensionality reduction algorithm was run on the acquired data
- After running the algorithm, only 2 parameters were left in the model; the remaining 6 were discarded
- The following table summarizes the results
34. Adaptive Mamdani Fuzzy Model
35. Adaptive Mamdani Fuzzy Model (AMFM)
- AMFM is an adaptive template that can create real-time or offline models in all domains
- AMFM is a combination of neural networks and fuzzy inference systems
- It can be used to create data-driven models (solutions)
- It can also use high-level heuristic knowledge in the modeling process
36. AMFM Architecture
37. AMFM Modeling Process
- Prepare data into patterns with inputs and desired (observed) outputs
- Split the data into training and validation datasets
- Acquire and formalize a priori domain knowledge
- Extract and validate knowledge from training data
- Set up the AMFM architecture from the integrated knowledge
- Initialize model parameters based on the training data
- Train the architecture to the desired accuracy
- Validate the developed model (a minimal Mamdani inference sketch follows below)
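To make the Mamdani part concrete, here is a toy single-input, single-output Mamdani inference step with Gaussian membership functions and centroid defuzzification; the rules, membership parameters, and structure are illustrative only and do not represent the AMFM architecture itself:

```python
import numpy as np

def gauss(x, center, sigma):
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

# Two toy rules: IF temp is LOW THEN output is SMALL; IF temp is HIGH THEN output is LARGE
rules = [
    {"in": (300.0, 20.0), "out": (0.2, 0.1)},   # (center, sigma) pairs, values invented
    {"in": (400.0, 20.0), "out": (0.8, 0.1)},
]

def mamdani_infer(x, y_grid=np.linspace(0.0, 1.0, 201)):
    """Min for implication, max for aggregation, centroid for defuzzification."""
    aggregated = np.zeros_like(y_grid)
    for rule in rules:
        firing = gauss(x, *rule["in"])                    # rule firing strength
        clipped = np.minimum(firing, gauss(y_grid, *rule["out"]))
        aggregated = np.maximum(aggregated, clipped)      # aggregate over rules
    return float(np.sum(y_grid * aggregated) / np.sum(aggregated))

print(mamdani_infer(320.0), mamdani_infer(390.0))
```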
38. AMFM, HyFIS, and ANFIS
39. Intelligent Condition-Based Maintenance (ICBM)
- Objectives of ICBM
  - Failure prediction
  - Failure diagnosis
- Necessity for ICBM
  - Eliminate breakdowns
  - Assist in production scheduling
  - Synchronize the JIT components
  - Elucidate the process and machine component interactions
- ICBM = AMFM + CBM
40. ICBM Architecture
41. ICBM Model Development Cycle
42. Engine Diagnosis Case Study
- The objective was to create an ICBM model that can continuously monitor/diagnose the state of an engine
- Possible failure modes are turbine deterioration and compressor leak
- Data (11-dimensional) includes state variables such as inter-turbine temperature, fuel flow, shaft speed, and vibration
- Two sets of features (time-series and diagnostic based) were extracted from three parameters
  - Kurtosis, spike, and trend (a feature-extraction sketch follows below)
- Knowledge was extracted using subtractive clustering
- AMFM was used to create the diagnosis model
  - Extracted knowledge was used to set up the architecture
  - Extracted feature data was split into development and validation data
  - Model parameters were initialized from the clusters
  - The model was recursively developed until the desired accuracy was reached
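A hedged sketch of extracting kurtosis- and trend-type features from a windowed time series, as mentioned above; the window length and the slope-based trend definition are assumptions, and the spike feature is approximated here as the maximum absolute deviation within a window:

```python
import numpy as np
from scipy.stats import kurtosis

def window_features(signal, window=50):
    """Per-window kurtosis, spike (max |deviation|), and trend (least-squares slope)."""
    feats = []
    for start in range(0, len(signal) - window + 1, window):
        w = signal[start:start + window]
        slope = np.polyfit(np.arange(window), w, 1)[0]
        feats.append((float(kurtosis(w)), float(np.max(np.abs(w - w.mean()))), float(slope)))
    return feats

rng = np.random.default_rng(4)
itt = 800.0 + 0.05 * np.arange(500) + rng.normal(0, 2.0, 500)  # illustrative inter-turbine temperature trace
print(window_features(itt)[:2])
```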
43. Model Summary
- A total of seven inputs (features) were used
- The output is the probability of occurrence of each failure mode
- Seven rules were extracted using the clustering technique
- The network was trained for 1000 epochs
- The model is 95% accurate
- There were no false alarms
- Signal-based features were extracted using wavelet decomposition and were found to be equally effective
44. Conclusions
- ICBM provides a generic algorithmic approach to developing maintenance solutions
- It defines a schema from data acquisition to model development to accumulation of maintenance knowledge
- AMFM is capable of creating adaptive and precise models and is a suitable tool for ICBM
- The models are real-time, noise-tolerant, and modifiable