Title: Genetic Algorithm and Feature Selection
1 Genetic Algorithm and Feature Selection
- Zheng Li
- Michigan State University
2 Overview
[Diagram labels: Gene profile; Metabolic profile; DATA; GA; FEATURE]
3 Gene Profile
[Diagram labels: GA; DATA; FEATURE; Metabolic profile]
4 Genetic Algorithm
1. Encoding
- Discrete
- Floating point
2. Crossover
11001011 + 11011111 = 11001111
3. Mutation
11001001 => 10001001
4. Target function: prediction accuracy, feature subset size
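The crossover and mutation examples on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the implementation used in the talk: the function names, the crossover point of 5 (a point of 4 gives the same child), the mutation rate parameter, and the `size_weight` trade-off in `target` are all assumptions made for illustration.

```python
import random


def single_point_crossover(parent_a: str, parent_b: str, point: int) -> str:
    """Child takes parent_a's bits before `point` and parent_b's bits from `point` on."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have equal length")
    return parent_a[:point] + parent_b[point:]


def flip_bit(chromosome: str, index: int) -> str:
    """Invert the single bit at `index` (0-based)."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]


def mutate(chromosome: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )


def target(accuracy: float, n_selected: int, n_total: int,
           size_weight: float = 0.1) -> float:
    """Hypothetical objective: reward accuracy, penalize large feature subsets."""
    return accuracy - size_weight * n_selected / n_total


child = single_point_crossover("11001011", "11011111", point=5)
mutant = flip_bit("11001001", index=1)
print(child, mutant)
```

The two printed strings match the child and mutant shown on the slide. How prediction accuracy and subset size were actually combined in the target function is not stated, so the weighted difference above is only one plausible choice.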
5 GA/PCA/PLS (1. select scores)
[Diagram labels: Independent variables X; Loading; Scores (T1, T3, T4); Feature space; Encoding; GA; Optimize; Regression; Dependent variables Y]
6 GA/PCA/PLS (2. select original variables)
[Diagram labels: Feature space (X1, X2, X4); Encoding; GA; Optimize; PCA/PLS; Dependent variables Y]
7 Results GA/PCA/PLS (Intra TG)
Fluxes whose selection value is larger than 0.5 are selected in the PLS model.
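As a small illustration of the 0.5 cutoff, the sketch below turns a list of selection values into the flux numbers that enter the model. The function name, the 1-based flux numbering, and the example values are assumptions; the slide does not say how the selection values themselves were computed.

```python
def select_fluxes(selection_values, threshold=0.5):
    """Return the 1-based flux numbers whose selection value exceeds the threshold."""
    return [
        number
        for number, value in enumerate(selection_values, start=1)
        if value > threshold
    ]


# Made-up selection values for four fluxes, for illustration only.
values = [0.91, 0.12, 0.50, 0.67]
print(select_fluxes(values))
```

With these made-up values, fluxes 1 and 4 are kept; a value of exactly 0.5 is not "larger than 0.5" and is therefore dropped.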
8 Results GA/PLS/PCA (Intra TG)
PLS with all variables included has a fitness value of 0.1715.
Fitness is defined as the sum of squared errors of the PLS prediction.
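The fitness definition above can be written out directly. This is a minimal sketch with made-up numbers; whether the error was computed on training data, test data, or under cross-validation is not stated on the slide, so the function simply takes observed and predicted values as given.

```python
def sum_squared_error(y_true, y_pred):
    """Sum of squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


# Made-up observations and predictions, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]
print(sum_squared_error(observed, predicted))
```

Under this definition a lower fitness means a better model, so a GA using it would minimize rather than maximize.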
9 Results GA/PLS/PCA (Urea)
10 Results GA/PLS/PCA (Urea)
The PLS model with all variables has a fitness of -0.006.
11 Results GA/PLS/PCA
12 GA/MBPLS/MBPCA
[Diagram labels: Encoding group info; GA; Optimize; 1; 2; Block1; Block2; Block score; Super score; Dependent variables Y]
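One way to read "Encoding group info" is that each gene in the chromosome stores the block a variable is assigned to. The sketch below decodes such a chromosome into Block1 and Block2. This interpretation, the function name, and the example chromosome are assumptions; the slide does not spell out the encoding.

```python
def split_into_blocks(chromosome, n_blocks=2):
    """Group 0-based variable indices by the block label stored in each gene."""
    blocks = {label: [] for label in range(1, n_blocks + 1)}
    for var_index, label in enumerate(chromosome):
        if label not in blocks:
            raise ValueError(f"unknown block label: {label}")
        blocks[label].append(var_index)
    return blocks


# Hypothetical chromosome: variables 0, 3, 4 go to block 1; variables 1, 2 to block 2.
print(split_into_blocks([1, 2, 2, 1, 1]))
```

Each block would then be passed to the multi-block model to produce its block score, with the super score computed across blocks.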
13 Results GA/MBPLS
14 Results GA/MBPLS
[Figure panels: Variance explained; Prediction accuracy]
15 GA/KPLS/KPCA/SVM
[Diagram labels: Encoding; GA; Optimize; KPCA/KPLS; SVM; Dependent variables Y]
16 Results GA/SVM
17 Results GA/SVM
The fitness of the SVM with all variables included is 0.0083.
18 Results GA/SVM
19 Discussion: Biomarker Identification
- Application of GA-coupled feature selection in bioinformatics.
- Select the most relevant factors from the metabolic and genetic profiles that can optimally characterize cellular states. This opens new avenues for identifying complex disease genes and biomarkers for disease diagnosis and for assessing drug efficacy.
- The next step is to try the method on gene data to identify marker genes instead of marker metabolic fluxes.
- Find an appropriate optimization function for MBPLS/MBPCA.
20 GA/KPCA/KPLS/SVM
- When solving a problem, we are usually looking for the solution that is best among all others. The space of all feasible solutions (the set of solutions among which the desired solution resides) is called the search space (also the state space). Each point in the search space represents one possible solution, and each possible solution can be "marked" by its value (or fitness) for the problem. With a GA we look for the best solution among a number of possible solutions, each represented by one point in the search space.
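To make the search-space picture concrete, the sketch below runs a small GA over all 8-bit strings (a search space of 256 points), where each point is scored by a toy fitness. It is an illustrative sketch only: the tournament selection, elitism, parameter values, and the toy fitness are assumptions and are not taken from the talk.

```python
import random


def tournament(population, fitness, rng):
    """Pick two distinct candidates at random and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def run_ga(fitness, n_bits, pop_size=20, generations=30,
           crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Maximize `fitness` over bit lists of length `n_bits` (n_bits >= 2)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: the best point always survives
        while len(next_population) < pop_size:
            mother = tournament(population, fitness, rng)
            father = tournament(population, fitness, rng)
            if rng.random() < crossover_rate:
                point = rng.randint(1, n_bits - 1)  # single-point crossover
                child = mother[:point] + father[point:]
            else:
                child = mother[:]
            # Bit-flip mutation applied independently to every gene.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


def toy_fitness(bits):
    """Made-up fitness: number of positions that match a hidden target point."""
    hidden = [1, 0, 1, 1, 0, 0, 1, 0]
    return sum(b == h for b, h in zip(bits, hidden))


best = run_ga(toy_fitness, n_bits=8)
print(best, toy_fitness(best))
```

Because the best individual is copied unchanged into each new generation, the best fitness found never decreases from one generation to the next. For feature selection, each bit would instead mark whether a variable is included, and the fitness would come from the fitted model.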
21 GA/MBPLS
[Diagram labels: X; 1; 2; Block1; Block2; Block score; Super score; Y]