Title: Feature Selection and Pattern Discovery from Microarray Gene Expressions
1. Feature Selection and Pattern Discovery from Microarray Gene Expressions
CSCI5980 Functional Genomics, Systems Biology and Bioinformatics
- Rui Kuang and Chad Myers
- Department of Computer Science and Engineering
- University of Minnesota
- kuang_at_cs.umn.edu
2. Feature Selection and Pattern Discovery
- Feature selection identifies the features that are most relevant for characterizing the system and its behavior.
  - Often improves classification accuracy
  - Draws focus to the key features that explain the system
  - Often used for marker gene selection in microarray data analysis
- Pattern discovery identifies special associations between features and samples.
  - Often a subset of samples associated with a subset of features, as in bi-clustering
  - Bi-clustering is hard in general, but there are efficient algorithms for finding all discretized bi-clusters
3. Finding Disease-causing Factors
- Clinical factors and personalized genetic/genomic information
(Figure: linking clinical and genomic factors to human disease)
4. Biomarkers and Personalized Medicine
- Molecular traits that can characterize a certain phenotype (what we can observe or measure)
- In cancer studies, biomarkers can be used for
  - Prognosis: predict the outcome of cancer (metastasis) to decide how aggressively to treat the patient
  - Treatment response: predict a patient's response to a certain type of treatment
  - Discovery
- DNA copy-number
- Gene expression profiling
- Proteomic profiling
- Etc.
5. Biomarker Identification in a Case-Control Study
(Figure: gene expression matrix with genes as rows and case/control samples as columns)
6. Statistical Methods
(Figure: per-gene comparison of expression values between cases and controls)
7. Statistical Methods (Cont.)
- Each gene is considered independently
  - Cannot detect combined markers that are highly discriminative between the groups
- All the samples are used to quantify the difference
  - Cannot capture markers specific to a subpopulation
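The per-gene testing described above can be sketched in a few lines. This is a minimal illustration, assuming a toy (genes x samples) expression matrix `expr` and a case/control indicator `is_case` (both made up here); it ranks genes by a two-sample t-test, which is exactly the setting in which each gene is considered independently.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_genes, n_cases, n_controls = 1000, 20, 20
expr = rng.normal(size=(n_genes, n_cases + n_controls))      # toy expression matrix
is_case = np.array([True] * n_cases + [False] * n_controls)

# Test each gene independently: cases vs. controls.
t_stat, p_val = ttest_ind(expr[:, is_case], expr[:, ~is_case], axis=1)

# Rank genes by p-value; top-ranked genes are candidate single-gene markers.
ranked = np.argsort(p_val)
print("Top 5 genes by p-value:", ranked[:5])

Because every gene is tested on its own and over all samples, this sketch inherits both limitations listed above: it misses combinations of genes and subpopulation-specific markers.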
8. Feature Selection
(Figure: gene expression matrix of cases and controls)
Search for a minimum set of genes that leads to maximum classification performance.
9. Filters vs. Wrappers
- Main goal: rank subsets of useful features.
- Filters rank features by a criterion computed independently of the classifier; wrappers evaluate feature subsets by the performance of the classifier trained on them.
10. Forward Selection
- Rank the features
- Add the highest-ranked feature
- Check classification performance
Classification accuracy: 75%
11. Forward Selection
- Add the highest-ranked feature
- Check classification performance
- Add the next highest-ranked feature
Classification accuracy: 75% → 95%
12. Forward Selection
- Add the highest-ranked feature
- Check classification performance
- Add the next highest-ranked feature
Classification accuracy: 95% → 80%
13. Backward Elimination
- Remove the lowest-ranked feature
- Check classification performance
Classification accuracy: 60% → 75%
14. Backward Elimination
- Remove the lowest-ranked feature
- Check classification performance
- Remove the next lowest-ranked feature until performance gets worse
Classification accuracy: 95%
15. Feature Selection
Slightly more sophisticated versions of Forward Selection and Backward Elimination (a code sketch follows below):
- Forward Selection
  - Steps
    1. Build a classifier with each single feature and rank all features according to the predictive power of these classifiers
    2. Start with the highest-ranked feature
       a) Build a classifier and check classification performance
       b) If performance is worse than in the previous round, stop
       c) Else add the next-ranked feature and go to a)
- Backward Elimination
  - Steps
    1. Start with all the features
    2. For each feature x in the set
       - Remove x from the set
       - Check the classification performance of the classifier built with the set without x
    3. Remove the feature whose removal leaves the best classification performance
    4. Repeat steps 2-3 until the performance starts to drop
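The two procedures above can be written as greedy wrapper loops. The following is a minimal, illustrative sketch assuming a sample-by-gene matrix X and class labels y; the choice of logistic regression and 5-fold cross-validation is an assumption made for the sketch, not part of the slides.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def cv_accuracy(X, y, features):
    # Cross-validated accuracy of a classifier built on the given feature subset.
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, features], y, cv=5).mean()

def forward_selection(X, y):
    # Rank features by the accuracy of single-feature classifiers.
    ranking = sorted(range(X.shape[1]), key=lambda j: cv_accuracy(X, y, [j]), reverse=True)
    selected, best = [ranking[0]], cv_accuracy(X, y, [ranking[0]])
    # Keep adding the next-ranked feature while performance improves.
    for j in ranking[1:]:
        score = cv_accuracy(X, y, selected + [j])
        if score <= best:
            break                      # performance no longer improves: stop
        selected, best = selected + [j], score
    return selected, best

def backward_elimination(X, y):
    selected = list(range(X.shape[1]))
    best = cv_accuracy(X, y, selected)
    while len(selected) > 1:
        # Try removing each feature; keep the removal that leaves the best performance.
        score, worst = max((cv_accuracy(X, y, [f for f in selected if f != x]), x)
                           for x in selected)
        if score < best:
            break                      # performance starts to drop: stop
        selected.remove(worst)
        best = score
    return selected, best

Both loops retrain and re-evaluate the classifier many times, which is why wrapper methods become expensive on tens of thousands of genes.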
16. Feature Selection: Embedded Methods
- Embedded methods incorporate variable selection as part of the model-building process
- Example: SVM feature selection
(Flowchart: start from all features and repeatedly eliminate features, with a stop/continue decision at each iteration)
Recursive Feature Elimination (RFE) SVM. Guyon-Weston, 2000. US patent 7,117,188
17. Embedded Methods
(Flowchart continued: RFE loop starting from all features, with a stop/continue decision at each iteration)
Recursive Feature Elimination (RFE) SVM. Guyon-Weston, 2000. US patent 7,117,188
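scikit-learn's generic RFE wrapper can be combined with a linear SVM to reproduce the idea sketched in the flowchart. The following is a small illustration on synthetic data, not the original Guyon-Weston implementation; the data sizes and the step fraction are arbitrary assumptions.

import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 500))          # toy expression data: 40 samples, 500 genes
y = np.array([0] * 20 + [1] * 20)
X[y == 1, :10] += 1.5                   # make the first 10 genes informative

# At each iteration: train a linear SVM, rank genes by |weight|, and drop the
# lowest-ranked half (step=0.5) until 10 genes remain.
rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=10, step=0.5)
rfe.fit(X, y)
print("Selected genes:", np.where(rfe.support_)[0])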
18. RFE SVM for Cancer Diagnosis
Differentiation of 14 tumors. Ramaswamy et al., PNAS, 2001
19. Feature Selection (Cont.)
- NP-hard to search through all the combinations
  - Need heuristic solutions
- The selection criterion is based on maximum classification performance
  - There might be more than one subset of features that gives the optimal classification performance
- Does not consider the modular structure of co-expressed genes
  - May omit other important genes in the marker module
20. Bi-cluster Structure
(Figure: gene expression matrix of cases and controls with an embedded bi-cluster)
- Bi-clustering: relevant knowledge can be hidden in a group of genes with a common pattern across a subset of the conditions, e.g., genes co-expressed under some conditions
- It is NP-hard to discover even one bi-cluster
- However, in the discretized case, the optimal solution can be found efficiently with association rule mining algorithms
21. Association Rule Mining
- Proposed by Agrawal et al. in 1993
- An important data mining model studied extensively by the database and data mining community
- Assumes all data are categorical
  - No good algorithm for numeric data
- Initially used for market basket analysis, to find how items purchased by customers are related
22. Transaction Data: Supermarket Data
- Market basket transactions
  - t1: {bread, cheese, milk}
  - t2: {apple, eggs, salt, yogurt}
  - ...
  - tn: {biscuit, eggs, milk}
- Concepts
  - An item: an item/article in a basket
  - I: the set of all items sold in the store
  - A transaction: items purchased in a basket; it may have a TID (transaction ID)
  - A transactional dataset: a set of transactions
23. Rule Strength Measures
- Support: the rule holds with support sup in T (the transaction data set) if sup% of transactions contain X ∪ Y
  - sup = Pr(X ∪ Y)
- Confidence: the rule holds in T with confidence conf if conf% of transactions that contain X also contain Y
  - conf = Pr(Y | X)
- An association rule is a pattern stating that when X occurs, Y occurs with a certain probability
24. Association Rule Mining
- Two types of patterns
  - Itemsets: collections of items
    - Example: {Milk, Diaper}
  - Association rules: X → Y, where X and Y are itemsets
    - Example: {Milk} → {Diaper}
(Figure: set-based representation of the data)
25. An Example
- Transaction data
  - t1: {Beef, Chicken, Milk}
  - t2: {Beef, Cheese}
  - t3: {Cheese, Boots}
  - t4: {Beef, Chicken, Cheese}
  - t5: {Beef, Chicken, Clothes, Cheese, Milk}
  - t6: {Chicken, Clothes, Milk}
  - t7: {Chicken, Milk, Clothes}
- Assume
  - minsup = 30%
  - minconf = 80%
- An example frequent itemset
  - {Chicken, Clothes, Milk} [sup = 3/7]
- Association rules from the itemset
  - Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
  - ...
  - Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
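A few lines of Python are enough to check these numbers. This small sketch hard-codes the seven transactions above and computes support and confidence directly from their definitions.

transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # Pr(rhs | lhs) = support(lhs ∪ rhs) / support(lhs)
    return support(lhs | rhs) / support(lhs)

print(support({"Chicken", "Clothes", "Milk"}))        # 3/7, above minsup = 30%
print(confidence({"Clothes"}, {"Milk", "Chicken"}))   # 3/3 = 1.0, above minconf = 80%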
26. Association Rule Mining
- Process of finding interesting patterns
  - Find frequent itemsets using a support threshold
  - Find association rules for the frequent itemsets
  - Sort association rules according to confidence
- Support filtering is necessary
  - To eliminate spurious patterns
  - To avoid exponential search
  - Support has the anti-monotone property: X ⊆ Y implies σ(Y) ≤ σ(X)
- Confidence is used because of its interpretation as a conditional probability
- Given d items, there are 2^d possible candidate itemsets
27. The Apriori Algorithm
- Two steps
  - Find all itemsets that have minimum support (frequent itemsets, also called large itemsets)
  - Use frequent itemsets to generate rules
- E.g., a frequent itemset
  - {Chicken, Clothes, Milk} [sup = 3/7]
- and one rule from the frequent itemset
  - Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
28. Step 1: Mining All Frequent Itemsets
- A frequent itemset is an itemset whose support is ≥ minsup
- Key idea, the apriori property (downward closure property): any subset of a frequent itemset is also a frequent itemset
(Figure: itemset lattice over items A, B, C, D — levels {A}, {B}, {C}, {D}; {A,B}, {A,C}, {A,D}, {B,C}, {B,D}, {C,D}; {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D})
29. The Algorithm
- Iterative algorithm (also called level-wise search): find all 1-item frequent itemsets, then all 2-item frequent itemsets, and so on
  - In each iteration k, only consider itemsets that contain some frequent (k-1)-itemset
- Find the frequent itemsets of size 1: F1
- For k = 2, 3, ...
  - Ck = candidates of size k: those itemsets of size k that could be frequent, given Fk-1
  - Fk = those itemsets that are actually frequent, Fk ⊆ Ck (needs one scan of the database)
30. Details: the Algorithm
Algorithm Apriori(T)
  C1 ← init-pass(T)
  F1 ← {f | f ∈ C1, f.count/n ≥ minsup}      // n = no. of transactions in T
  for (k = 2; Fk-1 ≠ ∅; k++) do
      Ck ← candidate-gen(Fk-1)
      for each transaction t ∈ T do
          for each candidate c ∈ Ck do
              if c is contained in t then
                  c.count++
          end
      end
      Fk ← {c ∈ Ck | c.count/n ≥ minsup}
  end
  return F ← ∪k Fk
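A compact Python version of this pseudocode, written as a sketch: candidate generation is done by a self-join followed by the downward-closure prune, and minsup is taken as a fraction of the transactions. The function and variable names are illustrative, not from the slides.

from itertools import combinations

def apriori(transactions, minsup):
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def frequent(itemsets):
        # One scan of the database: keep itemsets with support >= minsup.
        return {c for c in itemsets
                if sum(c <= t for t in transactions) / n >= minsup}

    # F1: frequent 1-itemsets (the init-pass).
    items = {i for t in transactions for i in t}
    F = frequent({frozenset([i]) for i in items})
    all_frequent, k = set(F), 2

    while F:
        # Candidate generation: join F(k-1) with itself, then prune any candidate
        # that has an infrequent (k-1)-subset (downward closure).
        candidates = {a | b for a in F for b in F if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in F for s in combinations(c, k - 1))}
        F = frequent(candidates)
        all_frequent |= F
        k += 1
    return all_frequent

On the seven-transaction example of slide 25, apriori(transactions, 0.3) should return {Chicken, Clothes, Milk} among the frequent itemsets.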
31. Step 2: Generate Rules from Frequent Itemsets
- Frequent itemsets → association rules
- One more step is needed to generate association rules (a code sketch follows below)
- For each frequent itemset X
  - For each proper nonempty subset A of X
    - Let B = X - A
    - A → B is an association rule if
      - confidence(A → B) ≥ minconf
      - support(A → B) = support(A ∪ B) = support(X)
      - confidence(A → B) = support(A ∪ B) / support(A)
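This step can be sketched as follows, assuming `frequent` is a dict mapping each frequent itemset (a frozenset) to its support, e.g., recorded while running the Apriori sketch above; the names are illustrative.

from itertools import combinations

def generate_rules(frequent, minconf):
    rules = []
    for X, sup_X in frequent.items():
        for r in range(1, len(X)):                        # proper nonempty subsets A of X
            for A in map(frozenset, combinations(X, r)):
                B = X - A
                conf = sup_X / frequent[A]                # support(X) / support(A)
                if conf >= minconf:
                    rules.append((A, B, sup_X, conf))     # rule A -> B with support(X)
    return rules

Every proper subset A of a frequent itemset X is itself frequent (downward closure), so its support is guaranteed to be available in the dictionary.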
32. On the Apriori Algorithm
- Seems to be very expensive
- Level-wise search
  - K = the size of the largest itemset
  - It makes at most K passes over the data
  - In practice, K is bounded (< 10)
- The algorithm is very fast; under some conditions, all rules can be found in linear time
- Scales up to large data sets
33. More on Association Rule Mining
- Clearly the space of all association rules is exponential, O(2^m), where m is the number of items in I
- The mining exploits sparseness of the data, and high minimum support and minimum confidence values
- Still, it always produces a huge number of rules: thousands, tens of thousands, millions, ...
34. Association Rule Mining
- Association analysis is mainly about finding frequent patterns; it is not clear how to introduce label information to find patterns that discriminate between two classes
- It is hard to extend the existing algorithms to handle tens of thousands of non-sparse features
- However, if it works, the identified patterns can provide much more information than just a set of selected features