Interpreting Microarray Expression Data Using Text Annotating the Genes

Learn more at: https://ftp.cs.wisc.edu

Transcript and Presenter's Notes

1
Interpreting Microarray Expression Data Using
Text Annotating the Genes

Michael Molla, Peter Andreae, Jeremy Glasner,
Frederick Blattner, Jude Shavlik
University of Wisconsin-Madison

2
The Basic Task

Given
Microarray Expression Data
Text Annotations of Genes
Generate
Model of Expression

3
Motivation

Lots of Data Available on the Internet
Microarray Expression Data
Text Annotations of Genes
Maybe we can Make the Scientist's Job Easier
Generate a Model of Expression Automatically
Easier First Step for the Human

4
Microarray Expression Data

Each spot represents a gene in E. coli
Colors Indicate Up- or Down-Regulation Under
Antibiotic Shock
For our Purpose: 3 Classes
Up-Regulated
Down-Regulated
No-Change

5
Microarray Expression Data
From Genome-Wide Expression in Escherichia coli
K-12, Blattner et al., 1999

6
Our Microarray Experiment

4290 genes
574 up-regulated
333 down-regulated
2747 un-regulated
636 not enough signal

7
Text Annotations of Genes

The text from a sample SwissProt entry (b1382)
The description field
HYPOTHETICAL 6.8 KDA PROTEIN IN LDHA-FEAR
INTERGENIC REGION
The keyword field
HYPOTHETICAL PROTEIN
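
A minimal sketch of how an entry like this could be turned into a set of words for a learner. The function name annotation_words and the tokenization rule (upper-case runs of letters and digits) are assumptions for illustration, not the authors' preprocessing.

```python
import re

def annotation_words(description: str, keywords: str) -> set[str]:
    """Return the set of upper-cased words found in either annotation field."""
    text = f"{description} {keywords}".upper()
    # Keep runs of letters and digits, so "LDHA-FEAR" yields LDHA and FEAR
    # and "6.8" yields 6 and 8 (an assumed, deliberately simple tokenizer).
    return set(re.findall(r"[A-Z0-9]+", text))

# The sample SwissProt entry b1382 from the slide.
words_b1382 = annotation_words(
    "HYPOTHETICAL 6.8 KDA PROTEIN IN LDHA-FEAR INTERGENIC REGION",
    "HYPOTHETICAL PROTEIN",
)
```

Using a set rather than a word count matches the present-or-absent view of words that both learning approaches take.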

8
Sample Rules From a Model for Up-Regulation

IF
The annotation contains FLAGELLAR AND does NOT
contain HYPOTHETICAL
OR
The annotation contains BIOSYNTHESIS
THEN
The gene is up-regulated
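
The sample rules read directly as a boolean test over the words of an annotation. This sketch assumes each annotation has already been reduced to a set of upper-case words; the function name is_up_regulated is hypothetical.

```python
def is_up_regulated(words: set[str]) -> bool:
    """Apply the two sample rules; either one firing predicts up-regulation."""
    # Rule 1: contains FLAGELLAR and does not contain HYPOTHETICAL.
    rule_1 = "FLAGELLAR" in words and "HYPOTHETICAL" not in words
    # Rule 2: contains BIOSYNTHESIS.
    rule_2 = "BIOSYNTHESIS" in words
    return rule_1 or rule_2
```

A gene whose annotation matches neither rule is simply not predicted to be up-regulated by this model.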

9
Why use Machine Learning?

Concerned with machines learning from available
data
Informed by text data, the learner can make a
first-pass model for the scientist

10
Desired Properties of a Model

Accurate
Measure with cross validation
Comprehensible
Measure with model size
Stable to Small Changes in the Data
Measure with random subsampling
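
As an illustration of the accuracy measurement, here is a rough k-fold cross-validation sketch. The train argument is a hypothetical stand-in for any learner that returns a predictor; the fold count and the seed are assumptions, not values from the slides.

```python
import random

def cross_val_accuracy(examples, train, k=10, seed=0):
    """examples: list of (words, label); train: function returning a predictor."""
    data = examples[:]
    random.Random(seed).shuffle(data)
    # Deal the shuffled examples into k folds of near-equal size.
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, held_out in enumerate(folds):
        # Train on every fold except the held-out one.
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        predict = train(training)
        correct += sum(predict(words) == label for words, label in held_out)
    return correct / len(data)
```

Each example is tested exactly once, by a model that never saw it during training.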

11
Approaches

Naïve Bayes
Statistical method
Uses all of the words (present or absent)
PFOIL
Covering algorithm
Chooses words to use one at a time

12
Naïve Bayes

For each word wi, there are two likelihood ratios
(lr)
lr(wi present) = p(wi present | up) / p(wi present | down)
lr(wi absent) = p(wi absent | up) / p(wi absent | down)
For each annotation, the lrs are combined to form
a lr for a gene
lr(gene) = product over all words wi of lr(wi X)
where X is either present or absent.
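
The likelihood ratios can be sketched as follows. This assumes each gene's annotation is a set of words, uses add-one smoothing to avoid zero probabilities (an assumption; the slides do not say how zero counts are handled), and works in log space so that the product becomes a sum. All function names are hypothetical.

```python
import math
from collections import Counter

def train_likelihood_ratios(up_genes, down_genes, vocabulary):
    """up_genes/down_genes: lists of word sets.

    Returns {word: (lr_present, lr_absent)}.
    """
    up_counts = Counter(w for g in up_genes for w in g)
    down_counts = Counter(w for g in down_genes for w in g)
    ratios = {}
    for w in vocabulary:
        # Add-one smoothing (assumed) keeps every probability strictly in (0, 1).
        p_up = (up_counts[w] + 1) / (len(up_genes) + 2)
        p_down = (down_counts[w] + 1) / (len(down_genes) + 2)
        ratios[w] = (p_up / p_down, (1 - p_up) / (1 - p_down))
    return ratios

def gene_log_lr(words, ratios):
    """Log of the gene's combined ratio: sum of per-word log ratios."""
    return sum(
        math.log(present if w in words else absent)
        for w, (present, absent) in ratios.items()
    )
```

A positive result favors up-regulation and a negative one favors down-regulation. Every vocabulary word contributes, whether it is present or absent.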

13
PFOIL

Learn rules from data
Produces multiple if-then rules from data
Builds rules by adding one word at a time
Easy to interpret models
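
A simplified sketch of a covering rule learner in this spirit. It assumes candidate literals (a word being present or absent) are scored with a FOIL-style information gain, and it omits pruning; max_literals and all function names are hypothetical. It is not the authors' implementation.

```python
import math

def covers(rule, words):
    """A rule is a list of (word, must_be_present) literals; all must hold."""
    return all((w in words) == present for w, present in rule)

def foil_gain(rule, literal, pos, neg):
    """FOIL-style information gain from adding `literal` to `rule`."""
    p0 = sum(covers(rule, g) for g in pos)
    n0 = sum(covers(rule, g) for g in neg)
    new_rule = rule + [literal]
    p1 = sum(covers(new_rule, g) for g in pos)
    n1 = sum(covers(new_rule, g) for g in neg)
    if p1 == 0:
        return float("-inf")  # the extended rule covers no positives
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def learn_rules(pos, neg, vocabulary, max_literals=3):
    """Greedy covering: learn one rule, remove the positives it covers, repeat."""
    rules, remaining = [], list(pos)
    literals = [(w, present) for w in sorted(vocabulary) for present in (True, False)]
    while remaining:
        rule = []
        # Add one literal at a time while the rule still covers a negative.
        while len(rule) < max_literals and any(covers(rule, g) for g in neg):
            best = max(literals, key=lambda lit: foil_gain(rule, lit, remaining, neg))
            if foil_gain(rule, best, remaining, neg) <= 0:
                break  # no literal improves the rule's precision
            rule.append(best)
        if not rule:
            break  # nothing useful could be learned for the remaining positives
        rules.append(rule)
        remaining = [g for g in remaining if not covers(rule, g)]
    return rules
```

The learned model is a list of rules joined by OR, where each rule is an AND of word tests, the same shape as the sample rules for up-regulation.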

14
Accuracy/Comprehensibility Tradeoff

15
Stabilized PFOIL

Repeatedly run PFOIL on randomly sampled subsets
For each word, count the number of models it
appears in
Restrict PFOIL to only those words that appear in
a minimum of m models
Rerun PFOIL with only those words
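
The word-selection steps might look like the sketch below. Here learn_words is a hypothetical stand-in for one PFOIL run that returns the set of words used by its rules; the number of runs, the sampling fraction, and the default m are illustrative assumptions.

```python
import random
from collections import Counter

def stable_words(examples, learn_words, runs=50, fraction=0.5, m=25, seed=0):
    """Return the words used by the models of at least m of `runs` subsampled runs."""
    rng = random.Random(seed)
    counts = Counter()
    size = max(1, int(len(examples) * fraction))
    for _ in range(runs):
        # Sample a random subset without replacement and learn a model on it.
        subset = rng.sample(examples, size)
        # Count each word at most once per model.
        counts.update(set(learn_words(subset)))
    return {w for w, c in counts.items() if c >= m}
```

The final step is then a single PFOIL run on the full data with its vocabulary restricted to the returned set.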

16
Stability Measure

After running the algorithm N times to generate
N rule sets
Where
U = the set of words appearing in any rule set
count(wi) = number of rule sets containing word wi
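
The formula itself did not survive in this transcript. One plausible reading, offered only as an assumption, is the average fraction of the N rule sets in which each word of U appears, that is, the sum of count(wi) over U divided by N times the size of U. A sketch under that assumption:

```python
from collections import Counter

def stability(rule_sets):
    """rule_sets: list of N word sets, one per run. Returns a value in (0, 1]."""
    n = len(rule_sets)
    # count[w]: number of rule sets containing word w.
    count = Counter(w for words in rule_sets for w in set(words))
    u = set(count)  # U: the words appearing in any rule set
    if not u:
        return 1.0  # assumption: with no words at all, treat the runs as identical
    # Assumed normalization: sum of counts divided by N * |U|.
    return sum(count[w] for w in u) / (n * len(u))
```

Under this reading the measure is 1 when every run uses exactly the same words and approaches 1/N when no word is shared between runs.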

17
Accuracy/Stability Tradeoff

18
Discussion

Not very severe tradeoffs in Accuracy
vs. stability
vs. comprehensibility
PFOIL not as good at characterizing data
suggests not many dependencies
need for softer rules

19
Future Directions

M of N rules
Permutation Test
More Sources of Text Data
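
To illustrate the first item: an M-of-N rule fires when at least M of its N words occur in the annotation, which makes it softer than a strict conjunction. The function name m_of_n and the example words are hypothetical.

```python
def m_of_n(words, rule_words, m):
    """Fire when at least m of the words in rule_words occur in the annotation."""
    return len(set(words) & set(rule_words)) >= m

# A 2-of-3 rule: any two of the three listed words are enough.
fires = m_of_n({"FLAGELLAR", "BIOSYNTHESIS", "PROTEIN"},
               {"FLAGELLAR", "BIOSYNTHESIS", "MOTILITY"}, m=2)
```

With m equal to N the rule is an ordinary conjunction, and with m equal to 1 it is a disjunction.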

20
Take-Home Message

This is just a first step toward an aid for
understanding expression data
Make expression models based on text instead of
DNA sequence.

21
Acknowledgements

This research was funded by the following grants:
NLM 1 R01 LM07050-01,
NSF IRI-9502990,
NIH 2 P30 CA14520-29, and
NIH 5 T32 GM08349.
