Finding biologically relevant information using ADIOS

Slides: 39
Provided by: Thai7
Transcript and Presenter's Notes

1
Finding biologically relevant information using
ADIOS
  • ThaiBinh's final project for CBB545

April 19, 2007
2
The current state of affairs in natural language
processing
  • NLP: Converting human language into
    representations that are easier for computers to
    understand
  • Most natural language processing requires a
    tagged training set
  • Tagging is time-consuming and costly

http://en.wikipedia.org/wiki/Natural_language_processing
3
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC99T42
5
ADIOS
  • Unsupervised learning of natural languages
  • ADIOS: Automatic distillation of structure
  • Input: A corpus of characters (most likely,
    untagged sentences)
  • Output: A grammar

Unsupervised learning of natural languages,
Solan, et al., PNAS vol. 102, August 2005.
6
A very quick primer on grammars
  • A set of rules for making a sentence
  • Ex.

The grammar:
S → S S
S → 1
S → a
A possible derivation:
S ⇒ S S ⇒ S S S ⇒ 1 S S ⇒ 1 1 S ⇒ 1 1 a
7
A very quick primer on grammars
  • We can visualize the expansion as a tree, and
    read the leaves

The grammar:
S → S S
S → 1
S → a
A possible derivation:
S ⇒ S S ⇒ S S S ⇒ 1 S S ⇒ 1 1 S ⇒ 1 1 a
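
The derivation on this slide can be replayed mechanically. The following is a minimal sketch, assuming a leftmost derivation (always expand the leftmost S); the names RULES and derive are illustrative and do not come from the presentation.

```python
# Toy grammar from the slide: S -> S S | 1 | a
RULES = {"S": [["S", "S"], ["1"], ["a"]]}

def derive(start, choices):
    """Replay a leftmost derivation.

    choices[i] is the index of the rule applied to the leftmost
    nonterminal at step i. Returns every sentential form as a string.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that still has rules (a nonterminal).
        idx = next(i for i, sym in enumerate(form) if sym in RULES)
        form = form[:idx] + RULES[form[idx]][choice] + form[idx + 1:]
        steps.append(" ".join(form))
    return steps

# Rule indices: 0 is S -> S S, 1 is S -> 1, 2 is S -> a
steps = derive("S", [0, 0, 1, 1, 2])
print(" => ".join(steps))
```

Reading the leaves of the corresponding tree from left to right gives the last form, 1 1 a.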
9
ADIOS
  • The system builds a graph using the first
    sentence
  • With each successive sentence, it tries to find
    overlapping subpaths (patterns)

10
ADIOS
  • It also tries to generalize the path by looking for
    equivalence classes
  • Search for patterns and equivalence classes until
    no new ones are found
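
To make the two ideas concrete, here is a deliberately simplified sketch. It is not the ADIOS algorithm from Solan et al., which scores candidate patterns statistically on the graph. It only counts word sequences shared by several sentences (a stand-in for overlapping subpaths) and collects words that fill the same slot between fixed neighbours (a stand-in for equivalence classes). The boundary markers and all function names are assumptions made for illustration.

```python
from collections import defaultdict

BEGIN, END = "<BEGIN>", "<END>"  # hypothetical sentence-boundary markers

def tokenize(sentence):
    """Turn a sentence into a path of words with boundary markers."""
    return [BEGIN] + sentence.split() + [END]

def shared_subpaths(sentences, n=2, min_sentences=2):
    """Word n-grams that occur in at least min_sentences different sentences."""
    seen = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        words = tokenize(sentence)
        for i in range(len(words) - n + 1):
            seen[tuple(words[i:i + n])].add(sid)
    return {path for path, sids in seen.items() if len(sids) >= min_sentences}

def slot_fillers(sentences, left, right):
    """Words found between a fixed left and right neighbour (a crude equivalence class)."""
    fillers = set()
    for sentence in sentences:
        words = tokenize(sentence)
        for i in range(1, len(words) - 1):
            if words[i - 1] == left and words[i + 1] == right:
                fillers.add(words[i])
    return fillers

corpus = [
    "Laura has a presentation",
    "Hugo has a presentation today",
    "ThaiBinh has a presentation in CBB545",
    "Chong had a presentation",
]
patterns = shared_subpaths(corpus)
names = slot_fillers(corpus, BEGIN, "has")
print(sorted(patterns))
print(sorted(names))
```

On this small corpus the pair (a, presentation) shows up as a shared subpath, and the slot before "has" is filled by Hugo, Laura and ThaiBinh.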

11
ADIOS: A quick example
  • Input: a corpus of sentences

Chong had a presentation in CBB545 on Tuesday
Chong had a presentation next Thursday
Laura has a presentation
ThaiBinh has a presentation in CBB545
ThaiBinh has a presentation today
today ThaiBinh has a presentation
Chong had a presentation
Hugo has a presentation in CBB545 today
ThaiBinh has a presentation in CBB545 today
Laura has a presentation in CBB545 next Thursday
Hugo has a presentation today
Chong had a presentation on Tuesday
Chong had a presentation in CBB545 on Tuesday
Laura has a presentation next Thursday in CBB545
ThaiBinh has a presentation today
ThaiBinh has a presentation in CBB545
12
ADIOS: A quick example
  • Output is a grammar

P18 (a,presentation)
P19 (E20,has,P18)
E20 Hugo,Laura,ThaiBinh
P21 (Chong,had)
P22 (in,CBB545)
P23 (P19,P22)
P24 (P21,P18)
13
(No Transcript)
14
P18 (a,presentation)
P19 (E20,has,P18)
E20 Hugo,Laura,ThaiBinh
P21 (Chong,had)
P22 (in,CBB545)
P23 (P19,P22)
P24 (P21,P18)
15
P18 (a,presentation)
P19 (E20,has,P18)
E20 Hugo,Laura,ThaiBinh
P21 (Chong,had)
P22 (in,CBB545)
P23 (P19,P22)
P24 (P21,P18)

Tree for P23:
P23 (P19,P22)
  P19 (E20,has,P18)
    E20 Hugo,Laura,ThaiBinh
    P18 (a,presentation)
  P22 (in,CBB545)
16
P18 (a,presentation)
P19 (E20,has,P18)
E20 Hugo,Laura,ThaiBinh
P21 (Chong,had)
P22 (in,CBB545)
P23 (P19,P22)
P24 (P21,P18)

Tree for P24:
P24 (P21,P18)
  P21 (Chong,had)
  P18 (a,presentation)
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
P18 (Chong,had,a)
P19 (has,a)
P20 (E21,P19,presentation)
E21 Hugo,Laura,ThaiBinh
P22 (in,CBB545)
P23 (P20,P22)
P24 (P18,presentation)
21
Two different grammars: same end result

P18 (Chong,had,a)
P19 (has,a)
P20 (E21,P19,presentation)
E21 Hugo,Laura,ThaiBinh
P22 (in,CBB545)
P23 (P20,P22)
P24 (P18,presentation)

P18 (a,presentation)
P19 (E20,has,P18)
E20 Hugo,Laura,ThaiBinh
P21 (Chong,had)
P22 (in,CBB545)
P23 (P19,P22)
P24 (P21,P18)
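
The "same end result" claim can be checked by expanding both grammars. This is a small sketch under one assumption about notation: each P entry is read as an ordered pattern and each E entry as an equivalence class from which one member is chosen. The dictionary encoding and the expand function are illustrative, not ADIOS output.

```python
from itertools import product

def expand(symbol, grammar):
    """Return every word sequence a symbol can produce, as a set of strings."""
    if symbol not in grammar:
        return {symbol}  # a plain word
    kind, members = grammar[symbol]
    if kind == "E":
        # Equivalence class: any single member may appear.
        result = set()
        for member in members:
            result |= expand(member, grammar)
        return result
    # Pattern: concatenate the expansions of all members, in order.
    parts = [expand(member, grammar) for member in members]
    return {" ".join(combo) for combo in product(*parts)}

# Second grammar on the slide (the one shown on the previous slide).
GRAMMAR_A = {
    "P18": ("P", ["a", "presentation"]),
    "P19": ("P", ["E20", "has", "P18"]),
    "E20": ("E", ["Hugo", "Laura", "ThaiBinh"]),
    "P21": ("P", ["Chong", "had"]),
    "P22": ("P", ["in", "CBB545"]),
    "P23": ("P", ["P19", "P22"]),
    "P24": ("P", ["P21", "P18"]),
}

# First grammar on the slide.
GRAMMAR_B = {
    "P18": ("P", ["Chong", "had", "a"]),
    "P19": ("P", ["has", "a"]),
    "P20": ("P", ["E21", "P19", "presentation"]),
    "E21": ("E", ["Hugo", "Laura", "ThaiBinh"]),
    "P22": ("P", ["in", "CBB545"]),
    "P23": ("P", ["P20", "P22"]),
    "P24": ("P", ["P18", "presentation"]),
}

for top in ("P23", "P24"):
    print(top, sorted(expand(top, GRAMMAR_A)))
    print(top, expand(top, GRAMMAR_A) == expand(top, GRAMMAR_B))
```

Both grammars group the words differently, yet P23 and P24 expand to the same sentences in each.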
22
ADIOS
  • Able to generate sentences using the grammar it
    created
  • Can test whether a new sentence fits one of the
    grammar rules
  • Can be applied to a wide variety of domains
  • Bible in various languages
  • Classify protein function based on amino acid
    sequence
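
Testing whether a new sentence fits a rule can be sketched as a small recursive matcher. This is an illustration only, assuming P entries are ordered patterns and E entries are equivalence classes; the grammar encoding and the names match and fits are hypothetical and are not part of ADIOS.

```python
# Grammar from the quick example, encoded as (kind, members) pairs.
GRAMMAR = {
    "P18": ("P", ["a", "presentation"]),
    "P19": ("P", ["E20", "has", "P18"]),
    "E20": ("E", ["Hugo", "Laura", "ThaiBinh"]),
    "P21": ("P", ["Chong", "had"]),
    "P22": ("P", ["in", "CBB545"]),
    "P23": ("P", ["P19", "P22"]),
    "P24": ("P", ["P21", "P18"]),
}

def match(symbol, words, pos, grammar):
    """Return the set of positions where symbol can end if it starts at pos."""
    if symbol not in grammar:
        # A plain word matches only itself and consumes one position.
        if pos < len(words) and words[pos] == symbol:
            return {pos + 1}
        return set()
    kind, members = grammar[symbol]
    if kind == "E":
        ends = set()
        for member in members:
            ends |= match(member, words, pos, grammar)
        return ends
    # Pattern: every member must match in sequence.
    positions = {pos}
    for member in members:
        positions = {end for p in positions for end in match(member, words, p, grammar)}
    return positions

def fits(sentence, symbol, grammar=GRAMMAR):
    """True if the whole sentence is produced by the given rule."""
    words = sentence.split()
    return len(words) in match(symbol, words, 0, grammar)

print(fits("Laura has a presentation in CBB545", "P23"))
print(fits("Chong has a presentation in CBB545", "P23"))
```

The first sentence fits P23 because Laura is a member of E20. The second does not, because Chong only appears in P21.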

23
The Project
  • Use ADIOS to create grammar rules from biomedical
    sentences
  • Look for gene-gene associations
  • Look for gene-disease associations
  • Infer information about a pair of genes in an
    unseen sentence based on its sentence structure
    (pattern)

24
(No Transcript)
25
ABNER: Find mentions of genes
26
MetaMap: Find mentions of diseases
The clinical effects of cortisone and ACTH
(adrenocorticotropic hormone) in the collagen
diseases: acute disseminated lupus erythematosus,
periarteritis nodosa, dermatomyositis and
scleroderma: interim report.

Phrase: "in the collagen diseases"
Meta Candidates (6)
  1000 C0009326:Collagen Diseases [Disease or Syndrome]
Phrase: "periarteritis nodosa,"
Meta Candidates (4)
  1000 C0031036:Periarteritis Nodosa (Polyarteritis Nodosa) [Disease or Syndrome]
Phrase: "dermatomyositis"
Meta Candidates (2)
  1000 C0011633:Dermatomyositis [Disease or Syndrome]
  1000 C0221056:Dermatomyositis (Dermatomyositis, Adult Type) [Disease or Syndrome]
Phrase: "scleroderma"
Meta Candidates (4)
  1000 C0011644:Scleroderma [Disease or Syndrome]
  1000 C0036421:Scleroderma (Systemic Scleroderma) [Disease or Syndrome]
27
The Project: Input
  • Replace any mention of a gene with a generic term
  • Ex.

GeneOne antagonizes GeneTwo signaling in the nucleus
GeneOne negatively regulates expression of GeneTwo
28
The Project: Input
  • Replace any mention of a gene/disease with a
    generic term
  • Ex.

p16 is consistently expressed in endometrial tubal metaplasia
GeneOne is consistently expressed in DiseaseOne

The expression of cyclin D1 is more often correlated with prognosis in cancers of ampulla of Vater
The expression of GeneOne is more often correlated with prognosis in DiseaseOne
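
The replacement step could look like the sketch below. It assumes the gene and disease mentions have already been found (for example by ABNER and MetaMap) and are handed over as (text, type) pairs in order of appearance. That input format, the ORDINALS list and the generalize function are assumptions for illustration, not part of either tool.

```python
ORDINALS = ["One", "Two", "Three", "Four"]  # enough for short sentences

def generalize(sentence, mentions):
    """Replace each tagged mention with a generic term such as GeneOne.

    mentions: list of (surface_text, entity_type) pairs, in order of
    appearance. Returns the rewritten sentence and a mapping from each
    generic term back to the original text.
    """
    counts = {}
    mapping = {}
    result = sentence
    for text, entity_type in mentions:
        index = counts.get(entity_type, 0)
        placeholder = entity_type + ORDINALS[index]
        counts[entity_type] = index + 1
        # Replace only the first remaining occurrence of this mention.
        result = result.replace(text, placeholder, 1)
        mapping[placeholder] = text
    return result, mapping

sentence = ("The expression of cyclin D1 is more often correlated "
            "with prognosis in cancers of ampulla of Vater")
mentions = [("cyclin D1", "Gene"), ("cancers of ampulla of Vater", "Disease")]
generic, mapping = generalize(sentence, mentions)
print(generic)
print(mapping)
```

Keeping the mapping makes it possible to put the real names back once a pattern has matched.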
29
Let ADIOS work its magic
30
Let ADIOS work its magic: out pop patterns that
describe the sentences (the grammar)
31
Tagging the patterns
GeneOne [antagonizes] GeneTwo
GeneOne [negatively regulates] GeneTwo
GeneOne [increases transcription] GeneTwo
GeneOne [positively regulates] GeneTwo
33
Tagging the patterns
inhibits: GeneOne [antagonizes | negatively regulates] GeneTwo
activates: GeneOne [increases transcription | positively regulates] GeneTwo
34
Seeing a new sentence
Ras/Erk pathway positively regulates Jak1/STAT6
activity
35
Seeing a new sentence
Ras/Erk pathway positively regulates Jak1/STAT6 activity
activates: GeneOne [increases transcription | positively regulates] GeneTwo
36
Seeing a new sentence
activates: Ras/Erk [increases transcription | positively regulates] Jak1/STAT6
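
Once patterns carry a tag, labelling an unseen sentence reduces to a lookup. The sketch below is a rough stand-in: it uses plain substring matching between the two gene slots rather than ADIOS patterns, and the table TAGGED_PATTERNS and the function extract_relation are hypothetical names. The verb groups and tags are the ones shown on the slides.

```python
# Verb phrases grouped by the tag assigned to their pattern.
TAGGED_PATTERNS = [
    ({"antagonizes", "negatively regulates"}, "inhibits"),
    ({"increases transcription", "positively regulates"}, "activates"),
]

def extract_relation(generalized, mapping):
    """Return (gene_one, tag, gene_two) for a generalized sentence, or None.

    generalized: sentence with genes replaced by GeneOne and GeneTwo.
    mapping: generic term -> original gene name.
    """
    start = generalized.find("GeneOne")
    end = generalized.find("GeneTwo")
    if start == -1 or end == -1 or end < start:
        return None
    # Text between the two gene slots.
    between = generalized[start + len("GeneOne"):end]
    for verbs, tag in TAGGED_PATTERNS:
        if any(verb in between for verb in verbs):
            return (mapping["GeneOne"], tag, mapping["GeneTwo"])
    return None

relation = extract_relation(
    "GeneOne pathway positively regulates GeneTwo activity",
    {"GeneOne": "Ras/Erk", "GeneTwo": "Jak1/STAT6"},
)
print(relation)
```

A sentence whose verb phrase is in neither group yields None, which is the case where no tagged pattern applies.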
37
The big picture: Automatic extraction of regulation

Smad7 antagonizes TGF-beta signaling in the nucleus
PTEN negatively regulates expression of cyclin D1
Ras/Erk pathway positively regulates Jak1/STAT6 activity

GeneOne    action      GeneTwo
Smad7      inhibit     TGF-Beta
PTEN       inhibit     Cyclin D1
Ras/ERK    activate    Jak1/Stat6

Loss of p53 Expression Correlates with Neck Cancer
p16 is consistently expressed in endometrial tubal metaplasia

GeneOne    expression        DiseaseOne
p53        down-regulated    Neck cancer
p16        upregulated       Endo. tubal metaplasia
38
Potential (inevitable) problems
  • The data/sentences
  • Amount
  • ADIOS's data usually had 1000s of sentences
  • Quality
  • ABNER/MetaMap (used for finding
    gene/disease-mentions) are not always accurate
  • Is it even feasible?
  • Biologists/Scientists are very creative in coming
    up with various ways of saying the same thing

39
Potential (inevitable) problems
  • The data/sentences
  • Amount
  • ADIOS's data usually had 1000s of sentences
  • Quality
  • ABNER/MetaMap (used for finding
    gene/disease-mentions) are not always accurate
  • Is it even feasible?
  • Stay tuned