Automating Discovery from Biomedical Texts

Slides: 36

Provided by: melody87

Learn more at: https://people.ischool.berkeley.edu

Transcript and Presenter's Notes

1
Automating Discovery from Biomedical Texts

Marti Hearst Barbara Rosario
UC Berkeley
Agyinc Visit
August 16, 2000

2
The LINDI Project: Linking Information for New
Discoveries
Two Main Thrusts

UIs for building and reusing hypothesis seeking
strategies.

Statistical language analysis techniques for
extracting propositions

3
Scenario: Explore Functions of a Gene

Objective
Determine the functions of a newly sequenced Gene
X.
Known facts
Gene X is co-expressed (activated in the same cell)
with Genes A, B, and C
The relationships of Genes A, B, and C to certain
types of diseases (from the medical literature)
Question
What types of diseases are Gene X related to?

4
Gene Co-expression: Role in the genetic pathway
[Diagram: pathway sketches with nodes labeled Kall., PSA, and PAP, plus unknown genes g? and h?]
Other possibilities as well

5
Make use of the literature

Look up what is known about the other genes.
Different articles in different collections
Look for commonalities
Similar topics indicated by Subject Descriptors
Similar words in titles and abstracts
adenocarcinoma, neoplasm, prostate, prostatic
neoplasms, tumor markers, antibodies ...

6
Developing Strategies

Different strategies seem needed for different
situations
First see what is known about Kallikrein.
7341 documents. Too many
AND the result with disease category
If result is non-empty, this might be an
interesting gene
Now get 803 documents

7
Explore Functions of New Gene X
Medical Literature
Query
Projection
Mapping
Slide adapted from K. Patel
8
Developing Strategies

Different strategies seem needed for different
situations
First see what is known about Kallikrein.
7341 documents. Too many
AND the result with disease category
If result is non-empty, this might be an
interesting gene
Now get 803 documents
AND the result with PSA
Get 11 documents. Better!
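
The narrowing steps above can be sketched as successive set intersections over document IDs. This is only a minimal illustration: the `index` dictionary, its terms, and its document IDs are hypothetical toy data, not the actual collection or the counts quoted on the slide, and `narrow` is an invented helper name.

```python
# Hypothetical toy index: term -> set of document IDs (not real data).
index = {
    "kallikrein": {1, 2, 3, 4, 5, 6},
    "disease": {2, 3, 4, 5, 9},
    "psa": {3, 5, 7},
}


def narrow(index, terms):
    """AND the document sets for each term in turn, recording each result size."""
    result = None
    trace = []
    for term in terms:
        postings = index.get(term, set())
        # The first term seeds the result; every later term is ANDed in.
        result = set(postings) if result is None else result & postings
        trace.append((term, len(result)))
    return (result if result is not None else set()), trace


docs, trace = narrow(index, ["kallikrein", "disease", "psa"])
print(trace)         # result size after each AND
print(sorted(docs))  # documents that survive every step
```

Keeping the trace of sizes mirrors the way the researcher judges each step ("too many", "better") before deciding what to AND in next.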

9
Explore Functions of New Gene X
Medical Literature
Query
Projection
Intersection
10
Developing Strategies

Look for commonalities among these documents
Manual scan through 100 category labels
Would have been better if
Automatically organized
Intersections of important categories scanned
first

11
Explore Functions of New Gene X
Medical Literature
Query
Projection
Intersection
Slicing
Mapping
Slide adapted from K. Patel
12
Try a new tack

Researcher uses knowledge of field to realize
these are related to prostate cancer and
diagnostic tests
New tack: intersect search on all three known
genes
Hope they all talk about diagnostics and prostate
cancer
Fortunately, 7 documents returned
Bingo! A relation to regulation of this cancer

13
Explore Functions of New Gene X
Medical Literature
Possible Function For Gene-X
Query
Query
Projection
Intersection
Slicing
Mapping
Slide adapted from K. Patel
14
Formulate a Hypothesis

Hypothesis: mystery gene has to do with
regulation of expression of genes leading to
prostate cancer
New tack: do some lab tests
See if mystery gene is similar in molecular
structure to the others
If so, it might do some of the same things they
do

15
Strategies again

In hindsight, combining all three genes was a
good strategy.
Store this for later
Might not have worked
Need a suite of strategies
Build them up via experience and a good UI

16
The System

Doing the same query with slightly different
values each time is time-consuming and tedious
Same goes for cutting and pasting results
IR systems don't support varying queries like
this very well.
Each situation is a bit different
Some automatic processing is needed in the
background to eliminate/suggest hypotheses

17
The User Interface

A general search interface should support
History
Context
Comparison
Operators: Intersection, Union, Slicing
Operator Reuse
Visualization (where appropriate)
We have an initial implementation
It needs lots of work

18
Architecture of LINDI UI

Data Layer
Annotation Layer
User Interface Layer

19
Data Layer

Purpose
Hide different formats of text collections
Components
Data Abstractions representing records of a text
collection
Operations performed on the data
Data
A set of records
Each record is a set of tuples with types
Operations
union, intersection, projection, mapping
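
One way to picture the data abstraction described above is the following Python sketch. It assumes a record is a frozen set of (type, value) tuples and a data set is a set of such records. All function names, the field types ("pmid", "gene", "mesh"), and the sample values are illustrative assumptions, and `mapping` is assumed to mean "apply a function to every record", which the slide does not spell out.

```python
from typing import Any, Callable, FrozenSet, Iterable, List, Set, Tuple

Field = Tuple[str, str]    # (type, value), e.g. ("mesh", "Prostatic Neoplasms")
Record = FrozenSet[Field]  # a record is a set of typed tuples
DataSet = Set[Record]      # a data set is a set of records


def make_record(fields: Iterable[Field]) -> Record:
    return frozenset(fields)


def union(a: DataSet, b: DataSet) -> DataSet:
    return a | b


def intersection(a: DataSet, b: DataSet) -> DataSet:
    return a & b


def projection(data: DataSet, types: Iterable[str]) -> DataSet:
    """Keep only tuples of the given types; identical projected records collapse."""
    keep = set(types)
    return {frozenset(f for f in rec if f[0] in keep) for rec in data}


def mapping(data: DataSet, fn: Callable[[Record], Any]) -> List[Any]:
    """Apply fn to every record (an assumed reading of 'mapping')."""
    return [fn(rec) for rec in data]


# Hypothetical sample records.
r1 = make_record([("pmid", "1"), ("gene", "PSA"), ("mesh", "Prostatic Neoplasms")])
r2 = make_record([("pmid", "2"), ("gene", "PAP"), ("mesh", "Prostatic Neoplasms")])
a, b = {r1}, {r2}

# The full records differ, but their projections onto "mesh" coincide.
shared = intersection(projection(a, ["mesh"]), projection(b, ["mesh"]))
print(shared)
```

Because records are plain sets of typed tuples, the operations never need to know which text collection a record came from, which is the stated purpose of the layer.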

20
Annotation Layer

Purpose
Associate data sets with the operations that produced
them (history)
History is a first class object
Advantage
Streamline a sequence of operations
Reuse operations
Parameterize operations
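
Treating history as a first-class object could look roughly like the sketch below: each operation is recorded together with its parameters, and the recorded sequence can be replayed later with different parameter values. The `Step` and `History` classes, the `query` and `project` helpers, and the sample documents are all hypothetical and are not taken from the LINDI implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    name: str
    fn: Callable[..., Any]  # called as fn(previous_result, **params)
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class History:
    steps: List[Step] = field(default_factory=list)

    def record(self, name: str, fn: Callable[..., Any], **params: Any) -> "History":
        self.steps.append(Step(name, fn, params))
        return self

    def replay(self, start: Any, **overrides: Any) -> Any:
        """Re-run every recorded step, substituting any overridden parameters."""
        result = start
        for step in self.steps:
            params = {k: overrides.get(k, v) for k, v in step.params.items()}
            result = step.fn(result, **params)
        return result


# Hypothetical toy documents and operations.
docs = [
    {"gene": "GA", "mesh": "Prostatic Neoplasms"},
    {"gene": "GB", "mesh": "Tumor Markers"},
    {"gene": "GA", "mesh": "Antibodies"},
]


def query(records, gene):
    return [r for r in records if r["gene"] == gene]


def project(records, attribute):
    return sorted({r[attribute] for r in records})


strategy = (
    History()
    .record("query", query, gene="GA")
    .record("project", project, attribute="mesh")
)
result_ga = strategy.replay(docs)             # same steps, original parameter
result_gb = strategy.replay(docs, gene="GB")  # same steps, new gene
print(result_ga, result_gb)
```

Storing the steps rather than only the results is what allows a strategy that worked once to be reused and parameterized.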

21
User Interface

Direct manipulation of information objects and
access operations
Query
Intersection
Union
Mapping
Slicing
Record and reuse of past operations
Parameterization of operations
Streamlining of operations

22
Initial Palette
23
Query Structure Determined by Collection Type
24
Query Operation Results
25
Projection Operation and Subsequent Results
26
Parameterized Query: Repeat operations with
different values
[Figure labels: GA, GB, GC]
27
Intersection over Projected Attribute
28
Intersection over Projected Attribute
29
Example Interaction with UI Prototype
1 Query on gene names
2 Project out only MeSH headings
3 Intersect the results
4 Map to create a ranking
5 Slice out the top-ranked
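
The five steps of this interaction can be strung together as in the following sketch. The `collection` data is a hypothetical toy example, and the ranking rule (total occurrences of each shared heading, ties broken alphabetically) is an assumption, since the slide says only that a mapping creates a ranking.

```python
from collections import Counter

# Hypothetical toy collection; IDs, genes, and headings are illustrative only.
collection = [
    {"id": 1, "genes": {"Kallikrein"}, "mesh": {"Prostatic Neoplasms", "Tumor Markers"}},
    {"id": 2, "genes": {"PSA"}, "mesh": {"Prostatic Neoplasms", "Antibodies"}},
    {"id": 3, "genes": {"PAP"}, "mesh": {"Prostatic Neoplasms", "Tumor Markers"}},
    {"id": 4, "genes": {"PSA"}, "mesh": {"Prostatic Neoplasms", "Tumor Markers"}},
]


def query(collection, gene):
    """Step 1: documents that mention the gene."""
    return [d for d in collection if gene in d["genes"]]


def project_mesh(docs):
    """Step 2: keep only the MeSH headings (with repeats across documents)."""
    return [h for d in docs for h in d["mesh"]]


def explore(collection, genes, top_k):
    if not genes:
        return []
    per_gene = {g: Counter(project_mesh(query(collection, g))) for g in genes}
    # Step 3: headings that occur for every gene.
    shared = set.intersection(*(set(c) for c in per_gene.values()))
    # Step 4: map each shared heading to a score (assumed: total occurrences).
    scored = [(sum(c[h] for c in per_gene.values()), h) for h in shared]
    ranking = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    # Step 5: slice out the top-ranked headings.
    return [h for _, h in ranking[:top_k]]


top = explore(collection, ["Kallikrein", "PSA", "PAP"], top_k=1)
print(top)
```

With this toy data, "Antibodies" drops out at the intersection step because it appears for only one gene, which is the kind of filtering the interaction is meant to automate.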
30
Future Work on UI