Title: Machete: Charting Excursions through Bioscience Literature
1Machete: Charting Excursions through Bioscience Literature
- Shannon Bradshaw1 and Marc Light2
- 1Department of Management Sciences
- 2School of Library and Information Science
- 2Department of Linguistics
- 1,2Department of Computer Science
- The University of Iowa
2Motivation
- In biology, 40,000 new articles are published each month (Fontanelo 2004)
- Biologists spend as much as 20% of their time gathering information (Hayes 2004)
- For evolutionary biologists, the problem is especially severe
- A gene or protein of interest may lead them to organisms (areas of the literature) with which they have no familiarity
3Real-world example
- In recent wet lab work, a graduate student discovered proteins called ankyrins in the organism he was studying
- Learning the function of ankyrins and what is known about ankyrin repeats was critical to his research
- He performed a time-consuming web/literature search to uncover in what organisms these proteins occur and what is known about them
4Personal Information Management
- Problem: It is difficult for biologists to assemble needed information and save it for later review
- Approach: Developing a PIM application that enables users to
  - Assemble and organize information addressing specific information needs
  - Save it locally
  - Make it searchable
5Text Mining
- Problem
  - Too many relevant articles to sift through
  - It is frequently not an entire article that an investigator needs but a sentence or paragraph within that article
- Approach: Developing text mining tools that allow the user of the PIM application to
  - Find relevant information faster
  - With greater coverage
6Knowledge transfer/management
- Problem: The same information is gathered and reviewed by hundreds and perhaps thousands of people, often by people in the same research laboratory
- Approach: Developing a client-server system that
  - Gives users access to information assembled by others with similar needs
  - Integrates this access into the PIM application
7PIM Application with Text Mining
8Knowledge Management System
[Diagram labels: Ankyrins; Ankyrins KA]
9More specifically
12The text-mining system highlights passages of
potential interest as documents are viewed.
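The slides do not say how passages are chosen for highlighting. As a minimal sketch, assuming the simplest possible approach (marking every sentence that mentions a term of interest), the idea could look like the following. The function name `highlight_passages`, the `**` marker, and the sample text are all illustrative assumptions, not part of Machete.

```python
import re


def highlight_passages(text, terms, marker="**"):
    """Wrap every sentence that mentions one of `terms` in `marker`.

    Hypothetical helper: plain keyword matching stands in for whatever
    text-mining method the real system uses.
    """
    if not terms:
        return text
    # Case-insensitive alternation over the escaped terms.
    term_pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(
        f"{marker}{s}{marker}" if term_pattern.search(s) else s
        for s in sentences
    )


# Illustrative input only.
document = "Ankyrin repeats occur in many proteins. This sentence is unrelated."
print(highlight_passages(document, ["ankyrin"]))
```

Only the first sentence is wrapped in the marker, which a viewer could then render as a highlight. A real system would presumably need better sentence segmentation and a richer notion of relevance than keyword matching.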
14Researcher uses text-mining tools to extract
useful structured information
15Text Mining
      Protein         Pmid
6     IkappaBepsilon  94005619
5     Pp40            1533932
>1    P85             11399775
>1    Swi6            9521763
5     MAD-3           1891714
>1    p16INK4a        7780957
>1    p15INK4b        7780957
>1    p16C            10843863
16Text Mining
- prot interacts-with prot
- prot inhibits prot
- prot <function> organelle
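Templates of this kind can be read as surface patterns over sentences. The sketch below is one possible reading, assuming a known list of protein names and simple regular expressions; it covers only the two protein-protein templates. The names `RELATIONS`, `extract_relations`, and the placeholder proteins `ProtA`, `ProtB`, `ProtC` are hypothetical, and the slides do not state that Machete matches patterns this way.

```python
import re

# Relation label -> regex for the connecting phrase (assumed wording).
RELATIONS = {
    "interacts-with": r"interacts\s+with",
    "inhibits": r"inhibits",
}


def extract_relations(sentence, proteins):
    """Return (protein, relation, protein) triples found in `sentence`.

    `proteins` is a list of known protein names; longer names are tried
    first so that a name is not cut short by a shorter prefix.
    """
    names = sorted(proteins, key=len, reverse=True)
    prot = "(" + "|".join(re.escape(p) for p in names) + ")"
    triples = []
    for label, phrase in RELATIONS.items():
        pattern = re.compile(rf"\b{prot}\s+{phrase}\s+{prot}\b")
        for match in pattern.finditer(sentence):
            triples.append((match.group(1), label, match.group(2)))
    return triples


# Placeholder names, not real biological claims.
sentence = "ProtA interacts with ProtB, and ProtB inhibits ProtC."
print(extract_relations(sentence, ["ProtA", "ProtB", "ProtC"]))
```

Exact-string patterns like these miss passive constructions, intervening words, and name variants, so they illustrate the template idea rather than a usable extractor.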
17Researcher selects highlighted passages/structured
data he finds useful.
18He manually highlights any other passages he
finds useful
20He also highlights a few useful figures.
22and selects some web-based information to be
saved.
24Knowledge Artifact
      Protein         Pmid
6     IkappaBepsilon  94005619
5     Pp40            1533932
>1    P85             11399775
>1    Swi6            9521763
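The slides describe a knowledge artifact as something a researcher assembles, saves locally, searches, and shares. A minimal sketch of such a container, assuming a JSON file as the storage format, might look like this. The class `KnowledgeArtifact`, its field names, and the JSON layout are assumptions for illustration; only the protein names and PMIDs come from the table above.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class KnowledgeArtifact:
    """Hypothetical container for material a researcher chose to keep."""
    topic: str
    passages: list = field(default_factory=list)      # selected text passages
    protein_rows: list = field(default_factory=list)  # structured rows

    def add_protein(self, protein, pmid):
        # PMIDs are stored as strings so they survive JSON unchanged.
        self.protein_rows.append({"protein": protein, "pmid": str(pmid)})

    def search(self, query):
        """Case-insensitive substring search over passages and protein names."""
        q = query.lower()
        hits = [p for p in self.passages if q in p.lower()]
        hits += [r for r in self.protein_rows if q in r["protein"].lower()]
        return hits

    def to_json(self):
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))


artifact = KnowledgeArtifact(topic="Ankyrins")
artifact.add_protein("IkappaBepsilon", 94005619)
artifact.add_protein("Pp40", 1533932)
artifact.passages.append("Placeholder for a passage the researcher selected.")

# Round trip: what one user saves, another user can load and search.
restored = KnowledgeArtifact.from_json(artifact.to_json())
print(restored.search("pp40"))
```

Serializing to plain JSON keeps the artifact easy to store locally and to send through a client-server system, though figures and saved web pages would need a richer format than this sketch provides.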
26Instead of doing the digging again
27Others can reuse this
28More Information
- http://dollar.biz.uiowa.edu/sbradsha/Machete
- shannon-bradshaw@uiowa.edu