Information Extraction from Biomedical Text

1
Information Extraction from Biomedical Text
  • Jerry R. Hobbs
  • Artificial Intelligence Center
  • SRI International

2
Introduction
  • Information Extraction
  • Extract entities, relations, events
  • Capture structured information
  • Domain specific
  • Focus only on the relevant parts of the text
  • Mainly applied to topics of economic and military interest?
  • Biomedical domain

3
Cascaded Finite-State Transducers
  • Separate processing into several stages
  • FASTUS (Finite-State Automaton Text Understanding
    System)
  • Earlier Stages
  • Smaller linguistic objects
  • Domain independent
  • Later Stages
  • Domain dependent patterns

4
Cascaded Finite-State Transducers
  • Complex Words
  • Basic Phrases
  • Complex phrases
  • Domain Patterns
  • Merging Structures
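
The staged design above can be sketched as a list of functions applied in order, each consuming the previous stage's output. This is a minimal illustration, not the FASTUS implementation: the two toy stages, their names, and the bracket notation are assumptions made for the example.

```python
from typing import Callable, List

# A stage maps a token list to a (usually shorter) token list.
Stage = Callable[[List[str]], List[str]]

def join_complex_words(tokens: List[str]) -> List[str]:
    """Toy stage 1 (hypothetical): fuse one known two-word name into a token."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i:i + 2] == ["Escherichia", "coli"]:
            out.append("Escherichia_coli")
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def tag_noun_groups(tokens: List[str]) -> List[str]:
    """Toy stage 2 (hypothetical): bracket a determiner plus the next token."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] in {"a", "an", "the"} and i + 1 < len(tokens):
            out.append(f"[NG {tokens[i]} {tokens[i + 1]}]")
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def run_cascade(stages: List[Stage], tokens: List[str]) -> List[str]:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        tokens = stage(tokens)
    return tokens

result = run_cascade(
    [join_complex_words, tag_noun_groups],
    "from an Escherichia coli strain".split(),
)
print(result)
```

The noun-group stage only sees "Escherichia_coli" as one unit because the complex-word stage ran first, which is the point of ordering the stages from small linguistic objects to larger ones.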

5
Example
  • gamma-Glutamyl kinase, the 1st enzyme of
    the proline biosynthetic pathway, was purified to
    homogeneity from an Escherichia coli strain
    resistant to the proline analog
    3,4-dehydroproline. The enzyme had a native
    molecular weight of 236,000 and was apparently
    comprised of six identical 40,000-dalton subunits.

6
Target Database
  • Reaction Object
  • Attributes: ID, Pathway, Enzyme, ..
  • Enzyme Object
  • Attributes: ID, Name, Molecular-Weight,
    Subunit-Component, Subunit-Number
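
One way to picture the target database is as two record types. The sketch below uses Python dataclasses; the field names follow the attributes listed above, while the types, the identifier values, and the way the subunit component is represented are assumptions. Attributes elided on the slide ("..") are left out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Enzyme:
    id: str
    name: Optional[str] = None
    molecular_weight: Optional[int] = None
    subunit_component: Optional[str] = None  # representation is an assumption
    subunit_number: Optional[int] = None

@dataclass
class Reaction:
    id: str
    pathway: Optional[str] = None
    enzyme: Optional[Enzyme] = None

# Values taken from the example passage; the IDs "E1" and "R1" are made up.
enzyme = Enzyme(
    id="E1",
    name="gamma-Glutamyl kinase",
    molecular_weight=236000,
    subunit_component="40,000-dalton subunit",
    subunit_number=6,
)
reaction = Reaction(id="R1", pathway="proline biosynthetic pathway", enzyme=enzyme)
print(reaction)
```

Every attribute except the ID defaults to None, so a record can be created from partial information and filled in as later sentences are processed.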

7
Complex Words
  • Recognizes multiword fixed phrases and proper names
  • Both are abundant in the biological domain
  • Uses a lexicon, or machine-learning and statistical methods
  • gamma-Glutamyl kinase, the 1st enzyme of the proline
    biosynthetic pathway, was purified to homogeneity
    from an Escherichia coli strain resistant to the
    proline analog 3,4-dehydroproline. The enzyme had a
    native molecular weight of 236,000 and was apparently
    comprised of six identical 40,000-dalton subunits.
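
The lexicon option can be illustrated with a longest-match lookup over token sequences. This is a sketch under assumptions: the three lexicon entries are hand-picked from the example passage, and the function name and case-insensitive matching are choices made for the illustration, not a description of the system in the slides.

```python
from typing import List, Set, Tuple

# Hypothetical mini-lexicon of multiword terms, stored lowercased.
LEXICON: Set[Tuple[str, ...]] = {
    ("gamma-glutamyl", "kinase"),
    ("escherichia", "coli"),
    ("proline", "biosynthetic", "pathway"),
}

def recognize_complex_words(tokens: List[str],
                            lexicon: Set[Tuple[str, ...]] = LEXICON) -> List[str]:
    """Greedily replace the longest lexicon match at each position."""
    max_len = max(len(entry) for entry in lexicon)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in lexicon:
                out.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multiword entry starts here; keep the single token.
            out.append(tokens[i])
            i += 1
    return out

tokens = "purified from an Escherichia coli strain".split()
complex_words = recognize_complex_words(tokens)
print(complex_words)
```

A fixed lexicon only finds names it already contains, which is one reason the slide also mentions machine-learning and statistical methods for this stage.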

8
Basic Phrases
  • Segment a sentence into noun groups, verb groups,
    and particles
  • Uses the Sager (1981) grammar
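
To make the segmentation concrete, here is a toy chunker over words that already carry part-of-speech tags. It is not the Sager grammar: the tag names, the hand-assigned tags, and the rule "group maximal runs of noun-like or verb-like tags" are all assumptions chosen to keep the sketch short.

```python
from itertools import groupby
from typing import List, Tuple

NOUN_TAGS = {"DET", "ADJ", "NUM", "NOUN"}   # tags allowed inside a noun group
VERB_TAGS = {"AUX", "ADV", "VERB"}          # tags allowed inside a verb group

def group_of(tag: str) -> str:
    """Map a part-of-speech tag to NG, VG, or P (particle)."""
    if tag in NOUN_TAGS:
        return "NG"
    if tag in VERB_TAGS:
        return "VG"
    return "P"

def basic_phrases(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collapse consecutive words of the same group into one phrase."""
    phrases: List[Tuple[str, str]] = []
    for label, run in groupby(tagged, key=lambda wt: group_of(wt[1])):
        words = [word for word, _ in run]
        if label == "P":
            # Particles stay as individual items.
            phrases.extend(("P", word) for word in words)
        else:
            phrases.append((label, " ".join(words)))
    return phrases

# Tags below are assigned by hand for the illustration.
tagged = [
    ("The", "DET"), ("enzyme", "NOUN"),
    ("was", "AUX"), ("apparently", "ADV"), ("comprised", "VERB"),
    ("of", "PREP"),
    ("six", "NUM"), ("identical", "ADJ"), ("subunits", "NOUN"),
]
phrases = basic_phrases(tagged)
print(phrases)
```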

9
Complex Phrases

10
Complex Phrases
  • Structures of basic and complex phrases, entities
    and events

11
Clause-Level Domain Patterns
  • The enzyme had a native molecular weight of
    236,000 and was apparently comprised of six
    identical 40,000-dalton subunits.

12
Clause-Level Domain Patterns
  • The enzyme had a native molecular weight of
    236,000 and was apparently comprised of six
    identical 40,000-dalton subunits.
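
A clause-level domain pattern for this sentence can be approximated with two regular expressions that fill the Enzyme attributes of the target database. This is only a sketch: the real patterns operate over phrase structures rather than raw strings, and the regexes, the number-word table, and the nested form of Subunit-Component are assumptions.

```python
import re
from typing import Any, Dict

MW_PATTERN = re.compile(r"molecular weight of (?P<mw>[\d,]+)")
SUBUNIT_PATTERN = re.compile(
    r"comprised of (?P<count>\w+) identical (?P<size>[\d,]+)-dalton subunits"
)
NUMBER_WORDS = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

def to_int(text: str) -> int:
    """Turn '236,000' into 236000."""
    return int(text.replace(",", ""))

def extract_enzyme_facts(sentence: str) -> Dict[str, Any]:
    """Fill whichever Enzyme attributes the sentence states explicitly."""
    facts: Dict[str, Any] = {}
    mw = MW_PATTERN.search(sentence)
    if mw:
        facts["Molecular-Weight"] = to_int(mw.group("mw"))
    sub = SUBUNIT_PATTERN.search(sentence)
    if sub:
        count = sub.group("count")
        facts["Subunit-Number"] = (
            int(count) if count.isdigit() else NUMBER_WORDS.get(count.lower())
        )
        facts["Subunit-Component"] = {"Molecular-Weight": to_int(sub.group("size"))}
    return facts

sentence = (
    "The enzyme had a native molecular weight of 236,000 and was apparently "
    "comprised of six identical 40,000-dalton subunits."
)
facts = extract_enzyme_facts(sentence)
print(facts)
```

Note that the subject is only "The enzyme", so the name has to come from the previous sentence; that is left to the merging stage.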

13
Merging Structures
  • The first four levels operate within a single sentence
  • This level collects and combines information about
    one entity or relationship
  • Three criteria for merging two structures
  • The internal structure of the noun groups
  • Their nearness along some metric
  • The consistency and compatibility of the two structures
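
The consistency criterion can be sketched as a check that two partial records never disagree on a filled attribute, followed by a union of their values. The sketch covers only that third criterion; noun-group structure and nearness are not modeled, and the dictionary representation and function names are assumptions.

```python
from typing import Any, Dict, Optional

Record = Dict[str, Any]

def compatible(a: Record, b: Record) -> bool:
    """Two records are compatible if no shared, filled attribute differs."""
    return all(
        a[key] == b[key]
        for key in a.keys() & b.keys()
        if a[key] is not None and b[key] is not None
    )

def merge(a: Record, b: Record) -> Optional[Record]:
    """Return the combined record, or None if the two records conflict."""
    if not compatible(a, b):
        return None
    merged = dict(a)
    for key, value in b.items():
        if merged.get(key) is None:
            merged[key] = value
    return merged

# Partial structures from the two sentences of the example passage.
first = {"Name": "gamma-Glutamyl kinase", "Molecular-Weight": None}
second = {"Molecular-Weight": 236000, "Subunit-Number": 6}
merged = merge(first, second)
print(merged)

# A structure describing the 40,000-dalton subunit must not be merged in.
subunit = {"Molecular-Weight": 40000}
print(merge(second, subunit))
```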

14
(No Transcript)

15
Compile Time Transformations
  • Subject-Verb-Object pattern → linguistic patterns
    (passive, relative clauses, etc.)
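
The idea is that the pattern writer states one Subject-Verb-Object pattern and the system expands it into its surface variants before any text is processed. A minimal sketch, assuming string templates with placeholder slots; the verb "catalyzed", the slot names, and the four variants chosen here are illustrative only.

```python
from typing import Dict

def expand_svo(subject: str, verb: Dict[str, str], obj: str) -> Dict[str, str]:
    """Expand one S-V-O pattern into several surface patterns."""
    return {
        "active": f"{subject} {verb['past']} {obj}",
        "passive": f"{obj} was {verb['participle']} by {subject}",
        "relative": f"{subject} that {verb['past']} {obj}",
        "reduced_relative": f"{obj} {verb['participle']} by {subject}",
    }

# Hypothetical domain pattern: an enzyme catalyzes a reaction.
patterns = expand_svo(
    "<Enzyme>",
    {"past": "catalyzed", "participle": "catalyzed"},
    "<Reaction>",
)
for name, pattern in patterns.items():
    print(f"{name}: {pattern}")
```

Doing the expansion once, ahead of time, means the pattern writer maintains a single entry per event type instead of one per syntactic variant.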

16
Types of Specialized Domains
  • Noun-driven approach
  • The type of an entity is highly predictive of its
    role in an event
  • Loose S-V-O patterns
  • Verb-driven approach
  • The role of the entities in events cannot be
    predicted from their type
  • Tight S-V-O patterns

17
Limitation of IE Technology
  • MUC (1990)
  • Name recognition: 95% recall and precision
  • Event recognition: 60% recall and precision
  • Possible reasons
  • Process of merging
  • Only works with explicit information
  • Common cases are covered, but what about the rare
    cases?
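
For reference, the recall and precision figures quoted above are computed by comparing what the system extracted with a hand-built answer key. The sketch below uses the standard definitions; the item names are invented and the numbers are not MUC data.

```python
from typing import Set, Tuple

def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision: share of extracted items that are correct.
    Recall: share of answer-key items that were extracted."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Toy answer key with five events; the system finds three of them plus one error.
gold = {"e1", "e2", "e3", "e4", "e5"}
predicted = {"e1", "e2", "e3", "x1"}
precision, recall = precision_recall(predicted, gold)
print(precision, recall)
```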