Machine Learning for Information Extraction

1
Machine Learning for Information Extraction
  • Li Xu

2
Objective
  • Learn how to apply machine learning concepts to
    an application
  • Learn how to improve the performance of an
    existing application by applying machine
    learning algorithms

3
Introduction
  • Information Extraction (IE) is concerned with
    extracting the relevant data from a collection of
    documents.
  • Key component: extraction patterns.
  • Machine learning algorithms.
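To make the term "extraction pattern" concrete, here is a minimal sketch. It assumes a hand-written regular expression as the pattern and an invented rental-ad sentence; neither appears in the slides, and the learning systems discussed in this talk induce such patterns from examples rather than hard-coding them.

```python
import re

# Hypothetical pattern: a bedroom count followed later by a dollar price.
# The pattern, the slot names, and the sample ad are illustrative assumptions.
RENTAL_PATTERN = re.compile(
    r"(?P<bedrooms>\d+)\s*(?:br|bedroom)s?\b.*?\$\s*(?P<price>\d+)",
    re.IGNORECASE | re.DOTALL,
)


def extract_rental(text):
    """Return a dict of slot fillers, or None if the pattern does not match."""
    match = RENTAL_PATTERN.search(text)
    if match is None:
        return None
    return {
        "bedrooms": int(match.group("bedrooms")),
        "price": int(match.group("price")),
    }


ad = "Sunny 2 bedroom apartment near campus, $850 per month."
print(extract_rental(ad))
```

The pattern fills two slots at once, which is the kind of output a multi-slot rule would produce.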

4
IE for Free Text
  • Syntactic and semantic constraints
  • AutoSlog
  • LIEP
  • PALKA
  • CRYSTAL
  • CRYSTAL Webfoot
  • HASTEN

5
IE from Online Documents
  • WHISK (Soderland 1998): domain rental ads;
    precision 95%, recall 73-90%.
  • RAPIER (Califf and Mooney 1997): domain software
    jobs; precision 84%, recall 53%.
  • SRV (Freitag 1998): domain seminar announcements;
    precision for speaker 75%, location 75%, start
    time 99%, end time 96%.
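For readers unfamiliar with the two measures reported above, the following sketch computes precision and recall under the usual definitions: precision is the fraction of extracted fillers that are correct, and recall is the fraction of correct fillers that were extracted. The function name and the sample data are assumptions for illustration, not results from any of the cited systems.

```python
def precision_recall(extracted, gold):
    """Compute (precision, recall) for a set of extracted slot fillers.

    extracted: fillers the system produced
    gold:      fillers a human annotator marked as correct
    """
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Invented example: 4 fillers extracted, 3 of them correct, 6 in the gold set.
system_output = {"a", "b", "c", "x"}
gold_standard = {"a", "b", "c", "d", "e", "f"}
p, r = precision_recall(system_output, gold_standard)
print(f"precision={p:.2f} recall={r:.2f}")
```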

6
6
WHISK

7
RAPIER

8
SRV

9
Problems
  • Bottom-up search: RAPIER, WHISK
  • Single-slot extraction rules: SRV, RAPIER
  • Heavy dependence on layout patterns
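The single-slot limitation listed above can be illustrated with a small sketch. It assumes regular expressions stand in for learned rules and uses an invented two-listing ad; the point is only that rules which fill one slot at a time return fillers with no record of which ones belong together.

```python
import re

# Invented text containing two separate listings.
text = "Studio district: 1 bedroom for $500. Lakeview: 3 bedroom for $900."

# Single-slot rules (hypothetical): each fills exactly one slot.
BEDROOMS = re.compile(r"(\d+) bedroom")
PRICE = re.compile(r"\$(\d+)")

# Multi-slot rule (hypothetical): fills both slots in one match.
BOTH = re.compile(r"(\d+) bedroom.*?\$(\d+)")

# Two independent lists; nothing links a bedroom count to its price.
single_bedrooms = [int(n) for n in BEDROOMS.findall(text)]
single_prices = [int(p) for p in PRICE.findall(text)]

# Each tuple keeps the fillers of one listing together.
multi = [(int(b), int(p)) for b, p in BOTH.findall(text)]

print(single_bedrooms, single_prices)
print(multi)
```

With single-slot output, associating each price with the right listing requires a separate step; the multi-slot rule preserves that association directly.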

10
Obituary Ontology

11
Improvement

12
Lexical Object
  • Relational Learning
  • FOIL
  • Feature design
  • Regular expression
  • Rote Learning
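The slides do not define rote learning, so the sketch below assumes one common reading: memorize every phrase seen in training and estimate how often it was a slot filler. The class name, method names, and training data are all hypothetical.

```python
from collections import Counter


class RoteLearner:
    """Memorize phrases and how often each one was a slot filler."""

    def __init__(self):
        self.filler_counts = Counter()  # times the phrase was a filler
        self.phrase_counts = Counter()  # times the phrase was seen at all

    def train(self, labeled_phrases):
        """labeled_phrases: iterable of (phrase, is_filler) pairs."""
        for phrase, is_filler in labeled_phrases:
            key = phrase.lower()
            self.phrase_counts[key] += 1
            if is_filler:
                self.filler_counts[key] += 1

    def confidence(self, phrase):
        """Fraction of training occurrences in which the phrase was a filler."""
        key = phrase.lower()
        seen = self.phrase_counts[key]
        return self.filler_counts[key] / seen if seen else 0.0


# Invented training data for a "location" slot.
learner = RoteLearner()
learner.train([("Provo", True), ("Provo", True), ("Provo", False), ("Monday", False)])
print(learner.confidence("provo"))
print(learner.confidence("Springfield"))
```

A learner of this kind cannot generalize to unseen phrases, which is why it is usually paired with pattern-based or relational learners.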

13
Multi-slot Hierarchy

14
Multi-slot Boundary
  • Relational Learning
  • Feature Design
  • Individual heuristics
  • Combining heuristics

15
Conclusion
  • How can machine learning algorithms be applied to
    IE?
  • What are the problems with each system?
  • How can an existing IE approach be improved
    through machine learning, and how can the
    problems that appeared in other machine-learning-
    based IE systems be avoided?