Open Information Extraction From The Web - PowerPoint PPT Presentation

About This Presentation
Slides: 12
Provided by: Rani157

Transcript and Presenter's Notes



1
Open Information Extraction From The Web
  • Rani Qumsiyeh

2
What is Information Extraction?
  • This article surveys a range of Information
    Extraction (IE) methods, particularly Open IE.
  • IE is a venerable technology that maps natural
    language text into structured relational data.
  • In Open IE, the identities of the relations to
    be extracted are not known in advance, and the
    billions of documents found on the Web
    necessitate highly scalable processing.
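The following minimal Python sketch shows one way "structured relational data" might look in the open setting. It assumes that extractions arrive as (arg1, relation, arg2) triples. The names Triple and to_tables are hypothetical. Because the set of relations is not fixed in advance, the code creates a new table whenever it sees a new relation string.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One extracted assertion; the relation is free text, not a fixed schema."""
    arg1: str
    rel: str
    arg2: str


def to_tables(triples):
    """Group triples into one (arg1, arg2) table per relation string.

    Relation names are discovered from the data rather than pre-specified,
    which is what distinguishes the open setting from traditional IE.
    """
    tables = defaultdict(list)
    for t in triples:
        tables[t.rel].append((t.arg1, t.arg2))
    return dict(tables)


tables = to_tables([
    Triple("Edison", "invented", "the light bulb"),
    Triple("Einstein", "was born in", "Ulm"),
    Triple("Tesla", "invented", "the induction motor"),
])
print(sorted(tables))  # ['invented', 'was born in']
```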

3
Most Common Ways to Do IE
  • Direct knowledge-based encoding.
      • A human enters regular expressions or rules.
  • Supervised learning.
      • A human provides labeled training examples.
  • Self-supervised learning.
      • The system automatically finds and labels its
        own examples.
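Here is a minimal Python sketch of the second approach, supervised learning, applied to a single relation. It assumes a human has labeled the text between a company and a city as either expressing "headquartered in" (True) or not (False). The training phrases are invented for illustration. Multinomial naive Bayes is a swapped-in classifier and is not a method named in the slides.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label for multinomial naive Bayes; examples are (text, label) pairs."""
    counts = {True: Counter(), False: Counter()}
    n_docs = Counter()
    for text, label in examples:
        counts[label].update(text.split())
        n_docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab


def predict(model, text):
    """Return the label with the highest log-probability for `text`."""
    counts, n_docs, vocab = model
    total_docs = sum(n_docs.values())
    best_label, best_score = None, -math.inf
    for label in (True, False):
        n_words = sum(counts[label].values())
        score = math.log(n_docs[label] / total_docs)  # class prior
        for word in text.split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical human-labeled contexts between a company and a city.
TRAINING = [
    ("is headquartered in", True),
    ("is based in", True),
    ("has its headquarters in", True),
    ("opened a store in", False),
    ("sold products in", False),
    ("is suing a firm in", False),
]
model = train(TRAINING)
print(predict(model, "is based in"))        # True
print(predict(model, "opened a store in"))  # False
```

The human effort goes into the labeled TRAINING list. The slides contrast this with the self-supervised approach, where the system produces such labels itself.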

4
Direct Knowledge
  • Not efficient: the encoding has to be altered
    for each new domain, for example:
      • class PhysicalTarget added to the term bank in
        the terrorism domain;
      • class Corporation in the joint-ventures domain.
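The following minimal Python sketch illustrates direct knowledge-based encoding with a hypothetical hand-written rule for the joint-ventures domain. The pattern names (CORP, JOINT_VENTURE) and the corporate-suffix heuristic are illustrative assumptions. The rule knows nothing outside its domain, so a terrorism-domain sentence yields nothing until someone writes new rules.

```python
import re

# Hypothetical term-bank entry for the class Corporation: capitalized words
# followed by a corporate suffix.
CORP = r"[A-Z]\w*(?: [A-Z]\w*)* (?:Inc|Corp|Ltd)\."

# Hypothetical hand-written rule for the joint-ventures domain.
JOINT_VENTURE = re.compile(
    rf"(?P<a>{CORP}) and (?P<b>{CORP}) (?:formed|announced) a joint venture"
)


def extract_joint_venture(sentence):
    """Return the two partner corporations, or None if the rule does not fire."""
    m = JOINT_VENTURE.search(sentence)
    return (m.group("a"), m.group("b")) if m else None


print(extract_joint_venture("Acme Inc. and Globex Corp. formed a joint venture."))
# ('Acme Inc.', 'Globex Corp.')
print(extract_joint_venture("Guerrillas attacked the power station."))
# None
```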

5
Example of Supervised Learning
6
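Self-Supervised Learning
  • A system that labels its own training examples
    (e.g., KnowItAll).
  • For a given relation:
      • Use generic pattern → instantiate
        relation-specific extraction rules → learn
        domain-specific extraction rules → apply rules
        to Web pages and assign them probabilities.
  • Example: the generic pattern "X is a Y" is
    instantiated as "X is a country".
      • "China is a country." yields a correct
        extraction.
      • "Garth Brooks is a country singer" also
        matches, but wrongly, which is why each
        extraction is assigned a probability instead
        of being accepted outright.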
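This minimal Python sketch follows the self-supervised pipeline above under several stated assumptions. Two hypothetical generic templates ("X is a Y" and "Ys such as X") are instantiated for the class Country. Each candidate is then scored by the fraction of instantiated rules that fired for it. That score is only a crude stand-in for the probability assessment, which the slide does not detail, and the sketch omits the domain-specific rule-learning step.

```python
import re
from collections import defaultdict

# Hypothetical generic, domain-independent templates; {cls} and {plural}
# are filled in per class to produce class-specific extraction rules.
GENERIC_TEMPLATES = [
    r"(?P<x>[A-Z]\w*(?: [A-Z]\w*)*) is a {cls}\b",
    r"{plural} such as (?P<x>[A-Z]\w*(?: [A-Z]\w*)*)",
]


def instantiate(cls, plural):
    """Instantiate every generic template for one class."""
    return [re.compile(t.format(cls=re.escape(cls), plural=re.escape(plural)))
            for t in GENERIC_TEMPLATES]


def score(sentences, rules):
    """Toy confidence: fraction of distinct rules that fired for each candidate."""
    fired = defaultdict(set)  # candidate -> indices of the rules that matched it
    for sentence in sentences:
        for i, rule in enumerate(rules):
            for m in rule.finditer(sentence):
                fired[m.group("x")].add(i)
    return {x: len(ids) / len(rules) for x, ids in fired.items()}


rules = instantiate("country", "countries")
scores = score([
    "China is a country.",
    "Garth Brooks is a country singer.",
    "He visited countries such as China and Peru.",
], rules)
print(scores)  # {'China': 1.0, 'Garth Brooks': 0.5}
```

Both rules support China, while only the "X is a country" rule supports Garth Brooks. As a result, the false match receives the lower score.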

7
Open Information Extraction
  • The challenge of Web extraction is to be able to
    do Open Information Extraction, which must handle:
      • an unbounded number of relations;
      • a Web corpus of billions of documents.

8
How Open IE Systems Work
  • Learn a general model of how relations are
    expressed (in a particular language), based on
    unlexicalized features such as part-of-speech
    tags (e.g., identifying the verb).
  • Learn domain-independent regular expressions
    (e.g., over punctuation and commas).
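The minimal Python sketch below shows what "unlexicalized" means here: the extractor looks only at part-of-speech tags and never at the words themselves. A hand-written, domain-independent regular expression over Penn Treebank tags stands in for the learned model. This substitution is an assumption and not the slides' method. The input is also assumed to be already tagged.

```python
import re

# Map Penn Treebank tag prefixes to one-letter classes; words are ignored.
PREFIX_CLASSES = (("NN", "N"), ("VB", "V"), ("JJ", "J"))
EXACT_CLASSES = {"IN": "P", "TO": "P", "DT": "D"}

# arg1: noun phrase, rel: verb(s) plus optional preposition, arg2: noun phrase.
TRIPLE = re.compile(r"(D?J*N+)(V+P?)(D?J*N+)")


def tag_class(tag):
    """Collapse a POS tag into a coarse one-letter class."""
    for prefix, letter in PREFIX_CLASSES:
        if tag.startswith(prefix):
            return letter
    return EXACT_CLASSES.get(tag, "O")


def extract_triples(tagged):
    """Extract (arg1, rel, arg2) from a list of (word, POS tag) pairs."""
    classes = "".join(tag_class(tag) for _, tag in tagged)  # one char per token
    words = [word for word, _ in tagged]
    triples = []
    for m in TRIPLE.finditer(classes):
        # Character offsets equal token offsets because each token is one char.
        spans = (m.span(1), m.span(2), m.span(3))
        triples.append(tuple(" ".join(words[a:b]) for a, b in spans))
    return triples


print(extract_triples([("Einstein", "NNP"), ("was", "VBD"), ("born", "VBN"),
                       ("in", "IN"), ("Ulm", "NNP")]))
# [('Einstein', 'was born in', 'Ulm')]
```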

9
Is there a general model of relationships in English?
10
TextRunner
  • Works in two phases:
      • First, using a conditional random field (CRF),
        the extractor learns to assign labels to each
        of the words in a sentence.
      • Second, it extracts one or more textual triples
        that aim to capture (some of) the relationships
        in each sentence.
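Here is a minimal Python sketch of the second phase. It assumes the first phase (the CRF) has already produced one label per word. The BIO label scheme (B-ENT, I-ENT, B-REL, I-REL, O) is a hypothetical simplification, as is the rule that a relation span between two entity spans forms a triple. Neither reflects TextRunner's actual implementation.

```python
def chunk(words, labels):
    """Merge BIO-labelled words into (kind, text) spans; each 'O' word is its own span."""
    spans = []
    for word, label in zip(words, labels):
        kind = "O" if label == "O" else label[2:]  # "B-ENT" -> "ENT"
        if label.startswith("I-") and spans and spans[-1][0] == kind:
            spans[-1] = (kind, spans[-1][1] + " " + word)
        else:
            spans.append((kind, word))
    return spans


def triples_from_labels(words, labels):
    """Emit (arg1, rel, arg2) wherever a REL span sits directly between two ENT spans."""
    spans = chunk(words, labels)
    return [
        (spans[i - 1][1], spans[i][1], spans[i + 1][1])
        for i in range(1, len(spans) - 1)
        if spans[i][0] == "REL" and spans[i - 1][0] == "ENT" and spans[i + 1][0] == "ENT"
    ]


words = ["Einstein", "was", "born", "in", "Ulm", "and", "moved", "to", "Munich"]
labels = ["B-ENT", "B-REL", "I-REL", "I-REL", "B-ENT", "O", "B-REL", "I-REL", "B-ENT"]
print(triples_from_labels(words, labels))
# [('Einstein', 'was born in', 'Ulm')]
```

In this example, "moved to Munich" has no entity directly before it, so the sketch captures only some of the relationships in the sentence.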

11
Additional Tasks to Accomplish
  • Opinion mining, in which Open IE can extract
    opinion information about particular objects
    (including products, political candidates, and
    more) contained in blog posts, reviews, and
    other texts.
  • Fact checking, in which Open IE can identify
    assertions that directly or indirectly conflict
    with the body of knowledge extracted from the Web
    and from various other knowledge bases.
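This minimal Python sketch implements the fact-checking idea but handles direct conflicts only. It assumes a hypothetical set of functional relations, meaning relations that allow at most one arg2 per arg1. The knowledge base is represented as a list of triples. Detecting indirect conflicts would require inference beyond this sketch.

```python
# Hypothetical relations assumed to be functional (one arg2 per arg1).
FUNCTIONAL = {"was born in", "died in"}


def direct_conflicts(assertion, knowledge):
    """Return the stored triples that directly contradict `assertion`."""
    arg1, rel, arg2 = assertion
    if rel not in FUNCTIONAL:
        return []  # non-functional relations can hold for many arg2 values
    return [t for t in knowledge
            if t[0] == arg1 and t[1] == rel and t[2] != arg2]


knowledge = [("Einstein", "was born in", "Ulm"),
             ("Edison", "invented", "the phonograph")]
print(direct_conflicts(("Einstein", "was born in", "Munich"), knowledge))
# [('Einstein', 'was born in', 'Ulm')]
```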