Relational Learning via Propositional Algorithms: An Information Extraction Case Study

1
Relational Learning via Propositional Algorithms
An Information Extraction Case Study
  • Chen Xi
  • System Electronics Laboratory
  • Seoul National University, Korea
  • 9 Oct 2007

2
Contents
  • Introduction
  • Relational Learning
  • ILP method
  • Contribution of this paper
  • Propositional Relational Representations
  • Relation Generation Functions
  • Comparison to ILP Methods
  • Case Study: Information Extraction
  • Problem description
  • Extracting Relational Features
  • Two-stage Architecture
  • Experimental Results
  • Conclusion of the paper
  • Summary

3
Introduction 1 of 3
  • Relational Learning
  • Machine learning
  • Problem of learning structured concept
    definitions from structured examples
  • Describing problems
  • First order model
  • Focus
  • Objects
  • Object relations
  • Applications
  • Natural Language understanding
  • Visual interpretation and planning

4
Introduction 2 of 3
  • ILP (Inductive Logic Programming) Method
  • Formed at the intersection of Machine learning
    and Logic programming
  • Addresses Relational Learning
  • Generate relational descriptions
  • ILP systems develop predicate descriptions from
  • Examples
  • Background knowledge
  • Limitations (From the paper)
  • Inflexibility, brittleness and inefficiency of
    generic systems
  • Researchers have to develop their own problem
    specific ILP systems
  • Successful Applications
  • Structure-activity rules for drug design
  • Finite-element mesh analysis design rules
  • Prediction of protein secondary structure from
    its primary sequence
  • Fault diagnosis rules for satellites
  • Useful link
  • http://www.doc.ic.ac.uk/~shm/ilp.html

5
Introduction 3 of 3
  • Contribution of this paper
  • Develops a different paradigm for relational
    learning
  • Enables
  • General-purpose propositional algorithms
  • Efficient propositional algorithms
  • Suggests alternatives for
  • Limited ILP systems
  • Advantages
  • More flexible
  • Allows use of any propositional algorithm
  • Maintains advantages of ILP approaches
  • Allows more efficient learning
  • Improves expressivity and robustness

6
Propositional Relational Representations
  • Background
  • Two components of a knowledge representation
    language
  • A subset of first order logic (FOL)
  • A collection of structures (Graphs) defined over
    elements in the domain
  • Domain D
  • Relational Language R with respect to D
  • Restricted (Function free) first order language
  • Restrictions are applied by limiting the formulae
    allowed in the language to a collection of
    formulae that can be evaluated very efficiently
    on given instances.
  • Achieved by
  • Defining primitive formulae with limited scope of
    the quantifiers
  • General formulae are defined inductively in terms
    of primitive formulae in a restricted way that
    depends on the relational structures in the
    domain.

7
Propositional Relational Representations
  • Proposition
  • Variable-less atomic formula
  • Quantified Proposition
  • A quantified atomic formula

8
Propositional Relational Representations
  • Domain D = (V, G)
  • V consists of
  • words: TIME, :, 3, :, 30, pm / phrase: 3:30 pm
  • G consists of
  • Two lists (solid line and dashed line)

9
Propositional Relational Representations
  • Types of elements
  • To classify elements in the language according to
    their properties
  • Elements in set V are of two types
  • Objects
  • Attributes
  • Type-1 predicates
  • First argument: an element in O
  • Second argument: an element in A
  • P(o, a) ≡ (P(o) = a), i.e. P(o, a) holds iff
    attribute P of object o has value a
  • P_g(o1, o2) indicates that o1 and o2 are nodes in
    the graph g and there is an edge between them
    (a sketch of type-1 predicates follows)
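A minimal Python sketch of a type-1 predicate (hypothetical, not from the paper; the dict-based token encoding is an assumption):

    # Hypothetical sketch: a type-1 predicate P(o, a) holds iff
    # attribute P of object o has value a, i.e. P(o, a) = (P(o) = a).
    def make_type1_predicate(attr):
        def predicate(obj, value):
            # obj is assumed to be a dict of attribute name -> value
            return obj.get(attr) == value
        return predicate

    word = make_type1_predicate("word")
    token = {"word": "TIME", "tag": "NN"}
    print(word(token, "TIME"))  # True
    print(word(token, "pm"))    # False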

10
Propositional Relational Representations
  • Given an instance x, a formula F in R is given a
    unique truth value (the value of F on x), defined
    inductively from the truth values of the
    predicates in F and the semantics of the
    connectives; a sketch of this evaluation follows.
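A hypothetical sketch of this inductive evaluation (the tuple encoding of formulae is an assumption made here for illustration):

    # Evaluate a formula on an instance x from the truth values of its
    # predicates and the semantics of its connectives.
    def evaluate(formula, x):
        op = formula[0]
        if op == "pred":                 # leaf: ("pred", test_function)
            return formula[1](x)
        if op == "and":
            return all(evaluate(f, x) for f in formula[1:])
        if op == "or":
            return any(evaluate(f, x) for f in formula[1:])
        if op == "not":
            return not evaluate(formula[1], x)
        raise ValueError(f"unknown connective: {op}")

    # e.g. word(TIME) AND number(30), over a token-list instance
    f = ("and",
         ("pred", lambda x: "TIME" in x),
         ("pred", lambda x: "30" in x))
    print(evaluate(f, ["TIME", ":", "3", ":", "30", "pm"]))  # True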

11
Propositional Relational Representations
12
Relation Generation Functions
  • Active relations
  • word(TIME), word(pm), number(30)
  • Relations (formulae)
  • Identifying the active ones is important
  • Inactive relations vastly outnumber active
    relations (see the sketch below)
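A minimal sketch of enumerating only active relations (hypothetical Python; the token-list instance is an assumption):

    # Sensors yield only the relations that are ACTIVE on the instance;
    # the vastly larger set of inactive relations is never materialized.
    def word_sensor(tokens):
        for t in tokens:
            yield f"word({t})"

    def number_sensor(tokens):
        for t in tokens:
            if t.isdigit():
                yield f"number({t})"

    tokens = ["TIME", ":", "3", ":", "30", "pm"]
    active = set(word_sensor(tokens)) | set(number_sensor(tokens))
    print(active)
    # e.g. {'word(TIME)', 'word(:)', 'word(3)', 'word(30)', 'word(pm)',
    #       'number(3)', 'number(30)'} (set order may vary)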

13
Relation Generation Functions
  • Example
  • An RGF (relation generation function) generates
    the active relations number(3), number(30)
  • RGFs are defined inductively using a relational
    calculus
  • Basic RGFs, called sensors
  • A sensor is a way to encode basic information one
    can extract from an instance.
  • A set of connectives

14
Relation Generation Functions
  • Example following are some sensors that are
    commonly used in NLP
  • The word sensor over word elements, which outputs
    the active relations word(TIME), word(:), word(3),
    word(30), and word(pm) from 'TIME: 3:30 pm'
  • The length sensor over phrase elements, which
    outputs the active relation len(4) from '3:30 pm'
  • The is-a sensor outputs the semantic class of a
    word
  • The tag sensor outputs the part-of-speech tag of
    a word (sketches of the length and tag sensors
    follow)
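Hypothetical sketches of the length and tag sensors (the toy tag table stands in for a real part-of-speech tagger):

    # The length sensor fires one relation per phrase: len(token count).
    def length_sensor(phrase_tokens):
        yield f"len({len(phrase_tokens)})"

    # The tag sensor outputs the part-of-speech tag of each word; the
    # lookup table below is a toy stand-in for a learned tagger.
    TOY_TAGS = {"TIME": "NN", "pm": "NN", "3": "CD", "30": "CD"}
    def tag_sensor(tokens):
        for t in tokens:
            yield f"tag({TOY_TAGS.get(t, 'UNK')})"

    print(list(length_sensor(["3", ":", "30", "pm"])))  # ['len(4)']
    print(list(tag_sensor(["3", "30", "pm"])))
    # ['tag(CD)', 'tag(CD)', 'tag(NN)']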

15
Relation Generation Functions
  • The relational calculus allows one to inductively
    generate new RGFs by applying connectives and
    quantifiers over existing RGFs.

16
Relation Generation Functions
  • Example
  • When applied with respect to the graph g, which
    represents the linear structure of the sentence,
    colloc_g simply generates formulae that
    correspond to n-grams (see the sketch below)
  • For 'Dr John Smith', colloc(word, word) extracts
    the bigrams word(Dr)-word(John) and
    word(John)-word(Smith)
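A hypothetical sketch of colloc over the linear sentence graph (bigram case only; the list-based word sensor is redefined here so the snippet is self-contained):

    # colloc conjoins the outputs of two RGFs on nodes joined by an edge
    # of graph g; for the linear sentence graph this yields bigrams.
    def word_sensor(tokens):
        return [f"word({t})" for t in tokens]

    def colloc(rgf_a, rgf_b, tokens):
        feats = []
        for left, right in zip(tokens, tokens[1:]):  # edges of the chain
            for fa in rgf_a([left]):
                for fb in rgf_b([right]):
                    feats.append(f"{fa}-{fb}")
        return feats

    print(colloc(word_sensor, word_sensor, ["Dr", "John", "Smith"]))
    # ['word(Dr)-word(John)', 'word(John)-word(Smith)']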

17
Comparison to ILP Methods
  • Search
  • Features are generated as part of the search
    procedure (ILP)
  • Features tried by an ILP program during its
    search are generated up front (in a data-driven
    way) by the RGFs.
  • Knowledge
  • Ability to incorporate background knowledge.
    (ILP)
  • Using sensors allows treating information that is
    readily available in the input, external
    information or even previously learned concepts
    in a uniform way
  • Expressivity (I)
  • Same as ILP
  • Learning
  • Provides a uniform domain for different learning
    algorithms(RGF)
  • Applying different algorithms is easy and
    straightforward
  • Expressivity (II)
  • The constructs colloc and scolloc allow
    generating relational features which are
    conjunctions of predicates and are thus similar
    to a clause in the output representation of an
    ILP program

18
Case Study: Information Extraction
  • Problem Description
  • The Information Extraction task is defined as
    locating specific fragments of an article
    according to predefined slots in a template
  • Data used in experiments is a set of 485 seminar
    announcements from CMU
  • Extract four types of fragments from each article
  • Starting time of the seminar (stime)
  • End time of the seminar (etime)
  • Its location (location)
  • Seminar's speaker (speaker)
  • Given an article, our system picks at most one
    fragment per slot. If this fragment
    represents the slot
  • It is a correct prediction
  • Otherwise
  • It is a wrong prediction

19
Case Study: Information Extraction
  • Extracting Relational Features
  • Classifier
  • Discriminates a specific desired fragment
  • It could be achieved by
  • Identifying candidate fragments in the document
  • For each candidate fragment, use the defined RGFs
    to re-represent it as an example which consists
    of all active features extracted from this
    fragment
  • Let f = (t_i, t_{i+1}, ..., t_j) be a fragment,
    where the t_k are tokens and i, i+1, ..., j are
    the positions of the tokens in the document.
  • RGFs are defined to extract features from three
    regions (see the sketch after this list)
  • Left window
  • Target fragment
  • Right window
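A hypothetical sketch of the three-region extraction (the window width of two tokens is an assumption, not from the paper):

    # Re-represent fragment f = (t_i, ..., t_j) by applying the word
    # sensor separately to the left window, target, and right window.
    def window_features(tokens, i, j, win=2):
        regions = [
            ("l_window", tokens[max(0, i - win):i]),
            ("targ",     tokens[i:j + 1]),
            ("r_window", tokens[j + 1:j + 1 + win]),
        ]
        return [f"{name}:word({t})" for name, toks in regions for t in toks]

    tokens = ["Time", ":", "3", ":", "30", "pm", "in", "Room", "101"]
    print(window_features(tokens, 2, 5))  # fragment '3 : 30 pm'
    # ['l_window:word(Time)', 'l_window:word(:)', 'targ:word(3)', ...]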

20
Case Study: Information Extraction
  • Two-stage Architecture
  • Filtering
  • Reduce the set of all possible fragments to a
    small number of candidates
  • Classifying
  • Pick the right fragment from the preserved
    candidates using the learned classifier
    (see the sketch below)
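A hypothetical end-to-end sketch of the two-stage flow (the function names and threshold are assumptions for illustration):

    # Stage 1 (filtering) prunes all possible fragments down to a few
    # candidates; stage 2 (classifying) picks at most one survivor.
    def two_stage(fragments, keep, score, threshold=0.0):
        survivors = [f for f in fragments if keep(f)]   # filtering
        if not survivors:
            return None
        best = max(survivors, key=score)                # classifying
        return best if score(best) >= threshold else None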

21
Two-stage Architecture
  • Filtering
  • Positive examples
  • Fragments that represent legitimate slots
  • Negative examples
  • Irrelevant fragments
  • Two classifiers are learned, and a fragment is
    filtered out if it meets either of the following
    criteria (see the sketch below)
  • Single-feature classifier: the fragment doesn't
    contain a feature that should be active in
    positive examples
  • General classifier: the fragment's confidence
    value is below the threshold
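A hedged sketch of the two filtering criteria (hypothetical Python; how the set of required features is chosen is left open here):

    # A fragment survives only if it (1) contains at least one feature
    # that should be active in positive examples (single-feature
    # classifier) and (2) reaches a confidence threshold (general
    # classifier).
    def passes_filter(features, required, confidence, threshold):
        if required.isdisjoint(features):
            return False            # criterion 1: required feature missing
        return confidence >= threshold  # criterion 2: confidence test

    print(passes_filter({"word(pm)", "tag(CD)"}, {"word(pm)", "word(am)"},
                        confidence=0.8, threshold=0.5))  # True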

22
Two-stage Architecture
  • Classifying
  • Use the fragments that survived the first stage
  • Uses the SNoW classifier
  • A multi-class classifier that is specifically
    tailored for large-scale learning tasks
    (Roth, 1998; Carlson et al., 1999)
  • First step
  • An additional collection of RGFs is applied to
    enhance the representation of the candidate
    fragments
  • In training
  • Remaining fragments are annotated and are used as
    positive or negative examples
  • In testing
  • Remaining fragments are evaluated on the learned
    classifiers to determine if they can fill one of
    the slots

23
Two-stage Architecture
  • Classifying -- RGFs added
  • Etime and Stime
  • scolloc([word & loc(-1)] l_window, [word] r_window)
  • A sparse structural conjunction of the word
    directly left of the target region, and of words
    and tags in the right window
  • scolloc([word & loc(-1)] l_window, [tag] r_window)
  • A sparse structural conjunction of the last two
    words in the left window, a tag in the target,
    and the first tag in the right window
  • Location and speaker
  • scolloc([word & loc(-2)] l_window, [tag] targ,
    [tag & loc(1)] r_window)
  • Result
  • A decision is made by each of the 4 classifiers.
    A fragment is classified as type t if the t-th
    classifier decides so. At most one fragment of
    type t is chosen in each article, based on the
    activation value of the corresponding classifier
    (see the sketch below).
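A hypothetical sketch of this decision rule (the dictionary encoding of activations is an assumption):

    # For each slot type t, keep at most one fragment per article: the
    # one with the highest activation from the t-th classifier.
    def pick_slot_fillers(activations):
        # activations maps (fragment, slot_type) -> activation value
        best = {}
        for (frag, slot), act in activations.items():
            if slot not in best or act > best[slot][1]:
                best[slot] = (frag, act)
        return {slot: frag for slot, (frag, _) in best.items()}

    acts = {("3:30 pm", "stime"): 0.9, ("4:30 pm", "stime"): 0.4,
            ("Room 101", "location"): 0.7}
    print(pick_slot_fillers(acts))
    # {'stime': '3:30 pm', 'location': 'Room 101'}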

24
Experimental Results
  • Two propositional algorithms were tested
  • SNoW-IE -- good performance
  • NB-IE (Naïve Bayes) -- not as good as SNoW-IE
  • Relationship between architecture and result
  • Not demonstrated (no proof given)

25
Conclusion of the paper
  • Contribution
  • The paradigm has different tradeoffs than the ILP
    approaches.
  • More flexible
  • Any propositional algorithm can be used within
    it, including probabilistic approaches
  • The paradigm is exemplified on Information
    Extraction.
  • A new approach to learning for IE.