Title: Relational Learning via Propositional Algorithms: An Information Extraction Case Study
1Relational Learning via Propositional Algorithms
An Information Extraction Case Study
- Chen Xi
- System Electronics Laboratory
- Seoul National University, Korea
- 9 Oct 2007
2Contents
- Introduction
- Relational Learning
- ILP method
- Contribution of this paper
- Propositional Relational Representations
- Relation Generation Functions
- Comparison to ILP Methods
- Case Study Information Extraction
- Problem description
- Extracting Relational Features
- Two-stage Architecture
- Experimental Results
- Conclusion of the paper
- Summary
3Introduction 1 of 3
- Relational Learning
- A machine learning problem
- The problem of learning structured concept definitions from structured examples
- Describing problems
- First-order model
- Focus
- Objects
- Relations between objects
- Applications
- Natural language understanding
- Visual interpretation and planning
4Introduction 2 of 3
- ILP (Inductive Logic Programming) Method
- Formed at the intersection of machine learning and logic programming
- Addresses relational learning
- Generates relational descriptions
- ILP systems develop predicate descriptions from
- Examples
- Background knowledge
- Limitations (from the paper)
- Inflexibility, brittleness and inefficiency of generic systems
- Researchers have to develop their own problem-specific ILP systems
- Successful applications
- Structure-activity rules for drug design
- Finite-element mesh analysis design rules
- Prediction of protein secondary structure from primary structure
- Fault diagnosis rules for satellites
- Useful link
- http://www.doc.ic.ac.uk/~shm/ilp.html
5Introduction 3 of 3
- Contribution of this paper
- Develops a different paradigm for relational learning
- Enables the use of
- General-purpose propositional algorithms
- Efficient propositional algorithms
- Suggests alternatives to
- Limited ILP systems
- Advantages
- More flexible
- Allows use of any propositional algorithm
- Maintains advantages of ILP approaches
- Allows more efficient learning
- Improves expressivity and robustness
6Propositional Relational Representations
- Background
- Two components of a knowledge representation language
- A subset of first-order logic (FOL)
- A collection of structures (graphs) defined over elements in the domain
- Domain D
- Relational language R with respect to D
- A restricted (function-free) first-order language
- Restrictions are applied by limiting the formulae allowed in the language to a collection of formulae that can be evaluated very efficiently on given instances
- Achieved by
- Defining primitive formulae with a limited scope of the quantifiers
- General formulae are defined inductively in terms of primitive formulae, in a restricted way that depends on the relational structures in the domain
7Propositional Relational Representations
- Proposition
- Variable-less atomic formula
- Quantified Proposition
- A quantified atomic formula
8Propositional Relational Representations
- Domain D = (V, G)
- V consists of
- The words TIME, :, 3, 30, pm of the text "TIME: 3:30 pm" and the phrase "3:30 pm"
- G consists of
- Two lists (the solid line and the dashed line in the figure), sketched in code below
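A minimal sketch, assuming a plain Python encoding, of the instance above: V as word elements plus one phrase element, and G as two edge lists (the solid word-adjacency chain and the dashed phrase-to-word links). The tokenization and all names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): the instance "TIME: 3:30 pm"
# represented as a domain D = (V, G).

# V: word elements (indexed by position) plus one phrase element.
words = ["TIME", ":", "3", ":", "30", "pm"]       # assumed tokenization
phrases = {"p1": (2, 5)}                          # phrase "3:30 pm" spans word positions 2..5

# G: two structures defined over V.
word_chain = [(i, i + 1) for i in range(len(words) - 1)]   # "solid line": linear word order
phrase_links = [("p1", i) for i in range(2, 6)]             # "dashed line": phrase-to-word links

print(word_chain)    # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(phrase_links)  # [('p1', 2), ('p1', 3), ('p1', 4), ('p1', 5)]
```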
9Propositional Relational Representations
- Types of elements
- Elements in the language are classified according to their properties
- The set V contains two types of elements
- Objects (O)
- Attributes (A)
- Type-1 predicates
- First argument: an element of O
- Second argument: an element of A
- p(o, a) can equivalently be written as p_a(o)
- p_g(o1, o2) indicates that o1 and o2 are nodes in the graph g and there is an edge between them
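A small sketch of how the two kinds of predicates might look in code, under the assumption (from the definitions above) that a type-1 predicate relates an object to an attribute value, while a graph predicate holds when two objects are joined by an edge in the structure g. The function names are hypothetical.

```python
# Illustrative sketch: type-1 predicates relate an object (a word position) to an
# attribute value; graph predicates test for an edge in a structure g.
words = ["TIME", ":", "3", ":", "30", "pm"]
word_chain = {(i, i + 1) for i in range(len(words) - 1)}

def word(o, a):
    """Type-1 predicate word(o, a): object o carries the attribute value a."""
    return words[o] == a

def edge_g(o1, o2, g=word_chain):
    """Graph predicate p_g(o1, o2): o1 and o2 are nodes joined by an edge in g."""
    return (o1, o2) in g

print(word(0, "TIME"))   # True  -- word(0, TIME), i.e. word_TIME(0)
print(edge_g(0, 1))      # True  -- positions 0 and 1 are adjacent
print(edge_g(0, 2))      # False
```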
10Propositional Relational Representations
- Given an instance x, a formula F in R is given a
unique truth value, the value of F on x, defined
inductively using the truth values of the
predicates in F and the semantics of the
connectives.
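A small sketch of this evaluation, under the common assumption that an instance is represented by the set of relations active on it: a primitive formula is true iff its relation is active, and connectives combine the truth values of sub-formulae in the usual way. Names and relations are illustrative.

```python
# Illustrative sketch: evaluating simple formulae on an instance x, where x is
# represented by the set of relations that are active on it.
active = {"word(TIME)", "word(:)", "word(3)", "word(30)", "word(pm)"}

def holds(relation, x=active):
    """Truth value of a primitive formula: true iff the relation is active on x."""
    return relation in x

# Connectives combine the truth values of sub-formulae in the usual way.
f = holds("word(3)") and holds("word(pm)")        # conjunction: True
g = holds("number(30)") or holds("word(30)")      # disjunction: True
print(f, g)
```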
11Propositional Relational Representations
12Relation Generation Functions
- Active relations
- word(TIME), word(pm), number(30)
- Relations (formulae)
- Restricting attention to active relations is important
- Inactive relations vastly outnumber active relations
13Relation Generation Functions
- Example
- An RGF (relation generation function) generates the active relations number(3), number(30)
- RGFs are defined inductively using a relational calculus
- Basic RGFs, called sensors
- A sensor is a way to encode basic information one can extract from an instance
- A set of connectives
14Relation Generation Functions
- Example: the following are some sensors that are commonly used in NLP
- The word sensor over word elements, which outputs the active relations word(TIME), word(:), word(3), word(30), and word(pm) from "TIME: 3:30 pm"
- The length sensor over phrase elements, which outputs the active relation len(4) from "3:30 pm"
- The is-a sensor outputs the semantic class of a word
- The tag sensor outputs the part-of-speech tag of a word
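A sketch of the word and length sensors on the running example, assuming the tokenization shown earlier (the colon counts as a token, so the phrase "3:30 pm" has four word elements and the word sensor emits five distinct relations). Function names are illustrative, not the paper's API.

```python
# Illustrative sketch of two basic RGFs (sensors); not the paper's implementation.
tokens = ["TIME", ":", "3", ":", "30", "pm"]   # assumed tokenization of "TIME: 3:30 pm"
phrase = tokens[2:]                            # the phrase "3:30 pm" (four word elements)

def word_sensor(tokens):
    """Outputs the active relations word(w) for every word element in the instance."""
    return {f"word({w})" for w in tokens}

def length_sensor(phrase):
    """Outputs the active relation len(n) for a phrase element with n word elements."""
    return {f"len({len(phrase)})"}

print(sorted(word_sensor(tokens)))   # ['word(3)', 'word(30)', 'word(:)', 'word(TIME)', 'word(pm)']
print(length_sensor(phrase))         # {'len(4)'}
```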
15Relation Generation Functions
- The relational calculus allows one to inductively generate new RGFs by applying connectives and quantifiers over existing RGFs
16Relation Generation Functions
- Example
- When applied with respect to the graph g that represents the linear structure of the sentence, colloc_g simply generates formulae that correspond to n-grams
- For "Dr John Smith", colloc(word, word) extracts the bigrams word(Dr)-word(John) and word(John)-word(Smith)
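A sketch of colloc over the linear (word-adjacency) structure, assuming the simple reading above: applying colloc(word, word) along the word chain produces bigram features. The function name is illustrative.

```python
# Illustrative sketch: colloc(word, word) over the linear (word-adjacency) graph
# generates bigram features, as in the "Dr John Smith" example above.
def colloc_word_word(tokens):
    """For each edge (i, i+1) in the linear structure, emit word(t_i)-word(t_{i+1})."""
    return [f"word({a})-word({b})" for a, b in zip(tokens, tokens[1:])]

print(colloc_word_word(["Dr", "John", "Smith"]))
# ['word(Dr)-word(John)', 'word(John)-word(Smith)']
```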
17Comparison to ILP Methods
- Search
- ILP: features are generated as part of the search procedure
- RGFs: features that an ILP program would try during its search are generated up front (in a data-driven way)
- Knowledge
- ILP: ability to incorporate background knowledge
- RGFs: using sensors allows treating information that is readily available in the input, external information, or even previously learned concepts in a uniform way
- Expressivity (I)
- Same as ILP
- Learning
- RGFs: provide a uniform domain for different learning algorithms
- Applying different algorithms is easy and straightforward
- Expressivity (II)
- The constructs colloc and scolloc allow generating relational features which are conjunctions of predicates and are thus similar to a clause in the output representation of an ILP program
18Case Study Information Extraction
- Problem Description
- The Information Extraction task is defined as locating specific fragments of an article according to predefined slots in a template
- The data used in the experiments is a set of 485 seminar announcements from CMU
- Four types of fragments are extracted from each article
- Starting time of the seminar (stime)
- End time of the seminar (etime)
- Its location (location)
- The seminar's speaker (speaker)
- Given an article, our system picks at most one fragment per slot
- If this fragment represents the slot, it is a correct prediction
- Otherwise, it is a wrong prediction
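A minimal sketch of the prediction rule above: for each slot the system proposes at most one fragment, counted as correct only if it represents that slot. The gold annotations and slot values below are toy data, not results from the paper.

```python
# Illustrative sketch of the prediction rule: at most one fragment per slot,
# correct only if it matches the slot's true filler.
def score(predictions, gold):
    """predictions/gold: {slot: fragment or None}; returns counts of correct and wrong picks."""
    correct = sum(1 for s, p in predictions.items() if p is not None and p == gold.get(s))
    wrong = sum(1 for s, p in predictions.items() if p is not None and p != gold.get(s))
    return correct, wrong

gold = {"stime": "3:30 pm", "etime": "4:30 pm", "location": "WeH 4601", "speaker": "John Smith"}
predictions = {"stime": "3:30 pm", "etime": None, "location": "WeH 5409", "speaker": "John Smith"}
print(score(predictions, gold))   # (2, 1): stime and speaker correct, location wrong
```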
19Case Study Information Extraction
- Extracting Relational Features
- Classifier
- Discriminates a specific desired fragment
- It could be achieved by
- Identifying candidate fragments in the document
- For each candidate fragment, use the defined RGFs to re-represent it as an example which consists of all active features extracted from this fragment
- Let f = (t_i, t_{i+1}, ..., t_j) be a fragment, with t_i representing tokens and i, i+1, ..., j the positions of the tokens in the document
- RGFs are defined to extract features from three regions (sketched in code below)
- Left window
- Target fragment
- Right window
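A sketch of the re-representation step, assuming fixed-size left and right windows around a candidate fragment; a simple word sensor stands in for the full set of RGFs so the example stays self-contained. All names, window sizes, and the sample sentence are illustrative assumptions, not the paper's settings.

```python
# Illustrative sketch: turn a candidate fragment f = (t_i, ..., t_j) into an example
# made of active features from the left window, the target fragment, and the right window.
def fragment_example(tokens, i, j, window=2):
    left = tokens[max(0, i - window):i]
    target = tokens[i:j + 1]
    right = tokens[j + 1:j + 1 + window]
    features = set()
    # A word sensor stands in for the richer RGFs used in the paper.
    features |= {f"l_word({w})" for w in left}
    features |= {f"t_word({w})" for w in target}
    features |= {f"r_word({w})" for w in right}
    return features

tokens = "the seminar starts at 3:30 pm in WeH 4601".split()
print(sorted(fragment_example(tokens, 4, 5)))   # candidate fragment "3:30 pm"
```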
20Case Study Information Extraction
- Two-stage Architecture
- Filtering
- Reduce the number of candidates from all possible fragments to a small number
- Classifying
- Pick the right fragment from the preserved fragments using the learned classifier
21Two stage architecture
- Filtering
- Positive examples
- Fragments that represent legitimate slots
- Negative examples
- Irrelevant fragments
- Two classifiers are learned, and a fragment is filtered out if it meets one of the following criteria (sketched in code below)
- Single-feature classifier: the fragment doesn't contain a feature that should be active in positive examples
- General classifier: the fragment's confidence value is below the threshold
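A sketch of the two filtering criteria, assuming a single-feature filter built from features seen in positive training examples and a confidence threshold for the general classifier. The scoring function here is a trivial stand-in, not SNoW, and all names and values are hypothetical.

```python
# Illustrative sketch of the filtering stage: a candidate fragment is discarded if
# (a) it contains none of the features that were active in positive training examples,
# or (b) a general classifier scores it below a confidence threshold.
def passes_filter(features, positive_feature_set, score_fn, threshold):
    if not (features & positive_feature_set):      # single-feature criterion
        return False
    if score_fn(features) < threshold:             # general-classifier criterion
        return False
    return True

# Toy usage with hypothetical features and a trivial stand-in scorer.
positive_feature_set = {"t_word(pm)", "t_word(am)", "r_word(in)"}
score_fn = lambda feats: len(feats & positive_feature_set) / max(len(feats), 1)
print(passes_filter({"t_word(pm)", "l_word(at)"}, positive_feature_set, score_fn, 0.3))  # True
print(passes_filter({"t_word(WeH)"}, positive_feature_set, score_fn, 0.3))               # False
```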
22Two stage architecture
- Classifying
- Use the fragments that survived the first stage
- Uses the SNoW classifier
- A multi-class classifier that is specifically tailored for large-scale learning tasks [Roth, 1998; Carlson et al., 1999]
- First step
- An additional collection of RGFs is applied to enhance the representation of the candidate fragments
- In training
- Remaining fragments are annotated and used as positive or negative examples
- In testing
- Remaining fragments are evaluated on the learned classifiers to determine if they can fill one of the slots
23Two stage architecture
- Classifying -- RGFs added
- Etime and Stime
- scolloc{word & loc(-1) | l_window, word | r_window}
- A sparse structural conjunction of the word directly left of the target region, and of words and tags in the right window
- scolloc{word & loc(-1) | l_window, tag | r_window}
- A sparse structural conjunction of the last two words in the left window, a tag in the target, and the first tag in the right window
- Location and speaker
- scolloc{word & loc(-2) | l_window, tag | targ, tag & loc(1) | r_window}
- A decision is made by each of the 4 classifiers. A fragment is classified as type t if the t-th classifier decides so. At most one fragment of type t is chosen in each article, based on the activation value of the corresponding classifier (sketched in code below).
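A sketch of this decision rule: each classifier scores the surviving fragments for its slot type, and at most one fragment per type is selected, the one with the highest activation among those the classifier accepts. The activation values and threshold below are placeholders, not SNoW outputs.

```python
# Illustrative sketch of the decision step: pick at most one fragment per slot type,
# the one with the highest activation among fragments the classifier accepts.
def pick_fragments(scored, threshold=0.5):
    """scored: {slot_type: [(fragment, activation), ...]} -> {slot_type: fragment or None}"""
    picks = {}
    for slot, candidates in scored.items():
        accepted = [(frag, act) for frag, act in candidates if act >= threshold]
        picks[slot] = max(accepted, key=lambda p: p[1])[0] if accepted else None
    return picks

scored = {
    "stime": [("3:30 pm", 0.91), ("4:30 pm", 0.40)],
    "location": [("WeH 4601", 0.77)],
    "speaker": [],                       # no surviving candidate for this slot
}
print(pick_fragments(scored))
# {'stime': '3:30 pm', 'location': 'WeH 4601', 'speaker': None}
```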
24Experimental Results
- Two propositional algorithms were tested
- SNoW-IE -- good performance
- NB-IE (Naïve Bayes) -- not as good as SNoW-IE
- Relationship between architecture and result
- No improvement
25Conclusion of the paper
- Contribution
- The paradigm has different tradeoffs than ILP approaches
- More flexible
- Any propositional algorithm can be used within it, including probabilistic approaches
- The paradigm is exemplified on Information Extraction
- A new approach to learning for IE