Title: Relational Learning via Propositional Algorithms: An Information Extraction Case Study
1Relational Learning via Propositional Algorithms
An Information Extraction Case Study
- Chen Xi
- System Electronics Laboratory
- Seoul National University, Korea
- 9 Oct 2007
2Contents
- Introduction
- Relational Learning
- ILP method
- Contribution of this paper
- Propositional Relational Representations
- Relation Generation Functions
- Comparison to ILP Methods
- Case Study Information Extraction
- Problem description
- Extracting Relational Features
- Two-stage Architecture
- Experimental Results
- Conclusion of the paper
- Summary
3Introduction 1 of 3
- Relational Learning
- A machine learning problem
- The problem of learning structured concept definitions from structured examples
- Describing problems
- First-order model
- Focus
- Objects
- Relations between objects
- Applications
- Natural language understanding
- Visual interpretation and planning
4Introduction 2 of 3
- ILP (Inductive Logic Programming) Method
- Formed at the intersection of machine learning and logic programming
- Addresses relational learning
- Generates relational descriptions
- ILP systems develop predicate descriptions from
- Examples
- Background knowledge
- Limitations (from the paper)
- Inflexibility, brittleness and inefficiency of generic systems
- Researchers have to develop their own problem-specific ILP systems
- Successful applications
- Structure-activity rules for drug design
- Finite-element mesh analysis design rules
- Prediction of protein secondary structure from primary structure
- Fault diagnosis rules for satellites
- Useful link
- http://www.doc.ic.ac.uk/~shm/ilp.html
5Introduction 3 of 3
- Contribution of this paper
- Develops a different paradigm for relational learning
- Enables the use of
- General-purpose propositional algorithms
- Efficient propositional algorithms
- Suggests alternatives to
- Limited ILP systems
- Advantages
- More flexible
- Allows use of any propositional algorithm
- Maintains advantages of ILP approaches
- Allows more efficient learning
- Improves expressivity and robustness
6Propositional Relational Representations
- Background
- Two components of a knowledge representation language
- A subset of first-order logic (FOL)
- A collection of structures (graphs) defined over elements in the domain
- Domain D
- Relational language R with respect to D
- A restricted (function-free) first-order language
- Restrictions are applied by limiting the formulae allowed in the language to a collection of formulae that can be evaluated very efficiently on given instances
- Achieved by
- Defining primitive formulae with a limited scope of the quantifiers
- General formulae are defined inductively in terms of primitive formulae, in a restricted way that depends on the relational structures in the domain
7Propositional Relational Representations
- Proposition
- Variable-less atomic formula
- Quantified Proposition
- A quantified atomic formula
8Propositional Relational Representations
- Domain D = (V, G)
- V consists of
- The words TIME, :, 3, 30, pm of the text "TIME: 3:30 pm" and the phrase "3:30 pm"
- G consists of
- Two lists (the solid line and the dashed line in the figure), sketched in code below
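A minimal sketch, assuming a plain Python encoding, of the instance above: V as word elements plus one phrase element, and G as two edge lists (the solid word-adjacency chain and the dashed phrase-to-word links). The tokenization and all names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): the instance "TIME: 3:30 pm"
# represented as a domain D = (V, G).

# V: word elements (indexed by position) plus one phrase element.
words = ["TIME", ":", "3", ":", "30", "pm"]       # assumed tokenization
phrases = {"p1": (2, 5)}                          # phrase "3:30 pm" spans word positions 2..5

# G: two structures defined over V.
word_chain = [(i, i + 1) for i in range(len(words) - 1)]   # "solid line": linear word order
phrase_links = [("p1", i) for i in range(2, 6)]             # "dashed line": phrase-to-word links

print(word_chain)    # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(phrase_links)  # [('p1', 2), ('p1', 3), ('p1', 4), ('p1', 5)]
```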
9Propositional Relational Representations
- Types of elements
- Elements in the language are classified according to their properties
- The set V contains two types of elements
- Objects (O)
- Attributes (A)
- Type-1 predicates
- First argument: an element of O
- Second argument: an element of A
- p(o, a) can equivalently be written as p_a(o)
- p_g(o1, o2) indicates that o1 and o2 are nodes in the graph g and there is an edge between them
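A small sketch of how the two kinds of predicates might look in code, under the assumption (from the definitions above) that a type-1 predicate relates an object to an attribute value, while a graph predicate holds when two objects are joined by an edge in the structure g. The function names are hypothetical.

```python
# Illustrative sketch: type-1 predicates relate an object (a word position) to an
# attribute value; graph predicates test for an edge in a structure g.
words = ["TIME", ":", "3", ":", "30", "pm"]
word_chain = {(i, i + 1) for i in range(len(words) - 1)}

def word(o, a):
    """Type-1 predicate word(o, a): object o carries the attribute value a."""
    return words[o] == a

def edge_g(o1, o2, g=word_chain):
    """Graph predicate p_g(o1, o2): o1 and o2 are nodes joined by an edge in g."""
    return (o1, o2) in g

print(word(0, "TIME"))   # True  -- word(0, TIME), i.e. word_TIME(0)
print(edge_g(0, 1))      # True  -- positions 0 and 1 are adjacent
print(edge_g(0, 2))      # False
```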
10Propositional Relational Representations
- Given an instance x, a formula F in R is given a
unique truth value, the value of F on x, defined
inductively using the truth values of the
predicates in F and the semantics of the
connectives.
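A small sketch of this evaluation, under the common assumption that an instance is represented by the set of relations active on it: a primitive formula is true iff its relation is active, and connectives combine the truth values of sub-formulae in the usual way. Names and relations are illustrative.

```python
# Illustrative sketch: evaluating simple formulae on an instance x, where x is
# represented by the set of relations that are active on it.
active = {"word(TIME)", "word(:)", "word(3)", "word(30)", "word(pm)"}

def holds(relation, x=active):
    """Truth value of a primitive formula: true iff the relation is active on x."""
    return relation in x

# Connectives combine the truth values of sub-formulae in the usual way.
f = holds("word(3)") and holds("word(pm)")        # conjunction: True
g = holds("number(30)") or holds("word(30)")      # disjunction: True
print(f, g)
```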
11Propositional Relational Representations
12Relation Generation Functions
- Active relations
- word(TIME), word(pm), number(30)
- Relations (formulae)
- Restricting attention to active relations is important
- Inactive relations vastly outnumber active relations
13Relation Generation Functions
- Example
- An RGF (relation generation function) generates the active relations number(3), number(30)
- RGFs are defined inductively using a relational calculus
- Basic RGFs, called sensors
- A sensor is a way to encode basic information one can extract from an instance
- A set of connectives
14Relation Generation Functions
- Example: the following are some sensors that are commonly used in NLP
- The word sensor over word elements, which outputs the active relations word(TIME), word(:), word(3), word(30), and word(pm) from "TIME: 3:30 pm"
- The length sensor over phrase elements, which outputs the active relation len(4) from "3:30 pm"
- The is-a sensor outputs the semantic class of a word
- The tag sensor outputs the part-of-speech tag of a word
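A sketch of the word and length sensors on the running example, assuming the tokenization shown earlier (the colon counts as a token, so the phrase "3:30 pm" has four word elements and the word sensor emits five distinct relations). Function names are illustrative, not the paper's API.

```python
# Illustrative sketch of two basic RGFs (sensors); not the paper's implementation.
tokens = ["TIME", ":", "3", ":", "30", "pm"]   # assumed tokenization of "TIME: 3:30 pm"
phrase = tokens[2:]                            # the phrase "3:30 pm" (four word elements)

def word_sensor(tokens):
    """Outputs the active relations word(w) for every word element in the instance."""
    return {f"word({w})" for w in tokens}

def length_sensor(phrase):
    """Outputs the active relation len(n) for a phrase element with n word elements."""
    return {f"len({len(phrase)})"}

print(sorted(word_sensor(tokens)))   # ['word(3)', 'word(30)', 'word(:)', 'word(TIME)', 'word(pm)']
print(length_sensor(phrase))         # {'len(4)'}
```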
15Relation Generation Functions
- The relational calculus allows one to inductively generate new RGFs by applying connectives and quantifiers over existing RGFs
16Relation Generation Functions
- Example
- When applied with respect to the graph g that represents the linear structure of the sentence, colloc_g simply generates formulae that correspond to n-grams
- For "Dr John Smith", colloc(word, word) extracts the bigrams word(Dr)-word(John) and word(John)-word(Smith)
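A sketch of colloc over the linear (word-adjacency) structure, assuming the simple reading above: applying colloc(word, word) along the word chain produces bigram features. The function name is illustrative.

```python
# Illustrative sketch: colloc(word, word) over the linear (word-adjacency) graph
# generates bigram features, as in the "Dr John Smith" example above.
def colloc_word_word(tokens):
    """For each edge (i, i+1) in the linear structure, emit word(t_i)-word(t_{i+1})."""
    return [f"word({a})-word({b})" for a, b in zip(tokens, tokens[1:])]

print(colloc_word_word(["Dr", "John", "Smith"]))
# ['word(Dr)-word(John)', 'word(John)-word(Smith)']
```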
17Comparison to ILP Methods
- Search
- ILP: features are generated as part of the search procedure
- RGFs: features that an ILP program would try during its search are generated up front (in a data-driven way)
- Knowledge
- ILP: ability to incorporate background knowledge
- RGFs: using sensors allows treating information that is readily available in the input, external information, or even previously learned concepts in a uniform way
- Expressivity (I)
- Same as ILP
- Learning
- RGFs: provide a uniform domain for different learning algorithms
- Applying different algorithms is easy and straightforward
- Expressivity (II)
- The constructs colloc and scolloc allow generating relational features which are conjunctions of predicates and are thus similar to a clause in the output representation of an ILP program
18Case Study Information Extraction
- Problem Description
- The Information Extraction task is defined as locating specific fragments of an article according to predefined slots in a template
- The data used in the experiments is a set of 485 seminar announcements from CMU
- Four types of fragments are extracted from each article
- Starting time of the seminar (stime)
- End time of the seminar (etime)
- Its location (location)
- The seminar's speaker (speaker)
- Given an article, our system picks at most one fragment per slot
- If this fragment represents the slot, it is a correct prediction
- Otherwise, it is a wrong prediction
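A minimal sketch of the prediction rule above: for each slot the system proposes at most one fragment, counted as correct only if it represents that slot. The gold annotations and slot values below are toy data, not results from the paper.

```python
# Illustrative sketch of the prediction rule: at most one fragment per slot,
# correct only if it matches the slot's true filler.
def score(predictions, gold):
    """predictions/gold: {slot: fragment or None}; returns counts of correct and wrong picks."""
    correct = sum(1 for s, p in predictions.items() if p is not None and p == gold.get(s))
    wrong = sum(1 for s, p in predictions.items() if p is not None and p != gold.get(s))
    return correct, wrong

gold = {"stime": "3:30 pm", "etime": "4:30 pm", "location": "WeH 4601", "speaker": "John Smith"}
predictions = {"stime": "3:30 pm", "etime": None, "location": "WeH 5409", "speaker": "John Smith"}
print(score(predictions, gold))   # (2, 1): stime and speaker correct, location wrong
```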
19Case Study Information Extraction
- Extracting Relational Features
- Classifier
- Discriminates a specific desired fragment
- It could be achieved by
- Identifying candidate fragments in the document
- For each candidate fragment, use the defined RGFs to re-represent it as an example which consists of all active features extracted from this fragment
- Let f = (t_i, t_{i+1}, ..., t_j) be a fragment, with t_i representing tokens and i, i+1, ..., j the positions of the tokens in the document
- RGFs are defined to extract features from three regions (sketched in code below)
- Left window
- Target fragment
- Right window
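A sketch of the re-representation step, assuming fixed-size left and right windows around a candidate fragment; a simple word sensor stands in for the full set of RGFs so the example stays self-contained. All names, window sizes, and the sample sentence are illustrative assumptions, not the paper's settings.

```python
# Illustrative sketch: turn a candidate fragment f = (t_i, ..., t_j) into an example
# made of active features from the left window, the target fragment, and the right window.
def fragment_example(tokens, i, j, window=2):
    left = tokens[max(0, i - window):i]
    target = tokens[i:j + 1]
    right = tokens[j + 1:j + 1 + window]
    features = set()
    # A word sensor stands in for the richer RGFs used in the paper.
    features |= {f"l_word({w})" for w in left}
    features |= {f"t_word({w})" for w in target}
    features |= {f"r_word({w})" for w in right}
    return features

tokens = "the seminar starts at 3:30 pm in WeH 4601".split()
print(sorted(fragment_example(tokens, 4, 5)))   # candidate fragment "3:30 pm"
```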
20Case Study Information Extraction
- Two-stage Architecture
- Filtering
- Reduce the number of candidates from all possible fragments to a small number
- Classifying
- Pick the right fragment from the preserved fragments using the learned classifier
21Two stage architecture
- Filtering
- Positive examples
- Fragments that represent legitimate slots
- Negative examples
- Irrelevant fragments
- Two classifiers are learned, and a fragment is filtered out if it meets one of the following criteria (sketched in code below)
- Single-feature classifier: the fragment doesn't contain a feature that should be active in positive examples
- General classifier: the fragment's confidence value is below the threshold
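A sketch of the two filtering criteria, assuming a single-feature filter built from features seen in positive training examples and a confidence threshold for the general classifier. The scoring function here is a trivial stand-in, not SNoW, and all names and values are hypothetical.

```python
# Illustrative sketch of the filtering stage: a candidate fragment is discarded if
# (a) it contains none of the features that were active in positive training examples,
# or (b) a general classifier scores it below a confidence threshold.
def passes_filter(features, positive_feature_set, score_fn, threshold):
    if not (features & positive_feature_set):      # single-feature criterion
        return False
    if score_fn(features) < threshold:             # general-classifier criterion
        return False
    return True

# Toy usage with hypothetical features and a trivial stand-in scorer.
positive_feature_set = {"t_word(pm)", "t_word(am)", "r_word(in)"}
score_fn = lambda feats: len(feats & positive_feature_set) / max(len(feats), 1)
print(passes_filter({"t_word(pm)", "l_word(at)"}, positive_feature_set, score_fn, 0.3))  # True
print(passes_filter({"t_word(WeH)"}, positive_feature_set, score_fn, 0.3))               # False
```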
22Two stage architecture
- Classifying
- Use the fragments that survived the first stage
- Uses the SNoW classifier
- A multi-class classifier that is specifically tailored for large-scale learning tasks [Roth, 1998; Carlson et al., 1999]
- First step
- An additional collection of RGFs is applied to enhance the representation of the candidate fragments
- In training
- Remaining fragments are annotated and used as positive or negative examples
- In testing
- Remaining fragments are evaluated on the learned classifiers to determine if they can fill one of the slots
23Two stage architecture
- Classifying -- RGFs added
- Etime and Stime
- scolloc{word & loc(-1) | l_window, word | r_window}
- A sparse structural conjunction of the word directly left of the target region, and of words and tags in the right window
- scolloc{word & loc(-1) | l_window, tag | r_window}
- A sparse structural conjunction of the last two words in the left window, a tag in the target, and the first tag in the right window
- Location and speaker
- scolloc{word & loc(-2) | l_window, tag | targ, tag & loc(1) | r_window}
- A decision is made by each of the 4 classifiers. A fragment is classified as type t if the t-th classifier decides so. At most one fragment of type t is chosen in each article, based on the activation value of the corresponding classifier (sketched in code below).
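A sketch of this decision rule: each classifier scores the surviving fragments for its slot type, and at most one fragment per type is selected, the one with the highest activation among those the classifier accepts. The activation values and threshold below are placeholders, not SNoW outputs.

```python
# Illustrative sketch of the decision step: pick at most one fragment per slot type,
# the one with the highest activation among fragments the classifier accepts.
def pick_fragments(scored, threshold=0.5):
    """scored: {slot_type: [(fragment, activation), ...]} -> {slot_type: fragment or None}"""
    picks = {}
    for slot, candidates in scored.items():
        accepted = [(frag, act) for frag, act in candidates if act >= threshold]
        picks[slot] = max(accepted, key=lambda p: p[1])[0] if accepted else None
    return picks

scored = {
    "stime": [("3:30 pm", 0.91), ("4:30 pm", 0.40)],
    "location": [("WeH 4601", 0.77)],
    "speaker": [],                       # no surviving candidate for this slot
}
print(pick_fragments(scored))
# {'stime': '3:30 pm', 'location': 'WeH 4601', 'speaker': None}
```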
24Experimental Results
- Two propositional algorithms were tested
- SNoW-IE -- good performance
- NB-IE (Naïve Bayes) -- not as good as SNoW-IE
- Relationship between architecture and result
- No improvement
25Conclusion of the paper
- Contribution
- The paradigm has different tradeoffs than ILP approaches
- More flexible
- Any propositional algorithm can be used within it, including probabilistic approaches
- The paradigm is exemplified on Information Extraction
- A new approach to learning for IE