Title: Kernels for Relation Extraction
- No office hours this week due to open houses, so I have more time to chat with prospective students.
- Next Tuesday's lecture will summarize Sarawagi & Cohen, NIPS 2004.
3. The kernel perceptron
instance xi
Mathematically the same as before, but allows use of the kernel trick.
4. The kernel perceptron
Other kernel methods (SVMs, Gaussian processes) aren't constrained to a limited set (+1/-1/0) of weights on the K(x,v) values.
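A minimal sketch of the kernel perceptron described above, assuming binary labels in {+1, -1}. The function names, the quadratic kernel, and the toy data are illustrative assumptions, not taken from the slides. Each mistake stores the instance with weight +1 or -1, and instances that never cause a mistake implicitly have weight 0.

```python
from typing import Callable, List, Sequence, Tuple

Instance = Sequence[float]
Kernel = Callable[[Instance, Instance], float]
Support = List[Tuple[int, Instance]]


def quadratic_kernel(x: Instance, v: Instance) -> float:
    """Illustrative kernel choice (an assumption): K(x, v) = (1 + x.v)^2."""
    return (1.0 + sum(a * b for a, b in zip(x, v))) ** 2


def score(support: Support, kernel: Kernel, x: Instance) -> float:
    """Weighted sum of K(x, v) over the stored mistake instances v."""
    return sum(alpha * kernel(x, v) for alpha, v in support)


def predict(support: Support, kernel: Kernel, x: Instance) -> int:
    return 1 if score(support, kernel, x) > 0 else -1


def train_kernel_perceptron(examples, kernel: Kernel, epochs: int = 10) -> Support:
    """On each mistake, store the instance with weight y (+1 or -1)."""
    support: Support = []
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if predict(support, kernel, x) != y:
                support.append((y, x))
                mistakes += 1
        if mistakes == 0:  # a full pass with no mistakes: stop early
            break
    return support


# Toy data that is not linearly separable in the original space.
data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), 1)]
model = train_kernel_perceptron(data, quadratic_kernel)
```

The instances are only ever touched through `kernel(x, v)`, so any structured input (a path, a token sequence) can be used as long as a kernel is defined for it.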
5. Kernels vs Structured Output Spaces
- Two kinds of structured learning:
  - HMMs, CRFs, VP-trained HMMs, structured SVMs, stacked learning, ...: the output of the learner is structured.
    - E.g., for a linear-chain CRF, the output is a sequence of labels, a string in $Y^n$.
  - Bunescu & Mooney (EMNLP, NIPS): the input to the learner is structured.
    - EMNLP: structure derived from a dependency graph.
6. Dependency graphs for sentences
CFG dependency parsers → dependency trees
Context-sensitive formalisms → dependency
8. Disclaimer: this is a shortest path, not the shortest path.
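9. $x \to x_1 \times x_2 \times x_3 \times x_4 \times x_5$, giving $4 \times 1 \times 3 \times 1 \times 4 = 48$ features
$K(x_1 \times \cdots \times x_n,\; y_1 \times \cdots \times y_n) = |(x_1 \times \cdots \times x_n) \cap (y_1 \times \cdots \times y_n)|$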
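The kernel above never needs the Cartesian product to be built explicitly: the number of feature tuples shared by two products factorizes into the product of the per-position overlaps $|x_i \cap y_i|$, and is 0 when the two paths differ in length. The sketch below checks this on small inputs. The function names and the word/POS/entity-type feature values are hypothetical, chosen only so that the set sizes are 4, 1, 3, 1, 4 as on the slide.

```python
from itertools import product


def path_kernel(x, y):
    """Count common feature tuples without enumerating them.

    x and y are lists of feature sets, one set per path position.
    """
    if len(x) != len(y):
        return 0
    k = 1
    for xi, yi in zip(x, y):
        k *= len(xi & yi)  # number of features shared at this position
    return k


def explicit_features(x):
    """Enumerate the Cartesian product x1 x ... x xn (exponential; for checking only)."""
    return set(product(*x))


# Hypothetical feature sets (assumed values, sizes 4, 1, 3, 1, 4).
x = [{"protesters", "NNS", "Noun", "Person"}, {"->"},
     {"seized", "VBD", "Verb"}, {"<-"},
     {"stations", "NNS", "Noun", "Facility"}]
y = [{"troops", "NNS", "Noun", "Person"}, {"->"},
     {"raided", "VBD", "Verb"}, {"<-"},
     {"churches", "NNS", "Noun", "Facility"}]

n_features = len(explicit_features(x))
k_fast = path_kernel(x, y)
k_slow = len(explicit_features(x) & explicit_features(y))
```

Here `n_features` is the size of the explicit feature space for `x`, while `k_fast` and `k_slow` compute the same kernel value in time linear and exponential in the path length, respectively.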
- -CCG, -CFG: context-sensitive CCG vs Collins (CFG) parser
- S1, S2: one multi-class SVM vs two SVMs (binary, then multiclass)
- Correct entity output is assumed
11. Now the NIPS paper
- Similar representation for relation instances: $x_1 \times \cdots \times x_n$, where each $x_i$ is a set.
- But instead of informative dependency path elements, the x's just represent adjacent tokens.
- To compensate: use a richer kernel.
12. Subsequence kernel
- Features: the set of all sparse subsequences u of $x_1 \times \cdots \times x_n$, with each u downweighted according to the span of locations it covers.
- Relaxation of the old kernel:
  - We don't have to match everywhere, just at selected locations.
  - For every position we decide to match at, we get a penalty of $\lambda$.
- To pick a feature for $x_1 \times \cdots \times x_n$:
  - Pick a subset of locations $i = i_1, \ldots, i_k$, and then
  - Pick a feature value in each location.
  - In the preprocessed vector x, weight every feature for i by $\lambda^{\mathrm{length}(i)} = \lambda^{i_k - i_1 + 1}$.
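The feature-picking recipe above can be written out directly as a brute-force enumeration. This is a sketch for building intuition, not an efficient implementation; `sparse_subsequence_features`, `lam`, and the toy input are hypothetical names and values. When several location subsets yield the same feature u, their weights are summed.

```python
from collections import defaultdict
from itertools import combinations, product


def sparse_subsequence_features(x, k, lam):
    """Map each length-k sparse subsequence u of x to its total weight.

    x   : list of feature sets, one per token position
    k   : number of locations to pick
    lam : decay factor, assumed to lie in (0, 1]
    """
    feats = defaultdict(float)
    # Step 1: pick a subset of locations i = (i_1, ..., i_k).
    for idx in combinations(range(len(x)), k):
        span = idx[-1] - idx[0] + 1  # length(i) = i_k - i_1 + 1
        # Step 2: pick one feature value at each chosen location.
        for u in product(*(x[i] for i in idx)):
            feats[u] += lam ** span
    return dict(feats)


def feature_dot(fx, fy):
    """Kernel value as an explicit inner product of two feature maps."""
    return sum(w * fy[u] for u, w in fx.items() if u in fy)


toy = [{"a"}, {"b"}, {"c"}]
toy_feats = sparse_subsequence_features(toy, 2, 0.5)
```

For `toy`, the contiguous pairs ("a", "b") and ("b", "c") span two positions and get weight 0.5 ** 2, while the gapped pair ("a", "c") spans three positions and gets 0.5 ** 3.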
13. Subsequence kernel
14. Dynamic programming computation
- Only counts u that align with last char of s and t
- Skipping position i in s
- Including position i
- Not aligned with end of s
- Aligned with end of s
15. Dynamic programming computation
- Only counts u that align with last char of s and t
- Matching at last pos of s,t
- Skipping last position in t
- Not aligned with end of s
- Aligned with end of s
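The recurrence behind these annotations can be sketched as follows. This follows the standard gap-weighted subsequence kernel dynamic program rather than the exact notation of the slides, so the table names (`Kp` for the "not yet closed" partial kernel, `Kpp` for the running sum over t) and the `match` hook are assumptions. Passing a `match` function that counts shared features gives the variant where each position is a feature set. A brute-force version is included only to check the DP on small inputs, and both assume n >= 1.

```python
from itertools import combinations


def exact_match(a, b):
    return 1.0 if a == b else 0.0


def subsequence_kernel(s, t, n, lam, match=exact_match):
    """Sum over common length-n subsequences, weighted by lam ** (span in s + span in t)."""
    S, T = len(s), len(t)
    # Kp[i][a][b]: partial kernel for length-i subsequences of s[:a] and t[:b],
    # with weight counted from the first matched position to the END of each prefix.
    Kp = [[[0.0] * (T + 1) for _ in range(S + 1)] for _ in range(n)]
    for a in range(S + 1):
        for b in range(T + 1):
            Kp[0][a][b] = 1.0
    for i in range(1, n):
        for a in range(1, S + 1):
            Kpp = 0.0  # contributions that use the last position of s[:a]
            for b in range(1, T + 1):
                # Either skip the last position of t (decay by lam), or match
                # the last positions of both prefixes.
                Kpp = lam * Kpp + lam * lam * match(s[a - 1], t[b - 1]) * Kp[i - 1][a - 1][b - 1]
                # Either skip the last position of s (decay by lam), or include it.
                Kp[i][a][b] = lam * Kp[i][a - 1][b] + Kpp
    # Final step: only count subsequences whose last element is matched here.
    total = 0.0
    for a in range(1, S + 1):
        for b in range(1, T + 1):
            total += lam * lam * match(s[a - 1], t[b - 1]) * Kp[n - 1][a - 1][b - 1]
    return total


def brute_force_kernel(s, t, n, lam, match=exact_match):
    """Enumerate all pairs of location subsets; exponential, for checking only."""
    total = 0.0
    for i in combinations(range(len(s)), n):
        for j in combinations(range(len(t)), n):
            common = 1.0
            for a, b in zip(i, j):
                common *= match(s[a], t[b])
            total += common * lam ** ((i[-1] - i[0] + 1) + (j[-1] - j[0] + 1))
    return total


def shared_features(a, b):
    """Match score for set-valued positions: number of features in common."""
    return float(len(a & b))
```

For example, `subsequence_kernel("cat", "car", 2, 0.5)` counts only the common subsequence "ca", which spans two positions in each string, so its value is 0.5 ** 4.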
16. Additional details
- Special domain-specific tricks for combining the subsequences for what matches in the fore, aft, and between sections of a relation-instance pair.
- Subsequences are of length less than 4.
- Is DP needed for this now?
- Count fore-between, between-aft, and between subsequences separately.
Protein-protein interaction