Title: Kernels for Relation Extraction
1. Kernels for Relation Extraction
2. Announcements
- No office hours this week due to open houses, so I have more time to chat with prospective students.
- Next Tuesday's lecture will summarize Sarawagi & Cohen, NIPS 2004.
3. The kernel perceptron
[Figure: instance x_i; labels A, B]
Mathematically the same as before, but allows use of the kernel trick.
4. The kernel perceptron
[Figure: instance x_i; labels A, B]
Mathematically the same as before, but allows use of the kernel trick.
Other kernel methods (SVMs, Gaussian processes) aren't constrained to a limited set (+1/-1/0) of weights on the K(x,v) values.
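Since the slide only gestures at the algorithm, here is a minimal Python sketch of the dual (kernelized) perceptron; the class name and the RBF stand-in kernel are mine, not from the lecture. The hypothesis is a signed sum of kernel evaluations against previously misclassified examples, so training and prediction touch the data only through K.

import numpy as np

def rbf_kernel(u, v, gamma=1.0):
    # Stand-in kernel; any positive-definite K(u, v) can be plugged in,
    # including the relation kernels discussed later in the lecture.
    d = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))

class KernelPerceptron:
    # Dual perceptron: the hypothesis is sign(sum_i alpha_i * K(v_i, x)),
    # where the v_i are past mistakes and alpha_i is the true label,
    # so x enters only through kernel evaluations (the kernel trick).
    def __init__(self, kernel=rbf_kernel):
        self.kernel = kernel
        self.support = []  # examples the perceptron got wrong
        self.alphas = []   # their labels (+1/-1), one per mistake

    def decision(self, x):
        return sum(a * self.kernel(v, x) for a, v in zip(self.alphas, self.support))

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1

    def fit(self, X, y, epochs=5):
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if self.predict(xi) != yi:  # mistake-driven update
                    self.support.append(xi)
                    self.alphas.append(yi)

This also makes the slide's contrast concrete: every support example here gets a weight from {+1, -1} (with repeats allowed across epochs), whereas SVMs or Gaussian processes learn arbitrary real-valued weights on the K(x,v) values.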
5. Kernels vs. Structured Output Spaces
- Two kinds of structured learning:
- HMMs, CRFs, VP-trained HMMs, structured SVMs, stacked learning, ...: the output of the learner is structured. E.g., for a linear-chain CRF, the output is a sequence of labels, a string in Y^n.
- Bunescu & Mooney (EMNLP, NIPS): the input to the learner is structured. New!
- EMNLP: structure derived from a dependency graph.
6. Dependency graphs for sentences
CFG dependency parsers → dependency trees
Context-sensitive formalisms → dependency DAGs
7. [Figure-only slide; no transcript]
8. Disclaimer: this is a shortest path, not the shortest path.
9. x → Φ(x)
[Figure: positions x1 ... x5 along the shortest dependency path]
Example: for the path x1 x2 x3 x4 x5, with per-position feature-set sizes 4, 1, 3, 1, 4, Φ(x) contains 4 × 1 × 3 × 1 × 4 = 48 features.
$K(x_1 \dots x_n,\ y_1 \dots y_n) = \prod_{i=1}^{n} |x_i \cap y_i|$
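Read as in the Bunescu & Mooney (EMNLP 2005) shortest-path kernel: paths of different lengths never match, and otherwise the counts of features shared at each position are multiplied. A small Python sketch, with toy feature sets of my own choosing:

def sp_kernel(x, y):
    # x, y: shortest dependency paths, one set of features per position.
    if len(x) != len(y):
        return 0  # paths of different lengths share no features
    k = 1
    for xi, yi in zip(x, y):
        k *= len(xi & yi)  # features common to this position
    return k

# Toy paths mixing lexical and coarser features (POS, entity type):
x = [{"protesters", "NNS", "Noun", "PERSON"}, {"seized", "VBD", "Verb"},
     {"stations", "NNS", "Noun", "FACILITY"}]
y = [{"troops", "NNS", "Noun", "PERSON"}, {"raided", "VBD", "Verb"},
     {"churches", "NNS", "Noun", "FACILITY"}]
print(sp_kernel(x, y))  # 3 * 2 * 3 = 18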
10. Results
-CCG, -CFG: context-sensitive CCG parser vs. the Collins (CFG) parser.
S1, S2: one multi-class SVM vs. two SVMs (binary detection, then multi-class).
Correct entity output is assumed.
11. Now the NIPS paper
- Similar representation for relation instances x1 ... xn, where each x_i is a set,
- but instead of informative dependency-path elements, the x_i just represent adjacent tokens.
- To compensate: use a richer kernel.
12. Subsequence kernel
- Φ(x) ranges over all sparse subsequences u of x1 ... xn, with each u downweighted according to its sparsity.
- A relaxation of the old kernel:
- We don't have to match everywhere, just at selected locations.
- For every position we decide to match at, we pay a penalty of λ.
- To pick a feature inside Φ(x1 ... xn):
- pick a subset of locations i = i1, ..., ik, and then
- pick a feature value in each location.
- In the preprocessed vector Φ(x), weight every feature for i by $\lambda^{\mathrm{length}(i)}$, where $\mathrm{length}(i) = i_k - i_1 + 1$.
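Before the dynamic program, the feature map can be pinned down by brute force. The sketch below enumerates every sparse subsequence explicitly, using plain tokens instead of per-position feature sets for brevity; it is exponential in practice and only meant to define what the DP on the next slides computes.

from itertools import combinations

def phi(x, k, lam):
    # All length-k sparse subsequences u of x, each weighted by
    # lam ** length(i), where length(i) = i_k - i_1 + 1 spans the gaps too.
    feats = {}
    for idx in combinations(range(len(x)), k):
        u = tuple(x[i] for i in idx)
        span = idx[-1] - idx[0] + 1
        feats[u] = feats.get(u, 0.0) + lam ** span
    return feats

def subseq_kernel_naive(s, t, k, lam=0.5):
    # K(s, t) = <phi(s), phi(t)>, computed by explicit enumeration.
    ps, pt = phi(s, k, lam), phi(t, k, lam)
    return sum(w * pt.get(u, 0.0) for u, w in ps.items())

print(subseq_kernel_naive(list("cat"), list("cart"), 2))
# "ca", "ct", "at" match, each with gap penalties from both strings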
13. Subsequence kernel
[Formula slide: the kernel written two equivalent ways; not transcribed]
14. Dynamic programming computation
[Recursion figure] Only counts u that align with the last characters of s and t.
Annotations: skipping position i in s; including position i; not aligned with end of s; aligned with end of s.
15. Dynamic programming computation
[Recursion figure] Only counts u that align with the last characters of s and t.
Annotations: matching at the last positions of s and t; skipping the last position in t; not aligned with end of s; aligned with end of s.
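The two annotated recursion slides are hard to follow without their formulas, so here is a sketch of the whole computation, following the standard gap-weighted subsequence-kernel recursion of Lodhi et al. (2002) that these slides appear to be walking through; variable names are mine. Kp[i][p][q] holds K'_i(s[:p], t[:q]), whose gap penalty runs to the ends of both prefixes, and the running accumulator plays the role of K''_i, which only counts u aligned with the last characters of both prefixes.

def subseq_kernel_dp(s, t, n, lam=0.5):
    # Gap-weighted subsequence kernel for length-n subsequences, O(n|s||t|).
    S, T = len(s), len(t)
    Kp = [[[1.0] * (T + 1) for _ in range(S + 1)]]  # K'_0 = 1 everywhere
    for i in range(1, n):
        Kp.append([[0.0] * (T + 1) for _ in range(S + 1)])
        for p in range(i, S + 1):
            Kpp = 0.0  # K''_i(s[:p], t[:q]): u aligned with both last chars
            for q in range(i, T + 1):
                # either skip t[q-1], or match it against s[p-1]
                Kpp = lam * Kpp + (lam ** 2) * Kp[i - 1][p - 1][q - 1] * (s[p - 1] == t[q - 1])
                # either skip s[p-1] (not aligned with end of s), or align with it
                Kp[i][p][q] = lam * Kp[i][p - 1][q] + Kpp
    K = 0.0  # final layer: u need not align with the string ends
    for p in range(n, S + 1):
        for q in range(n, T + 1):
            if s[p - 1] == t[q - 1]:
                K += (lam ** 2) * Kp[n - 1][p - 1][q - 1]
    return K

print(subseq_kernel_dp(list("cat"), list("cart"), 2))

On the toy strings this agrees with subseq_kernel_naive from the earlier sketch, which is a convenient sanity check.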
16. Additional details
- Special domain-specific tricks for combining the subsequences that match in the fore, between, and aft sections of a relation-instance pair (a sketch follows this list).
- Subsequences are of length less than 4.
- Is the DP still needed for this?
- Fore-between, between-aft, and between subsequences are counted separately.
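A loose, hypothetical sketch of that combination: relation_kernel and the section keys are my inventions, and the paper's exact weighting and entity anchoring differ, but the shape is to sum separate subsequence kernels over the three section pairings, reusing subseq_kernel_dp from the sketch above.

def relation_kernel(x, y, lam=0.5, max_len=3):
    # x, y: dicts mapping each section of a relation instance to its token
    # list (hypothetical keys; the paper anchors sections on the two
    # entity mentions). Only subsequences of length < 4 are counted.
    K = 0.0
    for sec in ("fore_between", "between", "between_aft"):
        for n in range(1, max_len + 1):
            K += subseq_kernel_dp(x[sec], y[sec], n, lam)
    return K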
17. Results
Protein-protein interaction