Title: Kernels for Relation Extraction
1. Kernels for Relation Extraction
2. Announcements
- No office hours this week due to open houses, so I have more time to chat with prospective students.
- Next Tuesday's lecture will summarize Sarawagi & Cohen, NIPS 2004.
3. The kernel perceptron
[Figure: instance x_i; labels A, B]
Mathematically the same as before, but allows use of the kernel trick.
4. The kernel perceptron
[Figure: instance x_i; labels A, B]
Mathematically the same as before, but allows use of the kernel trick.
Other kernel methods (SVMs, Gaussian processes) aren't constrained to a limited set (+1/-1/0) of weights on the K(x,v) values.
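Since the slide only gestures at the algorithm, here is a minimal Python sketch of the dual (kernelized) perceptron; the class name and the RBF stand-in kernel are mine, not from the lecture. The hypothesis is a signed sum of kernel evaluations against previously misclassified examples, so training and prediction touch the data only through K.

import numpy as np

def rbf_kernel(u, v, gamma=1.0):
    # Stand-in kernel; any positive-definite K(u, v) can be plugged in,
    # including the relation kernels discussed later in the lecture.
    d = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))

class KernelPerceptron:
    # Dual perceptron: the hypothesis is sign(sum_i alpha_i * K(v_i, x)),
    # where the v_i are past mistakes and alpha_i is the true label,
    # so x enters only through kernel evaluations (the kernel trick).
    def __init__(self, kernel=rbf_kernel):
        self.kernel = kernel
        self.support = []  # examples the perceptron got wrong
        self.alphas = []   # their labels (+1/-1), one per mistake

    def decision(self, x):
        return sum(a * self.kernel(v, x) for a, v in zip(self.alphas, self.support))

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1

    def fit(self, X, y, epochs=5):
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if self.predict(xi) != yi:  # mistake-driven update
                    self.support.append(xi)
                    self.alphas.append(yi)

This also makes the slide's contrast concrete: every support example here gets a weight from {+1, -1} (with repeats allowed across epochs), whereas SVMs or Gaussian processes learn arbitrary real-valued weights on the K(x,v) values.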
5. Kernels vs. Structured Output Spaces
- Two kinds of structured learning:
- HMMs, CRFs, VP-trained HMMs, structured SVMs, stacked learning, ...: the output of the learner is structured. E.g., for a linear-chain CRF, the output is a sequence of labels, a string in Y^n.
- Bunescu & Mooney (EMNLP, NIPS): the input to the learner is structured. New!
- EMNLP: structure derived from a dependency graph.
6. Dependency graphs for sentences
CFG dependency parsers → dependency trees
Context-sensitive formalisms → dependency DAGs
7. [Figure-only slide; no transcript]
8. Disclaimer: this is a shortest path, not the shortest path.
9. x → Φ(x)
[Figure: positions x1 ... x5 along the shortest dependency path]
Example: for the path x1 x2 x3 x4 x5, with per-position feature-set sizes 4, 1, 3, 1, 4, Φ(x) contains 4 × 1 × 3 × 1 × 4 = 48 features.
$K(x_1 \dots x_n,\ y_1 \dots y_n) = \prod_{i=1}^{n} |x_i \cap y_i|$
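Read as in the Bunescu & Mooney (EMNLP 2005) shortest-path kernel: paths of different lengths never match, and otherwise the counts of features shared at each position are multiplied. A small Python sketch, with toy feature sets of my own choosing:

def sp_kernel(x, y):
    # x, y: shortest dependency paths, one set of features per position.
    if len(x) != len(y):
        return 0  # paths of different lengths share no features
    k = 1
    for xi, yi in zip(x, y):
        k *= len(xi & yi)  # features common to this position
    return k

# Toy paths mixing lexical and coarser features (POS, entity type):
x = [{"protesters", "NNS", "Noun", "PERSON"}, {"seized", "VBD", "Verb"},
     {"stations", "NNS", "Noun", "FACILITY"}]
y = [{"troops", "NNS", "Noun", "PERSON"}, {"raided", "VBD", "Verb"},
     {"churches", "NNS", "Noun", "FACILITY"}]
print(sp_kernel(x, y))  # 3 * 2 * 3 = 18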
10. Results
-CCG, -CFG: context-sensitive CCG parser vs. the Collins (CFG) parser.
S1, S2: one multi-class SVM vs. two SVMs (binary detection, then multi-class).
Correct entity output is assumed.
11. Now the NIPS paper
- Similar representation for relation instances x1 ... xn, where each x_i is a set,
- but instead of informative dependency-path elements, the x_i just represent adjacent tokens.
- To compensate: use a richer kernel.
12. Subsequence kernel
- Φ(x) ranges over all sparse subsequences u of x1 ... xn, with each u downweighted according to its sparsity.
- A relaxation of the old kernel:
- We don't have to match everywhere, just at selected locations.
- For every position we decide to match at, we pay a penalty of λ.
- To pick a feature inside Φ(x1 ... xn):
- pick a subset of locations i = i1, ..., ik, and then
- pick a feature value in each location.
- In the preprocessed vector Φ(x), weight every feature for i by $\lambda^{\mathrm{length}(i)}$, where $\mathrm{length}(i) = i_k - i_1 + 1$.
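Before the dynamic program, the feature map can be pinned down by brute force. The sketch below enumerates every sparse subsequence explicitly, using plain tokens instead of per-position feature sets for brevity; it is exponential in practice and only meant to define what the DP on the next slides computes.

from itertools import combinations

def phi(x, k, lam):
    # All length-k sparse subsequences u of x, each weighted by
    # lam ** length(i), where length(i) = i_k - i_1 + 1 spans the gaps too.
    feats = {}
    for idx in combinations(range(len(x)), k):
        u = tuple(x[i] for i in idx)
        span = idx[-1] - idx[0] + 1
        feats[u] = feats.get(u, 0.0) + lam ** span
    return feats

def subseq_kernel_naive(s, t, k, lam=0.5):
    # K(s, t) = <phi(s), phi(t)>, computed by explicit enumeration.
    ps, pt = phi(s, k, lam), phi(t, k, lam)
    return sum(w * pt.get(u, 0.0) for u, w in ps.items())

print(subseq_kernel_naive(list("cat"), list("cart"), 2))
# "ca", "ct", "at" match, each with gap penalties from both strings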
13. Subsequence kernel
[Formula slide: the kernel written two equivalent ways; not transcribed]
14. Dynamic programming computation
[Recursion figure] Only counts u that align with the last characters of s and t.
Annotations: skipping position i in s; including position i; not aligned with end of s; aligned with end of s.
15. Dynamic programming computation
[Recursion figure] Only counts u that align with the last characters of s and t.
Annotations: matching at the last positions of s and t; skipping the last position in t; not aligned with end of s; aligned with end of s.
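The two annotated recursion slides are hard to follow without their formulas, so here is a sketch of the whole computation, following the standard gap-weighted subsequence-kernel recursion of Lodhi et al. (2002) that these slides appear to be walking through; variable names are mine. Kp[i][p][q] holds K'_i(s[:p], t[:q]), whose gap penalty runs to the ends of both prefixes, and the running accumulator plays the role of K''_i, which only counts u aligned with the last characters of both prefixes.

def subseq_kernel_dp(s, t, n, lam=0.5):
    # Gap-weighted subsequence kernel for length-n subsequences, O(n|s||t|).
    S, T = len(s), len(t)
    Kp = [[[1.0] * (T + 1) for _ in range(S + 1)]]  # K'_0 = 1 everywhere
    for i in range(1, n):
        Kp.append([[0.0] * (T + 1) for _ in range(S + 1)])
        for p in range(i, S + 1):
            Kpp = 0.0  # K''_i(s[:p], t[:q]): u aligned with both last chars
            for q in range(i, T + 1):
                # either skip t[q-1], or match it against s[p-1]
                Kpp = lam * Kpp + (lam ** 2) * Kp[i - 1][p - 1][q - 1] * (s[p - 1] == t[q - 1])
                # either skip s[p-1] (not aligned with end of s), or align with it
                Kp[i][p][q] = lam * Kp[i][p - 1][q] + Kpp
    K = 0.0  # final layer: u need not align with the string ends
    for p in range(n, S + 1):
        for q in range(n, T + 1):
            if s[p - 1] == t[q - 1]:
                K += (lam ** 2) * Kp[n - 1][p - 1][q - 1]
    return K

print(subseq_kernel_dp(list("cat"), list("cart"), 2))

On the toy strings this agrees with subseq_kernel_naive from the earlier sketch, which is a convenient sanity check.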
16. Additional details
- Special domain-specific tricks for combining the subsequences that match in the fore, between, and aft sections of a relation-instance pair (a sketch follows this list).
- Subsequences are of length less than 4.
- Is the DP still needed for this?
- Fore-between, between-aft, and between subsequences are counted separately.
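A loose, hypothetical sketch of that combination: relation_kernel and the section keys are my inventions, and the paper's exact weighting and entity anchoring differ, but the shape is to sum separate subsequence kernels over the three section pairings, reusing subseq_kernel_dp from the sketch above.

def relation_kernel(x, y, lam=0.5, max_len=3):
    # x, y: dicts mapping each section of a relation instance to its token
    # list (hypothetical keys; the paper anchors sections on the two
    # entity mentions). Only subsequences of length < 4 are counted.
    K = 0.0
    for sec in ("fore_between", "between", "between_aft"):
        for n in range(1, max_len + 1):
            K += subseq_kernel_dp(x[sec], y[sec], n, lam)
    return K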
17. Results
Protein-protein interaction