Title: Fast Methods for Kernel-based Text Analysis
1. Fast Methods for Kernel-based Text Analysis
- Taku Kudo
- Yuji Matsumoto
- NAIST (Nara Institute of Science and Technology)
41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan
2. Background
- Kernel methods (e.g., SVM) have become popular
- Prior knowledge can be incorporated independently of the machine learning algorithm by giving a task-dependent kernel (generalized dot product)
- High accuracy
3. Problem
- Kernel-based text analyzers are too slow for real NL applications (e.g., QA or text mining) because of their inefficiency in testing
- Some kernel-based parsers run at only 2-3 seconds per sentence
4. Goals
- Build fast but still accurate kernel-based text analyzers
- Make it possible to use them in a wider range of NL applications
5. Outline
- Polynomial Kernel of degree d
- Fast Methods for the Polynomial Kernel
- PKI
- PKE
- Experiments
- Conclusions and Future Work
6. Outline
- Polynomial Kernel of degree d
- Fast Methods for the Polynomial Kernel
- PKI
- PKE
- Experiments
- Conclusions and Future Work
7. Kernel Methods
(Figure: training data and the kernel-based decision function)
- No need to represent an example as an explicit feature vector
- Complexity of testing is O(L · |X|)
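To make the testing cost concrete, here is a minimal sketch (not the authors' code; names are illustrative) of the kernelized decision function over set-valued examples, where each support vector carries a weight y_j·α_j:

```python
# Minimal sketch: kernel-based classification over set-valued examples.
# Each support vector is (feature_set, weight) with weight = y_j * alpha_j.
# Every support vector is touched for every test example, hence O(L * |X|).

def classify(x, svs, kernel, b=0.0):
    """Return +1/-1 for a test example x (a set of features)."""
    score = sum(weight * kernel(sv, x) for sv, weight in svs) + b
    return 1 if score >= 0 else -1
```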
8. Kernels for Sets (1/3)
- Focus on the special case where examples are represented as sets
- Instances in NLP are usually represented as sets (e.g., bag-of-words)
(Figure: a feature set and training examples represented as sets)
9. Kernels for Sets (2/3)
- Combinations (subsets) of features are used as new features (2nd order, 3rd order, ...)
10. Kernels for Sets (3/3)
Example: dependency parsing as classification. Dependent (+1) or independent (-1)?
"I ate a cake" (PRP VBD DT NN), with the head and the modifier marked in the figure.
11. Polynomial Kernel of Degree d
Implicit form: K(X, X') = (|X ∩ X'| + 1)^d
12. Example (Cubic Kernel, d = 3)
Implicit form: K(X, X') = (|X ∩ X'| + 1)^3
Subsets of size up to 3 are used as new features
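The implicit form above is equivalent to counting the subsets shared by X and X', weighted by size: with n = |X ∩ X'|, (n + 1)^3 = 1 + 7·C(n,1) + 12·C(n,2) + 6·C(n,3). The identity can be checked numerically; the sketch below (mine, not from the slides) compares the implicit kernel with explicit subset counting on random sets.

```python
from math import comb
import random

def implicit_cubic(x, y):
    return (len(x & y) + 1) ** 3

def explicit_cubic(x, y):
    # Shared subsets of size k contribute with coefficient c_3(k) = 1, 7, 12, 6.
    n = len(x & y)
    return sum(c * comb(n, k) for k, c in enumerate((1, 7, 12, 6)))

random.seed(0)
feats = list("abcdefgh")
for _ in range(1000):
    x = set(random.sample(feats, random.randint(0, 6)))
    y = set(random.sample(feats, random.randint(0, 6)))
    assert implicit_cubic(x, y) == explicit_cubic(x, y)
```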
13. Outline
- Polynomial Kernel of degree d
- Fast Methods for the Polynomial Kernel
- PKI
- PKE
- Experiments
- Conclusions and Future Work
14. Toy Example
Feature set: F = {a, b, c, d, e}
Examples (support vectors, L = 3):
  j | X_j       | y_j α_j
  1 | {a, b, c} |  1
  2 | {a, b, d} |  0.5
  3 | {b, c, d} | -2
Kernel: K(X, X') = (|X ∩ X'| + 1)^3
Test example: X = {a, c, e}
15. PKB (Baseline)
K(X, X') = (|X ∩ X'| + 1)^3, test example X = {a, c, e}
  j | X_j       | y_j α_j | K(X_j, X)
  1 | {a, b, c} |  1      | (2 + 1)^3
  2 | {a, b, d} |  0.5    | (1 + 1)^3
  3 | {b, c, d} | -2      | (1 + 1)^3
f(X) = 1·(2+1)^3 + 0.5·(1+1)^3 - 2·(1+1)^3 = 15
- Complexity is always O(L · |X|)
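The same computation as a short sketch, with the toy numbers from the slide above (variable names are mine):

```python
# Toy support vectors from the example above, with weights y_j * alpha_j.
svs = [({"a", "b", "c"},  1.0),
       ({"a", "b", "d"},  0.5),
       ({"b", "c", "d"}, -2.0)]

def pkb_score(x, svs, d=3):
    # PKB: intersect the test set with every support vector, O(L * |X|).
    return sum(w * (len(sv & x) + 1) ** d for sv, w in svs)

print(pkb_score({"a", "c", "e"}, svs))  # 1*27 + 0.5*8 - 2*8 = 15.0
```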
16. PKI (Inverted Representation)
K(X, X') = (|X ∩ X'| + 1)^3
Inverted index (feature -> support vectors containing it; B = average list size):
  a -> {1, 2}   b -> {1, 2, 3}   c -> {1, 3}   d -> {2, 3}
Support vectors: X_1 = {a, b, c}, X_2 = {a, b, d}, X_3 = {b, c, d}, with y_j α_j = 1, 0.5, -2
Test example: X = {a, c, e}
f(X) = 1·(2+1)^3 + 0.5·(1+1)^3 - 2·(1+1)^3 = 15
- Average complexity is O(B · |X| + L)
- Efficient if feature space is sparse
- Suitable for many NL tasks
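A sketch of the PKI idea (illustrative code, not the authors' implementation): the inverted index accumulates the overlaps |X_j ∩ X| by touching only the support vectors that share at least one feature with X, after which the kernel values follow directly.

```python
from collections import defaultdict

svs = [({"a", "b", "c"},  1.0),
       ({"a", "b", "d"},  0.5),
       ({"b", "c", "d"}, -2.0)]

# Build the inverted index once: feature -> ids of support vectors containing it.
index = defaultdict(list)
for j, (sv, _) in enumerate(svs):
    for f in sv:
        index[f].append(j)

def pki_score(x, svs, index, d=3):
    overlap = defaultdict(int)            # j -> |X_j ∩ X|
    for f in x:                           # roughly B index hits per test feature
        for j in index.get(f, ()):
            overlap[j] += 1
    # One kernel evaluation per support vector, using the accumulated overlaps.
    return sum(w * (overlap[j] + 1) ** d for j, (_, w) in enumerate(svs))

print(pki_score({"a", "c", "e"}, svs, index))  # 15.0, same as PKB
```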
17. PKE (Expanded Representation)
- Convert the kernel into a linear form by calculating a weight vector w
- A mapping projects X into the space of its subsets (of size up to d)
18. PKE (Expanded Representation)
K(X, X') = (|X ∩ X'| + 1)^3, so the decision function can be rewritten as
f(X) = sgn( Σ_j y_j α_j (|X_j ∩ X| + 1)^3 + b ) = sgn( Σ_{s ⊆ X, |s| ≤ 3} w(s) + b )
19. PKE in Practice
- Hard to calculate the expansion table exactly
- Use an approximated expansion table
  - Subsets with small |w| can be removed, since w represents the contribution to the final classification
- Use a subset mining (a.k.a. basket mining) algorithm for efficient calculation
20. Subset Mining Problem
Transaction database:
  id | set
  1  | a, c, d
  2  | a, b, c
  3  | a, b, d
  4  | b, c, e
Results (here σ = 2):
  a: 3, b: 3, c: 3, d: 2, {a, b}: 2, {b, c}: 2, {a, c}: 2, {a, d}: 2
- Extract all subsets that occur in no less than σ sets of the transaction database
- With no size constraint the problem is NP-hard
- Efficient algorithms have been proposed
(e.g., Apriori, PrefixSpan)
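For illustration, a minimal Apriori-style miner (far simpler than real Apriori or PrefixSpan implementations, and written only for this transcript) reproduces the result above with σ = 2:

```python
def frequent_subsets(transactions, sigma):
    """Return {subset: support} for all subsets occurring in >= sigma transactions."""
    candidates = [frozenset([i]) for i in sorted({f for t in transactions for f in t})]
    frequent = {}
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        survivors = {c: n for c, n in counts.items() if n >= sigma}
        frequent.update(survivors)
        # Apriori-style join: combine frequent k-sets that differ in one element.
        candidates = list({a | b for a in survivors for b in survivors
                           if len(a | b) == len(a) + 1})
    return frequent

db = [{"a", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}, {"b", "c", "e"}]
for s, n in sorted(frequent_subsets(db, 2).items(),
                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(s), n)   # a:3, b:3, c:3, d:2, then ab, ac, ad, bc each with 2
```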
21. Feature Selection as Mining
Treat the support vectors as a transaction database:
  i | X_i       | y_i α_i
  1 | {a, b, c} |  1
  2 | {a, b, d} |  0.5
  3 | {b, c, d} | -2
- Can efficiently build the approximated table
- σ controls the rate of approximation
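Putting slides 17-21 together, the sketch below builds the expansion table for the cubic kernel and classifies by table lookup. It assumes w(s) = c_3(|s|) · Σ_j y_j α_j · I(s ⊆ X_j) with c_3 = (1, 7, 12, 6), which follows from the expansion of (n + 1)^3; dropping entries with |w| below a threshold plays the role of the approximation that σ controls (the paper builds the table with subset mining rather than the brute-force enumeration used here).

```python
from itertools import combinations
from collections import defaultdict

C3 = (1, 7, 12, 6)   # cubic-kernel coefficients c_3(k) for subset sizes k = 0..3

def build_table(svs, d=3, sigma=0.0):
    """Expansion table: subset (sorted tuple) -> w, keeping only |w| >= sigma."""
    w = defaultdict(float)
    for sv, weight in svs:
        for k in range(d + 1):
            for s in combinations(sorted(sv), k):
                w[s] += C3[k] * weight
    return {s: v for s, v in w.items() if abs(v) >= sigma}

def pke_score(x, table, d=3):
    # Linear classification: sum the stored weights of all subsets of x.
    feats = sorted(x)
    return sum(table.get(s, 0.0)
               for k in range(d + 1)
               for s in combinations(feats, k))

svs = [({"a", "b", "c"}, 1.0), ({"a", "b", "d"}, 0.5), ({"b", "c", "d"}, -2.0)]
table = build_table(svs)                   # sigma = 0: exact expansion
print(pke_score({"a", "c", "e"}, table))   # 15.0, same as PKB and PKI
```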
22. Outline
- Polynomial Kernel of degree d
- Fast Methods for the Polynomial Kernel
- PKI
- PKE
- Experiments
- Conclusions and Future Work
23. Experimental Settings
- Three NL tasks
- English Base-NP Chunking (EBC)
- Japanese Word Segmentation (JWS)
- Japanese Dependency Parsing (JDP)
- Kernel Settings
- Quadratic kernel is applied to EBC
- Cubic kernel is applied to JWS and JDP
24. Results (English Base-NP Chunking)
  Method          Time (sec./sent.)   Speedup   F-score
  PKB             .164                  1.0     93.84
  PKI             .020                  8.3     93.84
  PKE (σ=.01)     .0016               105.2     93.79
  PKE (σ=.005)    .0016               101.3     93.85
  PKE (σ=.001)    .0017                97.7     93.84
  PKE (σ=.0005)   .0017                96.8     93.84
25. Results (Japanese Word Segmentation)
  Method          Time (sec./sent.)   Speedup   Accuracy (%)
  PKB             .85                   1.0     97.94
  PKI             .49                   1.7     97.94
  PKE (σ=.01)     .0024               358.2     97.93
  PKE (σ=.005)    .0028               300.1     97.95
  PKE (σ=.001)    .0034               242.6     97.94
  PKE (σ=.0005)   .0035               238.8     97.94
26. Results (Japanese Dependency Parsing)
  Method          Time (sec./sent.)   Speedup   Accuracy (%)
  PKB             .285                  1.0     89.29
  PKI             .0226                12.6     89.29
  PKE (σ=.01)     .0042                66.8     88.91
  PKE (σ=.005)    .0060                47.8     89.05
  PKE (σ=.001)    .0086                33.3     89.26
  PKE (σ=.0005)   .0090                31.8     89.29
27. Results
- 2-12 fold speedup with PKI
- 30-300 fold speedup with PKE
- Accuracy is preserved when an appropriate σ is set
28. Comparison with Related Work
- XQK [Isozaki et al. 02]
- Same concept as PKE
- Designed only for the Quadratic Kernel
- Exhaustively creates the expansion table
- PKE
- Designed for general Polynomial Kernels
- Uses subset mining algorithms to create the
expansion table
29. Conclusions
- Proposed two fast methods for the polynomial kernel of degree d
  - PKI (inverted)
  - PKE (expanded)
- 2-12 fold speedup with PKI, 30-300 fold speedup with PKE
- Accuracy is preserved
30. Future Work
- Examine the effectiveness on general machine learning datasets
- Apply PKE to other convolution kernels
  - Tree Kernel [Collins 00]: dot product between trees, where the feature space is all sub-trees
  - Apply a sub-tree mining algorithm [Zaki 02]
31. English Base-NP Chunking
Extract non-overlapping noun phrases from text:
  [NP He] reckons [NP the current account deficit] will narrow to [NP only 1.8 billion] in [NP September].
- BIO representation (seen as a tagging task)
  - B: beginning of chunk
  - I: non-initial chunk
  - O: outside
- Pair-wise method for the 3-class problem
- Training: wsj 15-18, Test: wsj 20 (standard set)
32. Japanese Word Segmentation
Example sentence: "Taro made Hanako read a book" (the Japanese sentence and its candidate boundaries are shown in the original slide)
- For each point between two characters, output +1 if there is a word boundary and -1 otherwise
- Distinguish features by their relative position to the boundary
- Also use the character types of Japanese
- Training: KUC 01-08, Test: KUC 09
33Japanese Dependency Parsing
?? ???? ??? I-top cake-acc. eat
I eat a cake
- Identify the correct dependency relations
between two bunsetsu (base phrase in English) - Linguistic features related to the modifier
and head (word, POS, POS-subcat,
inflections, punctuations, etc) - Binary classification (1 dependent, -1
independent) - Cascaded Chunking Model kudo, et al. 02
- Training KUC 01-08, Test KUC 09
-
34. Kernel Methods (1/2)
Suppose a learning task with L training examples. The decision function is
  f(X) = sgn( Σ_{i=1..L} y_i α_i φ(X_i) · φ(X) + b ) = sgn( Σ_{i=1..L} y_i α_i K(X_i, X) + b )
- X: example to be classified
- X_i: training examples
- α_i: weight for examples
- φ: a function to map examples to another vectorial space
35. PKE (Expanded Representation)
If we calculate in advance, for all subsets s (I is the indicator function),
  w(s) = c_d(|s|) Σ_{i=1..L} y_i α_i I(s ⊆ X_i)
then classification reduces to summing w over the subsets of X.
36. TRIE Representation
Expansion table shown in the figure (subset -> w):
  a: 10.5, d: -10.5, {a,b}: 12, {a,c}: 12, {b,c}: -12, {b,d}: -18, {c,d}: -24, {b,c,d}: -12
(Figure: the table stored as a TRIE rooted at "root", so common prefixes are shared)
- Compress redundant structures
- Classification can be done by simply
traversing the TRIE
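A sketch of such a traversal (mine, not the authors' implementation), using the exact expansion table for the toy support vectors as computed in the PKE sketch above: each subset is stored along a trie path of sorted features, and classification walks only the branches whose features occur in X, so shared prefixes are visited once.

```python
from itertools import combinations
from collections import defaultdict

# Rebuild the exact expansion table for the toy support vectors (cubic kernel),
# exactly as in the PKE sketch above: subset (sorted tuple) -> weight w.
svs = [({"a", "b", "c"}, 1.0), ({"a", "b", "d"}, 0.5), ({"b", "c", "d"}, -2.0)]
C3 = (1, 7, 12, 6)
table = defaultdict(float)
for sv, weight in svs:
    for k in range(4):
        for s in combinations(sorted(sv), k):
            table[s] += C3[k] * weight

class Node:
    def __init__(self):
        self.children = {}   # feature -> child Node
        self.w = 0.0         # weight of the subset that ends at this node

def build_trie(table):
    root = Node()
    for subset, w in table.items():
        node = root
        for f in subset:
            node = node.children.setdefault(f, Node())
        node.w = w
    return root

def trie_score(x, root):
    """Sum w over all subsets of x by walking only the matching trie branches."""
    feats = sorted(x)
    def walk(node, start):
        total = node.w
        for i in range(start, len(feats)):
            child = node.children.get(feats[i])
            if child is not None:
                total += walk(child, i + 1)
        return total
    return walk(root, 0)

print(trie_score({"a", "c", "e"}, build_trie(table)))  # 15.0, same as before
```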
37. Kernel Methods
(Figure: training data and the kernel-based decision function, as on slide 7)
- No need to represent an example as an explicit feature vector
- Complexity of testing is O(L · |X|)