Support Vector Machine Based Orthographic Disambiguation

Transcript and Presenter's Notes

Title: Support Vector Machine Based Orthographic Disambiguation


1
Support Vector Machine Based Orthographic Disambiguation
Eiji ARAMAKI, Takeshi IMAI, Kengo MIYO, Kazuhiko OHE
Hospital
Are "center" and "centre" equivalent?
We focus on Japanese, but the proposed method
is language-independent.
2
Background
  • Japanese in particular contains a great deal of orthographic
    variation because of its many transliterated words.

Avogadro
  Transliteration: アボガドロ (A BO GA DO RO)
  Transliteration: アヴォガドロ (A VO GA DO RO)
Equivalent or not?
SVM-based classifier
(1) Building the training set
(2) Defining the features
3
(1) Training set
collected from multiple translation dictionaries
  • Positive example: a term pair whose members are spelled
    differently but have the same meaning.
  • Negative example: a term pair whose members are spelled
    differently and have different meanings.

Same English translation (positive example)
Different English translation (negative example)
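
The collection step above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the dictionaries have already been flattened into (term, English translation) tuples, and the function name `build_pairs` and the three sample entries are hypothetical. Terms that share an English translation form a positive pair; terms with no translation in common form a negative pair.

```python
from collections import defaultdict
from itertools import combinations


def build_pairs(entries):
    """Split all term pairs into positives and negatives.

    entries: iterable of (term, english_translation) tuples, assumed to be
    merged from several translation dictionaries (hypothetical input format).
    """
    translations = defaultdict(set)
    for term, english in entries:
        translations[term].add(english.lower())

    positives, negatives = [], []
    # Distinct dictionary keys are, by construction, spelled differently.
    for a, b in combinations(sorted(translations), 2):
        if translations[a] & translations[b]:
            positives.append((a, b))   # same English translation
        else:
            negatives.append((a, b))   # different English translation
    return positives, negatives


# Illustrative entries only; not taken from the authors' data.
entries = [
    ("アボガドロ", "Avogadro"),
    ("アヴォガドロ", "Avogadro"),
    ("アボカド", "avocado"),
]
positives, negatives = build_pairs(entries)
```

Enumerating every pair grows quadratically with the vocabulary, so a real pipeline would probably restrict negatives to similar-looking terms; the slides do not say how that was handled.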
4
(2) Features for SVM
  • The differing characters and their surrounding characters
    (window size = 1: pre-context and post-context)

[Figure: an example term pair (term1, term2; label True) divided into pre-context, differing characters (Diff.), and post-context, with the feature tuples derived from them]
  • Combinations of these features
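
A rough sketch of how such features might be extracted is given below. It assumes the two terms differ in a single contiguous region, located by stripping their common prefix and suffix; the slides do not state how the authors aligned the terms, and the function name `diff_features` and the feature keys are hypothetical.

```python
def diff_features(term1, term2, window=1):
    """Return the differing characters plus their context combinations."""
    shortest = min(len(term1), len(term2))

    # Length of the common prefix.
    p = 0
    while p < shortest and term1[p] == term2[p]:
        p += 1

    # Length of the common suffix, not overlapping the prefix.
    s = 0
    while (s < shortest - p
           and term1[len(term1) - 1 - s] == term2[len(term2) - 1 - s]):
        s += 1

    diff = (term1[p:len(term1) - s], term2[p:len(term2) - s])
    pre = term1[max(0, p - window):p]                        # pre-context
    post = term1[len(term1) - s:len(term1) - s + window]     # post-context

    return {
        "diff": diff,
        "pre+diff": (pre, diff),
        "diff+post": (diff, post),
        "pre+diff+post": (pre, diff, post),
    }


features = diff_features("アボガドロ", "アヴォガドロ")
```

Each entry of the returned dictionary would become one binary feature (present or absent) in the vector passed to the SVM.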

5
Experiments
  • Test set: 883 medical term pairs (312 positive)
  • Methods
  • (1) EDIT DISTANCE (th): if SIM > th then 1
  • (2) BYHAND: SVM with a handmade training set of 4,130 examples
  • (3) AUTOMATIC: SVM with an automatically built training set
    of 68,608 examples
  • (4) COMBINATION: SVM with the full training set
    (BYHAND + AUTOMATIC)
  • Evaluation
  • Results
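
The EDIT DISTANCE baseline in method (1) could look roughly like the sketch below. The slides do not define SIM, so this assumes a similarity of one minus the Levenshtein distance divided by the longer term's length; the function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed definition of SIM: 1 - distance / length of the longer term."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def edit_distance_classifier(a, b, th):
    """Return 1 (equivalent) if SIM > th, otherwise 0."""
    return 1 if similarity(a, b) > th else 0
```

Under this definition, "center" and "centre" are two edits apart out of six characters, so the pair is accepted at th = 0.6 and rejected at th = 0.7.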

6
Conclusion
  • Discussion
  • Why is AUTOMATIC < BYHAND?
  • Because of corpus-specific styles ("hepatitis-B"
    or "HepatitisB")
  • BYHAND corpus = test-set corpus ≠ AUTOMATIC corpus
  • Conclusion
  • We proposed a discriminative orthographic
    disambiguation method.
  • We also proposed a method for collecting both
    positive and negative examples.
  • The experiments yielded high accuracy (87.8%),
    demonstrating the feasibility of the proposed approach.

Unfortunately, Bergsma (ACL 2007) proposed a similar
method.
In the future, we will employ more features to
improve accuracy.