javaConLib

About This Presentation
Title:

javaConLib

Description:

Title: Dialogue Act Coding and Modalities. Author: leifg. Last modified by: leifg. Created: 6/9/2002 8:36:10 PM. Presentation format: On-screen Show.


Transcript and Presenter's Notes

1
javaConLib
  • GSLT: Java Development for HLT, Leif Grönqvist
    (leifg_at_ling.gu.se)
  • 11 June 2002, 10:30

2
What have I done?
  • I have implemented a library for various kinds of
    context-based word sense disambiguation
  • From the beginning I have maintained a test method
    that tries to provoke errors in each part of the
    implementation
  • A command-line application that uses the library to
    implement Yarowsky (1995)
  • I have tried to write final-quality code from the start

3
What is left to do?
  • One very simple test implementation
  • Tutorial-based documentation
  • Fix the things Lars pointed out in the last
    iteration
  • Write an Ant build script
  • The final report

4
Project Background
  • There are several methods for context-based word
    sense disambiguation. For example:
  • Yarowsky's unsupervised algorithm from 1995 is
    based on two general observations:
  • One sense per collocation: nearby words provide
    strong and consistent clues
  • One sense per discourse: the sense of a target
    word is highly consistent within any document
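
The "one sense per discourse" observation can be used as a simple post-processing step: once some occurrences in a document have tentative labels, the majority label is a reasonable guess for the rest. The following is a minimal sketch under that assumption. The class and method names are hypothetical and are not taken from javaConLib, and unlabeled occurrences are assumed to be represented by null.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a javaConLib class.
class OneSensePerDiscourse {
    /**
     * Returns the most frequent non-null sense label among the occurrences
     * of a target word in one document, or null if nothing is labeled.
     * On a tie, the label that reaches the top count first wins.
     */
    static String majoritySense(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String label : labels) {
            if (label == null) {
                continue; // occurrence not yet disambiguated
            }
            int c = counts.merge(label, 1, Integer::sum);
            if (c > bestCount) {
                best = label;
                bestCount = c;
            }
        }
        return best;
    }
}
```

A caller could then assign the returned sense to every unlabeled occurrence in the same document.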

5
(No Transcript)
6
(No Transcript)
7
A much simpler supervised approach
  • Start with a disambiguated set of occurrences
  • Count all word types within a ±5-word context
    for each sense
  • To disambiguate a new occurrence, compare its
    context to each possible sense's distribution
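
The three steps above can be sketched roughly as follows. This is an illustration only: the class name is hypothetical and not part of javaConLib, and the comparison used here (add-one smoothed log-probability of the context words under each sense) is one plausible choice that the slides do not specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not a javaConLib class.
class ContextCountClassifier {
    private final int window; // context words counted on each side of the target
    private final Map<String, Map<String, Integer>> countsBySense = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();

    ContextCountClassifier(int window) {
        this.window = window;
    }

    /** Counts the words around tokens[target] for an occurrence with a known sense. */
    void train(List<String> tokens, int target, String sense) {
        Map<String, Integer> counts =
                countsBySense.computeIfAbsent(sense, k -> new HashMap<>());
        for (String w : contextOf(tokens, target)) {
            counts.merge(w, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    /** Returns the sense whose counts best fit the context, or null if untrained. */
    String classify(List<String> tokens, int target) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Integer>> e : countsBySense.entrySet()) {
            Map<String, Integer> counts = e.getValue();
            int total = 0;
            for (int c : counts.values()) {
                total += c;
            }
            double score = 0.0;
            for (String w : contextOf(tokens, target)) {
                // Add-one smoothing (assumption): unseen words get a small probability.
                double p = (counts.getOrDefault(w, 0) + 1.0)
                        / (total + vocabulary.size() + 1.0);
                score += Math.log(p);
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    /** Words within the window around the target, excluding the target itself. */
    private List<String> contextOf(List<String> tokens, int target) {
        List<String> context = new ArrayList<>();
        int from = Math.max(0, target - window);
        int to = Math.min(tokens.size() - 1, target + window);
        for (int i = from; i <= to; i++) {
            if (i != target) {
                context.add(tokens.get(i));
            }
        }
        return context;
    }
}
```

With a window of 5, training on a few labeled occurrences of a word and then calling classify on a new token list returns the label whose context counts fit best.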

8
javaConLib
  • These two algorithms have a lot in common
  • There are many more similar algorithms
  • javaConLib includes classes that greatly simplify
    implementation and tuning
  • Higher-order, intuitive methods: the main
    class will look more like an algorithm description

9
Typical parts of a main class
  • Yarowsky y = new Yarowsky(5)
  • Corpus trainCorp = new Corpus("train.txt")
  • SenseSet s1 = new SenseSet(ägerägde, Abs,
    y.posl1)
  • DecisionList decList = y.train95(s1, s2, rum,
    trainCorp)
  • ContextList testCont = y.test95(decList,
    testCorpus, s1, s2, word)
  • print(testCont.toString())

10
The Classes
  • Context: An array of words of a specific size, with
    the main word at position 0.
  • ContextList: A set of Contexts around a certain
    word type, extracted from a corpus.
  • Corpus: Basically a vector containing
    words read from a file.
  • Decision: Contains a word, a position,
    and a score indicating how reliably it determines the
    sense of the main word in a context.
  • DecisionList: A decision list like the one used in
    Yarowsky's algorithm from 1995.
  • FreqList: A frequency list for strings in a
    corpus.
  • Positions: Holds a list of positions (integers)
    relative to the center word when working with
    words and contexts.
  • SenseSet: A set of the necessary components for
    each sense when using Yarowsky's 1995 algorithm
    for word sense disambiguation.
  • Yarowsky: A class with some structures and
    classes useful when implementing Yarowsky's
    disambiguation algorithm from 1995, and similar
    algorithms.
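
To make the Decision and DecisionList descriptions concrete, here is a minimal sketch of how such a pair could work. These are not the javaConLib classes, whose actual fields and methods the slides do not show. The names SketchDecision and SketchDecisionList, the smoothing constant alpha, and the use of a plain array plus a center index for the context are all assumptions. The score is a smoothed absolute log-likelihood ratio, the kind of ranking Yarowsky's decision lists use.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; NOT the javaConLib Decision class.
class SketchDecision {
    final String word;   // collocation word
    final int position;  // position relative to the main word, e.g. -1 or +1
    final String sense;  // sense this rule points to
    final double score;  // how reliable the rule is

    SketchDecision(String word, int position, String sense, double score) {
        this.word = word;
        this.position = position;
        this.sense = sense;
        this.score = score;
    }

    /** True if the context has this rule's word at its relative position. */
    boolean matches(String[] context, int center) {
        int i = center + position;
        return i >= 0 && i < context.length && context[i].equals(word);
    }
}

// Hypothetical sketch; NOT the javaConLib DecisionList class.
class SketchDecisionList {
    private final List<SketchDecision> rules = new ArrayList<>();

    /** Smoothed absolute log-likelihood ratio of two sense counts for one collocation. */
    static double score(int countA, int countB, double alpha) {
        return Math.abs(Math.log((countA + alpha) / (countB + alpha)));
    }

    /** Adds a rule and keeps the list sorted by descending score. */
    void add(SketchDecision d) {
        rules.add(d);
        rules.sort(Comparator.comparingDouble((SketchDecision r) -> r.score).reversed());
    }

    /** Returns the sense of the highest-scoring matching rule, or null if none matches. */
    String classify(String[] context, int center) {
        for (SketchDecision r : rules) {
            if (r.matches(context, center)) {
                return r.sense;
            }
        }
        return null;
    }
}
```

Only the single best matching rule decides the sense, which is what distinguishes a decision list from a scheme that sums evidence over all context words.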

11
We are done
  • And probably out of time