Implementing FastTBL in Oz

1
Implementing FastTBL in Oz
  • Leif Grönqvist (lgr@msi.vxu.se)
  • Fredrik Kronlid (kronlid@ling.gu.se)

2
TBL
  • The training phase
      • Input
          • Annotated corpus
          • Rule templates
      • Output
          • Sequence of rules (best rule first)
  • Annotation phase
      • Input
          • Sequence of rules
          • Un-annotated corpus
          • Lexicon for initial annotations
      • Output
          • Annotated corpus
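
The annotation phase can be sketched as follows. This is a minimal illustration in Python rather than Oz, not the authors' code. The rule shape (change one tag to another when the previous tag matches), the default tag and the toy data are all assumptions made for the example.

```python
# Hypothetical rule shape: (from_tag, to_tag, prev_tag), meaning
# "change from_tag to to_tag if the previous tag is prev_tag".

def initial_annotation(words, lexicon, default_tag="nn"):
    """Tag each word from the lexicon, falling back to an assumed default."""
    return [lexicon.get(word, default_tag) for word in words]

def apply_rule(tags, rule):
    """Apply one rule to the whole corpus, in place."""
    from_tag, to_tag, prev_tag = rule
    old = list(tags)  # conditions are tested against the tags before this rule
    for i in range(1, len(tags)):
        if old[i] == from_tag and old[i - 1] == prev_tag:
            tags[i] = to_tag
    return tags

def annotate(words, lexicon, rules):
    """Initial annotation from the lexicon, then the rules in learned order."""
    tags = initial_annotation(words, lexicon)
    for rule in rules:
        apply_rule(tags, rule)
    return tags

# Toy data, invented for this example.
words = ["the", "can", "rusts"]
lexicon = {"the": "dt", "can": "vb", "rusts": "vb"}
rules = [("vb", "nn", "dt")]
result = annotate(words, lexicon, rules)
```

The order of the rule sequence matters, because each rule sees the corpus as left by the rules before it.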

3
Vanilla TBL (VTBL)
  • Rules are selected greedily
  • Corpus annotations updated after each rule
    selected
  • Continue until enough rules or no errors left
  • Number of possible rules in each iteration is
    very high
  • Grows with the size of the tagset, the number of
    templates, and the number of variables in the templates
  • No results used from earlier iterations
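
A minimal sketch of the greedy loop, in Python rather than Oz and only for illustration. It assumes a single hypothetical template ("change tag A to B if the previous tag is C"), a score of corrected minus newly broken positions, and an invented toy corpus.

```python
from itertools import product

def apply_rule(tags, rule):
    """Return a new tag list with the rule applied everywhere it fires."""
    from_tag, to_tag, prev_tag = rule
    new = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            new[i] = to_tag
    return new

def score(tags, truth, rule):
    """Net error reduction: corrected positions minus newly broken ones."""
    new = apply_rule(tags, rule)
    good = sum(1 for o, n, t in zip(tags, new, truth) if o != t and n == t)
    bad = sum(1 for o, n, t in zip(tags, new, truth) if o == t and n != t)
    return good - bad

def train_vanilla(tags, truth, tagset, max_rules=10):
    """Greedy selection; nothing is carried over between iterations."""
    tags = list(tags)
    learned = []
    while len(learned) < max_rules:
        # The whole rule space is rebuilt and rescored in every iteration.
        candidates = [r for r in product(tagset, repeat=3) if r[0] != r[1]]
        best = max(candidates, key=lambda r: score(tags, truth, r))
        if score(tags, truth, best) <= 0:
            break  # no rule reduces the error any further
        learned.append(best)
        tags = apply_rule(tags, best)  # corpus updated after each selection
    return learned, tags

# Toy corpus, invented for this example.
truth = ["dt", "nn", "vb", "dt", "nn"]
initial = ["dt", "vb", "vb", "dt", "vb"]
learned, final_tags = train_vanilla(initial, truth, ["dt", "nn", "vb"])
```

Even with one template and three tags, every iteration scores 18 candidate rules against the full corpus, which is the cost the later variants try to avoid.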

4
TBL à la Ramshaw & Marcus
  • Only applicable rules (those correcting at least
    one sample) are generated
  • The set of rules is saved between iterations
  • For each rule there is
      • a score
      • a list of affected positions in the corpus
  • For all samples there is a list of applicable
    rules
  • Much faster than vanilla TBL
  • Needs much more memory than VTBL
  • The update of the needed structures takes a large
    amount of the time used during training

5
FastTBL
  • Ngai, G. & Florian, R. (2001), Transformation-Based
    Learning in the Fast Lane, in Proceedings of
    the 39th ACL Conference.
  • Similar to Ramshaw & Marcus
  • Only applicable rules (those correcting at least
    one sample) are generated
  • The set of rules is saved between iterations
  • For each rule there is
      • a score consisting of a Good part and a Bad part
  • For each selected rule, the scores for all rules
    are updated
  • The vicinity of a sample tells the system which
    samples the classification may depend on
  • Much faster than Ramshaw & Marcus
  • Needs much less memory than R&M
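
One way to read the Good/Bad bookkeeping is sketched below, in Python rather than Oz. It is an interpretation under stated assumptions, not the procedure from Ngai & Florian's paper: a single hypothetical template ("change tag A to B if the previous tag is C"), a vicinity consisting of the changed position and its right neighbour, and invented toy data.

```python
from collections import defaultdict

def rules_at(tags, i, tagset):
    """All rules of the single assumed template that fire at position i."""
    if i == 0:
        return []
    return [(tags[i], to, tags[i - 1]) for to in tagset if to != tags[i]]

def contribution(truth, rule, i):
    """(good, bad) contributed at position i by a rule that fires there."""
    from_tag, to_tag, _ = rule
    if to_tag == truth[i]:
        return (1, 0)  # a wrong tag would become right
    if from_tag == truth[i]:
        return (0, 1)  # a right tag would become wrong
    return (0, 0)

def adjust(counts, tags, truth, tagset, positions, sign):
    """Add (sign=+1) or remove (sign=-1) the contributions at some positions."""
    for j in positions:
        for rule in rules_at(tags, j, tagset):
            good, bad = contribution(truth, rule, j)
            counts[rule][0] += sign * good
            counts[rule][1] += sign * bad

def initial_counts(tags, truth, tagset):
    """One full pass over the corpus, done once before training."""
    counts = defaultdict(lambda: [0, 0])
    adjust(counts, tags, truth, tagset, range(len(tags)), +1)
    return counts

def apply_and_update(tags, truth, tagset, chosen, counts):
    """Apply the chosen rule and repair the counts only in the vicinity."""
    from_tag, to_tag, prev_tag = chosen
    changed = [i for i in range(1, len(tags))
               if tags[i] == from_tag and tags[i - 1] == prev_tag]
    # Assumed vicinity: a changed tag affects rules firing at i and i + 1.
    vicinity = sorted({j for i in changed for j in (i, i + 1) if j < len(tags)})
    adjust(counts, tags, truth, tagset, vicinity, -1)
    for i in changed:
        tags[i] = to_tag
    adjust(counts, tags, truth, tagset, vicinity, +1)

# Toy corpus, invented for this example.
truth = ["dt", "nn", "vb", "dt", "nn"]
tags = ["dt", "vb", "vb", "dt", "vb"]
tagset = ["dt", "nn", "vb"]
counts = initial_counts(tags, truth, tagset)
chosen = ("vb", "nn", "dt")
before = list(counts[chosen])
apply_and_update(tags, truth, tagset, chosen, counts)
```

Only the positions in the vicinity are revisited after a rule is applied, and no per-rule list of corpus positions is kept, which is where the time and memory savings over the previous approach would come from in this reading.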

6
The algorithm in a nutshell

7
Ngai & Florian's description
  • Unclear what to store and how
  • When should chosen rules be applied to the
    corpus?
  • Notations like b(s) and p(b(s)) seem a bit sloppy:
    what do they mean?
  • b: rule, p: predicate, s: sample
  • How do we define vicinity?
  • An algorithm description should describe the
    algorithm

8
Implementation
  • Oz
  • Data structures
  • Programming paradigms

9
Oz
  • A multi-paradigm language
      • Object-oriented
      • Functional
      • Concurrent
      • Distributed
      • Declarative
      • Stateful
      • ....

10
Data & Paradigm
  • We use functional programming, logic programming
    and imperative programming techniques.
  • Functional programming: some higher-order
    functions
  • Logic programming: extensive use of tuples
  • Corpus: Array; rule collection: Dictionary.
    Both are stateful structures, so assignment works

11
Corpus representation
  • Many ways to access data in Oz
  • Data from µ-TBL (SUC)
  • Solution
      • (Script to transform SUC to µ-TBL format)
      • Perl script for transforming SUC data into a
        tuple in a functor (Oz module)
      • Tuple converted to Array for mutability

12
Template representation
  • µ-TBL template representation allows
      • Constraints on any feature
      • Conjunction of constraints
      • Disjunction of positions
      • Target can change any feature
  • tag:A>B <- wd:C@[0] & tag:D@[-1,-2]
  • Requires an elaborate template compiler

13
Templates, cont'd
  • FoztTBL templates allow
      • Constraints on tags and words
      • Conjunction of constraints
  • The only target is "change tag A into B"
  • Our template format also makes it possible to
    generate rules of the form "change any tag to B"

14
Templates, cont'd
  • We use an extremely simple template formalism
  • template(wd(0 0 c 0 0) tag(0 0 a d 0))
  • tag:A>B <- wd:C@[0] & tag:A@[0] & tag:D@[1]
  • A FoztTBL template is instantiated into a
    Predicate by pattern matching
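
Instantiating a template at a corpus position might look like the sketch below. It is written in Python rather than Oz and is not the authors' code. It assumes that the five slots cover offsets -2 to +2 (which matches the example above), that a 0 slot means "unconstrained", and that the corpus is a list of word/tag records. The sample words are invented.

```python
EMPTY = None  # stand-in for the EMPTY marker of an unconstrained slot

# template(wd(0 0 c 0 0) tag(0 0 a d 0)) as a Python dict of 5-slot tuples.
TEMPLATE = {"wd": (0, 0, "c", 0, 0), "tag": (0, 0, "a", "d", 0)}

def instantiate(template, corpus, i):
    """Fill every variable slot with the corpus value at position i + offset.

    Returns None when a variable slot would fall outside the corpus.
    """
    predicate = {}
    for feature, slots in template.items():
        filled = []
        for offset, slot in zip(range(-2, 3), slots):
            if slot == 0:
                filled.append(EMPTY)  # no constraint at this offset
                continue
            j = i + offset
            if not 0 <= j < len(corpus):
                return None
            filled.append(corpus[j][feature])
        predicate[feature] = tuple(filled)
    return predicate

# Toy corpus, invented for this example.
corpus = [
    {"wd": "tala", "tag": "vb"},
    {"wd": "om", "tag": "pp"},
    {"wd": "det", "tag": "pn"},
]
predicate = instantiate(TEMPLATE, corpus, 1)
```

At position 1 the variables c, a and d are bound to "om", "pp" and "pn"; at the last position the template cannot be instantiated because the slot at offset +1 runs off the corpus.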

15
Representing the rules
  • Rule: predicate and target, plus additional info
  • Tuple of tuples
  • rule(P T F UsedFlag)
      • P = predicate(wd(EMPTY EMPTY om EMPTY EMPTY)
        tag(EMPTY EMPTY pp EMPTY EMPTY))
      • T = target(sn)
      • F = f(Good, Bad)
      • UsedFlag = a flag to indicate whether the rule
        has been applied or not
  • (tag:pp>sn <- wd:om@[0] & tag:pp@[0])

16
Representing the rule collection
  • FastTBL requirement: instant (constant-time)
    access to all rules with a certain predicate
  • Oz Dictionaries: hash tables with atoms as keys
  • Solution
      • Dictionary
      • Functions for rapid conversion between
        predicate and atom
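
The idea of keying the rule collection by predicate can be sketched as follows, in Python rather than Oz. A Python dict with string keys stands in for an Oz Dictionary with atom keys. The key format, the helper names and the Good/Bad values are assumptions made for this example, and the conversion assumes that feature values never contain "|", ":" or "," and are never "_".

```python
def predicate_to_key(predicate):
    """Flatten a predicate into one string key (stand-in for an Oz atom)."""
    return "|".join(
        feature + ":" + ",".join("_" if v is None else v for v in slots)
        for feature, slots in sorted(predicate.items())
    )

def key_to_predicate(key):
    """Inverse of predicate_to_key."""
    predicate = {}
    for part in key.split("|"):
        feature, _, values = part.partition(":")
        predicate[feature] = tuple(
            None if v == "_" else v for v in values.split(",")
        )
    return predicate

def add_rule(rules, predicate, target, good, bad):
    """Store [good, bad] under the predicate key, one entry per target tag."""
    rules.setdefault(predicate_to_key(predicate), {})[target] = [good, bad]

def rules_with_predicate(rules, predicate):
    """Average constant-time lookup of all rules sharing a predicate."""
    return rules.get(predicate_to_key(predicate), {})

# The predicate from the rule-representation slide; the scores are invented.
predicate = {
    "wd": (None, None, "om", None, None),
    "tag": (None, None, "pp", None, None),
}
rules = {}
add_rule(rules, predicate, "sn", 3, 1)
add_rule(rules, predicate, "ab", 1, 2)
key = predicate_to_key(predicate)
found = rules_with_predicate(rules, predicate)
```

Grouping the targets under one predicate key means that a single hash lookup returns every rule that shares the predicate, which is the access pattern the requirement above asks for.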

17
Conclusions
  • FastTBL
      • Has better time complexity than VTBL and RMTBL
      • Uses less memory than RMTBL
      • Gives the same result
  • The multiparadigmatic nature of Oz makes
    programming easier
  • An algorithm appearing in a reviewed paper isn't
    necessarily complete, comprehensible or easily
    implementable.

18
Future Work
  • Understand the algorithm in detail
  • Read the C code
  • Use the ideas in Ngai/Florian and reinvent it
  • Implement and read Ngai/Florian iteratively using
    a debugger
  • Finish off the implementation
  • Improve on the clarity of the implementation and
    the algorithm description
  • Make the template formalism more flexible,
    keeping its simplicity