Project Part 2 - PowerPoint PPT Presentation

1
Project Part 2
  • LING 572
  • Fei Xia
  • 1/26/06

2
NLP Packages
  • FST: Carmel, AT&T toolkit
  • TBL: fnTBL
  • MaxEnt
  • DT: C4.5
  • Boosting: AdaBoost
  • LM: SRI LM
  • MT: GIZA, Pharaoh

3
Main steps
  • Download and compile the package, and test the
    code with given examples.
      • License, citation
      • Compilers, libraries, operating system
  • Create your own test data, write a few
    wrappers/converters, and test the code.
      • Fix bugs
  • Understand the main algorithm of the package
      • Read README files, tutorials, and related papers
      • Check the source code.
  • Modify and improve the package
  • Run experiments
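
The "wrappers/converters" step can be sketched as follows. This is a minimal Python sketch, not part of the course files: it assumes the raw data has one sentence per line in word/TAG form and that the package wants one "word TAG" pair per line with a blank line between sentences. Both formats are assumptions, so check the package README for the format it actually expects. The function names are hypothetical.

```python
def split_token(token, sep="/"):
    """Split 'word/TAG' on the LAST separator so words like '1/2' survive."""
    word, found, tag = token.rpartition(sep)
    if not found or not word or not tag:
        raise ValueError(f"token is not in word{sep}TAG form: {token!r}")
    return word, tag


def slashed_to_columns(lines, sep="/"):
    """Convert 'w1/T1 w2/T2 ...' sentences to one 'word TAG' pair per line.

    Each sentence in the output is followed by a blank line.
    The input and output formats are assumptions, not the package's spec.
    """
    out = []
    for line in lines:
        tokens = line.split()
        if not tokens:          # skip empty input lines
            continue
        for token in tokens:
            word, tag = split_token(token, sep)
            out.append(f"{word} {tag}")
        out.append("")          # blank line marks the sentence boundary
    return out


if __name__ == "__main__":
    sample = ["The/DT dog/NN barks/VBZ ./.", "", "1/2/CD cup/NN"]
    print("\n".join(slashed_to_columns(sample)))
```

Splitting on the last separator matters because tokens such as fractions contain the separator inside the word itself.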

4
Using fnTBL
  • Download and compile the package, and test the
    code (< 1 hour)
  • Create your own test data, write a few
    wrappers/converters, and test the code
    (about 6 hrs, my time)
  • Understand the main algorithm of the package
    (?? hrs)
  • Modify and improve the package (?? hrs)
  • Run experiments (computer time)
      • 12 experiments

5
Main tasks
  • Understand the code
      • Core algorithm: fnTBL-1.1/src
      • POS tagger: perl_code/pos-train.prl and
        pos-apply.prl
      • A wrapper: perl_code/build_TBL_tagger1.pl
  • Modify the code
      • Here you don't need to change the core algorithm.
      • A new way of treating unknown words.
  → In Report2, explain the algorithms and your
    modification
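
One possible shape for a simple unknown-word treatment, sketched in Python. This is an illustration only, not the treatment the assignment prescribes and not how fnTBL handles unknown words: it assumes words outside the training vocabulary are replaced by coarse pseudo-tokens based on surface form before tagging. The class names and the suffix list are hypothetical.

```python
import re

# Suffixes checked in order; the list is illustrative, not tuned.
SUFFIXES = ("ing", "ed", "ly", "s")


def word_class(word):
    """Map a word to a coarse pseudo-token based on its surface form."""
    # Numbers such as 42, 3.5, 1,000 or 1/2: must contain at least one digit.
    if re.fullmatch(r"[\d.,/-]*\d[\d.,/-]*", word):
        return "<NUM>"
    if word[:1].isupper():
        return "<CAP>"
    if "-" in word:
        return "<HYPH>"
    for suffix in SUFFIXES:
        # Require a stem of at least 3 characters before the suffix.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return f"<UNK-{suffix}>"
    return "<UNK>"


def replace_unknown(words, vocab):
    """Keep words found in vocab; replace all others by their pseudo-token."""
    return [w if w in vocab else word_class(w) for w in words]


if __name__ == "__main__":
    vocab = {"the", "dog", "saw"}
    print(replace_unknown(["the", "dog", "saw", "Zorblat", "glorping"], vocab))
```

For the tagger to learn rules about these pseudo-tokens, rare training words would need the same replacement, which is a design choice to describe in Report2.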

6
Main tasks (cont)
  • Run the code with different settings
      • Corpus size: 1K, 5K, 10K, 40K
      • Feature templates: all the types or a subset
      • Treatment of unknown words
  → Report 1
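
The settings above give 4 corpus sizes times 3 tagger variants, that is, the 12 experiments. A minimal Python sketch for enumerating them, using the cell names a11 to a43 and the tagger labels from Report1. The dictionary layout is an assumption for illustration; the actual commands for each run are left out because they depend on the course scripts.

```python
CORPUS_SIZES = ["1K", "5K", "10K", "40K"]

# Labels taken from the Report1 columns.
TAGGERS = [
    ("tagger1.pl", "standard case"),
    ("tagger2.pl", "fewer feature types"),
    ("tagger3.pl", "simple treatment for unknown words"),
]


def experiment_grid():
    """Return one record per experiment, ordered a11, a12, ..., a43."""
    grid = []
    for i, size in enumerate(CORPUS_SIZES, start=1):
        for j, (tagger, setting) in enumerate(TAGGERS, start=1):
            grid.append({
                "cell": f"a{i}{j}",      # row i = corpus size, column j = tagger
                "corpus_size": size,
                "tagger": tagger,
                "setting": setting,
            })
    return grid


if __name__ == "__main__":
    for exp in experiment_grid():
        print(exp["cell"], exp["corpus_size"], exp["tagger"], exp["setting"])
```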

7
Report1

  # of sents | standard case | fewer feature types | w/ simple treatment for unknown words
             | (tagger1.pl)  | (tagger2.pl)        | (tagger3.pl)
  -----------+---------------+---------------------+---------------------------------------
  1K         | a11           | a12                 | a13
  5K         | a21           | a22                 | a23
  10K        | a31           | a32                 | a33
  40K        | a41           | a42                 | a43

  • Replace each cell with a (b, c, d):
      • a: tagging accuracy
      • b: # of lexical rules
      • c: # of context rules
      • d: running time

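Filling in a cell can be sketched as follows. This is a minimal Python sketch under stated assumptions: accuracy is taken as the fraction of tokens whose predicted tag equals the gold tag, the rule counts and running time are supplied by the caller, and the output layout (percentage, seconds) is a hypothetical choice, not a required report format.

```python
def tagging_accuracy(gold_tags, predicted_tags):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold_tags:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


def format_cell(accuracy, n_lexical_rules, n_context_rules, seconds):
    """Render one Report1 cell in the a (b, c, d) layout."""
    return f"{accuracy:.2%} ({n_lexical_rules}, {n_context_rules}, {seconds:.0f}s)"


if __name__ == "__main__":
    gold = ["DT", "NN", "VBZ", "."]
    pred = ["DT", "NN", "NN", "."]
    acc = tagging_accuracy(gold, pred)
    # The rule counts and time below are made-up placeholders, not results.
    print(format_cell(acc, 10, 20, 90))
```
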
8
Files for the project
  • Files given to you
      • fnTBL-1.1.linux.tar.gz
      • params/
      • data/
      • perl_code/
  • Files that will be produced by you
      • new_params/: feature templates
      • new_perl_code/: build_TBL_tagger3.pl,
        pos-train3.prl and pos-apply3.prl.
      • report/: Report1 and Report2
      • result/: a11/, a12/, ..., a43/