ML for NLP: Ling 572 - PowerPoint PPT Presentation

1
ML for NLP: Ling 572
  • Achim Ruopp
  • Zhengbo Zhou
  • Albert Bertram

2
Outline
  • Tagging Introduction
  • Blah, blah, blah, you did this too
  • And now for something possibly different
  • Results
  • Comments
  • Questions
  • Snide remarks

3
Tagging introduction
  • Tagging is pretty simple
  • Relatively good results tagging with
    • $\hat{t} = \arg\max_{t \in \mathrm{Tagset}} p(t \mid \mathit{word})$
  • The problem is well understood
  • The effects of algorithm changes are clearly visible
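The argmax baseline above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the names `train_mft` and `tag_mft` are hypothetical, p(t | word) is estimated by relative frequency (so the argmax reduces to the tag seen most often with each word), and unseen words fall back to the most frequent tag overall, which is an assumption.

```python
from collections import Counter, defaultdict

def train_mft(tagged_sents):
    """Count (word, tag) pairs and keep argmax_t p(t | word) for each word."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    # p(t | word) = count(word, t) / count(word); the argmax only needs the counts.
    best = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    # Fallback for unseen words: the most frequent tag overall (an assumption).
    overall = Counter(t for s in tagged_sents for _, t in s)
    default = overall.most_common(1)[0][0]
    return best, default

def tag_mft(model, words):
    """Tag each word with its most frequent training tag."""
    best, default = model
    return [(w, best.get(w, default)) for w in words]
```

For example, if "runs" is tagged VBZ twice and NNS once in training, `tag_mft` always tags it VBZ, regardless of context.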

4
Preaching to the Choir
  • You all did this too
  • P1 HMM/FST trigram tagger
  • P2 TBL tagger
  • P3 Maximum Entropy tagger

5
Forking Processes
  • Project 4 offered some choices
  • Self-training FTW

6
General Algorithm
  • Initialize
    • Tagged data T
    • Untagged data U
  • Loop
    • Train a tagger M on T
    • Use M to add tags to U, creating U'
    • Move the best sentences from U' into T
    • Replace U with the rest of U'
  • End when you're satisfied
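A minimal Python sketch of this loop, assuming the tagger is supplied as two callables: `train(T)` returns a model and `tag(M, sentence)` returns a tagged sentence plus a score. All names here (`self_train`, `n_best`, etc.) are hypothetical, "best" means highest score, and the stopping criterion is U becoming empty. It is a sketch of generic self-training, not the presenters' code.

```python
from typing import Callable, List, Tuple, TypeVar

Sent = List[str]                # an untagged sentence
Tagged = List[Tuple[str, str]]  # a tagged sentence: (word, tag) pairs
Model = TypeVar("Model")

def self_train(
    T: List[Tagged],
    U: List[Sent],
    train: Callable[[List[Tagged]], Model],
    tag: Callable[[Model, Sent], Tuple[Tagged, float]],
    n_best: Callable[[int], int],
) -> Tuple[Model, List[Tagged]]:
    """Self-training loop: retrain on T, tag U, move the best sentences into T."""
    T, U = list(T), list(U)
    while U:  # stopping criterion here: U is empty
        M = train(T)
        # Tag every sentence of U with M, creating U' as (score, index, tagged) triples.
        U_prime = []
        for i, sent in enumerate(U):
            tagged, score = tag(M, sent)
            U_prime.append((score, i, tagged))
        # Highest score first; the original index breaks ties deterministically.
        U_prime.sort(key=lambda item: (-item[0], item[1]))
        # At least one sentence per round so the loop always terminates (assumption).
        n = max(1, n_best(len(T)))
        best, rest = U_prime[:n], U_prime[n:]
        T.extend(tagged for _, _, tagged in best)
        # The rest of U' goes back into U untagged; the next model re-tags it.
        U = [U[i] for _, i, _ in rest]
    return train(T), T
```

Passing `n_best=lambda size: size * 20 // 100` moves 20% of the current |T| per round. Because `U` is re-tagged by each new model, keeping the leftover sentences untagged loses nothing.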

7
Experimental Details
  • For i = 1 to 5
    • Initialize T with i × 1000 sentences
    • For j = 15, 25, 35
      • Initialize U with j × 1000 sentences
      • Do self-training
  • Best of U' is defined as the best n sentences,
    where n = 20% of |T|
  • Satisfied when U is empty (|U| = 0)
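Since n is 20% of the current |T|, T grows geometrically and U empties faster each round. The small simulation below counts the rounds each configuration takes. It is arithmetic only (no tagger), assumes i runs from 1 to 5, and assumes integer rounding with at least one sentence moved per round; the round counts are derived quantities, not results from these slides.

```python
from typing import Iterator, Tuple

def rounds_needed(t_size: int, u_size: int, pct: int = 20) -> Tuple[int, int]:
    """Simulate |T| and |U| when each round moves n = pct% of |T| from U' into T."""
    rounds = 0
    while u_size > 0:
        # Integer percentage; at least 1, and never more than what is left in U (assumptions).
        n = min(u_size, max(1, t_size * pct // 100))
        t_size += n
        u_size -= n
        rounds += 1
    return rounds, t_size

def experiment_grid() -> Iterator[Tuple[int, int]]:
    # i = 1..5 thousand tagged sentences, j = 15/25/35 thousand untagged sentences.
    for i in range(1, 6):
        for j in (15, 25, 35):
            yield i * 1000, j * 1000

if __name__ == "__main__":
    for t0, u0 in experiment_grid():
        rounds, final_t = rounds_needed(t0, u0)
        print(f"|T|={t0:>5}  |U|={u0:>6}  rounds={rounds:>2}  final |T|={final_t}")
```

Under these assumptions, starting from 1000 tagged and 15000 untagged sentences, the loop needs 16 rounds and ends with all 16000 sentences in T.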

8
Results (I) Graph
9
Results (II) Zoomed in
10
Results (III) Projects 1-3
  • Probably not much new here
  • You all did projects 1-3
  • That outlier, though: TBL with simple unknown-word
    treatment
  • The simpler approach to unknown-word handling gives
    much worse results.
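To illustrate the contrast, here is a hypothetical pair of unknown-word guessers. The slides do not say which rules were used, so both functions are assumptions: `guess_simple` gives every unknown word one default tag, while `guess_by_shape` uses a few Penn Treebank-style shape and suffix heuristics. In a TBL tagger, a guesser like this would typically supply the initial tags for unknown words before the learned transformations apply.

```python
import re

def guess_simple(word: str) -> str:
    # "Simple" treatment: every unknown word gets one default tag (NN is an assumption).
    return "NN"

def guess_by_shape(word: str) -> str:
    # Hypothetical shape/suffix heuristics using Penn Treebank tags; not the presenters' rules.
    if re.fullmatch(r"\d[\d.,]*", word):
        return "CD"   # numbers such as 1,200 or 3.14
    if word[:1].isupper():
        return "NNP"  # capitalized: likely a proper noun
    if "-" in word:
        return "JJ"   # hyphenated compounds are often adjectives
    if word.endswith("ing"):
        return "VBG"
    if word.endswith("ed"):
        return "VBD"
    if word.endswith("ly"):
        return "RB"
    if word.endswith("s"):
        return "NNS"
    return "NN"
```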

11
Results (IV) Project 4
  • Overall, more than 1% improvement
  • Not spectacular
  • Better than a poke in the eye with a sharp stick
  • Probably works better in a domain with fewer
    degrees of freedom.
  • Then again, unlabelled data is cheap
  • wget -r http://www.gutenberg.org/

12
Further Considerations
  • Not much to say about P1-3, but for P4
  • We used a trigram tagger
    • It assigns higher probabilities to shorter sentences (fewer
      factors below 1 are multiplied), so "best n" favors them
    • May want a quality measure other than raw sentence probability
  • Unknown words are problematic
    • Try another tagger?
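One possible quality measure, which is an assumption and not something the slides specify, is to normalize by length: divide the sentence log-probability by the number of tokens. The sketch assumes each tagged sentence is represented by the list of per-position probability factors from the tagger's best path; `pick_best` and that representation are hypothetical.

```python
import math
from typing import List, Sequence

def raw_logprob(token_probs: Sequence[float]) -> float:
    # Log of the product of per-position probabilities: one term per token,
    # so every extra token (with p < 1) lowers the score.
    return sum(math.log(p) for p in token_probs)

def per_token_logprob(token_probs: Sequence[float]) -> float:
    # Length-normalized: average log-probability per token (log of the geometric mean).
    return raw_logprob(token_probs) / len(token_probs)

def pick_best(scored_sents: List[Sequence[float]], n: int, normalize: bool = True) -> List[int]:
    # Indices of the n highest-scoring sentences under the chosen measure.
    score = per_token_logprob if normalize else raw_logprob
    order = sorted(range(len(scored_sents)), key=lambda i: score(scored_sents[i]), reverse=True)
    return order[:n]
```

With `short = [0.6] * 3` and `long = [0.9] * 20`, the raw score prefers the short sentence even though each of its tags is less certain, while the per-token score prefers the long one.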

13
Done Done Done
  • Questions?
  • Comments.
  • Snide remarks.
  • Chuck Norris jokes.

14
For the Morbidly Curious
Project 1 tagging accuracy
15
For the Morbidly Curious
Project 2 tagging accuracy
16
For the Morbidly Curious
Project 3 tagging accuracy
17
For the Morbidly Curious
Project 4 tagging accuracy vs. number of sentences added