Title: Applications of Sequence Learning CMPT 825 Mashaal A. Memon
What We Know of Sequence Learning
- Part-of-speech (POS) tagging is a sequence learning problem.
- Three approaches to solving the problem:
  - Noisy-channel
  - Classification
  - Rule-based
What We Know About POS Tagging
- A part of speech (POS) explains not what the word is, but how it is used.
- Problem: Which POS does each word represent?
- Tags: POS tags (e.g., NN = noun, VB = verb, etc.)
- Training: Word sequences with corresponding POS tags.
What We Know About POS Tagging Continued
Anoop is a great professor .
NNP VBZ DT JJ NN .
I am kissing butt right now .
PRP VBP VBG NN RB RB .
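As a reminder of how the noisy-channel approach works, here is a minimal sketch of a bigram hidden Markov model tagger decoded with the Viterbi algorithm. It picks the tag sequence T that maximizes P(T) * P(W | T). The two-sentence toy corpus, the add-one smoothing and the function names (`train_hmm`, `viterbi`) are illustrative assumptions, not part of the original slides. A real tagger would be trained on a large annotated corpus.

```python
import math
from collections import defaultdict

# Hypothetical toy training corpus: (word, Penn Treebank tag) pairs.
TRAIN = [
    [("Anoop", "NNP"), ("is", "VBZ"), ("a", "DT"), ("great", "JJ"),
     ("professor", "NN"), (".", ".")],
    [("I", "PRP"), ("am", "VBP"), ("kissing", "VBG"), ("butt", "NN"),
     ("right", "RB"), ("now", "RB"), (".", ".")],
]


def train_hmm(corpus):
    """Count tag bigrams (for P(t_i | t_{i-1})) and emissions (for P(w_i | t_i))."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counts, event, outcomes):
    """Add-one smoothed log P(event | context), given the context's counts."""
    total = sum(counts.values())
    return math.log((counts.get(event, 0) + 1) / (total + outcomes))


def viterbi(words, trans, emit):
    """Return argmax_T P(T) * P(W | T) under the bigram HMM."""
    tags = sorted(emit)
    n_words = len({w for t in tags for w in emit[t]}) + 1  # +1 for unknown words
    # best[i][t] = (log score of best path ending in tag t at i, previous tag)
    best = [{t: (log_prob(trans.get("<s>", {}), t, len(tags))
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans.get(p, {}), t, len(tags)) + e, p)
                for p in tags)
        best.append(column)
    # Follow back-pointers from the highest-scoring final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


trans, emit = train_hmm(TRAIN)
print(viterbi("Anoop is a great professor .".split(), trans, emit))
# prints ['NNP', 'VBZ', 'DT', 'JJ', 'NN', '.']
```

The same decoder works unchanged for every application below. Only the tag set changes.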
What Is My Point?
- Other interesting and important problems can be represented as tagging problems.
- The same three approaches can be used.
- Four such applications will be briefly introduced:
  - Chunking
  - Named Entity Recognition
  - Cascaded Chunking
  - Word Segmentation
(1) Chunking
- A chunk is a non-recursive group of syntactically related words (e.g., a noun phrase or verb phrase).
- Problem: Which type of chunk does each word or group of words belong to?
- Note: Chunks of the same type can sometimes be adjacent ("kiss" each other).
(1) Chunking Continued
Noun-Phrase (NP) Chunking
- Only look for noun-phrase chunks.
- Tags:
  - B: beginning of a noun phrase
  - I: inside a noun phrase
  - O: other (outside any noun phrase)
- Training: Word sequences with corresponding POS and NP tags.
- Input: Word sequences and POS tags.
(1) Chunking Continued
Noun-Phrase (NP) Chunking
The student talked to Anoop .
B I O O B O
The guy he talked to was smelly .
B I B O O O O O
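If the noun phrases for training are given as spans rather than tags, they have to be converted. A minimal sketch follows. It assumes spans are hypothetical (start, end) token offsets with an exclusive end, and shows how to encode them as B/I/O tags and decode them again. The B tag is what keeps two adjacent ("kissing") noun phrases such as "The guy" and "he" apart.

```python
def spans_to_bio(n_tokens, spans):
    """Encode NP spans [(start, end), ...] (end exclusive) as B/I/O tags."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B"                  # first word of the NP
        for i in range(start + 1, end):
            tags[i] = "I"                  # remaining words of the NP
    return tags


def bio_to_spans(tags):
    """Decode B/I/O tags back into NP spans; a stray I opens a new span."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i))   # a B closes the previous NP
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


words = "The guy he talked to was smelly .".split()
tags = spans_to_bio(len(words), [(0, 2), (2, 3)])  # "The guy", "he"
print(tags)                 # ['B', 'I', 'B', 'O', 'O', 'O', 'O', 'O']
print(bio_to_spans(tags))   # [(0, 2), (2, 3)]
```

With only I and O, the two noun phrases would decode as a single span (0, 3).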
(1) Chunking Continued
General Chunking
- Look for other syntactic constructs as well as noun phrases.
- Tags:
  - B- or I- prefix on each chunk type
  - Chunk types: NP (noun phrase), VP (verb phrase), PP (prepositional phrase); O (other) takes no prefix
- Training: Word sequences with corresponding POS and chunk tags.
- Input: Word sequences and POS tags.
(1) Chunking Continued
General Chunking
Anoop should give me an A .
B-NP B-VP I-VP B-NP B-NP I-NP O
His presentation is boring me to death .
B-NP I-NP B-VP I-VP B-NP B-PP B-NP O
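Once a tagger has produced B-X/I-X tags, a simple left-to-right scan recovers the labelled chunks. The helper below (`extract_chunks`, a hypothetical name) is a sketch. It assumes that an I-X tag following a chunk of a different type starts a new chunk, which is one lenient convention among several.

```python
def extract_chunks(words, tags):
    """Group words into (chunk_type, text) pairs from B-X / I-X / O tags."""
    chunks = []       # list of (chunk_type, [words]) in sentence order
    current = None    # the chunk currently being extended, if any
    for word, tag in zip(words, tags):
        if tag == "O":
            current = None            # O closes any open chunk
            continue
        prefix, chunk_type = tag.split("-", 1)
        if prefix == "B" or current is None or current[0] != chunk_type:
            current = (chunk_type, [word])   # start a new chunk
            chunks.append(current)
        else:
            current[1].append(word)          # I-X continues the open X chunk
    return [(chunk_type, " ".join(ws)) for chunk_type, ws in chunks]


words = "Anoop should give me an A .".split()
tags = "B-NP B-VP I-VP B-NP B-NP I-NP O".split()
print(extract_chunks(words, tags))
# [('NP', 'Anoop'), ('VP', 'should give'), ('NP', 'me'), ('NP', 'an A')]
```

Note how "me" and "an A" stay separate because "an" is tagged B-NP.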
(2) Named Entity Recognition
- A named entity is a phrase that contains the names of persons, organizations, or locations.
- Problem: Does a word or group of words represent a named entity, and if so, of which type?
- Tags:
  - B- or I- prefix on each NE type
  - NE types: PER (person), ORG (organization), LOC (location); O (other) takes no prefix
- Training: Word sequences with corresponding POS and NE tags. Sometimes lists of known named entities (gazetteers) are also used (cheating!!).
- Input: Word sequences with POS tags.
(2) Named Entity Recognition Continued
The United States of America
O B-LOC I-LOC I-LOC I-LOC
has an intelligent leader in D.C.
O O O O O B-LOC
, Dick Cheney of Halliburton .
O B-PER I-PER O B-ORG O
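Lists of known named entities are sometimes used. The function below illustrates that idea only and is not a learned tagger. It does a greedy longest-match lookup against a small hypothetical gazetteer and emits B-/I- tags. A pure list lookup cannot recognize names missing from the list. In practice, such lists would more likely serve as extra evidence for one of the three approaches.

```python
# Hypothetical gazetteer: token tuple -> NE type.
GAZETTEER = {
    ("United", "States", "of", "America"): "LOC",
    ("D.C.",): "LOC",
    ("Dick", "Cheney"): "PER",
    ("Halliburton",): "ORG",
}


def gazetteer_tag(tokens, gazetteer):
    """Greedy longest-match lookup; returns one B-/I-/O tag per token."""
    longest = max(map(len, gazetteer), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span starting at i first.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            ne_type = gazetteer.get(tuple(tokens[i:i + n]))
            if ne_type is not None:
                tags[i] = "B-" + ne_type
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + ne_type
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts at this token
    return tags


tokens = ("The United States of America has an intelligent leader in D.C. "
          ", Dick Cheney of Halliburton .").split()
print(list(zip(tokens, gazetteer_tag(tokens, GAZETTEER))))
```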
(3) Cascaded Chunking
- Cascaded chunking recovers the parse tree of the sentence.
- Think of it as a chunker that takes the initial input and then keeps working on its OWN output until the output stops changing.
- Difference from plain chunking: chunks may contain other chunks as well as POS tags.
(3) Cascaded Chunking Continued
CHUNKER (W = w1..wn, T = t1..tn)
    → T = t1..tn
CASCADE (W = w1..wn, T = t1..tn)
    OutputBefore ← Ø
    OutputAfter ← CHUNKER (W, T)
    while (OutputBefore ≠ OutputAfter) do
        OutputBefore ← OutputAfter
        OutputAfter ← CHUNKER (W, OutputBefore)
        /* Output result of current iteration */
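Here is a runnable sketch of the CASCADE loop in Python. The toy CHUNKER is a hypothetical rule-based one: each pass merges adjacent nodes that match a small pair grammar (`RULES`, invented for illustration). In the pseudocode, CHUNKER returns a tag sequence of fixed length n. This sketch instead passes a list of (label, children) nodes, so the sequence can shrink as chunks nest inside other chunks. `output_before = None` plays the role of Ø.

```python
# Hypothetical toy grammar: (left label, right label) -> phrase label.
RULES = {
    ("DT", "NN"): "NP",
    ("VBD", "NP"): "VP",
    ("NP", "VP"): "S",
}


def chunker(nodes):
    """One left-to-right pass that merges each adjacent pair matching RULES.

    A node is (label, children); a leaf is (POS tag, word).
    """
    out, i = [], 0
    while i < len(nodes):
        if i + 1 < len(nodes):
            label = RULES.get((nodes[i][0], nodes[i + 1][0]))
            if label is not None:
                out.append((label, [nodes[i], nodes[i + 1]]))
                i += 2
                continue
        out.append(nodes[i])
        i += 1
    return out


def cascade(words, tags):
    """Re-run the chunker on its own output until a pass changes nothing."""
    levels = []
    output_before = None                      # plays the role of Ø
    output_after = chunker(list(zip(tags, words)))
    while output_before != output_after:
        levels.append([label for label, _ in output_after])  # this iteration's result
        output_before = output_after
        output_after = chunker(output_before)
    return output_after, levels


tree, levels = cascade("The dog chased the cat".split(),
                       ["DT", "NN", "VBD", "DT", "NN"])
print(levels)  # [['NP', 'VBD', 'NP'], ['NP', 'VP'], ['S']]
```

Each entry of `levels` is one iteration's output. The VP rule can only fire on the second pass, after the first pass has built the object NP. The loop stops once a pass changes nothing. Because every merge shortens the sequence, this toy chunker always terminates.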
(3) Cascaded Chunking Continued
The effort to establish such a conclusion is unnecessary .
POS tags: DT NN TO VB PDT DT NN VBZ JJ .
Pass 1: DT NP IP VP PDT DT NP AP
Pass 2: DP CP DP CP
...
Final pass: S
- Chunking is an intermediate step to a full parse
(4) Word Segmentation
- Some written languages, such as Chinese, do not mark word boundaries explicitly.
- Problem: Does a character or group of characters form a single word?
- Tags:
  - B: beginning of a word
  - I: inside a word
- Training: Character sequences with corresponding word segmentation (WS) tags.
- Input: Character sequences.
(4) Word Segmentation Continued
[Chinese example sentence: 16 characters, one per tag below]
B I I B I B I B I B B B I B B I
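Reading the words off the B/I tags takes a single pass. The sketch below uses a hypothetical example sentence (我爱北京天安门, "I love Beijing Tiananmen") rather than the slide's sentence. It assumes the slide's tag scheme, where B starts a word and I continues it.

```python
def segment(chars, tags):
    """Join characters into words: B starts a new word, I extends the current one."""
    words = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)      # B (or a stray leading I) opens a word
        else:
            words[-1] += ch       # I continues the previous word
    return words


print(segment("我爱北京天安门", ["B", "B", "B", "I", "B", "I", "I"]))
# ['我', '爱', '北京', '天安门']
```

Every word begins with exactly one B. The slide's 16-tag sequence therefore describes 9 words.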
Conclusion
- These problems have different goals, but because they share the same tagging representation, they can all be solved with the same approaches.
- We all LOVE sequence learning!
THE END
Questions?!