Chunk Parsing II (Transcript)
1
Chunk Parsing II
  • Chunking as Tagging

2
Chunk Parsing
  • Shallow parsing has become an interesting
    alternative to full parsing. The main goal of a
    shallow parser is to divide a text into segments
    which correspond to certain syntactic units.
    Although the detailed information from a full
    parse is lost, shallow parsing can be done on
    non-restricted texts in an efficient and reliable
    way. In addition, partial syntactical information
    can help to solve many natural language
    processing tasks, such as information extraction,
    text summarization, machine translation and
    spoken language understanding.
  • Molina & Pla 2002

3
Molina & Pla
  • Definitions
  • Text chunking: dividing input text into
    non-overlapping segments
  • Clause identification: detecting start and end
    boundaries of each clause
  • What are the chunks of the following? What are
    the clauses?
  • You will start to see shows where viewers
    program the program.

4
Molina & Pla
  • Chunks
  • Clauses

5
Chunk Tags
  • Chunks and clauses can be represented using tags.
  • Sang et al.'s (2000) tags:
  • B-X: first word of a chunk of type X
  • I-X: non-initial word of a chunk of type X
  • O: words or material outside chunks
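
  A minimal sketch (not from the slides) of how these tags are read back
  into chunks. The tagging of the example sentence below is one plausible
  analysis, assumed here purely for illustration.

    # Group (word, chunk_tag) pairs into chunks using the B-X / I-X / O scheme.
    def bio_to_chunks(tagged):
        chunks, current = [], None
        for word, tag in tagged:
            if tag.startswith("B-"):          # first word of a chunk of type X
                if current:
                    chunks.append(current)
                current = (tag[2:], [word])
            elif tag.startswith("I-") and current and current[0] == tag[2:]:
                current[1].append(word)       # non-initial word of the same chunk
            else:                             # O, or an I- tag that starts nothing
                if current:
                    chunks.append(current)
                current = None
        if current:
            chunks.append(current)
        return [(label, " ".join(words)) for label, words in chunks]

    # One plausible (assumed) chunk tagging of the slide's example sentence.
    sentence = [("You", "B-NP"), ("will", "B-VP"), ("start", "I-VP"),
                ("to", "I-VP"), ("see", "I-VP"), ("shows", "B-NP"),
                ("where", "B-ADVP"), ("viewers", "B-NP"),
                ("program", "B-VP"), ("the", "B-NP"), ("program", "I-NP"),
                (".", "O")]
    print(bio_to_chunks(sentence))
    # [('NP', 'You'), ('VP', 'will start to see'), ('NP', 'shows'), ...]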

6
Chunk Tags
  • You will start to see shows where viewers
    program the program.

7
Chunk Tagging
  • HMMs can be applied to tagging.
  • HMM tagger: maximize P(o | i)
    (i = input words, o = output tags)
  • But how do you train an HMM Chunk Tagger? What
    should its training data look like? (What are
    the i's?)

8
Tagging
  • From Molina and Pla
  • POS tagging considers only words as input.
  • Chunking considers words and POS tags as input.
  • Clause identification considers words, POS tags
    and chunks as input.
  • Problem: the vocabulary could get very large and
    the model would be poorly estimated.

9
Molina & Pla
  • Solution
  • Enrich chunk tags by adding POS information and
    selected words
  • Describe a specialization function f_s on the
    original training set T to produce a new set T',
    essentially transforming every training tuple
    ⟨i_j, o_j⟩ to ⟨i_j', o_j'⟩
  • Training then done over the new training set

10
Molina and Pla
  • Examples
  • ⟨You·PRP, B-NP⟩ → ⟨PRP, PRP·B-NP⟩
  • Considering only POS information
  • ⟨where·WRB, B-ADVP⟩ → ⟨where·WRB,
    where·WRB·B-ADVP⟩
  • Considering lexical information as well
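
  A rough sketch of a specialization function in the spirit of these two
  examples (not Molina & Pla's actual code). The set of selected words and
  the "·" separator are assumptions made for illustration.

    SELECTED_WORDS = {"where"}   # assumed: a small set of lexically informative words

    def specialize(word, pos, chunk_tag):
        """Transform one training tuple <input, output> into its specialized form."""
        if word.lower() in SELECTED_WORDS:
            # keep lexical information: output is enriched with word and POS
            return (f"{word}·{pos}", f"{word}·{pos}·{chunk_tag}")
        # otherwise keep only POS information: input is the POS, output is POS·chunk tag
        return (pos, f"{pos}·{chunk_tag}")

    print(specialize("You", "PRP", "B-NP"))      # ('PRP', 'PRP·B-NP')
    print(specialize("where", "WRB", "B-ADVP"))  # ('where·WRB', 'where·WRB·B-ADVP')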

11
Molina and Pla
12
Molina & Pla
  • Training process
  • Tag the corpus to get the word and tag associations.
    The words and tags become the new input (e.g.,
    You·PRP, where·WRB)
  • Chunk a portion of the corpus using Sang et al.
    (2002) chunk tag outputs. These are the new
    outputs (e.g., B-NP, I-NP, ...)
  • Apply specialization function across the training
    corpus to transform the training set
  • Train HMM Tagger on transformed set
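
  A toy sketch of the training step, assuming the corpus has already been
  transformed by the specialization function. It only collects the
  relative-frequency transition and emission estimates a bigram HMM tagger
  would use; a real tagger such as TnT uses trigrams and smoothing.

    from collections import Counter, defaultdict

    def train_hmm(specialized_sentences):
        """specialized_sentences: lists of (input_symbol, output_tag) pairs
        produced by a specialization function such as specialize() above."""
        transitions = defaultdict(Counter)   # counts for P(tag_k | tag_{k-1})
        emissions = defaultdict(Counter)     # counts for P(input_symbol | tag)
        for sent in specialized_sentences:
            prev = "<s>"
            for inp, tag in sent:
                transitions[prev][tag] += 1
                emissions[tag][inp] += 1
                prev = tag
            transitions[prev]["</s>"] += 1
        # relative-frequency estimates; smoothing is omitted in this sketch
        trans_p = {p: {t: c / sum(cs.values()) for t, c in cs.items()}
                   for p, cs in transitions.items()}
        emit_p = {t: {i: c / sum(cs.values()) for i, c in cs.items()}
                  for t, cs in emissions.items()}
        return trans_p, emit_p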

13
Molina & Pla
  • Tagging
  • POS tag a corpus
  • Apply the trained tagger against the POS-tagged corpus.
  • Take into account the input transformations done in
    f_s
  • Map the relevant information in the input to the
    modified output O'
  • Map the output tags O' back to O.
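
  Mapping O' back to O amounts to stripping the information that the
  specialization added; a sketch, assuming the "·"-separated tag format
  used in the examples above.

    def despecialize(tag):
        """Map a specialized output tag (O') back to the original chunk tag (O)."""
        # e.g. 'PRP·B-NP' -> 'B-NP', 'where·WRB·B-ADVP' -> 'B-ADVP', 'O' -> 'O'
        return tag.split("·")[-1]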

14
Molina and Pla
  • Give brief discussions of other approaches to
    chunking
  • Compare the relative performances of the other
    systems
  • Compare systems with different specialization
    functions (different f_s)
  • BTW, they used the TnT tagger developed by
    Thorsten Brants, which can be downloaded from the
    Web: http://www.coli.uni-sb.de/thorsten/tnt
    (hardcopy licensing and registration required)

15
N-grams
16
N-grams
  • An N-gram, or N-gram grammar, represents an
    (N-1)th-order Markov language model
  • Bigram: first order
  • Trigram: second order

17
N-grams
  • The N-gram approximation for calculating the next
    word in a sequence is the familiar
  • P(w_n | w_1^(n-1)) ≈ P(w_n | w_(n-N+1)^(n-1))
  • Probability of a complete string:
  • P(w_1^n) ≈ ∏_(k=1..n) P(w_k | w_(k-1))
  • So it's possible to talk about
  • P(I love New York) in a corpus
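
  A small sketch of the bigram calculation: P(I love New York) is the
  product of the bigram probabilities along the string, estimated here
  from a made-up toy corpus with no smoothing.

    from collections import Counter

    def bigram_string_probability(sentence, corpus_tokens):
        """P(w_1..w_n) as the product of bigram probabilities P(w_k | w_(k-1)),
        estimated by relative frequency from a corpus (no smoothing)."""
        tokens = ["<s>"] + list(corpus_tokens)
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        prob, prev = 1.0, "<s>"
        for w in sentence.split():
            prob *= bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
            prev = w
        return prob

    corpus = "I love New York I love pizza New York is big".split()
    print(bigram_string_probability("I love New York", corpus))  # 0.5 for this toy corpus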
18
N-grams
  • Important to recognize: N-grams don't just apply
    to words!
  • We can have N-grams of
  • Words, POS tags, chunks (Molina & Pla 2002)
  • Characters (Cavnar & Trenkle 1994)
  • Phones (Jurafsky & colleagues, and loads more)
  • Binary sequences (for file type identification)
    (Li et al. 2005)

19
N-grams
  • The higher the order of the model, the more
    specific that model becomes to the source.
  • Note the discussion in J&M re: sensitivity to the
    training corpus

20
N-gram Shakespeare
  • Unigram

21
N-gram Shakespeare
  • Quadrigram
  • Problem: due to the size of the corpus (800K words),
    there is a reduced set of continuations to choose from

22
N-grams & Language ID
  • If N-gram models represent language models, can
    we use N-gram models for Language Identification?
  • For example, can we use it to differentiate
    between text in German, text in English, text in
    Czech, etc.?
  • If so, how?
  • What's the lower threshold for the size of text
    that can ensure successful ID?
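
  One way to approach the first two questions, in the spirit of Cavnar &
  Trenkle's character n-gram profiles: build a rank-ordered n-gram profile
  per language and pick the language whose profile is closest to the
  document's. The training texts and profile size below are placeholders.

    from collections import Counter

    def char_ngrams(text, n_max=3):
        text = " " + text.lower() + " "
        return [text[i:i + n] for n in range(1, n_max + 1)
                for i in range(len(text) - n + 1)]

    def profile(text, size=300):
        """Rank-ordered list of the most frequent character n-grams."""
        return [g for g, _ in Counter(char_ngrams(text)).most_common(size)]

    def rank_distance(doc_profile, lang_profile):
        ranks = {g: r for r, g in enumerate(lang_profile)}
        max_penalty = len(lang_profile)   # penalty for n-grams missing from a profile
        return sum(abs(r - ranks[g]) if g in ranks else max_penalty
                   for r, g in enumerate(doc_profile))

    def identify(text, lang_profiles):
        doc = profile(text)
        return min(lang_profiles, key=lambda lang: rank_distance(doc, lang_profiles[lang]))

    # Placeholder training texts; real profiles would be built from much larger corpora.
    lang_profiles = {
        "en": profile("the quick brown fox jumps over the lazy dog and the cat"),
        "de": profile("der schnelle braune fuchs springt über den faulen hund und die katze"),
    }
    print(identify("the dog and the fox", lang_profiles))   # 'en' for these toy profiles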