Chunk Parsing II (Transcript)
1
Chunk Parsing II
  • Chunking as Tagging

2
Chunk Parsing
  • Shallow parsing has become an interesting
    alternative to full parsing. The main goal of a
    shallow parser is to divide a text into segments
    which correspond to certain syntactic units.
    Although the detailed information from a full
    parse is lost, shallow parsing can be done on
    non-restricted texts in an efficient and reliable
    way. In addition, partial syntactical information
    can help to solve many natural language
    processing tasks, such as information extraction,
    text summarization, machine translation and
    spoken language understanding.
  • Molina & Pla 2002

3
Molina & Pla
  • Definitions
  • Text chunking: dividing input text into
    non-overlapping segments
  • Clause identification: detecting start and end
    boundaries of each clause
  • What are the chunks of the following? What are
    the clauses?
  • You will start to see shows where viewers
    program the program.

4
Molina & Pla
  • Chunks
  • Clauses

5
Chunk Tags
  • Chunks and clauses can be represented using tags.
  • Sang et al.'s (2000) tags:
  • B-X: first word of a chunk of type X
  • I-X: non-initial word of a chunk of type X
  • O: words or material outside chunks
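
  A minimal sketch (not from the slides) of how these tags are read back
  into chunks. The tagging of the example sentence below is one plausible
  analysis, assumed here purely for illustration.

    # Group (word, chunk_tag) pairs into chunks using the B-X / I-X / O scheme.
    def bio_to_chunks(tagged):
        chunks, current = [], None
        for word, tag in tagged:
            if tag.startswith("B-"):          # first word of a chunk of type X
                if current:
                    chunks.append(current)
                current = (tag[2:], [word])
            elif tag.startswith("I-") and current and current[0] == tag[2:]:
                current[1].append(word)       # non-initial word of the same chunk
            else:                             # O, or an I- tag that starts nothing
                if current:
                    chunks.append(current)
                current = None
        if current:
            chunks.append(current)
        return [(label, " ".join(words)) for label, words in chunks]

    # One plausible (assumed) chunk tagging of the slide's example sentence.
    sentence = [("You", "B-NP"), ("will", "B-VP"), ("start", "I-VP"),
                ("to", "I-VP"), ("see", "I-VP"), ("shows", "B-NP"),
                ("where", "B-ADVP"), ("viewers", "B-NP"),
                ("program", "B-VP"), ("the", "B-NP"), ("program", "I-NP"),
                (".", "O")]
    print(bio_to_chunks(sentence))
    # [('NP', 'You'), ('VP', 'will start to see'), ('NP', 'shows'), ...]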

6
Chunk Tags
  • You will start to see shows where viewers
    program the program.

7
Chunk Tagging
  • HMMs can be applied to tagging.
  • HMM tagger: maximize P(o | i)
    (i = input words, o = output tags)
  • But how do you train an HMM Chunk Tagger? What
    should its training data look like? (What are
    the i's?)

8
Tagging
  • From Molina and Pla
  • POS tagging considers only words as input.
  • Chunking considers words and POS tags as input.
  • Clause identification considers words, POS tags
    and chunks as input.
  • Problem: the vocabulary could get very large and
    the model would be poorly estimated.

9
Molina & Pla
  • Solution
  • Enrich chunk tags by adding POS information and
    selected words
  • Describe a specialization function f_s on the
    original training set T to produce a new set T',
    essentially transforming every training tuple
    ⟨i_j, o_j⟩ to ⟨i_j', o_j'⟩
  • Training then done over the new training set

10
Molina and Pla
  • Examples
  • ⟨You·PRP, B-NP⟩ → ⟨PRP, PRP·B-NP⟩
  • Considering only POS information
  • ⟨where·WRB, B-ADVP⟩ → ⟨where·WRB,
    where·WRB·B-ADVP⟩
  • Considering lexical information as well
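
  A rough sketch of a specialization function in the spirit of these two
  examples (not Molina & Pla's actual code). The set of selected words and
  the "·" separator are assumptions made for illustration.

    SELECTED_WORDS = {"where"}   # assumed: a small set of lexically informative words

    def specialize(word, pos, chunk_tag):
        """Transform one training tuple <input, output> into its specialized form."""
        if word.lower() in SELECTED_WORDS:
            # keep lexical information: output is enriched with word and POS
            return (f"{word}·{pos}", f"{word}·{pos}·{chunk_tag}")
        # otherwise keep only POS information: input is the POS, output is POS·chunk tag
        return (pos, f"{pos}·{chunk_tag}")

    print(specialize("You", "PRP", "B-NP"))      # ('PRP', 'PRP·B-NP')
    print(specialize("where", "WRB", "B-ADVP"))  # ('where·WRB', 'where·WRB·B-ADVP')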

11
Molina and Pla
12
Molina & Pla
  • Training process
  • Tag the corpus to get the word and tag associations.
    The words and tags become the new input (e.g.,
    You·PRP, where·WRB)
  • Chunk a portion of the corpus using Sang et al.
    (2002) chunk tag outputs. These are the new
    outputs (e.g., B-NP, I-NP, ...)
  • Apply specialization function across the training
    corpus to transform the training set
  • Train HMM Tagger on transformed set
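
  A toy sketch of the training step, assuming the corpus has already been
  transformed by the specialization function. It only collects the
  relative-frequency transition and emission estimates a bigram HMM tagger
  would use; a real tagger such as TnT uses trigrams and smoothing.

    from collections import Counter, defaultdict

    def train_hmm(specialized_sentences):
        """specialized_sentences: lists of (input_symbol, output_tag) pairs
        produced by a specialization function such as specialize() above."""
        transitions = defaultdict(Counter)   # counts for P(tag_k | tag_{k-1})
        emissions = defaultdict(Counter)     # counts for P(input_symbol | tag)
        for sent in specialized_sentences:
            prev = "<s>"
            for inp, tag in sent:
                transitions[prev][tag] += 1
                emissions[tag][inp] += 1
                prev = tag
            transitions[prev]["</s>"] += 1
        # relative-frequency estimates; smoothing is omitted in this sketch
        trans_p = {p: {t: c / sum(cs.values()) for t, c in cs.items()}
                   for p, cs in transitions.items()}
        emit_p = {t: {i: c / sum(cs.values()) for i, c in cs.items()}
                  for t, cs in emissions.items()}
        return trans_p, emit_p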

13
Molina & Pla
  • Tagging
  • POS tag a corpus
  • Apply the trained tagger against the POS-tagged corpus.
  • Take into account the input transformations done in
    f_s
  • Map the relevant information in the input to the
    modified output O'
  • Map the output tags O' back to O.
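
  Mapping O' back to O amounts to stripping the information that the
  specialization added; a sketch, assuming the "·"-separated tag format
  used in the examples above.

    def despecialize(tag):
        """Map a specialized output tag (O') back to the original chunk tag (O)."""
        # e.g. 'PRP·B-NP' -> 'B-NP', 'where·WRB·B-ADVP' -> 'B-ADVP', 'O' -> 'O'
        return tag.split("·")[-1]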

14
Molina and Pla
  • Give brief discussions of other approaches to
    chunking
  • Compare the relative performances of the other
    systems
  • Compare systems with different specialization
    functions (different f_s)
  • BTW, they used the TnT tagger developed by
    Thorsten Brants, which can be downloaded from the
    Web: http://www.coli.uni-sb.de/thorsten/tnt
    (hardcopy licensing and registration required)

15
N-grams
16
N-grams
  • An N-gram, or N-gram grammar, represents an
    (N-1)th-order Markov language model
  • Bigram: first order
  • Trigram: second order

17
N-grams
  • The N-gram approximation for calculating the next
    word in a sequence is the familiar
  • P(w_n | w_1^(n-1)) ≈ P(w_n | w_(n-N+1)^(n-1))
  • Probability of a complete string:
  • P(w_1^n) ≈ ∏_(k=1..n) P(w_k | w_(k-1))
  • So it's possible to talk about
  • P(I love New York) in a corpus
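
  A small sketch of the bigram calculation: P(I love New York) is the
  product of the bigram probabilities along the string, estimated here
  from a made-up toy corpus with no smoothing.

    from collections import Counter

    def bigram_string_probability(sentence, corpus_tokens):
        """P(w_1..w_n) as the product of bigram probabilities P(w_k | w_(k-1)),
        estimated by relative frequency from a corpus (no smoothing)."""
        tokens = ["<s>"] + list(corpus_tokens)
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        prob, prev = 1.0, "<s>"
        for w in sentence.split():
            prob *= bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
            prev = w
        return prob

    corpus = "I love New York I love pizza New York is big".split()
    print(bigram_string_probability("I love New York", corpus))  # 0.5 for this toy corpus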
18
N-grams
  • Important to recognize: N-grams don't just apply
    to words!
  • We can have N-grams of
  • Words, POS tags, chunks (Molina & Pla 2002)
  • Characters (Cavnar & Trenkle 1994)
  • Phones (Jurafsky & colleagues, and loads more)
  • Binary sequences (for file type identification)
    (Li et al. 2005)

19
N-grams
  • The higher the order of the model, the more
    specific that model becomes to the source.
  • Note the discussion in J&M re: sensitivity to the
    training corpus

20
N-gram Shakespeare
  • Unigram

21
N-gram Shakespeare
  • Quadrigram
  • Problem: due to the size of the corpus (800K words),
    there is a reduced set of continuations to choose from

22
N-grams & Language ID
  • If N-gram models represent language models, can
    we use N-gram models for Language Identification?
  • For example, can we use it to differentiate
    between text in German, text in English, text in
    Czech, etc.?
  • If so, how?
  • What's the lower threshold for the size of text
    that can ensure successful ID?
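
  One way to approach the first two questions, in the spirit of Cavnar &
  Trenkle's character n-gram profiles: build a rank-ordered n-gram profile
  per language and pick the language whose profile is closest to the
  document's. The training texts and profile size below are placeholders.

    from collections import Counter

    def char_ngrams(text, n_max=3):
        text = " " + text.lower() + " "
        return [text[i:i + n] for n in range(1, n_max + 1)
                for i in range(len(text) - n + 1)]

    def profile(text, size=300):
        """Rank-ordered list of the most frequent character n-grams."""
        return [g for g, _ in Counter(char_ngrams(text)).most_common(size)]

    def rank_distance(doc_profile, lang_profile):
        ranks = {g: r for r, g in enumerate(lang_profile)}
        max_penalty = len(lang_profile)   # penalty for n-grams missing from a profile
        return sum(abs(r - ranks[g]) if g in ranks else max_penalty
                   for r, g in enumerate(doc_profile))

    def identify(text, lang_profiles):
        doc = profile(text)
        return min(lang_profiles, key=lambda lang: rank_distance(doc, lang_profiles[lang]))

    # Placeholder training texts; real profiles would be built from much larger corpora.
    lang_profiles = {
        "en": profile("the quick brown fox jumps over the lazy dog and the cat"),
        "de": profile("der schnelle braune fuchs springt über den faulen hund und die katze"),
    }
    print(identify("the dog and the fox", lang_profiles))   # 'en' for these toy profiles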