Title: Chunk Parsing
2. Chunk Parsing
- Also called chunking, light parsing, or partial parsing.
- Method: assigns some additional structure to the input beyond tagging.
- Used when full parsing is not feasible or not desirable.
- Because of the expense of full parsing, often treated as a stop-gap solution.
3. Chunk Parsing
- No rich hierarchy, as in parsing.
- Usually one layer above tagging.
- The process (a minimal sketch follows):
  - Tokenize
  - Tag
  - Chunk
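- Below is a minimal sketch of this pipeline using NLTK (the toolkit behind Bird and Loper's examples); the NP grammar is illustrative, not from the slides, and it assumes the standard NLTK tokenizer and tagger models have been downloaded.

    # Tokenize -> tag -> chunk: one layer of structure above tagging.
    import nltk

    sentence = "The cow in the barn ate"
    tokens = nltk.word_tokenize(sentence)   # 1. tokenize
    tagged = nltk.pos_tag(tokens)           # 2. tag
    # An illustrative NP rule: optional determiner, adjectives, noun.
    chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
    print(chunker.parse(tagged))            # 3. chunk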
4. Chunk Parsing
- Like tokenizing and tagging in a few respects:
  - Can skip over material in the input.
  - Often finite-state (or finite-state-like) methods are used (applied over tags).
  - Often application-specific (i.e., the chunks tagged have uses for particular applications).
5. Chunk Parsing
- Chief motivations: to find data or to ignore data.
- Example from Bird and Loper: find the argument structures for the verb give.
- Can discover significant grammatical structures before developing a grammar (a sketch follows the list):
  - gave NP
  - gave up NP in NP
  - gave NP up
  - gave NP help
  - gave NP to NP
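- As a hedged sketch of that idea: given chunked output, the frames for gave fall out of simple pattern collection. The chunked sentences below are hand-made stand-ins for real corpus output.

    # Collect the patterns that follow "gave" in chunked text.
    chunked = [
        ["gave", ("NP", "the book")],
        ["gave", "up", ("NP", "the fight"), "in", ("NP", "the end")],
        ["gave", ("NP", "the dog"), "up"],
        ["gave", ("NP", "them"), "to", ("NP", "the library")],
    ]

    frames = set()
    for sent in chunked:
        # Reduce each chunk to its label; keep unchunked words as-is.
        frames.add(" ".join(t if isinstance(t, str) else t[0] for t in sent))

    for frame in sorted(frames):
        print(frame)   # gave NP / gave NP up / gave NP to NP / gave up NP in NP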
6. Chunk Parsing
- Like parsing, except:
  - It is not exhaustive, and doesn't pretend to be.
  - Structures and data can be skipped when not convenient or not desired.
  - Structures of fixed depth are produced.
- Nested structures are typical in parsing:
  - [S [NP The cow [PP in [NP the barn]]] ate]
- Not in chunking:
  - [NP The cow] in [NP the barn] ate
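- The contrast can be made concrete with NLTK's Tree notation (purely illustrative):

    from nltk import Tree

    # Parsing: recursive, arbitrarily deep structure.
    nested = Tree.fromstring("(S (NP The cow (PP in (NP the barn))) ate)")
    # Chunking: fixed depth; chunks sit one layer above the words.
    flat = Tree.fromstring("(S (NP The cow) in (NP the barn) ate)")

    print(nested.height())   # deeper (here 5)
    print(flat.height())     # shallow (here 3)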
7. Chunk Parsing
- Finds contiguous, non-overlapping spans of related text and groups them into chunks.
- Because contiguity is given, finite-state methods can be adapted to chunking (sketched below).
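- A sketch of that adaptation: because chunks are contiguous and flat, a chunker can be a regular expression over the tag sequence alone. The tag string below is hand-supplied.

    import re

    tags = "DT NN IN DT NN VBD"        # "The cow in the barn ate"
    np = re.compile(r"DT (JJ )*NN")    # a finite-state NP pattern over tags

    for m in np.finditer(tags):
        print(m.span(), m.group())     # two flat NP chunks, nothing nested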
8. Longest Match
- Abney (1995) discusses the longest-match heuristic:
  - One automaton for each phrasal category.
  - Start the automata at position i (where i = 0 initially).
  - The winner is the automaton with the longest match.
9. Longest Match
- He took chunk rules from the PTB:
  - NP → D N
  - NP → D Adj N
  - VP → V
- Encoded each rule as an automaton (sketched below).
- Stored the longest matching pattern (the winner).
- If no match for a given word, skipped it (in other words, didn't chunk it).
- Results: precision .92, recall .88.
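- A sketch of the heuristic with these rules, written as regexes over tag sequences rather than explicit automata; the skip-on-no-match behavior is as described above.

    import re

    RULES = [
        ("NP", re.compile(r"D( Adj)* N")),   # covers NP -> D N and NP -> D Adj N
        ("VP", re.compile(r"V")),
    ]

    def chunk(tags):
        chunks, i = [], 0
        while i < len(tags):
            best = None
            for label, pat in RULES:
                m = pat.match(" ".join(tags[i:]))
                if m and (best is None or m.end() > best[1]):
                    best = (label, m.end(), m.group())   # longest match wins
            if best:
                n = len(best[2].split())
                chunks.append((best[0], tags[i:i + n]))
                i += n                  # advance past the winner
            else:
                i += 1                  # no automaton matched: skip the word
        return chunks

    print(chunk(["D", "Adj", "N", "P", "D", "N", "V"]))
    # [('NP', ['D', 'Adj', 'N']), ('NP', ['D', 'N']), ('VP', ['V'])]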
10. An Application
- Data-Driven Linguistics Ontology Development (NSF BCE-0411348).
- One focus: locate linguistically annotated (read: tagged) text and extract linguistically relevant terms from it.
- Attempt to discover the meaning of the terms.
- Intended to build out the content of the ontology (GOLD).
- Focus on Interlinear Glossed Text (IGT).
11. An Application
- Interlinear Glossed Text (IGT), an example:

  (1) Afisi   a-na-ph-a        nsomba
      hyenas  SP-PST-kill-ASP  fish
      'The hyenas killed the fish.' (Baker 1988:254)
12. An Application
- More examples (one way to represent such data in code follows):

  (4) a. yerexa-n   p'at'uhan-e  bats-ets
         child-NOM  window-ACC   open-AOR.3SG
         'The child opened the window.' (Megerdoomian ??)
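- One convenient representation of an IGT instance is as aligned tiers; the field names below are my own, not a community standard.

    import re

    igt = {
        "text":  ["yerexa-n", "p'at'uhan-e", "bats-ets"],
        "gloss": ["child-NOM", "window-ACC", "open-AOR.3SG"],
        "trans": "The child opened the window.",
    }

    # The gloss tier aligns word-for-word with the text tier, so each
    # gram (NOM, ACC, AOR, 3SG) can be tied to its host word.
    for word, gloss in zip(igt["text"], igt["gloss"]):
        grams = re.split(r"[-.]", gloss)[1:]
        print(word, "->", grams)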
13. An Application
  (4) a. yerexa-n   p'at'uhan-e  bats-ets
         child-NOM  window-ACC   open-AOR.3SG
         'The child opened the window.' (Megerdoomian ??)
- Problem: how do we discover the meaning of the linguistically salient terms, such as NOM, ACC, AOR, and 3SG?
- Perhaps we can discover the meanings by examining the contexts in which they occur.
- POS can be a context.
- Problem: POS tags are rarely used in IGT.
- How do you assign POS tags to a language you know nothing about?
- IGT gives us aligned text for free!
14. An Application
  (4) a. yerexa-n   p'at'uhan-e  bats-ets
         child-NOM  window-ACC   open-AOR.3SG
         'The child opened the window.' (Megerdoomian ??)
          DT  NN    VBP    DT  NN
- IGT gives us aligned text for free!
- POS tag the English translation.
- Align it with the glosses and the language data (a sketch follows).
- That helps. We now know that NOM and ACC attach to nouns, not verbs (nominal inflections).
- And AOR and 3SG attach to verbs (verbal inflections).
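- A sketch of that projection step; the word alignment below is hand-supplied for the example, since deriving gloss-to-translation alignments automatically takes more work.

    # English translation, POS tagged as on the slide.
    translation = [("The", "DT"), ("child", "NN"), ("opened", "VBP"),
                   ("the", "DT"), ("window", "NN")]
    gloss = ["child-NOM", "window-ACC", "open-AOR.3SG"]

    # gloss index -> translation index (content words only)
    alignment = {0: 1, 1: 4, 2: 2}

    for g_i, t_i in alignment.items():
        word, tag = translation[t_i]
        print(gloss[g_i], "<-", tag)
    # NOM and ACC land on NN hosts (nominal); AOR and 3SG on a verb (VBP).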
15. An Application
  (4) a. yerexa-n   p'at'uhan-e  bats-ets
         child-NOM  window-ACC   open-AOR.3SG
         'The child opened the window.' (Megerdoomian ??)
          DT  NN    VBP    DT  NN
- In the LaPolla example, we know that NOM does not attach to nouns, but to verbs. It must be some other kind of NOM.
16. An Application
  (4) a. yerexa-n   p'at'uhan-e  bats-ets
         child-NOM  window-ACC   open-AOR.3SG
         'The child opened the window.' (Megerdoomian ??)
          DT  NN    VBP    DT  NN
- How we tagged (a sketch follows):
  - Globally applied the most frequent tags (a "stupid tagger").
  - Repaired tags where context dictated a change (e.g., TO preceding race → VB).
  - Technique similar to Brill (1995).
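- A sketch of the two-step tagging; the tiny lexicon and the single repair rule are illustrative only.

    MOST_FREQUENT = {"the": "DT", "child": "NN", "race": "NN",
                     "to": "TO", "opened": "VBD"}

    def stupid_tag(words):
        # "Stupid tagger": every word gets its globally most frequent tag.
        return [(w, MOST_FREQUENT.get(w.lower(), "NN")) for w in words]

    def repair(tagged):
        # Contextual repair in the spirit of Brill (1995):
        # after TO, "race" is a verb, not a noun.
        out = list(tagged)
        for i in range(1, len(out)):
            word, tag = out[i]
            if out[i - 1][1] == "TO" and word == "race" and tag == "NN":
                out[i] = (word, "VB")
        return out

    print(repair(stupid_tag(["to", "race"])))   # [('to', 'TO'), ('race', 'VB')]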
17. An Application
  (4) a. yerexa-n   p'at'uhan-e  bats-ets
         child-NOM  window-ACC   open-AOR.3SG
         'The child opened the window.' (Megerdoomian ??)
          DT  NN    VBP    DT  NN
- But can we get more information about NOM, ACC, etc.?
- Can chunking tell us something more about these terms?
- Yes!
18. An Application
  (4) a. yerexa-n   p'at'uhan-e  bats-ets
         child-NOM  window-ACC   open-AOR.3SG
         'The child opened the window.' (Megerdoomian ??)
          DT  NN    VBP    DT  NN
- Chunk phrases, mainly NPs.
- The relationship (in simple sentences) between NPs and verbs tells us something about the verb's arguments (Bird and Loper 2005).
- We can tap this information to discover more about the linguistic tags.
19. An Application
  (4) a. yerexa-n   p'at'uhan-e  bats-ets
         child-NOM  window-ACC   open-AOR.3SG
         'The child opened the window.' (Megerdoomian ??)
          DT  NN    VBP    DT  NN
         [NP The child] [VP opened] [NP the window]
- Apply Abney's (1995) longest-match heuristic to get as many chunks as possible (especially NPs).
- Leverage English canonical SVO (NVN) order to identify simple argument structures (sketched below).
- Use these to discover more information about the terms.
- Thus...
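- A sketch of reading argument structure off the chunked translation; the chunks and the gloss lookup table below are hand-supplied from the example.

    chunks = [("NP", ["The", "child"]), ("VP", ["opened"]),
              ("NP", ["the", "window"])]
    gloss = {"child": "NOM", "window": "ACC"}   # from the alignment step

    # Canonical English SVO: an NP VP NP sequence gives us the slots.
    if [label for label, _ in chunks] == ["NP", "VP", "NP"]:
        subj, _, obj = chunks
        print("subject gram:", gloss.get(subj[1][-1].lower()))   # NOM
        print("object gram:",  gloss.get(obj[1][-1].lower()))    # ACC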
20. An Application
  (4) a. yerexa-n   p'at'uhan-e  bats-ets
         child-NOM  window-ACC   open-AOR.3SG
         'The child opened the window.' (Megerdoomian ??)
          DT  NN    VBP    DT  NN
         [NP The child] [VP opened] [NP the window]
- We know that:
  - NOM attaches to subject NPs → may be a case marker indicating subject.
  - ACC attaches to object NPs → may be a case marker indicating object.
21. An Application
- What we do next: look at co-occurrence relations (clustering) of:
  - terms with terms
  - host categories with terms
- To determine more information about the terms.
- Done by building feature vectors of the various linguistic grammatical terms (grams) representing their contexts (a sketch follows).
- And measuring relative distances between these vectors (in particular, for terms we know).
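- A sketch of the gram-space idea; the context features and counts below are made up for illustration.

    import math

    # Each gram is a vector of context counts: host POS categories and
    # co-occurring grams.
    CONTEXTS = ["host=NN", "host=VB", "cooc=ACC", "cooc=3SG"]
    vectors = {
        "NOM": [12, 1, 9, 0],   # mostly nominal hosts, co-occurs with ACC
        "ACC": [10, 0, 9, 0],
        "AOR": [0, 8, 1, 7],    # verbal hosts, co-occurs with 3SG
    }

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norms

    # A known gram's nearest neighbors suggest what an unknown gram means.
    print(cosine(vectors["NOM"], vectors["ACC"]))   # high: similar contexts
    print(cosine(vectors["NOM"], vectors["AOR"]))   # low: different contexts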
22. Linguistic Gram Space