static model noOverlaps :: ArgumentCandidate[] candidates - PowerPoint PPT Presentation

Learning Based Java for Rapid Development of NLP Systems
Nick Rizzolo and Dan Roth
Introduction
For Example, Semantic Role Labeling
LBJ: Specifying Models, Features, and Constraints
Today's natural language processing systems are growing more complex
with the need to incorporate a wider range of language resources and
more sophisticated statistical methods. In many cases, it is necessary
to learn a component with input that includes the predictions of other
learned components, or to simultaneously assign values to multiple
components with an expressive, data-dependent structure among them. As
a result, the design of systems with multiple learning components is
inevitably quite technically complex, and implementations of
conceptually simple NLP systems can be time-consuming and prone to
error.
Our new modeling language, Learning Based Java (LBJ), facilitates the
rapid development of systems that learn and perform inference. LBJ has
already been used to build state-of-the-art NLP systems. In this work,
we first demonstrate that there exists a theoretical model that
describes most NLP approaches adeptly. Second, we show how our LBJ
language enables the programmer to describe the theoretical model
succinctly. Finally, we introduce the concept of data-driven
compilation, a translation process in which the efficiency of the
generated code benefits from the data given as input to the learning
algorithms.
Goal: identify and classify the arguments of a given verb.
model ArgumentIdentifier :: discrete input -> boolean isArgument
    input /\ isArgument

model ArgumentType :: discrete input -> discrete type
    input /\ type
    input /\ input /\ type

static model pertinentData :: ArgumentCandidate candidate -> discrete data
    data.phraseType = candidate.phraseType()
    data.headWord = candidate.headWord()
    data.headTag = candidate.headTag()
    data.path = candidate.path()
The binary argument identification classifier
determines whether each sequence of words is an
argument.
The multi-class argument type classifier labels
each argument with a type.
ID
Type
Every application of either classifier
contributes to
Classifiers
Labels: A1, R-A1, A0, V, A2
static model noOverlaps :: ArgumentCandidate[] candidates -> discrete[] types
    for (i : (0 .. candidates.size() - 1))
        for (j : (i + 1 .. candidates.size() - 1))
            candidates[i].overlapsWith(candidates[j])
                => types[i] == "null" || types[j] == "null"

static model noDuplicates :: -> discrete[] types
    forall (v : types[0].values)
        atmost 1 of (t : types) t == v

static model referenceConsistency :: -> discrete[] types
    forall (value : types[0].values)
        (exists (var : types) var == "R-" + value)
            => (exists (var : types) var == value)
[A1 The pearls] [R-A1 which] [A0 I] [V left] [A2 to my daughter] were shiny.
R-A1 => A1
A2 => No other A2
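The three constraints can also be read as plain predicates over a candidate labeling. The following is a minimal Java sketch, not LBJ code: the class name `ConstraintSketch`, the encoding of spans as inclusive word-index pairs, and the decision to exempt the "null" label from the duplicate check are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class ConstraintSketch {
    // Assumed marker for "this candidate is not an argument".
    static final String NULL = "null";

    // Spans are inclusive word-index ranges {start, end} (hypothetical encoding).
    static boolean overlaps(int[] a, int[] b) {
        return a[0] <= b[1] && b[0] <= a[1];
    }

    // noOverlaps: if two candidates overlap, at least one must be "null".
    static boolean noOverlaps(int[][] spans, String[] types) {
        for (int i = 0; i < spans.length; i++)
            for (int j = i + 1; j < spans.length; j++)
                if (overlaps(spans[i], spans[j])
                        && !types[i].equals(NULL) && !types[j].equals(NULL))
                    return false;
        return true;
    }

    // noDuplicates: each label other than "null" is used at most once.
    static boolean noDuplicates(String[] types) {
        Set<String> seen = new HashSet<>();
        for (String t : types)
            if (!t.equals(NULL) && !seen.add(t)) return false;
        return true;
    }

    // referenceConsistency: a label "R-X" requires some candidate labeled "X".
    static boolean referenceConsistency(String[] types) {
        Set<String> present = new HashSet<>();
        for (String t : types) present.add(t);
        for (String t : types)
            if (t.startsWith("R-") && !present.contains(t.substring(2)))
                return false;
        return true;
    }

    public static void main(String[] args) {
        // Word indexes: The(0) pearls(1) which(2) I(3) left(4) to(5) my(6) daughter(7)
        int[][] spans = {{0, 1}, {2, 2}, {3, 3}, {4, 4}, {5, 7}, {0, 2}};
        String[] types = {"A1", "R-A1", "A0", "V", "A2", "null"};
        System.out.println(noOverlaps(spans, types)
                && noDuplicates(types)
                && referenceConsistency(types)); // prints true
    }
}
```

The extra candidate spanning words 0 to 2 overlaps two labeled arguments, so the labeling is only acceptable because that candidate is assigned "null".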
Thus, we have a CCM.
Constraints
LBJ: Model Combination and Inference
Data-Driven Compilation
Developing a machine learning framework as a
stand-alone language as opposed to a library
opens the door to opportunities for automatically
improving the efficiency of the code based on
high-level analyses. In particular, much of the
information necessary to generate the final
program code is only available in the training
data. Thus, we say that an LBJ compiler performs
data-driven compilation.
Constrained Conditional Models (CCMs)
Example: Feature Extraction
Consider the following lexicon, mapping from
features to integers, that appears in many NLP
systems.
In a nutshell, to perform inference, a CCM finds
the values of the output variables that maximize
the following scoring function:
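For orientation, the scoring function usually given for CCMs in the literature has the form below. The symbols are assumptions drawn from that general formulation and may differ from the poster's own notation: feature functions \phi_i with learned weights w_i, constraints C_k with violation penalties \rho_k, and d_{C_k} measuring how far an assignment y is from satisfying C_k.

```latex
y^{*} = \arg\max_{y} \; \sum_{i} w_i \, \phi_i(x, y) \; - \; \sum_{k} \rho_k \, d_{C_k}(x, y)
```

Under this reading, a hard constraint corresponds to an infinite penalty \rho_k, so any assignment that violates it is excluded from the maximization.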
Feature        Value          Typical Index   LBJ Index
word           the            0               0
word           table          1               1
word           ...
POS            DT             50000           0
POS            NN             50001           1
POS            ...
word /\ POS    the /\ DT      50050           computed
word /\ POS    table /\ NN    50051           computed
solver SRLInference :: SRLProblem problem
    Greedy.solve problem.isArgument
    ILP.solve problem.types
State-of-the-art LBJ Implementations
LBJ has already been used to develop several
state-of-the-art resources.
CCMs subsume discriminative and probabilistic
modeling formalisms. Thus, a wide variety of
learning and inference algorithms can be applied
to them.
While a typical lexicon will store an integer
index associated with every feature value, LBJ
will only store indexes for the atomic features.
Indexes of composite features can then be
computed on the fly based on the indexes of the
atomic feature values. The result is a much
smaller lexicon and fewer hash table look-ups.
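One way to picture the idea is the following minimal Java sketch. It is not the code an LBJ compiler generates: the class `CompositeLexiconSketch`, its method names, and the indexing scheme (word index times tag-set size plus tag index, with the tag-set size assumed known in advance) are hypothetical choices made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class CompositeLexiconSketch {
    // One small lexicon per atomic feature type; each starts counting at 0.
    private final Map<String, Integer> words = new HashMap<>();
    private final Map<String, Integer> tags = new HashMap<>();

    // Return the index of an atomic value, assigning the next free one if new.
    static int lookupOrAdd(Map<String, Integer> lexicon, String value) {
        Integer index = lexicon.get(value);
        if (index == null) {
            index = lexicon.size();
            lexicon.put(value, index);
        }
        return index;
    }

    int wordIndex(String word) { return lookupOrAdd(words, word); }

    int tagIndex(String tag) { return lookupOrAdd(tags, tag); }

    // Composite index computed from the atomic indexes; nothing is stored
    // for the conjunction itself. Assumes every tag index < tagSetSize.
    int conjunctionIndex(String word, String tag, int tagSetSize) {
        return wordIndex(word) * tagSetSize + tagIndex(tag);
    }

    // Number of entries actually held in the hash tables.
    int storedEntries() { return words.size() + tags.size(); }

    public static void main(String[] args) {
        CompositeLexiconSketch lexicon = new CompositeLexiconSketch();
        int tagSetSize = 50; // assumed upper bound on the number of POS tags
        System.out.println(lexicon.conjunctionIndex("the", "DT", tagSetSize));   // 0
        System.out.println(lexicon.conjunctionIndex("table", "NN", tagSetSize)); // 51
        System.out.println(lexicon.storedEntries());                             // 4
    }
}
```

Only the four atomic values are stored; each conjunction costs two look-ups in small tables and one multiply-add, and no entry for the conjunction is ever created.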
This work was supported by NSF grant NSF
SoD-HCER-0613885.