Title: QuASI: Question Answering using Statistics, Semantics, and Inference
1 QuASI: Question Answering using Statistics, Semantics, and Inference
- Marti Hearst, Jerry Feldman, Chris Manning, Srini Narayanan
- Univ. of California-Berkeley / ICSI / Stanford University
2 Outline
- Project Overview
- Three topics
- Assigning semantic relations via lexical hierarchies
- From sentences to meanings via syntax
- From text analysis to inference using conceptual schemas
3 Main Goals
- Support Question-Answering and NLP in general by
- Deepening our understanding of concepts that underlie all languages
- Creating empirical approaches to identifying semantic relations from free text
- Developing probabilistic inference algorithms
4 Two Main Thrusts
- Text-based
- Use empirical corpus-based techniques to extract simple semantic relations
- Combine these relations to perform simple inferences
- Statistical semantic grammar
- Concept-based
- Determine language-universal conceptual principles
- Determine how inferences are made among these
5 Relation Recognition (UCB)
- Abbreviation Definition Recognition
- TREC Genomics Track
- Semantic Relation Identification
6 Abbreviation Detection (UCB)
- Abbreviation Definition Recognition
- Developed and evaluated new algorithm
- Better results than existing approaches
- Simpler and faster as well
- Semantic Relation Identification
- Developed syntactic chunker
- Analyzed sample relations
- Began development of a new computational model
- Incorporates syntax and semantic labels
- Test example: identify treatment for disease
7 Abbreviation Examples
- Heat-shock protein 40 (Hsp40) enables Hsp70 to play critical roles in a number of cellular processes, such as protein folding, assembly, degradation and translocation in vivo.
- Glutathione S-transferase pull-down experiments showed the direct interaction of in vitro translated p110, p64, and p58 of the essential CBF3 kinetochore protein complex with Cbf1p, a basic region helix-loop-helix zipper protein (bHLHzip) that specifically binds to the CDEI region on the centromere DNA.
- Hpa2 is a member of the Gcn5-related N-acetyltransferase (GNAT) superfamily, a family of enzymes with diverse substrates including histones, other proteins, arylalkylamines and aminoglycosides.
8 The Algorithm
- Much simpler than other approaches.
- Extracts abbreviation-definition candidates adjacent to parentheses.
- Finds correct definitions by matching characters in the abbreviation to characters in the definition, starting from the right.
- The first character in the abbreviation must match a character at the beginning of a word in the definition.
- To increase precision, a few simple heuristics are applied to eliminate incorrect pairs.
- Example: Heat shock transcription factor (HSF).
- The algorithm finds the correct definition, but not the correct character alignment within "Heat shock transcription factor".
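The right-to-left matching step described above can be sketched in a few lines. This is an illustrative reimplementation, not the exact published code; the precision heuristics mentioned on the slide are omitted.

```python
def find_definition(long_form: str, short_form: str):
    """Match abbreviation characters against the candidate definition,
    scanning right to left. The first abbreviation character must match
    at the start of a word. Returns the matched definition span, or None."""
    l = len(long_form) - 1   # index into the candidate definition
    s = len(short_form) - 1  # index into the abbreviation
    while s >= 0:
        c = short_form[s].lower()
        if not c.isalnum():  # skip non-alphanumeric abbreviation chars
            s -= 1
            continue
        # move left until this character matches; for the first
        # abbreviation character, also require a word boundary
        while l >= 0 and (long_form[l].lower() != c or
                          (s == 0 and l > 0 and long_form[l - 1].isalnum())):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    return long_form[l + 1:]
```

On the HSF example this returns the full correct definition even though the internal character alignment is not the intended one, mirroring the behavior noted above.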
9 Results
- On the gold standard the algorithm achieved 83% recall at 96% precision.
- On a larger test collection the results were 82% recall at 95% precision.
- These results show that a very simple algorithm produces results comparable to those of the existing, more complex algorithms.
- Counting partial matches and abbreviations missing from the gold standard, our algorithm achieved 83% recall at 99% precision.
10 TREC Task 1 Overview
- Search 525,938 MEDLINE records
- Titles, abstracts, MeSH category terms, citation information
- Topics
- Taken from the GeneRIF portion of the LocusLink database
- We are supplied with a set of gene names
- Definition of a GeneRIF
- For gene X, find all MEDLINE references that focus on the basic biology of the gene or its protein products from the designated organism. Basic biology includes isolation, structure, genetics and function of genes/proteins in normal and disease states.
11 TREC Task 1 Sample Query
- 3 2120 Homo sapiens OFFICIAL_GENE_NAME ets variant gene 6 (TEL oncogene)
- 3 2120 Homo sapiens OFFICIAL_SYMBOL ETV6
- 3 2120 Homo sapiens ALIAS_SYMBOL TEL
- 3 2120 Homo sapiens PREFERRED_PRODUCT ets variant gene 6
- 3 2120 Homo sapiens PRODUCT ets variant gene 6
- 3 2120 Homo sapiens ALIAS_PROT TEL1 oncogene
- The first column is the official topic number (1-50).
- The second column contains the LocusLink ID for the gene.
- The third column contains the name of the organism.
- The fourth column contains the gene name type.
- The fifth column contains the gene name.
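The five fields above can be pulled apart programmatically. This is a sketch: it assumes whitespace-separated fields and anchors on the all-caps gene-name-type token (e.g. OFFICIAL_SYMBOL), since the organism name itself may contain spaces; the real topic-file format may differ.

```python
import re

def parse_topic_line(line: str):
    """Split a sample-query line into (topic, locus_id, organism,
    name_type, gene_name), using the ALL_CAPS gene-name-type token
    as the anchor between organism and gene name."""
    tokens = line.split()
    topic, locus_id = tokens[0], tokens[1]
    type_idx = next(i for i, t in enumerate(tokens[2:], start=2)
                    if re.fullmatch(r"[A-Z]+(_[A-Z]+)*", t))
    organism = " ".join(tokens[2:type_idx])
    name_type = tokens[type_idx]
    gene_name = " ".join(tokens[type_idx + 1:])
    return topic, locus_id, organism, name_type, gene_name
```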
12 TREC Task 1 Approach
- Two main components:
- Retrieve relevant docs
- May miss many because of variation in how gene names are expressed
- Rank order them
13 TREC Task 1 Approach
- Retrieval
- Normalization of query terms
- Special characters are replaced with spaces in both queries and documents.
- Term expansion
- A set of pattern-based rules is applied to the original list of query terms to expand the original set and increase recall.
- Some rules with lower confidence get a lower weight in the ranking step.
- Stop word removal
- Organism identification
- Gene names are often shared across different organisms
- Developed a method to automatically determine which MeSH terms correspond to LocusLink organism terms
- Retrieved MEDLINE docs indicated by LocusLink links corresponding to a given organism
- Organism terms were the most frequent MeSH categories among the selected docs
- Used these terms to identify the organism term in MEDLINE
- An example of playing two databases off each other.
- MeSH concepts
- When an exact match is found between one of the query terms and a MeSH term assigned to a document, the document is retrieved.
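The normalization step might look like the following sketch. The exact character set counted as "special", and the lower-casing, are assumptions not stated on the slide.

```python
import re

def normalize(term: str) -> str:
    """Replace special (non-alphanumeric) characters with spaces and
    collapse whitespace, applied identically to query terms and
    documents. Lower-casing is an added assumption."""
    spaced = re.sub(r"[^A-Za-z0-9]", " ", term)
    return re.sub(r"\s+", " ", spaced).strip().lower()
```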
14 TREC Task 1 Approach
- Relevance ranking
- IBM's DB2 Net Search Extender was used as the text search engine.
- Scoring
- Each query is a union of 5 different sub-queries:
- titles,
- abstracts,
- titles using low-confidence expansion rules,
- abstracts using low-confidence expansion rules, and
- MeSH concepts.
- Each sub-query returns a set of documents with a relevance score from the text search engine (or a fixed value for MeSH matches).
- The aggregated score is the weighted SUM of the individual scores, with optional weights applied to each sub-query score.
- SUM performs better than MAX, since it gives higher confidence to documents found in multiple sub-queries.
- Scores are normalized to be in the (0,1) range by dividing the score by the highest aggregated score achieved for the query.
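The weighted-SUM aggregation and normalization above can be sketched as follows; sub-query names and the weight table are illustrative, not from the slides.

```python
def aggregate(subquery_hits, weights):
    """Combine per-sub-query relevance scores by a weighted SUM, then
    divide by the highest aggregated score for the query.
    subquery_hits: {subquery_name: {doc_id: score}}."""
    totals = {}
    for name, hits in subquery_hits.items():
        w = weights.get(name, 1.0)  # optional per-sub-query weight
        for doc, score in hits.items():
            totals[doc] = totals.get(doc, 0.0) + w * score
    top = max(totals.values())
    return {doc: s / top for doc, s in totals.items()}
```

A document found by several sub-queries accumulates score from each, which is exactly why SUM beats MAX here.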
15 TREC Task 1 Approach
- GeneRIF classification
- A Naïve Bayes model is used to assign to each document the probability that it is a GeneRIF.
- MeSH terms are used as features.
- Combination of text retrieval score and GeneRIF classification score
- We tried both an additive and a multiplicative approach. Both behave similarly, with slightly better performance achieved with the additive one.
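A minimal sketch of a Naïve Bayes classifier over MeSH-term features and the additive score combination. The add-one smoothing, the equal mixing weight, and all training details are assumptions, not from the slides.

```python
import math
from collections import Counter

class GeneRIFClassifier:
    """Naive Bayes with MeSH terms as features (illustrative sketch)."""

    def fit(self, docs):
        # docs: list of (mesh_terms, is_generif)
        self.counts = {True: Counter(), False: Counter()}
        self.n = {True: 0, False: 0}
        self.vocab = set()
        for terms, label in docs:
            self.n[label] += 1
            self.counts[label].update(terms)
            self.vocab.update(terms)
        return self

    def prob_generif(self, terms):
        """P(GeneRIF | MeSH terms) with add-one smoothing."""
        total = self.n[True] + self.n[False]
        logp = {}
        for label in (True, False):
            lp = math.log(self.n[label] / total)
            denom = sum(self.counts[label].values()) + len(self.vocab)
            for t in terms:
                lp += math.log((self.counts[label][t] + 1) / denom)
            logp[label] = lp
        m = max(logp.values())
        e = {lab: math.exp(v - m) for lab, v in logp.items()}
        return e[True] / (e[True] + e[False])

def combined_score(retrieval, generif_prob, alpha=0.5):
    # additive combination, reported best on the slide; alpha is assumed
    return alpha * retrieval + (1 - alpha) * generif_prob
```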
16 TREC Task 1 Results
- Performance is measured using the standard trec_eval program.
- On training data
- Best published result: 0.4125
- With GeneRIF classifier: 0.5101
- Without GeneRIF classifier: 0.5028
- On testing data (turned in 8/4/03)
- With GeneRIF classifier: 0.3933
- Without GeneRIF classifier: 0.3768
17-26 (No Transcript)
27 The Stanford Lexicalized Parser: An open source Java parser
- Dan Klein, Roger Levy, and Chris Manning, Computer Science and Linguistics
- Stanford University
- http://nlp.stanford.edu/
28 Probabilistic parsing
- Standard solutions (Collins 96, 99; Charniak 97, 00):
- Capture word-specific trends by lexicalizing symbols.
- Capture environment-specific trends by marking ancestors.
- Benefits
- Model context-freedom matches data context-freedom better.
- Maximum posterior parses are correct more often.
- Costs
- State space becomes huge.
- Joint estimates become extremely sparse.
- Exact inference becomes infeasible.
- Parsers become difficult to engineer.
- NP becomes NP[rates] (lexicalization); NP[rates] becomes NP^VP^S[rates] (ancestor marking).
- We want to address these issues.
29 Factoring Syntax and Semantics
Lexicalized tree T = (C, D), with P(T) = P(C) P(D)
Syntax C: P(C) is a standard PCFG; captures structural patterns
Semantics D: P(D) is a dependency grammar; captures word-word patterns
30 Efficient exact inference: The Factored A* Estimate
- A* parsing will be efficient if we can find a tight upper bound on the true best score β_T(E) of an edge E.
- Finding the score of the best coherent pair (C, D) is as hard as parsing, but P(C) and P(D) alone are very simple, and so we can quickly find β_C(E) and β_D(E).
- These maximizations, considered jointly, effectively range over all pairs (C, D) instead of only coherent ones, so we know that β_T(E) ≤ β_C(E) β_D(E). We can therefore use a(E) = β_C(E) β_D(E) as a good admissible estimate.
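The admissibility argument can be checked on a toy example: maximizing the syntactic and semantic scores independently always upper-bounds the score of the best coherent pair. All numbers and the coherence relation below are made up for illustration.

```python
# Candidate syntactic structures C and dependency structures D with
# illustrative scores; only some (C, D) pairs are mutually coherent.
p_c = {"c1": 0.6, "c2": 0.4}
p_d = {"d1": 0.7, "d2": 0.3}
coherent = {("c1", "d2"), ("c2", "d1")}  # hypothetical coherence relation

# Best coherent pair (expensive in general: this is parsing)
best_coherent = max(p_c[c] * p_d[d] for (c, d) in coherent)

# Factored estimate: maximize each component independently (cheap)
estimate = max(p_c.values()) * max(p_d.values())

# The estimate never underestimates, so it is admissible for A*.
assert best_coherent <= estimate
```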
31 Results: Accuracy
Details: Syntactic: "basic" is the unsmoothed parent-annotated treebank covering grammar; "best" includes other annotation. Semantic: "basic" is a word-word model smoothed by tags; "best" includes a simple distance and valence model. Results on Penn Treebank WSJ Section 23. Labeled bracketing is average sentence F1. Gold dependencies induced heuristically from gold parse trees. (Klein and Manning, IJCAI 2003)
(Charts: Labeled Bracketing Accuracy (F1); Dependency Accuracy)
32 Results: Efficiency
- The factored A* estimate reduces work by a factor of between 100 and 10,000 compared to exhaustive parsing.
(Chart: search work)
- Details
- Parser uses the Eisner & Satta 99 O(n^4) schema (though the exponential observed growth suggests that so little work is being done that the dominant effect is the small-constant exponential function of the A* gap, not the large-constant polynomial function of the sentence length).
- The total time is dominated by the plain-PCFG parse phase, which can be reduced.
33 Recent Focus: Accurate unlexicalized parsing
- Most of the emphasis in the last decade has been on exploiting lexical dependencies
- We show that accurate structural (syntactic) modeling has been highly underexploited
- Strategy: deterministically refine the category set of a treebank so it better reflects important linguistic distinctions (and hence better models probabilistic dependencies)
- Our best unlexicalized parsers outperform early lexicalized parsers (Klein and Manning, ACL 2003; cf. Magerman 1995: 84.7, Collins 1996: 86.0)
34 Recent Focus: Accurate unlexicalized parsing
- E.g. representing subordinating complementizers in the category set fixes the PP parse on the left
35 Recent Focus: Accurate unlexicalized parsing
- Note: development set performance; final test set (40 words) F1 = 86.32
- Illustrates the strength of the Factored Parser architecture: we can quickly and easily improve one component
- Unlexicalized grammar is more domain-independent
36 Unlexicalized Sec. 23 Results
- Beats first-generation lexicalized parsers.
- Much of the power of lexicalization comes from closed-class monolexicalization.
37 Multilingual Parsing: Chinese - Syntactic sources of ambiguity
- English: PP attachment (well understood); coordination scoping (less well understood)
- Chinese: modifier attachment less of a problem, as verbal modifiers and direct objects aren't adjacent, and NP modifiers are overtly marked.
38 Chinese Performance
- Close to state-of-the-art for Chinese parsing
- Considerable difference in precision/recall split from other work suggests complementary strengths
- Levy and Manning, ACL 2003
39 Recent Chinese results: learning curve
- New release of Chinese Treebank provides more data (300,000 words)
40 Multilingual Parsing: German
- Linguistic characteristics, relative to English:
- Ample derivational and inflectional morphology
- Freer word order
- Verb position differs in matrix/embedded clauses
- Target corpus: Negra
- 400,000 words newswire text
- Flatter phrase structure annotations (few PPs!)
- Explicitly marked phrasal discontinuities
41 Current results (preliminary)
- Area needing investigation: the word dependency model currently gives relatively little improvement.
- Consistent with Dubey and Keller's findings that basic head-complement lexical dependencies harm performance for Negra German
42 Upcoming
- Incorporation of morphological information into the parsing model
- Recently released TIGER corpus (similar to Negra, 800,000 words)
- Additional languages (Czech, Arabic)
- Reconstruction of dislocated argument positions (common in German, Czech, many other languages)
43 Semantic Role Identification: Problem Statement
- Given a sentence and a word of interest (the predicator) in that sentence
- Find:
- The constituents related to that word and the nature of those relationships
- The overarching relationship (the frame) for the word and its roles
- Example: Tim drove his car to the store.
- [Tim]Driver drove [his car]Vehicle [to the store]Goal
- Relationship: Transportation
44 Annotated Examples
- [Judge We] praised [Evaluee the syrup tart] extravagantly.
- Her verse circulated to [Manner warm] [Judge critical] praise.
- [Agent His brothers] avenged [Injured_party him].
- [Selector The president] appoints [Leader a Prime Minister] [Conditions each year].
- She bought [Count three] [Unit kilos] [Stuff of apples].
- [Beh It] was [Degree really] mean [Evaluee of me].
45 Benefits of Solving the Problem
- Identify that two syntactically different phrases play the same role
- The board changed their ruling yesterday.
- The ruling changed because of protests.
- NLP: question answering, WSD, translation, summarization, speech recognition
- Computational Biology: operon prediction
- Security
- Intrusion Detection
- Credit Card Fraud
46 A Generative Model
47 Results: FrameNet I
48 Confusion Table: Roles Contributing Most to Error
(Rows: correct; columns: guesses)
49 Results: FrameNet II
Test Set Accuracy
Comparable numbers for FrameNet I
50 Concept-based Analysis
- Uniform formalism for encoding conceptual relations and grammatical constructions
- Initial version of construction parser
- Coordinated Relational Probabilistic Models for inference
51 Inference and Conceptual Schemas: Background
- Hypothesis
- Linguistic input is converted into a mental simulation based on bodily-grounded structures.
- Components
- Semantic schemas
- Image schemas and executing schemas are abstractions over neurally grounded perceptual and motor representations
- Linguistic units
- Lexical and phrasal construction representations invoke schemas, in part through metaphor
- Inference links these structures and provides parameters for a simulation engine
52 Conceptual Schemas
- We have developed a formalism for encoding conceptual schemas.
- Structured feature structure representation (ECG).
- Uniform representation for conceptual relations and for grammatical constructions.
- Supports structured probabilistic inference.
- Initial DAML+OIL implementation.
- Produced by a construction parser.
53 Construction Parser
- The parser maps from language input to a deep semantic specification
- The semantic specification is a network of linked conceptual ECG schemas
- Language- and domain-independent
- Supports structured probabilistic inference
- First system running since November 2002
- Uses novel parsing techniques combining chunking, unification, and semantic fit
54 State of Resource Development
- MetaNet
- Pilot system implemented
- SQL-based backend (Michael Meisel, CS undergrad)
- Data-entry GUI
- Database is being populated with image schemas (Ellen Dodge, Ling grad)
- FrameNet
- DAML+OIL version of FrameNet-1
- Combining FrameNet and WordNet for semantic extraction (Behrang Mohit, SIMS and ICSI, recently UTD)
- Good use of FrameNet for QA (UTD, Stanford, CU)
- Linking to external ontologies
- ECG OpenCyc link (Preslav Nakov, Marco Barreno)
55 Dynamic Probabilistic Inference for Event Structure
- Srini Narayanan, Jerry Feldman
- ICSI and UC Berkeley
56 Scenario Question (CNS data)
- How has Al-Qaida conducted its efforts to acquire WMD capability, and what are the results of this endeavor?
- Even with perfect parsing, to answer this question we have to go beyond words in the input in at least the following ways:
- Multiple sources (reports, evidence, news)
- Fusing information from unreliable sources (P(information true | source))
- Non-monotonicity: previous assertions or predictions may have to be retracted in the light of new evidence.
- Modeling complex events
- Evolving events with complex dynamics including sequence, concurrency, coordination, interruptions and resources.
57 Reasoning about Events for QA
- Reasoning about dynamics
- Complex event structure
- Multiple stages, interruptions, resources
- Evolving events
- Conditional events, presuppositions
- Nested temporal and aspectual references
- Past and future event references
- Metaphoric references
- Use of the motion domain to describe complex events
- Reasoning with uncertainty
- Combining evidence from multiple, unreliable sources
- Non-monotonic inference
- Retracting previous assertions
- Conditioning on partial evidence
58 Cognitive Semantics
- Much of language and thought is directly embodied and relies on recurrent patterns of familiar experience
- Image schemas
- Containment, Force Dynamics, Spatial Relations
- Motor schemas
- Homeostasis, Source-Path-Goal, Monitoring, Aspect
- Social cognition
- Authority, Care-giving, Play
- Abstract language and thought derive a significant amount of their meaning from mappings to embodied schemas
- Event Structure Metaphor, projection invariants and Cogs (aspect, topological relations), frames, mental spaces
59 Previous work
- Models of event structure that are able to deal with the temporal and aspectual structure of events
- Based on an active semantics of events and a factorized graphical model of complex states
- Models event stages, embedding, multi-level perspectives and coordination
- Event model based on a Stochastic Petri Net representation with extensions allowing hierarchical decomposition
- State is represented as a Temporal Bayes Net (T(D)BN)
60-62 (No Transcript)
63 Factorized Inference
64 Quantifying the Model
65 Pilot System Results
- Captures fine-grained distinctions needed for interpretation
- Frame-based inferences (COLING 02)
- Aspectual inferences (CogSci 98, IJCAI 99, COLING 02)
- Metaphoric inferences (AAAI 99)
- Sufficient inductive bias for verb learning (Bailey 97, CogSci 99) and construction learning (Chang 02, to appear)
- Model for DAML-S (WWW 02, Computer Networks 03)
66 Extensions to Pilot System
- Scalable data resources
- Language resources/ontology
- Lexicon (open source, WordNet, FrameNet)
- Conceptual relations
- Schemas, maps, frames, mental spaces
- General principle: use Semantic Web resources (DAML, DAML-S, OpenCyc, IEEE SUMO)
- Language analyzer
- Construction parser (ICSI/EML)
- Statistical techniques (UCB/Stanford, CU, UTD)
- Scalable domain representation
- Coordinated Probabilistic Relational Models
67 Problems with DBNs
- Scaling up to relational structures
- Supports linear (sequence) but not branching (concurrency, coordination) dynamics
68 Structured Probabilistic Inference
69 Probabilistic inference for QA
- Filtering
- P(X_t | o_1:t, X_1:t)
- Update the state based on the observation sequence and state set
- MAP estimation
- argmax_{h_1..h_n} P(X_t | o_1:t, X_1:t)
- Return the best assignment of values to the hypothesis variables given the observations and states
- Smoothing
- P(X_{t-k} | o_1:t, X_1:t)
- Modify assumptions about previous states, given the observation sequence and state set
- Projection/prediction/reachability
- P(X_{t+k} | o_1:t, X_1:t)
- Predict future states based on the observation sequence and state set
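Two of the queries above, filtering and prediction, can be illustrated on a minimal discrete-state Markov model. This is a deliberately simple two-state sketch with made-up transition and emission numbers, far simpler than the CPRM setting, but the update equations are the same in form.

```python
# Transition model T[prev][next] and emission model E[state][obs]
# (all numbers are illustrative).
T = {"on": {"on": 0.9, "off": 0.1}, "off": {"on": 0.2, "off": 0.8}}
E = {"on": {"hot": 0.8, "cold": 0.2}, "off": {"hot": 0.3, "cold": 0.7}}

def renorm(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

def filter_step(belief, obs):
    """Filtering: one step of P(X_t | o_1:t) -- predict, then condition."""
    predicted = {s: sum(belief[p] * T[p][s] for p in T) for s in T}
    return renorm({s: predicted[s] * E[s][obs] for s in T})

def predict(belief, k):
    """Projection/prediction: P(X_{t+k} | o_1:t), no further observations."""
    for _ in range(k):
        belief = {s: sum(belief[p] * T[p][s] for p in T) for s in T}
    return belief

belief = {"on": 0.5, "off": 0.5}
for obs in ["hot", "hot", "cold"]:
    belief = filter_step(belief, obs)
future = predict(belief, k=3)
```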
70 The CPRM Algorithm
- Combines insights from:
- the SVE algorithm for PRMs (Pfeffer 2000),
- the frontier algorithm for temporal models (Murphy 2002), and
- inference algorithms for complex, coordinated events (Narayanan 1999)
- An expressive probabilistic modeling paradigm with relations and branching dynamics
- Offers principled methods to bound inferential complexity
71 Summary
- QA with complex scenarios (such as the CNS scenario/data) needs complex inference that deals with:
- Relational structure
- Uncertain source and domain knowledge
- Complex dynamics and evolving events
- We have developed a representation and inference algorithm that is capable of tractable inference for a variety of domains.
- We are collaborating with UTD (Sanda Harabagiu) to apply these techniques to QA systems.
72 Putting It All Together
- We explored two related levels of semantics
- Universal conceptual schemas
- Extracting semantic relations from text
- In Phase I they remained separate
- However, we came up with CPRMs as a common representational format
- In Phase II we propose to combine them in a semantically based integrated QA system.