Feature Vector Quality and Distributional Similarity

Transcript and Presenter's Notes
1
Textual Entailment: A Framework for Applied Semantics

2
Outline: a Vision
  • Why do we need Text Understanding?
  • Capture understanding by Textual Entailment: does one text entail another?
  • Major challenge: knowledge acquisition
  • Initial applications
  • Looking 5 years ahead

3
Text Understanding
  • Vision for improving information access
  • Common search engines: text processing still mostly matches query keywords
  • Deeper understanding: consider the meanings of words and the relationships between them
  • Relevant for applications: question answering, information extraction, semantic search, summarization

4
(No Transcript)
5
(No Transcript)
6
Towards text understanding: Question Answering
7
(No Transcript)
8
Information Extraction (IE)
  • Identify information of pre-determined structure
  • Automatic filling of forms
  • Example: extract product information

9
Search may benefit from understanding
  • Query: AIDS treatment
  • Irrelevant document:
  • Hemophiliacs lack a protein, called factor VIII, that is essential for making blood clots. As a result, they frequently suffer internal bleeding and must receive infusions of clotting protein derived from human blood. During the early 1980s, these treatments were often tainted with the AIDS virus. In 1984, after that was discovered, manufacturers began heating factor VIII to kill the virus. The strategy greatly reduced the problem but was not foolproof. However, many experts believe that adding detergents and other refinements to the purification process has made natural factor VIII virtually free of AIDS.
  • (AP890118-0146, TIPSTER Vol. 1)
  • Many irrelevant documents mention AIDS and treatments for other diseases

10
Relevant Document
  • Query: AIDS treatment
  • Federal health officials are recommending aggressive use of a newly approved drug that protects people infected with the AIDS virus against a form of pneumonia that is the No. 1 killer of AIDS victims. The Food and Drug Administration approved the drug, aerosol pentamidine, on Thursday. The announcement came as the Centers for Disease Control issued greatly expanded treatment guidelines recommending wider use of the drug in people infected with the AIDS virus but who may show no symptoms.
  • (AP890616-0048, TIPSTER Vol. 1)
  • Relevant documents may mention specific types of treatments for AIDS

11
(No Transcript)
12
(No Transcript)
13
(No Transcript)
14
Why is it difficult?
(Diagram: Meaning vs. Language)
15
Variability of Semantic Expression
The Dow Jones Industrial Average closed up 255
Dow ends up
Dow gains 255 points
Stock market hits a record high
Dow climbs 255
16
It's all about entailment
Question: Who acquired Overture?  >>  Expected answer template: X acquired Overture
Hypothesized answer: Yahoo acquired Overture
Text: Yahoo's buyout of Overture
The text entails the hypothesized answer.
  • Application inferences can be reduced to entailment!
  • IE: X acquire Y
  • Summarization (multi-document): identify redundant sentences
  • MT: paraphrasing, evaluation
  • Educational applications: student answer vs. reference

17
Applied Textual Entailment
  • Directional relation between two text fragments: Text (t) and Hypothesis (h)
  • Operational (applied) definition:
  • Human gold standard: the entailment judgment matches the application's judgments
  • Assuming common background knowledge: language and world knowledge

18
Textual Entailment: Human Reading Comprehension
  • From a children's English learning book (Sela and Greenberg)
  • Reference Text: The Bermuda Triangle lies in the Atlantic Ocean, off the coast of Florida.
  • Hypothesis (True/False?): The Bermuda Triangle is near the United States
19
PASCAL Recognizing Textual Entailment (RTE) Challenges
FP-6 Funded PASCAL NOE, 2004-7
Bar-Ilan University; ITC-irst and CELCT, Trento; MITRE; Microsoft Research
20
Some Examples
21
Participation and Impact
  • Very successful challenges, worldwide:
  • RTE-1: 17 groups
  • RTE-2: 23 groups, 150 downloads!
  • RTE-3: 25 groups
  • RTE-4 (2008): 25 groups, moved to NIST (the TREC organizers)
  • High interest in the research community:
  • Papers, conference keywords, sessions and areas, PhDs, influence on funded projects
  • Special issue of the Journal of Natural Language Engineering

22
Results: RTE-2
Average: 60%, Median: 59%
23
Classical Approach: Semantics as an Interpretation Task
Stipulated Meaning Representation (by scholar)
Variability
Language (by nature)
  • Logical forms, word senses, semantic roles, named entity types - scattered tasks
  • A feasible/suitable framework for applied semantics?

24
Textual Entailment: Text Mapping
Assumed Meaning (by humans)
Variability
Language (by nature)
25
General Case: Inference
Meaning Representation
Inference
Interpretation
Language
Textual Entailment
  • Entailment mapping is the actual applied goal - let's agree on it as a unified test for (all) semantic tasks
  • Interpretation becomes a possible means - direct inference at the language level may be attempted

26
What is the main obstacle?
  • System reports point to:
  • Lack of knowledge: rules, paraphrases, lexical relations, etc.
  • Systems that coped better with these issues seem to have performed best

27
Research Directions at Bar-Ilan: Knowledge Acquisition, Inference, Applications
Oren Glickman, Idan Szpektor, Roy Bar-Haim, Maayan Geffet, Moshe Koppel (Bar-Ilan University); Shachar Mirkin (Hebrew University, Israel); Hristo Tanev, Bernardo Magnini, Alberto Lavelli, Lorenza Romano (ITC-irst, Italy); Bonaventura Coppola, Milen Kouylekov (University of Trento and ITC-irst, Italy)
28
Distributional Word Similarity
"Similar words appear in similar contexts" (Harris, 1968)
Similar Word Meanings → Similar Contexts
Distributional Similarity Model:
Similar Word Meanings → Similar Context Features
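A minimal sketch of this model, assuming simple sparse feature vectors: each word is represented by weighted context features, and word similarity is the cosine of the two vectors. The feature names and weights below are illustrative assumptions, not data from the experiments.

  import math

  def cosine(u, v):
      # Cosine similarity between two sparse feature vectors (dicts)
      dot = sum(w * v.get(f, 0.0) for f, w in u.items())
      norm_u = math.sqrt(sum(w * w for w in u.values()))
      norm_v = math.sqrt(sum(w * w for w in v.values()))
      return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

  # Toy context-feature vectors (feature -> weight); purely illustrative
  country = {"industry:gen": 3.0, "neighboring:mod": 2.0, "visit:obj": 2.0,
             "parliament:gen": 1.0, "population:gen": 2.0}
  state = {"industry:gen": 2.0, "neighboring:mod": 2.0, "governor:mod": 3.0,
           "visit:obj": 1.0, "parliament:gen": 1.0, "president:gen": 2.0}

  print(round(cosine(country, state), 3))  # similar contexts -> high similarity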
29
Measuring Context Similarity
  • Example: comparing the context features of "country" and "state"
  • Shared features include: Industry (genitive), Neighboring (modifier), Governor (modifier), Visit (obj), Parliament (genitive), Population (genitive), President (genitive)

30
Incorporate Indicative Patterns
31
Acquisition Example
  • Top-ranked entailments for "company":
  • firm, bank, group, subsidiary, unit, business,
  • supplier, carrier, agency, airline, division,
    giant,
  • entity, financial institution, manufacturer,
    corporation,
  • commercial bank, joint venture, maker, producer,
    factory
  • Current work: extraction from Wikipedia

32
Extracting Lexical Rules from Wikipedia
  • Be-complement: nominal complements of "be"
  • Redirect: various terms redirect to a canonical title
  • Parenthesis: used for disambiguation
  • Link: maps to the title of another article
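A sketch of how these four sources could be turned into lexical rules, assuming a pre-parsed article with hypothetical redirects, links, and first_sentence fields; the field names and regular expressions are illustrative assumptions, not the actual extractor.

  import re

  def rules_from_article(title, redirects, links, first_sentence):
      """Collect lexical entailment rules (lhs, rhs) from one article."""
      rules = set()
      # Redirect: alternative terms entail the canonical title
      for r in redirects:
          rules.add((r, title))
      # Parenthesis: 'Python (programming language)' -> disambiguating category
      m = re.match(r"(.+?) \((.+)\)$", title)
      if m:
          rules.add((m.group(1), m.group(2)))
      # Link: anchor text maps to the linked article's title
      for anchor, target in links:
          if anchor.lower() != target.lower():
              rules.add((anchor, target))
      # Be-complement: nominal complement of 'be' in the first sentence
      m = re.match(re.escape(title) + r" (?:is|was) (?:a|an|the) (.+?)[.,]",
                   first_sentence)
      if m:
          rules.add((title, m.group(1)))
      return rules

  print(rules_from_article(
      "Python (programming language)",
      redirects=["CPython"],
      links=[("van Rossum", "Guido van Rossum")],
      first_sentence="Python (programming language) is a general-purpose language.",
  ))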

33
Entailment Rules for Predicates
Q: What reduces the risk of heart attacks?
Text: Aspirin prevents heart attacks
Hypothesis: Aspirin reduces the risk of heart attacks
Entailment Rule: X prevent Y → X reduce risk of Y (both sides are templates)
⇒ Need a large knowledge base of entailment rules
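A minimal sketch of applying such a rule, assuming string-level templates for readability (the actual systems match templates against parse trees): bind the variables on the left-hand side, then instantiate the right-hand side. The rule list and matching scheme are illustrative.

  import re

  # Entailment rules as (LHS template, RHS template) with variables X and Y
  RULES = [("X prevents Y", "X reduces the risk of Y")]

  def apply_rules(text):
      """Yield hypotheses entailed by `text` under the rule base."""
      for lhs, rhs in RULES:
          # Turn the template into a regex: X and Y become capture groups
          pattern = lhs.replace("X", r"(?P<X>.+?)").replace("Y", r"(?P<Y>.+)")
          m = re.fullmatch(pattern, text)
          if m:
              yield rhs.replace("X", m.group("X")).replace("Y", m.group("Y"))

  print(list(apply_rules("Aspirin prevents heart attacks")))
  # ['Aspirin reduces the risk of heart attacks']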
34
TEASE Algorithm Flow
Lexicon
Input template: X ←subj- accuse -obj→ Y
WEB
TEASE
Sample corpus for input template: "Paula Jones accused Clinton"; "Sanhedrin accused St. Paul"
Anchor Set Extraction (ASE)
Anchor sets: {Paula Jones→subj; Clinton→obj}, {Sanhedrin→subj; St. Paul→obj}
Template Extraction (TE)
Sample corpus for anchor sets: "Paula Jones called Clinton indictable"; "St. Paul defended before the Sanhedrin"
Templates: X call Y indictable; Y defend before X
(iterate)
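A schematic of this loop, with web search, ASE, and TE stubbed out as hypothetical helper functions; the top-k selection below stands in for TEASE's actual ranking.

  def tease(input_template, search, extract_anchor_sets, extract_templates,
            iterations=2, top_k=5):
      """Alternate Anchor Set Extraction (ASE) and Template Extraction (TE)."""
      templates = {input_template}
      frontier = [input_template]
      for _ in range(iterations):
          anchor_sets = set()
          for template in frontier:
              corpus = search(template)                    # sample corpus for template
              anchor_sets |= extract_anchor_sets(corpus)   # ASE
          new_templates = set()
          for anchors in anchor_sets:
              corpus = search(anchors)                     # sample corpus for anchor set
              new_templates |= extract_templates(corpus, anchors)  # TE
          # Feed top-ranked new templates back in (ranking stubbed as sorting)
          frontier = sorted(new_templates - templates)[:top_k]
          templates |= new_templates
      return templates

  # Toy run with stub components, illustrative only
  print(tease("X accuse Y",
              search=lambda q: ["Paula Jones accused Clinton"],
              extract_anchor_sets=lambda c: {(("Paula Jones", "subj"), ("Clinton", "obj"))},
              extract_templates=lambda c, a: {"X call Y indictable"}))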
35
Sample of Extracted Anchor-Sets for X prevent Y
36
Sample of Extracted Templates for X prevent Y
37
Experiment and Evaluation
  • 48 randomly chosen input verbs
  • 1392 extracted templates, evaluated by human judgments
  • Encouraging results
  • Future work: improve precision

38
Syntactic Variability Phenomena
Template: X activate Y
39
Inference: Putting It All Together
A proof system over parse trees: a compact, unified formalism for knowledge representation and inference at the lexical-syntactic level
  • Providing:
  • A uniform representation for all knowledge types
  • A single knowledge-based inference mechanism

40
Proof System Components
Research goal: develop formalism components to support the needed inferences
41
Inference Rules: Tree Transformations
  • Pair of subtrees with shared variables (templates)
  • Example: the Passive-to-Active rule
L (passive): V[verb] with children obj→N1[noun], be→be[verb], by→by[prep]→pcomp-n→N2[noun]
R (active): V[verb] with children subj→N2[noun], obj→N1[noun]
"The book was read by John" → "John read the book"
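A sketch of such a rule as code, assuming a minimal dependency-tree representation (word, part-of-speech, children): match the left-hand-side structure, bind N1 and N2, and emit the right-hand-side tree.

  # Dependency tree as (word, pos, {relation: subtree}); a simplifying assumption
  passive = ("read", "verb", {
      "obj": ("book", "noun", {}),
      "be":  ("was", "verb", {}),
      "by":  ("by", "prep", {"pcomp-n": ("John", "noun", {})}),
  })

  def passive_to_active(tree):
      """Passive-to-Active: if L (obj/be/by under a verb) matches, build R."""
      word, pos, kids = tree
      if pos == "verb" and {"obj", "be", "by"} <= kids.keys():
          n1 = kids["obj"]                        # variable N1
          n2 = kids["by"][2]["pcomp-n"]           # variable N2
          return (word, "verb", {"subj": n2, "obj": n1})  # R side
      return tree  # rule does not apply; leave tree unchanged

  print(passive_to_active(passive))
  # ('read', 'verb', {'subj': ('John', 'noun', {}), 'obj': ('book', 'noun', {})})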
42
Alignments
  • "The book was read by John yesterday"
  • (Passive-to-Active rule trees as on the previous slide)
  • We want to infer "John read the book yesterday"
  • We introduced alignments to indicate copying of modifiers (here, "yesterday") to the generated tree

43
Proof Example
  • Text: It rained when John and Mary left
  • ⇒
  • Hypothesis: Mary left

44
Proof Example
"It rained when John and Mary left" ⇒ "It rained when Mary left" ⇒ "Mary left"
(Parse tree of the text: ROOT →i→ rain[verb], with expletive→it and wha→when[adj] →i→ leave[verb], whose subj→John[noun] →conj→ Mary[noun])
45
Making sense of (implicit) senses
  • What is the RIGHT set of senses?
  • Any concrete set is problematic/subjective, but WSD forces you to choose one
  • A lexical entailment perspective:
  • Instead of identifying an explicitly stipulated sense of a word occurrence,
  • identify whether a word occurrence (i.e. its implicit sense) entails another word occurrence, in context
  • (ACL 2006)

46
Lexical Matching for Applications
  • Sense equivalence:
Q: announcement of new models of chairs
T1: IKEA announced a new comfort chair
T2: MIT announced a new CS chair position
  • Sense entailment in substitution:
Q: announcement of new models of furniture
T1: IKEA announced a new comfort chair
T2: MIT announced a new CS chair position
47
Unsupervised Direct kNN-Ranking
  • Test example score: average cosine similarity with the k most similar training examples
  • Rationale:
  • positive examples will be similar to some source occurrence (of the corresponding sense)
  • negative examples won't be similar to the source
  • Rank test examples by score
  • A classification slant on language modeling (a sketch follows)
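A minimal sketch of this ranking under the same sparse-vector assumption as before; the vectors and the choice of k are illustrative.

  import math

  def cosine(u, v):
      # Cosine similarity between sparse feature vectors (dicts)
      dot = sum(w * v.get(f, 0.0) for f, w in u.items())
      nu = math.sqrt(sum(w * w for w in u.values()))
      nv = math.sqrt(sum(w * w for w in v.values()))
      return dot / (nu * nv) if nu and nv else 0.0

  def knn_score(test_vec, train_vecs, k=3):
      """Average cosine similarity to the k most similar training examples."""
      sims = sorted((cosine(test_vec, t) for t in train_vecs), reverse=True)
      return sum(sims[:k]) / min(k, len(sims))

  def rank(test_vecs, train_vecs, k=3):
      """Rank test examples by score: most source-like occurrences first."""
      return sorted(test_vecs, key=lambda v: knn_score(v, train_vecs, k),
                    reverse=True)

  # Toy occurrences of 'chair': furniture-like training contexts
  train = [{"sit:on": 1.0, "comfort:mod": 1.0}, {"wooden:mod": 1.0, "sit:on": 1.0}]
  tests = [{"comfort:mod": 1.0, "new:mod": 1.0}, {"department:gen": 1.0}]
  print(rank(tests, train, k=1))  # furniture-like occurrence ranks first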

48
Results (for synonyms): Ranking
  • kNN improves precision by 8-18% at up to 25% recall

49
Initial Applications: Relation Extraction, Semantic Search
50
Relation Extraction
Input Template: X prevent Y
Entailment Rule Acquisition (TEASE)
Templates: X prevention for Y, X treat Y, X reduce Y
Transformation Rules
Syntactic Matcher
Relation Instances: <sunscreen, sunburns>
51
Dataset
  • Recognizing interactions between annotated protein pairs (Bunescu 2005)
  • 200 Medline abstracts
  • Gold-standard dataset of protein pairs
  • Input template: X interact with Y

52
Manual Analysis - Results
  • 93% of interacting protein pairs can be identified with lexical-syntactic templates

Number of templates vs. recall (within the 93%)
53
TEASE Output for X interact with Y
A sample of correct templates learned
54
TEASE Algorithm - Potential Recall on Training Set
  • Iterative: taking the top 5 ranked templates as input
  • Morph: recognizing morphological derivations

55
Results vs. Supervised Approaches
  • 180 training abstracts

56
(No Transcript)
57
(No Transcript)
58
(No Transcript)
59
Integrating IE and Search (with IBM Research Haifa)
60
TE for Summarization (Harabagiu et al., IPM 2007)
  • Textual entailment roles:
  • Selecting information
  • Scoring summaries via pyramid-based measures

61
Entailment Engine Specification (API)
  • Recognize/generate entailments (one possible interface is sketched below):
  • Recognize a given t/h pair - RTE mode, validation
  • Given h and a corpus, find all entailing texts - IR, QA, FAQ
  • Given a text, generate all entailed statements - paraphrase generation for MT
  • Identify partial entailments - summarization, partial match
  • Accommodate template hypotheses - addresses variable-value extraction (QA, IE)
  • Accommodate contextual preferences in input - variable types (IE, QA, IR), disambiguating context
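One way this specification could be rendered as an interface; the class and method names are assumptions for illustration, not an existing implementation.

  from abc import ABC, abstractmethod
  from typing import Dict, Iterable, List

  class EntailmentEngine(ABC):
      """Hypothetical interface for the specification above."""

      @abstractmethod
      def entails(self, text: str, hypothesis: str) -> bool:
          """RTE mode: does text t entail hypothesis h? (validation)"""

      @abstractmethod
      def find_entailing(self, hypothesis: str, corpus: Iterable[str]) -> List[str]:
          """Given h and a corpus, return all entailing texts (IR, QA, FAQ)."""

      @abstractmethod
      def generate_entailed(self, text: str) -> List[str]:
          """Generate entailed statements (paraphrase generation for MT)."""

      @abstractmethod
      def partial_entailment(self, text: str, hypothesis: str) -> float:
          """Score partial entailment (summarization, partial match)."""

      @abstractmethod
      def match_template(self, text: str, template: str) -> Dict[str, str]:
          """Template hypotheses: bind variables, e.g. {'X': 'Yahoo'} (QA, IE)."""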

62
Optimistic Conclusions
  • Good prospects for better levels of text
    understanding
  • Enabling more sophisticated information access
  • Textual entailment is an appealing framework
  • Boosts research on text understanding
  • Potential for vast knowledge acquisition

Thank you!