Title: A Discourse-based Approach to Generating Why-questions from Texts
- Rashmi Prasad and Aravind Joshi
- University of Pennsylvania
- rjprasad,joshi_at_seas.upenn.edu
- September 2008
- Workshop on the Question Generation Shared Task and Evaluation Challenge
Top-level Task Division
Why-QG
- Source identification (content selection)
- Text generation (planning/realization)
Content Selection: What Content?
- Are questions to be supported by the text? Yes
  - Select content for the question source
  - Select (verify) content for the answer source
- Should all supported questions be generated? Yes
  - Generate maximally
  - Filter later, based on the application
Content Selection: Example
According to greenhouse theories, increased
carbon dioxide emissions, largely caused by
burning of fossil fuels, will cause the Earth to
warm up because carbon dioxide prevents heat
from escaping into space.
Question: Why will increased carbon dioxide emissions cause the Earth to heat up?
Answer: Because carbon dioxide prevents heat from escaping into space.
Content Selection: Task Definition
- Assumption: the sources of why-questions and their answers are events and situations in the text that are related by causal discourse relations
- Task
  - Identify causal discourse relations and their two arguments (the related events/situations)
  - Identify the question source and the answer source
    - Which argument is the question source?
    - Which argument is the answer source?
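The task definition above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' system. The `Relation` record and the `qa_sources` function are hypothetical names, and the sense strings assume the PDTB 2.0 dotted notation (e.g., `Contingency.Cause.Reason`). In PDTB, *Reason* means that Arg2 gives the reason for the situation in Arg1. *Result* means that Arg2 is the result of Arg1. The sense therefore decides which argument is the question source and which is the answer source.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Relation:
    """Hypothetical, simplified record for one PDTB-style discourse relation."""
    kind: str        # "Explicit" or "Implicit"
    connective: str  # explicit connective, or the connective inserted by annotators
    sense: str       # assumed PDTB 2.0 dotted sense label
    arg1: str
    arg2: str


def qa_sources(rel: Relation) -> Optional[Tuple[str, str]]:
    """Return (question_source, answer_source) for a causal relation, else None."""
    if rel.sense.endswith("Cause.Reason"):
        # Arg2 is the reason for Arg1: ask why Arg1 holds, answer with Arg2.
        return rel.arg1, rel.arg2
    if rel.sense.endswith("Cause.Result"):
        # Arg2 is the result of Arg1: ask why Arg2 holds, answer with Arg1.
        return rel.arg2, rel.arg1
    # Non-causal senses, or causal senses annotated without a subtype,
    # give no question/answer direction in this sketch.
    return None


if __name__ == "__main__":
    example = Relation(
        kind="Explicit",
        connective="because",
        sense="Contingency.Cause.Reason",
        arg1="increased carbon dioxide emissions will cause the Earth to warm up",
        arg2="carbon dioxide prevents heat from escaping into space",
    )
    question_src, answer_src = qa_sources(example)
    print("Question source:", question_src)
    print("Answer source:", answer_src)
```

Keying the direction on the sense subtype, rather than on the connective, means the same function serves both explicit and implicit relations.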
Corpus-based Approach: Penn Discourse Treebank (PDTB)
http://www.seas.upenn.edu/pdtb, LDC
- PDTB annotates Wall Street Journal texts (1M words, 2,384 texts) with:
  - Discourse relations (explicit/implicit) and their arguments (spans)
  - Relation semantics: cause, temporal, comparison, expansion (features)
  - Attributions of relations/arguments, e.g., source (spans/features)
- 40K tokens (20K explicit relations, 20K implicit relations)
- PDTB findings
  - Arguments of relations are not necessarily local
  - Expressions denoting relations are ambiguous
  - Attributions that need to be factored out have complex syntax
- Why-QG content selection is therefore not a trivial task
- The lexically-grounded approach of PDTB makes it a valuable resource for content selection
PDTB Example: Explicit Causal Relation
According to greenhouse theories, increased
carbon dioxide emissions, largely caused by
burning of fossil fuels, will cause the Earth to
warm up because carbon dioxide prevents heat
from escaping into space.
- Explicit relation (discourse connective): because
- Semantics: Cause.Reason
- Arg1: increased carbon dioxide emissions, largely caused by burning of fossil fuels, will cause the Earth to warm up
- Arg2: carbon dioxide prevents heat from escaping into space
- Attribution (relation): According to greenhouse theories
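A deliberately naive surface realizer can turn the two arguments of this example into a question/answer pair. This is a hypothetical Python sketch of the text-generation step, not the method described in these slides. The names `why_question` and `because_answer` are illustrative. The sketch only handles clauses whose first auxiliary is a modal such as *will*. Unlike the hand-written question in the content selection example, it keeps the non-restrictive modifier and does not paraphrase.

```python
import re

# Modal auxiliaries this toy realizer knows how to front (an assumption; real
# question generation also needs do-support, tense and agreement handling).
MODALS = r"will|would|can|could|shall|should|may|might|must"
CLAUSE = re.compile(rf"(?P<subj>.+?)\s+(?P<aux>{MODALS})\s+(?P<rest>.+)")


def why_question(question_source: str) -> str:
    """Front the first modal auxiliary: '<subj> will <vp>' -> 'Why will <subj> <vp>?'."""
    m = CLAUSE.fullmatch(question_source.strip().rstrip("."))
    if m is None:
        raise ValueError("toy realizer only handles clauses with a modal auxiliary")
    return f"Why {m['aux']} {m['subj']} {m['rest']}?"


def because_answer(answer_source: str) -> str:
    """Wrap the answer-source argument as a 'Because ...' answer."""
    return f"Because {answer_source.strip().rstrip('.')}."


arg1 = ("increased carbon dioxide emissions, largely caused by burning of "
        "fossil fuels, will cause the Earth to warm up")
arg2 = "carbon dioxide prevents heat from escaping into space"
print(why_question(arg1))
# Why will increased carbon dioxide emissions, largely caused by burning of fossil fuels, cause the Earth to warm up?
print(because_answer(arg2))
# Because carbon dioxide prevents heat from escaping into space.
```

The attribution ("According to greenhouse theories") never reaches the question, because PDTB keeps it outside Arg1. Factoring attributions out is what makes this simple mapping possible.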
PDTB Example: Implicit Causal Relation
To compare temperatures over the past 10,000
years, researchers analyzed the changes in
concentrations of two forms of oxygen.
[Implicit = because] These measurements can indicate
temperature changes, researchers said, because
the rates of evaporation of these oxygen atoms
differ as temperatures change.
- Implicit relation (implicit connective): because, inserted between the two sentences
- Semantics: Cause.Reason
- Arg1: the first sentence (To compare temperatures ... two forms of oxygen.)
- Arg2: the second sentence, excluding its attribution (These measurements can indicate temperature changes ... as temperatures change.)
- Attribution (over Arg2): researchers said
- Implicit relations are annotated between adjacent sentences
Evaluation with PDTB
- Metric: recall and precision (IR-type)
- For the task of identifying causal discourse relations and their arguments:
  - How many of the PDTB causal relations were identified? (recall)
  - How many of the identified relations were PDTB causal relations? (precision)
- Criteria for determining a match would have to be defined, e.g., how much of the argument spans must agree?
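The slide leaves the match criterion open. The Python sketch below assumes one hypothetical criterion to show how recall and precision would be computed. A predicted causal relation matches a gold PDTB relation when both its Arg1 and Arg2 character spans have a Jaccard overlap of at least 0.5 with the gold spans. Both the threshold and the greedy one-to-one matching are assumptions, not a PDTB standard.

```python
from typing import List, Tuple

Span = Tuple[int, int]   # [start, end) character offsets in the source text
Rel = Tuple[Span, Span]  # (Arg1 span, Arg2 span) of one causal relation


def span_overlap(a: Span, b: Span) -> float:
    """Jaccard overlap of two character spans: |intersection| / |union|."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Rel, gold: Rel, threshold: float) -> bool:
    """Hypothetical criterion: both arguments must overlap gold by >= threshold."""
    return all(span_overlap(p, g) >= threshold for p, g in zip(pred, gold))


def precision_recall(pred: List[Rel], gold: List[Rel],
                     threshold: float = 0.5) -> Tuple[float, float]:
    remaining = list(gold)  # each gold relation may be matched at most once
    true_pos = 0
    for p in pred:
        for g in remaining:
            if is_match(p, g, threshold):
                remaining.remove(g)
                true_pos += 1
                break  # stop scanning once p has been matched
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


gold = [((0, 50), (55, 90)), ((100, 140), (141, 180))]
pred = [((0, 48), (55, 90)), ((200, 220), (221, 240))]
print(precision_recall(pred, gold))  # (0.5, 0.5)
```

Greedy matching is the simplest choice. An optimal bipartite assignment could score slightly higher when several predictions compete for the same gold relation.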
PDTB Coverage
Internal evaluation: Does the PDTB exhaustively annotate all causal discourse relations allowed by its guidelines?
- For practical reasons, PDTB does not annotate:
  - cross-paragraph implicit relations
  - intra-sentential implicit relations
- For why-QG, these extensions are necessary
  - PDTB annotations have been extended for 3 texts
  - Extension of a larger subset of the corpus is planned
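One step in extending the annotations is enumerating the adjacent-sentence candidates that PDTB skips for implicit relations. The Python sketch below assumes a hypothetical input format: a text given as a list of paragraphs, each a list of already-split sentences. It flags the pairs that straddle a paragraph boundary. It does not address intra-sentential implicit relations, which would additionally need clause segmentation.

```python
from typing import List, NamedTuple


class SentencePair(NamedTuple):
    first: str
    second: str
    cross_paragraph: bool  # True if the pair straddles a paragraph boundary


def adjacent_sentence_pairs(paragraphs: List[List[str]]) -> List[SentencePair]:
    """List every pair of adjacent sentences in a text given as paragraphs of sentences."""
    # Flatten while remembering which paragraph each sentence came from.
    flat = [(p_idx, sent) for p_idx, para in enumerate(paragraphs) for sent in para]
    return [
        SentencePair(s1, s2, cross_paragraph=(p1 != p2))
        for (p1, s1), (p2, s2) in zip(flat, flat[1:])
    ]


text = [
    ["S1.", "S2."],
    ["S3."],
]
for pair in adjacent_sentence_pairs(text):
    print(pair)
```

Here the pair (S2., S3.) is the only one flagged as cross-paragraph, so it is the kind of candidate that an extended annotation pass would have to consider.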
PDTB Coverage
- External evaluation: independent data source (Verberne et al., 2007)
  - 177 why-QA pairs over the 3 extended PDTB texts
  - Human elicitation experiment
  - Questions mapped to source text
  - Answers mapped to source text
- How many why-QA pairs can be associated with a PDTB causal relation?
  - Look at the source text of the why-QA pairs
PDTB Coverage
- External evaluation
- Exclusions from the 177 why-QA pairs
  - Unsupported questions: 30
  - Duplicate questions: 109
  - Evaluated QA pairs: 38 (177 - 139)
PDTB relation    Why-QA pairs (total 38)
causal           27 (71%)
specification    1
none             10
PDTB Coverage
- External evaluation
- Unmatched why-QA pairs (10/38)
  - QA source in distant clauses/sentences: 16% (6/38)
    - Complex inferences over distant arguments
    - Explore chains of relations (reasoning)?
  - QA source in a single clause: 11% (4/38)
    - Causal relation anchored by a causal verb/preposition
    - Use clause-level semantic annotations (Propbank)
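The counts on the two preceding slides can be checked with a few lines of Python. The dictionary keys are only labels for this sketch. The last line shows the combined figure implied by adding the single-clause pairs to the causal ones.

```python
total_pairs = 177
unsupported, duplicates = 30, 109
evaluated = total_pairs - (unsupported + duplicates)  # 177 - 139 = 38

# Evaluated why-QA pairs by the PDTB relation found at their source text.
by_relation = {"causal": 27, "specification": 1, "none": 10}
assert sum(by_relation.values()) == evaluated

# Breakdown of the 10 unmatched pairs.
unmatched = {"distant clauses/sentences": 6, "single clause (Propbank)": 4}
assert sum(unmatched.values()) == by_relation["none"]


def pct(n: int) -> float:
    """Share of the evaluated pairs, in percent."""
    return 100 * n / evaluated


for name, n in {**by_relation, **unmatched}.items():
    print(f"{name:28s} {n:3d}/{evaluated}  {pct(n):5.1f}%")

# Coverage if the single-clause pairs were also recovered via Propbank.
combined = by_relation["causal"] + unmatched["single clause (Propbank)"]
print(f"causal + single clause: {pct(combined):.1f}%")  # 81.6%
```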
Summary
- A discourse-based task definition and a corpus-based evaluation of why-QG seem well motivated
- PDTB provides lexically-grounded annotations of discourse relations, including causal relations
- PDTB causal relations cover 71% of the evaluated why-QA pairs
- Coverage can easily be increased by at least an additional 11% by also using Propbank
Thank You!
Distant Relation
- Whittle Communications L.P., . . ., said it has signed 500 schools in 24 states to subscribe to the controversial Channel One news program . . .
- . . . (8 sentences) . . .
- Subscribing schools get the 12-minute daily Channel One news program, whose four 30-second TV ads during each show have drawn protests from educators.
Question: Why is Channel One controversial?
Answer: Four 30-second TV ads during each show have drawn protests from educators.
Clause-internal Relation
According to greenhouse theories, increased
carbon dioxide emissions, largely caused by
burning of fossil fuels, will cause the Earth to
warm up because carbon dioxide prevents heat
from escaping into space.
Question: Why are carbon dioxide emissions increasing?
Answer: Because fossil fuels are being burned.