Title: The QuALiM Question Answering system
1The QuALiM Question Answering system
- Question Answering
- by Searching Large Corpora
- with Linguistic Methods
2Talk Outline
- What does a QA system do?
- QuALiM's two answer strategies
- Fallback mechanism
- Rephrasing algorithm
- TREC evaluation results
- Post-TREC evaluation results
3Question Answering - Definition
- Definition from Wikipedia
- Question Answering (QA) is a type of information
retrieval. Given a collection of documents (such
as the World Wide Web) the system should be able
to retrieve answers to questions posed in natural
language. QA is regarded as requiring more
complex natural language processing (NLP)
techniques than other types of information
retrieval such as document retrieval, and it is
sometimes regarded as the next step beyond search
engines.
4Question Answering - Example
START is MIT's QA system: http://start.csail.mit.edu/
5Question Answering - Example
6Question Answering - Example
Better, however, would be: "Albert Einstein was born on March 14th, 1879." The system should actually return a complete English sentence expressing the desired fact.
START is MIT's QA system: http://start.csail.mit.edu/
7The Fallback Mechanism (exemplary for common answer-finding techniques)
8Fallback Mechanism
- The fallback mechanism creates queries based on keywords and key phrases from the question. Three queries are sent to Google:
- The first query contains all non-stop words from the question.
- The second contains all NPs from the question (that contain at least one non-stop word).
- The third query contains all NPs and all non-stop words that do not occur in the NPs.
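The three query types above can be sketched as follows. This is a minimal illustration, not QuALiM's implementation: the stop-word list is hypothetical, the noun phrases are assumed to come from an external chunker, and the word order inside a query is an assumption.

```python
# Hypothetical stop-word list; the slides do not say which list QuALiM uses.
STOP_WORDS = {"when", "was", "to", "the", "a", "an", "of", "in", "did"}


def content_words(text):
    """Return the words of `text` that are not stop words, in order."""
    words = [w.strip("?.,!\"'") for w in text.split()]
    return [w for w in words if w and w.lower() not in STOP_WORDS]


def fallback_queries(question, noun_phrases):
    """Build the three fallback queries as lists of query terms.

    `noun_phrases` is assumed to come from an external chunker.
    """
    # Query 1: all non-stop words from the question.
    q1 = content_words(question)
    # Query 2: all NPs that contain at least one non-stop word.
    q2 = [np for np in noun_phrases if content_words(np)]
    # Query 3: the NPs plus the non-stop words that occur in no NP.
    np_words = {w.lower() for np in q2 for w in np.split()}
    q3 = q2 + [w for w in q1 if w.lower() not in np_words]
    return q1, q2, q3


q1, q2, q3 = fallback_queries(
    "When was Jim Inhofe first elected to the senate?",
    ["Jim Inhofe", "the senate"],
)
print(" ".join(q1))
print(" ".join(q2))
print(" ".join(q3))
```

The first query comes out in question order here, so its word order may differ from the order a real system would send.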
9Fallback Mechanism
- So "When was Jim Inhofe first elected to the senate?" becomes:
- Jim Inhofe senate first elected
- Jim Inhofe the senate
- Jim Inhofe the senate first elected
- Note: The results from the last query are weighted twice as high as the results from the first two queries.
10Fallback Mechanism
The results from the queries when placed in a Weighted Sequence Bag:
- 72.0 "senator"
- 42.0 "senator jim inhofe" "senator jim"
- 41.25 "r" (abbreviation for republican)
- 32.25 "oklahoma"
- 30.0 "r-okla" (abbreviation for republican-oklahoma)
- 26.25 "1994"
- 25.0 "the leading conservative voices" "of the leading conservative voices" "leading conservative voices"
- 24.0 "us senator"
- 23.25 "republican"
- 21.0 "okla" (abbreviation for oklahoma)
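One way such a bag could be built is sketched below. The slides do not describe the actual scoring, so everything here is an assumption: word sequences are plain n-grams of up to three words, each sequence counts once per snippet, and a snippet's weight depends on the query that returned it. The snippets are made up.

```python
from collections import defaultdict


def ngrams(words, max_len):
    """Yield every contiguous word sequence of length 1..max_len."""
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def weighted_sequence_bag(results, max_len=3):
    """Sum, per word sequence, the weights of the snippets containing it.

    `results` is a list of (snippet, weight) pairs; each sequence is
    counted at most once per snippet.
    """
    bag = defaultdict(float)
    for snippet, weight in results:
        words = snippet.lower().split()
        for seq in set(ngrams(words, max_len)):
            bag[seq] += weight
    return dict(bag)


# Made-up snippets; the second one stands for a result of the third
# query, whose results count twice as much.
results = [
    ("senator jim inhofe", 1.0),
    ("senator jim inhofe elected 1994", 2.0),
]
bag = weighted_sequence_bag(results)
for seq, score in sorted(bag.items(), key=lambda kv: (-kv[1], kv[0]))[:3]:
    print(score, seq)
```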
11Fallback Mechanism
But we know that we are looking for a date, so the answer is 1994:
- 72.0 "senator"
- 42.0 "senator jim inhofe" "senator jim"
- 41.25 "r" (abbreviation for republican)
- 32.25 "oklahoma"
- 30.0 "r-okla" (abbreviation for republican-oklahoma)
- 26.25 "1994"
- 25.0 "the leading conservative voices" "of the leading conservative voices" "leading conservative voices"
- 24.0 "us senator"
- 23.25 "republican"
- 21.0 "okla" (abbreviation for oklahoma)
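The selection step can be illustrated with a short sketch: take the highest-weighted entry that has the expected type. The weights are copied from the list above (one phrase per weight); using a four-digit year as a stand-in for "looks like a date" is an assumption.

```python
import re

# Weighted candidates copied from the slide (highest weight first).
candidates = [
    (72.0, "senator"),
    (42.0, "senator jim inhofe"),
    (41.25, "r"),
    (32.25, "oklahoma"),
    (30.0, "r-okla"),
    (26.25, "1994"),
    (25.0, "the leading conservative voices"),
    (24.0, "us senator"),
    (23.25, "republican"),
    (21.0, "okla"),
]

# Assumption: a four-digit year stands in for "looks like a date".
YEAR = re.compile(r"(1\d{3}|20\d{2})")


def best_of_type(candidates, type_pattern):
    """Return the highest-weighted candidate matching the pattern."""
    matching = [(w, text) for w, text in candidates
                if type_pattern.fullmatch(text)]
    return max(matching)[1] if matching else None


print(best_of_type(candidates, YEAR))  # prints 1994
```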
12Definition Questions
- Query: "Florence Nightingale"
- 20.0 "may 12, 1820"
- 16.0 "may 12" "nursing"
- 15.0 "august 13, 1910"
- 14.0 "1820-1910"
- 13.0 "born"
- 12.0 "august 13" "museum"
- 11.0 "history"
- 10.0 "modern nursing" "lady with the lamp" "florence nightingale museum" "the lady with the lamp"
- 9.0 "italy"
- 8.0 "of modern nursing" "nurses" "london"
- 7.5 "on may 12, 1820"
- 7.0 "2 lambeth palace road london"
13Definition Questions
- 20.0 "may 12, 1820"
- 16.0 "may 12" "nursing"
- 15.0 "august 13, 1910"
- 14.0 "1820-1910"
- 13.0 "born"
- 12.0 "august 13" "museum"
- 11.0 "history"
- 10.0 "modern nursing" "lady with the lamp" "florence nightingale museum" "the lady with the lamp"
- 9.0 "italy"
- 8.0 "of modern nursing" "nurses" "london"
- 7.5 "on may 12, 1820"
- 7.0 "2 lambeth palace road london"
- Answer sentences in the AQUAINT corpus:
- "on may 12, 1820, the founder of modern nursing, florence nightingale, was born in florence, italy."
- "on aug. 13, 1910, florence nightingale, the founder of modern nursing, died in london."
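How the weighted phrases lead to answer sentences is not spelled out on the slide. A plausible sketch, under the assumption that corpus sentences are ranked by the summed weight of the bag phrases they contain, looks like this. The bag is a subset of the list above and the second sentence is made up for contrast.

```python
# Weighted phrases copied from the slide (a subset).
bag = {
    "may 12, 1820": 20.0,
    "nursing": 16.0,
    "august 13, 1910": 15.0,
    "born": 13.0,
    "modern nursing": 10.0,
    "italy": 9.0,
    "london": 8.0,
}


def sentence_score(sentence, bag):
    """Sum the weights of all bag phrases that occur in the sentence."""
    text = sentence.lower()
    return sum(weight for phrase, weight in bag.items() if phrase in text)


sentences = [
    "florence nightingale has a museum.",  # made-up contrast sentence
    "on may 12, 1820, the founder of modern nursing, "
    "florence nightingale, was born in florence, italy.",
]
ranked = sorted(sentences, key=lambda s: sentence_score(s, bag), reverse=True)
print(ranked[0])
```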
14The Rephrasing Algorithm
15Pattern Layout
<pattern name="WhendidNPVerbNPorPP" level="5">
  <sequence>
    <word id="1">When</word>
    <word id="2">did</word>
    <parse id="3">NP</parse>
    <morph id="4">V INF</morph>
    <parse id="5">NP|PP</parse>
    <final>?</final>
  </sequence>

  <target name="target1">
    <ref>3</ref>
    <ref morph="V PAST">4</ref>
    <ref>5</ref>
    <word>in</word>
    <answer>NP</answer>
  </target>
  <target name="target2">
    <word>in</word>
    ...
Sequences are matched against questions.
Targets describe (flat) syntactic structures of potential answer sentences.
AnswerTypes place restrictions on the expected answer type.
16Sequences
- This sequence matches all questions
- beginning with When
- followed by did
- followed by an NP
- followed by a verb in its infinitive form
- followed by an NP or a PP
- followed by a question mark (which
has to be the last element in the question)
<sequence>
  <word id="1">When</word>
  <word id="2">did</word>
  <parse id="3">NP</parse>
  <morph id="4">V INF</morph>
  <parse id="5">NP|PP</parse>
  <final>?</final>
</sequence>
question start | word: When | word: did | phrase: NP | POS: V INF | phrase: NP or PP | punctuation: ? | question end
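The matching step can be sketched as an element-by-element comparison. The tuple encoding of the sequence and the hand-annotated question are assumptions made for illustration; a real system would obtain phrase and morphology labels from a parser and tagger.

```python
# The sequence from the slide as (element kind, expected value) pairs.
SEQUENCE = [
    ("word", "When"),
    ("word", "did"),
    ("parse", {"NP"}),
    ("morph", "V INF"),
    ("parse", {"NP", "PP"}),
    ("final", "?"),
]


def matches(sequence, chunks):
    """Check a chunked question against a sequence, element by element.

    `chunks` is a list of (text, label) pairs. The label is a phrase
    type ("NP", "PP"), a morphological tag ("V INF") or None.
    """
    if len(sequence) != len(chunks):
        return False
    for (kind, expected), (text, label) in zip(sequence, chunks):
        if kind in ("word", "final"):
            ok = text.lower() == expected.lower()
        elif kind == "parse":
            ok = label in expected
        else:  # "morph"
            ok = label == expected
        if not ok:
            return False
    return True


# Hand-annotated input standing in for parser and tagger output.
question = [
    ("When", None), ("did", None), ("Amtrak", "NP"),
    ("begin", "V INF"), ("operations", "NP"), ("?", None),
]
print(matches(SEQUENCE, question))  # prints True
```

Requiring equal lengths is what forces the question mark to be the last element of the question.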
17Sequences
- In the TREC 2005 question set this particular sequence matched 5 questions:
- When did Floyd Patterson win the title?
- When did Amtrak begin operations?
- When did Jack Welch become chairman of General Electric?
- When did Jack Welch retire from GE?
- When did the Khmer Rouge come into power?
18Targets
- If a question matched a sequence, the targets are used to propose templates for potential answer sentences.
- For the question "When did Amtrak begin operations?", these would be:
- Amtrak began operations in ANSWERNP
- In ANSWERNP (,) Amtrak began operations
<target name="target1">
  <ref>3</ref>
  <ref morph="V PAST">4</ref>
  <ref>5</ref>
  <word>in</word>
  <answer>NP</answer>
</target>
<target name="target2">
  <word>in</word>
  <answer>NP</answer>
  <punctuation optional="true">,</punctuation>
  <ref>3</ref>
  <ref morph="V PAST">4</ref>
  <ref>5</ref>
</target>
19Targets
Target 1: answer sentence start | Amtrak | began | operations | in | answer (NP) | answer sentence end
Target 2: answer sentence start | In | answer (NP) | (,) | Amtrak | began | operations | answer sentence end
20Targets
- The information from the targets can be used to create Google queries:
- Amtrak began operations in
- In Amtrak began operations
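Filling the targets with the matched question parts can be sketched as below. The tuple encoding of the targets and the small past-tense table are hypothetical; a real system would use a morphological generator. Dropping the answer slot and the optional comma turns a template into a query string.

```python
# Hypothetical lookup for past-tense forms.
PAST_TENSE = {"begin": "began", "win": "won", "become": "became",
              "retire": "retired", "come": "came"}

# The two targets from the slides. ("ref", id, morph) refers to a matched
# sequence element; "(,)" marks the optional comma.
TARGETS = {
    "target1": [("ref", 3, None), ("ref", 4, "V PAST"), ("ref", 5, None),
                ("word", "in"), ("answer", "NP")],
    "target2": [("word", "in"), ("answer", "NP"), ("punct", "(,)"),
                ("ref", 3, None), ("ref", 4, "V PAST"), ("ref", 5, None)],
}


def instantiate(target, bindings, keep_answer=True):
    """Fill a target with the question parts bound to the sequence ids."""
    parts = []
    for element in target:
        kind = element[0]
        if kind == "ref":
            text = bindings[element[1]]
            if element[2] == "V PAST":
                text = PAST_TENSE[text]
            parts.append(text)
        elif kind == "word":
            parts.append(element[1])
        elif keep_answer:  # answer slot or optional punctuation
            parts.append("ANSWER" + element[1] if kind == "answer"
                         else element[1])
    text = " ".join(parts)
    return text[0].upper() + text[1:]  # sentence-initial capital


# Parts of "When did Amtrak begin operations?" by sequence id.
bindings = {3: "Amtrak", 4: "begin", 5: "operations"}
for target in TARGETS.values():
    print(instantiate(target, bindings))                     # template
    print(instantiate(target, bindings, keep_answer=False))  # query
```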
21Snippet Retrieval
- For the first query, "Amtrak began operations in", the first five sentences Google returns are:
- Since Amtrak began operations in 1971, federal outlays for intercity rail passenger service have been about $18 billion.
- Amtrak began operations in 1971.
- Amtrak of the obligation to operate the basic system of routes that was largely inherited from the private railroads when Amtrak began operations in 1971.
- Amtrak began operations in 1971, as authorized by the Rail Passenger Service Act of 1970.
- A comprehensive history of intercity passenger service in Indiana, from the mid-19th century through May 1, 1971, when Amtrak began operations in the state.
22Answer Extraction
- The sentences are parsed and tagged, and by matching them to the targets once more, the exact position of the potential answer can be located:
- Since Amtrak began operations in 1971, federal outlays for intercity rail passenger service have been about $18 billion.
- Amtrak began operations in 1971.
- Amtrak of the obligation to operate the basic system of routes that was largely inherited from the private railroads when Amtrak began operations in 1971.
- Amtrak began operations in 1971, as authorized by the Rail Passenger Service Act of 1970.
- A comprehensive history of intercity passenger service in Indiana, from the mid-19th century through May 1, 1971, when Amtrak began operations in the state.
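Locating the answer slot can be approximated with a regular expression. This is a simplification and not the parse-based matching the slide describes: it assumes the candidate is the text right after the query phrase, up to the next comma or full stop.

```python
import re

QUERY = "Amtrak began operations in"

# Assumption: the candidate answer is the text after the query phrase,
# up to the next comma, full stop or end of the snippet.
ANSWER_AFTER_QUERY = re.compile(re.escape(QUERY) + r"\s+([^,.]+)")


def extract_candidate(snippet):
    """Return the text filling the answer slot, or None if no match."""
    found = ANSWER_AFTER_QUERY.search(snippet)
    return found.group(1).strip() if found else None


snippets = [
    "Amtrak began operations in 1971.",
    "Amtrak began operations in 1971, as authorized by the Rail "
    "Passenger Service Act of 1970.",
    "through May 1, 1971, when Amtrak began operations in the state.",
]
for snippet in snippets:
    print(extract_candidate(snippet))
```

The last snippet yields "the state", which is a noun phrase in the right position but not a date.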
23QuALiM Type Checking
- The answerType element in the pattern tells us that we are looking for a date.
- We'd like to have
- a complete date in standard form, e.g. May 1st, 1971
- some form of a date, e.g. 5/1/1971
- If we cannot have that, a year specification will also do (e.g. 1971).
<answerType phrases="NP|PP">
  <built-in weight="2">dateComplete</built-in>
  <namedEntity weight="4">date</namedEntity>
  <built-in weight="3">year|in_year</built-in>
  <other ignore="true"/>
</answerType>
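A rough sketch of such a type check follows. The type names and weights are taken from the answerType element; the regular expressions, and the pairing of each type with one of the example formats, are assumptions (the real system uses built-in recognizers and a named entity recognizer).

```python
import re

MONTHS = (r"(January|February|March|April|May|June|July|August|"
          r"September|October|November|December)")

# (type name, weight, pattern). Names and weights follow the slide's
# answerType element; the regular expressions are assumptions.
ANSWER_TYPES = [
    ("dateComplete", 2,
     re.compile(MONTHS + r" \d{1,2}(st|nd|rd|th)?, \d{4}")),
    ("date", 4, re.compile(r"\d{1,2}/\d{1,2}/\d{4}")),
    ("year", 3, re.compile(r"(in )?\d{4}")),
]


def check_type(candidate):
    """Return (type name, weight) for the first type that fits.

    Candidates of any other type are ignored, mirroring
    <other ignore="true"/>, and yield None.
    """
    for name, weight, pattern in ANSWER_TYPES:
        if pattern.fullmatch(candidate):
            return name, weight
    return None


for candidate in ["May 1st, 1971", "5/1/1971", "1971", "the state"]:
    print(candidate, "->", check_type(candidate))
```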
24QuALiM Type Checking
- An answerType may contain the following elements:
- NamedEntity
- WordNetCategory
- Built-in (date, year, percentage etc.)
- Measure (15 meters, 100 mph)
- List (e.g. a list of movies)
- WebHypernym
- other
25Excursus WordNet
26Excursus WordNet
27Excursus WordNet
28Excursus Named Entity Recognition
- The task: identify atomic elements of information in text
- person names
- company/organization names
- locations
- dates and times
- percentages
- monetary amounts
29Excursus Named Entity Recognition
- Task of a NE System:
- Delimit the named entities in a text and tag them with NE categories
<ENAMEX TYPE="LOCATION">Italy</ENAMEX>'s business world was rocked by the announcement <TIMEX TYPE="DATE">last Thursday</TIMEX> that Mr. <ENAMEX TYPE="PERSON">Verdi</ENAMEX> would leave his job as vice-president of <ENAMEX TYPE="ORGANIZATION">Music Masters of Milan, Inc</ENAMEX> to become operations director of <ENAMEX TYPE="ORGANIZATION">Arthur Andersen</ENAMEX>.
- "Milan" is part of an organization name
- "Arthur Andersen" is a company
- "Italy" is sentence-initial => capitalization is useless
30Excursus Named Entity Recognition
- Task of a NE System:
- Delimit the named entities in a text and tag them with NE categories
Italy's business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of Arthur Andersen.
- "Milan" is part of an organization name
- "Arthur Andersen" is a company
- "Italy" is sentence-initial => capitalization is useless
31Excursus Named Entity Recognition
How does it work?
- Basically quite simple
- The system accesses huge lists of
- First names
- Last names
- Cities
- Countries
- ...
- And knows about special words/abbreviations like
- Mr., Dr., Prof., Inc., Blvd. etc.
- It knows the names of weekdays or months etc.
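The list-lookup idea can be sketched in a few lines. The lists here are tiny and hypothetical, and the category names are made up for illustration.

```python
# Tiny hypothetical lists; real systems use lists with many thousands
# of entries.
GAZETTEER = {
    "PERSON_FIRST_NAME": {"fred", "albert"},
    "LOCATION": {"italy", "milan", "london"},
    "TITLE": {"mr.", "dr.", "prof."},
    "WEEKDAY": {"monday", "thursday"},
}


def lookup(token):
    """Return every category whose list contains the token."""
    key = token.lower()
    return sorted(cat for cat, words in GAZETTEER.items() if key in words)


def tag(text):
    """Pair each whitespace-separated token with its list categories."""
    return [(tok, lookup(tok.strip(","))) for tok in text.split()]


print(tag("Mr. Verdi left Milan last Thursday"))
```

Pure lookup tags "Milan" as a location wherever it occurs, so it cannot tell that the word is part of an organization name in "Music Masters of Milan, Inc".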
32Excursus Named Entity Recognition
- Some systems use hand-written context-sensitive reduction rules:
- 1) title capitalized word => title person_name (compare "Mr. Jones" vs. "Mr. Ten-Percent") => no rule without exceptions
- 2) person_name, the adj CEO of organization ("Fred Smith, the young dynamic CEO of BlubbCo") => ability to grasp non-local patterns
- plus help from databases of known named entities
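The first rule can be approximated with a regular expression, as a minimal sketch. The list of titles is an assumption, and a real rule system would work on tagged tokens, not raw text.

```python
import re

# Rule 1 from the slide: title + capitalized word => title + person_name.
# The set of titles is a hypothetical sample.
TITLE_RULE = re.compile(r"\b(Mr|Mrs|Ms|Dr|Prof)\. ([A-Z][\w-]*)")


def person_names(text):
    """Return the capitalized words that follow a title."""
    return [m.group(2) for m in TITLE_RULE.finditer(text)]


print(person_names("Mr. Jones met Dr. Smith."))        # prints ['Jones', 'Smith']
print(person_names("They call him Mr. Ten-Percent."))  # prints ['Ten-Percent']
```

The second call shows the exception named on the slide: the rule fires on a nickname that is not a person name.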
33QuALiM Type Checking
- An answerType may contain the following elements:
- NamedEntity
- WordNetCategory
- Built-in (date, year, percentage etc.)
- Measure (15 meters, 100 mph)
- List (e.g. a list of movies)
- WebHypernym
- other
34QuALiM Type Checking
- When the answers are checked for the correct semantic type, the first four sentences pass the test; the last one is ruled out:
- Since Amtrak began operations in 1971, federal outlays for intercity rail passenger service have been about $18 billion.
- Amtrak began operations in 1971.
- Amtrak of the obligation to operate the basic system of routes that was largely inherited from the private railroads when Amtrak began operations in 1971.
- Amtrak began operations in 1971, as authorized by the Rail Passenger Service Act of 1970.
- A comprehensive history of intercity passenger service in Indiana, from the mid-19th century through May 1, 1971, when Amtrak began operations in the state.
35TREC 2004 Results and Post-TREC Evaluation
36TREC Results factoid questions
37TREC Results combined score
38Post TREC Evaluation
- Purpose: What is the performance and behavior of the different algorithms implemented?
- Performed with resolved questions.
- ("When was Franz Kafka born?" instead of "When was he born?")
- No document localization, thus
- no NIL answers returned
- no unsupported judgments
39Post TREC Evaluation