The QuALiM Question Answering system

1
The QuALiM Question Answering system
  • Question Answering
  • by Searching Large Corpora
  • with Linguistic Methods

2
Talk Outline
  • What does a QA system do?
  • QuALiM's two answer strategies
  • Fallback mechanism
  • Rephrasing algorithm
  • TREC evaluation results
  • Post-TREC evaluation results

3
Question Answering - Definition
  • Definition from Wikipedia:
  • Question Answering (QA) is a type of information
    retrieval. Given a collection of documents (such
    as the World Wide Web) the system should be able
    to retrieve answers to questions posed in natural
    language. QA is regarded as requiring more
    complex natural language processing (NLP)
    techniques than other types of information
    retrieval such as document retrieval, and it is
    sometimes regarded as the next step beyond search
    engines.

4
Question Answering - Example
START is MIT's QA system: http://start.csail.mit.edu/
5
Question Answering - Example
START is MIT's QA system: http://start.csail.mit.edu/
6
Question Answering - Example
Better, however, would be: "Albert Einstein was
born on March 14th, 1879." The system should
actually return a complete English sentence
expressing the desired fact.
START is MIT's QA system: http://start.csail.mit.edu/
7
The Fallback Mechanism (exemplary for common
answer-finding techniques)
8
Fallback Mechanism
  • The fallback mechanism creates queries based upon
    keywords and key phrases from the question. Three
    queries are sent to Google:
  • The first query contains all non-stop words from
    the question.
  • The second contains all NPs from the question
    (that contain at least one non-stop word).
  • The third query contains all NPs and all non-stop
    words that do not occur in the NPs.

9
Fallback Mechanism
  • So "When was Jim Inhofe first elected to the
    senate?" becomes:
  • Jim Inhofe senate first elected
  • Jim Inhofe the senate
  • Jim Inhofe the senate first elected
  • Note: The results from the last query are
    weighted twice as high as the results from the
    first two queries.
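
The construction of the three queries can be sketched as follows. This is only an illustration under stated assumptions, not QuALiM's code: the stop-word list is a tiny made-up one, the noun phrases are assumed to come from an external chunker and are passed in by hand, and the name `build_fallback_queries` is hypothetical. The first query comes out in question order, so its word order differs from the slide.

```python
import re

# Assumption: a tiny illustrative stop-word list; the real list is not given.
STOP_WORDS = {"when", "was", "to", "the", "a", "an", "of", "in", "did", "is"}


def tokenize(text):
    """Split text into word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9'-]+", text)


def is_stop(word):
    return word.lower() in STOP_WORDS


def build_fallback_queries(question, noun_phrases):
    """Return three (query, weight) pairs as described on the slide."""
    non_stop = [t for t in tokenize(question) if not is_stop(t)]
    # Keep only NPs that contain at least one non-stop word.
    nps = [np for np in noun_phrases
           if any(not is_stop(w) for w in tokenize(np))]
    np_words = {w.lower() for np in nps for w in tokenize(np)}
    # Non-stop words that do not already occur inside an NP.
    rest = [t for t in non_stop if t.lower() not in np_words]
    return [
        (" ".join(non_stop), 1.0),
        (" ".join(nps), 1.0),
        (" ".join(nps + rest), 2.0),  # last query counts twice as much
    ]


queries = build_fallback_queries(
    "When was Jim Inhofe first elected to the senate?",
    ["Jim Inhofe", "the senate"],  # assumed chunker output
)
for text, weight in queries:
    print(weight, text)
```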

10
Fallback Mechanism
The results from the queries, when placed in a
Weighted Sequence Bag:
  • 72.0 "senator"
  • 42.0 "senator jim inhofe" "senator jim"
  • 41.25 "r" (abbreviation for republican)
  • 32.25 "oklahoma"
  • 30.0 "r-okla" (abbreviation for
    republican-oklahoma)
  • 26.25 "1994"
  • 25.0 "the leading conservative voices"
    "of the leading conservative voices"
    "leading conservative voices"
  • 24.0 "us senator"
  • 23.25 "republican"
  • 21.0 "okla" (abbreviation for oklahoma)
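
The slides do not say how the Weighted Sequence Bag is computed. One plausible reading, sketched here purely as an assumption, is that every word sequence (n-gram) in each returned snippet is counted, with the count scaled by the weight of the query that produced the snippet. The function name, the `max_len` parameter, and the sample snippets are hypothetical, and the sketch does not reproduce the scores shown above.

```python
from collections import Counter


def weighted_sequence_bag(snippets, max_len=3):
    """Count all word sequences up to max_len words, scaled by query weight.

    snippets: list of (text, weight) pairs, where weight is the weight of
    the query that returned the snippet.
    """
    bag = Counter()
    for text, weight in snippets:
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                bag[" ".join(words[i:i + n])] += weight
    return bag


# Made-up snippets, only to show the mechanics.
bag = weighted_sequence_bag([
    ("senator jim inhofe", 2.0),  # from the double-weighted third query
    ("senator inhofe", 1.0),
])
for sequence, score in bag.most_common(3):
    print(score, sequence)
```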

11
Fallback Mechanism
But we know that we are looking for a date, so
the answer is 1994:
  • 72.0 "senator"
  • 42.0 "senator jim inhofe" "senator jim"
  • 41.25 "r" (abbreviation for republican)
  • 32.25 "oklahoma"
  • 30.0 "r-okla" (abbreviation for
    republican-oklahoma)
  • 26.25 "1994"
  • 25.0 "the leading conservative voices"
    "of the leading conservative voices"
    "leading conservative voices"
  • 24.0 "us senator"
  • 23.25 "republican"
  • 21.0 "okla" (abbreviation for oklahoma)
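
Picking the answer by expected type can be illustrated with a short sketch. The scores are copied from the slide; the year test is a simple regular expression chosen for this example, and `best_of_type` is a hypothetical helper, not part of QuALiM.

```python
import re

# Scores copied from the slide (a subset of the bag).
BAG = {
    "senator": 72.0,
    "senator jim inhofe": 42.0,
    "r": 41.25,
    "oklahoma": 32.25,
    "r-okla": 30.0,
    "1994": 26.25,
    "us senator": 24.0,
    "republican": 23.25,
}

# Assumption: a four-digit number starting with 1 or 2 counts as a year.
YEAR = re.compile(r"[12][0-9]{3}")


def best_of_type(bag, type_pattern):
    """Return the highest-scoring sequence that fully matches type_pattern."""
    candidates = [(score, seq) for seq, score in bag.items()
                  if type_pattern.fullmatch(seq)]
    return max(candidates)[1] if candidates else None


print(best_of_type(BAG, YEAR))  # prints 1994
```

Without the type restriction the top entry would be "senator", which shows why the expected answer type matters for this strategy.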

12
Definition Questions
  • Query: "Florence Nightingale"
  • 20.0 "may 12, 1820"
  • 16.0 "may 12" "nursing"
  • 15.0 "august 13, 1910"
  • 14.0 "1820-1910"
  • 13.0 "born"
  • 12.0 "august 13" "museum"
  • 11.0 "history"
  • 10.0 "modern nursing" "lady with the lamp"
    "florence nightingale museum" "the lady with the
    lamp"
  • 9.0 "italy"
  • 8.0 "of modern nursing" "nurses" "london"
  • 7.5 "on may 12, 1820"
  • 7.0 "2 lambeth palace road london"

13
Definition Questions
  • 20.0 "may 12, 1820"
  • 16.0 "may 12" "nursing"
  • 15.0 "august 13, 1910"
  • 14.0 "1820-1910"
  • 13.0 "born"
  • 12.0 "august 13" "museum"
  • 11.0 "history"
  • 10.0 "modern nursing" "lady with the lamp"
    "florence nightingale museum" "the lady with the
    lamp"
  • 9.0 "italy"
  • 8.0 "of modern nursing" "nurses" "london"
  • 7.5 "on may 12, 1820"
  • 7.0 "2 lambeth palace road london"
  • Answer sentences in AQUAINT corpus:
  • "on may 12, 1820, the founder of modern nursing,
    florence nightingale, was born in florence, italy."
  • "on aug. 13, 1910, florence nightingale, the
    founder of modern nursing, died in london."

14
The Rephrasing Algorithm
15
Pattern Layout
  <pattern name="WhendidNPVerbNPorPP" level="5">
    <sequence>
      <word id="1">When</word>
      <word id="2">did</word>
      <parse id="3">NP</parse>
      <morph id="4">V INF</morph>
      <parse id="5">NP|PP</parse>
      <final>?</final>
    </sequence>
    <target name="target1">
      <ref>3</ref>
      <ref morph="V PAST">4</ref>
      <ref>5</ref>
      <word>in</word>
      <answer>NP</answer>
    </target>
    <target name="target2">
      <word>in</word>

Sequences are matched against questions.
Targets describe (flat) syntactic structures of
potential answer sentences.
AnswerTypes place restrictions on the expected
answer type.
16
Sequences
  • This sequence matches all questions
  • beginning with "When"
  • followed by "did"
  • followed by an NP
  • followed by a verb in its infinitive form
  • followed by an NP or a PP
  • followed by a question mark (which
    has to be the last element in the question)

  <sequence>
    <word id="1">When</word>
    <word id="2">did</word>
    <parse id="3">NP</parse>
    <morph id="4">V INF</morph>
    <parse id="5">NP|PP</parse>
    <final>?</final>
  </sequence>

question start, word "When", word "did", phrase NP,
POS V INF, phrase NP or PP, punctuation "?",
question end
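
Matching such a sequence against a question can be sketched as below. The sketch assumes the question has already been chunked and tagged by an external parser (the hand-written `chunks` list stands in for that output), and it encodes the sequence as a plain Python list rather than reading the XML. All names are hypothetical.

```python
# Each element: (kind, allowed values). Kinds mirror the XML tag names.
SEQUENCE = [
    ("word", {"When"}),
    ("word", {"did"}),
    ("parse", {"NP"}),
    ("morph", {"V INF"}),
    ("parse", {"NP", "PP"}),
    ("final", {"?"}),
]


def match_sequence(sequence, chunks):
    """Return {position: text} if the chunks match the sequence, else None.

    chunks: list of (text, phrase_label, morph_label) tuples, assumed to
    come from a parser and tagger. Positions are 1-based like the XML ids.
    """
    if len(sequence) != len(chunks):
        return None
    bindings = {}
    for pos, ((kind, allowed), (text, phrase, morph)) in enumerate(
            zip(sequence, chunks), start=1):
        # Pick the property of the chunk that this element constrains.
        value = {"word": text, "final": text,
                 "parse": phrase, "morph": morph}[kind]
        if value not in allowed:
            return None
        bindings[pos] = text
    return bindings


# Assumed parser output for "When did Amtrak begin operations?"
chunks = [
    ("When", None, None),
    ("did", None, None),
    ("Amtrak", "NP", None),
    ("begin", None, "V INF"),
    ("operations", "NP", None),
    ("?", None, None),
]
bindings = match_sequence(SEQUENCE, chunks)
print(bindings)
```

The returned positions 3, 4 and 5 are what the `ref` elements in the targets refer to.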
17
Sequences
  • In the TREC 2005 question set this particular
    sequence matched 5 questions:
  • When did Floyd Patterson win the title?
  • When did Amtrak begin operations?
  • When did Jack Welch become chairman of
    General Electric?
  • When did Jack Welch retire from GE?
  • When did the Khmer Rouge come into power?
18
Targets
  • If a question matched a sequence, the targets are
    used to propose templates for potential answer
    sentences.
  • For the question "When did Amtrak begin
    operations?", these would be
  • Amtrak began operations in ANSWER[NP]
  • In ANSWER[NP] (,) Amtrak began operations

  <target name="target1">
    <ref>3</ref>
    <ref morph="V PAST">4</ref>
    <ref>5</ref>
    <word>in</word>
    <answer>NP</answer>
  </target>
  <target name="target2">
    <word>in</word>
    <answer>NP</answer>
    <punctuation optional="true">,</punctuation>
    <ref>3</ref>
    <ref morph="V PAST">4</ref>
    <ref>5</ref>
  </target>
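
Turning a target into an answer-sentence template can be sketched as follows. The targets are written as Python lists instead of XML, the past-tense lookup table is a stand-in for a real morphological generator, and the `ANSWER[NP]` placeholder notation is chosen for this example. None of these names come from QuALiM itself.

```python
# Assumption: a tiny irregular-verb table instead of a morphology component.
PAST_TENSE = {"begin": "began", "win": "won", "become": "became",
              "retire": "retired", "come": "came"}

TARGET1 = [("ref", 3), ("ref_past", 4), ("ref", 5),
           ("word", "in"), ("answer", "NP")]
TARGET2 = [("word", "in"), ("answer", "NP"), ("optional", ","),
           ("ref", 3), ("ref_past", 4), ("ref", 5)]


def instantiate(target, bindings):
    """Fill a target with the question parts bound during sequence matching."""
    parts = []
    for kind, value in target:
        if kind == "ref":
            parts.append(bindings[value])
        elif kind == "ref_past":
            parts.append(PAST_TENSE[bindings[value]])
        elif kind == "word":
            parts.append(value)
        elif kind == "answer":
            parts.append(f"ANSWER[{value}]")
        elif kind == "optional":
            parts.append(f"({value})")
    sentence = " ".join(parts)
    return sentence[0].upper() + sentence[1:]  # sentence-initial capital


# Bindings for "When did Amtrak begin operations?"
bindings = {3: "Amtrak", 4: "begin", 5: "operations"}
print(instantiate(TARGET1, bindings))
print(instantiate(TARGET2, bindings))
```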

19
Targets
  • answer sentence start, "Amtrak began operations
    in", answer (NP), answer sentence end
  • answer sentence start, "In", answer (NP), (,),
    "Amtrak began operations", answer sentence end

20
Targets
  • The information from the targets can be used to
    create Google queries:
  • Amtrak began operations in
  • In Amtrak began operations
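
Deriving a query string from a target might look like the sketch below. It assumes that the fixed parts of a target are sent as quoted phrases and that the answer slot and optional punctuation simply split the phrase. The slides do not state the exact query syntax, so the quoting is an assumption, as is the helper name.

```python
def target_to_query(parts):
    """Build a phrase query from the fixed parts of a target.

    parts: list of strings; None marks the answer slot or optional
    punctuation, which cannot be part of a fixed phrase.
    """
    phrases, current = [], []
    for part in parts:
        if part is None:
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(part)
    if current:
        phrases.append(" ".join(current))
    return " ".join(f'"{phrase}"' for phrase in phrases)


query1 = target_to_query(["Amtrak", "began", "operations", "in", None])
query2 = target_to_query(["In", None, None, "Amtrak", "began", "operations"])
print(query1)
print(query2)
```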

21
Snippet Retrieval
  • For the first query, "Amtrak began operations in",
    the first five sentences Google returns are:
  • Since Amtrak began operations in 1971, federal
    outlays for intercity rail passenger service have
    been about $18 billion.
  • Amtrak began operations in 1971.
  • Amtrak of the obligation to operate the basic
    system of routes that was largely inherited from
    the private railroads when Amtrak began
    operations in 1971.
  • Amtrak began operations in 1971, as authorized
    by the Rail Passenger Service Act of 1970.
  • A comprehensive history of intercity passenger
    service in Indiana, from the mid-19th century
    through May 1, 1971, when Amtrak began operations
    in the state.

22
Answer Extraction
  • The sentences are parsed and tagged, and by
    matching them to the targets once more, the exact
    position of the potential answer can be located:
  • Since Amtrak began operations in 1971, federal
    outlays for intercity rail passenger service have
    been about $18 billion.
  • Amtrak began operations in 1971.
  • Amtrak of the obligation to operate the basic
    system of routes that was largely inherited from
    the private railroads when Amtrak began
    operations in 1971.
  • Amtrak began operations in 1971, as authorized
    by the Rail Passenger Service Act of 1970.
  • A comprehensive history of intercity passenger
    service in Indiana, from the mid-19th century
    through May 1, 1971, when Amtrak began operations
    in the state.
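
Locating the answer position can be approximated with a regular expression, as in the sketch below. QuALiM parses and tags the sentences and matches them against the targets; the regex here is a deliberately simplified stand-in that takes whatever follows the fixed phrase up to the next punctuation mark. The function name is hypothetical and the sentences are shortened versions of the snippets above.

```python
import re


def extract_candidates(prefix, sentences):
    """Return the text that follows prefix in each sentence, if present."""
    pattern = re.compile(re.escape(prefix) + r"\s+([^,.;]+)", re.IGNORECASE)
    candidates = []
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            candidates.append(match.group(1).strip())
    return candidates


sentences = [
    "Since Amtrak began operations in 1971, federal outlays have grown.",
    "Amtrak began operations in 1971.",
    "Amtrak began operations in 1971, as authorized by the Rail Passenger "
    "Service Act of 1970.",
    "A history through May 1, 1971, when Amtrak began operations in the "
    "state.",
]
candidates = extract_candidates("Amtrak began operations in", sentences)
print(candidates)
```

The last candidate, "the state", fills the answer slot syntactically but is not a date, which is what the type check is for.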

23
QuALiM Type Checking
  • The answerType element in the pattern tells us
    that we are looking for a date.
  • We'd like to have
  • a complete date in standard form, e.g. May 1st,
    1971
  • some form of a date, e.g. 5/1/1971
  • If we cannot have that, a year specification will
    also do (e.g. 1971).

  <answerType phrases="NP|PP">
    <built-in weight="2">dateComplete</built-in>
    <namedEntity weight="4">date</namedEntity>
    <built-in weight="3">year|in_year</built-in>
    <other ignore="true"/>
  </answerType>

24
QuALiM Type Checking
  • An answerType may contain the following elements:
  • NamedEntity
  • WordNetCategory
  • Built-in (date, year, percentage etc.)
  • Measure (15 meters, 100 mph)
  • List (e.g. a list of movies)
  • WebHypernym
  • other

25
Excursus: WordNet
26
Excursus: WordNet
27
Excursus: WordNet
28
Excursus: Named Entity Recognition
  • The task: identify atomic elements of information
    in text
  • person names
  • company/organization names
  • locations
  • dates/times
  • percentages
  • monetary amounts

29
Excursus: Named Entity Recognition
  • Task of a NE System
  • Delimit the named entities in a text and tag them
    with NE categories

<ENAMEX TYPE="LOCATION">Italy</ENAMEX>'s business
world was rocked by the announcement <TIMEX
TYPE="DATE">last Thursday</TIMEX> that
Mr. <ENAMEX TYPE="PERSON">Verdi</ENAMEX> would
leave his job as vice-president of <ENAMEX
TYPE="ORGANIZATION">Music Masters of Milan,
Inc</ENAMEX> to become operations director of
<ENAMEX TYPE="ORGANIZATION">Arthur
Andersen</ENAMEX>.
  • Milan is part of organization name
  • Arthur Andersen is a company
  • Italy is sentence-initial => capitalization
    useless
30
Excursus: Named Entity Recognition
  • Task of a NE System
  • Delimit the named entities in a text and tag them
    with NE categories

Italy's business world was rocked by the
announcement last Thursday that Mr. Verdi would
leave his job as vice-president of Music Masters
of Milan, Inc to become operations director of
Arthur Andersen.
  • Milan is part of organization name
  • Arthur Andersen is a company
  • Italy is sentence-initial => capitalization
    useless

31
Excursus: Named Entity Recognition
How does it work?
  • Basically quite simple
  • The system accesses huge lists of
  • First names
  • Last names
  • Cities
  • Countries
  • ...
  • And knows about special words/abbreviations like
  • Mr., Dr., Prof., Inc., Blvd. etc.
  • It knows the names of weekdays or months etc.
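
The list-based approach can be illustrated with a toy gazetteer lookup. The lists below are tiny made-up samples (a real system would load very large ones), and the labels and function names are chosen for this sketch only.

```python
# Assumption: toy gazetteers standing in for the "huge lists".
GAZETTEERS = {
    "LOCATION": {"italy", "milan", "london"},
    "DATE": {"monday", "tuesday", "wednesday", "thursday", "friday",
             "saturday", "sunday", "january", "february", "march"},
    "TITLE": {"mr", "dr", "prof"},
}


def lookup(token):
    """Return the label of the first gazetteer containing the token."""
    key = token.lower().strip(".,'")
    for label, entries in GAZETTEERS.items():
        if key in entries:
            return label
    return None


def tag(tokens):
    return [(token, lookup(token)) for token in tokens]


tagged = tag(["Italy", "was", "rocked", "last", "Thursday"])
print(tagged)
print(tag(["Music", "Masters", "of", "Milan,", "Inc"]))
```

The second call tags "Milan," as a LOCATION even though it is part of an organization name, which is why list lookup alone is not enough.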

32
Excursus: Named Entity Recognition
  • Some systems use hand-written context-sensitive
    reduction rules:
  • 1) title capitalized word => title person_name;
    compare "Mr. Jones" vs. "Mr. Ten-Percent"
    => no rule without exceptions
  • 2) person_name, the adj CEO of organization,
    e.g. "Fred Smith, the young dynamic CEO of
    BlubbCo" => ability to grasp non-local patterns
  • plus help from databases of known named entities
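
Rule 1 can be written as a single regular expression, shown here as a rough sketch. The title list and the function name are assumptions for this example; real systems use richer rule formalisms than one regex.

```python
import re

# Rule 1: a title followed by a capitalized word is taken as a person name.
TITLE_RULE = re.compile(r"\b(?:Mr|Mrs|Ms|Dr|Prof)\.\s*([A-Z][\w-]*)")


def find_persons(text):
    """Return the capitalized words that directly follow a title."""
    return [match.group(1) for match in TITLE_RULE.finditer(text)]


print(find_persons("Mr.Verdi would leave his job as vice-president."))
print(find_persons("Mr. Jones met Mr. Ten-Percent."))
```

The second call also returns "Ten-Percent", the nickname case the slide uses to show that no rule is without exceptions.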

33
QuALiM Type Checking
  • An answerType may contain the following elements:
  • NamedEntity
  • WordNetCategory
  • Built-in (date, year, percentage etc.)
  • Measure (15 meters, 100 mph)
  • List (e.g. a list of movies)
  • WebHypernym
  • other

34
QuALiM Type Checking
  • When the answers are checked for the correct
    semantic type, the first four sentences pass the
    test; the last one is ruled out:
  • Since Amtrak began operations in 1971, federal
    outlays for intercity rail passenger service have
    been about $18 billion.
  • Amtrak began operations in 1971.
  • Amtrak of the obligation to operate the basic
    system of routes that was largely inherited from
    the private railroads when Amtrak began
    operations in 1971.
  • Amtrak began operations in 1971, as authorized
    by the Rail Passenger Service Act of 1970.
  • A comprehensive history of intercity passenger
    service in Indiana, from the mid-19th century
    through May 1, 1971, when Amtrak began operations
    in the state.

35
TREC 2004 Results and Post-TREC Evaluation
36
TREC Results: factoid questions
37
TREC Results: combined score
38
Post-TREC Evaluation
  • Purpose: What is the performance and behavior of
    the different algorithms implemented?
  • Performed with resolved questions.
  • ("When was Franz Kafka born?" instead of "When
    was he born?")
  • No document localization, thus
  • no NIL answers returned
  • no unsupported judgments

39
Post-TREC Evaluation