Title: Implementation of a QA system in a real context
Implementation of a QA system in a real context
- Carlos Amaral (Priberam, Portugal)
- Dominique Laurent (Synapse Développement, France)
Workshop TellMeMore, November 24, 2006, C.Amaral,
D.Laurent
1. The Question-Answering system
- What is a QA system?
- A system that extracts one or several answers to a request (a question) from a corpus
- The issue of the question type
- One answer or several, possibly a list, drawn from one or several documents, or an answer of the type Yes/No
- On a corpus in one or several languages
1.1. QA and Language Processing
- A QA system appears to be a language processing (LP) application par excellence
- However, certain systems are based solely on pattern matching (cf. Soubotine Soubotine, TREC 2003)
- These systems seem to have reached their limits
- And, while they can process everything that is factual, complex questions/queries are far beyond their capabilities
- The best systems validated at TREC and CLEF are based on Automated Language Processing.
1.2. OUR QA SYSTEM
- First developed (1999 - 2001) within a French innovation project (Anvar)
- Then (end 2001 - end 2003) within the European project TRUST (FP5)
- Currently (2005/06) within the European project M-CAST (FP6)
- Main features: targets B2B and B2C, multilingual, NLP-based and intensive.
A modular design
- Language modules: French, Italian, Portuguese, Polish, English, Czech
- Indexing engine
- Text extraction engine
- Documents
- Index
- Visualization of results
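The modular design listed above can be illustrated with a minimal sketch in which per-language modules plug into a shared indexing engine. All names here (`QAEngine`, `simple_analyzer`, `register`) are hypothetical; the slides do not describe the real interfaces, and the analyzer is only a stand-in for genuine linguistic analysis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical interface: a language module maps raw text to a list of terms.
Analyzer = Callable[[str], List[str]]


def simple_analyzer(text: str) -> List[str]:
    """Stand-in for a language module: lowercase and split on whitespace."""
    return text.lower().split()


@dataclass
class QAEngine:
    """Toy engine: one shared index, one analyzer per language."""
    modules: Dict[str, Analyzer] = field(default_factory=dict)
    index: Dict[str, Set[str]] = field(default_factory=dict)  # term -> doc ids

    def register(self, lang: str, analyzer: Analyzer) -> None:
        """Plug in the module for one language."""
        self.modules[lang] = analyzer

    def index_document(self, doc_id: str, lang: str, text: str) -> None:
        """Analyze the text with its language module and add it to the index."""
        for term in self.modules[lang](text):
            self.index.setdefault(term, set()).add(doc_id)

    def search(self, lang: str, question: str) -> Set[str]:
        """Return ids of documents sharing at least one term with the question."""
        hits: Set[str] = set()
        for term in self.modules[lang](question):
            hits |= self.index.get(term, set())
        return hits


engine = QAEngine()
engine.register("en", simple_analyzer)
engine.index_document("d1", "en", "Hamlet was written by Shakespeare")
print(engine.search("en", "Who wrote Hamlet"))
```

Keeping the language-specific work behind a single callable means a new language can be added without touching the index or the result display.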
1.3. Evaluations of the QA system
- Professional benchmarking contests and campaigns such as EQueR (2004) and CLEF (2005, 2006)
- Evaluations of the French, English, Portuguese and Spanish language modules, in monolingual and multilingual settings.
CLEF 2005
CLEF 2006
- In CLEF 2005 and CLEF 2006, the best monolingual engines were our systems for Portuguese and French, and the best multilingual systems were our systems for English-French, Portuguese-French, Spanish-Portuguese and Portuguese-Spanish.
- Synapse Développement and Priberam are now partners in the Quaero project.
2. Implementation in M-CAST Project
- Tests carried out on books in the National Czech library and the Torun library in Poland
- Processing several million digitized documents
- Manages metadata and UDC classification
- Accommodates questions and answers in English, French, Italian, Portuguese, Polish, Czech
- Implemented on both library portals
2.1. Adaptation to Digital Libraries Resources
- Scanned texts of poor quality
- => Spell checker to improve the quality of documents
- One book, lots of pages
- => Management of multi-part documents during semantic analysis
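As an illustration of the spell-checking step for poorly scanned texts, here is a minimal sketch that replaces unknown tokens with their closest lexicon entry. It relies on Python's standard `difflib.get_close_matches`; the function name `correct_tokens`, the tiny lexicon and the 0.8 similarity cutoff are assumptions made for the example, not the method used in the project.

```python
import difflib
from typing import List


def correct_tokens(tokens: List[str], lexicon: List[str], cutoff: float = 0.8) -> List[str]:
    """Replace each token missing from the lexicon by its closest lexicon entry.

    Tokens with no sufficiently similar entry are left unchanged.
    """
    known = set(lexicon)
    corrected = []
    for token in tokens:
        word = token.lower()
        if word in known:
            corrected.append(word)
            continue
        # get_close_matches returns candidates sorted by similarity, best first.
        matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return corrected


# Hypothetical lexicon and OCR output ("l" read for "i", "v" read for "y").
lexicon = ["book", "library", "national"]
print(correct_tokens(["natlonal", "librarv", "xyz"], lexicon))
```

A real lexicon would be language-specific, so the correction would naturally sit inside each language module.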
2.2. Integration of Dublin Core document attributes
- Storage of Dublin Core attributes as metadata
- QA: Who is the author of Hamlet?
- Adaptation of the system to search in metadata
- Use of those metadata as filters
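The two uses of Dublin Core metadata named above, answering a question directly from an attribute and filtering documents by attribute, can be sketched as follows. `title`, `creator` and `language` are standard Dublin Core elements; the `Record` class, the function names and the sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    doc_id: str
    dc: Dict[str, str]  # Dublin Core element name -> value


def answer_from_metadata(records: List[Record], element: str, title: str) -> Optional[str]:
    """Answer a question such as "Who is the author of Hamlet?" from metadata.

    Looks up the record whose title matches and returns the requested element.
    """
    for record in records:
        if record.dc.get("title", "").lower() == title.lower():
            return record.dc.get(element)
    return None


def filter_records(records: List[Record], **criteria: str) -> List[Record]:
    """Keep only the records whose metadata match every given criterion."""
    return [r for r in records
            if all(r.dc.get(key) == value for key, value in criteria.items())]


# Hypothetical sample records.
records = [
    Record("d1", {"title": "Hamlet", "creator": "William Shakespeare", "language": "en"}),
    Record("d2", {"title": "Pan Tadeusz", "creator": "Adam Mickiewicz", "language": "pl"}),
]
print(answer_from_metadata(records, "creator", "Hamlet"))
print([r.doc_id for r in filter_records(records, language="pl")])
```

Here the question type "author of" is assumed to have already been mapped to the `creator` element by the question analysis.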
2.3. Universal Decimal Classification
- Storage of UDC codes for each document
- Search through UDC codes
- Filtering through UDC codes
- Semantic disambiguation through UDC codes
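Because UDC notation is hierarchical (class 52 is astronomy, class 54 is chemistry, and longer codes refine their prefix), filtering and disambiguation can both be sketched as prefix matching. This is a simplification that ignores UDC auxiliary signs such as colons, and the sense table, document codes and function names below are hypothetical.

```python
from typing import Dict, List, Optional


def filter_by_udc(docs: Dict[str, str], prefix: str) -> List[str]:
    """Return ids of documents whose UDC code falls under the given class prefix."""
    return [doc_id for doc_id, code in docs.items() if code.startswith(prefix)]


# Hypothetical sense table: word -> {UDC class prefix -> sense}.
SENSES: Dict[str, Dict[str, str]] = {
    "mercury": {"52": "planet", "54": "chemical element"},
}


def disambiguate(word: str, udc_code: str) -> Optional[str]:
    """Pick the sense whose UDC prefix matches the document code.

    When several prefixes match, the longest (most specific) one wins.
    """
    best_prefix, best_sense = "", None
    for prefix, sense in SENSES.get(word.lower(), {}).items():
        if udc_code.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, best_sense = prefix, sense
    return best_sense


# Illustrative codes: d1 under class 52, d2 under class 54, d3 under class 82.
docs = {"d1": "523.4", "d2": "546.49", "d3": "821.111"}
print(filter_by_udc(docs, "5"))
print(disambiguate("Mercury", docs["d1"]))
```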
Technical architecture
END of Presentation
I would appreciate your questions! Thank you - Merci!