Title: Context-based term (re)weighting
1. Context-based term (re)weighting
ECAI-06, 30th August 2006, Riva del Garda (Italy)
An experiment on Single-Word Question Answering
Authors: Marco Ernandes, Giovanni Angelini, Marco Gori, Leonardo Rigutini, Franco Scarselli
2. Abstract
Term weighting is a crucial task in many
Information Retrieval applications. Common
approaches are based either on statistical or on
natural language analysis. Here, we present a new algorithm that capitalizes on the advantages of both strategies. In the proposed method, the
weights are computed by a parametric function,
called Context Function, that models the
semantic influence exercised amongst the terms.
The Context Function is learned from examples, so
that its implementation is mostly automatic. The
algorithm was successfully tested on a data set
of crossword clues, which represent a case of
Single-Word Question Answering.
3. The Idea
- The semantics and the relevance of a word depend on its context (e.g. the terms in the same sentence, the title, etc.).
- A text document can be represented as a social network, and the relevance of a word can be computed on the basis of its neighbours.
- In this network, the influence exercised by one word on another is not always the same, but depends on statistical, morphological and syntactical properties of the words. Given these features, we can learn the reciprocal influences of words from examples.
The proposed model represents a generalization
of TextRank (Mihalcea and Tarau, EMNLP 04)
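For reference, the TextRank recursion that is being generalized (from the cited paper; V_i are graph vertices, w_ji edge weights, d the damping factor):

    WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)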
4. Theoretical Model (1): recursive term (re)weighting
- The score of word w is the result of the weighted sum of its default score d_w and the sum of the influences c_w,u from each word u that appears in the same context.
- This equation can be solved efficiently (as in Google's PageRank) with the Jacobi algorithm. In fact, stacking all the scores into a single vector yields a linear dynamic system.
- This dynamic system converges exponentially to the desired solution, provided that the norm of the context influence is less than 1.
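The equations on the slide are not reproduced here. Below is a minimal sketch of the Jacobi iteration, assuming the PageRank-style recursion s = (1 - lam) * d + lam * C s; the exact formula is an assumption based on the description above, and lam is the damping factor estimated later from examples.

    import numpy as np

    def jacobi_scores(d, C, lam=0.85, tol=1e-8, max_iter=100):
        """Iteratively solve s = (1 - lam) * d + lam * C @ s.

        d   : default scores, shape (n,)
        C   : context-influence matrix, C[w, u] = c_w,u, shape (n, n)
        lam : damping factor (0.85 is a placeholder value)
        """
        s = d.copy()
        for _ in range(max_iter):
            s_new = (1.0 - lam) * d + lam * C @ s
            # converges exponentially whenever lam * ||C|| < 1,
            # which is why 4-5 iterations already approximate well
            if np.abs(s_new - s).sum() < tol:
                return s_new
            s = s_new
        return s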
5. Theoretical Model (2): computing the context influence
- The influence of word u on w is computed by combining the contributions of all the occurrences of w and of u (those belonging to the context of w).
- (We represent with w_i an occurrence of word w.)
- Each occurrence-pair term defines the strength of the influence of an occurrence of u on w, and is computed by a parametric function called the Context Function.
- This function can be realized using a universal approximator (e.g. ANNs), and it can use any type of features (statistical, morphological, syntactical, lexical, etc.) extracted from the two occurrences. A sketch follows below.
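A sketch of how c_w,u might be assembled from occurrence-level contributions; the aggregation by plain summation and the helper names (occurrences, features) are assumptions, since the slide only states that occurrence contributions are combined.

    def context_influence(w, u, occurrences, features, context_function):
        """Combine occurrence-level influences into c_w,u.

        occurrences[t]   : positions where term t occurs
        features(wi, uj) : feature vector for an occurrence pair
                           (tf, idf, word distance, ...) -- hypothetical helper
        """
        total = 0.0
        for wi in occurrences[w]:
            for uj in occurrences[u]:  # occurrences of u in the context of w
                total += context_function(features(wi, uj))
        return total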
6. Theoretical Model (3): the Context Function
- For simplicity, the Context Function has been implemented as a product of per-feature logistic sigmoids (sketched below).
- x_i is the value of the i-th feature.
- alpha_i (steepness) and mu_i (mean value) are the model parameters.
- sigma is the logistic sigmoid function.
- The whole function approximates a boolean expression composed of soft AND operators.
- The features adopted for the Context Functions are mainly statistical, e.g. the term frequency and idf of w and u, the word distance (number of separating words) between w and u, etc.
- Along with all these parameters, we additionally estimate the damping factor lambda from examples.
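A minimal sketch of such a Context Function, assuming the form f(x) = prod_i sigma(alpha_i * (x_i - mu_i)); the exact parametrization on the slide is not reproduced, so this form is inferred from the soft-AND description.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def context_function(x, alpha, mu):
        """Soft AND over per-feature tests: a product of logistic sigmoids.

        x     : feature vector (tf, idf, word distance, ...)
        alpha : steepness parameters, one per feature
        mu    : mean-value parameters, one per feature

        Each sigmoid softly tests one feature; the product of all
        tests behaves like a differentiable boolean AND.
        """
        return float(np.prod(sigmoid(alpha * (np.asarray(x) - mu))))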
7. Learning Context Functions
- For training the parameters of the Context Function, resilient parameter adaptation has been used.
- For Question Answering, the most suitable evaluation measure is MRR (Mean Reciprocal Rank), but MRR is not differentiable.
- MRR has therefore been approximated by a continuous function, in which the discrete concept of position is replaced by a soft_position function that takes into account the score of each candidate answer (see the sketch below).
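A sketch of one way to build such a surrogate, assuming a sigmoid-based soft_position; the smoothing actually used in the paper is not detailed on the slide.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def soft_position(scores, correct_idx, steepness=10.0):
        """Differentiable stand-in for the rank of the correct answer.

        Softly counts how many candidates outscore the correct one:
        position ~ 1 + sum_j sigmoid(k * (s_j - s_correct)).
        """
        s_c = scores[correct_idx]
        others = np.delete(scores, correct_idx)
        return 1.0 + np.sum(sigmoid(steepness * (others - s_c)))

    def soft_mrr(score_lists, correct_idxs):
        """Continuous approximation of Mean Reciprocal Rank."""
        return float(np.mean([1.0 / soft_position(s, c)
                              for s, c in zip(score_lists, correct_idxs)]))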
8. Experimental Setup: the Single-Word QA problem
- Single-Word Question Answering is a special case of QA in which questions have to be answered with a unique and exact word.
- Crossword clues represent a challenging example of Single-Word QA.
- Our dataset: 525 crossword clues
  - 165 named-entity answers (NE-answers)
  - 360 non-named-entity answers (nonNE: nouns, adjectives, verbs, etc.)
- We measured the performance of three different ranking techniques (two are based on the crossword solving system WebCrow), with and without the introduction of context-based term weighting:
  - TFIDF vs. TFIDF + context
  - WebCrow-S (statistical term weighting) vs. WebCrow-S + context
  - WebCrow-SM (statistical and morphological term weighting) vs. WebCrow-SM + context
- Both for the NE and the nonNE cases, 40% of the examples were used for training the context functions and 60% for testing.
9. Experimental Results
[Table: MRR performances - not reproduced]
In all the experiments, the introduction of context re-weighting improved the MRR (with no new information introduced!).
The impact of the context is even more evident in the Success Rate: the SR(50) of WebCrow-S goes from 66% to 80% with context re-weighting.
10. Feasibility and Further Works
- Training is time consuming. By contrast, the convergence of the recursive algorithm is fast, and suitable even for online term weighting (page re-ranking, online clustering, keyword extraction, etc.).
- Other applications of the approach?
  - Direct applications: Keyword Extraction, Snippet Extraction.
  - Other problems: document clustering and categorization.
The score convergence is exponential. In around 4
or 5 iterations we already have a good
approximation.