Title: Rutgers Components Phase 2
1Rutgers Components Phase 2
- Principal investigators
- Paul Kantor, PI - Design, modelling and analysis
- Kwong Bor Ng, Co-PI - Fusion, experimental design
- Nina Wacholder, Co-PI - Linguistic foundations for
modelling
2Key Components
- Adaptive personalization to analyst, task and
context
- Improve effectiveness of information access for
question answering through data fusion of IR methods
- Improve effectiveness of characterizing document
qualities, tuned to specific analysts'
perspectives
3Model Personalization (1) Robust Information
Access Data Fusion
- For a persistent query, improve frame and answer
generation through Data Fusion (local fusion with
person, task, topic feedback) and Interactive
Relevance Feedback.
- In stage 1, we have demonstrated effective data
fusion into HITIQA to optimize the rate of
useful paragraph extraction. In stage 2, the
emphasis will be on exploiting user judgments
over time to adjust fusion parameters
chronologically, with a time-sensitive weighting
scheme, to fit the evolving perspective of the
analyst on the task, topic and context.
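One way to picture the time-sensitive weighting scheme is the sketch below. It is only an illustration under stated assumptions, not the project's method: the exponential half-life decay, the smoothing prior, and the names `time_weighted_fusion_weights` and `fuse` are all hypothetical. Each analyst judgment is credited to the retrieval system that supplied the passage, and recent judgments count more than old ones.

```python
def time_weighted_fusion_weights(judgments, now, half_life=7.0, prior=1.0):
    """Estimate one fusion weight per retrieval system.

    judgments: list of (time_in_days, system_name, relevant) tuples.
    Older judgments are discounted by an assumed exponential half-life.
    """
    credit = {}  # decayed count of relevant judgments per system
    total = {}   # decayed count of all judgments per system
    for t, system, relevant in judgments:
        decay = 0.5 ** ((now - t) / half_life)
        total[system] = total.get(system, 0.0) + decay
        if relevant:
            credit[system] = credit.get(system, 0.0) + decay
    # Smoothed, time-decayed precision for each system (assumed estimator).
    raw = {s: (credit.get(s, 0.0) + prior) / (total[s] + 2 * prior) for s in total}
    norm = sum(raw.values())
    return {s: v / norm for s, v in raw.items()}


def fuse(scores, weights):
    """Linear combination of per-system scores for one document."""
    return sum(weights[s] * scores[s] for s in weights)


# Hypothetical history: system A was useful a month ago, system B is useful now.
history = [(0, "A", True), (28, "A", False), (0, "B", False), (28, "B", True)]
weights = time_weighted_fusion_weights(history, now=30)
```

With this history the weight shifts toward system B, because its relevant judgment is the recent one. A shorter half-life makes the fusion follow the analyst's changing perspective more quickly, at the cost of noisier weights.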
4Model Personalization (2) Document Quality
Aspects
- Personalization of the automatic document quality
aspect assessment algorithm, through advanced
statistical analysis and machine learning, to
identify (1) global quality aspect predictors,
(2) a general formal model of quality aspect
assessment, and (3) personal parameter settings
for individual preferences.
- At stage 1, we have established effective models
for estimation of some document qualities, based
on textual features and linguistic patterns in a
document. While global models do better than
chance, models must be personalized to reach
high accuracy. In stage 2, we will expand
identification of good predictive variables for
quality aspects, with emphasis on a local level
to encapsulate the personal mental model of an
analyst.
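The split between a global model and personal parameter settings can be sketched as follows. This is a minimal illustration, assuming a linear model over numeric document features and a per-analyst offset fitted by stochastic gradient descent; the feature values, learning rate, and function names are hypothetical and are not taken from the project.

```python
def predict(features, global_w, personal_w):
    """Quality score from global weights plus a per-analyst offset."""
    return sum((g + p) * x for g, p, x in zip(global_w, personal_w, features))


def fit_personal(examples, global_w, lr=0.1, reg=0.01, epochs=200):
    """Fit the per-analyst offsets from (features, rating) pairs.

    The regularization term pulls the offsets toward zero, so the personal
    model stays close to the global one when the analyst has given few ratings.
    """
    personal = [0.0] * len(global_w)
    for _ in range(epochs):
        for features, rating in examples:
            err = predict(features, global_w, personal) - rating
            personal = [p - lr * (err * x + reg * p)
                        for p, x in zip(personal, features)]
    return personal


def squared_error(examples, global_w, personal_w):
    return sum((predict(f, global_w, personal_w) - r) ** 2 for f, r in examples)


# Hypothetical case: the global model weighs two features equally,
# but this analyst's ratings depend only on the first one.
global_w = [0.5, 0.5]
ratings = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]
personal_w = fit_personal(ratings, global_w)
```

On this toy data the personalized model fits the analyst's ratings more closely than the global weights alone, which is the effect the slide describes.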
5Model Personalization (3) Integration through
Experiment
- We will integrate the personalization and other
mechanisms into a single interface, by
converting related functionalities into position
and iconic information in the user display.
- At stage 2, focusing on the analyst with a
persistent query, we will investigate the impacts
of interface options on analyst satisfaction and
task effectiveness, to identify the best
combination strategy, and to establish
effectiveness measures on a personal level.
6Sophisticated Statistical Techniques
- Sophisticated statistical methods (design of
experiments, ANOVA, multiple comparisons by the
Scheffé and Tukey methods, and orthogonal
arrays) will reduce the number of experimental
configurations to be studied.
- Instead of case-by-case attention to failure
analysis, the design will focus on how to
neutralize negative effects, to obtain more
accurate evaluations and design selection with
fewer experiments.
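To show how an orthogonal array cuts down the number of configurations, the sketch below builds the standard L9 array for four factors at three levels each. The choice of four three-level factors is an assumption made for illustration; the slides do not say which factors or array sizes the experiments will use.

```python
from itertools import combinations, product


def l9_orthogonal_array():
    """Nine runs covering four three-level factors (levels 0, 1, 2)."""
    return [(a, b, (a + b) % 3, (a + 2 * b) % 3)
            for a, b in product(range(3), repeat=2)]


def is_orthogonal(array, levels=3):
    """Check that every pair of columns shows each level pair equally often."""
    expected = len(array) // (levels * levels)
    for i, j in combinations(range(len(array[0])), 2):
        counts = {}
        for row in array:
            key = (row[i], row[j])
            counts[key] = counts.get(key, 0) + 1
        if len(counts) != levels * levels:
            return False
        if any(c != expected for c in counts.values()):
            return False
    return True


runs = l9_orthogonal_array()
full_factorial_size = 3 ** 4  # every combination of four three-level factors
```

Here `runs` holds 9 configurations in place of the 81 of the full factorial design, while every pair of factors is still tested at all nine level combinations. Main effects can then be compared with ANOVA, on the assumption that higher-order interactions are small.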
7Language Features for Quality Aspects.
- Expand a scheme, now being developed, for
characterizing aspects or facets of topics.
These will differ by topic, e.g. between WMD and
Biography. Aspects are signalled by the presence
of adjective classes. These classes are being
defined now, and will be expanded in the proposed
work.
8Using Language Features
- With a more refined model of the relation of
adjectives to aspects, the system will be better
able to understand classes that the analyst
defines, and to flag further occurrences in an
incoming message stream.
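Flagging aspects in a message stream from adjective classes might look like the sketch below. The two classes and their member words are invented for illustration, and matching is by plain word lookup with no part-of-speech tagging; the actual classes are still being defined by the project.

```python
import re

# Hypothetical adjective classes; the real ones are not given in the slides.
ADJECTIVE_CLASSES = {
    "certainty": {"confirmed", "alleged", "probable"},
    "scale": {"large", "small", "massive"},
}


def flag_aspects(message, classes=ADJECTIVE_CLASSES):
    """Return the class members found in one message, grouped by class."""
    tokens = re.findall(r"[a-z]+", message.lower())
    hits = {}
    for name, words in classes.items():
        found = [t for t in tokens if t in words]
        if found:
            hits[name] = found
    return hits


def flag_stream(messages, classes=ADJECTIVE_CLASSES):
    """Yield (index, hits) for each incoming message that signals an aspect."""
    for idx, message in enumerate(messages):
        hits = flag_aspects(message, classes)
        if hits:
            yield idx, hits


flagged = list(flag_stream(["Nothing to report.", "A massive plant was seen."]))
```

An analyst-defined class would be passed in as an extra entry in the `classes` dictionary, so that later messages are checked against it as well.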
9A note on retrieval fusion
- Retrieval fusion will be made interactive with a
small Java display, now under development, that
tracks the contribution of each retrieval scheme
to providing useful information. An interactive
feature permits the analyst to highlight a region
in the fusion space for further investigation.
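The bookkeeping behind such a display could be as simple as the sketch below, which counts, for each retrieval scheme, how many of the documents flagged as useful it retrieved. The data layout and the name `scheme_contributions` are assumptions for illustration and do not describe the Java display itself.

```python
from collections import Counter


def scheme_contributions(retrieved_by, relevant_ids):
    """Count useful documents per retrieval scheme.

    retrieved_by: maps a document id to the set of schemes that returned it.
    relevant_ids: ids of documents the analyst has flagged as useful.
    """
    counts = Counter()
    for doc_id in relevant_ids:
        for scheme in retrieved_by.get(doc_id, ()):
            counts[scheme] += 1
    return dict(counts)


# Hypothetical retrieval log for three documents and two schemes.
retrieved_by = {
    "d1": {"system1"},
    "d2": {"system1", "system2"},
    "d3": {"system2"},
}
contributions = scheme_contributions(retrieved_by, ["d1", "d2"])
```

A document returned by both schemes counts once for each, so the totals show each scheme's contribution rather than a partition of the useful documents.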
10Mock-up Fusion Interface
[Figure: mock-up of the fusion display. Labels: System 1, System 2, relevant, not relevant.]
1. HITIQA's initial retrieval uses both systems.
The occupied region here represents the LOGICAL
OR rule. Each document is represented by a small
circle. As a passage is marked relevant by the
user, the document it came from is flagged (here
shown in yellow).
2. The analyst perceives that many of the useful
passages came from documents that are clustered
near the inner corner, and using the interface
tool, draws an extended retrieval region (shown
here by the dotted orange box) which HITIQA now
explores.
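The two steps of the mock-up can be sketched in code as follows, assuming each document has a score in [0, 1] from each of the two systems and that the OR rule means passing either system's threshold. The thresholds, the box coordinates, and the function names are hypothetical.

```python
def or_rule(doc, t1, t2):
    """Step 1: a document is retrieved if either system scores it highly enough."""
    s1, s2 = doc["scores"]
    return s1 >= t1 or s2 >= t2


def in_box(doc, box):
    """Step 2: test whether a document lies inside the analyst-drawn region."""
    (x_min, x_max), (y_min, y_max) = box
    s1, s2 = doc["scores"]
    return x_min <= s1 <= x_max and y_min <= s2 <= y_max


def extended_retrieval(docs, t1, t2, box):
    """Ids of documents retrieved by the OR rule or by the extra region."""
    return [d["id"] for d in docs if or_rule(d, t1, t2) or in_box(d, box)]


# Hypothetical points in the fusion space: (System 1 score, System 2 score).
docs = [
    {"id": "d1", "scores": (0.9, 0.1)},    # passes the System 1 threshold
    {"id": "d2", "scores": (0.45, 0.45)},  # just misses both thresholds
    {"id": "d3", "scores": (0.1, 0.1)},    # far from both
]
box = ((0.4, 0.5), (0.4, 0.5))  # region drawn next to the inner corner
selected = extended_retrieval(docs, t1=0.5, t2=0.5, box=box)
```

In this example `d2` misses both thresholds but falls inside the drawn box, so it joins the retrieved set alongside `d1`, while `d3` stays out.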