Title: COLLATE
1. Virtual Agents for a Bookstore: an Empirical Evaluation
P. Lops, V. Andersen, H.H.K. Andersen, F. Abbattista and G. Semeraro
Dipartimento di Informatica, Università di Bari, Italy
Risoe National Laboratory, Denmark
2. Overview
- Introduction
- E-commerce problems and solutions
- The COGITO project
- Empirical evaluation
- Conversation Log Analysis
- Visual Behaviour Analysis (eye-tracking)
- Questionnaire Analysis
- Some general considerations
3. Introduction
- A Digital Library is not merely a collection of electronic information. It is a distributed technology environment that dramatically reduces barriers to the creation, dissemination, manipulation, storage, integration and reuse of information by individuals and groups (Lesk).
- Digital Libraries can play a relevant role in several key areas of the e-era, such as e-government, e-learning, e-publishing, and e-commerce.
4. DLs vs e-commerce: the (obvious) differences
- DLs
  - You need to be a paying member
  - You have to stay on the same site
  - Often dedicated to specific domains
  - No specific stimulation needed
- E-commerce
  - You may search freely until you choose to buy
  - You may jump from one site to another
  - Usually broader in domains
  - Stimulation is very important
5. DLs vs e-commerce: the evaluation
- For DLs, the evaluation is based on objective measures (way of sorting, recall and precision of documents compared with the requested topic).
- For e-commerce, the evaluation is largely based on subjective feelings (satisfaction of the individual customer with the user interface, functionality, and overall performance).
- Trust in the correctness of information is much more important for DLs than for e-commerce.
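As a reminder of what the objective measures named above compute, here is a minimal Python sketch of precision and recall for a single query. The document identifiers and relevance judgements are invented for illustration and are not data from the COGITO evaluation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant to the requested topic
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: share of returned documents that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of relevant documents that were returned.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 3 of them relevant,
# out of 6 relevant documents in the collection.
example = precision_recall(
    ["d1", "d2", "d3", "d4"],
    {"d1", "d2", "d4", "d7", "d8", "d9"},
)
print(example)
```

Both values lie between 0 and 1; unlike the satisfaction ratings used for e-commerce, they can be computed without asking the user anything.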
6. Problems concerning E-Commerce
- Getting people started on the Web and making their first purchase
- Using traditional metaphors for shopping on web sites
- Users are forced to make their model of shopping fit into a web structure with which they are not familiar
- Getting people to submit personal information
- Summarizing: UNCERTAINTY
7. The goal of the COGITO project
- COGITO aims at improving consumer-supplier
relationships in e-commerce through intelligent
personalized agents which can play the role of
virtual assistants for users.
HOW?
- Using an intelligent retrieval process integrated with chatterbot technology
- Extracting and exploiting User Models
8. The COGITO application scenario
- Virtual shop of books, CDs, DVDs, gifts (BOL.de)
- User profiles: the key for personal recommendations
- Improvement of search capabilities exploiting the knowledge about users (query expansion)
- Interaction by means of a chatterbot
9. BOL web site
10. Scenario 1: unknown user
- The user is not registered
- The profile is not available to the system
- The user requests a book written by King
11. Scenario 1: unknown user
List of books belonging to several categories by
authors whose last name is King
12. Scenario 2: registered user
- The user is already registered
- The profile is available to the system
- The user likes Science Technique
- The user dislikes Narrative
- The user requires a book written by King
13. Scenario 2: registered user
List of books by authors whose last name is
King belonging to the book category
Naturwissenschaften
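The difference between the two scenarios can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Book` and `Profile` classes, the `search_by_author` function, the catalogue entries and the category labels are all hypothetical, and the simple filter below stands in for COGITO's actual query expansion process, which the slides do not detail.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    author_last_name: str
    category: str


@dataclass
class Profile:
    likes: set = field(default_factory=set)
    dislikes: set = field(default_factory=set)


def search_by_author(catalogue, last_name, profile=None):
    """Return books by the given author, narrowed by the profile if any."""
    matches = [b for b in catalogue if b.author_last_name == last_name]
    if profile is None:
        # Scenario 1: no profile, so results span all categories.
        return matches
    # Scenario 2: drop disliked categories, then prefer liked ones.
    allowed = [b for b in matches if b.category not in profile.dislikes]
    liked = [b for b in allowed if b.category in profile.likes]
    return liked or allowed


# Invented catalogue; titles and categories are placeholders.
catalogue = [
    Book("Title A", "King", "Narrative"),
    Book("Title B", "King", "Science"),
    Book("Title C", "Smith", "Science"),
    Book("Title D", "King", "Travel"),
]
registered = Profile(likes={"Science"}, dislikes={"Narrative"})

unknown_result = [b.title for b in search_by_author(catalogue, "King")]
registered_result = [
    b.title for b in search_by_author(catalogue, "King", registered)
]
print(unknown_result)
print(registered_result)
```

Falling back to `allowed` when no liked category matches is a design choice of this sketch: it avoids returning an empty list to a registered user whose preferred category has no book by the requested author.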
14. Empirical evaluation: the framework
- Used for evaluating the performance of the agent based on the means-end hierarchy.
- Also used during the phase of requirements specification
- Requirements are classified in three levels
  - the strategic level
  - the procedural level
  - the operational level
15. Means-end hierarchy
General hierarchy
Condensed hierarchy
Each level is specified by the next upper level concerning the reason for an action, and by the next lower level concerning how this action may be supported.
16. Empirical evaluation: example of requirements
- Strategic requirements
  - Increase trust
  - Increase customer loyalty
  - Increase conversion rate
- Procedural requirements
  - Improve naturalness and effort involved in user giving information to the system
  - Support a natural dialogue users are happy to take part in
  - Provide guided tours of the system
- Operational requirements
  - Encounter few dead ends in a conversation
  - Run without plug-ins
  - No login procedure
17. Evaluation of COGITO: to check...
- whether the interface succeeds in getting the user into a dialogue
- whether users notice the tailored parts of the dialogue
- whether users notice and accept the chatterbot
- Performed by letting groups of test persons solve various tasks related to searching general/specific information using the agent on the BOL site
18. Empirical evaluation: methods
- Partly based on quantitative measures
  - Analysis of the conversation log
  - Analysis of eye-tracking
- Partly based on qualitative measures
  - Completion of detailed questionnaires
- The COGITO proactive agent has been compared with the BOL agent, a state-of-the-art agent with no proactive features
19. The two agents
BOL: state-of-the-art agent (not proactive)
COGITO: proactive agent
20. Set-up of the experiment
Test person, moderator and the eye-tracking system
21. Session introduction
- Give your honest opinion; we are not the programmers and will therefore not be personally offended
- We are testing the system; you are not being tested
- The agent is not perfect; if it were, we would not need to test it
- Please think aloud: indicate what you think and intend to do
22. Conversation log analysis
- Purpose
  - Measure the conversation performance in terms of the number of
    - Correct text outputs
    - Fallback sentences
    - Proactive sentences
    - Search results
  - Average length of user queries
23. Conversation log analysis: measures
- Correct text output
  - Manual analysis of successful elements of the agent-user dialogue, each consisting of one user text input string (e.g. a query) and one agent output text string (e.g. delivering a correct answer)
- Fallback sentences
  - Indicate the degree of heterogeneity of the conversations. A high number of fallback sentences is a sign of poor conversation performance
- Proactive sentences
  - A contextually meaningful response to user input
24. Conversation log analysis: measures
- Search results: successful query
  - A query is successful if the agent, on the basis of the Query Expansion process, prompts the BOL search engine and produces a list of results that is correct in terms of relevance for a given task
  - The queries listed in the conversation log have been repeated and the results analysed with respect to the users' tasks.
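The measures above amount to counting hand-labelled dialogue turns. A minimal Python sketch follows, assuming a hypothetical log format in which each turn is a pair of user input and a label assigned during manual analysis; the label names, the `summarise` function and the sample turns are invented for illustration and do not reproduce the COGITO logs.

```python
from collections import Counter

# Hypothetical labels assigned by hand to each agent response.
LABELS = ("correct", "fallback", "proactive", "search_ok", "search_failed")


def summarise(turns):
    """Tally labelled turns and derive two simple rates."""
    counts = Counter(label for _, label in turns)
    total = sum(counts.values())
    searches = counts["search_ok"] + counts["search_failed"]
    return {
        "counts": {label: counts[label] for label in LABELS},
        # A high fallback rate signals poor conversation performance.
        "fallback_rate": counts["fallback"] / total if total else 0.0,
        # Share of searches whose result list was relevant to the task.
        "search_success_rate": (
            counts["search_ok"] / searches if searches else 0.0
        ),
    }


# Invented sample log of eight turns.
sample_turns = [
    ("hello", "correct"),
    ("what can you do", "correct"),
    ("asdf", "fallback"),
    ("i like science", "proactive"),
    ("a book by King", "search_ok"),
    ("books about stars", "search_ok"),
    ("travel guides", "search_ok"),
    ("that thing from tv", "search_failed"),
]
summary = summarise(sample_turns)
print(summary)
```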
25. Conversation log analysis: results
26. Conversation log analysis: results
- Analysis of the average length of user queries
  - Queries to the Cogito system: average length 5.05 words
  - Queries to the Excite system: average length 2.21 words
  - Queries to traditional IR systems: average length 7-15 words
- Good performance of the Cogito system (see search results), because the agent allows users to type their queries in a conversational manner, without the use of Boolean operators.
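Average query length in words is straightforward to compute from a log. The sketch below assumes whitespace-separated words and uses invented queries; the slides do not say how words were counted in the study.

```python
def average_query_length(queries):
    """Mean number of whitespace-separated words per non-empty query."""
    lengths = [len(q.split()) for q in queries if q.strip()]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Invented conversational queries, not taken from the COGITO log.
sample_queries = [
    "I am looking for a book by King",  # 8 words
    "books about astronomy",            # 3 words
    "show me cheap DVDs",               # 4 words
    "   ",                              # blank input, ignored
]
avg_len = average_query_length(sample_queries)
print(avg_len)
```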
27. Eye-tracking analysis
- Measurement of the respondents' visual behaviour during the evaluation session
- The device is non-intrusive
- Video recording of the eye movements together with the graphic signal from the computer
- Division of the screen into 5 Areas of Interest (AOIs)
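Once the screen is divided into AOIs, viewing time per area can be obtained by assigning each fixation to the rectangle that contains it. The following Python sketch is an assumption-laden illustration: the AOI names, pixel coordinates, fixation tuples and the `dwell_share` function are hypothetical, and only three of the five AOIs are mocked up because the slides do not list them.

```python
# Hypothetical AOI rectangles as (left, top, right, bottom) in pixels.
AOIS = {
    "agent_animation": (0, 0, 200, 200),
    "text_output": (200, 0, 800, 200),
    "text_input": (0, 200, 800, 260),
}


def aoi_of(x, y, aois):
    """Name of the AOI containing the point, or 'outside' if none does."""
    for name, (left, top, right, bottom) in aois.items():
        if left <= x < right and top <= y < bottom:
            return name
    return "outside"


def dwell_share(fixations, aois):
    """Share of total viewing time per AOI.

    fixations: iterable of (x, y, duration_ms) tuples
    """
    totals = {name: 0 for name in aois}
    totals["outside"] = 0
    for x, y, duration_ms in fixations:
        totals[aoi_of(x, y, aois)] += duration_ms
    grand_total = sum(totals.values())
    return {
        name: (ms / grand_total if grand_total else 0.0)
        for name, ms in totals.items()
    }


# Invented fixations: (x, y, duration in milliseconds).
sample_fixations = [
    (100, 100, 300),  # on the agent animation
    (400, 100, 500),  # on the text output field
    (400, 230, 100),  # on the text input field
    (900, 900, 100),  # off the defined areas, e.g. the keyboard
]
shares = dwell_share(sample_fixations, AOIS)
print(shares)
```

Keeping an explicit "outside" bucket matters for a text-driven agent, since time spent looking at the keyboard would otherwise be silently dropped from the totals.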
28. Eye-tracking analysis: AOI
29. Eye-tracking analysis: results
The BOL prototype respondents spent more time checking their keyboard strokes than the Cogito respondents.
Much viewing time was spent outside the display, because the agent requires written text input via the keyboard.
The BOL agent deep links did not function well enough, so the BOL respondents more often used the BOL site on its own.
The BOL agent animation attracted twice as much visual attention as the Cogito agent, probably due to its more photo-like appearance, obliging attitude and larger repertoire of gestures.
Most viewing time was spent on the text output field. This is not surprising, because users need to read the text.
30. Questionnaire analysis
- Four groups of 8 persons each were recruited for the test session
  - 2 groups of novices
  - 2 groups of experienced users
- The members of the test groups indicated their impression and comparison of the two agents by filling out a detailed questionnaire
- Results need to be re-analysed in various ways, and statistical considerations concerning the significance of the results are necessary
31. Questionnaire analysis: measures
- 7 evaluation criteria
  - Impression: the user's feelings or emotions when using the software.
  - Command: extent to which the user feels that she is in control.
  - Effectiveness: degree to which the user feels that she can complete the task while using the system.
  - Navigability: degree to which the user can move around the application.
  - Learnability: degree to which the user feels that the application is easy to become familiar with.
  - Aidability: degree to which the application assists the user to resolve a situation.
  - Comprehension: degree to which the interaction with the application is satisfying.
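Questionnaire answers of this kind are typically summarised per agent, user group and criterion. The sketch below assumes a hypothetical numeric rating scale (1 to 5) and an invented record layout; neither the scale nor the sample answers come from the study, and the slides do not state how the ratings were aggregated.

```python
from collections import defaultdict
from statistics import mean


def mean_scores(responses):
    """Average rating per (agent, group, criterion) combination."""
    buckets = defaultdict(list)
    for r in responses:
        key = (r["agent"], r["group"], r["criterion"])
        buckets[key].append(r["score"])
    return {key: mean(scores) for key, scores in buckets.items()}


# Invented answers on an assumed 1-5 scale (5 = most positive).
sample_responses = [
    {"agent": "BOL", "group": "novice", "criterion": "Impression", "score": 2},
    {"agent": "BOL", "group": "novice", "criterion": "Impression", "score": 3},
    {"agent": "COGITO", "group": "experienced", "criterion": "Command", "score": 4},
    {"agent": "COGITO", "group": "experienced", "criterion": "Command", "score": 5},
]
averages = mean_scores(sample_responses)
for key, value in sorted(averages.items()):
    print(key, value)
```

With only 8 persons per group, such averages are descriptive only; any claim about differences between agents or groups would still need the significance analysis mentioned earlier.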
32. Questionnaire analysis: impression
- The questions related to the impression of the agent ask whether
  - the agent is enjoyable or a bit awkward to use
  - the user would recommend the agent to colleagues
33. Questionnaire analysis
BOL agent
COGITO agent
- Novices had negative feelings about both agents, probably because they expect an agent to behave flawlessly in all situations
- Experienced users are aware that a new product needs time to mature
34. Some general considerations
- General approach for DLs evaluation
  - System specification
  - Methodology framework based on the means-end hierarchy
- General measures of
  - Usability, by means of log analysis, eye-tracking analysis, and questionnaire analysis
  - Effectiveness of search functions, by means of log analysis and questionnaire analysis
  - User satisfaction, by means of log analysis, eye-tracking analysis, and questionnaire analysis
35. Contacts
- Pasquale Lops
- Dipartimento di Informatica
- Università di Bari
- Tel (39) 080 5442276
- Fax (39) 080 5443196
- Email lops_at_di.uniba.it