COLLATE

Evaluation of digital libraries: Testbeds, measurements, and metrics. June 6-7, ...

1
Virtual Agents for a Bookstore: an Empirical
Evaluation
P. Lops, V. Andersen, H.H.K. Andersen, F.
Abbattista and G. Semeraro
Dipartimento di Informatica, Università di Bari,
Italy
Risoe National Laboratory, Denmark
2
Overview

Introduction
E-commerce problems and solutions
The COGITO project
Empirical evaluation
Conversation Log Analysis
Visual Behaviour Analysis (eye-tracking)
Questionnaire Analysis
Some general considerations

3
Introduction

A Digital Library is not merely a collection of
electronic information. It is a distributed
technology environment that dramatically reduces
barriers to the creation, dissemination,
manipulation, storage, integration and reuse of
information by individuals and groups (Lesk).
Digital Libraries can play a relevant role in
several key areas of the e-era, such as
e-government, e-learning, e-publishing, and
e-commerce.

4
DLs vs e-commerce: the (obvious) differences

DLs
You need to be a paying member
You have to stay on the same site
Often dedicated to specific domains
No specific stimulation needed

E-commerce
You may search freely until you choose to buy
You may jump from one site to another
Usually broader in domains
Stimulation is very important

5
DLs vs e-commerce: the evaluation

For DLs the evaluation is based on objective
measures (way of sorting, recall, precision of
documents compared with the requested topic)
For e-commerce the evaluation is highly based on
subjective feelings (satisfaction of the
individual customer related to user interface,
functionality, and overall performance).
Trust in the correctness of information is much
more important for DLs than for e-commerce.
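
The objective measures named for DLs can be made concrete with a minimal Python sketch of precision and recall for a single query. The function name and the document ids are illustrative assumptions, not part of the presented study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids, for illustration only.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Both values depend only on the result set and a relevance judgement, which is why they count as objective, in contrast to the satisfaction ratings used for e-commerce.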

6
Problems concerning E-Commerce

Getting people started on the Web and making
their first purchase
Using traditional metaphors for shopping on web
sites
Users are forced to make their model of shopping
fit into a web structure with which they are not
familiar
Getting people to submit personal information
Summarizing: UNCERTAINTY

7
The goal of the COGITO project

COGITO aims at improving consumer-supplier
relationships in e-commerce through intelligent
personalized agents which can play the role of
virtual assistants for users.

HOW ?

Using an intelligent retrieval process integrated
with chatterbot technology
Extracting and exploiting User Models

8
The COGITO application scenario

Virtual shop of books, CDs, DVDs, gifts (BOL.de)
User profiles: the key for personal
recommendations
Improvement of search capabilities exploiting the
knowledge about users (query expansion)
Interaction by means of a chatterbot

9
BOL web site
10
Scenario 1: unknown user

The user is not registered
The profile is not available to the system
The user requires a book written by King

11
Scenario 1: unknown user
List of books belonging to several categories by
authors whose last name is King
12
Scenario 2: registered user

The user is already registered
The profile is available to the system
The user likes Science Technique
The user dislikes Narrative
The user requires a book written by King

13
Scenario 2: registered user
List of books by authors whose last name is
King belonging to the book category
Naturwissenschaften
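
The difference between the two scenarios can be illustrated with a small Python sketch. It is not the COGITO implementation: the catalogue entries, the Profile class and the search function are hypothetical, and the filtering rule (drop disliked categories, then prefer liked ones) is only an assumption about how a profile could narrow a query.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical user profile: liked and disliked book categories."""
    likes: set = field(default_factory=set)
    dislikes: set = field(default_factory=set)


def search(catalog, last_name, profile=None):
    """Return books whose author's last name matches, narrowed by profile."""
    hits = [b for b in catalog
            if b["author"].split()[-1].lower() == last_name.lower()]
    if profile is None:
        # Scenario 1: unknown user, no profile, all categories returned.
        return hits
    # Scenario 2: registered user, the profile narrows the result list.
    hits = [b for b in hits if b["category"] not in profile.dislikes]
    preferred = [b for b in hits if b["category"] in profile.likes]
    return preferred or hits


# Made-up catalogue entries, for illustration only.
catalog = [
    {"title": "Book A", "author": "A. King", "category": "Narrative"},
    {"title": "Book B", "author": "B. King", "category": "Naturwissenschaften"},
    {"title": "Book C", "author": "C. King", "category": "Reise"},
    {"title": "Book D", "author": "D. Smith", "category": "Naturwissenschaften"},
]
profile = Profile(likes={"Naturwissenschaften"}, dislikes={"Narrative"})

unknown_user = search(catalog, "King")
registered_user = search(catalog, "King", profile)
```

In this sketch the same request yields three books for the unknown user and a single book in the preferred category for the registered user.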
14
Empirical evaluation: the framework

Used for evaluating the performance of the agent
based on the means-end hierarchy.
Also used during the phase of requirements
specification
Requirements are classified into three levels:
the strategic level
the procedural level
the operational level

15
Means-end hierarchy
General hierarchy
Condensed hierarchy
Each level is specified by the next upper level
concerning the reason for an action, and by the
next lower level concerning how this action may
be supported
16
Empirical evaluation: example of requirements

Strategic requirements
Increase trust
Increase customer loyalty
Increase conversion rate
Procedural requirements
Improve naturalness and reduce the effort involved
in users giving information to the system
Support a natural dialogue users are happy to
take part in
Provide guided tours of the system
Operational requirements
Encounter few dead ends in a conversation
Run without plug-ins
No login procedure

17
Evaluation of COGITO to check...

whether the interface succeeds in getting the
user into a dialogue
whether users notice the tailored parts of the
dialogue
whether users notice and accept the chatterbot
Performed by letting groups of test persons solve
various tasks related to searching
general/specific information using the agent on
the BOL site

18
Empirical evaluation: methods

Partly based on quantitative measures
Analysis of the conversation log
Analysis of eye-tracking
Partly based on qualitative measures
Completion of detailed questionnaires
The COGITO proactive agent has been compared with
the BOL agent, a state-of-the-art agent with no
proactive features

19
The two agents
BOL: state-of-the-art agent (not proactive)
COGITO: proactive agent
20
Set-up of the experiment
Test person, moderator and the eye-tracking system
21
Session introduction

Give your honest opinion, we are not the
programmers and will therefore not be personally
offended
We are testing the system, you are not to be
tested
The agent is not perfect; if it were, we would not
need to test it
Please think aloud: indicate what you think and
intend to do

22
Conversation log analysis

Purpose
Measure the conversation performance in terms
of the number of:
Correct text output
Fallback sentences
Proactive sentences
Search results
Average length of user queries
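
As a rough illustration of how such counts could be derived once every agent turn has been labelled by hand, here is a minimal Python sketch. The log structure, the label names and the example turns are assumptions made for this example; the study's actual log format is not shown in the slides.

```python
from collections import Counter

# Hypothetical, manually labelled conversation log: one entry per
# user input / agent output pair.
log = [
    {"user": "I am looking for a book by King", "agent": "correct"},
    {"user": "something about physics", "agent": "fallback"},
    {"user": "ok show me", "agent": "proactive"},
    {"user": "books by King on science", "agent": "search_ok"},
]

# Absolute number of turns per label.
counts = Counter(turn["agent"] for turn in log)

# Share of each label among all turns, so that logs of different
# lengths can be compared.
shares = {label: n / len(log) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: {n} ({shares[label]:.0%})")
```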

23
Conversation log analysis: measures

Correct text output
Manual analysis of successful elements of the
agent-user dialogue consisting of one user text
input string, e.g. a query, and one agent output
text string, e.g. delivering a correct answer
Fallback sentences
Degree of heterogeneity of the
conversations. A large occurrence of fallback
sentences is an expression of poor conversation
performance
Proactive sentences
A contextually meaningful response to user input

24
Conversation log analysis: measures

Search results: successful query
A query counts as successful if the agent, on the
basis of the Query Expansion process, prompts the
BOL search engine and produces a correct list of
results in terms of relevance for a given task
The queries listed in the conversation log have
been repeated and the results analysed with
respect to the users' tasks.

25
Conversation log analysis: results
26
Conversation log analysis: results

Analysis of the average length of user queries
Query to the Cogito system: average length 5.05
words
Query to the Excite system: average length 2.21
words
Query to traditional IR systems: average length
7-15 words
Good performance of the Cogito system (see search
results) because the agent allows the users to
type their queries in a conversational manner
without the use of Boolean operators.
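
The average query length is simply the mean number of whitespace-separated words per query. A minimal Python sketch follows; the function name and the example queries are invented for illustration and are not taken from the study's logs.

```python
def average_query_length(queries):
    """Mean number of whitespace-separated words per query."""
    lengths = [len(q.split()) for q in queries]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Made-up conversational queries, for illustration only.
queries = [
    "I am looking for a book by King",
    "science books",
    "do you have any DVDs",
]
avg = average_query_length(queries)
print(f"average length: {avg:.2f} words")
```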

27
Eye-tracking analysis

Measurement of the respondents' visual behaviour
during the evaluation session
The device is non-intrusive
Video recording of the eye-movements together
with the graphic signal from the computer
Division of the screen into 5 Areas of
Interest (AOIs)
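
To show how viewing time per AOI could be computed from fixation data, here is a minimal Python sketch. The AOI names, their pixel rectangles and the fixation samples are assumptions for illustration; the actual five AOIs are defined on the slide image and are not reproduced here.

```python
def dwell_times(fixations, aois):
    """Sum fixation durations (ms) per AOI; unmatched gaze is 'outside'."""
    totals = {name: 0 for name in aois}
    totals["outside"] = 0
    for x, y, duration_ms in fixations:
        for name, (left, top, right, bottom) in aois.items():
            if left <= x < right and top <= y < bottom:
                totals[name] += duration_ms
                break
        else:
            # No AOI contains the point, e.g. the user looks at the keyboard.
            totals["outside"] += duration_ms
    return totals


# Hypothetical AOIs as (left, top, right, bottom) in screen pixels.
aois = {
    "agent_animation": (0, 0, 200, 200),
    "text_output": (200, 0, 800, 200),
    "text_input": (200, 200, 800, 260),
}
# Hypothetical fixations as (x, y, duration in ms).
fixations = [
    (100, 100, 300),
    (300, 50, 500),
    (400, 100, 400),
    (250, 230, 200),
    (900, 700, 600),
]
totals = dwell_times(fixations, aois)
```

Counting everything that falls in no rectangle as "outside" makes time spent away from the display visible in the totals.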

28
Eye-tracking analysis: AOIs
29
Eye-tracking analysis: results
The BOL prototype respondents spent more time
checking their keystrokes than the Cogito
respondents
Much viewing time was spent outside the display
because the agent requires written text input
via the keyboard.
The BOL agent's deep links did not function well
enough, so the BOL respondents more often used
the BOL site on its own
The BOL agent's animation attracted twice as much
visual attention as the Cogito agent's, probably
due to its more photo-like appearance, obliging
attitude and larger repertoire of gesticulations
Most viewing time was spent at the text output
field. This is not surprising because users need
to read the text.
30
Questionnaire analysis

Four groups of 8 persons each were recruited for
the test session
2 groups of novices
2 groups of experienced users
The members of the test groups indicated their
impression and comparison of the two agents by
filling out a detailed questionnaire
Results need to be re-analysed in various ways,
and statistical considerations concerning the
significance of the results are still necessary

31
Questionnaire analysis: measures

7 evaluation criteria
Impression: the user's feelings or emotions when
using the software
Command: degree to which the user feels that
she is in control.
Effectiveness: degree to which the user feels
that she can complete the task while using the
system.
Navigability: degree to which the user can move
around the application.
Learnability: degree to which the user feels
that the application is easy to become familiar
with.
Aidability: degree to which the application
assists the user to resolve a situation.
Comprehension: degree to which the interaction
with the application is satisfying.
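
A first descriptive step for such questionnaire data is the mean score per user group, agent and criterion. The Python sketch below assumes a numeric rating scale and a flat list of answers; both the scale and the example scores are invented for illustration and are not results from the study.

```python
from collections import defaultdict
from statistics import mean


def mean_scores(responses):
    """Average score per (group, agent, criterion) combination."""
    buckets = defaultdict(list)
    for group, agent, criterion, score in responses:
        buckets[(group, agent, criterion)].append(score)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up answers: (user group, agent, criterion, score).
responses = [
    ("novice", "BOL", "Impression", 2),
    ("novice", "BOL", "Impression", 3),
    ("novice", "COGITO", "Impression", 3),
    ("experienced", "COGITO", "Navigability", 4),
    ("experienced", "COGITO", "Navigability", 5),
]
scores = mean_scores(responses)
```

Means alone say nothing about significance; with 8 persons per group, a statistical test would still be needed before drawing conclusions.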

32
Questionnaire analysis: impression

The questions related to the impression of the
agent ask whether
the agent is enjoyable or a bit awkward to use
the user would recommend the agent to
colleagues

33
Questionnaire analysis

BOL agent

COGITO agent

Novices had negative feelings about both agents,
probably because they expect an agent to act
flawlessly in all situations
Experienced users are aware that a new product
needs a period to mature

34
Some general considerations

General approach for DL evaluation
System specification
Methodology framework based on the means-end
hierarchy
General measures of
Usability by means of log analysis, eye-tracking
analysis, and questionnaire analysis
Effectiveness of search functions by means of
log analysis and questionnaire analysis
User satisfaction by means of log analysis,
eye-tracking analysis, and questionnaire analysis

35
Contacts