Title: Multimodal Input for Meeting Browsing and Retrieval Interfaces: Preliminary Findings
Agnes Lisowska, Susan Armstrong
ISSCO/TIM/ETI, University of Geneva
IM2.HMI
The Problem
- Many meeting-centered projects have resulted in databases of meeting data - but
- How can a real-world user best exploit this data?
Mouse-keyboard vs. Multimodal Input
- The Web offers similar media (video, pictures, text, sound), and we are used to manipulating them with keyboard and mouse - but
- the multimedia meeting domain is novel
- interesting information is found across media in the database - so
- multimodal interaction could be the most efficient way to exploit this cross-media information
The Archivus System
- Designed based on
- a user requirements study
- the data and annotations available in the IM2 project
- Flexibly multimodal
- the system can be studied with minimal a priori assumptions about interaction modalities
- Input
- pointing: mouse, touchscreen
- language: voice, keyboard
- freeform questions allowed, but not a QA system
- Output
- text, graphics, video, audio
The Archivus Interface
Experiment Scenario
- Scenario
- the user is a new employee who must do some fact finding and checking for their boss
- Task
- answer a series of short-answer questions ("Who attended all of the meetings?") and true/false questions ("The budget was 1000 CHF")
- 21 questions in total
- the ordering of questions is varied (4 different tasks)
- alternated starting with true/false or short-answer questions
- done in the lab, not in the field
Experiment Methodology: Wizard of Oz
- What it is
- the user interacts with what they think is a fully functioning system, but a human is actually controlling the system and processing the (language) input
- Why
- allows experimenting with natural language input without having to implement speech recognition (SR) and NLP
- Data collected
- video and audio
- the user's face (reaction to the system)
- the user's input devices
- the user's screen
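The Wizard of Oz loop described above can be sketched as follows. All class and method names here are hypothetical, invented purely to illustrate the idea: language input is routed to a human "wizard" instead of a real SR/NLP pipeline, while the user believes they are using a fully functioning system.

```python
# Minimal Wizard-of-Oz sketch (hypothetical names): natural-language
# input is forwarded to a human operator who simulates the
# language-understanding component before it is implemented.

class Wizard:
    """Human operator standing in for the SR/NLP modules."""

    def interpret(self, utterance: str) -> dict:
        # In a real session the wizard reads the utterance and triggers
        # the matching system action; here we fake a single mapping.
        if "attended" in utterance:
            return {"action": "search", "field": "participants"}
        return {"action": "unknown"}


class Frontend:
    """What the user believes is a fully functioning system."""

    def __init__(self, wizard: Wizard):
        self.wizard = wizard
        self.log = []  # every interaction is recorded for later analysis

    def handle_input(self, modality: str, content: str) -> dict:
        self.log.append((modality, content))
        if modality in ("voice", "keyboard"):  # language input -> wizard
            return self.wizard.interpret(content)
        return {"action": "click", "target": content}  # pointing input


frontend = Frontend(Wizard())
result = frontend.handle_input("voice", "Who attended all of the meetings?")
print(result)  # {'action': 'search', 'field': 'participants'}
```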
Experiment Environment
- User's room
- PC with speakers
- wireless mouse, keyboard
- touchscreen
- 2 cameras
- recording equipment
- Wizard's room
- NL processing simulation
- view of the user
- view of the user's screen
Procedure
- Pre-experiment questionnaire (demographic information), consent form
- Read scenario description and software manual
- Phase 1: 20 minutes
- subset of modalities
- 11 questions (5 true/false, 6 short answer)
- Phase 2: 20 minutes
- all modalities
- 10 questions (5 true/false, 5 short answer)
- Post-experiment questionnaire and interview (time permitting)
Experiment
- Participants
- 24 in total: 11 female, 13 male
- mostly non-native English speakers
- different levels of computer experience
- 4 modalities used
- mouse (M), voice (V), keyboard (K), and touchscreen (T)
- 8 Phase 1 conditions
- M, T, V, MK, VK, TVK, MVK, MTVK
- Experiment was conducted between-subjects, with 3 subjects per condition
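The between-subjects design above (8 Phase 1 conditions, 3 participants each, 24 participants total) can be sketched as a simple assignment table. The participant labels are illustrative only:

```python
# Sketch of the between-subjects assignment: 8 Phase 1 modality
# conditions, 3 participants per condition, 24 participants total.
import itertools
from collections import Counter

conditions = ["M", "T", "V", "MK", "VK", "TVK", "MVK", "MTVK"]
participants = [f"P{i:02d}" for i in range(1, 25)]  # P01 .. P24

# Cycle through the conditions so each receives exactly 3 participants.
assignment = {p: c for p, c in zip(participants, itertools.cycle(conditions))}

counts = Counter(assignment.values())
print(counts["MTVK"])  # 3
```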
What We Were Looking At
- Task completion
- Which modalities result in the most success?
- Learning effect
- Does learning with a novel modality encourage its use later on?
- Number of interactions
- Are users equally active with functionally equivalent modalities?
Task Completion
- Expectation
- mouse-keyboard would be most efficient
- Result
- mouse-keyboard was on par with voice-only, TVK, and MVK
- mouse-only, all modalities (MTVK), and touchscreen-only were best

Table 1. All answers found in Phase 1
Task Completion
- In the mouse-only and touchscreen-only conditions the user can only make correct moves
- this is not the case when voice interaction is involved
- Touchscreen-only was worse than mouse-only
- lower pointing accuracy with the touchscreen
- blocking effect due to unfamiliarity with the touchscreen
- similar results for MVK and TVK
- Combining voice with other modalities does add value to the interaction
Learning Effect
- Expectation: use of novel modalities in Phase 1 increases the likelihood of their use in Phase 2

Table 2. Number of interactions in Phase 2

- More voice use in Phase 2 of the mouse-only condition than of the voice-only condition
- If given familiar modalities in Phase 1, users are more likely to explore new modalities in Phase 2
Learning Effect
- The lack of a learning effect could be caused by
- an unconscious need to feel comfortable with the system and input modalities at early stages of interaction
- Comfort can manifest in two ways
- with the system itself (same for all users)
- knowing what the graphics represent, what type of information is available, and where it can be found
- with the interaction methods (differs across conditions)
- knowing what input modalities are available
- The system is slower with voice
Number of Interactions: Pointing Modalities
- Only functionally equivalent modalities were compared
- Mouse vs. touchscreen
- users were more active with the mouse than with the touchscreen
- similarly for MVK and TVK
- Comfort and/or blocking effects are factors
- users quickly learn strategies with mouse and touchscreen

Table 3a. Modality interactions per condition and phase
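Per-modality interaction counts like those summarized in Table 3a can be tallied from interaction logs. A minimal sketch, assuming (hypothetically) that each logged event is a (phase, modality) pair:

```python
# Tally interactions per (phase, modality) pair from a hypothetical
# interaction log, as needed for comparisons such as mouse vs.
# touchscreen activity across the two experiment phases.
from collections import Counter

log = [
    (1, "mouse"), (1, "mouse"), (1, "touchscreen"),
    (2, "mouse"), (2, "voice"), (2, "voice"), (2, "keyboard"),
]

counts = Counter(log)  # keys are (phase, modality) pairs
print(counts[(2, "voice")])  # 2
```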
Number of Interactions: Voice and Keyboard
- A novel input modality on its own is easier to learn than a novel modality plus the less frequently used half of the traditional mouse-keyboard paradigm

Table 3b. Modality interactions per condition and phase

- Keyboard use increases when the mouse becomes available - but
- in Phase 2 of the MK condition, voice is used almost twice as much as the keyboard, despite the continued high use of mouse input

Table 4. Voice vs. keyboard interactions in Phase 1 and 2 of the VK condition
Conclusions and Future Work
- Encouraging results
- users can be encouraged to use voice
- particularly in combination with other, more familiar modalities
- the blocking effect can be reduced
- especially if all modalities are available at once
- Results achieved despite
- a high learning curve
- a small number of participants
- New experiments planned with
- a new version of the system and WOz environment, a tablet PC, a tutorial, and more users (10 per condition)