Multimodal Input for Meeting Browsing and Retrieval Interfaces: Preliminary Findings

1
Multimodal Input for Meeting Browsing and
Retrieval Interfaces: Preliminary Findings
  • Agnes Lisowska, Susan Armstrong
  • ISSCO/TIM/ETI, University of Geneva
  • IM2.HMI

2-3
The Problem
  • Many meeting-centered projects, resulting in
    databases of meeting data
  • but ...
  • How can a real-world user best exploit this data?

4-6
Mouse-keyboard vs. Multimodal Input
  • Web: similar media (video, pictures, text,
    sound), and we are used to manipulating them with
    keyboard and mouse
  • but ...
  • The multimedia meeting domain is novel
  • interesting information is found across media in
    the database
  • so ...
  • Multimodal interaction could be the most
    efficient way to exploit cross-media information

7
The Archivus System
  • Designed based on
  • a user requirements study
  • data and annotations available in the IM2 project
  • Flexibly multimodal
  • can study the system with minimal a priori
    assumptions about interaction modalities
  • Input
  • pointing: mouse, touchscreen
  • language: voice, keyboard
  • freeform questions allowed, but not a QA system
  • Output
  • text, graphics, video, audio
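
Below is a minimal sketch, in Python, of the kind of modality-agnostic
input routing this slide describes: pointing events (mouse,
touchscreen) and language events (voice, keyboard) are normalized and
fed to the same dialogue logic. All names are hypothetical; the
presentation does not show the actual Archivus code.

    from dataclasses import dataclass
    from typing import Literal, Optional

    Modality = Literal["mouse", "touchscreen", "voice", "keyboard"]

    @dataclass
    class InputEvent:
        modality: Modality
        target: Optional[str] = None  # widget hit by a pointing modality
        text: Optional[str] = None    # typed or spoken language input

    def handle(event: InputEvent) -> str:
        """Route any modality into the same dialogue action space."""
        if event.modality in ("mouse", "touchscreen"):
            return f"select:{event.target}"
        # Freeform questions are allowed, but the system is not a QA
        # system: language input is treated as a search constraint.
        return f"query:{event.text}"

    print(handle(InputEvent("mouse", target="meeting_list")))
    print(handle(InputEvent("voice", text="Who attended all of the meetings?")))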

8
The Archivus Interface
9
Experiment Scenario
  • Scenario
  • user is a new employee who must do some fact
    finding and checking for their boss
  • Task
  • answer a series of short-answer ("Who attended
    all of the meetings?") and true/false questions
    ("The budget was 1000 CHF")
  • 21 questions in total
  • ordering of questions is varied (4 different
    tasks)
  • alternated starting with true/false or
    short-answer questions
  • Done in the lab, not in the field
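
A small sketch, in Python, of how counterbalanced question orderings
like those described above could be generated. This is a hypothetical
helper; the study's actual 21 questions and 4 task orderings are not
reproduced here, and the example items below (beyond the two quoted
in the slide) are invented.

    import random

    def make_task(tf_items, sa_items, start_with_tf=True, seed=0):
        """Interleave true/false and short-answer questions,
        alternating types, as in the 4 task variants."""
        rng = random.Random(seed)
        tf, sa = tf_items[:], sa_items[:]
        rng.shuffle(tf)
        rng.shuffle(sa)
        first, second = (tf, sa) if start_with_tf else (sa, tf)
        task = []
        while first or second:
            if first:
                task.append(first.pop())
            first, second = second, first  # alternate question types
        return task

    tf = ["The budget was 1000 CHF", "The meeting ended on time"]
    sa = ["Who attended all of the meetings?", "Who chaired the meeting?"]
    print(make_task(tf, sa, start_with_tf=True))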

10
Experiment Methodology: Wizard of Oz
  • What it is
  • user interacts with what they think is a fully
    functioning system, but a human is actually
    controlling the system and processing (language)
    input
  • Why
  • allows experimenting with natural language input
    without having to implement speech recognition
    (SR) and natural language processing (NLP)
  • Data
  • video and audio
  • user's face (reaction to the system)
  • user's input devices
  • user's screen
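
As a concrete illustration of the Wizard-of-Oz idea, here is a minimal
relay sketch in which a human wizard types the "system" replies to the
user's language input, so no SR or NLP needs to be implemented. This
is hypothetical; the presentation does not describe the WOz software
at this level of detail, and the address is assumed for illustration.

    import socket

    HOST, PORT = "localhost", 9999  # assumed address, for illustration

    # The user-facing client sends language input to this socket; the
    # wizard, seeing it, types the reply the "system" should give.
    with socket.create_server((HOST, PORT)) as server:
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                print("user said:", data.decode())
                reply = input("wizard> ")  # human simulates NL processing
                conn.sendall(reply.encode())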

11
Experiment Environment
  • User's room
  • PC with speakers
  • wireless mouse, keyboard
  • touchscreen
  • 2 cameras
  • recording equipment
  • Wizard's room
  • NL processing simulation
  • view of the user
  • view of the user's screen

12
Procedure
  • Pre-experiment questionnaire (demographic
    information), consent form
  • Read scenario description and software manual
  • Phase 1 (20 minutes)
  • subset of modalities
  • 11 questions (5 true/false, 6 short answer)
  • Phase 2 (20 minutes)
  • all modalities
  • 10 questions (5 true/false, 5 short answer)
  • Post-experiment questionnaire and interview (time
    permitting)

13
Experiment
  • Participants
  • 24 in total (11 female, 13 male)
  • mostly non-native English speakers
  • different levels of computer experience
  • 4 modalities used
  • mouse (M), voice (V), keyboard (K), and
    touchscreen (T)
  • 8 Phase 1 conditions
  • M, T, V, MK, VK, TVK, MVK, MTVK
  • Experiment was conducted between-subjects, with 3
    subjects per condition
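
A sketch, in Python, of the balanced between-subjects assignment the
slide describes (3 participants per each of the 8 Phase 1 conditions).
The actual assignment procedure used in the study is not stated; this
is only one plausible way to do it.

    import random

    CONDITIONS = ["M", "T", "V", "MK", "VK", "TVK", "MVK", "MTVK"]

    def assign(participants, seed=42):
        """Randomly assign 24 participants, 3 per condition."""
        assert len(participants) == 3 * len(CONDITIONS)
        slots = CONDITIONS * 3
        random.Random(seed).shuffle(slots)
        return dict(zip(participants, slots))

    print(assign([f"P{i:02d}" for i in range(1, 25)]))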

14
What We Were Looking At
  • Task completion
  • Which modalities result in most success?
  • Learning Effect
  • Does learning with a novel modality encourage its
    use later on?
  • Number of Interactions
  • Are users equally active with functionally
    equivalent modalities?
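
The "number of interactions" measure can be computed directly from an
event log; below is a minimal sketch with a hypothetical log format
(the real data is summarized in Tables 2-4 of the presentation, not
here).

    from collections import Counter

    # Each logged event: (participant, phase, modality)
    log = [
        ("P01", 1, "mouse"), ("P01", 1, "mouse"), ("P01", 2, "voice"),
        ("P02", 1, "touchscreen"), ("P02", 2, "voice"),
        ("P02", 2, "keyboard"),
    ]

    counts = Counter((phase, modality) for _, phase, modality in log)
    for (phase, modality), n in sorted(counts.items()):
        print(f"Phase {phase} {modality}: {n}")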

15
Task Completion
  • Expectation
  • mouse-keyboard would be most efficient
  • Result
  • Mouse-keyboard was on par with voice-only, TVK,
    and MVK
  • Mouse-only, all modalities (MTVK), and
    touchscreen-only performed best


Table 1. All answers found in Phase 1
16
Task Completion
  • In the mouse-only and touchscreen-only conditions
    the user can only make correct moves
  • not the case when voice interaction is involved
  • Touchscreen-only was worse than mouse-only
  • lower pointing accuracy with touchscreen
  • blocking effect due to unfamiliarity with
    touchscreen
  • similar results with MVK and TVK
  • Combining voice with other modalities does add
    value to the interaction

17
Learning Effect
  • Expected: use of novel modalities in Phase 1
    increases the likelihood of use in Phase 2

Table 2. Number of interactions in Phase 2
  • More voice use in Phase 2 of the mouse-only
    condition than of the voice-only condition
  • If given familiar modalities in Phase 1, users are
    more likely to explore new modalities in Phase 2

18
Learning Effect
  • Lack of a learning effect could be caused by
  • an unconscious need to feel comfortable with the
    system and input modalities at early stages of
    interaction
  • comfort can manifest in two ways
  • with the system itself (same for all users)
  • knowing what the graphics represent, the type of
    info available and where it can be found
  • with the interaction methods (differs across
    conditions)
  • what input modalities are available
  • The system is slower with voice

19
Number of Interactions: pointing modalities
  • Only looked at functionally equivalent modalities
  • Mouse vs. touchscreen
  • users more active with mouse than touchscreen
  • similarly for MVK and TVK
  • Comfort and/or blocking effects are factors
  • Quickly learn strategies with mouse and
    touchscreen

Table 3a. Modality interactions per condition and
phase
20
Number of Interactions: voice and keyboard
  • A novel input modality on its own is easier to
    learn than a novel modality paired with the less
    frequently used half of the traditional MK paradigm

Table 3b. Modality interactions per condition and
phase
  • Keyboard use increases when the mouse becomes
    available
  • but ...
  • In Phase 2 of the MK condition, voice is used
    almost twice as much as keyboard, despite the
    continued high use of mouse input

Table 4. Voice vs. keyboard interactions in
Phases 1 and 2 of the VK condition
21
Conclusions and Future Work
  • Encouraging results
  • users can be encouraged to use voice
  • in particular in combination with other, more
    familiar modalities
  • the blocking effect can be reduced
  • especially if all modalities are available at once
  • Results achieved despite
  • a steep learning curve
  • a small number of participants
  • New experiments planned with
  • a new version of the system and WOz environment,
    a tablet PC, a tutorial, and more users
    (10/condition)