Title: A Multi-Perspective Evaluation of the NESPOLE! Speech-to-Speech Translation System
Slide 1: A Multi-Perspective Evaluation of the NESPOLE! Speech-to-Speech Translation System
- Alon Lavie, Carnegie Mellon University
- Florian Metze, University of Karlsruhe
- Roldano Cattoni, ITC-irst
- Erica Costantini, University of Trieste
Slide 2: Outline
- The NESPOLE! Project
- Approach and System Architecture
- Performance and Usability Challenges
  - Distributed real-time performance over the Internet
  - Integration and use of multi-modal capabilities
  - End-to-end translation performance
- Lessons learned and conclusions
Slide 3: The NESPOLE! Project
- Speech-to-speech translation for E-Commerce applications
- Partners: CMU, University of Karlsruhe, ITC-irst, UJF-CLIPS, AETHRA, APT-Trentino
- Builds on successful collaboration within C-STAR
- Improved limited-domain speech translation
- Experiments with multimodality and with MEMT (multi-engine machine translation)
- Showcase-1: Travel and Tourism in Trentino, completed and demonstrated in Nov-2001
- Showcase-2: expanded travel and medical service domain
Slide 4: Speech-to-Speech in E-Commerce
- Replace current passive web E-commerce with live interaction capabilities
- Client starts via the web and can easily connect to an agent for specific information
- Thin client: very little special hardware and software on the client PC (browser, MS NetMeeting, shared whiteboard)
Slide 5: NESPOLE! User Interfaces
Slide 6: NESPOLE! Architecture
Slide 7: Distributed S2S Translation over the Internet
Slide 8: Network Traffic Impact
Slide 9: NESPOLE! Monitor
Slide 10: Aethra Whiteboard
Slide 11: Recent Developments (Apr-02)
- Improved analysis and generation grammars (using old C-STAR data)
- Improved SR engines
- Packet-loss, video, and modem connection tests
- Data collection for Showcase-2A
- Evaluation Scheme Experiment
- Paper and Demo at HLT-02
- Paper submissions to ACL-02, ICSLP-02, ESSLLI-02
Slide 12: IF (Interchange Format) Status Report
Slide 13: WP5 HLT Modules
- Data collection for Showcase-2A completed in February 2002
- Status of transcriptions from all sites?
- CMU will maintain a data repository (Alon collecting all data CDs here)
- IF discussions and development have already started (Donna)
- Development schedule?
Slide 14: WP7 Evaluation
- D9 (Evaluation of Showcase-1) report draft circulated earlier this week
- Each site should verify that the most up-to-date results are being reported
- Include detailed tables in the report?
- Majority vote: finalize a common procedure
- New evaluation experiments
Slide 15: Majority Vote Scheme
- Issue: did all sites use the same guidelines?
- What to do when there is no majority? (see the sketch below)
  - e.g., 4 graders assign P/P/K/K
- What to do when there is complete disagreement?
  - e.g., 3 graders assign P/K/B
- Do scores from the previous evaluation need to be recalculated?
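A minimal sketch of a strict-majority vote, illustrating the two problem cases above. The labels (P/K/B) follow the examples on this slide, but treating ties and complete disagreement as "no majority" is an assumed policy, not the project's finalized procedure.

```python
from collections import Counter

# Minimal sketch of a strict majority vote over grader labels.
# Returning None for ties and complete disagreement is an assumed
# policy; the finalized procedure may resolve these cases differently.
def majority_grade(grades):
    """Return the strict-majority grade, or None if there is none."""
    label, count = Counter(grades).most_common(1)[0]
    return label if count > len(grades) / 2 else None

print(majority_grade(["P", "P", "P", "K"]))  # 'P'  (clear majority)
print(majority_grade(["P", "P", "K", "K"]))  # None (no majority)
print(majority_grade(["P", "K", "B"]))       # None (complete disagreement)
```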
Slide 16: New Evaluation Experiments
- We are investigating three main issues:
  - Binary versus 3-way grading
  - Majority vote versus averaging of scores (see the averaging sketch below)
  - Intercoder and intracoder agreement
- Grading experiment:
  - Four groups, three graders in each group
  - Each group grades two sets, two weeks apart
  - Sets are different but have a large common overlap
  - Groups differ in the evaluation scheme used (binary/3-way)
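As a contrast to majority voting, averaging requires mapping grades onto numbers. The scale below (P = 1.0, K = 0.5, B = 0.0) is a hypothetical mapping chosen for illustration; the scale actually used in the experiment may differ.

```python
# Hypothetical numeric scale for averaging 3-way grades (assumption).
SCORE = {"P": 1.0, "K": 0.5, "B": 0.0}

def average_score(grades):
    """Mean numeric score over all graders' labels."""
    return sum(SCORE[g] for g in grades) / len(grades)

# Averaging still yields a usable score where majority vote fails:
print(average_score(["P", "P", "K", "K"]))  # 0.75
```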
Slide 17: Planned Analysis of Data
- Compare results across grading schemes (binary vs. 3-way) on the same set of data
- Compare majority scores with average scores
- Evaluate intercoder agreement between graders (on the same set and same scheme)
- Evaluate intracoder agreement of the same grader (on overlap data in the two sets, same grading scheme in both sessions); a sketch of both measures follows below
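A sketch of the two agreement measures, assuming simple percent agreement as the statistic; a chance-corrected measure such as Cohen's kappa could be substituted without changing the experimental design.

```python
from itertools import combinations

def percent_agreement(a, b):
    """Fraction of items on which two gradings assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def intercoder_agreement(gradings):
    """Mean pairwise agreement among graders on the same set and scheme."""
    pairs = list(combinations(gradings, 2))
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)

# Intracoder agreement: the same grader on the overlap data,
# two weeks apart, using the same grading scheme in both sessions.
week1 = ["P", "K", "B", "P"]
week2 = ["P", "K", "K", "P"]
print(percent_agreement(week1, week2))  # 0.75
print(intercoder_agreement([week1, week2, ["P", "B", "B", "P"]]))
```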
Slide 18: Preliminary Results
Group (procedure W1/W2)   W1 Acc (%)   W1 Bad (%)   W2 Acc (%)   W2 Bad (%)
Gr1 (binary/3-way)        50.2         49.8         48.7         51.3
Gr2 (3-way/binary)        52.4         47.6         48.8         51.2
Gr3 (3-way/3-way)         53.8         46.2         54.9         45.1
Gr4 (binary/binary)       49.0         51.0         50.0         50.0
Slide 19: Plans for Final Evaluations
- Improved end-to-end evaluations
- Additional component evaluations?
- Additional user studies?
- How do we evaluate user interfaces and communication effectiveness?