Title: Graz
1. About the use of UNL in the transcultural analysis of key words, key images and key concepts
2. Preparing for the 2008 Beijing Olympics: The LingTour and KNOWLISTICS projects
MAO Yuhang, DING Xiao-Qing, NI Yang, LIN Shiuan-Sung, Laurence LIKFORMAN, Gérard CHOLLET
Presented here by Gérard CHOLLET
chollet@tsi.enst.fr
ENST/CNRS-LTCI
http://www.tsi.enst.fr/chollet
3. Outline
- Rationale of the proposal
- Objectives
- The Beijing 2008 Olympics
- Approaches
- Multimedia, multilingual information server
- Intelligent Camera
- Bilingual Voice Communicator
- Needs and relevance
- A PDA for tourists and travelling businessmen
- Conclusions and Perspectives
4. Rationale for the IP-KNOWLISTICS
- Logistics for knowledge in a specific domain (Olympic Games, OG)
- Language-independent knowledge representation and management
- Multimedia (text, speech, image, video)
- Multimodal access (text, speech, visual I/O)
- Distributed multimedia server accessible from mobile terminals (phone, PDA, PC, ...)
- Primarily targeted at tourist applications initially
- 2008 Beijing Olympics as a field trial
5. Technical developments
- Language-independent knowledge representation (using conceptual graphs and an intermediate representation language such as Universal Networking Language)
- Summarisation and reformulation of texts
- Generation in 12 target languages
- Speech synthesis and recognition
- VoiceXML-based interactive dialog agent
- Intelligent camera with Chinese character recognition
- Cross-language multimodal communicator on a PDA
- Cross-language lexical access
6. Chinese character recognition
7. Intelligent camera from Tsinghua Univ.
8. Extracting text in scene images
- Complex color images
- Uncontrolled illumination
- Variations in size, font, orientation, texture
- Complex backgrounds, shadows
9. Text extraction
- Searching for character regions (text has uniform color)
- Multi-channel decomposition
- Connected components analysis
- Grouping of components
- Alignment analysis (number of horizontally or vertically aligned components)
- Text identification (language-independent features: size, alignment, ...)
- Detection rate: 84%
- False alarm rate: 5.6%
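
The connected components and alignment steps listed above can be illustrated with a minimal sketch. It assumes a binary mask has already been obtained from one colour channel; the function names, the 4-connectivity, the one-pixel tolerance and the three-component threshold are all illustrative assumptions, not the Tsinghua implementation.

```python
from collections import deque

def connected_components(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected regions."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def horizontal_groups(boxes, tolerance=1):
    """Group boxes whose vertical centres lie within `tolerance` pixels (assumed criterion)."""
    groups = []
    for box in sorted(boxes, key=lambda b: b[1]):  # left to right
        centre = (box[0] + box[2]) / 2
        for group in groups:
            reference = (group[0][0] + group[0][2]) / 2
            if abs(centre - reference) <= tolerance:
                group.append(box)
                break
        else:
            groups.append([box])
    return groups

# Toy mask: three character-like blobs on one line plus an isolated speck.
MASK = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
]
boxes = connected_components(MASK)
groups = horizontal_groups(boxes)
# Keep only groups with enough aligned components to look like a text line.
text_candidates = [g for g in groups if len(g) >= 3]
```

Here the three aligned blobs survive as one text candidate while the isolated speck is discarded, which is the intuition behind counting aligned components.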
10. Cross-language Multimodal Communicator
- Use of a visual display (e.g. on a PDA) to mediate the dialogue between two persons speaking different languages.
- Recognition of short utterances, display of a word graph, selection of keywords, visualisation (and synthesis) of the translation of key words and groups of words.
- Specialised lexicon for dialog acts in typical tourist situations (in a restaurant, at the hotel, in the street, in public transport, about the Olympic Games, ...)
- UMTS access to an information server offering maps, photographs, video sequences, ...
11. Generation in target languages
- Sharing of acoustic models between languages to simplify extension to other languages.
- Combination of phone models with small amounts of data.
- Adaptation of models to the user and to the environment.
[Figure labels: Chinese, French, language-specific models]
12. Knowledge representation
- A formal language for representing the meaning of natural language sentences.
- UNL (Universal Networking Language), introduced to describe natural language semantics.
- Language-independent context indexing, usable for cross-language information retrieval.
- Use of the conceptual hierarchy of UNL to address the inherent ambiguity of natural languages.
- A set of semantic relations (linking concepts together) for a structured information pattern.
13. UNL representation
"The cat drank the milk" can be encoded by
agt(drink(icl>do,agt>thing,obj>liquid).@past.@entry, cat(icl>mammal>animal).@def)
obj(drink(icl>do,agt>thing,obj>liquid).@past.@entry, milk(icl>beverage>food).@def)
agt and obj are binary semantic relations.
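
As an illustration of the binary relations above, the following sketch holds the two relations of "The cat drank the milk" in memory and serialises them back to the textual form. The `UW` and `Relation` classes and the `entry_node` helper are hypothetical names chosen for this example; they are not part of any UNL toolkit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UW:
    """A Universal Word: headword, constraint list and attributes (illustrative model)."""
    headword: str
    constraints: tuple = ()
    attributes: tuple = ()

    def __str__(self):
        cons = f"({','.join(self.constraints)})" if self.constraints else ""
        attrs = "".join(f".@{a}" for a in self.attributes)
        return f"{self.headword}{cons}{attrs}"

@dataclass(frozen=True)
class Relation:
    """A binary semantic relation such as agt or obj between two UWs."""
    label: str
    source: UW
    target: UW

    def __str__(self):
        return f"{self.label}({self.source}, {self.target})"

def entry_node(relations):
    """Return the UW carrying the @entry attribute, or None if there is none."""
    for rel in relations:
        for uw in (rel.source, rel.target):
            if "entry" in uw.attributes:
                return uw
    return None

drink = UW("drink", ("icl>do", "agt>thing", "obj>liquid"), ("past", "entry"))
cat = UW("cat", ("icl>mammal>animal",), ("def",))
milk = UW("milk", ("icl>beverage>food",), ("def",))

graph = [Relation("agt", drink, cat), Relation("obj", drink, milk)]
for relation in graph:
    print(relation)
```

Because both relations share the same `drink` node, the pair forms a small graph rooted at the entry node rather than two unrelated statements.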
14. Role of semantic content representation in indexing
[Diagram labels: user's request; audio/video, textual, digital; UNL encoding; cross-lingual multimedia platform; user-specific information; UNL decoding]
15. Application architecture
[Diagram labels: UMTS server; access information; a word graph, a list of keywords; translation; speech synthesis]
16. Digital Olympic Multi-Language Information Network Service System Project
17. From VoiceXML to VoiceUNL
Presented here by Gérard CHOLLET
chollet@tsi.enst.fr
ENST/CNRS-LTCI
http://www.tsi.enst.fr/chollet
With the contribution of Christian BOITET
18. Outline
- Rationale of the proposal
- Objectives
- Promotion of a new standard, demonstrations
- Approaches
- An extra layer of VoiceXML
- Need and relevance
- Multilingual Vocal Servers
- Integration and structuring effect
- Conclusions and Perspectives
19. Rationale for VoiceUNL
- Need for language-independent vocal servers,
- → Need for a language-independent knowledge representation and management formalism
- Principle of the proposed solution
- Start from UNL graphs augmented with voice-oriented semantic marks (special UWs, attributes),
- Generate in the target language,
- Voice-oriented marks become prosodic markers,
- Final conversion to VoiceXML
- 2008 Beijing Olympics as a field trial
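
The last two steps of the proposed solution (voice-oriented marks become prosodic markers, then conversion to VoiceXML) might look like the sketch below. It assumes a hypothetical generator has already produced target-language tokens, each paired with the UNL attributes that survived generation, and it maps the emphasis attribute onto the `<emphasis>` element that VoiceXML 2.0 prompts accept. The token format, the mapping and the `to_voicexml` function are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

def to_voicexml(tokens, lang):
    """Wrap generated words in a VoiceXML prompt, marking emphasised words."""
    vxml = ET.Element("vxml", {"version": "2.0", "xml:lang": lang})
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    last = None  # most recent child element, whose tail receives plain words
    for word, attributes in tokens:
        if "emphasis" in attributes:
            last = ET.SubElement(prompt, "emphasis")
            last.text = word
            last.tail = " "
        elif last is None:
            prompt.text = (prompt.text or "") + word + " "
        else:
            last.tail = (last.tail or "") + word + " "
    return ET.tostring(vxml, encoding="unicode")

# Hypothetical generator output: (word, UNL attributes kept for prosody).
TOKENS = [
    ("The", ()),
    ("cat", ("emphasis",)),
    ("drank", ()),
    ("the", ()),
    ("milk", ()),
]
doc = to_voicexml(TOKENS, "en")
print(doc)
```

Keeping the prosodic marks as attributes until this final step means the same UNL graph can be rendered for any target language before any voice markup is committed.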
20. What is VoiceXML?
- A recommendation of the W3C (World Wide Web Consortium)
- An XML-based markup language for vocal information servers,
- A set of normalised markup tags,
- Current tags concern language identification, voice prompting, speech synthesis, form filling, barge-in, echo cancelling, ...
- No provision to access a semantically encoded database,
- Need for a UNL-type front-end
- Compatibility with MPEG4-SNHC (talking head)
21. Applications
22. Prosodic information in UNL
- Attributes that can influence the grammatical and the prosodic structure of a sentence already exist:
- @emphasis
- @qfocus
- Representations should be defined, concerning:
- Emotion: @angry, @bored, @relaxed?
- Focus: grouping words to emphasize in a scope?
- Passivity: @passive?
- Speaker: @age, @sex, special UWs for voice characteristics?
- Expression (for face and gesture animation): special UWs/constructs?
23. Conclusions and Perspectives
- Demonstrations to be prepared within the LingTour, Normalangue and KNOWLISTICS projects
- First target is the Beijing 2008 Olympics
- Some concept-oriented formalism (such as Sowa's conceptual graphs) may be used to store knowledge before building, in UNL, the "interlingual prelinguistic, communicative content"
24. Conclusions and Perspectives
- UNL representation of the meaning of natural language sentences, directly available for retrieval, indexing and knowledge extraction.
- UNL with multimedia contents (text, speech, image, video) and multimodal access (text, speech, visual I/O) to enrich the communication service.
- Comprehensive and extensive information service on PDAs with access to UMTS and wireless LAN.