Chinese Romanization for Chinese Voice Browsing

Slides: 28
Provided by: w3

1
Chinese Romanization for Chinese Voice Browsing
  • IBM China Research Lab

2
Index
  • Motivations & Proposals
  • IPA vs. Chinese Romanization
  • Chinese Romanization Standards
  • Implementations of Chinese Romanization in SSML
  • Extensions for other languages

3
Motivations & Proposals
4
IBM Speech Synthesis System
  • The IBM speech synthesis system supports about 20
    languages.
  • For Asian languages, we cover:
  • Mandarin,
  • Cantonese,
  • Korean,
  • Japanese,
  • Thai.

5
Pronunciation annotations are important for
Chinese
  • A Chinese character represents a meaning more
    than a pronunciation.
  • The homograph phenomenon is very common for
    Chinese characters.
  • So it will be very helpful if the pronunciation
    can be given explicitly.

6
Proposals
  • We propose to use Chinese Romanization to
    annotate Chinese pronunciation in the phoneme
    element.
  • We also propose that SSML use the diverse,
    predefined, and widely used pronunciation
    annotation standards for different languages.
  • SSML can then be more easily accepted and used
    around the world.
  • Note: "Chinese Romanization" refers to Hanyu
    Pinyin in this PPT.

7
IPA vs. Chinese Romanization
8
Comparison Rule: Goal of SSML
  • The goal of SSML is to provide a rich, XML-based
    markup language for assisting the generation of
    synthetic speech in Web and other applications.
  • To reach that goal, more users of SSML, such as
    ordinary Web application developers, must be
    able to learn and use SSML easily.
  • So SSML should be defined in terms of ordinary
    people's knowledge and skills rather than
    professional linguistic knowledge.
  • Otherwise, it will be a long way before SSML is
    widely accepted and used around the world.

9
IPA is not a good fit for Chinese
  • IPA tries to collect an exhaustive set of
    pronunciations for all kinds of languages.
  • It has become very complicated and difficult to
    input.
  • A well-educated Chinese adult cannot annotate
    Chinese pronunciation in IPA without special
    training.
  • IPA is not very popular in China.
  • Special linguistic phenomena in Chinese, such as
    tone and retroflexion, cannot be conveniently
    described in IPA.

10
Chinese Romanization is a good fit for Chinese
  • Chinese Romanization is designed specifically
    for Chinese rather than for all languages.
  • Adding r at the end describes a retroflex
    syllable.
  • Adding a tone attribute describes the tone.
  • Chinese Romanization is widely used and learnt.
  • Chinese people learn Chinese Romanization in
    primary school.
  • Many foreigners begin to learn Chinese by Chinese
    Romanization.
  • Chinese Romanization is widely used to input
    Chinese characters on computers.
  • The Chinese government has put a standard for
    Chinese Romanization into effect.
  • It is in effect for education, publishing,
    information processing and other related
    industries in China.

11
Chinese Romanization Standards
12
Chinese Romanization Standard
  • The writing rules of Chinese Romanization conform
    to the P.R.C. state standard "Basic Rules for
    Hanyu Pinyin Orthography", published by
    CSBQTS [1] in 1996.
  • This Orthography is based on the Hanyu Pinyin
    Schema published in 1958.
  • Following the naming convention for alphabets,
    we propose x-CSBQTS-96 as the name of the
    Chinese Romanization alphabet. We also propose
    x-Pinyin-96 as an alternative that is easier to
    remember.
  • [1] CSBQTS: China State Bureau of Quality and
    Technical Supervision

13
Hanyu Pinyin Schema (published in 1958)
  • Character Set.
  • 25 characters, all from a to z except ü.
  • (For ease of input on a computer, ü is replaced
    by v.)
  • Initial Set
  • b, p, m, f, d, t, n, l, g, k, h, j, q, x, zh, ch,
    sh, r, z, c, s
  • Final Set
  • i, u, ü, a, ia, ua, o, uo, e, ie, üe, ai, uai,
    ei, uei,
  • ao, iao, ou, iou, an, ian, uan, üan, en, in,
    uen, ün,
  • ang, iang, uang, eng, ing, ueng, ong, iong
  • Tone Annotation
  • mā, má, mǎ, mà, ma
  • Separator: '
  • pi'ao
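
The initial and final sets above suggest a simple way to split a written syllable into its two parts. The following is a minimal sketch, not part of the original slides: the function name split_syllable is hypothetical, the input is assumed to be a single toneless syllable, and the returned final is the written form, which can differ from the schema form listed above (for example, uei is written ui after an initial).

```python
# Initials from the Hanyu Pinyin Schema slide. Two-letter initials come
# first so that "zh" is tried before "z", "ch" before "c", "sh" before "s".
INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
)


def split_syllable(syllable: str) -> tuple[str, str]:
    """Split one toneless pinyin syllable into (initial, final).

    Hypothetical helper for illustration; it does not validate that the
    remaining final is a member of the final set.
    """
    # Accept the keyboard convention mentioned on the slide: v stands for ü.
    s = syllable.lower().replace("v", "ü")
    for initial in INITIALS:
        if s.startswith(initial):
            return initial, s[len(initial):]
    # Syllables such as "an" or "ou" have no initial.
    return "", s


print(split_syllable("zhuang"))
print(split_syllable("piao"))
print(split_syllable("an"))
```

Listing the two-letter initials first keeps the lookup a plain prefix match, with no need for backtracking.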

14
Pinyin vs. IPA
15
Basic Rules for Hanyu Pinyin Orthography
(published in 1996)
  • 1. Words are the basic units for spelling the
    Chinese Common Language. (A space is used to
    separate words.)
  • rén (person/people), péngyou (friends),
    túshūguǎn (library/libraries)
  • gōngrén hé nóngmín (workers and farmers)
  • 2. Structures of two or three syllables that
    indicate a complete concept are linked.
  • quánguó (the whole nation), duìbuqǐ (sorry)
  • 3. Separate terms with more than 4 syllables if
    they can be separated into words; otherwise link
    all the syllables.
  • wúfèng gāngbǐ (seamless pen), Hóngshízìhuì (Red
    Cross)

16
Basic Rules for Hanyu Pinyin Orthography
(published in 1996)
  • 4. Reduplicated monosyllabic words are linked,
    but reduplicated disyllabic words are separated.
  • rénrén (everybody), chángshi chángshi (give it a
    try)
  • 5. In certain situations, a hyphen can be added
    to make the words easier to read and understand.
  • huán-bǎo (environmental protection), shíqī-bā suì
    (17 or 18 years old)

17
Implementations of Chinese Romanization in SSML
18
Implementation 1
    <?xml version="1.0"?>
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                     http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
           xml:lang="zh-CN">
      <phoneme alphabet="x-CSBQTS-96" ph="duìbuqǐ">对不起</phoneme>
      <!-- This is an example of Chinese Romanization
           standard tone annotation -->
    </speak>
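
A document like the one above could also be generated programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name build_ssml and the sample text 对不起 are assumptions made for illustration, x-CSBQTS-96 is the alphabet name proposed in these slides rather than a registered value, and the language tag zh-CN is chosen here as the tag for Chinese as used in mainland China.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text: str, ph: str, alphabet: str = "x-CSBQTS-96") -> str:
    """Return an SSML document with one phoneme element (illustrative only)."""
    # Serialize SSML elements without a prefix, as in the slide example.
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": "zh-CN"},
    )
    phoneme = ET.SubElement(
        speak,
        f"{{{SSML_NS}}}phoneme",
        {"alphabet": alphabet, "ph": ph},
    )
    # The element content is the written text; ph carries the pronunciation.
    phoneme.text = text
    return ET.tostring(speak, encoding="unicode")


# Sample text and pronunciation are assumptions for this sketch.
print(build_ssml("对不起", "duìbuqǐ"))
```

Building the tree through an XML library rather than string concatenation leaves attribute quoting and escaping to the serializer.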

19
Implementation 2
    <?xml version="1.0"?>
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                     http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
           xml:lang="zh-CN">
      <phoneme alphabet="x-CSBQTS-96" ph="dui4bu0qi3">对不起</phoneme>
      <!-- This is an example of Chinese Romanization
           using numbers to describe tone -->
    </speak>
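
The number-based form can be converted mechanically into the tone-mark form, because the digits also mark the syllable boundaries. The following is a minimal sketch, not taken from the slides: the function names are hypothetical, digits 1 to 4 are assumed to denote the four tones, and 0 (as in bu0 above) or 5 is assumed to denote the neutral tone. The mark is placed with the usual pinyin convention: on a or e if present, on o in ou, otherwise on the last vowel.

```python
import re

# Tone-marked forms of each vowel for tones 1 to 4.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable: str, tone: int) -> str:
    """Place a tone mark on one toneless syllable (hypothetical helper)."""
    s = syllable.replace("v", "ü")  # keyboard convention: v stands for ü
    if tone not in (1, 2, 3, 4):
        return s  # neutral tone carries no mark
    if "a" in s:
        idx = s.index("a")
    elif "e" in s:
        idx = s.index("e")
    elif "ou" in s:
        idx = s.index("o")
    else:
        vowels = [i for i, ch in enumerate(s) if ch in TONE_MARKS]
        if not vowels:
            return s  # nothing to mark
        idx = vowels[-1]
    return s[:idx] + TONE_MARKS[s[idx]][tone - 1] + s[idx + 1:]


def numbered_to_marked(ph: str) -> str:
    """Convert a string such as 'dui4bu0qi3' to its tone-mark form."""
    return re.sub(
        r"([a-zü]+)([0-5])",
        lambda m: mark_syllable(m.group(1), int(m.group(2))),
        ph,
    )


print(numbered_to_marked("dui4bu0qi3"))
```

The reverse direction is harder in general, since a tone-marked word written without separators must first be segmented into syllables.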

20
Comparison between Two implementations
  • Implementation 1
    <phoneme alphabet="x-CSBQTS-96" ph="duìbuqǐ">对不起</phoneme>
  • Implementation 2
    <phoneme alphabet="x-CSBQTS-96" ph="dui4bu0qi3">对不起</phoneme>
  • Note: "x-CSBQTS-96" may be replaced by
    "x-Pinyin-96".

21
Extensions for other languages
22
Extension for Cantonese
  • The Linguistic Society of Hong Kong published the
    simple, easy-to-learn, and easy-to-use LSHK
    Cantonese Romanization Scheme in 1993.
  • This scheme is widely adopted in various areas:
    education, Cantonese information processing,
    computer input methods, etc.
  • So we also propose to use the LSHK Cantonese
    Romanization Scheme to annotate Cantonese
    pronunciation.

23
Extension for more languages
  • Though it is possible to devise a general
    standard to annotate the pronunciation of all
    languages, such a standard may become very
    complex to use.
  • Another way is to use the predefined and widely
    accepted pronunciation annotation standards for
    different languages.
  • At least, these diverse standards should be an
    important complement to the general standard.

24
Thank you!
25
Korean Romanization
It is used in our Korean Speech Synthesis System.
26
Japanese Romanization
  • Japanese
  • まだ覚えているでしょう 波音に包まれて
  • Japanese Romanization
  • mada oboeteiru deshou nami oto ni tsutsumarete
  • English meaning
  • Do you remember being surrounded by the sound of
    the tide?

27
Discussion of Word
  • What is the definition of Word in Chinese?
  • Prosodic word or grammatical word?
  • 你来还是不来？ nǐ lái háishi bù lái?
  • Is ?? a word?
  • What is the difference between Word and break?
  • The misunderstanding problem can be solved by
    adding a break.
  • Can Word information be handled by Hanyu Pinyin
    Orthography?
  • In Hanyu Pinyin Orthography, a space is used to
    separate words.