Title: First page
1Country Report Vietnam
Luong Chi Mai
Institute of Information Technology
Vietnamese Academy of Science and Technology
lcmai_at_ioit.ac.vn
2VLSP National Project (ICT Program)
- National Project 2006-2008, 2008-2010, with participation of ten research groups (all active groups on VLSP)
- Objectives
  - Basic research on methods for processing Vietnamese language and speech
  - Build and develop several typical products for VLSP for public end-users
  - Build and develop indispensable resources and tools for VLSP development
3Objective of the Project
Basic research
- Basic research on methods for processing Vietnamese language and speech.
- Applied research to adapt methods, technologies, and advanced techniques developed for other languages to Vietnamese language and speech.
Typical products for the end-users
Resources and tools for VLSP
Computation methods for VLSP
4Phonetic Structure
[Figure: pitch (F0) contours over time, labeled (1) to (8)]
- (7) and (8) have F0 contours similar to (5) and (6), but rise and fall more sharply
- (8) is not accompanied by glottalization
5Some Current Text Corpora
- Monolingual corpora: VLC (Vietnam Lexicography Centre), UNS-VNUHCM, etc. for Vietnamese
- Bilingual corpora: the EVC corpus (UNS-VNUHCM) consists of 400,000 pairs of English-Vietnamese sentences (approx. 5,500,000 words) in the fields of science and technology (computers, electronics, ...). The EVC is being partially annotated, semi-automatically, with morphology (word boundaries, lemmas), POS, and sense tags.
6Some Current Speech Corpora
- Broadcasting speech corpus: VOV (Voice of Vietnam)
  - Contains approx. 23,000 utterances and approx. 4,000 distinct syllables
  - 30 broadcasters and speakers reciting stories, news reports, and colloquies
  - Data digitized at a 16,000 Hz sampling rate, using 16 bits per sample
  - All data were manually transcribed at syllable level
  - At the phonetic level, the corpus contains all Vietnamese phonemes but is not phonetically balanced (up to 50% of its volume comes from story-reading programs)
  - The number of speakers is limited, and most speakers are from the North, so the corpus does not cover most variations of Vietnamese speech
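The recording format stated above (16,000 Hz sampling rate, 16 bits per sample) can be checked automatically for each file. The following is a minimal sketch using Python's standard `wave` module. The function names `check_wav_format` and `make_silent_wav` are hypothetical and chosen for illustration; the slides do not say how the corpus files were actually validated, nor that they are stored as WAV.

```python
import io
import wave

EXPECTED_RATE_HZ = 16000   # sampling rate stated for the VOV corpus
EXPECTED_SAMPLE_WIDTH = 2  # 16 bits per sample = 2 bytes


def check_wav_format(source, rate=EXPECTED_RATE_HZ, width=EXPECTED_SAMPLE_WIDTH):
    """Return True if the WAV data has the expected rate and sample width.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate() == rate and wav.getsampwidth() == width


def make_silent_wav(rate, width, n_frames=160):
    """Build a small in-memory mono WAV of zero bytes (demo helper only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (n_frames * width))
    buffer.seek(0)
    return buffer


# A 16 kHz / 16-bit file passes; an 11,025 Hz file does not.
ok = check_wav_format(make_silent_wav(16000, 2))
bad = check_wav_format(make_silent_wav(11025, 2))
```

In practice the same check would be run over every file path in the corpus, collecting the names of files that fail.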
7Some Current Speech Corpora
- Telephone speech corpus
  - Mobile phone: 170 speakers from the North (males 55, females 45), 1600 digit strings
  - Cordless phone: 208 speakers from the South (130 males, 78 females), 442 utterances with 2340 words
  - Labeling at syllable level; phonetic-level labels produced by forced alignment with manual adjustment (using HTK and the CSLU toolkit)
  - Developed a dialog system for continuous digit recognition and VnTTS for reading SMS on smartphones (Symbian)
8Some Current Speech Corpora
- Vietnamese TTS speech corpus
  - One female voice reading a short story
  - 567 utterances with an average length of 15 syllables, about 40,000 syllables
  - 11 kHz sampling rate and 16-bit resolution
  - Corpus is labeled at syllable level with segment boundaries
  - Prosody detection (Fujisaki model); CART for manipulation of duration
  - Developed the Vietnamese TTS system VnVoice, based on PSOLA
9Some Current Speech Corpora
- VNSpeech corpus
  - 5 different kinds of units: phonemes, tones, digits and strings of digits, application words, sentences and paragraphs
  - Text collected with a web robot (about 2500 websites in Vietnam), with about 10,020,000 sentences
  - 50 speakers
  - The sentence corpus is divided into two parts, a common part and a private part
  - The common part: 33 conversations and 37 paragraphs, read by all speakers
  - The private part: about 2,000 short paragraphs; each speaker was asked to read 40 paragraphs
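The numbers for the private part are consistent with a disjoint split: 50 speakers times 40 paragraphs gives 2,000 readings, the approximate size of the private pool. The sketch below shows one way such an assignment could be produced. It assumes that each private paragraph is read by exactly one speaker, which the slides do not state explicitly, and the function name `assign_private_paragraphs` is hypothetical.

```python
import random


def assign_private_paragraphs(paragraph_ids, n_speakers, per_speaker, seed=0):
    """Split paragraph ids into disjoint sets, one set per speaker.

    Assumption: no paragraph is shared between speakers.
    """
    needed = n_speakers * per_speaker
    shuffled = list(paragraph_ids)
    if len(shuffled) < needed:
        raise ValueError("not enough paragraphs for a disjoint assignment")
    # A seeded generator keeps the assignment reproducible.
    random.Random(seed).shuffle(shuffled)
    return {
        speaker: shuffled[speaker * per_speaker:(speaker + 1) * per_speaker]
        for speaker in range(n_speakers)
    }


# 50 speakers, 40 private paragraphs each, drawn from a pool of 2,000.
assignment = assign_private_paragraphs(range(2000), n_speakers=50, per_speaker=40)
```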
10Some Current Speech Corpora
- Distribution of mono-phones in speech corpus and
Web corpora
11Some Current Speech Corpora
- Distribution of six tones in speech corpus and
Web corpora
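A tone distribution like the one compared here can be estimated from text, because Vietnamese orthography writes five of the six tones as diacritics and leaves the level tone unmarked. The following is a rough sketch based on Unicode decomposition. It assumes whitespace-separated syllables with punctuation already removed, and the ASCII tone labels and function names are chosen for illustration; the slides do not describe how the original distributions were computed.

```python
import unicodedata
from collections import Counter

# Combining marks that encode five of the six tones in Vietnamese spelling.
# A syllable carrying none of them has the level tone (ngang).
TONE_MARKS = {
    "\u0300": "huyen",  # combining grave accent
    "\u0301": "sac",    # combining acute accent
    "\u0309": "hoi",    # combining hook above
    "\u0303": "nga",    # combining tilde
    "\u0323": "nang",   # combining dot below
}


def tone_of(syllable):
    """Return the tone label of one written syllable."""
    # NFD splits precomposed letters into base letter + combining marks,
    # so the tone mark can be found regardless of the vowel it sits on.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return "ngang"


def tone_distribution(text):
    """Return the relative frequency of each tone in whitespace-split text."""
    counts = Counter(tone_of(s) for s in text.split())
    total = sum(counts.values())
    return {tone: n / total for tone, n in counts.items()}
```

Vowel-quality marks such as the circumflex, breve, and horn are deliberately absent from `TONE_MARKS`, so they do not affect the result.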
12Design New Corpora
- Main goal: design and build corpora that are available
  - to provide Vietnamese researchers with a basic amount of speech material for general speech research, including speech synthesis and speech recognition
  - for developing commercial speech recognition engines for specific purposes (number recognizer, limited command recognizer, name recognizer, ...)
13Design of General Corpus
- General corpora
  - For general-purpose research on continuous, speaker-independent recognition with a large vocabulary
  - Text selected from the text corpus, and text selected by linguists
  - Number of speakers: 200-300 (50% male and 50% female), ages 15-45
  - Number of sentences: 300; each sentence is spoken by one speaker at least 3 times
  - Size of vocabulary: 3000-4000 Vietnamese syllables
  - Requirement: context balance among Vietnamese phonemes
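Selecting a few hundred sentences from a large text corpus so that the phonetic material is well covered is often approached with a greedy heuristic. The sketch below is one such heuristic (greedy set cover), not the procedure used in the project, which the slides do not describe. It maximizes coverage of distinct units rather than strict balance, and by default it treats whitespace-separated syllables as units; a real design would pass a `units_of` function that returns phonemes in context. All names are hypothetical.

```python
def greedy_select(sentences, n_select, units_of=lambda s: set(s.split())):
    """Pick up to n_select sentences by greedy coverage.

    At each step, take the sentence that adds the most units not yet
    covered; stop early when no remaining sentence adds anything new.
    """
    remaining = list(sentences)
    covered = set()
    selected = []
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda s: len(units_of(s) - covered))
        if not units_of(best) - covered:
            break  # nothing new left to cover
        selected.append(best)
        covered |= units_of(best)
        remaining.remove(best)
    return selected, covered


# Toy example with letters standing in for syllables.
selected, covered = greedy_select(["a b", "a b c d", "c d", "e"], n_select=3)
```

In the toy example, the third slot stays empty because the two remaining sentences contribute no new units.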
14Design of Specific Corpus
- Continuous digit corpus
  - Used to build recognition applications
  - Number of speakers: 100-200 (50% male and 50% female), ages 15-45
  - Occurrence of digits should be approximately the same for all digits; the sentences consist of digits in random order and have varying lengths
  - The number of words will be at least 10,000; each sentence is recorded at least 3 times for each speaker
- Name corpus
  - Popular Vietnamese names, including family name and first name
  - Number of speakers: 100-200 (50% male and 50% female), ages 15-45
  - Each sentence consists of one full name and is spoken by a speaker at least 3 times
  - The sentences should contain as many names of people in Vietnam as possible. The size of the vocabulary is estimated at about 2000 words
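The digit-corpus requirements (equal occurrence of all digits, random order, varying sentence lengths) can be met by construction. The following sketch builds a pool with the same number of copies of each digit, shuffles it, and cuts it into strings of random length. The function name, the length bounds, and the pool size are assumptions for illustration, not values from the slides.

```python
import random
from collections import Counter


def balanced_digit_strings(repeats_per_digit, min_len, max_len, seed=0):
    """Generate digit strings in which every digit occurs equally often."""
    rng = random.Random(seed)
    # Equal counts per digit guarantee balance over the whole prompt set.
    pool = [d for d in "0123456789" for _ in range(repeats_per_digit)]
    rng.shuffle(pool)
    strings = []
    i = 0
    while i < len(pool):
        length = rng.randint(min_len, max_len)
        # The final string may be shorter than min_len if the pool runs out.
        strings.append("".join(pool[i:i + length]))
        i += length
    return strings


# 100 copies of each digit: 1,000 digit tokens in total.
prompts = balanced_digit_strings(repeats_per_digit=100, min_len=3, max_len=10)
counts = Counter("".join(prompts))
```

Scaling `repeats_per_digit` to 1,000 would give the 10,000 words mentioned above, under the assumption that each digit counts as one word.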
15Corpus Organization
- The database consists of several sub-databases
  - 1. /general: material for training and testing the general recognizer
  - 2. /Digit: material for training and testing the continuous digit recognizer
  - 3. /Name: material for training and testing the proper name recognizer
  - .... Other sub-databases to be inserted depending on the specific purposes
- Each sub-database has the following directory hierarchy
  - 1. /: database root directory
  - 2. /train: to be used for system training
  - 3. /test: material to be used for system testing
  - 4. /doc: online documentation and tables unusual)
- The train and test directories contain sub-directories corresponding to each speaker, whose names are coded as XXXSRR, where
  - XXX: speaker identifier
  - S: sex code (F for female, M for male)
  - RR: region code (HN for Hanoi, SG for Saigon, HU for Hue)
- Each sentence directory contains 3 sentence-related files
  - wav: wave file
  - phn: phonetically-based transcription
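The XXXSRR naming scheme for speaker directories lends itself to a small parser. The sketch below is one possible implementation. It assumes that the speaker identifier XXX is exactly three alphanumeric characters, which the slide implies but does not state, and the names `SpeakerDir` and `parse_speaker_dir` are hypothetical.

```python
import re
from typing import NamedTuple


class SpeakerDir(NamedTuple):
    speaker_id: str  # XXX
    sex: str         # S: "F" for female, "M" for male
    region: str      # RR: "HN" (Hanoi), "SG" (Saigon), "HU" (Hue)


# Assumption: XXX is exactly three alphanumeric characters.
_PATTERN = re.compile(r"([A-Za-z0-9]{3})([FM])(HN|SG|HU)")


def parse_speaker_dir(name):
    """Parse a speaker directory name coded as XXXSRR."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid speaker directory name: {name!r}")
    return SpeakerDir(*match.groups())


# Example: speaker 001, female, recorded in Hanoi.
info = parse_speaker_dir("001FHN")
```

Rejecting malformed names early makes it easier to catch misplaced directories before training or testing starts.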
16Thank you!