Title: Study of Nucleus Vowel Duration and
1Study of Nucleus Vowel Duration and its Role in
Prosody of Bangla
By Rajib Roy, Tulika Basu, Joyanta Basu, Arup
Saha
rajibroy_at_kolkatacdac.in
Centre for Development of Advanced Computing
(C-DAC) Kolkata, India Speech Group
www.kolkatacdac.in
2Introduction
What is Prosody?
Original speech
Flat synthesized speech
- Prosodic parameters
  - Pitch (F0)
  - Duration
  - Amplitude
  - Pause
The variation of these parameters for a given
dialect depends on sentence, clause and phrase
boundaries, the position of the word, and the
position and nature of the syllables. This study
aims to capture only the details of nucleus vowel
duration from a large corpus of spoken sentences,
as we believe this gives better control for
introducing prosody into synthesized speech.
3Objective
- The main objective of the study is to find out
the role of nucleus vowel duration in the
prosody of Standard Colloquial Bengali (Bangla).
- The second objective is to study whether nucleus
vowel duration is related to the phonemic value,
the syllable type, and the position of the
syllable in the word as well as in the clause.
- These studies are done in the context of
introducing naturalness into synthesized speech.
4Types of syllable
- Syllables are of two types: open and closed. A
syllable that ends in a vowel is called an open
syllable (cv/v); one that ends in a consonant is
a closed syllable (cvc/vc). Here v stands for
vowels as well as diphthongs, and c stands for
consonants and semivowels.
/ k?k /
/ ?m /
/ o /
/ k? /
Open Syllable (CV)
Closed Syllable (CVC)
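The open/closed distinction above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `syllable_type` and the use of a c/v pattern string (such as "cvc") as input are assumptions made for the example, not part of the study.

```python
def syllable_type(pattern: str) -> str:
    """Classify a syllable as 'open' or 'closed' from its c/v pattern.

    'v' stands for a vowel or diphthong, 'c' for a consonant or semivowel,
    following the definition on this slide. The pattern-string input is a
    hypothetical convenience for this example.
    """
    pattern = pattern.strip().lower()
    if not pattern or set(pattern) - {"c", "v"}:
        raise ValueError("pattern must contain only 'c' and 'v'")
    if "v" not in pattern:
        raise ValueError("a syllable needs a vowel nucleus")
    # A syllable ending in a vowel is open; one ending in a consonant is closed.
    return "open" if pattern.endswith("v") else "closed"


if __name__ == "__main__":
    for p in ("cv", "v", "cvc", "vc"):
        print(p, "->", syllable_type(p))
```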
5Nucleus vowel
/ k?k /
Nucleus vowel duration
The nucleus vowel is defined as the steady state
of the vowel together with its two transitions,
as shown in the figure. There are seven vowels in
Standard Colloquial Bangla (SCB): /u/, /o/, /ɔ/,
/a/, /æ/, /e/, /i/.
6Experimental Data Set
- Altogether, 650 sentences spoken by seven native
SCB informants of both sexes are used for this
study.
- The data are taken from the Bangla speech
corpora of C-DAC, Kolkata.
- The data consist of 10185 clauses, 30702 words
and 58807 syllables.
Meta data of the informants -
7Experimental Findings and Results
Distribution of Open and Closed Syllables
8Nucleus vowel duration with respect to syllable
position in a word
9Nucleus vowel duration for each open syllable
with respect to word position in a clause
10Nucleus vowel duration for each closed syllable
with respect to word position in a clause
11Intrinsic Vowel Duration
Different vowels differ in duration because of
their articulatory differences, irrespective of
position and context.
12Nucleus vowel duration rules for synthesis
Based on the aforesaid studies a set of rules has
been formed to give durational variation akin to
natural sounding speech to the synthesized output.
Rules
- Take the intrinsic vowel duration and multiply it
by 0.99 if the syllable is closed or by 1.01 if
it is open.
- For the first syllable of a word, multiply by the
respective ratio given in the table, according to
the position of the word in the clause.
- For all other syllables, the multiplication
factor is 1.
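The rules above can be read as a simple product of factors. The following Python sketch shows one possible reading, assuming that the open/closed factor applies to every syllable and that the position ratio applies only to the first syllable of a word. The function name, the argument names, and all numeric values other than 0.99 and 1.01 are hypothetical; the intrinsic durations and the ratio table from the study are not reproduced here.

```python
from typing import Mapping

CLOSED_FACTOR = 0.99  # from the rules: closed syllable
OPEN_FACTOR = 1.01    # from the rules: open syllable


def nucleus_vowel_duration(
    intrinsic_ms: float,
    is_closed: bool,
    syllable_index: int,
    word_position: int,
    position_ratio: Mapping[int, float],
) -> float:
    """Apply the duration rules to one syllable.

    intrinsic_ms    intrinsic duration of the vowel (value supplied by caller)
    is_closed       True for a closed syllable, False for an open one
    syllable_index  0 for the first syllable of the word
    word_position   position of the word in the clause (1-based here, assumed)
    position_ratio  ratio table keyed by word position (hypothetical values)
    """
    duration = intrinsic_ms * (CLOSED_FACTOR if is_closed else OPEN_FACTOR)
    if syllable_index == 0:
        # Only the first syllable of a word is scaled by the position ratio.
        duration *= position_ratio[word_position]
    # For all other syllables the positional factor is 1 (no change).
    return duration


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not the study's data.
    hypothetical_ratios = {1: 1.20, 2: 1.10, 3: 1.00}
    print(nucleus_vowel_duration(100.0, True, 0, 1, hypothetical_ratios))
    print(nucleus_vowel_duration(100.0, False, 1, 1, hypothetical_ratios))
```

Passing the ratio table as an argument keeps the sketch independent of the actual table values, which would have to come from the study.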
Ratio of nucleus vowel to intrinsic vowel duration
13Result of Listening Test
- Fifteen sentence stimuli were created by
replacing the nucleus vowel durations of the
original sentences with durations derived from
the duration rules. The original and modified
sentences were randomly mixed and presented to
the subjects for judgment on a five-point scale.
- Five subjects, 3 male and 2 female, aged 24 to
50, were selected for this evaluation.
14Result of Listening Test
The overall average score is 3.56 for the
original sentences and 2.94 for the modified
sentences. The average difference of 0.62, less
than one grade, is encouraging.
15Conclusions
- The duration of the nucleus vowel of the first
syllable decreases across consecutive words in
the clause. For other syllables no such trend is
observable.
- The nucleus vowel of the first syllable is always
longer, irrespective of the syllable type.
- Open syllables occur more than twice as often as
closed syllables in Bangla.
- The average vowel duration of closed syllables is
greater than that of open syllables.
- High vowels are observed to be shorter than low
vowels, which may be due to physiological
reasons.
- Finally, using the duration rule, the TTS output
becomes:
Flat Synthesized
Duration Modified Synthesized
17Thank you