Speaker Recognition Research in Joensuu


1
Speaker Recognition Research in Joensuu
Speech Technology Winter Seminar (Puheteknologian talviseminaari)
Pasi Fränti
Joensuu 10.3.2006

Speech and Image Processing Unit (SIPU)
http://cs.joensuu.fi/sipu/

2
Goals for PUMS season 3 (1/2)

Usability of automatic speaker identification in forensic applications
Compatibility with large databases
Automation of LTAS fusion with MFCC
Voice activity detection

3
Goals for PUMS season 3 (2/2)

Speaker verification in a real (noisy) environment
Prototype for access control
Solving the technical requirements for a prototype in an elevator
Usability for detecting sound sources in general
Keyword search (using HTK or Lingsoft Recognizer)

4
PUMS personnel
Pasi Fränti, Professor
Ilja Sidoroff
Marko Tuononen, BSc
Rosa Gonzalez-Hautamäki, MSc
Doctoral researchers
Collaborators
Juhani Saastamoinen, PhLic
Ismo Kärkkäinen, MSc
Ville Hautamäki, MSc
Tomi Kinnunen, PhD (Singapore)
Victoria Yanulevskaya
Evgeny Karpov, MSc (NRC)
5
1. Applicability to forensic applications

A study of automatic speaker recognition has been done.
Results are not reported separately, but actions were taken within tasks 3 and 4.
Material can be found in Kinnunen's PhD thesis [4] and Niemi-Laitinen's presentation.

6
2. Support for large databases

- Not yet done -

7
3. LTAS and other features

Automatic calculation of LTAS is done. Integration into WinSprofiler is in progress. Reporting is in progress.
The benefit of LTAS is merely its speed and ease of use: there are no difficult control parameters.
No additional benefit to recognition accuracy: MFCC includes the same information.
Could be used for preliminary pruning in the case of large datasets.
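As an illustration of why LTAS is simple to compute, here is a minimal sketch of a long-term average spectrum: the magnitude spectra of all frames are averaged into a single profile. This is not the WinSprofiler implementation; the frame length, hop size, Hann window, and the function names `magnitude_spectrum` and `ltas` are assumptions chosen for the example, and a naive DFT is used only to keep the code dependency-free.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short example frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def ltas(signal, frame_len=64, hop=32):
    """Average the magnitude spectra of all Hann-windowed frames (assumed settings)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / (frame_len - 1)) for t in range(frame_len)]
    frames = [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    spectra = [magnitude_spectrum([w * x for w, x in zip(window, f)]) for f in frames]
    # Column-wise mean over frames gives one value per frequency bin.
    return [sum(col) / len(spectra) for col in zip(*spectra)]


# Toy input: a sinusoid completing 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(640)]
profile = ltas(tone)
peak_bin = max(range(len(profile)), key=profile.__getitem__)
```

The result is one short vector per recording, which is what makes a cheap first-pass comparison over a large speaker set plausible before a more expensive MFCC-based match.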

8
Noise robustness of F0 feature
Results reported in [3, 5]
9
4. Voice activity detection

Software for speech segmentation (VoiceGrep).
Command line version for Linux.
Windows version in WinSprofiler.
Testing done in SIPU laboratory.
Labtec PC Mic 333 microphone, 44.1 kHz sampling rate
Recordings were amplified by 24 dB with the Audacity audio editor
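To make the segmentation task concrete, the following is a minimal sketch of an energy-based voice activity detector. It is not how VoiceGrep works (the slides do not describe its algorithm); the frame length, the 15 dB margin above the quietest frame, and the function names are all assumptions for illustration.

```python
import math


def frame_energies_db(samples, frame_len):
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10 * math.log10(power + 1e-12))  # small offset avoids log(0)
    return energies


def detect_segments(samples, frame_len=160, margin_db=15.0):
    """Return (start_frame, end_frame) pairs, end exclusive, for runs of frames
    whose energy exceeds the quietest frame by more than margin_db."""
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return []
    threshold = min(energies) + margin_db
    segments, start = [], None
    for i, energy in enumerate(energies):
        active = energy > threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments


# Toy signal: quiet background, a louder burst, then quiet background again.
quiet = [0.001 * math.sin(0.3 * t) for t in range(1600)]
burst = [0.5 * math.sin(0.3 * t) for t in range(800)]
signal = quiet + burst + quiet
segments = detect_segments(signal)
```

A detector that relies on energy alone reacts to any loud sound, so it would need additional spectral cues to tell speech from other sound sources.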

10
4a. Test material and results

Material
4 hours in total.
Poor-quality recordings: 11-bit data, of which 4-5 bits are information and the rest noise.
VoiceGrep made 168 detections
56 speech (33%)
112 non-speech (67%)
Material included 71 real speech segments
Average segment length 16 s.
VoiceGrep found 25 of these (35%)
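The percentages above follow directly from the counts on this slide. The short calculation below reproduces them; the variable names are chosen for this example only.

```python
# Counts taken from the slide.
detections_total = 168
detections_speech = 56
detections_non_speech = 112
speech_segments_total = 71
speech_segments_found = 25

# Share of detections that were speech, share that were not,
# and share of the real speech segments that were found.
speech_share = detections_speech / detections_total
non_speech_share = detections_non_speech / detections_total
segments_found_share = speech_segments_found / speech_segments_total

print(f"speech detections:     {speech_share:.0%}")
print(f"non-speech detections: {non_speech_share:.0%}")
print(f"speech segments found: {segments_found_share:.0%}")
```

Note that the first two figures are counted per detection while the last is counted per speech segment, so several detections may fall within the same segment.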

11
4b. VoiceGrep overall results
12
4c. VoiceGrep example (correct detection)
End of the speech is missed
Start of the speech is detected correctly
Play sample 1
13
4d. VoiceGrep example (false detections)
Door opening
Running water
Walking
Door
Play sample 2
Play sample 3
14
4e. VoiceGrep example (missed speech segment)
Door
Door
Speech and walking
Play sample 4
15
4f. Entire data set (4 hours)
Data
Speech segments
Result of VoiceGrep
16
5. Speaker verification in noisy environment

Systematic testing of the parameters that affect accuracy has been reported in [1].
Applicability of speaker verification in a real environment has been reported in [2] and in Kinnunen's PhD thesis [5].
Additional testing will be done if time permits.

17
5a. Text-dependent verification in access control

Utilizing time-series information improves recognition.
The best result is obtained if everyone has their own password.
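One common way to use time-series information in text-dependent matching is dynamic time warping (DTW), which compares two feature sequences while respecting their temporal order. The slides do not say which method was used, so the sketch below is only an assumed illustration with one-dimensional toy features; the function name and the normalization by sequence length are choices made for this example.

```python
import math


def dtw_distance(a, b):
    """Length-normalized DTW distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


# Toy feature sequences (hypothetical values).
enrolled = [(0.0,), (1.0,), (2.0,), (1.0,), (0.0,)]
same_but_slower = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,), (0.0,)]
same_frames_other_order = [(2.0,), (0.0,), (1.0,), (0.0,), (1.0,)]

d_same = dtw_distance(enrolled, same_but_slower)
d_other = dtw_distance(enrolled, same_frames_other_order)
```

Here `d_same` is zero because the slower utterance can be warped onto the enrolled one, while `d_other` is positive even though it contains exactly the same frames. A model that ignores frame order could not separate the two, which is one reason a personal password can help.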

18
6. Prototype for access control
Emergency button
Microphone
Motion detector
19
7. Calling elevator (technical requirements)

Communication with OPC-server
Implemented with Matrikon server.
Program logic for the elevator implemented
Reads variables from OPC-server.
Interprets and shows elevator status.
Includes recording logic.
Speaker- and voice-related functionality
Not yet implemented.
Main window does not show anything yet.

20
8. Usability for detecting sound sources in general

- Not yet done -

21
9. Keyword search

- Not yet done -

22
Publications (season 3)

[1] J. Saastamoinen, Z. Fiedler, T. Kinnunen and P. Fränti, "On factors affecting MFCC-based speaker recognition accuracy", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 503-506, October 2005.
[2] H. Gupta, V. Hautamäki, T. Kinnunen and P. Fränti, "Field evaluation of text-dependent speaker recognition in an access control application", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 551-554, October 2005.
[3] T. Kinnunen and R. Gonzalez-Hautamäki, "Long-Term F0 Modeling for Text-Independent Speaker Recognition", Int. Conf. on Speech and Computer (SPECOM'05), Patras, Greece, 567-570, October 2005.

23
Theses (season 3)

[4] T. Kinnunen, "Optimizing Spectral Feature Based Text Independent Speaker Recognition", PhD thesis, University of Joensuu, June 2005.
[5] R. Gonzalez-Hautamäki, "Fundamental Frequency Estimation and Modeling for Speaker Recognition", MSc thesis, University of Joensuu, July 2005.

24
Application scenarios
Speaker Recognition
Speaker Verification
Speaker Identification
Whose voice is this?
Is this Bob's voice?
?

(Claim)
Identification
Verification
Imposter!
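The difference between the two scenarios can be sketched in a few lines: identification picks the closest enrolled speaker, while verification tests a single identity claim against a threshold. The two-dimensional "models", the Euclidean distance, the threshold value, and the function names are all hypothetical simplifications, not the matching used in the actual software.

```python
import math


def identify(models, sample):
    """Closed-set identification: return the enrolled speaker closest to the sample."""
    return min(models, key=lambda name: math.dist(models[name], sample))


def verify(models, claimed, sample, threshold):
    """Verification: accept the claim only if the sample is close enough
    to the claimed speaker's model."""
    return math.dist(models[claimed], sample) <= threshold


# Hypothetical enrolled speaker models and an unknown voice sample.
models = {"Bob": (1.0, 2.0), "Minna": (4.0, 0.5)}
sample = (3.8, 0.7)

who = identify(models, sample)                               # whose voice is this?
bob_accepted = verify(models, "Bob", sample, threshold=1.0)  # is this Bob's voice?
minna_accepted = verify(models, "Minna", sample, threshold=1.0)
```

With these toy values the sample is identified as Minna, and a claim to be Bob is rejected, which corresponds to the impostor case in the diagram.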
25
Software 1: Console program
26
Software 2: WinSprofiler
27
Software 3: Symbian
Port to Symbian OS with Series 60 UI platform
28
Software 4: Door SProfiler
Opening laboratory door by speaking
29
Software 5: Lift SProfiler (to appear in season 4, perhaps)
30
Future development (1)
Software integration
Keyword search
WinSprofiler: Windows (JoY)
Mobile: Series 60 (JoY)
DB support
SRLIB
VAD
MSE
F0 extraction
Fusion by weighted MSE
GMM
VQ
MFCC
LTAS
31
Future development (2)
Applications
Call center
Forensic applications
Calling elevator
Speech analyzer tool
Access control
common speaker recognition app. interface
Verification
Classifier fusion
Segmentation
Keyword search
srlib
VAD
DB
32
Future development (3)
Technical development

Implement and integrate F0, possibly also formants (F1, F2).
Automatic voiced/unvoiced segmentation.
User enrollment.
Use of sequence information (triplets).
Development of the WinSprofiler software toward a voice profiler and speech analyzer tool.

33
Future development (4)
Machine room
Lift car hardware
CAN
GW box
Ethernet
TCP/IP
Display
Microphone
Our PC
Approach detection
OPC server
SRLIB 3.0
DCOM
Elevator prototype
OPC client
LiftCaller
34
Vision 1: Teleconferencing
Speaker Recognition
Unknown
Minna
Bob
35
Vision 2: Call center

Speech is the main tool for people working in a call center.
Voice login for personnel.
Removes the need for manual entry.

36
Vision 3: Language recognition

A problem related to speaker recognition: the same research groups usually study both.
Not trivial to solve.
Studied a lot for Asian languages, even for rare languages that do not have any written form.

37
Vision 4: Medical applications