Title: Speaker Recognition Research in Joensuu


1
Speaker Recognition Research in Joensuu
Speech Technology Winter Seminar (Puheteknologian talviseminaari)
Pasi Fränti
Joensuu 10.3.2006
  • Speech and Image Processing Unit (SIPU)
    http://cs.joensuu.fi/sipu/

2
Goals for PUMS season 3 (1/2)
  1. Usability of automatic speaker identification in
    forensic applications
  2. Compatibility with large databases
  3. Automation of LTAS fusion with MFCC.
  4. Voice activity detection

3
Goals for PUMS season 3 (2/2)
  1. Speaker verification in real (noisy) environment
  2. Prototype for access control
  3. Solving the technical requirements for the
    elevator prototype.
  4. Usability for detecting sound sources in general
  5. Keyword search (using HTK or Lingsoft
    Recognizer)

4
PUMS personnel
Pasi Fränti, Professor
Ilja Sidoroff
Marko Tuononen, BSc
Rosa Gonzalez-Hautamäki, MSc
Doctoral researchers
Collaborators
Juhani Saastamoinen, PhLic
Ismo Kärkkäinen, MSc
Ville Hautamäki, MSc
Tomi Kinnunen, PhD (Singapore)
Victoria Yanulevskaya
Evgeny Karpov, MSc (NRC)
5
1. Applicability to forensic applications
  • An automatic speaker recognition study has been
    done.
  • Results are not reported, but actions were taken
    within tasks 3 and 4.
  • Material can be found in Kinnunen's PhD thesis
    [4] and Niemi-Laitinen's presentation.

6
2. Support for large databases
  • - Not yet done -

7
3. LTAS and other features
  • Automatic calculation of LTAS done. Integration
    to WinSprofiler in progress. Reporting in
    progress.
  • The benefit of LTAS is merely its speed and ease of
    use: there are no difficult control parameters.
  • No additional benefit to recognition accuracy.
    MFCC includes the same information.
  • Could be used for preliminary pruning in case of
    large datasets.
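
As a rough illustration of the pruning idea, the sketch below computes a long-term average spectrum (the mean of short-time magnitude spectra over a whole recording) and keeps only the enrolled speakers whose LTAS is closest to the query. The frame length, the naive DFT, the Euclidean distance, the synthetic tones and all function names are assumptions made for this example; the slides do not describe how SRLIB or WinSprofiler actually implement LTAS.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitude of one frame (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


def ltas(signal, frame_len=64):
    """Long-term average spectrum: mean magnitude spectrum over all frames."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("signal shorter than one frame")
    spectra = [magnitude_spectrum(f) for f in frames]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def prune_candidates(query_ltas, speaker_ltas, keep=2):
    """Keep the `keep` speakers whose LTAS is closest to the query."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    ranked = sorted(speaker_ltas, key=lambda name: dist(query_ltas, speaker_ltas[name]))
    return ranked[:keep]


def tone(freq_bin, n=256, frame_len=64):
    """Synthetic stand-in for a recording: a sinusoid centred on one DFT bin."""
    return [math.sin(2 * math.pi * freq_bin * t / frame_len) for t in range(n)]


# Hypothetical enrolled "speakers", each represented only by its LTAS vector.
speakers = {"low": ltas(tone(4)), "mid": ltas(tone(10)), "high": ltas(tone(20))}
shortlist = prune_candidates(ltas(tone(10)), speakers, keep=1)
print(shortlist)
```

Only the shortlisted speakers would then be passed to the slower MFCC-based matcher, which is where the speed of LTAS could pay off on a large database.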

8
Noise robustness of F0 feature
Results reported in [3] and [5].
9
4. Voice activity detection
  • Software for speech segmentation (VoiceGrep).
  • Command line version for Linux.
  • Windows version in WinSprofiler.
  • Testing done in SIPU laboratory.
  • Labtec PC Mic 333, 44.1 kHz
  • Recordings were amplified by 24 dB with the Audacity
    audio editor
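
The slides do not say how VoiceGrep decides what counts as speech. Purely as an illustration of what a voice activity detector does, the sketch below marks stretches of a recording whose short-time energy stays above a threshold. The frame length, threshold, minimum duration and function names are assumptions chosen for this toy example and are not taken from VoiceGrep.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def detect_segments(samples, frame_len=100, threshold=0.01, min_frames=2):
    """Return (start_sample, end_sample) pairs where energy stays above threshold."""
    segments = []
    start = None
    energies = frame_energies(samples, frame_len)
    # The trailing 0.0 is a sentinel that closes a run reaching the end of the data.
    for idx, energy in enumerate(energies + [0.0]):
        if energy > threshold and start is None:
            start = idx
        elif energy <= threshold and start is not None:
            if idx - start >= min_frames:
                segments.append((start * frame_len, idx * frame_len))
            start = None
    return segments


# Synthetic recording: near-silence, a loud burst, near-silence.
quiet = [0.001 * math.sin(0.3 * t) for t in range(500)]
burst = [0.5 * math.sin(0.3 * t) for t in range(400)]
recording = quiet + burst + quiet
segments = detect_segments(recording)
print(segments)
```

A detector based on energy alone reacts to any loud sound, speech or not, so a practical tool needs additional cues to separate speech from other noise sources.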

10
4a. Test material and results
  • Material
  • 4 hours in total.
  • Bad-quality recordings: 11-bit data, of which
    4-5 bits are information and the rest noise.
  • VoiceGrep made 168 detections
  • 56 speech (33%)
  • 112 non-speech (67%)
  • Material included 71 real speech segments
  • Average segment length 16 s.
  • VoiceGrep found 25 of these (35%)
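
The percentages above follow directly from the counts on this slide. The short calculation below reproduces them; the labels "precision" and "segment recall" are this example's terminology, not the slides'.

```python
# Counts taken from the slide.
detections = 168
speech_detections = 56
non_speech_detections = 112
true_segments = 71
found_segments = 25

# Sanity check: the two detection classes add up to the total.
assert speech_detections + non_speech_detections == detections

precision = speech_detections / detections            # detections that were speech
false_alarm_share = non_speech_detections / detections  # detections that were not speech
segment_recall = found_segments / true_segments       # real speech segments that were found

print(f"speech detections:     {precision:.1%}")
print(f"non-speech detections: {false_alarm_share:.1%}")
print(f"segments found:        {segment_recall:.1%}")
```

Rounded to whole percentages this gives 33%, 67% and 35%.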

11
4b. VoiceGrep overall results
12
4c. VoiceGrep example (correct detection)
End of the speech is missed
Start of the speech is detected correctly
Play sample 1
13
4d. VoiceGrep example (false detections)
Door opening
Running water
Walking
Door
Play sample 2
Play sample 3
14
4e. VoiceGrep example (missed speech segment)
Door
Door
Speech and walking
Play sample 4
15
4f. Entire data set (4 hours)
Data
Speech segments
Result of VoiceGrep
16
5. Speaker verification in noisy environment
  • Systematic testing of the parameters that affect
    accuracy has been reported in [1].
  • Applicability of speaker verification in a real
    environment has been reported in [2] and in
    Kinnunen's PhD thesis [4].
  • Additional testing will be done if time permits.

17
5a. Text-dependent verification in access control
  • Utilizing time series information improves
    recognition.
  • Best result if everyone has their own password.
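
The slides do not state how the time-series information was used. One common way to compare a spoken password against an enrolled template while respecting the order of the frames is dynamic time warping (DTW); the sketch below uses it purely as an assumed stand-in. The one-dimensional toy "features", the threshold and the function names are hypothetical.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            # Euclidean distance between the two frames being aligned.
            local = sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1])) ** 0.5
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]


def verify_password(utterance, enrolled_template, threshold):
    """Accept the utterance if it warps onto the template cheaply enough."""
    return dtw_distance(utterance, enrolled_template) <= threshold


# Toy one-dimensional feature sequences.
template = [[1.0], [2.0], [3.0], [2.0], [1.0]]
same_but_slower = [[1.0], [1.0], [2.0], [3.0], [3.0], [2.0], [1.0]]
same_frames_wrong_order = [[3.0], [1.0], [2.0], [1.0], [2.0]]

print(verify_password(same_but_slower, template, threshold=1.0))
print(verify_password(same_frames_wrong_order, template, threshold=1.0))
```

The third sequence contains exactly the same frames as the template, so a model that ignores frame order could not tell them apart; the order-aware comparison can.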

18
6. Prototype for access control
Emergency button
Microphone
Motion detector
19
7. Calling elevator (technical requirements)
  • Communication with OPC server
  • Implemented with Matrikon server.
  • Program logic for the elevator implemented
  • Reads variables from the OPC server.
  • Interprets and shows elevator status.
  • Includes recording logic.
  • Speaker- and voice-related functionality
  • Not yet implemented.
  • Main window does not show anything yet.

20
8. Usability for detecting sound sources in
general
  • - Not yet done -

21
9. Keyword search
  • - Not yet done -

22
Publications (season 3)
  1. J. Saastamoinen, Z. Fiedler, T. Kinnunen and
    P. Fränti, "On factors affecting MFCC-based
    speaker recognition accuracy", Int. Conf. on
    Speech and Computer (SPECOM'05), Patras, Greece,
    503-506, October 2005.
  2. H. Gupta, V. Hautamäki, T. Kinnunen and
    P. Fränti, "Field evaluation of text-dependent
    speaker recognition in an access control
    application", Int. Conf. on Speech and Computer
    (SPECOM'05), Patras, Greece, 551-554, October
    2005.
  3. T. Kinnunen and R. Gonzalez-Hautamäki, "Long-Term
    F0 Modeling for Text-Independent Speaker
    Recognition", Int. Conf. on Speech and Computer
    (SPECOM'05), Patras, Greece, 567-570, October
    2005.

23
Theses (season 3)
  4. T. Kinnunen, "Optimizing Spectral Feature Based
    Text Independent Speaker Recognition", PhD
    thesis, University of Joensuu, June 2005.
  5. R. Gonzalez-Hautamäki, "Fundamental Frequency
    Estimation and Modeling for Speaker Recognition",
    MSc thesis, University of Joensuu, July 2005.

24
Application scenarios
Speaker recognition covers two tasks:
  • Speaker identification: "Whose voice is this?"
  • Speaker verification: "Is this Bob's voice?" The
    speaker makes an identity claim, which is either
    verified or rejected ("Imposter!").
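
The difference between the two scenarios can be sketched in a few lines: identification picks the best match among all enrolled speakers, whereas verification compares the sample only with the claimed speaker and applies a threshold. The two-dimensional "voice models", the Euclidean distance and the threshold value are assumptions made for this example only.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(sample, models):
    """Identification: whose voice is this? Return the closest enrolled speaker."""
    return min(models, key=lambda name: distance(sample, models[name]))


def verify(sample, claimed, models, threshold):
    """Verification: is this the claimed speaker? Accept only if close enough."""
    return distance(sample, models[claimed]) <= threshold


# Hypothetical enrolled speaker models.
models = {"Bob": [1.0, 0.0], "Minna": [0.0, 1.0]}
sample = [0.9, 0.1]

print(identify(sample, models))                        # closest enrolled speaker
print(verify(sample, "Bob", models, threshold=0.5))    # claim matches the sample
print(verify(sample, "Minna", models, threshold=0.5))  # claim does not match
```

Identification always returns some enrolled name, so an unknown speaker can only be flagged by adding a rejection threshold as verification does.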
25
Software 1: Console program
26
Software 2: WinSprofiler
27
Software 3: Symbian
Port to Symbian OS with Series 60 UI platform
28
Software 4: Door SProfiler
Opening laboratory door by speaking
29
Software 5: Lift SProfiler (to appear in season 4,
perhaps)
30
Future development (1)
Software integration
Keyword search
WinSprofiler
Windows (JoY)
Mobile
Series 60 (JoY)
DB support
SRLIB
VAD
MSE
F0 extraction
Fusion by weighted MSE
GMM
VQ
MFCC
LTAS
31
Future development (2)
Applications
Call center
Forensic applications
Calling elevator
Speech analyzer tool
Access control
common speaker recognition app. interface
Verification
Classifier fusion
Segmentation
Keyword search
srlib
VAD
DB
32
Future development (3)
Technical development
  • Implement and integrate F0, maybe also other
    formants (F1, F2).
  • Automatic voiced/unvoiced segmentation.
  • User enrollment.
  • Use of sequence information (triplets).
  • Development of WinSprofiler software to the
    direction of voice profiler and speech analyzer
    tool!

33
Future development (4)
Machine room
Lift car hardware
CAN
GW box
Ethernet
TCP/IP
Display
Microphone
Our PC
Approach detection
OPC server
SRLIB 3.0
DCOM
Elevator prototype
OPC client
LiftCaller
34
Vision 1: Teleconferencing
Speaker Recognition
Unknown
Minna
Bob
35
Vision 2: Call center
  • Speech is the main tool for people in a
    call center
  • Voice login of personnel
  • Removes the need for manual entry

36
Vision 3: Language recognition
  • A problem related to speaker recognition: the same
    research groups usually study both problems.
  • Not trivial to solve.
  • Studied a lot for Asian languages, even for rare
    languages that do not have any written form.

37
Vision 4: Medical applications
  • Doctors use voice to record summaries of patient
    meetings.
  • Access by keyword search.
  • Annotation.
  • Authentication of speaker.

38
Thank you for your patience!
  • Questions?