Title: RADIOLOGICAL DICTATION
1RADIOLOGICAL DICTATION
- Master Thesis Study
- By
- Ali Iskurt
2INTRODUCTION
Problem Definition
- Turkish Biomedical Corpus for Dictation
- Lack of Radiological Triphones for Enhancement in speech recognition
- Lack of tools to process data and transcripts
3APPLICATIONS
- dictate patient information into predefined templates
- print a letter, fax a prescription to a pharmacist
- electronically transmit a patient's report to a referring physician
- over 25,000 medical terms
- extensive vocabularies
4Extensive vocabularies allow dictation of
- patient history
- patient symptoms and complaints
- general examination findings
- radiology, pathology, and laboratory test results
- course of action taken
- follow-up therapy
5Multi-Source Neural Networks for Speech Recognition
Roberto GEMELLO, Dario ALBESANO, Franco MANA, Loreta MOISA, IEEE-2000
A Review Of Recent Results
1. From Transactions
6CSELT Hybrid HMM/Neural Networks Technology for Continuous Speech Recognition
Roberto GEMELLO, Dario ALBESANO, Franco MANA, IEEE-2000
7A Review Of Recent Results
1. From Market
- Dragon NaturallySpeaking
- IBM
- Offer
- Speech recognition with an accuracy of about 95-98%
- Specialties that involve corpora from allergy, gastroenterology, general medicine, pedagogy, pediatrics, psychiatry, radiology, urology,...
- voice recognition software for several disabilities such as RSIs (repetitive strain injuries) and muscle disabilities
9THEORY
- Processing speech data
- Choosing triphones
- Labeling initial boot-strap files
- Training HMMs
- Usage of HMMs as recognizer
- Analysis of Test results
10Processing Stages
11 SUB-WORD BASED RECOGNITION
- HTK Results Analysis
- Date: Fri Oct 01 16:14:45 2002
- Ref: testrefs.mlf
- Rec: recout.mlf
- ------------------------ Overall Results ------------------------
WORD: %Corr=99.77, Acc=99.65 [H=853, D=1, S=1, I=1, N=855]
12DATABASE STRATEGY
- Test Database
- Consists of 12 people (7 of them did not take part in training)
- Small Database (125 MB)
- Consists of 15 doctors
- Large Database (470 MB)
- Consists of 40 normal speakers
13DATABASE STRATEGY
- 8 hours of speech (500 MB)
- Radiologists and personnel in MR centers add up to 15 people. The number of normal speakers is 40
- 10% of the data is for testing purposes
- The texts read are stored in the database as transcriptions so that they can be compared with the recognized words. The number of these reports is 45.
- Each speaker has read 5 or 6 reports lasting nearly 10 minutes.
- A survey on speakers
16REPORTS
NUMBER OF WORDS IN REPORTS: 5531
NUMBER OF WORDS READ: 34341
NUMBER OF DIFFERENT WORDS: 1747
NUMBER OF TRIPHONES PRODUCED: 1132
17Transcript Statistics
BALANCE.EXE
DIFFERENTWORD.EXE
18How to record and digitize sound ?
- A tape for recording sound
- Sound card of PCs to digitize
- Parametrization of sound (MFCC)
19Advantages of radiologists
- eager to repeat a job that they are used to
- familiar with the technology and open-minded
- interested in recognition study
- volunteer to help in collecting a Turkish biomedical database
- have previously encountered dictation tools in seminars given abroad
- plenty of recorded diagnostic reports about different patients, making it possible to get transcriptions
20 Work sheet
21 TRAINING: Preparation of speech data
- Digitizing sound waves
- WAV files are all made smaller than 700 KB
- Sounds are recorded at 16000 Hz in 8-bit mono unsigned format
- Preparation of sound files
- Hcopy.scp
- HCopy
- Pre-preparation before training process
- HSLab labeling manually
- TrainFiles.scp
- HInit a,b,c,C...
- Listening to sound waves and correcting texts
- Monophone and triphone label files of Trainfiles by ANA.EXE
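As a rough check on the recording format listed above, the longest clip that fits under the 700 KB limit can be estimated from the sample rate and sample width. This is a minimal sketch, assuming 1 KB = 1024 bytes and ignoring the small WAV header; the function name is illustrative and not part of the thesis tools.

```python
def max_clip_seconds(size_bytes, sample_rate_hz, bits_per_sample, channels=1):
    """Upper bound on the duration of uncompressed PCM audio of a given size."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return size_bytes / bytes_per_second


# 700 KB limit, 16000 Hz, 8-bit mono (values taken from the slide)
limit = max_clip_seconds(700 * 1024, 16000, 8)
print(f"{limit:.1f} s per file at most")
```

Under these assumptions each WAV file holds well under a minute of speech, which keeps manual labeling and listening manageable.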
22A Sample HCopy.scp file
- ......................
- c\tez\data\EGITIM\WAV\baris_huseyin_00.wav c\tez\data\EGITIM\MFCC\baris_huseyin_00.mfc
- c\tez\data\EGITIM\WAV\baris_huseyin_01.wav c\tez\data\EGITIM\MFCC\baris_huseyin_01.mfc
- c\tez\data\EGITIM\WAV\baris_huseyin_03.wav c\tez\data\EGITIM\MFCC\baris_huseyin_03.mfc
- c\tez\data\EGITIM\WAV\baris_mustafa_00.wav c\tez\data\EGITIM\MFCC\baris_mustafa_00.mfc
- c\tez\data\EGITIM\WAV\baris_mustafa_01.wav c\tez\data\EGITIM\MFCC\baris_mustafa_01.mfc
- c\tez\data\EGITIM\WAV\baris_mustafa_02.wav c\tez\data\EGITIM\MFCC\baris_mustafa_02.mfc
- c\tez\data\EGITIM\WAV\baris_mustafa_03.wav c\tez\data\EGITIM\MFCC\baris_mustafa_03.mfc
- c\tez\data\EGITIM\WAV\baris_rahmi_00.wav c\tez\data\EGITIM\MFCC\baris_rahmi_00.mfc
- c\tez\data\EGITIM\WAV\baris_rahmi_01.wav c\tez\data\EGITIM\MFCC\baris_rahmi_01.mfc
- c\tez\data\EGITIM\WAV\baris_sahiner_00.wav c\tez\data\EGITIM\MFCC\baris_sahiner_00.mfc
- c\tez\data\EGITIM\WAV\baris_sahiner_01.wav c\tez\data\EGITIM\MFCC\baris_sahiner_01.mfc
- c\tez\data\EGITIM\WAV\baris_sahiner_02.wav c\tez\data\EGITIM\MFCC\baris_sahiner_02.mfc
- c\tez\data\EGITIM\WAV\baris_sahiner_03.wav c\tez\data\EGITIM\MFCC\baris_sahiner_03.mfc
- c\tez\data\EGITIM\WAV\baris_sahiner_04.wav...
- ...............
- Up to 960 Files
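With up to 960 files, a script file like the one above is easier to generate than to type. The following is a minimal sketch, assuming the layout seen on the slide (a WAV folder mirrored by an MFCC folder, `.wav` replaced by `.mfc`); the function name and its parameters are illustrative, and the paths are copied as they appear on the slide.

```python
def hcopy_script_lines(wav_paths, wav_dir="WAV", mfcc_dir="MFCC"):
    """Build 'source target' lines for an HCopy script file."""
    lines = []
    for wav in wav_paths:
        # Swap the WAV folder for the MFCC folder in the path
        mfc = wav.replace("\\" + wav_dir + "\\", "\\" + mfcc_dir + "\\")
        # Swap the file extension
        if mfc.lower().endswith(".wav"):
            mfc = mfc[:-4] + ".mfc"
        lines.append(f"{wav} {mfc}")
    return lines


wavs = [
    r"c\tez\data\EGITIM\WAV\baris_huseyin_00.wav",
    r"c\tez\data\EGITIM\WAV\baris_huseyin_01.wav",
]
for line in hcopy_script_lines(wavs):
    print(line)
```

Each printed line pairs one source waveform with the feature file that HCopy should write for it.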
23A sample script file that lists the paths of the TrainFiles
- .....................
- c\tez\data\EGITIM\MFCC\baris_huseyin_00.mfc
- c\tez\data\EGITIM\MFCC\baris_huseyin_01.mfc
- c\tez\data\EGITIM\MFCC\baris_huseyin_03.mfc
- c\tez\data\EGITIM\MFCC\baris_mustafa_00.mfc
- c\tez\data\EGITIM\MFCC\baris_mustafa_01.mfc
- c\tez\data\EGITIM\MFCC\baris_mustafa_02.mfc
- c\tez\data\EGITIM\MFCC\baris_mustafa_03.mfc
- c\tez\data\EGITIM\MFCC\baris_rahmi_00.mfc
- c\tez\data\EGITIM\MFCC\baris_rahmi_01.mfc
- c\tez\data\EGITIM\MFCC\baris_sahiner_00.mfc
- c\tez\data\EGITIM\MFCC\baris_sahiner_01.mfc
- c\tez\data\EGITIM\MFCC\baris_sahiner_02.mfc
- c\tez\data\EGITIM\MFCC\baris_sahiner_03.mfc
- c\tez\data\EGITIM\MFCC\baris_sahiner_04.mfc
- c\tez\data\EGITIM\MFCC\baris_sahiner_05.mfc
- c\tez\data\EGITIM\MFCC\baris_yasar_00.mfc
- c\tez\data\EGITIM\MFCC\baris_yasar_01.mfc
- c\tez\data\EGITIM\MFCC\baris_yusuf_00.mfc
24Training Steps: HTK Instructions Used
25First HMMs
- Monophones
- a, b, c, C, d,...z, Z
- C, G, I, O, S, U special to Turkish (standing for ç, ğ, ı, ö, ş, ü)
- InitA, InitB
- Triphones
- From 2 1001 triphones
- From radyoloji.exe 1132 triphones
26BIO-TRIPHONE STRATEGY
RADYOLOJI.EXE
WORDS FROM REPORTS
LIST OF TRANSCRIPTS
- TRIPHONES OF TEXT FILES
- MONOPHONES OF TEXT FILES
ANA.EXE
27Reports Used For Radiological triphones
28HMMs PRODUCED
29TESTS
- Preparation before recognition
- Form grammar file from the list of words in test files
- Use HParse for wordnet file
- Make a dictionary
- Use Selectwords or Differentword
- to avoid repetitions
- Use Uclusesdonusumu.cpp to add Z ( silence )
- as suffix and prefix to words
- Use UClusesdonusumu for triphones expansion of words
- Make a Master Label File of Test files for analysis
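The dictionary step above (padding each word with the silence symbol Z and expanding it into triphones) can be sketched as follows. This is not the thesis tool: it assumes one character per phone, as in the transcription alphabet used here, and HTK's usual `left-phone+right` triphone naming; the function name is illustrative.

```python
def word_to_triphones(word, silence="Z"):
    """Expand a word into context-dependent triphones, padded with silence."""
    phones = [silence] + list(word) + [silence]
    return [
        f"{phones[i - 1]}-{phones[i]}+{phones[i + 1]}"
        for i in range(1, len(phones) - 1)
    ]


for w in ["On", "kist"]:
    print(w, " ".join(word_to_triphones(w)))
```

A real dictionary builder would also have to map any triphone that was never seen in training onto a trained model, which this sketch does not attempt.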
30Information About Test files
- Phonetic balance Statistics
- Insert.cpp (75 rows of words maximum)
- Balance.cpp (30 phones and their frequencies)
- Different.cpp (different words in testing files )
- TestFiles.scp
- Files used in training parts
- Files only for Testing purposes
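The kind of statistics these tools report (distinct words and phone frequencies) can be illustrated with a short sketch. It is an assumption-laden stand-in for Different.cpp and Balance.cpp, not their actual logic: it treats each character of a transcribed word as one phone, and the function names are illustrative.

```python
from collections import Counter


def distinct_words(words):
    """Return the unique words in first-seen order."""
    return list(dict.fromkeys(words))


def phone_frequencies(words):
    """Count how often each phone (one character each) occurs."""
    return Counter(ch for word in words for ch in word)


sample = ["kist", "kitle", "kist"]
print(distinct_words(sample))
print(phone_frequencies(sample).most_common())
```

Comparing such counts between training and test transcripts gives a quick view of how phonetically balanced a test list is.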
31Typical Test Files
- A Test list with 5 files
- c\tez\data\TEST\MFCC\candan_fatma_00.mfc
- c\tez\data\TEST\MFCC\candan_volkan_02.mfc
- c\tez\data\TEST\MFCC\erdogan_kursat_07.mfc
- c\tez\data\TEST\MFCC\baris_huseyin_03.mfc
- c\tez\data\TEST\MFCC\erdogan_berkan_03.mfc
- A Test list with 3 files
- c\tez\data\TEST\MFCC\memduh_sahiner_03.mfc
- c\tez\data\TEST\MFCC\hakan_berkan_00.mfc
- c\tez\data\TEST\MFCC\fulden_tugce_00.mfc
- A Test list with 7 files
- Memduh_sahiner_03
- Memduh_sahiner_04
- Memduh_sule_03
- Musevi_kursat_09
- Musevi_tugce_05
- Zeynep_berkan_04
- Zeynep_kursat_05
32Test Files
Files only for Testing Purposes
- 10 files from only 1 speaker
- 4 files from 4 speakers
- 3 files from only 1 speaker
- Files for future grouping
33RECOGNITION RESULTS
34 10 TEST FILES
- c\tez\data\TEST\MFCC\cemil_baris_00.mfc
- c\tez\data\TEST\MFCC\cemil_baris_01.mfc
- c\tez\data\TEST\MFCC\cemil_baris_02.mfc
- c\tez\data\TEST\MFCC\cemil_naile_00.mfc
- c\tez\data\TEST\MFCC\cemil_naile_01.mfc
- c\tez\data\TEST\MFCC\cemil_naile_02.mfc
- c\tez\data\TEST\MFCC\cemil_naile_03.mfc
- c\tez\data\TEST\MFCC\cemil_mumtaz_00.mfc
- c\tez\data\TEST\MFCC\cemil_mumtaz_01.mfc
- c\tez\data\TEST\MFCC\cemil_mumtaz_02.mfc
35TEXT OF TEST FILES
normal normaldir olabilecek olarak olup
operasyona oran oranI organlar orta osseOz
pankIreas paraaortik parankim parankiminde
parankimleri pararektal paravezikal
paternindedir patolojik pelviste
penceresinden perikardiyal peripankIreatik
peritoneal pilanda pilanlar pilevral
pilevraparankimal renal rent retro
rezolUsyonlu
Number of Words in 10 test files 324
tIrake tIrakea tIraseleri tabi tabidir
takip tanI teSekkUr teknik tire toraks
uyumlu vaskUler ve veya virgUl yIllIk
yUksek yaGlI yapIlar yapIlarI yapIlarIn
yapIlarda yoGunluGundadIr yok yoktur
yollarInIn yumuSak
hemoptezi hikayesi homojen iki iliak imajI
imajlarda inceleme incelemede infiltrasyon
inguinal intIrahepatik intensite itrah
izlenmedi izlenmektedir izlenmemektedir
izlenmemiStir kIlinik kIrk kalInlIGI
kalibrasyonlarI kalp kalsifikasyonlar
karaciGer kardiofirenik kardiotorasik kateo
kemik kesesi kist
Number of Different words 217
sInIrlarda sInIrlardadIr saG safra saptanmadI
satIrbaSI saygI seCilmemiStir segmentte
sekonder sel serbest serbesttir sigara
simetrik sinUsler sinyal sol sonuC sonuc
soyadI
kist kitle kitlesel kontIrast konturlarI
konum koronal kostofirenik lap lezyon
lezyonla lob lojunda lokalizasyonda mUmtaz
mayi mediyasten mediyastinal mesane
metastatik mevcuttur milimetrik minimal
morfolojik muntazam naile nedeniyle nefes
nokta noktasal
CekmemiStir Cevre OksUrUk On OykUsU
Ozefagusa Ureterler Ust UstUste aCIktIr
aGIrlIklI abdomen abdominal adI ait akIm
akIma akciGer aksiyel aktif alInan alanIna
alanlarInda alt ana apikal
bIronSlarIn bObrek balgam barIS batInda
belirgin bete bilgi bitti boyutlarda
boyutlu boyutta bulgusu cea dUzenli dalak
dansite dansiteler darlIGI deGerleri
deGiSikliGi diGer diafIragmatik dikkat
doGal doGaldIr dokular dokularI dolum duvar
eGrilikler edilen efUzyon emar etyolojisi
fibrotik fokal fonksiyonlarI gOrUlmedi
gOrUnUm gOrUnUmU gOrUntU gOstermektedir
gOzlenmemiStir genital giren hastada haste
hattadIr
36DICTIONARY
On Z-On O-nc O-nc OykUsU
Z-Or O-yl i-ki k-UC U-st k-UC k-UC
Ozefagusa Z-Oz O-ze z-er Z-fa
k-ab y-gu m-uS n-sa k-aC k-aC Ureterler
Z-Uz U-re r-et e-te t-er e-rl
r-le l-er e-rC e-rC Ust Z-Us
U-st s-ti s-ti UstUste Z-Us
U-st s-ti k-UC U-st s-te t-es t-es
aCIktIr Z-aC a-CI C-Ik i-kt
p-tI t-Ir e-rC e-rC aGIrlIklI
Z-aG a-GI G-Ir I-rl r-lI l-Ik I-kl k-lI
l-IS l-IS abdomen Z-at a-bi
Z-do d-ol a-me m-en e-nt e-nt abdominal
Z-at a-bi Z-do d-ol l-mi m-in
i-na m-al a-lk a-lk adI Z-ad
a-dI d-IS d-IS ait Z-ay b-it
i-ti i-ti ....... ....... ...... ...... yapIlar
da Z-ya y-ap a-pI p-Il I-la
l-ar a-rd r-da k-aC k-aC yoGunluGundadIr
Z-yo y-ol o-Gu G-un u-nl u-lu
d-uG u-Gu G-un u-nd n-da k-ad a-dI d-Ir
e-rC yok Z-yo y-ok o-ku
o-ku yoktur Z-yo y-ok
a-kt k-ta t-ur u-rt u-rt yollarInIn
Z-yo y-ol o-lm l-la l-ar a-rI
r-In I-nI n-In I-nd I-nd yumuSak
Z-yu y-um u-mu m-uS u-Sa c-ak
a-kt a-kt
37Master Label File of Test Files
- #!MLF!#
- "c/tez/data/TEST/MFCC/cemil_baris_00.lab"
- adI
- soyadI
- barIS
- kIlinik
- bilgi
- hemoptezi
- kIrk
- yIllIk
- .....
- ..................
- .........
- vaskUler
- yapIlarda
- akIm
- mevcuttur
- .
- "c/tez/data/TEST/MFCC/cemil_mumtaz_02.lab"
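A Master Label File of this shape can be assembled from per-file word lists. The sketch below follows the HTK MLF layout (a `#!MLF!#` header, a quoted label path, one word per line, and a lone `.` closing each entry); the function name and the input structure are assumptions made for illustration.

```python
def format_mlf(transcripts):
    """Render (label_path, words) pairs as HTK Master Label File text."""
    lines = ["#!MLF!#"]
    for label_path, words in transcripts:
        lines.append(f'"{label_path}"')  # quoted label file name
        lines.extend(words)              # one word per line
        lines.append(".")                # entry terminator
    return "\n".join(lines) + "\n"


text = format_mlf([
    ("c/tez/data/TEST/MFCC/cemil_baris_00.lab", ["adI", "soyadI", "barIS"]),
])
print(text, end="")
```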
38COMMANDS
- HVite -H C\tez\BIO\Gauss3\HMM13\newMacros.hmm
- -S C\tez\TEST\10Cemiltest\testFiles.txt
- -i C\tez\out\BIO_gauss3_HMM13_p55.rec -p -55
- -w C\tez\TEST\10Cemiltest\keys.net
- C\tez\TEST\10Cemiltest\test10dict.dict C\tez\ktp\triphlist.lst
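Because the same recognition command is repeated with different models and p values, it helps to build the argument list programmatically. The sketch below only assembles the arguments and does not run HVite; it uses the options that appear in the command above (`-H`, `-S`, `-i`, `-p`, `-w`) plus HTK's trace option `-T`. The function name and the short file names in the example are hypothetical.

```python
def build_hvite_args(hmm_macros, script, out_mlf, penalty,
                     network, dictionary, hmm_list, trace=None):
    """Assemble an HVite command line as a list of arguments."""
    args = ["HVite"]
    if trace is not None:
        args += ["-T", str(trace)]       # optional trace level
    args += [
        "-H", hmm_macros,                # trained HMM definitions
        "-S", script,                    # list of test feature files
        "-i", out_mlf,                   # recognition output file
        "-p", str(penalty),              # word insertion penalty
        "-w", network,                   # word network from HParse
        dictionary,
        hmm_list,
    ]
    return args


# Hypothetical file names, for illustration only
for p in (-70, -55):
    print(" ".join(build_hvite_args("macros.hmm", "test.scp",
                                    f"out_p{abs(p)}.rec", p,
                                    "keys.net", "test.dict", "triph.lst")))
```

The returned list could be handed to `subprocess.run` on a machine where HTK is installed.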
39Out.Rec versus MLF
- #!MLF!#
- "c/tez/data/TEST/MFCC/cemil_baris_00.rec"
- 0 13700000 kontIrast -6867.796875
- 13700000 19200000 soyadI -3000.899170
- 19200000 29200000 patolojik -5492.641113
- 29200000 39500000 kIlinik -5350.202637
- 39500000 46600000 bilgi -4045.308350
- 46600000 57900000 hemoptezi -6250.079590
- 57900000 66600000 kalp -4735.215820
- 66600000 74800000 yIllIk -4527.207031
- 74800000 81700000 sigara -4069.599121
- 81700000 91900000 OykUsU -5629.986328
- 91900000 97600000 On -2992.566650
- 97600000 107000000 tanI -4657.532227
- 107000000 123400000 hemoptezi -8549.015625
- 123400000 136300000 etyolojisi -7034.609375
- 136300000 141800000 yUksek -3253.933594
- 141800000 155400000 rezolUsyonlu -7363.734863
- 155400000 166500000 toraks -5850.388184
#!MLF!#
"c/tez/data/TEST/MFCC/cemil_baris_00.lab"
adI soyadI barIS kIlinik bilgi hemoptezi kIrk yIllIk sigara OykUsU On tanI hemoptezi etyolojisi yUksek rezolUsyonlu toraks bete inceleme saG akciGer Ust lob apikal segmentte minimal pilevraparankimal
.
"c/tez/data/TEST/MFCC/cemil_baris_01.lab"
fibrotik dansiteler izlenmektedir
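Comparing a recognizer output with its reference transcript amounts to aligning two word sequences and counting hits (H), deletions (D), substitutions (S) and insertions (I). The sketch below uses a plain edit-distance alignment with unit costs, which is an assumption: HResults applies its own weighted alignment, so counts can differ in ambiguous cases. The function names are illustrative; the sample lines are the first eight words of the output and reference shown above.

```python
def rec_words(lines):
    """Extract the word column from 'start end word score' lines."""
    return [parts[2] for parts in (ln.split() for ln in lines) if len(parts) == 4]


def align_counts(ref, rec):
    """Return (H, D, S, I) from a unit-cost edit-distance alignment."""
    # cell = (cost, H, D, S, I) for ref[:i] against rec[:j]
    dp = [[None] * (len(rec) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        c, h, d, s, n = dp[i - 1][0]
        dp[i][0] = (c + 1, h, d + 1, s, n)
    for j in range(1, len(rec) + 1):
        c, h, d, s, n = dp[0][j - 1]
        dp[0][j] = (c + 1, h, d, s, n + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(rec) + 1):
            c, h, d, s, n = dp[i - 1][j - 1]
            if ref[i - 1] == rec[j - 1]:
                diag = (c, h + 1, d, s, n)        # hit
            else:
                diag = (c + 1, h, d, s + 1, n)    # substitution
            c, h, d, s, n = dp[i - 1][j]
            dele = (c + 1, h, d + 1, s, n)        # reference word missing
            c, h, d, s, n = dp[i][j - 1]
            inse = (c + 1, h, d, s, n + 1)        # extra recognized word
            dp[i][j] = min(diag, dele, inse, key=lambda t: t[0])
    return dp[-1][-1][1:]


ref = "adI soyadI barIS kIlinik bilgi hemoptezi kIrk yIllIk".split()
rec = rec_words([
    "0 13700000 kontIrast -6867.796875",
    "13700000 19200000 soyadI -3000.899170",
    "19200000 29200000 patolojik -5492.641113",
    "29200000 39500000 kIlinik -5350.202637",
    "39500000 46600000 bilgi -4045.308350",
    "46600000 57900000 hemoptezi -6250.079590",
    "57900000 66600000 kalp -4735.215820",
    "66600000 74800000 yIllIk -4527.207031",
])
print(align_counts(ref, rec))
```

For this excerpt the alignment finds five hits and three substitutions (adI, barIS and kIrk were misrecognized).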
40HResults -I C\tez\kelimealtalta\test5dosya.mlf c\tez\liste\.triphonelist C\tez\out\out21haziran_01.rec
HTK Results Analysis
Date: Wed Sep 11 15:50:03 2002
Ref: C\tez\TEST\10Cemiltest\tes10mlft.mlf
Rec: C\tez\out\BIO_gauss3_HMM13_p55.rec
------------------------ Overall Results ------------------------
WORD: %Corr=94.44, Acc=93.21 [H=306, D=1, S=17, I=4, N=324]
$$\text{Percent Correct} = \frac{N - D - S}{N} \times 100$$
$$\text{Percent Accuracy} = \frac{N - D - S - I}{N} \times 100$$
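The two HResults measures can be recomputed from the reported counts as a sanity check. This is a minimal sketch; it uses the identity H = N - D - S, so that Percent Correct is H/N and Percent Accuracy is (H - I)/N. The function name is illustrative.

```python
def word_scores(h, d, s, i):
    """Return (percent correct, percent accuracy) from HResults counts."""
    n = h + d + s                  # total words in the reference
    corr = 100.0 * h / n           # same as (N - D - S) / N * 100
    acc = 100.0 * (h - i) / n      # same as (N - D - S - I) / N * 100
    return round(corr, 2), round(acc, 2)


# Counts from the 10-file test on this slide
print(word_scores(306, 1, 17, 4))
```

Accuracy is always at or below correctness because insertions are penalized only in the accuracy figure.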
41 4 Test files from 4 different People
"c/tez/data/TEST/MFCC/aliiskurt_abduRRezzak_00.rec" 53 words
"c/tez/data/TEST/MFCC/eyyub_turan_00.rec" 48 words
"c/tez/data/TEST/MFCC/ismailbican_aziz_00.rec" 71 words
"c/tez/data/TEST/MFCC/ismailbican_naile.rec" 146 words
Total Number Of Words 320
Number Of Different Words 217
SCORE OF 4 Test Files
- BIO test4
- 3 Gauss HMM13
- -p -70
- WORD: %Corr=83.28, Acc=76.39 [H=254, D=6, S=45, I=21, N=305]
- -p -55
- WORD: %Corr=83.28, Acc=75.08 [H=254, D=5, S=46, I=25, N=305]
42BIO Best Test Results
43 1001 Best Test Results
44ANALYSIS
1 Essentials Of Analysis
2 Explanation Of Tables
3 Enhancement By P Coefficient
4 Enhancement By Larger Database
5 Saturation Of Trained Models
5.1 Case Of Large Database
5.2 Case Of Small Database
6 Enhancement By Radiological Triphones
45 3. Enhancement by P
- Effect of the p value on Correctness versus Accuracy. The HMM-13 model with 3 Gaussians is used. The x-axis shows p values. When p equals -120, the Correctness and Accuracy curves come close together at an acceptably high level of 70%.
46 Determining best choice for P
Determining which p value gives the best Correctness. If a peak is seen with lower percentages on both sides, fine-tuning must be done within that small range. The best choice is p = -15.
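The search described above (find the peak, then fine-tune between its neighbors) can be sketched as a small helper. The function name is illustrative, and the scores in the example are made-up numbers for demonstration only, not results from the thesis.

```python
def best_penalty(scores):
    """Given {p value: %Correct}, return (best p, lower neighbor, upper neighbor).

    The neighbors bound the range in which a finer sweep is worth running.
    """
    ps = sorted(scores)
    best = max(ps, key=lambda p: scores[p])
    k = ps.index(best)
    lower = ps[k - 1] if k > 0 else None
    upper = ps[k + 1] if k < len(ps) - 1 else None
    return best, lower, upper


# Hypothetical coarse sweep (illustrative values, not thesis data)
coarse = {-60: 70.0, -30: 80.0, -15: 85.0, 0: 82.0}
print(best_penalty(coarse))
```

In practice each score would come from one HVite and HResults run at that p value, and the sweep would be repeated with a smaller step inside the returned range.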
47 Model Change affects P
Determining which p value gives the best performance for HMM-10. Changing the model shifts the best p value from -15 to -30.
48Test Group Change affects P
Correctness results for 3 different groups of
test files on HMM-10
49 4. Enhancement By Larger Database
Best performances of Correctness chosen from
HMM-9, HMM-10, HMM-11, HMM-12 and HMM-13 for both
Large and Small Databases. Five test files are
used for all tests here. X-axis is an ordered
list of models from HMM-9 to HMM-13
50 5. Saturation Of Models
Case Of Large Database
For 5 test files, the best results of the models are collected for the LARGE database. The curve is nearly flat from HMM-9 to HMM-14, which means that further training brings little enhancement. Moreover, accuracy drops drastically after HMM-10.
51Case Of Small Database
For 5 test files, the best results of the models are collected for the SMALL database. The curve climbs steadily from HMM-9 to HMM-14, which means that further training still brings enhancement.
52 6. Enhancement By Radiological Triphones
Radiological triphones and the other triphone list are compared on the same platform, HMM-10. Both correctness and accuracy are better when radiological triphones are used.
53Conclusion and future work
- Contributions
- Collection of about 8 hours of radiological sound data from 55 speakers
- Survey on speakers to learn their geographical region, smoking status, age, gender and foreign language properties
- Seven C programs as tools for processing the database:
- Differentword.cpp
- Balance.cpp
- Insert.cpp
- Selectword
- Uclusesdonusumu.cpp
- Ana.cpp
- Radyoloji.cpp
- Trained HMMs
- Investigation of the effect of the P value on recognition
- Saturation under further training is shown for two types of database
- Accuracy of over 90% and Correctness of 95% as best performance
54Future Work
- A longer sound database can be collected from as many different people as possible
- Report files may be increased to cover more words in radiology
- Better list of triphones
- More capable computers in a LAN
- Accuracy level must be improved to 90% for all types of test files
- Live audio input can be used
- A graphical user interface must be written for doctor-computer interaction. It must have report templates and many facilities
55- Gained abilities
- Good relations with doctors and patients were established for future biomedical research. In this way, the hospital and radiological environment was observed, with the biomedical tools we learned about in courses at work.
- Another goal was to concentrate on the free HTK toolkit and to use it well enough to prepare our sound files, train HMMs and test them. The difficult steps of labeling, listening, covering the general concepts of HTK, using its instructions and supplying the options that those instructions require were completed successfully.
- C programming was learned and improved by writing 7 C programs
- Features of MS Excel, MS Word, Cool Edit 2002, GoldWave, Transcriber, HSLab, MS PowerPoint and UltraEdit were learned in detail.
56DEMONSTRATION
- c\tez\DEMO\mehmed_mumtaz_00.wav speaker 1
- c\tez\DEMO\levend_mumtaz_01.wav speaker 2
- c\tez\DEMO\cengizhan_mumtaz_02.wav speaker 3
HCOPY
Script for HCopy: C\tez\demo\demo.scp
Config: C\tez\demo\config26.cfg
c\tez\DEMO\mehmed_mumtaz_00.mfc
c\tez\DEMO\levend_mumtaz_01.mfc
c\tez\DEMO\cengizhan_mumtaz_02.mfc
HVITE, HRESULTS
MLF: c\tez\DEMO\demomlf.mlf
NET: c\tez\DEMO\demonet.net
DICT: c\tez\DEMO\demodict.dict
-S c\tez\DEMO\demoscp.scp
PERFORMANCE
57HVite -T 1 -H c\tez\demo\BIO\HMM10\newmacros.hmm -S c\tez\DEMO\demoscp.scp -i c\tez\demo\OUT\BIO_HMM10_00.rec -p -30 -w c\tez\DEMO\demonnet.net c\tez\DEMO\demodict.dict c\tez\demo\liste\triphlist.lst
- HResults -I c\tez\DEMO\demomlf.mlf c\tez\DEMO\triphlist.lst c\tez\DEMO\OUT\BIO_HMM10_30.rec
58END
- Ali Iskurt
- wishes everyone good work