NLP in Thailand - PowerPoint PPT Presentation

Slides: 20
Provided by: els76
Learn more at: http://www.elsnet.org

1
NLP in Thailand
  • by
  • Asanee Kawtrakul
  • Kasetsart University

2
Outline
  • Thai Language Features
  • What we do and the problems
  • The main actors
  • Research Model and Infrastructure
  • What more do we need?

3
Thai Language Characteristics
  • Dialects and Tone
  • Isolating language
  • Uninflected
  • Monosyllabic
  • No word delimiters
  • Same form but several functions
  • Same form but several meanings
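
Because Thai text has no word delimiters, any text pipeline has to recover word boundaries first. The sketch below illustrates one common baseline, greedy longest matching against a dictionary. It is not taken from the slides: the function name, the three-word toy dictionary, and the sample sentence are assumptions chosen only for illustration.

```python
def segment_longest_match(text, dictionary):
    """Greedy left-to-right longest matching.

    At each position, take the longest dictionary word that starts there.
    Characters that start no dictionary word are emitted as single tokens.
    """
    max_len = max(len(word) for word in dictionary)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
        else:
            # No dictionary word starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy dictionary: "I", "eat", "rice".
TOY_DICTIONARY = {"ฉัน", "กิน", "ข้าว"}

print(segment_longest_match("ฉันกินข้าว", TOY_DICTIONARY))
```

A greedy matcher like this is simple, but it always commits to the longest word, so it can pick the wrong boundary when a string has more than one valid reading.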

4
Thai Language Characteristics
  • Grammar coverage
  • Word formation/Recognition
  • Compound words vs Sentences
  • Proper name vs Common noun
  • Loan words (transliterated foreign words) without
    special orthography
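
Word recognition is hard partly because one unspaced string can split into words in more than one valid way. The sketch below enumerates every dictionary-consistent split of a short string. It is an illustration under assumptions, not the presenter's method: the function name and the four-word toy dictionary are hypothetical, and the sample string is a commonly cited Thai example that reads either as "round eyes" or as "exposed to the wind".

```python
def all_segmentations(text, dictionary):
    """Return every way to split text into a sequence of dictionary words."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in dictionary:
            # Keep this prefix and segment the remainder recursively.
            for rest in all_segmentations(text[end:], dictionary):
                results.append([prefix] + rest)
    return results


# Hypothetical toy dictionary: "eye", "to expose", "round", "wind".
AMBIGUOUS_DICTIONARY = {"ตา", "ตาก", "กลม", "ลม"}

for words in all_segmentations("ตากลม", AMBIGUOUS_DICTIONARY):
    print(" | ".join(words))
```

With this dictionary the string has two full segmentations, so a dictionary lookup alone cannot choose between them; some contextual or statistical evidence is needed.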

5
Dialects
  • North: 18.8%
  • North-East: 34.2%
  • Central: 33.7%
  • South: 13.3%

6
What do we do?
  • TEXT PROCESSING
  • SPEECH PROCESSING
  • IMAGE PROCESSING

7
Text Processing

8
Text Processing and Problems
  • Lack of Legal Corpus
  • Small Corpus
  • Lack of standards (POS, semantic concepts)
  • Redundant work
  • Statistical Based Approach
  • Knowledge Based Approaches

9
Speech Processing
  • Speech recognition
  • Speech generation

10
Speech Processing and Problems
  • Recognition
  • Generation
  • Not only dialect but also tone
  • Isolated words, not continuous speech
  • Word Boundary detection

11
Image Processing
  • Thai optical character recognition
  • Handwriting recognition

12
OCR and Problems
  • Isolated Characters

13
The Main Actors
  • Universities
  • NECTEC (National Electronics and Computer
    Technology Center), Ministry of Science,
    Technology and Environment
  • SIGNLP

14
The Main Actors
  • More than 50 experienced researchers (at least 5
    years of research)
  • More than 100 young researchers

15
Financial Supporters
  • National Electronics and Computer Technology
    Center (NECTEC)
  • National Research Council of Thailand (NRCT)
  • Kasetsart University Research and Development
    Institute (KURDI)
  • Thailand Research Fund (TRF)
  • etc.

16
Research Model and Infrastructure
  • Short Term
  • Long Term
  • Simple but works
  • Collaboration between end users, universities and
    funding agencies (including the private sector)
  • Robust and very large scale
  • Enlarge the number of researchers

17
What more do we need?
  • Share resources (Corpus, Dictionary, Tools, etc.)
  • Share Experiences and Knowledge
  • Set up a big umbrella and distribute the workload
  • Establish research network
  • Partnership

18
Conclusion
  • Most Thais use the Thai language
  • Thai language processing has a good future in the
    market IF:

19
  • We have more collaborative work
  • NLP market: 1/2 of 60 million people