Language Resources in Indonesia

Provided by: rich515
Transcript and Presenter's Notes


1
Language Resources in Indonesia
  • Language Technology and Applied Information
    Laboratory
  • Directorate for Information Technology and
    Electronics, Agency for the Assessment and
    Application of Technology (BPPT)
  • Indonesia

2
TBIT Laboratory - BPPT
  • Apply, assess and develop Language Technology
    and Applied Information Technology in support of
    the Government's program for developing IT and
    Electronics in Indonesia
  • Advise on and set up national government policy
    for developing language technology and
    information technology
  • Develop and deploy language technologies in the
    areas of language processing, text analysis and
    generation, information retrieval and extraction,
    and machine translation
  • Develop and maintain Language Resources, i.e.
    grammar rules, electronic dictionaries and
    annotated corpora
  • Develop Electronic Data Interchange (EDI) and an
    Electronic Commerce suite for SMEs

3
Project Portfolio
  • Multilingual Machine Translation System
    (CICC-MMTS)
  • KEBI (Indonesian Electronic Dictionaries)
  • UNL (Universal Networking Language)
  • INCI (Indonesian National Corpus Initiative)
  • Online I-E Dictionary on news portal (Detik.com)
  • Multimedia Dictionary (including speech
    synthesizer)
  • Yanetra (NLP tools for the blind)
  • Others:
      • Manufacturing Technology supported by advanced
        and integrated information system through
        International Cooperation (MATIC) for
        Automotive, Apparel, and Electronics
      • Web Information Gateway for Apparel
      • Electronic Commerce Projects

4
Indonesian Electronic Dictionaries - KEBI
  • Word dictionary (50K root words and 250K
    derived words)
  • Concept dictionary
  • Co-occurrence dictionary
  • Terminology dictionary (15K terms)

5
Indonesian Dictionary Online - KEBI Online
http://nlp.aia.bppt.go.id

6
Indonesian-English Online Dictionary
  • Indonesian-English online dictionary on the
    Detik.com portal (number 1 for online breaking
    news)

7
Indonesian National Corpus Initiative - INCI/KNBI
  • Sourced from the national news agency LKBN ANTARA
  • 50,000 sentences
  • 1 million words
  • ambiguous word-type
  • ambiguous word-token
  • POS and phrase attachment ambiguity
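
One plausible reading of the two ambiguity counts above is that a word type is ambiguous when it appears with more than one POS tag anywhere in the corpus, and that every occurrence of such a type counts as an ambiguous token. The sketch below illustrates that reading only. The sample words, the tag names and the function name `ambiguity_counts` are assumptions made for illustration and are not taken from INCI or its tagset.

```python
from collections import defaultdict

# Toy tagged corpus of (word, POS tag) pairs. The words and tag names are
# illustrative assumptions, not data from INCI or the GDA tagset.
tagged_tokens = [
    ("bisa", "MODAL"),   # "can"
    ("bisa", "NOUN"),    # "venom"
    ("ular", "NOUN"),    # "snake"
    ("bisa", "MODAL"),
    ("makan", "VERB"),   # "eat"
]


def ambiguity_counts(tokens):
    """Return (ambiguous_types, ambiguous_tokens) for a tagged corpus."""
    tags_by_word = defaultdict(set)
    for word, tag in tokens:
        tags_by_word[word].add(tag)
    # A word type is ambiguous if it was seen with more than one tag.
    ambiguous_words = {w for w, tags in tags_by_word.items() if len(tags) > 1}
    # Every occurrence of an ambiguous type counts as an ambiguous token.
    ambiguous_tokens = sum(1 for word, _ in tokens if word in ambiguous_words)
    return len(ambiguous_words), ambiguous_tokens


n_types, n_tokens = ambiguity_counts(tagged_tokens)
print(f"ambiguous word types: {n_types}, ambiguous word tokens: {n_tokens}")
```

Here only "bisa" carries two tags, so the toy corpus has one ambiguous type and three ambiguous tokens. A real corpus would usually show the same pattern at scale: a small share of types accounting for a much larger share of tokens.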

8
BIAS (Bahasa Indonesia Analysis System)
  • Part of CICC-MMTS
  • Improved using a stochastic-symbolic approach
  • Supervised and unsupervised learning
  • 15,000 sentences of annotated corpus (based on
    the GDA tagset)
  • ISTAG (POS Tagger)
  • ISPARSE (Skeleton Parser)

9
UNL Project

10
(No Transcript)

11
Other resources
  • Speech recognition system (Bandung Institute of
    Technology)
  • Indonesian spelling checker for Microsoft Word
    (Gadjah Mada University)
  • Computational lexicon research (National Language
    Center)
  • Computational morphology (Atma Jaya University)