Title: Infrastructures in Taiwan
1. Infrastructures in Taiwan and for the Chinese Languages
Chu-Ren Huang
Institute of Linguistics, Academia Sinica
churen_at_sinica.edu.tw
ACL 2000 Workshop: Infrastructures for Global Collaboration
Saturday, October 7, Hong Kong
2. Types of Infrastructures
- Sharable resources (for Chinese computational linguistics)
- Mechanisms for international collaboration
- Mechanisms for scholarly exchange
3. Host Institutes
- The Association for Computational Linguistics and Chinese Language Processing (ACLCLP, a.k.a. ROCLING)
- Academia Sinica
- National Science Council (NSC)
4. Sharable Resources for Chinese Computational Linguistics
- Corpora
- Lexicons
- Procedures
http://rocling.iis.sinica.edu.tw/ROCLING/
5. Sharable Resources for Chinese Computational Linguistics: Corpora
- Academia Sinica Balanced Corpus of Mandarin Chinese (Sinica Corpus)
- Sinica Treebank
- Standard Segmentation Corpus
- ROCLING Corpus
- Mandarin-Across-Taiwan (MAT) Speech Database
6. Academia Sinica Balanced Corpus of Mandarin Chinese (Sinica Corpus)
5 million words, segmented and tagged
Direct WWW Access
- http://www.sinica.edu.tw/tibe/2-words/modern-words/index.html
OR
- http://www.sinica.edu.tw/ftms-bin/kiwi.sh
License Information
- http://rocling.iis.sinica.edu.tw/ROCLING/corpus98/sinicor_E.htm
7. Sinica Treebank 1.0
38,725 trees, 239,532 words
Direct WWW Access (1000 sample trees): http://godel.iis.sinica.edu.tw/CKIP/trees1000.htm
License Information: http://rocling.iis.sinica.edu.tw/ROCLING/Treebank/Treebank-E.htm
8. Mandarin-Across-Taiwan (MAT) Speech Database
Speech files are collected through telephone networks.
The content includes spontaneous speech (short answering statements) and read speech (numbers, Mandarin syllables, words of 2 to 4 syllables, phonetically balanced sentences).
- MAT-160 (160 speakers)
- MAT-2000
http://rocling.iis.sinica.edu.tw/ROCLING/MAT/index_cf.htm
9. Sharable Resources for Chinese Computational Linguistics: Procedures
Segmentation Standard for Chinese Language Processing
- Segmentation Standard: http://godel.iis.sinica.edu.tw/ROCLING/juhuashu1.htm
- Standard Segmentation Corpus (2 million words, segmented): http://godel.iis.sinica.edu.tw/ROCLING/corpus98/segcorp_E.htm
- Standard Segmentation Lexicon (42,138 entries, with frequency): http://godel.iis.sinica.edu.tw/ROCLING/corpus98/segdic_E.htm
- Segmentation Program (free download): http://godel.iis.sinica.edu.tw/CKIP/ws/
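As a rough illustration of how a word list with frequencies can drive segmentation, the sketch below picks the split of a string whose words have the highest combined relative frequency (a simple unigram dynamic program). It is not the algorithm of the Segmentation Program listed above, which these slides do not describe. The names `TOY_LEXICON` and `segment` and all frequency values are hypothetical and are not taken from the Standard Segmentation Lexicon.

```python
import math
from typing import Dict, List

# Hypothetical toy lexicon (word -> frequency); the values are invented.
TOY_LEXICON: Dict[str, int] = {
    "中央": 120, "研究": 300, "研究院": 80, "中央研究院": 50,
    "語言": 200, "語言學": 90, "學": 400, "研究所": 150, "所": 500,
}


def segment(text: str, lexicon: Dict[str, int]) -> List[str]:
    """Split text into words, minimizing the summed -log relative frequency.

    Assumes a non-empty lexicon with positive frequencies. A single
    character missing from the lexicon is allowed at a fixed penalty
    that is higher than the cost of any lexicon word.
    """
    total = sum(lexicon.values())
    max_len = max(len(word) for word in lexicon)
    unknown_cost = math.log(total) + 1.0

    # best[i] = (lowest cost of segmenting text[:i], start of its last word)
    best = [(0.0, 0)] + [(math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in lexicon:
                cost = math.log(total) - math.log(lexicon[word])
            elif end - start == 1:
                cost = unknown_cost
            else:
                continue
            if best[start][0] + cost < best[end][0]:
                best[end] = (best[start][0] + cost, start)

    # Follow the stored start indices back from the end of the text.
    words: List[str] = []
    end = len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


print(segment("中央研究院語言學研究所", TOY_LEXICON))
```

With these invented frequencies the call prints `['中央研究院', '語言學', '研究所']`. A real segmenter would also need to handle unknown words, proper names, and the conventions laid out in the Segmentation Standard.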
10. Sharable Resources in Languages Other than Modern Mandarin
- Classical Chinese Corpora: http://www.sinica.edu.tw/tibe/2-words/old-words/index.html
- Corpus of Formosan Austronesian Languages: under construction, part of the National Digital Archive Initiative
- Lexical Databases of other Sino-Tibetan and Tibeto-Burman Languages
11. Mechanisms for International Collaboration
Major Sponsors of International Collaboration Involving Taiwan
- The Chiang Ching-kuo Foundation for International Scholarly Exchange: http://www.cckf.org, http://www.cckf.org.tw
- The National Science Council
- Academia Sinica
12. Synchronic and Diachronic Chinese Corpora
Three Projects Sponsored by the CCK Foundation (1990-1995)
- Chu-Ren Huang, Keh-jiann Chen and Pei-chuan Wei, Academia Sinica
- Paul Thompson, SOAS, University of London
- Chaofen Sun, Stanford University
13. Mechanisms for Scholarly Exchange and Collaboration
Department of International Programs, NSC
http://www.nsc.gov.tw/int/2_cooperation/index_02.html
- Canada: NRC
- France: CNRS
- Japan: EAACST
- Germany: DFG, DAAD, DKFG
- Netherlands: NWO, IIAS
- USA: NSF, NIH
- UK: Royal Society of London
- etc.
14. An NSF/NSC International Joint Project
NSF Asian Language Digital Library Project
- Ching-Chih Chen, Simmons College
NSC International Digital Library Collaborative Projects
- Lexicon-based Knowledge Linking: Approaches Towards a WordNet Infrastructure for Multilingual Digital Library (Chu-Ren Huang, Academia Sinica)
- Linguistic Technology and Resources for English-Chinese Bilingual Information System (Hsin-Hsi Chen, National Taiwan University)
15. Mechanisms for International Collaboration: Bilateral Projects
- Case-by-case negotiation
- Academia Sinica and the Chinese University of Hong Kong, LDC, Stanford, UCSB, etc.
16. Mechanisms for Scholarly Exchange: Conferences
- ROCLING (annually since 1988)
- PACLIC: Pacific Asia Conference on Language, Information and Computation (regional conference involving Hong Kong, Japan, Korea, Singapore, and Taiwan): http://www.rcl.cityu.edu.hk/paclic15
- COLING2002: http://www.COLING2002.sinica.edu.tw
17. Mechanisms for Scholarly Exchange: Exchange Scholars
- Academia Sinica and EHESS: yearly exchange
- Academia Sinica and University of Pennsylvania (under negotiation)
- NSC and CNRS, NSC and NWO: Cognitive Science
18. Mechanisms for Scholarly Exchange: Post-doctoral Fellows
- Academia Sinica Post-doctoral Fellowships: application through project PIs or directly by applicants
- NSC Post-doctoral Fellowships
19. Mechanisms for Scholarly Exchange: International Students
- Computational Linguistics and Chinese Language Processing: an international graduate (PhD) program (proposal under review)
- Visiting Students
- Internships