Title: International Domain Name
1International Domain Name
- TWNIC
- Nai-Wen Hsu
- snw_at_twnic.net.tw
2Domain name
- RFC 1035
- A label cannot be longer than 63 characters
- A domain name cannot be longer than 255 characters
- Maximum number of labels: 127
- Only a-z, 0-9 and - are accepted in a domain name
- Limited to 37 ASCII code points: LDH (Letter-Digit-Hyphen)
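The limits above can be sketched as a small check. This is a minimal illustration, not a full RFC 1035 parser: the function name `is_valid_ldh_name` is hypothetical, the 255 limit is assumed to be counted in wire format (one length octet per label plus the root), and rules about where a hyphen may appear are not checked.

```python
import re

# One LDH label: letters, digits, hyphen; 1 to 63 characters.
LDH_LABEL = re.compile(r"[A-Za-z0-9-]{1,63}")

def is_valid_ldh_name(name: str) -> bool:
    """Check a dotted name against the length and character limits above."""
    if name.endswith("."):
        name = name[:-1]                      # ignore the trailing root dot
    labels = name.split(".")
    if len(labels) > 127:
        return False
    if not all(LDH_LABEL.fullmatch(label) for label in labels):
        return False
    # Assumed wire-format size: one length octet per label, plus the root.
    wire_length = sum(len(label) + 1 for label in labels) + 1
    return wire_length <= 255
```

A name such as gwmöbler.com fails this check until it has been converted to its ASCII form.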
3International Domain Name
- The IETF IDN WG adopted Unicode 3.2
- Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac, Thaana, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, ...
- 95,156 characters
4International Domain Name sample
- ??????.jp
- gwmöbler.com
- ????.tw
- ??????????.cn
- ?????.kr
- ????? . ???
5IETF IDN Standard
- IDNA (RFC 3490): Internationalizing Domain Names in Applications
- NAMEPREP (RFC 3491): A Stringprep Profile for Internationalized Domain Names
- PUNYCODE (RFC 3492): A Bootstring encoding of Unicode for Internationalized Domain Names in Applications
- STRINGPREP (RFC 3454): Preparation of Internationalized Strings
6IDNA components and interfaces
- User: input and display through local interface methods (pen, keyboard, ...)
- IDNA-aware application (end system): ToASCII and ToUnicode operations may be called here
- Application to resolver: call to resolver, in ACE
- Application to application servers: application-specific protocol, in ACE unless the protocol is updated to handle other encodings
- Example ACE label: xn--de-jg4avhby1noc0d
- Resolver to DNS servers: DNS protocol, in ACE
- "Application" is where the application splits a host name into labels, sets the appropriate flags, and performs the ToASCII and ToUnicode operations.
7IDNA Structure
- User input (Unicode)
- IDNA
- NAMEPREP (a Stringprep profile for Internationalized Domain Names, built on STRINGPREP): Mapping, Normalization, Prohibit
- ToASCII / ToUnicode
- ACE (PUNYCODE)
- To resolver (ACE)
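The flow above (split the host name into labels, run ToASCII on each label before the resolver call, run ToUnicode for display) can be sketched with Python's standard `encodings.idna` module, which implements the RFC 3490 operations. The wrapper names `to_ascii_name` and `to_unicode_name` are hypothetical, and the flags mentioned in the RFC are left at the module's defaults.

```python
from encodings import idna

def to_ascii_name(name: str) -> str:
    """Split a host name into labels and apply ToASCII to each label."""
    return ".".join(idna.ToASCII(label).decode("ascii")
                    for label in name.split("."))

def to_unicode_name(name: str) -> str:
    """Apply ToUnicode to each label, for display to the user."""
    return ".".join(idna.ToUnicode(label) for label in name.split("."))

if __name__ == "__main__":
    # The sample name from the earlier slide, as passed to the resolver.
    print(to_ascii_name("gwmöbler.com"))
```

Labels that are already plain ASCII pass through ToASCII unchanged, so the same call can be used for every host name.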
8NAMEPREP
- A Stringprep Profile for Internationalized Domain Names
- Mapping: Stringprep tables B.1, B.2
- Normalization: Form KC
- Prohibited output: Stringprep tables C.1.2, C.2.2, C.3, C.4, C.5, C.6, C.7, C.8, C.9
9NAMEPREP -- Mapping
- Commonly mapped to nothing: 27 code points
- Ex
- Mapping for case-folding used with NFKC: 1371 code points
- Ex: A → a (U+0041 → U+0061), Ϋ → ϋ (U+03AB → U+03CB), ㍱ → hpa (U+3371 → U+0068 U+0070 U+0061)
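The mapping step can be tried out with Python's standard `stringprep` module, which exposes the RFC 3454 tables as functions. The sketch below covers only the mapping step (tables B.1 and B.2), not normalization or the prohibited-output check; the function name `nameprep_map` is hypothetical.

```python
import stringprep

def nameprep_map(text: str) -> str:
    """Apply only the Nameprep mapping step (tables B.1 and B.2)."""
    out = []
    for ch in text:
        if stringprep.in_table_b1(ch):
            continue                              # B.1: mapped to nothing
        out.append(stringprep.map_table_b2(ch))   # B.2: case folding for NFKC
    return "".join(out)

if __name__ == "__main__":
    for sample in ("A", "\u03ab", "\u3371", "a\u00adb"):
        print(repr(sample), "->", repr(nameprep_map(sample)))
```

U+00AD (SOFT HYPHEN) is one of the code points in table B.1, so it disappears from the last sample.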
10NAMEPREP -- Normalization
- Unicode normalization with form KC
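A short illustration of what form KC does, using Python's `unicodedata` module. The sketch assumes the Unicode 3.2 database that the module ships as `ucd_3_2_0`, since that is the version adopted for IDN; the helper name `nfkc` is hypothetical.

```python
import unicodedata

# Unicode 3.2 snapshot shipped with Python, matching the IDN standard.
ucd32 = unicodedata.ucd_3_2_0

def nfkc(text: str) -> str:
    """Normalize text with Unicode normalization form KC."""
    return ucd32.normalize("NFKC", text)

if __name__ == "__main__":
    samples = {
        "fullwidth A": "\uff21",        # compatibility form of "A"
        "fi ligature": "\ufb01",        # compatibility form of "fi"
        "e + combining acute": "e\u0301",
    }
    for name, text in samples.items():
        print(name, repr(text), "->", repr(nfkc(text)))
```

Compatibility characters are replaced by their plain equivalents, and base letters followed by combining marks are composed into a single code point where one exists.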
11NAMEPREP -- Normalization
12NAMEPREP Prohibited output
- Non-ASCII space characters: 17
- Ex: U+00A0 (NO-BREAK SPACE)
- Non-ASCII control characters: 54
- Ex: U+0090 (DEVICE CONTROL STRING)
- Private use: 133371
- Non-character code points: 49
- Surrogate codes: 2048
13NAMEPREP Prohibited output
- Inappropriate for plain text: 4
- Inappropriate for canonical representation: 12
- Change display properties or are deprecated: 13
- Tagging characters: 97
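The prohibited-output categories on these two slides correspond to the `in_table_c*` functions of Python's standard `stringprep` module. The sketch below reports which table, if any, rejects a character; the list name `PROHIBITED_TABLES`, the labels and the helper `prohibited_reason` are hypothetical.

```python
import stringprep

# Nameprep prohibited-output tables (RFC 3454), each paired with a short label.
PROHIBITED_TABLES = [
    ("C.1.2 non-ASCII space", stringprep.in_table_c12),
    ("C.2.2 non-ASCII control", stringprep.in_table_c22),
    ("C.3 private use", stringprep.in_table_c3),
    ("C.4 non-character", stringprep.in_table_c4),
    ("C.5 surrogate", stringprep.in_table_c5),
    ("C.6 inappropriate for plain text", stringprep.in_table_c6),
    ("C.7 inappropriate for canonical representation", stringprep.in_table_c7),
    ("C.8 change display properties or deprecated", stringprep.in_table_c8),
    ("C.9 tagging", stringprep.in_table_c9),
]

def prohibited_reason(ch: str):
    """Return the label of the first table that prohibits ch, or None."""
    for label, in_table in PROHIBITED_TABLES:
        if in_table(ch):
            return label
    return None
```

For example, `prohibited_reason("\u00a0")` names table C.1.2, while an ordinary letter returns `None`.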
14PUNYCODE
- A Bootstring encoding of Unicode for IDNA
- One of the ACEs (ASCII Compatible Encodings)
- Translates non-ASCII characters to ASCII characters
- Prefix: xn--
- Ex: ????.tw → xn--ciun9hb52c2za.tw
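Python ships a `punycode` codec, so the encoding and the prefix can be shown separately. This is a sketch for a single label that is assumed to have been through Nameprep already; the helper names `punycode_label` and `unpunycode_label` are hypothetical.

```python
ACE_PREFIX = "xn--"

def punycode_label(label: str) -> str:
    """Encode one prepared label; ASCII-only labels pass through unchanged."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def unpunycode_label(label: str) -> str:
    """Decode one label if it carries the ACE prefix."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label

if __name__ == "__main__":
    print(punycode_label("bücher"))            # xn--bcher-kva
    print(unpunycode_label("xn--mnchen-3ya"))  # münchen
```

The ASCII characters of the label are copied first, and the positions and values of the non-ASCII characters are encoded after the last hyphen.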
15Insufficient in IDN standard
- The current IDN standards (IDNA, NAMEPREP, PUNYCODE) cannot meet the requirements of Chinese domain names
- Traditional/Simplified Chinese mapping
- Ex ? ?? ?
- Writing variant mapping
- Ex ? ?? ?
17Insufficient in IDN standard
- The same meaning is written with a different character in different countries
- In China: 劝 (U+529D)
- In Japan: 勧 (U+52E7)
- In Taiwan: 勸 (U+52F8)
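A quick way to see the gap is to run the three code points listed above through Python's `encodings.idna` module: Nameprep leaves each one untouched, so each ends up as a different ACE label. This is a sketch; the dictionary name `VARIANTS` and the helper `ace_forms` are hypothetical.

```python
from encodings import idna

# The three code points from the slide, keyed by where each form is used.
VARIANTS = {"China": "\u529d", "Japan": "\u52e7", "Taiwan": "\u52f8"}

def ace_forms(chars: dict) -> dict:
    """Map each place to the ACE label produced for its character."""
    return {place: idna.ToASCII(ch).decode("ascii")
            for place, ch in chars.items()}

forms = ace_forms(VARIANTS)

if __name__ == "__main__":
    for place, ace in forms.items():
        print(place, ace)
```

Because the three labels differ, DNS treats them as three unrelated names unless the registry links them by policy.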
18IDN administration guide line
- Registration policy is used to solve the problems listed above
- Every language has a variant table with 3 fields
- valid code point
- recommended variant
- character variant
19Variant Table sample
Valid code point (VCP) | Recommended variants by .tw (twRV) | Recommended variants by .cn (cnRV) | Character variant(s) (CV) | Remarks
丁(4E01) | 丁(4E01) | 丁(4E01) | 丁(4E01) | Singular-relation character (1)
丄(4E04) | 上(4E0A) | 上(4E0A) | 丄(4E04) 上(4E0A) | Pair-relation characters (2.1)
上(4E0A) | 上(4E0A) | 上(4E0A) | 丄(4E04) 上(4E0A) | Pair-relation characters (2.1)
万(4E07) | 万(4E07) | 万(4E07) | 万(4E07) 萬(842C) | Pair-relation characters (2.2)
萬(842C) | 萬(842C) | 万(4E07) | 万(4E07) 萬(842C) | Pair-relation characters (2.2)
20Variant Table sample
Valid code point (VCP) | Recommended variants by .tw (twRV) | Recommended variants by .cn (cnRV) | Character variant(s) (CV) | Remarks
叶(53F6) | 葉(8449) | 叶(53F6) | 叶(53F6) 葉(8449) | Pair-relation characters (2.3)
葉(8449) | 葉(8449) | 叶(53F6) | 叶(53F6) 葉(8449) | Pair-relation characters (2.3)
个(4E2A) | 個(500B) | 个(4E2A) | 个(4E2A) 個(500B) 箇(7B87) | Multiple-relation characters
個(500B) | 個(500B) | 个(4E2A) | 个(4E2A) 個(500B) 箇(7B87) | Multiple-relation characters
箇(7B87) | 個(500B) | 个(4E2A) | 个(4E2A) 個(500B) 箇(7B87) | Multiple-relation characters
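The sample rows above can be read as a lookup table keyed by valid code point. The sketch below shows one way such a table might be used to derive the recommended variants and the full set of character variants for a label. The data structure and the function names `recommended` and `character_variants` are hypothetical, and only the ten sample rows are included.

```python
from itertools import product

# VCP -> (twRV, cnRV, CV), copied from the sample rows of the variant table.
VARIANT_TABLE = {
    "丁": ("丁", "丁", "丁"),      # 4E01
    "丄": ("上", "上", "丄上"),    # 4E04
    "上": ("上", "上", "丄上"),    # 4E0A
    "万": ("万", "万", "万萬"),    # 4E07
    "萬": ("萬", "万", "万萬"),    # 842C
    "叶": ("葉", "叶", "叶葉"),    # 53F6
    "葉": ("葉", "叶", "叶葉"),    # 8449
    "个": ("個", "个", "个個箇"),  # 4E2A
    "個": ("個", "个", "个個箇"),  # 500B
    "箇": ("個", "个", "个個箇"),  # 7B87
}

def recommended(label: str, lang: str) -> str:
    """Replace every character by its recommended variant for lang."""
    column = {"zh-tw": 0, "zh-cn": 1}[lang]
    return "".join(VARIANT_TABLE[ch][column] for ch in label)

def character_variants(label: str) -> set:
    """All labels obtained by substituting character variants."""
    choices = (VARIANT_TABLE[ch][2] for ch in label)
    return {"".join(combo) for combo in product(*choices)}
```

A character outside the table raises `KeyError`, which stands in for "not a valid code point for this language".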
21Variant Table
- Singular-relation character (VCP = twRV = cnRV = CV): 13888 (66.4%)
- VCP = twRV ≠ cnRV: 2783 (13.3%)
- VCP = cnRV ≠ twRV: 2453 (11.7%)
- VCP ≠ (twRV = cnRV): 333 (1.6%)
- VCP ≠ twRV ≠ SCR: 387 (1.9%)
22Variant Table
Number of character variant(s) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Number of characters | 13888 | 5156 | 1158 | 424 | 165 | 60 | 35 | 16
Share (%) | 66.4 | 24.7 | 5.5 | 2.0 | 0.79 | 0.29 | 0.17 | 0.08
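The percentages in this table follow from the counts, as the short check below shows. It assumes the eight counts cover the whole table, so the total is simply their sum; the variable names are illustrative.

```python
# Characters grouped by how many character variants they have.
counts = {1: 13888, 2: 5156, 3: 1158, 4: 424, 5: 165, 6: 60, 7: 35, 16 // 2: 16}

total = sum(counts.values())

# Share of the table, in percent, for each group.
shares = {n: 100 * count / total for n, count in counts.items()}

if __name__ == "__main__":
    print("total characters:", total)
    for n, share in shares.items():
        print(n, f"{share:.2f}%")
```

With these counts the total comes to 20902 characters, of which about two thirds have no variant other than themselves.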
23Variant Table
- The table draft was prepared by the CCMT Task Force, organized by TWNIC since January 2002.
- The task force consists of 9 experts: linguists, computer experts and DNS experts.
- The table draft has been submitted to the Bureau of Standards, Ministry of Economic Affairs, for final review.
24Registration procedure
- A registrant should select the language(s)
- Activation of the requested domain name(s) and reservation of the equivalent(s) should be provided by the Registry, within the language-based character set
- The registrant can request activation of the reserved equivalent domain name(s) at any time
25Registration sample
- A user selects the zh-tw and zh-cn languages with the domain name ???.com
- ???.com (Recommended variants for zh-tw)
- ???.com (Recommended variants for zh-cn)
- ???.com (Character Variant)
- ???.com (Character Variant)
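The registration procedure can be modelled as a bundle of equivalent names held for one registrant. The sketch below assumes, following the procedure slide, that only the requested name is activated at first and that every equivalent is reserved until the registrant asks for it. The class `Bundle`, the function `register` and the labels used in the usage lines are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Bundle:
    """Hypothetical registry record: equivalent names of one registrant."""
    active: set = field(default_factory=set)
    reserved: set = field(default_factory=set)

    def activate(self, name: str) -> None:
        """Move a reserved equivalent name into the active set."""
        if name not in self.reserved:
            raise ValueError(f"{name!r} is not reserved in this bundle")
        self.reserved.remove(name)
        self.active.add(name)

def register(requested: str, recommended, character_variants) -> Bundle:
    """Activate the requested name and reserve all of its equivalents."""
    active = {requested}
    reserved = (set(recommended) | set(character_variants)) - active
    return Bundle(active=active, reserved=reserved)

if __name__ == "__main__":
    bundle = register("萬叶.com", ["萬葉.com", "万叶.com"], ["万葉.com"])
    bundle.activate("万叶.com")   # later request by the same registrant
    print(sorted(bundle.active), sorted(bundle.reserved))
```

Keeping the reserved names in the same record means no other registrant can obtain a name that differs only by variant characters.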
26Q & A