Title: Advanced Globalization Topics with Microsoft .NET Framework
1. Advanced Globalization Topics with Microsoft .NET Framework
- François Liger
- Program Manager
- Microsoft Corporation
- francl_at_microsoft.com
2. Summary
- Dealing with Cultures
- Handling strings
- Interoperability
- Dealing with encodings
3. Cultures
- The .NET Framework uses cultures
  - Cultures provide cultural preferences
  - They replace and extend the Win32 LCID
- CultureInfo class
  - Hierarchy derived from RFC 1766 (language-region names such as "fr-FR")
  - Three types of cultures
  - Two roles
    - CurrentCulture: globalization role
      - Date and number formatting
      - String comparison and casing
    - CurrentUICulture: localization role
      - Resource selection for the user interface
4. Culture types and roles
- Invariant culture
  - Culture-invariant default
- Neutral culture
  - Based on language only (e.g. "fr")
  - Resources only; no formatting
  - Usable as CurrentUICulture only
- Specific culture
  - Based on language and region (e.g. "fr-FR")
  - Resources and formatting specifics
  - Usable as CurrentCulture and CurrentUICulture
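The three culture types form a chain through CultureInfo.Parent: a specific culture's parent is its neutral culture, whose parent is the invariant culture. The following is a minimal sketch, assuming .NET 5 or later (C# top-level statements) with culture data available (not invariant globalization mode); `ParentChain` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: walk the Parent chain from a culture down to the invariant culture.
string[] ParentChain(string name)
{
    var names = new List<string>();
    CultureInfo culture = CultureInfo.GetCultureInfo(name);
    // The invariant culture has an empty name and is its own parent, so stop there.
    while (culture.Name.Length > 0)
    {
        names.Add(culture.Name);
        culture = culture.Parent;
    }
    names.Add("(invariant)");
    return names.ToArray();
}

foreach (string name in new[] { "fr-FR", "fr" })
{
    CultureInfo c = CultureInfo.GetCultureInfo(name);
    Console.WriteLine($"{name}: neutral={c.IsNeutralCulture}, chain={string.Join(" -> ", ParentChain(name))}");
}
```

This same chain is what resource fallback walks when a specific culture has no resources of its own.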
5. CultureInfo
6. Using CultureInfo for globalization
- CurrentCulture
  - Set by default (based on GetUserDefaultLCID)
  - Culture-sensitive APIs (except resources) use this culture by default

    string MyString = "this is a string";
    string MyCaps = MyString.ToUpper(); // casing rules of CurrentCulture
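Because ToUpper() without arguments follows CurrentCulture, the same code can produce different results for different users. A minimal sketch, assuming .NET 5 or later with culture data available; `UpperIn` is a hypothetical helper, and Turkish is used because its dotted/dotless "i" is the classic case where casing differs.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: upper-case text using an explicitly named culture.
string UpperIn(string text, string cultureName) =>
    text.ToUpper(CultureInfo.GetCultureInfo(cultureName));

string word = "file";
Console.WriteLine(UpperIn(word, "en-US"));   // FILE
Console.WriteLine(UpperIn(word, "tr-TR"));   // FİLE (capital I with dot above, U+0130)
Console.WriteLine(word.ToUpperInvariant());  // FILE, whatever CurrentCulture is
```

For user-visible text the culture-sensitive result is what users expect; for identifiers and keywords, ToUpperInvariant (or an ordinal comparison) avoids the surprise.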
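7. Setting culture explicitly
- Useful for server scenarios
- Examples

    using System.Globalization;
    using System.Threading;

    // UI resources: a neutral culture is enough
    Thread.CurrentThread.CurrentUICulture = new CultureInfo("ja");
    // Formatting: use a specific (language-region) culture
    Thread.CurrentThread.CurrentCulture = new CultureInfo("ja-JP");
    // ASP.NET: derive a specific culture from the browser's first preferred language
    Thread.CurrentThread.CurrentCulture =
        CultureInfo.CreateSpecificCulture(Request.UserLanguages[0]);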
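Browser language tags can carry quality weights (for example "ja;q=0.9"), and the first tag may be missing or unusable. The following is a minimal sketch of a more defensive per-request setup, assuming .NET 5 or later; `ChooseCulture` is a hypothetical helper, and the string array stands in for Request.UserLanguages.

```csharp
using System;
using System.Globalization;
using System.Threading;

// Hypothetical helper: pick a specific culture from browser-style language tags,
// ignoring ";q=" weights and tags the runtime rejects.
CultureInfo ChooseCulture(string[] userLanguages, string fallback)
{
    foreach (string tag in userLanguages)
    {
        string name = tag.Split(';')[0].Trim();
        if (name.Length == 0) continue;
        try
        {
            // Maps a neutral name such as "ja" to a specific culture ("ja-JP").
            return CultureInfo.CreateSpecificCulture(name);
        }
        catch (CultureNotFoundException)
        {
            // Rejected tag: try the next preference.
        }
    }
    return CultureInfo.CreateSpecificCulture(fallback);
}

CultureInfo chosen = ChooseCulture(new[] { "ja;q=0.9", "en-US;q=0.5" }, "en-US");
Thread.CurrentThread.CurrentCulture = chosen;     // formatting for this thread
Thread.CurrentThread.CurrentUICulture = chosen;   // resource selection for this thread
Console.WriteLine(chosen.Name);
```

How unknown tags are handled varies by runtime and platform, so the fallback culture is what guarantees a usable result.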
8. Using CultureInfo for globalization
- Explicitly indicating culture in API
  - Use when several cultures need to be handled at once

    using System;
    using System.Globalization;

    // Convert a string formatted using hi-IN conventions (Indian digit grouping)...
    CultureInfo HindiIndia = new CultureInfo("hi-IN");
    CultureInfo FrenchFrance = new CultureInfo("fr-FR");
    string s = "12,34,567.00";
    int n = int.Parse(s,
        NumberStyles.AllowDecimalPoint | NumberStyles.AllowThousands, HindiIndia);
    // ...and display it using fr-FR conventions
    Console.WriteLine(n.ToString("n", FrenchFrance));
9. Using CultureInfo for globalization
- CultureInfo.InvariantCulture
  - For UI, prefer culture-sensitive formatting
  - However, for culture-neutral operations (e.g. database storage, back-end processing, system object comparisons) you may need a stable, culture-independent format, such as:
    - "." as the decimal separator and "," as the group separator
    - dd MMMM yyyy HH:mm:ss GMT
    - ¤ as the international currency symbol
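A minimal sketch of the invariant format for storage, assuming .NET 5 or later. `StoreNumber`, `LoadNumber`, and `StoreTimestamp` are hypothetical helpers, and the timestamp helper assumes the DateTime it receives is already in UTC.

```csharp
using System;
using System.Globalization;

CultureInfo inv = CultureInfo.InvariantCulture;

// Hypothetical storage helpers: same text on every machine, whatever CurrentCulture is.
string StoreNumber(double value) => value.ToString("R", inv);
double LoadNumber(string text) => double.Parse(text, NumberStyles.Float, inv);
string StoreTimestamp(DateTime utc) => utc.ToString("dd MMMM yyyy HH:mm:ss 'GMT'", inv);

string stored = StoreNumber(1234.5);
Console.WriteLine(stored);                              // 1234.5
Console.WriteLine(LoadNumber(stored) == 1234.5);        // True
Console.WriteLine(StoreTimestamp(new DateTime(2003, 1, 15, 13, 5, 9, DateTimeKind.Utc)));
// 15 January 2003 13:05:09 GMT
Console.WriteLine(inv.NumberFormat.CurrencySymbol);     // ¤
```

Writing and reading with the same explicit culture is the point: a value stored under fr-FR ("1234,5") would not parse back correctly under en-US.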
10. Using cultures for localization
- Rely on the built-in resource fallback (specific culture → neutral culture → invariant/default resources)
- Only set CurrentUICulture explicitly if really required
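The fallback order can be pictured as a lookup that walks CultureInfo.Parent. The following is a simplified model, not ResourceManager's actual implementation: the in-memory tables and the `Lookup` helper are hypothetical, and .NET 5 or later is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical resource tables keyed by culture name ("" = invariant/default resources).
var resources = new Dictionary<string, Dictionary<string, string>>
{
    [""]   = new() { ["Greeting"] = "Hello" },
    ["fr"] = new() { ["Greeting"] = "Bonjour" },
};

// Try the requested culture, then each parent, ending at the invariant culture.
string Lookup(string key, CultureInfo culture)
{
    for (CultureInfo c = culture; ; c = c.Parent)
    {
        if (resources.TryGetValue(c.Name, out var table) && table.TryGetValue(key, out var value))
            return value;
        if (c.Name.Length == 0) throw new KeyNotFoundException(key);
    }
}

Console.WriteLine(Lookup("Greeting", CultureInfo.GetCultureInfo("fr-CA"))); // Bonjour
Console.WriteLine(Lookup("Greeting", CultureInfo.GetCultureInfo("de-DE"))); // Hello
```

This is why shipping neutral-culture resources ("fr") usually suffices: every French-speaking region falls back to them without extra code.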
11. Extending culture
12. Handling strings
- Combining characters
- Supplemental characters (Surrogate pairs)
- Recommendations
13. Combining characters
- Some characters can be expressed
  - As precomposed characters (e.g. é, U+00E9)
  - As a base character + combining character(s) (e.g. e + U+0301 COMBINING ACUTE ACCENT)
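The two forms are different char sequences, so an ordinal comparison sees them as different strings. A minimal sketch, assuming .NET 5 or later, that shows the mismatch and uses Unicode normalization (String.Normalize) to bring both forms to one representation:

```csharp
using System;
using System.Text;

string precomposed = "\u00E9";   // é as a single code point
string decomposed  = "e\u0301";  // e + COMBINING ACUTE ACCENT

Console.WriteLine(precomposed.Length);          // 1
Console.WriteLine(decomposed.Length);           // 2
Console.WriteLine(precomposed == decomposed);   // False: ordinal, char-by-char comparison

// Normalization Form C composes; Form D decomposes.
Console.WriteLine(decomposed.Normalize(NormalizationForm.FormC) == precomposed);   // True
Console.WriteLine(precomposed.Normalize(NormalizationForm.FormD) == decomposed);   // True
```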
14. Combining characters
- Supported through the StringInfo class
  - StringInfo.ParseCombiningCharacters()
    - Parses a string into text elements and returns the index at which each text element starts
    - A text element is a unit of text that is displayed as a single character
  - StringInfo.GetTextElementEnumerator()
    - Returns a TextElementEnumerator that walks the string one text element at a time
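A minimal sketch of both StringInfo methods on a string that contains a decomposed "é", assuming .NET 5 or later:

```csharp
using System;
using System.Globalization;

string text = "e\u0301a";   // decomposed "é" followed by "a": 3 chars, 2 text elements

// Start index of each text element.
int[] starts = StringInfo.ParseCombiningCharacters(text);
Console.WriteLine(string.Join(", ", starts));   // 0, 2

// Walk the same string one text element at a time.
TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(text);
while (elements.MoveNext())
{
    string element = elements.GetTextElement();
    Console.WriteLine($"index {elements.ElementIndex}: {element.Length} char(s)");
}
```

Indexing text[1] would return the bare combining accent; both methods keep it attached to its base character.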
15. Supplemental characters
- Over 1 million additional code points (U+10000 to U+10FFFF)
- Why should you care?
  - Used for East Asian languages
    - Used in Japan and Taiwan
    - Needed when converting to/from GB18030
  - Used for other scripts too
- Surrogate pairs
  - High surrogate + low surrogate
  - A surrogate pair represents one supplemental character
  - Strictly defined ranges for surrogate code units (high: U+D800 to U+DBFF, low: U+DC00 to U+DFFF)
  - Easier than DBCS (e.g. to walk strings)
  - Still requires some effort
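Because the surrogate ranges are fixed, a string can be walked without lookup tables: a high surrogate always starts a pair and a low surrogate always ends one. A minimal sketch, assuming .NET 5 or later; `CodePoints` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: split a UTF-16 string into Unicode code points,
// treating each high + low surrogate pair as one code point.
List<int> CodePoints(string s)
{
    var result = new List<int>();
    for (int i = 0; i < s.Length; i++)
    {
        if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
        {
            result.Add(char.ConvertToUtf32(s[i], s[i + 1]));
            i++; // skip the low surrogate just consumed
        }
        else
        {
            result.Add(s[i]); // BMP character (or an unpaired surrogate, kept as-is)
        }
    }
    return result;
}

string text = "a\uD840\uDC58b";
Console.WriteLine(text.Length);              // 4 UTF-16 code units
Console.WriteLine(CodePoints(text).Count);   // 3 code points
```

The "some effort" is the pairing check itself: code that assumes one char per character will miscount or split such strings.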
16. Supplemental characters
[Figure: characters in the Basic Multilingual Plane are 16-bit values; supplemental characters are 21-bit values, stored in UTF-16 as a high surrogate followed by a low surrogate, e.g. U+D840 U+DC58.]
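The surrogate pair U+D840 U+DC58 decodes as ((0xD840 − 0xD800) << 10) + (0xDC58 − 0xDC00) + 0x10000 = 0x10000 + 0x58 + 0x10000 = U+20058. A minimal sketch of that arithmetic in both directions, assuming .NET 5 or later; `FromSurrogates` and `ToSurrogates` are hypothetical helpers, checked against the built-in char.ConvertToUtf32.

```csharp
using System;

// Decode a surrogate pair: 10 bits from each half, plus 0x10000.
int FromSurrogates(char high, char low) =>
    ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;

// Encode a supplemental code point (U+10000..U+10FFFF) back into a pair.
(char High, char Low) ToSurrogates(int codePoint)
{
    int v = codePoint - 0x10000;
    return ((char)(0xD800 + (v >> 10)), (char)(0xDC00 + (v & 0x3FF)));
}

int cp = FromSurrogates('\uD840', '\uDC58');
Console.WriteLine($"U+{cp:X}");                                    // U+20058
Console.WriteLine(cp == char.ConvertToUtf32('\uD840', '\uDC58'));  // True
var (hi, lo) = ToSurrogates(cp);
Console.WriteLine($"{(int)hi:X} {(int)lo:X}");                     // D840 DC58
```

In practice char.ConvertToUtf32 and char.ConvertFromUtf32 do this work; the explicit version just shows why 20 bits of payload fit in two 16-bit code units.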
17. Supplemental characters
- Supported through the StringInfo class, with the same methods as for combining characters
  - StringInfo.ParseCombiningCharacters()
  - StringInfo.GetTextElementEnumerator()
- Encoding classes support supplemental characters where appropriate (e.g. GB18030)
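A minimal sketch of a GB18030 round trip that includes a supplemental character, assuming .NET 5 or later. On .NET Core and .NET 5+, code-page encodings such as GB18030 come from CodePagesEncodingProvider, which has to be registered once; the .NET Framework has them built in.

```csharp
using System;
using System.Text;

// Make code-page encodings (including GB18030) available to Encoding.GetEncoding.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding gb18030 = Encoding.GetEncoding("GB18030");

string text = "\u4E2D\uD840\uDC58";   // 中 (BMP) + U+20058 (supplemental, a surrogate pair)
byte[] bytes = gb18030.GetBytes(text);
Console.WriteLine(bytes.Length);                       // 6: 2 bytes for 中, 4 for U+20058
Console.WriteLine(gb18030.GetString(bytes) == text);   // True: the pair round-trips intact
```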
18. Surrogate pairs and combining characters
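Surrogate pairs and combining characters can occur together, and StringInfo treats the whole sequence as one text element. A minimal sketch, assuming .NET 5 or later, whose StringInfo follows Unicode extended grapheme clusters (older runtimes may segment such sequences differently); `Measure` is a hypothetical helper.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: UTF-16 code units vs. text elements for a string.
(int CodeUnits, int TextElements) Measure(string s) =>
    (s.Length, new StringInfo(s).LengthInTextElements);

// U+20058 (surrogate pair) + combining acute accent, then "e" + combining acute accent.
string text = "\uD840\uDC58\u0301e\u0301";
var (units, elements) = Measure(text);
Console.WriteLine($"{units} UTF-16 code units, {elements} text elements");
Console.WriteLine(string.Join(", ", StringInfo.ParseCombiningCharacters(text)));
```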
19. Comparing strings
- Which API to choose?
- Demo
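One common way to choose: culture-sensitive comparison for text shown to users, ordinal comparison for identifiers, keys, and protocol strings. A minimal sketch, assuming .NET 5 or later; the invariant culture is used for the linguistic case only to make the result deterministic, and a UI would normally use StringComparison.CurrentCulture.

```csharp
using System;

// Linguistic comparison: follows collation rules, so "apple" sorts before "Banana".
int linguistic = string.Compare("apple", "Banana", StringComparison.InvariantCulture);

// Ordinal comparison: compares UTF-16 code unit values; fast, stable, culture-independent.
int ordinal = string.CompareOrdinal("apple", "Banana");

Console.WriteLine(linguistic < 0);   // True
Console.WriteLine(ordinal > 0);      // True: 'a' (0x61) > 'B' (0x42)

// Case-insensitive check for system identifiers, unaffected by the user's culture.
Console.WriteLine(string.Equals("file", "FILE", StringComparison.OrdinalIgnoreCase)); // True
```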
20. String handling recommendations
- Deal with whole strings whenever possible
- Only deal with individual characters if there is no other choice
- Rely on the StringInfo methods to handle surrogates and combining characters
  - StringInfo.ParseCombiningCharacters
  - StringInfo.GetTextElementEnumerator
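A typical place where character-level code goes wrong is truncation: String.Substring counts UTF-16 code units and can cut a surrogate pair or separate a combining mark from its base. A minimal sketch of truncating by text elements instead, assuming .NET 5 or later; `TruncateText` is a hypothetical helper built on StringInfo.SubstringByTextElements.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: keep at most maxElements text elements, so a surrogate
// pair or a base character + combining marks is never cut in half.
string TruncateText(string s, int maxElements)
{
    if (maxElements <= 0) return string.Empty;
    var info = new StringInfo(s);
    return info.LengthInTextElements <= maxElements
        ? s
        : info.SubstringByTextElements(0, maxElements);
}

string text = "e\u0301\uD840\uDC58xyz";
string naive = text.Substring(0, 3);                          // "e\u0301\uD840"
Console.WriteLine(char.IsHighSurrogate(naive[naive.Length - 1])); // True: pair was split
Console.WriteLine(TruncateText(text, 2) == "e\u0301\uD840\uDC58"); // True
```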
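21. Interoperability with Win32
- Internationalization APIs are available in managed code
- However, sometimes a native API is needed
  - CultureInfo exposes an LCID (CultureInfo.LCID) for interop
  - Watch the string type, though: marshal strings as Unicode (e.g. CharSet.Unicode) so that characters are not lost in an ANSI code page conversion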
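A minimal sketch of passing CultureInfo.LCID to a native API, assuming .NET 5 or later (C# 9 extern local functions). It calls the Win32 GetLocaleInfoW function with the LOCALE_SENGLANGUAGE constant (0x1001) and only runs on Windows; `NativeEnglishLanguageName` is a hypothetical helper.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Runtime.InteropServices;
using System.Text;

const int LOCALE_SENGLANGUAGE = 0x1001;   // Win32 LCTYPE: English name of the language

// Hypothetical helper: returns null off Windows or when the native call fails.
string? NativeEnglishLanguageName(CultureInfo culture)
{
    if (!OperatingSystem.IsWindows()) return null;
    var buffer = new StringBuilder(85);
    int written = GetLocaleInfoW(culture.LCID, LOCALE_SENGLANGUAGE, buffer, buffer.Capacity);
    return written > 0 ? buffer.ToString() : null;
}

// CharSet.Unicode plus the explicit "W" entry point keep strings in UTF-16 on both
// sides of the call, so nothing passes through an ANSI code page.
[DllImport("kernel32.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
static extern int GetLocaleInfoW(int locale, int lcType, StringBuilder lcData, int cchData);

Console.WriteLine(NativeEnglishLanguageName(CultureInfo.GetCultureInfo("de-DE")) ?? "(Win32 API not available)");
```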
22. Interoperability with Win32
23. Encodings
- Encoding class
  - Provides the base class for all other encodings
  - Supports code page based encodings
    - E.g. GB18030 is supported through this class
- Unicode (UTF-16 and UTF-8) encoding classes
  - UnicodeEncoding (UTF-16)
  - UTF8Encoding
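UTF-16 and UTF-8 trade space differently depending on the script. A minimal sketch, assuming .NET 5 or later, that compares encoded sizes for a few characters and round-trips a string through UTF8Encoding:

```csharp
using System;
using System.Text;

var utf16 = new UnicodeEncoding();   // little-endian UTF-16
var utf8  = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

// "A", "é", "中", and the supplemental character U+20058.
foreach (string s in new[] { "A", "\u00E9", "\u4E2D", "\uD840\uDC58" })
{
    // GetByteCount measures the encoded size without allocating a byte array.
    Console.WriteLine($"U+{char.ConvertToUtf32(s, 0):X4}: UTF-16 {utf16.GetByteCount(s)} bytes, UTF-8 {utf8.GetByteCount(s)} bytes");
}

string text = "caf\u00E9 \uD840\uDC58";
Console.WriteLine(utf8.GetString(utf8.GetBytes(text)) == text);   // True
```

Latin text is usually smaller in UTF-8, CJK text in UTF-16, and supplemental characters take 4 bytes in both.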
24. Encodings
25. Questions and links
- Questions?
- Useful Links
- http://www.microsoft.com
- http://www.gotdotnet.com
- http://www.asp.net