Title: Globalization Features in Whidbey’s CLR
1. Globalization Features in Whidbey's CLR
- Michael Kaplan
- Technical Lead
- Globalization Infrastructure, Fonts and Tools
- Microsoft Windows International Division
- http://blogs.msdn.com/michkap
2. Customized Cultures and Regions
- CultureAndRegionInfoBuilder class
- Create an override to an existing culture
- Create based on an existing culture
- Create from scratch
- Must be an administrator to register
- Can register the file on multiple machines
3. CultureAndRegionInfoBuilder sample
- CultureAndRegionInfoBuilder carib = new CultureAndRegionInfoBuilder("de-DE-MineMine", CultureAndRegionModifiers.None);
- // Load up all of the existing data for German and for Germany
- carib.LoadDataFromCultureInfo(new CultureInfo("de-DE", false));
- carib.LoadDataFromRegionInfo(new RegionInfo("de"));
- // Change a property
- carib.ThreeLetterISORegionName = "ZZZ";
- // Register the culture on the machine
- carib.Register();
- // Use the new culture
- CultureInfo ci = new CultureInfo("de-DE-MineMine");
4. CaRIB serialization with LDML
- Locale Data Markup Language
- Described in UTS #35 at http://unicode.org/reports/tr35/
- CaRIB objects can be saved as LDML files
- Data can be loaded from LDML files
- CaRIB will do its best with files it did not create
5. LDML Sample
- string file1 = Path.GetTempFileName();
- File.Delete(file1);
- CultureInfo ci = new CultureInfo("ar-EG");
- RegionInfo ri = new RegionInfo("de-DE");
- CultureAndRegionInfoBuilder carib = new CultureAndRegionInfoBuilder("x-en-US-Pepsi", CultureAndRegionModifiers.None);
- carib.LoadDataFromCultureInfo(ci);
- carib.LoadDataFromRegionInfo(ri);
- carib.Save(file1);
- carib = CultureAndRegionInfoBuilder.CreateFromLdml(file1);
- carib.Register();
6. When Windows knows more than .NET
- As of XPSP2, there are 25 new locales in Windows
- Bengali - India
- Croatian - Bosnia and Herzegovina
- Bosnian - Bosnia and Herzegovina
- Serbian - Bosnia and Herzegovina (Latin)
- Serbian - Bosnia and Herzegovina (Cyrillic)
- Welsh - United Kingdom (more info in English, in Welsh)
- Maori - New Zealand
- Malayalam - India
- Maltese - Malta
- Quechua - Bolivia
- Quechua - Ecuador
- Quechua - Peru
- Setswana / Tswana - South Africa
- isiXhosa / Xhosa - South Africa
- isiZulu / Zulu - South Africa
- Sesotho sa Leboa / Northern Sotho - South Africa
- Northern Sami - Norway
- Northern Sami - Sweden
7. Windows-only Cultures
- The solution: Windows-only cultures!
- Synthesizes a CultureInfo object when Windows supports a locale that the .NET Framework does not know how to create itself
8. Windows-only culture test
- foreach (CultureInfo culture in CultureInfo.GetCultures(CultureTypes.WindowsOnlyCultures))
- {
-     Console.WriteLine(culture.Name);
- }
- // New cultures on XP SP2 include
- // mt-MT, bs-BA-Latn, smn-FI, smj-NO, smj-SE, sms-FI, sma-NO,
- // sma-SE, quz-BO, quz-EC, quz-PE, ml-IN, bn-IN, cy-GB, and more
9. Special CultureInfo support for SQL Server 2005 (Yukon)
- SQL Server locale semantics
- One setting for UI and formatting
- Another setting for collation/encoding
- .NET/Windows semantics
- One setting for UI
- Another setting for formatting/collation
- Solution
- Special GetCultureInfo overload that takes two CultureInfo names for the two SQL Server settings
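A minimal sketch of the two-name lookup, assuming the overload in question is CultureInfo.GetCultureInfo(string name, string altName) as it appears in the shipped framework. The class and method names (SqlStyleCultureDemo, Describe) are hypothetical, and the culture names passed in are arbitrary picks.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the framework or of the original deck.
public static class SqlStyleCultureDemo
{
    // Builds a culture from one name for UI/formatting data and a second
    // name for collation and casing, then reports which name each part uses.
    // Result layout: "<culture name>|<CompareInfo name>|<TextInfo culture name>".
    public static string Describe(string name, string altName)
    {
        CultureInfo ci = CultureInfo.GetCultureInfo(name, altName);
        return ci.Name + "|" + ci.CompareInfo.Name + "|" + ci.TextInfo.CultureName;
    }
}
```

Calling SqlStyleCultureDemo.Describe("en-US", "de-DE") is one way to check which of the two names drives formatting and which drives comparison on a given runtime.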
10. How Yukon uses this support
- Microsoft.ReportingServices.Diagnostics.Localization
- CatalogCulture
- ClientPrimaryCulture
- DefaultReportServerCulture
- FallbackUICulture
- InstalledCultureNames
- ReportParameterCulture
- SqlCulture
11. New locale properties/methods
- TextInfo
- CultureName
- LCID
- CompareInfo
- Name
- DateTimeFormatInfo
- ShortestDayNames
- MonthGenitiveNames
- AbbreviatedMonthGenitiveNames
- NumberFormatInfo
- NativeDigits
- DigitSubstitution
- CultureInfo
- IsCustomCulture
- IetfLanguageTag
- CultureTypes
- GetCultureInfo()
- GetCultureInfoByIetfLanguageTag()
- RegionInfo
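A short sketch touching a few of the members listed above (GetCultureInfo, MonthGenitiveNames, NativeDigits). The wrapper class and method names are hypothetical; the actual strings returned depend on the culture data installed on the machine, so the sketch only relies on the shape of the results.

```csharp
using System;
using System.Globalization;

// Hypothetical wrapper around members named on this slide.
public static class LocalePropertyDemo
{
    // GetCultureInfo hands back a cached, read-only instance, so two
    // lookups of the same name should yield the same object.
    public static bool IsCachedAndReadOnly(string name)
    {
        CultureInfo first = CultureInfo.GetCultureInfo(name);
        CultureInfo second = CultureInfo.GetCultureInfo(name);
        return first.IsReadOnly && ReferenceEquals(first, second);
    }

    // Genitive month names; the array has 13 slots, the last one being
    // used only by 13-month calendars.
    public static string[] GenitiveMonths(string name)
    {
        return CultureInfo.GetCultureInfo(name).DateTimeFormat.MonthGenitiveNames;
    }

    // The ten native digit strings for the culture, index 0 through 9.
    public static string[] NativeDigits(string name)
    {
        return CultureInfo.GetCultureInfo(name).NumberFormat.NativeDigits;
    }
}
```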
12. Updates to encodings
- Now built into the BCL
- Improved performance
- more flexibility
- consistent results across supported platforms
- Encoding enumeration API
- UTF-32 support (little endian and big endian)
- UTF-16 big endian support
- Encoding/decoding fallbacks
- Exception
- Replacement
- Best fit
- Custom
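The UTF-32 support and the replacement and exception fallbacks can be sketched as follows. The EncodingDemo class and its method names are hypothetical; the framework calls used are UTF32Encoding(bool, bool) and Encoding.GetEncoding(string, EncoderFallback, DecoderFallback).

```csharp
using System;
using System.Text;

// Hypothetical demo class; only the System.Text calls come from the framework.
public static class EncodingDemo
{
    // UTF-32 big endian without a byte order mark: four bytes per code point.
    public static byte[] Utf32BigEndian(string s)
    {
        return new UTF32Encoding(true, false).GetBytes(s);
    }

    // Replacement fallback: characters ASCII cannot represent become "?".
    public static string AsciiWithReplacement(string s)
    {
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));
        return ascii.GetString(ascii.GetBytes(s));
    }

    // Exception fallback: the same input raises EncoderFallbackException.
    public static bool AsciiThrows(string s)
    {
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strict.GetBytes(s);
            return false;
        }
        catch (EncoderFallbackException)
        {
            return true;
        }
    }
}
```

With these helpers, "A" encodes to the bytes 00 00 00 41, "caf\u00E9" comes back as "caf?" under the replacement fallback, and the strict encoding throws for the same input.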
13.
- public class NumericEntitiesFallback : EncoderFallback {
-     public override EncoderFallbackBuffer CreateFallbackBuffer() {
-         return new NEFallbackBuffer();
-     }
-     public override int MaxCharCount {
-         get {
-             return 8;
-         }
-     }
- }
- public class NEFallbackBuffer : EncoderFallbackBuffer {
-     // Store our default string
-     private String strEntity;
-     int fallbackCount = -1;
-     int fallbackIndex = 0;
-     // Fallback Methods
-     public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index) {
-         // If we had a buffer already we're being recursive, throw, it's
-         // probably at the suspect character in our array.
-         if (fallbackCount >= 0)
-             ThrowLastCharRecursive(Char.ConvertToUtf32(charUnknownHigh, charUnknownLow));
-         // Go ahead and get our fallback
-         strEntity = String.Format("&#{0};", Char.ConvertToUtf32(charUnknownHigh, charUnknownLow));
-         fallbackCount = strEntity.Length;
-         fallbackIndex = 0;
-         return fallbackCount != 0;
-     }
-     public override char GetNextChar() {
-         // We want it to get < 0 because 0 means that the current/last
-         // character is a fallback and we need to detect recursion. We
-         // could have a flag but we already have this counter.
-         fallbackCount--;
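Since the listing above is cut off, here is a compact, self-contained sketch of the same idea: a custom encoder fallback that emits a decimal numeric character reference for anything the target encoding cannot represent. All type names (NumericEntityFallback, NumericEntityFallbackBuffer, NumericEntityDemo) are hypothetical, the decimal "&#N;" form is an assumption, and the recursion detection discussed in the slide's comments is left out for brevity.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical fallback: replaces unencodable characters with "&#N;".
public sealed class NumericEntityFallback : EncoderFallback
{
    // Longest possible output is "&#1114111;" (U+10FFFF), ten characters.
    public override int MaxCharCount { get { return 10; } }

    public override EncoderFallbackBuffer CreateFallbackBuffer()
    {
        return new NumericEntityFallbackBuffer();
    }
}

public sealed class NumericEntityFallbackBuffer : EncoderFallbackBuffer
{
    private string entity = string.Empty; // text being handed back
    private int position;                 // index of the next char to return

    public override int Remaining { get { return entity.Length - position; } }

    // Single unencodable UTF-16 code unit.
    public override bool Fallback(char charUnknown, int index)
    {
        return Start(charUnknown);
    }

    // Unencodable surrogate pair: combine into one code point first.
    public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
    {
        return Start(char.ConvertToUtf32(charUnknownHigh, charUnknownLow));
    }

    private bool Start(int codePoint)
    {
        entity = "&#" + codePoint.ToString(CultureInfo.InvariantCulture) + ";";
        position = 0;
        return true;
    }

    // Returns the next replacement character, or '\0' when exhausted.
    public override char GetNextChar()
    {
        return position < entity.Length ? entity[position++] : '\0';
    }

    public override bool MovePrevious()
    {
        if (position > 0)
        {
            position--;
            return true;
        }
        return false;
    }

    public override void Reset()
    {
        entity = string.Empty;
        position = 0;
    }
}

public static class NumericEntityDemo
{
    // Encodes to ASCII with the fallback above and returns the bytes as text.
    public static string EncodeToAscii(string s)
    {
        Encoding enc = Encoding.GetEncoding(
            "us-ascii",
            new NumericEntityFallback(),
            new DecoderReplacementFallback("?"));
        return Encoding.ASCII.GetString(enc.GetBytes(s));
    }
}
```

Because the replacement text is pure ASCII, it always encodes cleanly here; a fallback whose output could itself be unencodable would need the recursion check that the slide's comments describe.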
14. Collation Improvements
- OrdinalIgnoreCase
- Same results as ToUpper/Ordinal
- Matches OS file system results
- Correct Serbian collation
- Fixed in Windows XPSP2
- Customer reported (MSDN Feedback Center)
- Better handling of ignored/ignorable characters
- IndexOf/LastIndexOf/IsPrefix/IsSuffix
- StartsWith/EndsWith, too
15. OrdinalIgnoreCase sample
- string strTest1 = "IamAString";
- string strTest2 = "STRING";
- if (strTest1.EndsWith(strTest2, StringComparison.OrdinalIgnoreCase))
-     Console.WriteLine("Successful test!");
16. Unicode normalization
- Described in UAX #15 at http://www.unicode.org/reports/tr15/
- String.IsNormalized()
- String.IsNormalized(NormalizationForm normalizationForm)
- String.Normalize()
- String.Normalize(NormalizationForm normalizationForm)
- NormalizationForm enumeration
- FormC, FormD, FormKC, FormKD
- õhµ (U+00F5 U+0068 U+0302 U+00B5 U+00A8): LATIN SMALL LETTER O WITH TILDE, LATIN SMALL LETTER H, COMBINING CIRCUMFLEX ACCENT, MICRO SIGN, DIAERESIS
- FormC: õhµ (U+00F5 U+0125 U+00B5 U+00A8)
- FormD: ohµ (U+006F U+0303 U+0068 U+0302 U+00B5 U+00A8)
- FormKC: õhµ (U+00F5 U+0125 U+03BC U+0020 U+0308)
- FormKD: ohµ (U+006F U+0303 U+0068 U+0302 U+03BC U+0020 U+0308)
- In collation, õhµ ? ohµ ? õhµ ? ohµ
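The four forms listed above can be reproduced with String.Normalize. In this sketch the NormalizationDemo class, its Sample constant, and the CodePoints helper are hypothetical names; the sample string is the five-character sequence from the slide.

```csharp
using System;
using System.Text;

// Hypothetical helper for inspecting normalization results.
public static class NormalizationDemo
{
    // The slide's sample: U+00F5 U+0068 U+0302 U+00B5 U+00A8.
    public const string Sample = "\u00F5\u0068\u0302\u00B5\u00A8";

    // Formats each UTF-16 code unit as U+XXXX, separated by spaces.
    // (All characters in the sample are in the BMP, so one code unit
    // corresponds to one code point.)
    public static string CodePoints(string s)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    // Normalizes the sample to the requested form and dumps its code points.
    public static string Dump(NormalizationForm form)
    {
        return CodePoints(Sample.Normalize(form));
    }
}
```

For example, NormalizationDemo.Dump(NormalizationForm.FormKC) yields "U+00F5 U+0125 U+03BC U+0020 U+0308", matching the FormKC line above, and Sample.IsNormalized(NormalizationForm.FormC) is false because h plus the combining circumflex composes to U+0125.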
17.
- namespace àáâãäå {
-     using System;
-     using System.Text;
-     using System.Globalization;
-     class àáâãäa
-     {
-         [STAThread]
-         static void Main(string[] args) {
-             àáâãäå(); aaaaaa(); aáâãäå(); aaâãäå(); aaaãäå(); aaaaäå(); aaaaaå();
-         }
-         static void àáaaäå(string aaâãaa) {
-             StringBuilder àáâãäa = new StringBuilder();
-             StringInfo àaâãäå = new StringInfo(aaâãaa);
-             àáâãäa.Append(aaâãaa.Normalize(NormalizationForm.FormC));
-             àáâãäa.Append(" ");
-             for (int aaâaaå = 0; aaâaaå < àaâãäå.LengthInTextElements; aaâaaå++)
18. IDN Mapping APIs
- IdnMapping class
- Based on three RFCs (standard based on Unicode 3.2)
- 3490 - Internationalizing Domain Names in Applications (IDNA)
- 3491 - Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)
- 3492 - Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
- \u5B89\u5BA4\u5948\u7F8E\u6075-with-SUPER-MONKEYS becomes xn---with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n
- Properties
- AllowUnassigned (allows new Unicode characters)
- UseStd3AsciiRules (more like DNS rules)
- Methods
- GetAscii - Gets ASCII (Punycode) version of the string
- GetUnicode - Gets Unicode version of the string, normalized and limited to IDNA characters.
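A minimal round trip through the two methods, using a default-constructed IdnMapping. The IdnDemo wrapper and the domain name are illustrative choices, not taken from the deck.

```csharp
using System;
using System.Globalization;

// Hypothetical wrapper around IdnMapping.GetAscii / GetUnicode.
public static class IdnDemo
{
    // Converts a Unicode domain name to its ASCII (Punycode) form.
    public static string ToAscii(string unicodeName)
    {
        return new IdnMapping().GetAscii(unicodeName);
    }

    // Converts an ASCII (Punycode) domain name back to Unicode.
    public static string ToUnicode(string asciiName)
    {
        return new IdnMapping().GetUnicode(asciiName);
    }
}
```

IdnDemo.ToAscii("b\u00FCcher.example") gives "xn--bcher-kva.example", and IdnDemo.ToUnicode on that result restores the original name. Only labels containing non-ASCII characters receive the xn-- prefix.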
19. Unicode property information
- New CharUnicodeInfo class
- Extends methods on Char
- Official data from the Unicode Character Database at http://www.unicode.org/ucd/
- IsWhiteSpace
- GetNumericValue
- GetDigitValue
- GetDecimalDigitValue
- GetUnicodeCategory
- GetBidiCategory
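The difference between the numeric, digit, and decimal-digit values is easiest to see side by side. This sketch uses only the public CharUnicodeInfo methods; the UnicodePropertyDemo class and its output layout are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper that lines up four CharUnicodeInfo lookups.
public static class UnicodePropertyDemo
{
    // Result layout: "<category>|<numeric value>|<digit value>|<decimal digit value>".
    // The value lookups return -1 when the property does not apply.
    public static string Describe(char c)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "{0}|{1}|{2}|{3}",
            CharUnicodeInfo.GetUnicodeCategory(c),
            CharUnicodeInfo.GetNumericValue(c),
            CharUnicodeInfo.GetDigitValue(c),
            CharUnicodeInfo.GetDecimalDigitValue(c));
    }
}
```

Describe('\u00B2') (superscript two) returns "OtherNumber|2|2|-1": it has a digit value but is not a decimal digit. Describe('\u00BD') (vulgar fraction one half) returns "OtherNumber|0.5|-1|-1", and Describe('\u0663') (Arabic-Indic digit three) returns "DecimalDigitNumber|3|3|3".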
20. New text element support in the StringInfo class
- StringInfo ctor that takes a string
- StringInfo.String
- StringInfo.LengthInTextElements
- StringInfo.SubstringByTextElements()
- Both use ParseCombiningCharacters() to get their results
21. New StringInfo props/methods sample
- StringInfo si = new StringInfo("A\u0300\u0301\u0300e\u0300\u0301\u0300");
- Console.WriteLine(si.LengthInTextElements); // Length is two
- for (int ich = 0; ich < si.LengthInTextElements; ich++)
-     Console.WriteLine(si.SubstringByTextElements(ich, 1));
22. New supplementary character support in lots of methods
- New signature -- (String s, int index)
- IsControl, IsDigit, IsLetter, IsLetterOrDigit, IsLower, IsNumber, IsPunctuation, IsSeparator, IsSurrogate, IsSymbol, IsUpper, IsWhiteSpace, GetUnicodeCategory, GetNumericValue, IsHighSurrogate, IsLowSurrogate, IsSurrogatePair
- ConvertToUtf32, ConvertFromUtf32 methods
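A brief sketch of the (String s, int index) overloads applied to a supplementary character. U+10400 (DESERET CAPITAL LETTER LONG I) is an arbitrary example, and the SupplementaryDemo class is a hypothetical wrapper.

```csharp
using System;
using System.Globalization;

// Hypothetical helper exercising the (string, int) overloads on Char.
public static class SupplementaryDemo
{
    // U+10400 is stored as the surrogate pair D801 DC00 in UTF-16.
    public const string DeseretLongI = "\U00010400";

    // Result layout: "U+<hex code point>|pair=<bool>|letter=<bool>|upper=<bool>".
    public static string Describe(string s, int index)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "U+{0:X}|pair={1}|letter={2}|upper={3}",
            char.ConvertToUtf32(s, index),
            char.IsSurrogatePair(s, index),
            char.IsLetter(s, index),
            char.IsUpper(s, index));
    }
}
```

SupplementaryDemo.Describe(SupplementaryDemo.DeseretLongI, 0) returns "U+10400|pair=True|letter=True|upper=True", even though the string's Length is 2, and char.ConvertFromUtf32(0x10400) rebuilds the same two-char string.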
23. References
- MSDN Magazine Article
- Make the .NET World a Friendlier Place with the Many Faces of the CultureInfo Class, March 2005 - http://msdn.microsoft.com/msdnmag/issues/05/03/CultureInfo/
- SQL Server Books Online
- International Considerations for SQL Server - http://whidbey.msdn.microsoft.com/library/en-us/icsql9/html/50dc4fa8-4772-46a8-a8ef-bc134502b4e0.asp
- My Blog
- http://blogs.msdn.com/michkap
- Some other blogs for intl support in Whidbey
- http://blogs.msdn.com/AchimR
- http://www.dasblonde.net/
- http://blogs.msdn.com/BCLTeam
- Other useful sites
- http://www.microsoft.com/globaldev/
- http://lab.msdn.microsoft.com/productfeedback/
- http://www.unicode.org/
24. Globalization Features in Whidbey's CLR: Questions