An ICU Overview

Prague 5/17/09. 23rd International Unicode Conference ...

1
An ICU Overview
  • Mark Davis
  • Chief Globalization Architect, IBM
  • IBM Globalization Center of Competency

2
Agenda
  • What is ICU?
  • Architecture Overview
  • Significant New ICU Features
  • Near Future Features
  • References
  • Q and A

3
ICU Features
  • Unicode text handling
  • Character set conversions (700)
  • Collation & Searching
  • Locales (170)
  • Resource Bundles
  • Calendar & Time zones
  • Complex-text layout engine
  • Regular Expressions
  • Breaks: character, word, line, sentence
  • Formatting
      • Date & time
      • Messages
      • Numbers & currencies
  • Transforms
      • Normalization
      • Casing
      • Transliterations

4
Unicode Text Handling
  • C
      • UChar *: null-terminated or with length
  • C++
      • UnicodeString: full featured string class
  • Java
      • Uses normal JDK String, adds utilities
  • All handle supplementary characters
      • Required for GB 18030 and JIS 213 repertoire
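
A minimal Java sketch of what "handling supplementary characters" means with a normal JDK String. It uses only JDK methods, not ICU4J utilities; the class name CodePointDemo and the sample string are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDemo {
    /** Counts Unicode code points; a surrogate pair counts as one character. */
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Lists the code points of a string, stepping over surrogate pairs. */
    static List<Integer> codePoints(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(cp);
            i += Character.charCount(cp); // 1 for BMP, 2 for supplementary
        }
        return out;
    }

    public static void main(String[] args) {
        // 'a', U+20000 (a supplementary CJK ideograph), 'b'
        String s = "a\uD840\uDC00b";
        System.out.println(s.length());         // UTF-16 code units
        System.out.println(countCodePoints(s)); // code points
        System.out.println(codePoints(s));
    }
}
```

Code that indexes by char instead of by code point would split U+20000 into two meaningless halves, which is the failure this kind of handling avoids.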

5
Unicode Text Handling II
  • All Unicode properties
  • UnicodeSet
      • fast, low-memory
      • boolean combinations of properties & ranges
      • [\p{whitespace}\p{Latin}-[aeiuo]]
      • in regular expressions, transform filters,
        stand-alone
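
The idea of combining properties and ranges with boolean operations can be sketched with a JDK regular-expression character class. This is a stand-in for illustration, not the ICU UnicodeSet API; the class name PropertySetDemo is an assumption, and the JDK spells set difference as an intersection with a negated class.

```java
import java.util.regex.Pattern;

public class PropertySetDemo {
    // (White_Space union Latin script) minus the literal letters a, e, i, u, o.
    static final Pattern SET =
            Pattern.compile("[[\\p{IsWhite_Space}\\p{IsLatin}]&&[^aeiuo]]");

    /** Returns true if the single code point belongs to the set. */
    static boolean contains(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return SET.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(contains('b')); // Latin letter, not excluded
        System.out.println(contains('a')); // excluded letter
        System.out.println(contains(' ')); // white space
        System.out.println(contains('1')); // neither Latin nor white space
    }
}
```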

6
Character Set Conversion
  • 700 supported character sets
  • Precise alias information
      • When you ask for SJIS, you can request the
        precise definition (windows, ibm, solaris, ...)
  • Buffer management handles characters that cross
    buffers
  • Customizations allowed for illegal sequences and
    undefined characters
  • Unicode Text Compression: SCSU, BOCU
7
Collation and Searching
  • Fast international comparison and string search;
    fully UCA compliant
  • Compressed sort keys, optimized string
    comparison, sublinear string search
  • Supports precise binary sortkey stability over
    time
  • Fully data driven
  • API / rule customizations: strength,
    normalization, upper vs. lowercase first, ...
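
A minimal sketch of the strength customization and of sort keys, using the JDK's java.text.Collator as a stand-in for the ICU collator. The class name CollationDemo and the sample strings are assumptions for illustration.

```java
import java.text.Collator;
import java.util.Locale;

public class CollationDemo {
    /** Compares two strings at the given collation strength. */
    static int compare(String a, String b, int strength) {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setStrength(strength);
        return c.compare(a, b);
    }

    /** Checks that comparing sort keys orders the strings like a direct comparison. */
    static boolean sortKeysAgree(String a, String b) {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        int direct = Integer.signum(c.compare(a, b));
        int viaKeys = Integer.signum(
                c.getCollationKey(a).compareTo(c.getCollationKey(b)));
        return direct == viaKeys;
    }

    public static void main(String[] args) {
        // PRIMARY strength ignores case differences; TERTIARY does not.
        System.out.println(compare("abc", "ABC", Collator.PRIMARY));
        System.out.println(compare("abc", "ABC", Collator.TERTIARY));
        System.out.println(sortKeysAgree("apple", "Banana"));
    }
}
```

Sort keys pay off when the same strings are compared many times, for example when sorting a large list or building a database index.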

8
Calendar & Time Zones
  • International Calendars: Arabic, Buddhist,
    Hebrew, and Japanese
      • Required for correct presentation of dates in
        some countries.
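
To illustrate why non-Gregorian calendars matter for presentation, here is a sketch using the JDK's java.time.chrono classes as a stand-in for the ICU calendars. The JDK covers Thai Buddhist and Japanese calendars among others but has no Hebrew calendar; the class name CalendarDemo and the sample date are assumptions.

```java
import java.time.LocalDate;
import java.time.chrono.JapaneseDate;
import java.time.chrono.ThaiBuddhistDate;
import java.time.temporal.ChronoField;

public class CalendarDemo {
    /** Year of the same day in the Thai Buddhist calendar (ISO year + 543). */
    static int buddhistYear(LocalDate iso) {
        return ThaiBuddhistDate.from(iso).get(ChronoField.YEAR);
    }

    /** Year within the current imperial era of the Japanese calendar. */
    static int japaneseYearOfEra(LocalDate iso) {
        return JapaneseDate.from(iso).get(ChronoField.YEAR_OF_ERA);
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2003, 3, 24);
        System.out.println(buddhistYear(day));      // Buddhist-era year
        System.out.println(japaneseYearOfEra(day)); // year of the Heisei era
    }
}
```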

9
Formatting
  • Date & time
  • Messages
      • Completely localizable, Plural support
  • Numbers & currencies
      • Scientific Notation, Spelled-out (checks, etc.)
      • Dual Currency support e.g. Indian Rupee
          • In Hindi
          • In English: 1,234.57 Rupees
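
A sketch of message and number formatting with the JDK's java.text classes, as a stand-in for the ICU formatters. The JDK's choice format approximates plural handling here; the pattern text, the class name FormatDemo, and the input values are assumptions for illustration.

```java
import java.text.MessageFormat;
import java.text.NumberFormat;
import java.util.Locale;

public class FormatDemo {
    /** Builds a count-sensitive message from a localizable pattern. */
    static String fileCount(int n) {
        MessageFormat mf = new MessageFormat(
                "{0,choice,0#no files|1#one file|1<{0,number,integer} files}",
                Locale.ENGLISH);
        return mf.format(new Object[] { n });
    }

    /** Formats an amount with grouping separators and two fraction digits. */
    static String amount(double value) {
        NumberFormat nf = NumberFormat.getNumberInstance(Locale.ENGLISH);
        nf.setMinimumFractionDigits(2);
        nf.setMaximumFractionDigits(2);
        return nf.format(value);
    }

    public static void main(String[] args) {
        System.out.println(fileCount(0));
        System.out.println(fileCount(1));
        System.out.println(fileCount(5));
        System.out.println(amount(1234.567) + " Rupees");
    }
}
```

Because the pattern is data rather than code, a translator can reorder or reword it per locale without touching the program.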

10
Transforms
  • Unicode Normalization
      • Highly optimized for performance
      • performance utilities: concatenation, detection,
        comparison
  • Casing (upper, lower, title, folding)
  • General Transforms
      • Script transliterations
      • Half-width/Full-width, Hex, etc.
      • Chain transforms together, filter source
        characters
      • Rule-based, customizable at runtime.
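
Normalization, detection, and casing can be sketched with JDK classes (java.text.Normalizer and String casing), used here as stand-ins for the ICU transforms. The class name NormalizeDemo and the sample strings are assumptions for illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

public class NormalizeDemo {
    /** Composes base letters and combining marks where possible (NFC). */
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** Decomposes precomposed characters (NFD). */
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    /** Detection: is the string already in NFC? */
    static boolean isNfc(String s) {
        return Normalizer.isNormalized(s, Normalizer.Form.NFC);
    }

    /** Locale-independent full uppercasing; the result may be longer than the input. */
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // 'e' followed by combining acute accent
        System.out.println(nfc(decomposed).length());
        System.out.println(isNfc(decomposed));
        System.out.println(upper("stra\u00dfe"));
    }
}
```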

11
Word, line & sentence breaks
  • Fast state-table implementation
  • Customizable
      • Rule-based, customizable at runtime
      • Special customizations, e.g. Thai
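
A short sketch of word-boundary iteration, using the JDK's java.text.BreakIterator as a stand-in for the ICU break iterators. The class name WordBreakDemo, the sample sentence, and the rule "keep pieces that start with a letter or digit" are assumptions for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBreakDemo {
    /** Splits text at word boundaries and keeps only the word-like pieces. */
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Boundaries also delimit spaces and punctuation; skip those pieces.
            if (Character.isLetterOrDigit(piece.codePointAt(0))) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! It works."));
    }
}
```

Line and sentence iterators follow the same loop; only the factory method changes.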

12
Complex-text layout engine
  • Glyph processing, positioning adjustment
      • ligature substitution, contextual forms, kerning,
        accent placement, Bidi scripts, etc.
  • Support for:
      • Drawing
      • Caret Display
      • Hit Testing
      • Selection Highlighting
      • Caret Movement
      • Layout Metrics
      • Line Break
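
One piece of the layout problem, splitting mixed-direction text into directional runs, can be sketched with the JDK's java.text.Bidi. This is only a stand-in for the Bidi step and does not show glyph processing or the ICU layout engine; the class name BidiDemo and the sample text are assumptions.

```java
import java.text.Bidi;

public class BidiDemo {
    /** Returns the embedding level of each directional run in the text. */
    static int[] runLevels(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_LEFT_TO_RIGHT);
        int[] levels = new int[bidi.getRunCount()];
        for (int i = 0; i < levels.length; i++) {
            levels[i] = bidi.getRunLevel(i); // even = left-to-right, odd = right-to-left
        }
        return levels;
    }

    public static void main(String[] args) {
        // Latin text followed by three Hebrew letters.
        String mixed = "abc \u05D0\u05D1\u05D2";
        for (int level : runLevels(mixed)) {
            System.out.println(level);
        }
    }
}
```

Caret movement and hit testing build on these runs, because logical order and visual order differ inside a right-to-left run.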

13
Architecture Overview
  • Locale Based Services
      • Locale is an identifier, not a container
      • Object in C++ and Java, char * in C
      • Default locale is set to the platform locale
      • Resource inheritance
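
Resource inheritance can be sketched as a fallback chain: a lookup for a specific locale falls back to its language and then to the root. The example below uses the JDK's ResourceBundle.Control to show that chain; it is not the ICU resource bundle API, and the bundle base name "Messages" and class name LocaleFallbackDemo are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class LocaleFallbackDemo {
    /** Lists the locales searched, most specific first, when loading a bundle. */
    static List<Locale> fallbackChain(Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        // "Messages" is a hypothetical base name; no bundle is actually loaded here.
        return control.getCandidateLocales("Messages", locale);
    }

    public static void main(String[] args) {
        // Austrian German falls back to German, then to the root locale.
        System.out.println(fallbackChain(Locale.forLanguageTag("de-AT")));
    }
}
```

The locale itself carries no data; it only names the starting point of the search, which matches the "identifier, not a container" point above.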

14
Architecture Overview
  • Open and Close Service Model
      • Better performance by avoiding setup costs per
        operation
  • ICU Threading Model
      • Multiple versions in use simultaneously
      • Large resources shared in read-only cache
  • Modularization
      • Link against multiple ICU versions
      • Build partial ICU versions

15
Architecture Overview
  • Data Driven Services
      • Customize at build-time or run-time
      • Interchange with other platforms:
        same results on each
  • Rule-based
      • Collation, Word-breaks, Transforms
  • Pattern-based
      • Formats, UnicodeSet
  • Table-based
      • Character Conversion

16
Architecture Overview: ICU4C
  • Simple Error Handling
  • C++ subset for portability
  • Support for multi-threaded environment
  • Version Management
      • Multiple versions at the same time
      • Data and library versioning
  • String Buffer Management
      • Preflighting and overflow protection
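
Preflighting means asking a function how much output space it needs before providing a buffer. The sketch below shows the idiom in Java with a hypothetical helper; it is not an ICU4C function. In ICU4C the same pattern is expressed through a destination capacity argument, a returned length, and an error code.

```java
import java.util.Locale;

public class PreflightDemo {
    /**
     * Hypothetical helper: uppercases src into dest if dest is large enough.
     * Always returns the required length, and never writes past the buffer.
     */
    static int toUpper(char[] dest, String src) {
        String result = src.toUpperCase(Locale.ROOT);
        if (dest != null && dest.length >= result.length()) {
            result.getChars(0, result.length(), dest, 0);
        }
        return result.length();
    }

    public static void main(String[] args) {
        String src = "stra\u00dfe";             // 6 chars, but uppercases to 7
        int needed = toUpper(null, src);        // preflight: no buffer yet
        char[] buffer = new char[needed];
        toUpper(buffer, src);                   // second call fills the buffer
        System.out.println(needed + " " + new String(buffer));
    }
}
```

The example input is chosen because its uppercase form is longer than the original, so sizing the buffer from the input length would overflow.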

17
Recent Features (I)
  • Unicode Regular Expressions (phase 1)
      • Full Unicode properties/values
  • Charset Conversion Enhancements
      • Alias Management: platform matching
      • Compression preserving binary order
      • Customization
  • Modularized ICU library building
  • Service Registration (phase 1)
  • Dual Currency Support

18
Recent Features (II)
  • Memory Management
      • Load and unload ICU libraries
      • Choice of heap allocation
  • Performance
      • Collation
      • Fast Unicode Normalization
      • UnicodeSet
  • Test framework tests

19
2003: SS 2.6, WS 2.8
  • Unicode 4.0 Update
  • More multi-threading support, customization,
    modularization
  • Improved RegEx, TextBoundaries, TextLayout
  • IDN conversion
  • Collation: UCA 4.0, Partial Sort Keys,
    Multi-charset
  • Ongoing work: porting, docs, perf., ...
  • Related: LDML
20
References
  • ICU main site
      • http://oss.software.ibm.com/icu/
      • Links to:
          • Download ICU
          • User Guide, Technical FAQ, Support, Bug Reports
  • Unicode Consortium
      • http://www.unicode.org
      • Unicode glossary, Unicode character database
  • IBM Developerworks
      • http://www.ibm.com/developerworks/unicode

21
Questions and Answers