Title: Windows 2000 Indian Language Developers Conference
1Windows 2000 Indian Language Developers Conference
- F. Avery Bishop, Senior Program Manager for Multilingual Developer Communications
- David C. Brown, Development Lead for Complex Script Enabling in Windows Operating Systems
- Microsoft Corporation
2Agenda for the Day
- Welcome and Keynote
- International Features of Windows 2000
- Complex Script Processing in Windows 2000
- Uniscribe: The Unicode Script Processor
- Lunch
- Guidelines for supporting complex scripts in Win32 applications
- Supporting Indian text in Enterprise applications
- Introduction to Open Type Fonts
- Microsoft developer programs in India
3Updates on Session Materials
- Today's presentations vary slightly from your session handouts
- For updates to ppt files and demos, see www.microsoft.com/globaldev
4International Features in Microsoft Windows 2000
- F. Avery Bishop
- Senior Program Manager
- Microsoft Corporation
5Agenda: International Features of Windows 2000
- Definitions of key concepts
- Windows 2000 single-binary internationalization
- Multilingual content
- Windows 2000 Multilanguage version
- New complex script support, including
- Support for Indian languages
- Complex Scripts in web pages
- Right-to-left layout of shell, applications
(Old name: Windows NT 5.0)
6Definitions
- Script: A set of symbols used to write one or more languages
- Locale
- A place or locality (dictionary definition)
- Set of user preferences related to language and local customs
- Language Group: Term used to describe the supported script families in Windows NT 5
7Definitions
- System Locale: Not really a locale. Determines which script non-Unicode applications will support (e.g., which Windows 9x system Windows NT emulates)
- User Locale: User preferences for formatting of dates, currencies, numbers, etc.
- Input Locale: Pairing of input language and method of input; determines what language is currently being entered and how
8Definitions
- Enabling for a script: Adding support for input, display, and output of the script
- Localization: Translating user interface elements
- Globalization: Developing software such that feature design and code design are not limited to a single locale or script
9Definitions
- Complex Scripts: Scripts that require contextual
processing for display, editing, and other
processing
10All language versions of Windows 2000 use the same core binary files! So What?
- Advantages to users
- Can enter text in any supported language on any version of Windows 2000
- Any language version of a well-written Win32 app runs on any language version of Windows 2000
- Advantages to developers
- Develop all language versions on one system
- Can develop and ship a single binary for all languages
11More on Unified language Support in Windows 2000
- Effect of system default locale on application
- ANSI applications require appropriate system locale setting
- ANSI/Unicode applications may require system locale setting (more on this later)
- Pure Unicode applications work with any system locale
- Native Unicode support
- Important: New scripts will have no codepage; the support is through Unicode only (e.g., Indian scripts, Armenian, Georgian)
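The point about Unicode-only scripts can be illustrated with a minimal sketch. It uses Python's standard codecs as a stand-in for Windows code pages, and the sample word and the helper name `encodable` are assumptions chosen for illustration. It shows that Devanagari text cannot pass through a Western code page but survives intact in UTF-16, the encoding used by the wide-character APIs.

```python
# A minimal sketch: Devanagari text has no slot in a Western ANSI code page,
# so only a Unicode encoding can carry it.
text = "\u0939\u093F\u0928\u094D\u0926\u0940"  # "Hindi" written in Devanagari


def encodable(s, codec):
    """Return True if every character of s exists in the given codec."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


ansi_ok = encodable(text, "cp1252")   # Western European code page
utf16 = text.encode("utf-16-le")      # two bytes per character for this text
round_trip = utf16.decode("utf-16-le")
```

An application that stores and passes this text as Unicode never needs to consult the system locale to display it.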
12Unicode allows processing of Multilingual Content
- System components
- Internet Explorer 5.0 can do amazing things!
- Others: Winlogon, File system, Notepad, etc.
- Unicode applications
- Office 2000
- Your application!
13Windows 2000 Multilanguage Version
- Language of menus and dialogs is a per-user setting
- Installable language modules
- Sold through MOLP, Select, and Enterprise Agreement
- Available to developers through MSDN
14Support for Complex Scripts in Windows 2000
A complex script is one that requires special
processing, such as
- Bi-directional (BiDi) reordering (Arabic, Hebrew)
- Contextual shaping (Arabic, Indic family)
- Display of combining characters (Arabic, Thai, Indian)
- Specialized word-break and justification rules (Thai)
- Disallowing illegal character combinations (Indian, Thai)
15RTL Orientation, or Mirroring
16Right-to-Left Mirroring API
- One function call will mirror all windows in an application
- Can also mirror selected windows
- APIs to suppress mirroring of bitmaps
- May need to modify coding practices
17Support for Indian Languages in Windows 2000
- APIs handle Devanagari and Tamil text through Unicode
- Locale support
- Time, Date, number, currency formats
- Sorting
- Conversion
- Explicit function calls convert to/from ISCII
- No Windows 98 compatibility mode
18How We Developed Indian Script Support in Windows
2000
- Worked with Government organizations
- Consulted with NCST, CDAC, academics
- Brought engineers from NCST
- Added Indian shaping engines to Uniscribe
- Helped define feature tables for Open Type
- Hired Hindi/Tamil speakers to test
19Complex Scripts in Web pages
- IE 5.0 supports complex scripts, including Devanagari and Tamil, in
- Standard HTML text
- DHTML: All properties in DOM
- XML
- Recommended encoding is UTF-8
- Place charset=utf-8 in HTTP header
- Allows mixed scripts
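As a rough illustration of the UTF-8 recommendation, the sketch below builds a raw HTTP response whose header declares `charset=utf-8` and whose body mixes several scripts. The function name `build_response` and the sample page text are assumptions made for this example, not part of any Microsoft tool.

```python
def build_response(html_text):
    """Encode the page as UTF-8 and declare that encoding in the HTTP header."""
    body = html_text.encode("utf-8")
    headers = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + body


# English, Devanagari, Tamil, and Hebrew text in one page.
page = "<p>English, \u0939\u093F\u0928\u094D\u0926\u0940, \u0BA4\u0BAE\u0BBF\u0BB4\u0BCD, \u05E9\u05DC\u05D5\u05DD</p>"
response = build_response(page)
```

Because one encoding covers every script, the same header works no matter which languages appear in the page.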
20Demo!
21Questions?
22Further Information and Resources
- http://www.microsoft.com/globaldev (Watch for updates!)
- MSJ articles, e.g.,
- Uniscribe: http://www.microsoft.com/msj/1198/multilang/multilangtop.htm
- Multilingual UI: Coming April 1999
- Send suggestions to nlshelp_at_microsoft.com
23Break!
24Complex Script Processing in Microsoft Windows 2000
- David Brown
- Development Lead
- Microsoft Corporation
25Agenda
- Overview
- Implementation
- Details
26 1. Overview
- Distinct language groups
- Mix any and all scripts
- Most apps are easy to develop
- CS = Complex Script
27Complex Script Language groups
- Arabic, Hebrew, Indic, Thai, Vietnamese
- Part of ALL versions of Windows 2000
- Enable in Control Panel - Regional Settings
- Turn it on today!
28All scripts, any mix
- Unicode makes representation easy
- Common framework and APIs
- Individual script and font handlers
- Multilingual for no extra effort
29Built into standard system APIs
- Plain text
- ExtTextOut, DrawText, TabbedTextOut
- System edit control
- Dialog boxes
- Formatted text
- Richedit
- HTML control
- See the Win32 SDK
- Don't write your own formatting
30Font fallback
- Standard system fonts
- For dialogs, plaintext edit controls
- and other plaintext display
- Dialog boxes work automatically
31Summary
- CS support is standard in Windows 2000
- No restrictions on script combinations
- Easy (unless you are implementing your own
formatting)
32 2. Implementation
- Callouts from GDI and USER
- Performance
- Text broken by script and direction
- Script handlers
- LPK.DLL
33Callouts from GDI and USER
- ExtTextOut, DrawText passed early to LPK.DLL
- Plaintext edit control has many callouts
- Caret placement
- Text measurement
- Line breaking
- Word advance
- Safe, stable changes to OS core
34Fast path for non CS
- Normal GDI 1:1 char to glyph
- Simple side by side placement
- No CS characters
- If right-to-left, no neutrals
- If Digit substitution, no digits
- Performance is good
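The fast-path conditions above could be expressed roughly as follows. This is only a sketch: the code point ranges in `CS_RANGES`, the set of neutral bidi classes, and the function names are assumptions for illustration, not the tests Windows actually performs.

```python
import unicodedata

# Assumed complex-script ranges, for illustration only.
CS_RANGES = [
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0900, 0x0D7F),  # Devanagari through Malayalam
    (0x0E00, 0x0E7F),  # Thai
]
# Bidi classes treated as neutral in this sketch.
NEUTRAL_BIDI = {"B", "S", "WS", "ON"}


def is_complex(ch):
    """True for characters in an assumed CS range or with a combining class."""
    cp = ord(ch)
    in_range = any(lo <= cp <= hi for lo, hi in CS_RANGES)
    return in_range or unicodedata.combining(ch) != 0


def can_use_fast_path(text, rtl_reading=False, digit_substitution=False):
    """Apply the three slide conditions; any failure forces full processing."""
    for ch in text:
        if is_complex(ch):
            return False  # contains CS characters
        if rtl_reading and unicodedata.bidirectional(ch) in NEUTRAL_BIDI:
            return False  # neutrals would need bidi resolution
        if digit_substitution and ch.isdigit():
            return False  # digits would need substitution
    return True
```

Plain Western strings pass all three checks, so the common case never pays for script analysis.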
35Split by script and direction
- Separate e.g. Devanagari, Tamil, Western
- Left-to-right or right-to-left
- Unicode bidirectional algorithm
- Atomic item of display
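Splitting text into items might look like the sketch below. It classifies characters by a small, assumed table of code point ranges and lets neutral characters join the preceding item; it does not implement the Unicode bidirectional algorithm, and the names `SCRIPT_RANGES`, `script_of`, and `itemize` are hypothetical.

```python
# Assumed script table, covering only the scripts named on the slides.
SCRIPT_RANGES = [
    ("Hebrew", 0x0590, 0x05FF),
    ("Arabic", 0x0600, 0x06FF),
    ("Devanagari", 0x0900, 0x097F),
    ("Tamil", 0x0B80, 0x0BFF),
    ("Thai", 0x0E00, 0x0E7F),
]
RTL_SCRIPTS = ("Hebrew", "Arabic")


def script_of(ch):
    """Return a script name, or None for neutral characters such as spaces."""
    cp = ord(ch)
    for name, lo, hi in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "Western" if ch.isalpha() else None


def itemize(text):
    """Split text into (script, is_rtl, substring) items in logical order."""
    items = []
    for ch in text:
        script = script_of(ch)
        if script is None:
            # Neutral characters stay with the preceding item, if any.
            script = items[-1][0] if items else "Western"
        if items and items[-1][0] == script:
            prev_script, prev_rtl, prev_text = items[-1]
            items[-1] = (prev_script, prev_rtl, prev_text + ch)
        else:
            items.append((script, script in RTL_SCRIPTS, ch))
    return items
```

Each resulting item has a single script and direction, so it can be handed to one script handler as a unit.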
36Handler for each script
- Script shaping and reordering
- Devanagari - matra I reordered before consonant cluster
- Tamil - vowel sign O surrounds consonant cluster
- Urdu - Initial, medial, final, isolated forms
- Various font formats
- Backward compatibility
- Shaping - ligatures, contextual forms
- Placement of marks
- Script handlers understand scripts
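The Devanagari reordering mentioned above can be sketched on code points. A real shaping engine reorders glyphs, not characters, and handles many more cases; here the consonant range, the treatment of virama and nukta, and the function names are simplifying assumptions.

```python
VIRAMA = "\u094D"   # joins consonants into a cluster
NUKTA = "\u093C"    # modifies the preceding consonant
MATRA_I = "\u093F"  # vowel sign I, stored after the cluster, drawn before it


def is_consonant(ch):
    """Devanagari consonants KA..HA (assumed range for this sketch)."""
    return "\u0915" <= ch <= "\u0939"


def reorder_matra_i(text):
    """Move each vowel sign I in front of the consonant cluster it follows."""
    out = []
    cluster_start = None  # index in out where the open cluster begins
    for ch in text:
        if ch == MATRA_I and cluster_start is not None:
            out.insert(cluster_start, ch)
            cluster_start = None
        elif is_consonant(ch):
            continues = bool(out) and out[-1] == VIRAMA and cluster_start is not None
            if not continues:
                cluster_start = len(out)
            out.append(ch)
        elif ch in (VIRAMA, NUKTA):
            out.append(ch)  # keeps the current cluster open
        else:
            out.append(ch)
            cluster_start = None
    return "".join(out)
```

The stored text stays in logical order; only the display order changes, which is why applications are advised to leave this work to the script handler.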
37Language Pack LPK.DLL
- Apply NLS settings (preferred digits)
- Plaintext edit control
- Calls to Uniscribe string handling
- LPK.DLL is OS <-> Uniscribe bridge
38Application
USER GDI
LPK.DLL
Uniscribe
39Summary
- Callouts from GDI and USER
- Performance issues
- Split by script and direction
- Script handlers
- LPK.DLL
40 3. Details
- Clusters
- Caret placement and Mouse hits
- Word breaking
- Font metrics
- Measuring text
- Metafiles
41Clusters
- Indivisible - Indian, Thai, Vietnamese
- Divisible - Arabic
42Caret, mouse hits
- For indivisible clusters
- Arrow keys skip over clusters
- Del deletes entire cluster
- Backspace decomposes cluster one character at a time
- Arrows and Mouse select whole clusters
- Left click snaps to nearest boundary
- For divisible clusters
- Caret shows proportional position
- Use system controls or query Uniscribe
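The editing rules for indivisible clusters can be sketched as follows. The cluster test here (combining marks attach to the previous character, and a letter after a Devanagari virama joins the cluster) is a simplified assumption, and all function names are hypothetical; production code should query Uniscribe or use the system controls, as the slide advises.

```python
import unicodedata

VIRAMA = "\u094D"  # Devanagari virama only; other scripts are not modelled


def cluster_starts(text):
    """Return the indices at which a new cluster begins."""
    starts = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        is_mark = category in ("Mn", "Mc")
        joins_previous = i > 0 and text[i - 1] == VIRAMA and category == "Lo"
        if i == 0 or not (is_mark or joins_previous):
            starts.append(i)
    return starts


def arrow_right(text, caret):
    """Arrow key: skip to the next cluster boundary."""
    boundaries = cluster_starts(text) + [len(text)]
    return next((b for b in boundaries if b > caret), len(text))


def delete_at(text, caret):
    """Del: remove the entire cluster that starts at the caret."""
    return text[:caret] + text[arrow_right(text, caret):]


def backspace_at(text, caret):
    """Backspace: remove one code point, decomposing the cluster."""
    if caret == 0:
        return text, 0
    return text[:caret - 1] + text[caret:], caret - 1
```

For the word used in the tests, the caret can rest only at positions 0, 2, and 6, while Backspace can still stop at every code point.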
43Font metrics
44Font metrics
45Font metrics
46Matching fonts
- When CS text is predominant
- Full CS line spacing
- Increase Western height
- When Western text is predominant
- Compromise line spacing
- Accept some clipping
- System edit control
- Line spacing from single font
- Richedit, HTML control
- Line spacing adjusted for multiple fonts
47Measuring text
- Adding characters can make text smaller
48Metafiles
- Device independent
- Store Unicode - Enhanced metafile
- Use ExtTextOut(W)
- Windows adjusts widths for different playback fonts
- Device dependent
- Avoid
- Stores glyphs
- Requires identical font for playback
49Summary
- Caret placement and Mouse hits
- Word breaking
- Font metrics
- Measuring text
- Metafiles
- Format with richedit, MSHTML
50Resources
- Uniscribe - next talk
- OpenType - later today
- Win32 SDK
- Richedit
- RTF
- messages
- Text object model
- HTML control
- HTML
- Document object model
51Questions?
52Conclusions
- Windows 2000 is multilingual
- Included on every CD
- Format with system controls
- is much easier than writing your own
- You can write your own formatting
- Uniscribe provides all you need
53Uniscribe: The Unicode Script Processor
- David Brown
- Development Lead
- Microsoft Corporation
54Agenda
- Overview
- Layers
- Low level APIs
- High level APIs
55 1. Overview
- Uniscribe is a DLL
- Client applications
- Hides language details
- Hides OS details
56USP10.DLL
- Platforms
- Windows 2000
- Windows NT 4
- Windows 98
- Windows 95 (excluding Far East)
- Single worldwide binary
- Installs with Windows 2000, IE5, Office 2000
57Client applications
- Windows 2000
- Word 2000
- Excel 2000
- Access 2000
- PowerPoint 2000
- MSHTML (IE5)
- Richedit 3
- MS Agent
- FrontPage Express
- HTML/RTF converter
58Hides language details
- Syllable structure (Indian, Thai)
- Contextual shaping (Arabic)
- Caret placement
- Wordbreak
- National digits
- Bidirectional layout (Arabic, Hebrew)
59Hides Unicode OS details
- APIs are Unicode on all platforms
- Hides glyph codes
- Hides font differences
- Shaping tables
- Fixed repertoire fonts
60Summary
- Cross platform Unicode display API
61 2. Layers
- Win32 glyph support
- OpenType
- Shaping engines
- Low level APIs
- Formatted text, Full control, Less simple
- High level APIs
- Plaintext, Simple
62Win32 API
- TrueType fonts
- Internally indexed by glyph
- Glyph manipulation
- ExtTextOut(ETO_GLYPHINDEX)
- GetGlyphOutline(GGO_GLYPHINDEX)
- Font table access
- GetFontData
63OpenType
- Provides standard table structures
- Contextual glyph substitution
- Mark to base attachment
- Defines instances for scripts
- Examples
- Initial form of Arabic letter
- Half form of Devanagari consonant
- Attachment position for Nukta
64Shaping engines
- Per script
- Understand language rules
- Understand font features
- OpenType provides full control
- Many older fixed layout fonts
65Low level APIs
- Low level item support for formatting apps
- Break string by script and direction
- Shaping
- Caret and mouse
- Word breaking, justification
66High level APIs
- Simple string support for LPK and plaintext apps
- Features not in low level APIs
- Font fallback
- Tabstops
- Bidi highlighting
- Similar functionality to ExtTextOut, DrawText,
TabbedTextOut
67Summary
- High level plaintext APIs
- Low level formatting APIs
- Shaping engines
- OpenType
- Win32 API
68 3. Low level APIs
- Formatting text
- Style runs
- Measurement
- Paragraph filling
- Rendering
69One run
70Script and Direction Boundaries
- ScriptItemize generates items
- Each item has single script and direction
- Implements the Unicode Bidi algorithm
- Application must merge items into its own style runs
- Runs are unique in
- Font, Style
- Script, Direction
71Glyphs and Metrics
- One run at a time
- ScriptShape generates
- glyphs,
- glyph attributes
- map of character to glyph buffer offsets
- ScriptPlace generates
- advance widths
- combining character x,y offsets
72Line Filling
- Measure runs in logical order until the line overflows
- ScriptBreak provides codepoint attribute information
- Whitespace
- Start of word for scripts such as Thai
- Break the overflow run using these attributes
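The line-filling loop can be sketched with plain lists standing in for real measurements. The per-character widths and the `word_starts` flags are assumed inputs, playing the role of the advance widths and the ScriptBreak-style attributes; the function name `fill_line` is hypothetical.

```python
def fill_line(widths, word_starts, max_width):
    """Return the index at which the line should end.

    widths[i] is the advance of character i in logical order;
    word_starts[i] is True when a word may start at character i.
    """
    total = 0
    last_break = None
    for i, width in enumerate(widths):
        if word_starts[i] and i > 0:
            last_break = i  # remember the latest break opportunity
        if total + width > max_width:
            # Overflow: break at the last word start, or mid-word if none.
            return last_break if last_break is not None else max(i, 1)
        total += width
    return len(widths)


text = "one two three"
widths = [10] * len(text)
# A word starts at index 0 and after every space.
word_starts = [ch != " " and (i == 0 or text[i - 1] == " ") for i, ch in enumerate(text)]
break_at = fill_line(widths, word_starts, 100)
```

For a script such as Thai, only the source of `word_starts` changes; the filling loop itself stays the same.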
73Word breaking
- ScriptBreak
- Thai, Khmer run words together
- This is 5 words
- Grammatical analysis
- Dictionary
74Layout and Rendering
- ScriptLayout for visual order
- Embedding levels from ScriptItemize
- Use for generic multilingual support
- ScriptTextOut renders each run
- Glyphs from ScriptShape
- Positions from ScriptPlace
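Deriving visual order from embedding levels follows rule L2 of the Unicode bidirectional algorithm: from the highest level down to the lowest odd level, reverse every contiguous sequence of runs at that level or higher. The sketch below applies that rule to a list of per-run levels; it is an independent illustration of the idea, and the function name `visual_order` is an assumption, not the ScriptLayout signature.

```python
def visual_order(levels):
    """Return run indices in left-to-right visual order for one line."""
    order = list(range(len(levels)))
    if not levels:
        return order
    highest = max(levels)
    lowest_odd = min((lv for lv in levels if lv % 2), default=highest + 1)
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                # Reverse the contiguous stretch of runs at this level or above.
                order[i:j] = reversed(order[i:j])
                i = j
            else:
                i += 1
    return order
```

Even levels are left-to-right and odd levels right-to-left, so a line with no odd levels comes back unchanged.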
75Caret Placement and Mouse Hits
- ScriptXtoCP, ScriptCPtoX
- CP - character position
- X - horizontal coordinate
- Which edge?
- In bi-directional text the trailing edge of one
character is not necessarily adjacent to the
leading edge of the next character
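A small sketch shows why the edge matters. The function below works on one run with assumed advance widths and takes a character position and a trailing-edge flag, loosely in the spirit of ScriptCPtoX; its name and parameters are hypothetical and do not reproduce the real API.

```python
def cp_to_x(widths, rtl, cp, trailing):
    """Return the x offset, from the left of the run, of one edge of character cp.

    widths holds advance widths in logical order for a single run.
    """
    before = sum(widths[:cp])
    edge = widths[cp] if trailing else 0
    if not rtl:
        return before + edge
    # In a right-to-left run the first logical character sits at the right.
    return sum(widths) - before - edge


# A left-to-right run of three characters followed by a right-to-left run
# of three characters, each character 10 units wide.
ltr_widths = [10, 10, 10]
rtl_widths = [10, 10, 10]
rtl_origin = sum(ltr_widths)

trailing_of_last_ltr = cp_to_x(ltr_widths, False, 2, True)
leading_of_first_rtl = rtl_origin + cp_to_x(rtl_widths, True, 0, False)
```

The two characters are neighbours in the stored text, yet their adjoining edges lie at opposite ends of the right-to-left run, so the caret position depends on which edge is requested.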
76Leading and Trailing edges
77Font Fallback
- The more scripts you support, the more you need font fallback
- ScriptShape returns HRESULT USP_E_SCRIPT_NOT_IN_FONT if you ask it to shape a run that the selected font cannot support
78Summary
- Script
- Itemize
- Shape, Place
- Break, Layout
- TextOut
- CPtoX, XtoCP
79 4. High level APIs
- Purpose
- Analysis
- Display
- Font fallback
80Purpose
- For Windows 2000
- ExtTextOut
- DrawText
- System edit control
- Cross-platform Unicode plaintext display
- Easier than low level APIs
81Analyze
- ScriptStringAnalyse
- Itemizes, shapes, places etc.
- Features
- Variety of tabbing options
- Clipping, justification
- Font fallback
- Control character representation
- hotkey substitution
- password entry
- Returns handle to analysis
82Querying the analysis
- ScriptString
- Size
- pcOutChars
- pLogAttr
- GetOrder
- CPtoX, XtoCP
- GetLogicalWidths
- Validate
83Displaying and freeing
- ScriptStringOut
- Clipping rect like ExtTextOut
- Selection highlighting
- ScriptStringFree
84Bidi highlighting
- Arabic letters right-to-left
- Arabic numbers left-to-right
85Font fallback
- When font in HDC is missing
- Codepoints
- Fallback clusters with codepoints not present in the font
- Scripts
- Fallback when item script not supported by the font
- Finally to GDI for Far East font linking
- Requires Microsoft Sans Serif
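Per-cluster fallback can be sketched as a lookup over font coverage. The coverage table below is invented for illustration and does not describe what any real font contains; the function name `choose_font` and the fallback order are likewise assumptions.

```python
def choose_font(cluster, selected, fallbacks, coverage):
    """Return the first font that covers every code point in the cluster."""
    for font in [selected] + fallbacks:
        if all(ord(ch) in coverage[font] for ch in cluster):
            return font
    return None  # nothing covers it; the caller decides what to show


# Hypothetical coverage data: font name -> set of supported code points.
coverage = {
    "Tahoma": set(range(0x0020, 0x007F)),
    "Mangal": set(range(0x0900, 0x0980)),
    "Latha": set(range(0x0B80, 0x0C00)),
}
fallbacks = ["Mangal", "Latha"]

latin_font = choose_font("A", "Tahoma", fallbacks, coverage)
hindi_font = choose_font("\u0939\u093F", "Tahoma", fallbacks, coverage)
tamil_font = choose_font("\u0BA4", "Tahoma", fallbacks, coverage)
thai_font = choose_font("\u0E01", "Tahoma", fallbacks, coverage)
```

Choosing per cluster rather than per character keeps a base letter and its marks in the same font.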
86Summary
- ScriptString
- Analyse
- query analysis ...
- Out
- Free
87Demo
88Resources
- OpenType talk
- Complex script sample CSSAMP
- Win32 SDK
- Microsoft Systems Journal
- November 1998
89Questions?
90Conclusion
- Unicode plaintext display APIs
- Unicode formatted text support APIs
- Cross-platform
91Lunch!