Introducing Voyager with Unicode

Slides: 69

Provided by: libUwater

Transcript and Presenter's Notes

1
Voyager with Unicode: A Cataloger's Session
Connie Braun, Training Consultant
2
Agenda
Introduction
Your Work Environment
Conversion
New Features
Learning More
Q&A
3
Release Update

General release occurred October 6, 2004!
4 production partners
1 Windows Server, 3 Solaris
8 test server partners
4 Task Force members (large non-Roman
collections)
1 large consortium with Universal Borrowing and
Universal Catalog
2 European customers
As of 01/20/05, 71 customers have upgraded and
are functioning in a production environment with
Voyager with Unicode. Approximately 50 upgrades
are scheduled between now and May 2005.

4
Why Unicode in Voyager?

Brings Voyager up to current IT standards
Finds and displays records in the native
language
Create and edit any MARC record using UTF-8
Import and export of records with any supported
character set
Operators may select a Unicode-compliant font of
their choice
Display Unicode characters in OPAC without
proprietary software

5
Implementing Voyager with Unicode
For our customers, it's business as usual, but
with some interesting changes and improvements,
especially in Cataloging. Helping everyone to
implement a Unicode-compliant system is
Endeavor's aim. The Unicode standard is an
important step towards realizing that
goal. Implementing the Unicode standard is an
extension of Endeavor's original mission: access
to information regardless of location or format.
6
Following Standards

Follows Standards (not proprietary)
See http://www.unicode.org for much more detail
on these standards.
See http://lcweb.loc.gov/marc/specifications/speccharucs.html
for details on LC's format of MARC
records that use Unicode. Voyager follows this
specification.
Specifics on the Code Tables may be viewed at
http://www.loc.gov/marc/specifications/specchartables.html
The Voyager implementation of the Unicode
standard gives libraries and their users greater
flexibility when accessing collection materials
that contain both Roman and non-Roman text.

7
Multilingual Input and Display

By introducing improved multilingual input and
display capabilities in Voyager, characters now
display correctly according to the Unicode and
MARC standards.
Greater script coverage for cataloging items in
your collections, published in languages around
the world.
How many? The original UTF-8 design can encode
2,147,483,648 (2^31) distinct values!
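The figure on this slide can be checked with a little arithmetic. The sketch below is not part of the presentation. It assumes the number refers to the original UTF-8 design, which allowed sequences of up to six bytes carrying 31 bits, and it compares that with the Unicode codespace of 17 planes that current UTF-8 is limited to.

```python
import sys

# Original UTF-8 design: up to 6 bytes per character, 31 payload bits.
original_utf8_values = 2 ** 31

# Unicode codespace: 17 planes of 65,536 code points (U+0000..U+10FFFF).
unicode_code_points = 17 * 65536

print(f"{original_utf8_values:,}")   # the figure quoted on the slide
print(f"{unicode_code_points:,}")    # code points available in Unicode
print(hex(sys.maxunicode))           # highest code point Python accepts
```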

8
Preview Server

Anyone interested in trying out Voyager with
Unicode before your upgrade? You can!
http://support.endinfosys.com/cust/voy/upgrade/unicode/testwv_pre.html
provides all the details
necessary to get you started
Preview Server uses the Voyager training database
that has been augmented with numerous records in
both Roman and non-Roman languages
Try keyword searches
non roman script japanese
non roman script arabic
roman script french
roman script italian

9
Agenda

Introduction
Your Work Environment
Workstation Requirements
Setting Up For Languages Other Than English
Tag Tables
Session Defaults and Preferences
Conversion
New Features
Q&A

10
Workstation Requirements

In order to enjoy the full range of benefits, PCs
must have up-to-date operating systems and
productivity software.
This means that staff PCs will need:
Windows 2000 or XP operating system
Unicode standard compliant Internet browser
IE 6
Netscape 6
Unicode-compliant font: Lucida Sans Unicode and
Arial Unicode MS

11
MS Windows

Voyager is more integrated with Windows in terms
of:
Standard Windows 2000/XP Unicode support
Standard Unicode fonts
Standard input using Input Method Editors (IMEs)
Standard browser support

12
Setting Up for Languages Other Than English

Workstations need to be specifically configured
to work with languages other than English
Likely will require technical IT assistance to
install needed languages on staff PCs
Best to install all languages so that cataloger
may easily include new ones as necessary

13
Adding Languages to PCs

Regional and language options are specific to
each PC
Among options available via Start > Settings >
Control Panel
Details button on Languages tab lets operator
view or change languages and methods to enter
text
Can include supplemental language support, too

14
Choosing Languages

Languages added to PCs will match languages for
items found in your collections
Add and remove languages according to your needs,
as few or as many as necessary
May also set preferences for language bar and key
settings

15
Tag Tables

MARC Tag Tables have been completely revised and
rewritten for Voyager with Unicode

16
Tag Tables

Ability to modify tag table configuration remains
the same as in earlier releases
But, may not specify anything for Leader position
9 since that byte is now hard-coded to identify
records that have been converted to UTF-8
May want to consider whether or not library will
need or want to revise Tag Tables for local use
See Appendix A of the Cataloging User's Guide for
full details on revising, maintaining and
updating the Tag Tables

17
Record Validation

MARC validation
MARC21 character set validation
Authority control validation
Decomposition of accented characters for MARC21

18
Session Defaults and Preferences: Record
Validation

Bypass MARC21 Character set validation
Uses MARC21 Repertoire.cfg to control validation
of the MARC21 character set
Helps to enforce MARC21 standard
Bypass Decomposition of accented characters for
MARC21
Allows records to be saved to the database
without decomposing the characters
IMPORTANT: If you select this option, MARC21
rules are ignored. We strongly recommend that
this check box be left unchecked, in order to
comply with the MARC21 standard.
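To make "decomposition of accented characters" concrete, here is a minimal sketch using Python's standard unicodedata module. It applies Unicode canonical decomposition (NFD), which is offered only as an analogy for the decomposition the MARC21 rules call for. It is not Voyager's own routine.

```python
import unicodedata

composed = "Espa\u00f1a"   # "España" with a precomposed n-tilde (U+00F1)
decomposed = unicodedata.normalize("NFD", composed)

# List each code point: the n-tilde becomes "n" followed by COMBINING TILDE.
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The decomposed string is one code point longer than the composed one, although both display as the same word.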

19
Session Defaults and Preferences: Mapping Tab

Expected Character Set of Imported Records now
has six options

20
Session Defaults and Preferences: Colors/Fonts Tab
21
Agenda

Introduction
Your Work Environment
Conversion
Data Conversion
Conversion Error Logging
Conversion Details
Identifying Non-Unicode Data
The Rest of Voyager
New Features
Learning More
Q&A

22
Data Conversion

Conversion process during upgrade treats data
differently than when importing records through
Cataloging client or via BulkImport
MARC records are converted from VRLIN (Voyager
legacy encoding) to MARC21 compliant UTF-8
encoding
Leader position 9 becomes "a"
A conversion log is created
UTF-8 allows for variable length characters. The
majority of characters in the database occupy the
same amount of space as before conversion.
Note: All indexes and database columns with MARC
data are regenerated after conversion.
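The point about variable-length characters can be illustrated with a short sketch. The sample characters are arbitrary choices for illustration and are not taken from the presentation.

```python
# Number of UTF-8 bytes needed for three sample characters.
samples = {"a": "a", "n-tilde": "\u00f1", "CJK": "\u4e2d"}
byte_lengths = {label: len(ch.encode("utf-8")) for label, ch in samples.items()}
print(byte_lengths)

# A field holding "España" has 6 characters but occupies 7 bytes in UTF-8,
# which is why field lengths in a record can change during conversion.
title = "Espa\u00f1a"
print(len(title), len(title.encode("utf-8")))
```

Plain ASCII letters stay at one byte each, which fits the slide's remark that most characters occupy the same space as before.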

23
Conversion Details

IMPORTANT! NO RECORDS ARE LOST
Each field in the record handled individually.
As each field is processed, it may change length,
requiring adjustments to the leader and directory
of the record.
Records are saved to the database with leader
position 9 set to "a".
Both record-level and field-level checking are
performed. In rare cases an entire record might
fail conversion; it is more likely that an
individual field fails to be converted.
Records may not convert if they contain text that
cannot be mapped into Unicode according to the
standard MARC-8 to Unicode mappings.
Records that do not convert are stored in the
database as is, without being converted to
Unicode.

24
Conversion Error Logging

Libraries need to know the details about the
results of the conversion process.
Full error checking and logging is included as
part of the upgrade
Technical User's Guide, Chapter 4
Cataloging User's Guide, Appendix C
Library designates should review this file to
plan for correcting any records that have errors

25
Sample from Conversion Log File

26
Conversion Log Details 1

11 secs read=982 changed=791 880=0 okay=982 errors=0 written=982
21 secs read=1931 changed=1558 880=0 okay=1931 errors=0 written=1931
29 secs read=2848 changed=2087 880=0 okay=2848 errors=0 written=2848
36 secs read=3699 changed=2533 880=0 okay=3699 errors=0 written=3699
43 secs read=4607 changed=3076 880=0 okay=4607 errors=0 written=4607
51 secs read=5519 changed=3610 880=0 okay=5519 errors=0 written=5519

Legend (fields numbered left to right)
1 number of seconds used by the job so far
2 read = number of records processed
3 changed = number of records changed
4 880 = how many records contain 880s
5 okay = records processed successfully
6 errors = records not processed due to errors
7 written = records written to the database
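A library reviewing a long conversion log may find it easier to read these progress lines with a script. The following is a minimal sketch, assuming each line has the shape "<n> secs key=value ..." as the legend suggests. The function name parse_progress is hypothetical, and the real log layout should be checked against the Technical User's Guide.

```python
import re

# Assumed line shape, inferred from the legend: "<n> secs key=value key=value ..."
LINE = re.compile(r"^(?P<secs>\d+) secs\s+(?P<pairs>.+)$")

def parse_progress(line: str) -> dict:
    """Turn one progress line into a dict of integer counters."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a progress line: {line!r}")
    counts = {"secs": int(m.group("secs"))}
    for pair in m.group("pairs").split():
        key, _, value = pair.partition("=")
        counts[key] = int(value)
    return counts

row = parse_progress(
    "51 secs read=5519 changed=3610 880=0 okay=5519 errors=0 written=5519"
)
print(row)
```

A quick sanity check on any line is that read equals okay plus errors, as it does in every sample line on this slide.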
27
Conversion Log Details 2

bib 6213 17(700) c->8 loose char page=0 at 20 '091e ..
bib 35322 14(856) c->8 undefined char page=0 at 61 'fc7220486973746f .r Histo
bib 35516 23(856) c->8 no char to combine to page=0 at 82 '1e .

Legend (items 1 to 8 numbered left to right on the first line)
1 record type and id
2 index within record of field that generated error
3 tag that generated error
4 c->8 indicates conversion to UTF-8 encoding
5 description of error
6 page = subset to which source character belongs
7 at = position of source character that caused error
8 hex dump of source character
9 description of error (second line)
10 description of error (third line)
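The hex dump at the end of each error line can be turned back into bytes for inspection. The sketch below decodes the dump from the second sample line. Reading the bytes as Latin-1 is an assumption made for illustration, and the log itself does not say what the source encoding was.

```python
hex_dump = "fc7220486973746f"      # copied from the sample "undefined char" line
raw = bytes.fromhex(hex_dump)

# Assumption: interpret the bytes as Latin-1 to see what the text may have been.
as_latin1 = raw.decode("latin-1")

print(raw)
print(as_latin1)
```

Under that assumption the first byte is a u-umlaut, which would explain why the log prints a dot in that position before "r Histo".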
28
Conversion Log Details 3
loose char: a warning message indicating that a character not strictly part of Voyager encoding has been converted (e.g. unexpected carriage return)
no char to combine to: a warning message indicating that a combining character appeared but it lacks a base character with which to combine (e.g. umlaut but no a, o, u base letter)
undefined char: an error message indicating that there is a single character that cannot be mapped to UTF-8
29
Identifying non-Unicode data

To identify a non-Unicode record in the
Cataloging client, select a color for Conversion
records in Session Defaults and Preferences >
Colors-Fonts tab.

30
Identifying non-Unicode data

Any non-converted record displays in the color
selected in Options/Preferences.

31
Identifying non-Unicode data

There are other ways to identify records that
have conversion errors.

Records that cannot be converted to Unicode are
viewable in the Cataloging module with "nc" (not
converted) displayed in the Title Bar.
Any characters that cannot be matched or
recognized are replaced with a Unicode
substitution character.
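For readers unfamiliar with substitution characters, the sketch below shows the general idea using Python's decoder, which inserts U+FFFD (REPLACEMENT CHARACTER) for bytes it cannot interpret. The slides do not say which code point Voyager uses, so this is an analogy and not a description of Voyager's behavior.

```python
raw = b"f\xfcr"   # 0xFC is a u-umlaut in Latin-1 but is not valid UTF-8

# errors="replace" swaps each undecodable byte for U+FFFD.
text = raw.decode("utf-8", errors="replace")

print([f"U+{ord(ch):04X}" for ch in text])
```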
32
Fonts and Unicode

A MARC record may contain non-Roman characters
even though you cannot see them.
Records are sure to display correctly if a
Unicode-compliant font has been selected.
Lucida Sans Unicode: installed by default with
Windows
Arial Unicode MS:
Good choice for libraries with mixed cataloging
Included with Microsoft Office and other
Microsoft products

33
The Rest of Voyager

Non-MARC data is not converted
Acquisitions data
Circulation data (patron info, etc.)
Item data
Reporter
Not Unicode standard compliant
Translates data to LATIN1
Dots appear where you used to see squares

34
Agenda

Introduction
Your Work Environment
Conversion
New Features
Cataloging
Diacritics and Special Characters, Importing
Records, New Record Views, Search URIs
WebVoyáge
Browsers, Searching, Displaying
Interacting with Other Systems
Learning More
Q&A

35
Diacritic and Special Character Entry

Cataloging practices then and now
Pre-Unicode input in Cataloging: the accent
character (diacritic) precedes the base
character.
Example: Espa~na (tilde keyed before the n)
Post-Unicode input in Cataloging: the accent
character (diacritic) follows the base character.
Example: Espan~a (tilde keyed after the n)
The ability to display combined characters is an
improvement over past versions and a way to
ensure accurate entry.
Example: España
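Why the order matters can be shown with Unicode's own composition rules. In this minimal sketch (plain Python, not Voyager code) a combining tilde is placed after the n and then, following the old habit, before it. Unicode attaches a combining mark to the character that precedes it.

```python
import unicodedata

TILDE = "\u0303"  # COMBINING TILDE

after_base = "Espan" + TILDE + "a"    # mark follows the n (Unicode order)
before_base = "Espa" + TILDE + "na"   # mark keyed ahead of the n (old habit)

composed_after = unicodedata.normalize("NFC", after_base)
composed_before = unicodedata.normalize("NFC", before_base)

print(composed_after)    # tilde lands on the n
print(composed_before)   # tilde lands on the preceding a instead
```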

36
SpecialCharacters.cfg
SpecialCharacters.cfg, located in the
C:\Voyager\Catalog folder, defines the content of
the special character entry dialog box. Operators
may define their most frequently used characters
here.
37
Special Character Entry
This is what the dialog box in Cataloging looks
like.
The key press column identifies the keyboard
equivalent that may be used instead of turning on
Special Character Mode in Cataloging.
38
Finding Little Used Characters

For situations where a character not part of the
Special Characters list is needed, operator can
use Character Map from MS Windows
Start > Programs > Accessories > System Tools >
Character Map
Locate character or perform search
Select and Copy character, then paste into
position in bib record

39
Cataloging Input of Non-Roman Text
Voyager with Unicode allows Cataloging operators
to use all of the standard Microsoft Windows
keyboard and input method editors (IMEs). With
this functionality in place, operators may search
for, display, and edit the contents of all MARC
records using the full range of UTF-8
characters.
The entire JACKPHY group is part of the
UTF-8 character set, which includes the
right-to-left input needed for Arabic, Persian,
Hebrew and Yiddish.
Reminder: JACKPHY = Japanese, Arabic,
Chinese, Korean, Persian, Hebrew, Yiddish
40
Linking in a MARC21 Record
Tag  I1  I2  Subfield Data
100  1       $6 880-01 $a An, Zhen.
245  1   0   $6 880-02 $a Ri yue yun yan / $c An Zhen zhu.
250          $6 880-03 $a Di 1 ban.
260          $6 880-04 $a Changchun Shi $b Changchun chu ban she, $c 1997.
300          $a 4, 2, 291 p. $c 21 cm.
440      0   $6 880-05 $a Zhongguo li dai wang chao xing shuai qu shi lu
500          $a Non-Roman script Chinese
651      0   $a China $x History $y Ming dynasty, 1368-1644.
880  1       $6 100-01/$1 $a ? ?.
880  1   0   $6 245-02/$1 $a ?? ?? / $c ? ? ?.
880          $6 250-03/$1 $a ?1?.
880          $6 260-04/$1 $a ??? $b ?? ???, $c 1997.
880      0   $6 440-05/$1 $a ?? ?? ?? ?? ???
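The linkage shown in this record pairs each romanized field with an 880 field through subfield 6: the regular field points to "880-NN", and the 880 points back to "TAG-NN". The sketch below pairs them by occurrence number. The input is a hypothetical, simplified list of (tag, subfield 6 value) tuples, the function name pair_880s is invented for illustration, and anything after the occurrence number is ignored.

```python
import re

# Hypothetical simplified input: (tag, subfield 6 value) for a few fields.
fields = [
    ("100", "880-01"), ("245", "880-02"), ("250", "880-03"),
    ("880", "100-01/$1"), ("880", "245-02/$1"), ("880", "250-03/$1"),
]

LINK = re.compile(r"^(?P<tag>\d{3})-(?P<occ>\d{2})")

def pair_880s(fields):
    """Return {occurrence number: tag} for fields that have a matching 880."""
    regular, alternate = {}, {}
    for tag, link in fields:
        m = LINK.match(link)
        if m is None:
            continue
        if tag == "880":
            alternate[m.group("occ")] = m.group("tag")  # occurrence -> linked tag
        else:
            regular[m.group("occ")] = tag               # occurrence -> own tag
    # Keep only pairs where both sides agree on tag and occurrence number.
    return {occ: tag for occ, tag in regular.items() if alternate.get(occ) == tag}

pairs = pair_880s(fields)
print(pairs)
```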
41
Using On-Screen Keyboard

Typically, the path is Start > Programs >
Accessories > Accessibility > On-Screen Keyboard

42
Importing Records

Conversion process is separate and distinct from
the process of importing records
Important distinction for operators who import
records through the Cataloging client or via
BulkImport
Expected character set needs to be accurately
identified if records are to be imported
correctly
Some experimentation may be necessary to
determine the correct character set
Let's look at some details to help everyone
understand what is happening

43
Record Exchange Scenarios
44
Voyager 2001.2 and earlier

In Voyager 2001.2 and earlier, there were several
options from which to choose regarding the
character set
Latin1
OCLC
RLIN legacy
MARC21 MARC8
Until now it has been quite simple to choose the
correct option when importing records through the
Cataloging client or processing large numbers of
records through BulkImport.

45
After Upgrade to Voyager 2003.1

From Voyager 2003.1 forward, there are numerous
options from which to choose regarding the
character set
Latin1 (non-Unicode)
MARC21 MARC8 (non-Unicode)
MARC21 UTF8
OCLC (non-Unicode)
RLIN legacy (non-Unicode)
Voyager legacy (non-Unicode)
With Voyager 2003.1 and beyond, it is very
important to determine the character set of
records before importing records through the
Cataloging client or processing large numbers of
records through BulkImport. Some experimentation
may be necessary.
The transition to MARC21 UTF8 occurs as the
Unicode standard becomes pervasive

46
One Year From Now

In Voyager 2003.1 and beyond, numerous options
for character sets will continue to be needed
Latin1 (non-Unicode)
MARC21 MARC8 (non-Unicode)
MARC21 UTF8
OCLC (non-Unicode)
RLIN legacy (non-Unicode)
Voyager legacy (non-Unicode)
But, the Unicode standard will be much more
pervasive, having been adopted and deployed by
bibliographic utilities, vendors who massage
records, vendors who supply records, and others.
This means that selecting the correct option will
again be simpler, even though knowing the
character sets will continue to be very
important.

47
Bulk Import

Bulk Import of MARC Records
Fundamentally the same as before
Leader byte 9 is checked against the incoming
character set identified in the import rule.
Blank: non-Unicode; the record is converted, then imported
a: Unicode; the record is imported
Neither blank nor a: the record errors out and is not imported
See log.imp.yyyymmdd for details on import
success
Records that cannot be converted are not
imported; they are found in err.imp.yyyymmdd
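The leader check described on this slide amounts to a three-way decision on one character. Here is a minimal sketch of that decision. The function name and return strings are hypothetical, and the sketch leaves out the comparison against the character set named in the import rule.

```python
def import_action(leader: str) -> str:
    """Decide what happens to a record based on leader position 9."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is 24 characters long")
    coding = leader[9]            # leader position 9, counting from 0
    if coding == " ":
        return "convert then import"   # blank: non-Unicode record
    if coding == "a":
        return "import"                # a: record is already Unicode
    return "reject"                    # anything else errors out

unicode_leader = "00714cam a2200205 a 4500"
print(import_action(unicode_leader))
```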

48
Bulk Import and Expected Character Set

Character set mapping for Bulk Import is
designated in the Bulk Import rule in SysAdmin >
Cataloging > Bulk Import Rules.

49
MARC Export

Default export character set is MARC21 UTF-8
Use the a option on the command line to choose a
different character set
See page 10-8 in the Technical User's Guide for
more detail
LATIN1 records will get a dot exported for
characters outside the LATIN1 character set
If mapping for a composed character is not found,
it decomposes and Voyager attempts to find a
match for each part.
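The export fallback described here (try the composed character, then decompose and try each part, then fall back to a dot) can be sketched as follows. This is an illustration of the idea using Python's Latin-1 codec and canonical decomposition. The function name to_latin1 is hypothetical and Voyager's actual mapping tables are not reproduced.

```python
import unicodedata

def to_latin1(text: str) -> bytes:
    """Encode to Latin-1, decomposing unmappable characters and using '.' as a last resort."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("latin-1")
            continue
        except UnicodeEncodeError:
            pass
        # No direct mapping: decompose and try each part on its own.
        for part in unicodedata.normalize("NFD", ch):
            try:
                out += part.encode("latin-1")
            except UnicodeEncodeError:
                out += b"."       # dot stands in for an unmappable part
    return bytes(out)

print(to_latin1("Espa\u00f1a"))
print(to_latin1("\u0141\u00f3d\u017a"))
```

In this sketch a base letter survives while its unmappable accent becomes a dot. Whether Voyager drops or replaces such an accent is not stated in the slides.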

50
New ISBN Indexes

For improved duplicate detection
New ISBN Index
020N: 020 $a, number only
020R: 020 $z, number only
Example: 020 $a 1234567890 (Knopf) and
020 $a 1234567890 both index as 1234567890
Check Bibliographic and Authority duplicate
detection profiles in System Administration!
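A "number only" index entry can be pictured as the leading ISBN stripped of any qualifier. The sketch below is a hypothetical illustration of that normalization and not Voyager's indexing code. The pattern accepts a 10-character ISBN (the last character may be X) or a 13-digit one.

```python
import re

# Leading ISBN: 10 characters (last may be X) or 13 digits, then a word boundary.
ISBN = re.compile(r"^\s*([0-9]{9}[0-9Xx]|[0-9]{13})\b")

def isbn_number_only(subfield: str):
    """Return the bare ISBN at the start of an 020 subfield, or None."""
    m = ISBN.match(subfield)
    return m.group(1) if m else None

print(isbn_number_only("1234567890 (Knopf)"))
print(isbn_number_only("1234567890"))
```

Both sample values from the slide reduce to the same key, which is what makes duplicate detection on the number alone possible.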

51
HTTP Posting

Much easier access to WebVoyáge display from
clients
Available in Cataloging, Acquisitions, and
Circulation
Toggle record view from staff client to WebVoyáge
Record menu in Cataloging contains a Send Record
to option
Send Record To WebVoyáge
LinkFinderPlus available in Cataloging,
Acquisitions, and Circulation
Record menu in Cataloging contains a Send Record
to option
Send Record To LinkFinderPlus
Configured in the [MARC POSTing] stanza of the
voyager.ini file

52
Enabling HTTP Posting

To enable HTTP posting, a stanza is added to
the voyager.ini file. An example is shown below.
[MARC POSTing]
WebVoyage="http://train20031-c1db.comet.endinfosys.com/cgi-bin/Pbibredirect.cgi"
LinkfinderPlus="http://207.56.64.116/cgi-bin/Phttplinkresolver.cgi"

53
Easier Access to OPAC Display

Send Record To... in Cataloging

Send Record To... in Acquisitions

54
Search URI

Staff Client Search URI in Cataloging,
Circulation and Acquisitions
Drive searches to resources on the web
Add new button to search interface in staff
clients
Click the button: a browser is opened and the
search is executed
This is PC specific (voyager.ini)
Possible applications
Link to another OPAC
Link to one of your vendors
Link to an online book seller

55
Presenting Search URI
Staff client search URI
Available in Cataloging, Circulation, and
Acquisitions
56
Adding Search URIs

clipped from voyager.ini
[SearchURI]
Name=Google
URI=http://www.google.com
Copy=Y
SearchSyntax=/search?q=<searchtext>
Name=BarnesNoble
URI=http://search.barnesandnoble.com
Copy=Y
SearchSyntax=/booksearch/results.asp?WRD=<searchtext>
Name=Gale Group
URI=http://www.galegroup.com
Copy=Y
SearchSyntax=/servlet/SearchPageServlet?region=9&imprint=<searchtext>
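One way to picture how a Search URI entry turns into a web request is to join the URI and SearchSyntax values and substitute the operator's search text. The sketch below assumes the placeholder is written <searchtext> and that the text is URL-encoded. The function name build_search_url is hypothetical, and the slides do not describe how the staff client actually builds the request.

```python
from urllib.parse import quote_plus

def build_search_url(uri: str, search_syntax: str, search_text: str) -> str:
    """Join URI and SearchSyntax, replacing the <searchtext> placeholder."""
    return uri + search_syntax.replace("<searchtext>", quote_plus(search_text))

url = build_search_url(
    "http://www.google.com", "/search?q=<searchtext>", "Ri yue yun yan"
)
print(url)
```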

57
WebVoyáge and Unicode

MARC data supplied to the browser in UTF-8
IE 6 generally displays Unicode characters
correctly. Some characters do not display
correctly unless a Unicode-compliant font is
selected.
Netscape 6 figures out that it needs to display
Unicode characters without any special settings
Consider new help text in your OPAC to help
patrons understand about language options,
especially if there are records using different
languages in your database
New UTF-8 download/save format

58
Searching in WebVoyáge

Search and display in native languages for staff
and users.
WebVoyáge and Cataloging allow Unicode character
input, so you can search for and retrieve records
in native languages.
Record display includes non-Latin scripts,
including right-to-left scripts like Arabic and
Hebrew. Voyager takes advantage of the web
browser's native rendering support.

59
Records with Other Languages in the OPAC
60
Displaying Records in WebVoyáge
61
Linking in a MARC21 Record
Tag  I1  I2  Subfield Data
100  1       $6 880-01 $a An, Zhen.
245  1   0   $6 880-02 $a Ri yue yun yan / $c An Zhen zhu.
250          $6 880-03 $a Di 1 ban.
260          $6 880-04 $a Changchun Shi $b Changchun chu ban she, $c 1997.
300          $a 4, 2, 291 p. $c 21 cm.
440      0   $6 880-05 $a Zhongguo li dai wang chao xing shuai qu shi lu
500          $a Non-Roman script Chinese
651      0   $a China $x History $y Ming dynasty, 1368-1644.
880  1       $6 100-01/$1 $a ? ?.
880  1   0   $6 245-02/$1 $a ?? ?? / $c ? ? ?.
880          $6 250-03/$1 $a ?1?.
880          $6 260-04/$1 $a ??? $b ?? ???, $c 1997.
880      0   $6 440-05/$1 $a ?? ?? ?? ?? ???
62
Interacting with Other Systems

Incoming Z39.50 Connections
Records in Unicode databases are UTF8 encoded
z3950svr may send MARC8-encoded records,
UTF8-encoded records, or both
Default is set to send MARC8 encoded records
But, two different z3950svr ports can be
configured to provide records in both formats,
thereby accommodating all sites connecting to
database

63
Interacting with Other Systems

Outgoing Z39.50 Connections
Retrieves and displays records of any type in
UTF-8
Converts incoming records based on new Database
Definitions setting in System Administration
called Source Character Set
Latin1 (non Unicode)
MARC 21 MARC8 (non Unicode)
MARC21 UTF8
OCLC (non Unicode)
RLIN legacy (non Unicode)
Voyager legacy (non Unicode)

64
Agenda
Introduction
Your Work Environment
Conversion
New Features
Learning More
Final Q&A
65
If you want to know more about..
Coded Character Sets - EndUser 2004 Session 29
Title: Coded Character Sets: A Technical Primer for Librarians
Presenters:
Michael Doran, Systems Librarian, University of Texas at Arlington
Dan Sweeney, Business Analyst II, Endeavor Information Systems
Great Website: http://rocky.uta.edu/doran/charsets/

Strategies and Tools for Cleaning Up Your Data -- EndUser 2004 Session 45
Title: Transitioning To Unicode: Strategies for Tidying Your Data
Presenters:
Fran Budde, Acquisitions and Cataloging Specialist, Pacific Lutheran University
Francesca Lane Rasmus, Director, Technical Services, Pacific Lutheran University
Layne Nordgren, Director of Instructional Technologies/Library Systems, Pacific Lutheran University
66
If you want to know more about..