Title: Content Creation and Processing
1Content Creation and Processing
T.B. Rajashekar
National Centre for Science Information (NCSI)
Indian Institute of Science
Bangalore - 560 012
(E-Mail: raja_at_ncsi.iisc.ernet.in)
2Content Creation and Processing
- Introduction
- How are the applications accessed?
- Issues in content hosting
- Content formats
- Tools for content creation and processing
- Related sources
3Introduction
- We have discussed different applications and levels of application in the Applications Overview session
- In this session we shall focus on issues related to the creation and management of content for these applications, content formats and content creation tools
- A demonstration of some of these tools will be given in the next session
4Applications
5How Are the Applications Accessed?
- Typically, access to all these services is integrated through the library web site (library web site design will be discussed in the next session)
- The user visits the library web site using a web browser and browses through HTML pages
- The user selects a link pointing to content likely to provide the information and accesses the information source/service/application
- Content has to be delivered in a browser-compatible format. How do we achieve this?
6Issues In Content Hosting
- Key questions to ask while hosting content
- What content do I want to provide access to?
- What is the delivery medium?
  - Web
    - Browser-compatible content
  - E-Mail
    - Push-based services, discussion forums
  - Telnet
    - Legacy databases (non-web aware)
  - FTP
    - Download files
7Issues In Content Hosting
- Key questions to ask while hosting content
- Web-enabled content
  - Browser aware? (HTML, GIF, JPEG)
  - Other formats will require helper applications/plug-ins (e.g. PDF: Acrobat Reader, DOC: MS Word)
- How is this going to be accessed?
  - Browse
  - Search
    - Site search
    - Content-specific search
    - Search level: full text, field specific
  - Browse and search
8Issues In Content Hosting
- Key questions to ask while hosting content
- Who is going to access this? (download time)
  - Intranet user
  - Internet user
- How is the content stored (granularity)?
  - File
  - Record (in a database)
- How is this content going to be created?
  - Paper to Web
    - Scanning and conversion
    - Data entry
  - Electronic to Web (e.g. Word to HTML or PDF)
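The download-time question on this slide comes down to simple arithmetic: file size divided by effective link speed. A minimal sketch follows. The function name, the example file size and the two link speeds are illustrative assumptions, not figures from this session.

```python
def download_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Estimate transfer time, ignoring protocol overhead and congestion."""
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    return size_bytes * 8 / link_bits_per_second


# Hypothetical example: a 500 KB scanned page image.
size = 500 * 1024
lan = download_seconds(size, 10_000_000)   # assumed 10 Mbit/s intranet link
modem = download_seconds(size, 28_800)     # assumed 28.8 kbit/s dial-up link
print(f"Intranet: {lan:.2f} s, dial-up: {modem:.0f} s")
```

The gap between the two figures is why the same content may need a lighter format (or smaller files) for Internet users than for intranet users.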
9Issues In Content Hosting
- Key questions to ask while hosting content
- What content creation/conversion tools are to be used?
  - Free
  - Commercial
- Security considerations for purchased content (locally hosted, remote sources)
  - How do we implement access restrictions?
  - How do we facilitate easy access to all internal users?
    - Based on IP range, domain name
    - User name/password (how to administer this?)
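IP-range based access can be sketched as follows: internal addresses get in directly, everyone else is asked for a user name and password. This is only an illustration using Python's standard ipaddress module. The network ranges and function names are hypothetical and would be replaced by the institution's own.

```python
import ipaddress

# Hypothetical campus address ranges; replace with the institution's own.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]


def is_internal_user(client_ip: str) -> bool:
    """Return True if client_ip falls inside any internal network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: treat as external
    return any(address in network for network in INTERNAL_NETWORKS)


def needs_password(client_ip: str) -> bool:
    """External users fall back to user name/password authentication."""
    return not is_internal_user(client_ip)


for ip in ("10.1.2.3", "192.168.2.1"):
    print(ip, "-> password required" if needs_password(ip) else "-> direct access")
```

In practice the same rule is usually expressed in the web server's configuration rather than in application code.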
10Content Formats
- Content: Information about the library and its services, policy/plan documents, e-journals, annual reports, manuals
  - Running text with/without graphics, tables, charts, multimedia
- Storage: File
- Access: Browse and select (e.g. Doc title -> Content page -> Section/chapter)
- Content formats: HTML, PDF, ASCII (text), images (GIF, JPEG), audio (.WAV, .RA), video (.AVI, .MOV, .MPG), etc.
- Search: Site search, collection-specific search
11File level content (example 1)
IISc 22-26 Nov99
IIRML P4
12File level content (example 1) (contd.)
13File level content (example 1) (contd.)
14File level content (example 2)
15File level content (example 2) (contd.)
16File level content (example 2) (contd.)
17Content Formats
- Content: OPAC
- How to handle non-web enabled OPACs (legacy library automation/DBMS packages)?
  - Export OPAC records into text files
    - Index using web-enabled software (e.g. MG, freeWAIS)
  - Export OPAC records into ISO 2709 format
    - Import into a CDS/ISIS database
    - Use CDS/ISIS web tools (e.g. WWWISIS)
- CDS/ISIS databases
  - Use CDS/ISIS web tools (e.g. WWWISIS)
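As a rough illustration of the "export to text, then index" route, the sketch below builds a tiny inverted index over exported records and runs an AND search on it. The tab-separated, one-record-per-line layout and the sample records are assumptions made for this example. Indexing packages do this at scale with their own file formats.

```python
import re
from collections import defaultdict

# Hypothetical export: record id, title, author separated by tabs.
EXPORTED = """\
001\tDigital libraries and the web\tRao, K
002\tIntroduction to library automation\tMenon, S
003\tWeb publishing for libraries\tRao, K
"""


def build_index(export_text):
    """Map each lower-cased word to the set of record ids containing it."""
    index = defaultdict(set)
    for line in export_text.splitlines():
        if not line.strip():
            continue
        record_id, *fields = line.split("\t")
        for word in re.findall(r"[a-z0-9]+", " ".join(fields).lower()):
            index[word].add(record_id)
    return index


def search(index, query):
    """Return ids of records containing every word in the query (AND search)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


index = build_index(EXPORTED)
print(search(index, "web libraries"))
```

A web front end would take the query from an HTML form, call a search routine of this kind, and format the matching records as an HTML page.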
18Content Formats
- Content: Bibliographic, experts, institutions, projects
  - Document surrogates (metadata)
- Storage: Database
  - Structured (RDBMS)
  - Unstructured (text-oriented)
- Access
  - Database title -> Record/field-based search form
- Tools: WWWISIS, MG, SQL, freeWAIS
- Content format: Database specific
- Could also be used to provide access to full text at file level using hypertext linking
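For the structured (RDBMS) case, a field-based search form typically ends up as a parameterised SQL query. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for whatever RDBMS is actually in use. The table, column names and sample rows are hypothetical.

```python
import sqlite3

ALLOWED_FIELDS = {"name", "institution", "subject"}


def make_demo_db():
    """Create an in-memory 'experts' table with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE experts (name TEXT, institution TEXT, subject TEXT)")
    conn.executemany(
        "INSERT INTO experts VALUES (?, ?, ?)",
        [
            ("A. Kumar", "Institute A", "Polymer chemistry"),
            ("B. Singh", "Institute B", "Information retrieval"),
            ("C. Iyer", "Institute A", "Digital libraries"),
        ],
    )
    return conn


def field_search(conn, field, term):
    """Substring search restricted to one field chosen on the search form."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field: {field}")
    # The field name comes from the whitelist; the term is bound as a parameter.
    sql = f"SELECT name FROM experts WHERE {field} LIKE ? ORDER BY name"
    return [row[0] for row in conn.execute(sql, (f"%{term}%",))]


conn = make_demo_db()
print(field_search(conn, "institution", "Institute A"))
```

Binding the search term as a parameter, rather than pasting it into the SQL string, keeps form input from being interpreted as SQL.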
19Record level content
20Record level content(contd.)
21Record level content (contd.)
22Record level content (contd.)
23Record level content (RDBMS)
24Record level content (RDBMS) (contd.)
25Content Formats
- Content: Web access to networked CD-ROM databases
- Solutions now exist to provide platform-independent access to Windows applications on the Internet
  - Ex. Citrix's MetaFrame using ICA
  - ICA client can be used alone or as a plug-in to Web browsers to access Windows applications
- We can thus present an integrated list of networked databases on the library web site and allow users to access these using web browsers from different platforms
- Trend is towards hard disc hosting for web access (e.g. ERL, OVID)
26Hard disc hosting of CD-ROM content using ERL
27Hard disc hosting of CD-ROM content using ERL
(contd.)
28Hard disc hosting of CD-ROM content using ERL
(contd.)
29Hard disc hosting of CD-ROM content using ERL
(contd.)
30Content Formats
- Content: Remote information sources
- Most library web sites provide links to Internet sites (subscribed and/or free) of relevance to their users (Internet resource catalogue/gateway)
- Typical details include title, description, keywords, source type, access details, etc.
- Resource description (metadata) standards: Dublin Core
- Format
  - Simple HTML-based listings (by subject, source type)
  - Database-driven catalogues, supporting keyword search and subject/source type browsing
- Z39.50 access to library catalogues (using client software like Bookwhere)
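One common way to carry Dublin Core descriptions on the web is as `<meta>` tags in the head of an HTML page. The sketch below renders a resource description that way. The function name and the choice of elements are illustrative assumptions, and the sample record reuses the D-Lib entry listed under Related Sources.

```python
from html import escape


def dublin_core_meta(record):
    """Render a dict of Dublin Core elements as HTML <meta> tags."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        lines.append(
            f'<meta name="DC.{escape(element)}" content="{escape(value)}">'
        )
    return "\n".join(lines)


# Hypothetical catalogue entry for one Internet resource.
resource = {
    "title": "D-Lib Magazine",
    "description": "e-journal reporting new developments in digital libraries",
    "type": "e-journal",
    "identifier": "www.dlib.org",
}
print(dublin_core_meta(resource))
```

The same dictionary could equally be stored as a database record, which is what a database-driven catalogue would search and browse.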
31Internet resource catalogue - e-journals (RDBMS)
32Internet resource catalogue - e-journals (RDBMS)
(contd.)
33Internet resource catalogue - e-journals (RDBMS)
(contd.)
34Content Formats
- Content: Push services
- Current Awareness Services (e.g. list of additions, newsletters)
  - Format: HTML, ASCII
  - Delivery: E-Mail
  - Also hosted on the web site
- Profile-based alerting services
  - Web-based profile set-up and maintenance by individual users
  - Processing: Extraction from databases
  - Delivery: E-Mail
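The processing step of a profile-based alerting service can be sketched as matching newly added records against each user's stored keywords. This is a minimal sketch under stated assumptions: the profile structure, record fields and addresses are all hypothetical, and actual e-mail delivery is left out.

```python
def match_profiles(profiles, new_records):
    """Return {email: [titles]} for records sharing a keyword with a profile."""
    alerts = {}
    for email, keywords in profiles.items():
        wanted = {k.lower() for k in keywords}
        hits = [
            rec["title"]
            for rec in new_records
            if wanted & {k.lower() for k in rec["keywords"]}
        ]
        if hits:
            alerts[email] = hits
    return alerts


# Hypothetical user profiles and newly added records.
profiles = {
    "user1@example.org": ["polymers", "catalysis"],
    "user2@example.org": ["digital libraries"],
    "user3@example.org": ["astronomy"],
}
new_records = [
    {"title": "Advances in Catalysis", "keywords": ["Catalysis", "Chemistry"]},
    {"title": "Building Digital Libraries", "keywords": ["Digital libraries"]},
]

for email, titles in match_profiles(profiles, new_records).items():
    print(f"To: {email}\nNew additions matching your profile:")
    for title in titles:
        print(f"  - {title}")
```

Users with no matching records receive nothing, which keeps the service from turning into a general mailing list.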
35An example Push service
36An example Push service (profile set up)
37An example Push service (modification)
38Content Formats Summary
- Major content types in library web sites
  - Web pages (HTML), Images (GIF, JPEG), Text (ASCII), Text (PDF), Search-based content from databases
- Other content types
  - DHTML (interactive web pages), Audio (WAV, RA, MP3), Video (AVI, MOV, MPEG), Animations (GIF, Flash), Multimedia presentations (Macromedia Director)
- Emerging formats
  - XML (Extensible Markup Language)
39Tools for Content Creation and Processing
- In the next session we will demonstrate a few content creation and conversion tools
  - HTML: Netscape Composer, MS FrontPage
  - Imaging (GIF, JPEG): Adobe Photoshop and Paint Shop Pro
  - OCR: TextBridge
  - PDF: Adobe Acrobat Exchange
  - Conversion tools (Word-to-HTML, text-to-HTML)
  - DHTML
  - Audio capture and conversion (RealAudio and MP3)
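To give a feel for what a text-to-HTML conversion tool does, the sketch below wraps the paragraphs of a plain ASCII file in a bare HTML page. It is an illustration only and not one of the tools to be demonstrated. The function name and sample text are made up, and real converters also handle headings, lists and tables.

```python
from html import escape


def text_to_html(text, title):
    """Wrap blank-line-separated paragraphs of plain text in a basic HTML page."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(
        "<p>" + escape(" ".join(p.split())) + "</p>" for p in paragraphs
    )
    return (
        "<html>\n<head><title>" + escape(title) + "</title></head>\n"
        "<body>\n" + body + "\n</body>\n</html>"
    )


# Hypothetical plain-text notice to be placed on the library web site.
notice = "Library hours: 9 am & 5 pm\n\nClosed on\nSundays"
print(text_to_html(notice, "Library Notice"))
```

Characters such as `&` and `<` are escaped so that the browser displays them as text instead of treating them as markup.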
40Tools for Content Creation and Processing
- On the fifth day we will go into details of search-based content and demonstrate a few tools
  - Web site indexing (HtDig)
  - Bibliographic databases (WWWISIS)
  - Bibliographic and full text databases (MG)
  - RDBMS access using ODBC (FoxPro, Oracle)
41Related Sources
- World Wide Web Consortium - Developments related to Web technology, standards, tools, guidelines, etc. (www.w3.org)
- Digital Library Sunsite - Digitisation tools and resources (sunsite.berkeley.edu)
- D-Lib magazine - e-journal reporting new developments in digital libraries, including tools for handling digital content (www.dlib.org)
42Related Sources
- From Paper to Web: How to make information instantly accessible (Tony McKinley) (imagebiz.com)
- Digitisation of exam papers (Andrew Hampson et al) (The Electronic Library, 17(4), Aug 1999, 239-46)
- Web publishing with Acrobat and PDF (Bruce Page and Diana Holm, 1996. John Wiley)
43Related Sources
- Networked Digital Library of Theses and Dissertations (www.ndltd.org)
- Guide to Networked Resources and Tools (GNRT) (www.terena.nl/libr/gnrt)
- IFLA (www.ifla.org), particularly (www.ifla.org/I/training/html)