Title: Dynamic Web File Format Transformation
1Dynamic Web File Format Transformation
Master's Project Presentation
Old Dominion University
April 18th, 2005
Presented by Daniel S. Swaney
2The Problem
- Old content on the Web is becoming harder to access/view
- Viewing images in a web page
  - Blank placeholders appear where images should appear
  - 3rd-party browser plug-ins eventually become old and hard to find
  - Some images do not appear as intended by the author (PNG images)
3The Problem
- Old content on the Web is becoming harder to access/view
- Viewing documents downloaded using a web browser
  - Typically requires a browser plug-in or external software to view (e.g. GhostView for PostScript, Adobe Acrobat Reader for PDF files)
  - What happens when the software becomes old and hard to find? (e.g. old pkzip formats from pre-1996)
4Existing Work
- TOM (Typed Object Model)
- Fred (Format Registry Demonstration)
- JHOVE (JSTOR/Harvard Object Validation Environment)
- LOCKSS (Lots of Copies Keep Stuff Safe)
5Existing Work
- TOM (Typed Object Model) http://tom.library.upenn.edu/convert/
  - John Mark Ockerbloom, 1998
  - University of Pennsylvania
  - Input content is uploaded as a single file
  - Server transforms the file to a target format
  - Output content can be downloaded from a cache location on a TOM server
  - Good source for knowledge of formats
6Existing Work
- Fred (Format Registry Demonstration) http://tom.library.upenn.edu/fred/
  - Stephen Abrams of Harvard, December 23, 2003
  - Uses TOM technology
  - Demonstrates the need for a Global Digital Format Registry that holds more information than just a MIME type
    - http://hul.harvard.edu/gdfr/
  - Good source for going beyond just using MIME types
7Existing Work
- JHOVE (JSTOR/Harvard Object Validation Environment) http://hul.harvard.edu/jhove/index.html
  - JSTOR and Harvard University Library (2003-present)
  - Examines and reports a digital object's representation information based on the OAIS reference model
  - JHOVE has a scalable architecture with enough knowledge to process formats for identifying information in a common way, using more than just the MIME type
8Existing Work
- LOCKSS (Lots of Copies Keep Stuff Safe)
  - http://www.dlib.org/dlib/january05/rosenthal/01rosenthal.html
  - Preserves original digital content from the publisher
  - Experimented with transparent format migration as a proof of concept, using GIF-to-PNG with the Firefox web browser
  - Recent development showing the need for a transparent solution
9Transparent Approach when Web Browsing
10Identifying the Problem
- Colormap PNG Browser Image Test
  - http://www.ecs.soton.ac.uk/njl98r/png-test/alpha/cmap.html
- Photography and the problems with image preservation http://www.kenrockwell.com/tech/raw.htm
  - Mentions caution about JPEG 2000 and RAW formats
11Identifying the Problem
- Browser Image Tester http://entropymine.com/jason/testbed/imgfmts/
  - XBM (X-Windows Bitmap)
    - Not typically viewable in Netscape/IE
12Demo
- Browser Issues with PNG
- XBM Format not displayable in Internet Explorer 6
- Yahoo Movies Pictures
- http://localhost/Test/Web_Browser_Image_Format_Test.html
- http://localhost/Test/JPEG_vs_RAW_vs_TIFF.htm
13Licensing Problems
- The GIF and TIF formats can't be directly converted without a license from Unisys for the LZW algorithm (or by using a 3rd-party tool that has the license)
- Microsoft doesn't publish its file formats publicly, making it difficult to ensure older Microsoft formats will still be readable.
14Configuring the Grace Translation Service
- A Profile uniquely identifies
  - a user or group of users
  - preferences
- Contains a list of one or more transform rules to follow
- Groups of users could be identified by a list or range of IP addresses
- For efficiency, one profile could be used for all users
- Defines the ports to use for proxying
[Diagram: Profile, End-User]
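The profile idea on this slide (a group of users identified by an IP range, plus the proxy ports) could be modeled roughly as follows. This is a minimal sketch in portable C++: the `Profile` struct, its field names, and the `ParseIPv4` and `FindProfile` helpers are assumptions made for illustration, not the Grace service's actual code.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one profile; field names are illustrative.
struct Profile {
    std::string description;
    std::uint32_t ipStart;   // first IPv4 address covered
    std::uint32_t ipEnd;     // last IPv4 address covered (inclusive)
    int proxyPort;           // port the proxy listens on
    int targetPort;          // port requests are forwarded to
};

// Parse dotted-quad text such as "192.168.2.17" into a 32-bit value.
// Returns false for anything that is not four octets in the range 0..255.
bool ParseIPv4(const std::string& text, std::uint32_t& out) {
    std::istringstream in(text);
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        int octet = -1;
        if (i > 0 && in.get() != '.') return false;
        if (!(in >> octet) || octet < 0 || octet > 255) return false;
        value = (value << 8) | static_cast<std::uint32_t>(octet);
    }
    // Reject trailing characters after the fourth octet.
    if (in.peek() != std::char_traits<char>::eof()) return false;
    out = value;
    return true;
}

// Return the first profile whose range contains the client address,
// or nullptr when the address is malformed or no profile matches.
const Profile* FindProfile(const std::vector<Profile>& profiles,
                           const std::string& clientIp) {
    std::uint32_t ip = 0;
    if (!ParseIPv4(clientIp, ip)) return nullptr;
    for (const Profile& p : profiles) {
        if (ip >= p.ipStart && ip <= p.ipEnd) return &p;
    }
    return nullptr;
}
```

Under this model, the "one profile for all users" case is simply a single entry whose range runs from 0.0.0.0 to 255.255.255.255.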
15Configuring the Grace Translation Service
- A Transform defines one rule of transformation to convert from a source MIME type to a target MIME type
- Defines whether this rule uses an internally known transform module (e.g. ImageMagick)
- Or an external plug-in module for any 3rd-party translation mechanism/software product
[Diagram: Profile, End-User, Transform]
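One way to picture a transform rule is as a small record that names the source and target MIME types and says where the conversion code lives. The sketch below is an assumption-laden illustration: `TransformRule`, its fields, `SameMime`, and `FindRule` are hypothetical names, and the comparison is case-insensitive because MIME type names are not case-sensitive.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical record for one transform rule; names are illustrative.
struct TransformRule {
    std::string id;
    std::string sourceMime;   // e.g. "image/jpeg"
    std::string targetMime;   // e.g. "image/gif"
    bool internal;            // true: built-in module, false: external plug-in
    std::string library;      // plug-in library to load when external
    std::string function;     // entry point to call in that library
};

// Compare two MIME type names without regard to letter case.
bool SameMime(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::string::size_type i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}

// Return the first rule whose source type matches, or nullptr if none does.
const TransformRule* FindRule(const std::vector<TransformRule>& rules,
                              const std::string& sourceMime) {
    for (const TransformRule& rule : rules) {
        if (SameMime(rule.sourceMime, sourceMime)) return &rule;
    }
    return nullptr;
}
```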
16Configuring the Grace Translation Service
- Can define more than one set of rules
  - When one rule is used, all rules are looked at again to see if other rules need to be carried out.
- Future Enhancements
  - Smart Linking of Transform Rules
    - Eliminate/skip extra steps to do a one-step transform: (JPG-GIF-BMP) replaced with (JPG-BMP)
[Diagram: Profile, End-User, three Transforms]
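The re-examination of rules and the proposed smart linking can be sketched as a walk over a rule table. Everything named here (`RuleTable`, `ResolveChain`, `DirectStep`) is hypothetical, and the sketch assumes at most one rule per source type with lower-case MIME names. Collapsing a chain into one step is only valid when some converter can actually perform the direct conversion.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rule table: source MIME type -> target MIME type.
typedef std::map<std::string, std::string> RuleTable;

// Follow rules from `start` until no rule applies, returning every type
// visited in order. A visited set stops the walk if the rules form a cycle.
std::vector<std::string> ResolveChain(const RuleTable& rules,
                                      const std::string& start) {
    std::vector<std::string> chain(1, start);
    std::set<std::string> seen;
    seen.insert(start);
    std::string current = start;
    for (;;) {
        RuleTable::const_iterator it = rules.find(current);
        if (it == rules.end()) break;                 // no further rule applies
        if (!seen.insert(it->second).second) break;   // already visited: cycle
        chain.push_back(it->second);
        current = it->second;
    }
    return chain;
}

// "Smart linking": reduce a multi-step chain to one source -> target step.
std::pair<std::string, std::string> DirectStep(
        const std::vector<std::string>& chain) {
    return std::make_pair(chain.front(), chain.back());
}
```

With rules for JPEG to GIF and GIF to BMP, `ResolveChain` visits three types and `DirectStep` reports the single JPEG to BMP conversion that smart linking would substitute.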
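17TranslateRules.xml (1 of 2)
[XML listing; element markup was lost in extraction. Recoverable values:]
- DTD: "profile.dtd"
- Profile description: "Wireless connections on 192.168.2.x"
- IP range end: "192.168.2.255"
- proxyport: 8081
- targetport: 80
- Transform id "001b", description "Tranform JPG-GIF"
  - source: image/jpg
  - target: image/GIF
  - 0 (element name lost)
  - DLL: TRImageMagick.dll
  - function: TranslateItem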
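18TranslateRules.xml (2 of 2)
[XML listing, continued; element markup was lost in extraction. Recoverable values:]
- Transform (description ends "JPEG-GIF")
  - source: image/jpeg
  - target: image/GIF
  - 0 (element name lost)
  - DLL: TRImageMagick.dll
  - function: TranslateItem
- Transform (description ends "GIF-BMP")
  - source: image/GIF
  - target: image/bmp
  - 0 (element name lost)
  - DLL: TRImageMagick.dll
  - function: TranslateItem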
19Developing External Libraries
bool StartLibrary();
bool EndLibrary();
bool TranslateItem( LPTSTR  sMimeTypeSource,
                    LPTSTR  sMimeTypeTarget,
                    LPBYTE  pDataStream,
                    size_t  pdwDataSizeInBytes,
                    LPBYTE* ppTranslatedStream,
                    size_t* pdwTranslatedDataSizeInBytes );
void ReleaseTranslatedItem( LPBYTE ppTranslatedStream );
[Diagram: Start Library, Translate Item, ReleaseTranslated Item, End Library]
20Developing an External Library
// Handle any special initialization of a 3rd party library
bool StartLibrary()
{
    // Only the first user starts ImageMagick
    if( g_nUserCount == 0 )
        StartupImageMagick();
    g_nUserCount++;
    return true;
}

// Handle any special unloading of a 3rd party library
bool EndLibrary()
{
    g_nUserCount--;
    // Only the last user shuts ImageMagick down
    if( g_nUserCount == 0 )
        ShutdownImageMagick();
    return true;
}
[Diagram: Start Library, Translate Item, ReleaseTranslated Item, End Library]
21Developing an External Library
bool TranslateItem( LPTSTR  sMimeTypeSource,
                    LPTSTR  sMimeTypeTarget,
                    LPBYTE  pDataStream,
                    size_t  pdwDataSizeInBytes,
                    LPBYTE* ppTranslatedStream,
                    size_t* pdwTranslatedDataSizeInBytes )
{
    bool bStatus = true;

    // Locate the subtype: the part of each mime type after the '/'
    char* sMimeTypeSourceRightT = strchr( sMimeTypeSource, '/' );
    char* sMimeTypeTargetRightT = strchr( sMimeTypeTarget, '/' );
    if( sMimeTypeSourceRightT && sMimeTypeTargetRightT )
    {
        // Step past the '/' itself
        sMimeTypeSourceRightT++;
        sMimeTypeTargetRightT++;

        BYTE*  pTranslatedDataStream = NULL;
        size_t nTranslatedDataSizeInBytes = 0;
[Diagram: Start Library, Translate Item (part 1), ReleaseTranslated Item, End Library]
22Developing an External Library
        bool bStatusT = ConvertImage( sMimeTypeSourceRightT,
                                      sMimeTypeTargetRightT,
                                      pDataStream,
                                      pdwDataSizeInBytes,
                                      pTranslatedDataStream,
                                      nTranslatedDataSizeInBytes );
        if( bStatusT )
        {
            // Hand the converted buffer and its size back to the caller
            *ppTranslatedStream = pTranslatedDataStream;
            *pdwTranslatedDataSizeInBytes = nTranslatedDataSizeInBytes;
        }
        else
            bStatus = false; // if conversion failed
    }
    else
        bStatus = false; // if a mime type was not provided

    return bStatus;
}
[Diagram: Start Library, Translate Item (part 2), ReleaseTranslated Item, End Library]
23Developing an External Library
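void ReleaseTranslatedItem( LPBYTE ppTranslatedStream )
{
    // Must match how ConvertImage allocated the buffer
    // (use delete[] if it was allocated with new[])
    char* pData = (char*) ppTranslatedStream;
    if( pData )
        delete pData;
}
[Diagram: Start Library, Translate Item, ReleaseTranslated Item, End Library]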
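The four entry points are meant to be called in the order shown in the diagram. The sketch below walks one item through that sequence in portable C++. It is only an illustration under stated assumptions: the Win32 types are replaced with standard ones, the "conversion" is a plain byte copy standing in for a real converter, and `TranslateThroughPlugin` is a hypothetical host-side helper.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

typedef unsigned char Byte;

static int g_nUserCount = 0;   // number of active users of the library

bool StartLibrary() { ++g_nUserCount; return true; }

bool EndLibrary() {
    if (g_nUserCount > 0) --g_nUserCount;
    return true;
}

// Stand-in conversion: copies the input unchanged. A real plug-in would
// call its conversion library here.
bool TranslateItem(const char* sMimeTypeSource, const char* sMimeTypeTarget,
                   const Byte* pDataStream, std::size_t nDataSizeInBytes,
                   Byte** ppTranslatedStream,
                   std::size_t* pnTranslatedDataSizeInBytes) {
    if (!sMimeTypeSource || !sMimeTypeTarget ||
        !ppTranslatedStream || !pnTranslatedDataSizeInBytes) {
        return false;
    }
    // Both arguments must look like "type/subtype".
    if (!std::strchr(sMimeTypeSource, '/') || !std::strchr(sMimeTypeTarget, '/')) {
        return false;
    }
    Byte* pCopy = new Byte[nDataSizeInBytes];
    if (nDataSizeInBytes > 0) std::memcpy(pCopy, pDataStream, nDataSizeInBytes);
    *ppTranslatedStream = pCopy;
    *pnTranslatedDataSizeInBytes = nDataSizeInBytes;
    return true;
}

// The library that allocated the buffer also frees it (new[] pairs with delete[]).
void ReleaseTranslatedItem(Byte* pTranslatedStream) {
    delete[] pTranslatedStream;
}

// Host side: run one item through the full call sequence.
bool TranslateThroughPlugin(const std::string& source, const std::string& target,
                            const std::vector<Byte>& input,
                            std::vector<Byte>& output) {
    if (!StartLibrary()) return false;
    Byte* pTranslated = nullptr;
    std::size_t nTranslated = 0;
    bool ok = TranslateItem(source.c_str(), target.c_str(),
                            input.data(), input.size(),
                            &pTranslated, &nTranslated);
    if (ok) {
        output.assign(pTranslated, pTranslated + nTranslated);  // copy, then release
        ReleaseTranslatedItem(pTranslated);
    }
    EndLibrary();
    return ok;
}
```

Having the plug-in expose its own release function is a common design for DLL interfaces: memory allocated inside one module is handed back to that same module to free, since the host and the plug-in may not share a heap.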
24OO-Architecture
GraceServer.exe
CNTService
CGraceService
HttpTunnel.lib
CHttpTunnel
CHttpInputListener
CHttpRequestHandler
TranslateRulesMgr.lib
CTranslateRulesMgr
CGraceSaxParseHandler
CGProxy
CGProfile
CGTransform
25Pitfalls and Problems
- Problems with sites/routers that perform complex tricks to prevent proxying
  - Corporate Network Router Filters
  - Yahoo Image Search (AltaVista trickery)
- Commercial Formats
- PowerPoint Babel Fish Protocol
- libwww incompatibilities with Windows
- ImageMagick Phishing
26Future Enhancement Ideas
- Research into using another common shareable XML schema
  - Investigate LOCKSS, Fred, JHOVE
- Improving Transform Rules for Smart Linking
  - Skip past unnecessary steps
- Add support beyond MIME types
  - JHOVE's ability to use OAIS-based information to identify a content type
- Creating a web site to manage rules
27More Future Enhancement Ideas
- Investigate integration with the Internet Archive (similar to the LOCKSS solution)
  - When normal web content is missing, retry the request using the Internet Archive
- Investigate ways to identify the capabilities of a client system and determine whether a transform is needed
- Experiment with integrating with LOCKSS
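One possible starting point for detecting client capabilities, offered here only as an assumption and not something the project implements, is the HTTP `Accept` request header. The sketch below treats a transform as unnecessary only when the client lists the source type explicitly. Wildcards such as `*/*` are deliberately not counted, because they do not show that the browser can render the format, and q-values are ignored for simplicity. `Normalize`, `ClientListsType`, and `TransformNeeded` are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Lower-case a string and strip surrounding spaces and tabs.
std::string Normalize(const std::string& s) {
    std::string::size_type b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return std::string();
    std::string::size_type e = s.find_last_not_of(" \t");
    std::string out = s.substr(b, e - b + 1);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// True when the Accept header names the MIME type explicitly.
// Parameters after ';' (such as q-values) are dropped before comparing.
bool ClientListsType(const std::string& acceptHeader, const std::string& mimeType) {
    const std::string wanted = Normalize(mimeType);
    std::istringstream in(acceptHeader);
    std::string item;
    while (std::getline(in, item, ',')) {
        std::string range = Normalize(item.substr(0, item.find(';')));
        if (range == wanted) return true;
    }
    return false;
}

// A transform is considered necessary unless the client lists the type itself.
bool TransformNeeded(const std::string& acceptHeader, const std::string& sourceMime) {
    return !ClientListsType(acceptHeader, sourceMime);
}
```

Many browsers send only broad wildcards, so a heuristic like this would likely need to be combined with other signals, such as the `User-Agent` string.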
28Potential Future Problem
- Provenance is partially lost with this approach
  - Although the original filename/URL is maintained, the MIME type and any other information about the original format is lost
  - This may be OK if the information is not a total loss
29Questions? Concerns? Comments?