Dynamic Web File Format Transformation

1
Dynamic Web File Format Transformation
Master's Project Presentation
Old Dominion University
April 18th, 2005
Presented by Daniel S. Swaney
2
The Problem
  • Old Content on the Web is becoming harder to
    access/view
  • Viewing images in a web page
  • Blank Placeholders appearing where images should
    appear
  • 3rd Party Browser plug-ins eventually become old
    and hard to find
  • Some Images are not appearing as intended by the
    author (PNG Images)

3
The Problem
  • Old Content on the Web is becoming harder to
    access/view
  • Viewing documents downloaded using a web browser
  • Typically requires a browser plug-in or external
    software to view (e.g. GhostView for PostScript,
    Adobe Acrobat Reader for PDF files)
  • What happens when the software becomes old and
    hard to find? (e.g. old pkzip formats from
    pre-1996)

4
Existing Work
  • TOM (Typed Object Model)
  • Fred (Format Registry Demonstration)
  • JHOVE (JSTOR/Harvard Object Validation
    Environment)
  • LOCKSS (Lots of Copies Keep Stuff Safe)

5
Existing Work
  • TOM (Typed Object Model)
    http://tom.library.upenn.edu/convert/
  • John Mark Ockerbloom, 1998
  • University of Pennsylvania
  • Input content is uploaded as a single file
  • Server transforms file to a target format
  • Output content can be downloaded from a cache
    location on a TOM server
  • Good Source for Knowledge of Formats

6
Existing Work
  • Fred (Format Registry Demonstration)
    http://tom.library.upenn.edu/fred/
  • Stephen Abrams of Harvard, December 23, 2003
  • Uses TOM technology
  • Demonstrates the need for a Global Digital Format
    Registry that holds more information than just a
    MIME type.
  • http://hul.harvard.edu/gdfr/
  • Good Source beyond just using MIME Types

7
Existing Work
  • JHOVE (JSTOR/Harvard Object Validation
    Environment) http://hul.harvard.edu/jhove/index.html
  • JSTOR and Harvard University Library (2003-present)
  • Examines and reports a digital object's
    representative information based on the OAIS
    reference model.
  • JHOVE has a scalable architecture with enough
    format knowledge to extract identifying
    information in a common way, going beyond just
    the MIME type.

8
Existing Work
  • LOCKSS (Lots of Copies Keep Stuff Safe)
  • http://www.dlib.org/dlib/january05/rosenthal/01rosenthal.html
  • Preserves original digital content from the
    publisher
  • Experimented with transparent format migration
    as a proof-of-concept using GIF-to-PNG with the
    Firefox Web Browser
  • Recent development showing the need for a
    transparent solution

9
Transparent Approach when Web Browsing
10
Identifying the Problem
  • Colormap PNG Browser Image Test
  • http://www.ecs.soton.ac.uk/njl98r/png-test/alpha/cmap.html
  • Photography and the problems with image
    preservation http://www.kenrockwell.com/tech/raw.htm
  • Mentions caution about JPEG 2000 and RAW Formats

11
Identifying the Problem
  • Browser Image Tester
    http://entropymine.com/jason/testbed/imgfmts/
  • XBM (X-Windows Bitmap)
  • Not Typically Viewable in Netscape/IE

12
Demo
  • Browser Issues with PNG
  • XBM Format not displayable in Internet Explorer 6
  • Yahoo Movies Pictures
  • http://localhost/Test/Web_Browser_Image_Format_Test.html
  • http://localhost/Test/JPEG_vs_RAW_vs_TIFF.htm

13
Licensing Problems
  • The GIF and TIFF formats can't be directly
    converted without a license from Unisys for the
    LZW algorithm (or using a 3rd party tool that has
    the license)
  • Microsoft doesn't publish their file formats
    publicly, making it difficult to ensure older
    Microsoft formats will still be readable.

14
Configuring the Grace Translation Service
  • A Profile uniquely identifies a user's or group of
    users' preferences
  • Contains a list of one or more transform rules
    to follow
    - Groups of users could be identified by a
      list or range of IP addresses
    - For efficiency, one profile could be
      used for all users
    - Defines the ports to use for proxying

[Diagram: End-User, Profile]
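
A minimal C++ sketch of how a profile could be selected for a client by IP range, assuming IPv4 addresses and a first-match policy. The names `TransformRule`, `Profile`, `ParseIPv4` and `FindProfile` are hypothetical and are not taken from the Grace source; only the ideas (a description, an address range, proxy ports, and a list of transform rules) come from the slide.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical in-memory form of one transform rule.
struct TransformRule {
    std::string sourceMime;  // e.g. "image/jpeg"
    std::string targetMime;  // e.g. "image/gif"
    std::string library;     // internal module or external plug-in name
};

// Hypothetical in-memory form of one profile.
struct Profile {
    std::string description;
    uint32_t    ipStart;     // first address covered (host byte order)
    uint32_t    ipEnd;       // last address covered (host byte order)
    int         proxyPort;   // port the proxy listens on
    int         targetPort;  // port requests are forwarded to
    std::vector<TransformRule> rules;
};

// Parse dotted-quad IPv4 text; returns false on malformed input.
bool ParseIPv4(const std::string& text, uint32_t& out)
{
    unsigned a, b, c, d;
    char extra;
    // Exactly four numbers must be read; a fifth match means trailing junk.
    if (std::sscanf(text.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return false;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return false;
    out = (a << 24) | (b << 16) | (c << 8) | d;
    return true;
}

// Return the first profile whose range contains the client, or nullptr.
const Profile* FindProfile(const std::vector<Profile>& profiles,
                           const std::string& clientIp)
{
    uint32_t ip = 0;
    if (!ParseIPv4(clientIp, ip))
        return nullptr;
    for (const Profile& p : profiles)
        if (ip >= p.ipStart && ip <= p.ipEnd)
            return &p;
    return nullptr;
}
```

A single catch-all profile, as the slide suggests for efficiency, would simply use the range 0.0.0.0 to 255.255.255.255.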
15
Configuring the Grace Translation Service
  • A Transform defines one rule of transformation to
    convert from a source MIME type to a target MIME
    type
    - Defines whether this rule uses an internally
      known transform module (e.g. ImageMagick)
    - Or an external plug-in module for any 3rd party
      translation mechanism/software product

[Diagram: End-User, Profile, Transform]
16
Configuring the Grace Translation Service
  • Can define more than one set of rules
    - When one rule is used, all rules are looked at
      again to see if other rules need to be carried
      out.
  • Future Enhancements
  • Smart Linking of Transform Rules
    - To eliminate/skip extra steps and do a one-step
      transform: (JPG-GIF-BMP) replaced with (JPG-BMP)

[Diagram: End-User, Profile, multiple Transforms]
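
The re-scan behaviour and the proposed "smart linking" can be sketched as a walk over the rule table that finds the final target type up front, so that only one conversion is needed. This is an illustrative sketch, not the Grace implementation: `RuleTable` and `ResolveFinalTarget` are hypothetical names, the table assumes at most one rule per source type, and MIME types are assumed to be already lower-cased.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical rule table: source MIME type -> target MIME type.
typedef std::map<std::string, std::string> RuleTable;

// Follow the rules from `source` until no further rule applies and
// return the last MIME type reached. The `seen` set guards against
// cycles such as gif->bmp, bmp->gif: the walk stops as soon as it
// reaches a type it has already visited.
std::string ResolveFinalTarget(const RuleTable& rules, const std::string& source)
{
    std::set<std::string> seen;
    std::string current = source;
    while (seen.insert(current).second) {
        RuleTable::const_iterator it = rules.find(current);
        if (it == rules.end())
            break;                // no rule for this type: chain ends here
        current = it->second;     // take one step along the chain
    }
    return current;
}
```

With rules for image/jpeg to image/gif and image/gif to image/bmp, resolving image/jpeg yields image/bmp, which corresponds to replacing (JPG-GIF-BMP) with (JPG-BMP). A type with no matching rule resolves to itself, meaning no transform is needed.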
17
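TranslateRules.xml (1 of 2)
[XML listing; the element markup was lost in extraction. Recoverable values:]
  • DTD: "profile.dtd"
  • Profile: description="Wireless connections on 192.168.2.x",
    end="192.168.2.255", proxyport=8081, targetport=80
  • Transform id="001b", description="Transform JPG-GIF"
    - source: image/jpg
    - target: image/GIF
    - 0 (element name not recoverable)
    - TRImageMagick.dll
    - function: TranslateItem
18
TranslateRules.xml (2 of 2)
[XML listing, continued; recoverable values:]
  • Transform, description ending in "JPEG-GIF"
    - source: image/jpeg
    - target: image/GIF
    - 0 (element name not recoverable)
    - TRImageMagick.dll
    - function: TranslateItem
  • Transform, description ending in "GIF-BMP"
    - source: image/GIF
    - target: image/bmp
    - 0 (element name not recoverable)
    - TRImageMagick.dll
    - function: TranslateItem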
19
Developing External Libraries
Call sequence: Start Library -> Translate Item -> Release Translated Item -> End Library

    bool StartLibrary();
    bool EndLibrary();
    bool TranslateItem( LPTSTR  sMimeTypeSource,
                        LPTSTR  sMimeTypeTarget,
                        LPBYTE  pDataStream,
                        size_t* pdwDataSizeInBytes,
                        LPBYTE* ppTranslatedStream,
                        size_t* pdwTranslatedDataSizeInBytes );
    void ReleaseTranslatedItem( LPBYTE* ppTranslatedStream );
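
The four-step lifecycle can be illustrated from the host's side with a table of function pointers. This is a portable sketch under stated assumptions: `TranslateLibrary`, `RunOneTranslation` and the `Mock*` functions are hypothetical, plain `char*` and `BYTE*` stand in for `LPTSTR` and `LPBYTE` so the code compiles outside Windows, and the mock plug-in just copies its input. On Windows the pointers would typically be filled in with `LoadLibrary` and `GetProcAddress`; the slides do not say how Grace does it.

```cpp
#include <cstddef>
#include <cstring>

typedef unsigned char BYTE;

// Hypothetical table of the four entry points a plug-in exports.
struct TranslateLibrary {
    bool (*StartLibrary)();
    bool (*EndLibrary)();
    bool (*TranslateItem)(const char* srcMime, const char* dstMime,
                          BYTE* data, size_t* dataSize,
                          BYTE** translated, size_t* translatedSize);
    void (*ReleaseTranslatedItem)(BYTE** translated);
};

// Stand-in plug-in: "translates" by copying the input unchanged.
static int g_nMockUsers = 0;
static bool MockStart() { ++g_nMockUsers; return true; }
static bool MockEnd()   { --g_nMockUsers; return true; }
static bool MockTranslate(const char*, const char*, BYTE* data, size_t* dataSize,
                          BYTE** translated, size_t* translatedSize)
{
    *translated = new BYTE[*dataSize];
    std::memcpy(*translated, data, *dataSize);
    *translatedSize = *dataSize;
    return true;
}
static void MockRelease(BYTE** translated)
{
    delete[] *translated;
    *translated = nullptr;
}

// Host side: run one item through Start, Translate, Release, End.
// Returns the translated size, or 0 if a step failed.
size_t RunOneTranslation(const TranslateLibrary& lib, BYTE* data, size_t size)
{
    if (!lib.StartLibrary())
        return 0;
    BYTE*  out = nullptr;
    size_t outSize = 0;
    size_t result = 0;
    if (lib.TranslateItem("image/jpeg", "image/gif", data, &size, &out, &outSize)) {
        result = outSize;                 // a real host would forward `out` here
        lib.ReleaseTranslatedItem(&out);  // the plug-in frees what it allocated
    }
    lib.EndLibrary();
    return result;
}
```

A host would build the table once per plug-in, for example `TranslateLibrary lib = { MockStart, MockEnd, MockTranslate, MockRelease };`, and call `RunOneTranslation(lib, buffer, length)` for each item.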
20
Developing an External Library
Start Library and End Library:

    // Handle any special initialization of a 3rd party library
    bool StartLibrary()
    {
        // Only the first user starts ImageMagick
        if( g_nUserCount == 0 )
            StartupImageMagick();
        g_nUserCount++;
        return true;
    }

    // Handle any special unloading of a 3rd party library
    bool EndLibrary()
    {
        g_nUserCount--;
        // Only the last user shuts ImageMagick down
        if( g_nUserCount == 0 )
            ShutdownImageMagick();
        return true;
    }
21
Developing an External Library
Translate Item (part 1):

    bool TranslateItem( LPTSTR  sMimeTypeSource,
                        LPTSTR  sMimeTypeTarget,
                        LPBYTE  pDataStream,
                        size_t* pdwDataSizeInBytes,
                        LPBYTE* ppTranslatedStream,
                        size_t* pdwTranslatedDataSizeInBytes )
    {
        bool bStatus = false;

        // Find the '/' separating type from subtype
        // (strchr on LPTSTR assumes a non-Unicode build)
        char* sMimeTypeSourceRight = strchr( sMimeTypeSource, '/' );
        char* sMimeTypeTargetRight = strchr( sMimeTypeTarget, '/' );
        if( sMimeTypeSourceRight && sMimeTypeTargetRight )
        {
            // Step past the '/' so each pointer names the subtype
            sMimeTypeSourceRight++;
            sMimeTypeTargetRight++;
            BYTE*  pTranslatedDataStream = NULL;
            size_t nTranslatedDataSizeInBytes = 0;
22
Developing an External Library
Translate Item (part 2):

            // ConvertImage is assumed to fill its last two
            // arguments by reference
            bStatus = ConvertImage( sMimeTypeSourceRight,
                                    sMimeTypeTargetRight,
                                    pDataStream,
                                    pdwDataSizeInBytes,
                                    pTranslatedDataStream,
                                    nTranslatedDataSizeInBytes );
            if( bStatus )
            {
                // Hand the converted buffer back to the caller
                *ppTranslatedStream = pTranslatedDataStream;
                *pdwTranslatedDataSizeInBytes = nTranslatedDataSizeInBytes;
            }
            else
                bStatus = false;  // if conversion failed
        }
        else
            bStatus = false;      // if a mime type was not provided
        return bStatus;
    }
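
The MIME handling in TranslateItem keeps only the part after the '/' and passes it on to the converter. A small standalone sketch of that step, using `std::string` instead of raw pointers; `MimeSubtype` is a hypothetical helper name, not part of the plug-in interface shown on the slides.

```cpp
#include <string>

// Return the part of a MIME type after the '/', or an empty string
// when there is no '/' (the "mime type was not provided" case).
std::string MimeSubtype(const std::string& mimeType)
{
    std::string::size_type slash = mimeType.find('/');
    if (slash == std::string::npos)
        return std::string();
    return mimeType.substr(slash + 1);
}
```

For example, `MimeSubtype("image/jpeg")` gives `"jpeg"` and `MimeSubtype("image/GIF")` gives `"GIF"`; case is left untouched, as in the slide code.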
23
Developing an External Library
Release Translated Item:

    void ReleaseTranslatedItem( LPBYTE* ppTranslatedStream )
    {
        // Free the buffer that TranslateItem handed to the caller
        // (delete [] assumes it was allocated with new [])
        char* pData = (char*) *ppTranslatedStream;
        if( pData )
            delete [] pData;
    }
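
Because the plug-in allocates the translated buffer, the host has to hand it back to the plug-in rather than free it itself; this can matter on Windows when a DLL and its host use different runtime heaps. One way a host might make that hard to forget is a smart pointer with a custom deleter. This is a sketch under assumptions: `ReleaseBuffer` is a hypothetical stand-in for the plug-in's release function, simplified to take the buffer pointer directly, and `TranslatedItemDeleter`, `TranslatedItemPtr` and `AdoptTranslatedItem` are illustrative names.

```cpp
#include <cstddef>
#include <memory>

typedef unsigned char BYTE;

// Stand-in for the plug-in's release function; it counts calls so the
// effect of the wrapper is observable.
static int g_nReleased = 0;
static void ReleaseBuffer(BYTE* p)
{
    delete[] p;
    ++g_nReleased;
}

// Deleter that routes destruction back through the plug-in.
struct TranslatedItemDeleter {
    void operator()(BYTE* p) const { ReleaseBuffer(p); }
};

typedef std::unique_ptr<BYTE, TranslatedItemDeleter> TranslatedItemPtr;

// Wrap a buffer the plug-in returned so it is released exactly once,
// when the wrapper goes out of scope.
TranslatedItemPtr AdoptTranslatedItem(BYTE* raw)
{
    return TranslatedItemPtr(raw);
}
```

A host would call `AdoptTranslatedItem` on the pointer it received from TranslateItem and let the wrapper go out of scope once the response has been sent.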
24
OO-Architecture
  • GraceServer.exe
    - CNTService
    - CGraceService
  • HttpTunnel.lib
    - CHttpTunnel
    - CHttpInputListener
    - CHttpRequestHandler
  • TranslateRulesMgr.lib
    - CTranslateRulesMgr
    - CGraceSaxParseHandler
    - CGProxy
    - CGProfile
    - CGTransform
25
Pitfalls and Problems
  • Problems with sites/routers that perform complex
    tricks to prevent proxying
  • Corporate Network Router Filters
  • Yahoo Image Search (AltaVista trickery)
  • Commercial Formats
  • PowerPoint
  • Babel Fish Protocol
  • libwww incompatibilities with Windows
  • ImageMagick Phishing

26
Future Enhancement Ideas
  • Research into using another common shareable XML
    schema
  • Investigate LOCKSS, Fred, JHOVE
  • Improving Transform Rules for Smart Linking
  • Skip past unnecessary steps
  • Add support beyond MIME types
  • JHOVE's ability to use OAIS-based information to
    identify a content type
  • Creating a Web Site to manage rules

27
More Future Enhancement Ideas
  • Investigate integration with Internet Archive
    (similar to LOCKSS solution)
  • When normal web content is missing, redirect the
    request to the Internet Archive
  • Investigate ways to identify the capabilities of
    a client system and determine whether a transform
    is needed
  • Experiment with LOCKSS integration

28
Potential Future Problem
  • Provenance is partially lost with this approach
  • Although the original filename/URL is maintained,
    the MIME type and any other information about the
    original format is lost
  • This may be OK if the information is not a total
    loss

29
Questions? Concerns? Comments?