Title: Initial BizTalk Programming Development Objectives for PeDALS
1. Initial BizTalk Programming Development Objectives for PeDALS
- Dennis Bitterlich, Electronic Records Archivist
2. What is PeDALS? Persistent Digital Archives Library System
- A grant-funded, multi-state project financed by the Library of Congress (National Digital Information Infrastructure and Preservation Program (NDIIPP)) and the Institute of Museum and Library Services
- Includes five state partners: Arizona, Florida, New York, South Carolina, and Wisconsin, with Arizona as the lead partner
- The project will run 18 months, until the middle of 2009; if it is successful, WHS intends to continue participation beyond this period
- At the end of the project, each partner will have a functioning electronic records repository
3. Why is PeDALS Needed?
- An increasing number of state government records of long-term value are created in electronic-only format
- Due to the large and increasing volume of electronic records in varied formats, traditional appraisal and acquisition practices are no longer effective; an automated, rules-based system like PeDALS is one possible response to this new reality
- PeDALS is not an electronic records management system, but rather a way to acquire electronic records already scheduled for transfer
- PeDALS is both a learning opportunity and a chance to implement a functioning system
4. Goals of the Project
- Develop a methodology to support an automated, integrated workflow to process collections of electronic records
- Implement an inexpensive storage system that can preserve the integrity and authenticity of electronic records over time
- Remove barriers to adoption by keeping the costs of the system as low as possible
- Work with the Wisconsin Document Depository Program to develop ways to integrate digital-format state agency publications into PeDALS processes; since 2005, the Depository has worked to preserve e-publications acquired from state websites
5. Microsoft BizTalk Overview
- BizTalk is a middleware application that at its core is an XML message queue, which will:
- Receive objects → convert and perform logic on objects → send objects
- Each step is completed by BizTalk using XML
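The receive → convert/logic → send cycle above can be illustrated with a small Python sketch of the message-queue pattern. It assumes a plain in-memory `deque` as the queue and a hypothetical `<record>`/`<status>` XML layout; it is not BizTalk's API, which is configured through BizTalk's own tools rather than written this way.

```python
from collections import deque
import xml.etree.ElementTree as ET


def receive(inbox, raw_text):
    """Receive step: wrap an incoming object in XML and enqueue it."""
    record = ET.Element("record")  # hypothetical element name
    record.text = raw_text
    inbox.append(ET.tostring(record, encoding="unicode"))


def convert(message):
    """Convert/logic step: parse the XML message and add a processing flag."""
    record = ET.fromstring(message)
    ET.SubElement(record, "status").text = "processed"  # hypothetical element
    return ET.tostring(record, encoding="unicode")


def run_queue(raw_objects):
    """Receive every object, then drain the queue in FIFO order and send each result."""
    inbox, outbox = deque(), []
    for raw in raw_objects:
        receive(inbox, raw)
    while inbox:
        # Send step: here "sending" just means appending to an output list
        outbox.append(convert(inbox.popleft()))
    return outbox
```

For example, `run_queue(["Series 1", "Series 2"])` returns two XML strings in the order the objects were received.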
6. BizTalk Pipelines
- Pipelines
- Connections between systems
- Connect BizTalk to databases
- Connect BizTalk to the web
- Connect BizTalk to file servers
- Connect BizTalk to programs
7. BizTalk Business Rules
- Business rules
- BizTalk-speak for high-level processes that determine which orchestrations will be performed
- If a record series is confidential or restricted, then go to the orchestration that populates restrictions
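The confidential/restricted rule above can be sketched as a small rule table in Python, keeping the "which orchestration?" decision separate from the orchestrations themselves. The `RecordSeries` class, the access values, and the orchestration names are hypothetical placeholders, not BizTalk Business Rules Engine syntax.

```python
from dataclasses import dataclass


@dataclass
class RecordSeries:
    series_id: str
    access: str  # hypothetical values: "open", "confidential", "restricted"


# Each rule pairs a condition with the orchestration it selects; the first match wins.
RULES = [
    (lambda s: s.access.lower() in {"confidential", "restricted"}, "populate_restrictions"),
]
DEFAULT_ORCHESTRATION = "standard_ingest"  # hypothetical fallback


def choose_orchestration(series):
    """Evaluate the rules in order and return the name of the orchestration to run."""
    for condition, orchestration in RULES:
        if condition(series):
            return orchestration
    return DEFAULT_ORCHESTRATION
```

Adding a new rule means appending a pair to `RULES` rather than editing the orchestration logic, which mirrors the division of labor between business rules and orchestrations described on these slides.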
8. BizTalk Orchestrations
- Orchestrations
- BizTalk-speak for the logic to process objects
- Build in logic to calculate the length of restrictions and determine which database fields to populate
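The restriction-length logic above could look roughly like the following Python sketch. It assumes restrictions are expressed as a closure period in whole years from a start date; the field names `restricted` and `restriction_end` are hypothetical, not actual PeDALS database columns.

```python
from datetime import date


def restriction_end(start, years_closed):
    """Date a restriction lifts: start date plus a closure period in whole years."""
    try:
        return start.replace(year=start.year + years_closed)
    except ValueError:
        # start is Feb 29 and the target year is not a leap year: use Feb 28
        return start.replace(year=start.year + years_closed, day=28)


def restriction_fields(start, years_closed, today):
    """Build the (hypothetical) database fields for one restricted series."""
    end = restriction_end(start, years_closed)
    return {
        "restricted": today < end,
        "restriction_end": end.isoformat(),
    }
```

For example, a series dated June 1, 1990 with a 75-year closure checked on September 1, 2008 yields `{"restricted": True, "restriction_end": "2065-06-01"}`.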
9. Initial BizTalk Development Goals and Objectives
- 1) Write ARCAT BizTalk code (pipeline)
- Series already cataloged
- Reduced duplication of work and manual data entry
- Pipeline will work for the CGI/BIN web service
- Copy programming code to create the next pipelines
- 2) Write Web Services BizTalk code (pipeline)
- Copied from the CGI/BIN ARCAT service pipeline
- Generic HTTP pipeline to agencies' web pages
- Can be used for the PeDALS Drop Box
10. Initial BizTalk Development Goals and Objectives
- 3) Write DHS BizTalk code (pipeline)
- Code copied from prior pipelines
- Connect to a database
- Solve issues related to external networks
- 4) Write DWD BizTalk code (pipeline)
- Connect to a file server
- Issues related to external networks should already be solved, but may be different for a file server connection
11. Initial BizTalk Development Goals and Objectives
- 5) Write/call JHOVE, MetaExtractor, or C code in BizTalk to wrap records with preservation metadata (orchestration)
- Once we can receive records through pipelines:
- Create logic to perform in BizTalk
- Wrap records in XML with preservation metadata
- First, execute a third-party open source program such as JHOVE or MetaExtractor
- Second, write code to interact with software written in programming languages such as C
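Wrapping a record in XML with preservation metadata, as described above, might be sketched in Python as follows. The element names (`preservationPackage`, `metadata`, `content`) are hypothetical, and the choice of SHA-256 as the fixity checksum is an assumption; a real orchestration would also merge in the output of a tool such as JHOVE or MetaExtractor, which this sketch omits.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def wrap_record(file_name, data, series_id):
    """Wrap a record's bytes in an XML package with basic fixity metadata."""
    package = ET.Element("preservationPackage")  # hypothetical schema
    meta = ET.SubElement(package, "metadata")
    ET.SubElement(meta, "seriesId").text = series_id
    ET.SubElement(meta, "fileName").text = file_name
    ET.SubElement(meta, "sizeBytes").text = str(len(data))
    ET.SubElement(meta, "sha256").text = hashlib.sha256(data).hexdigest()
    # Embed the record itself as Base64 so binary formats survive inside XML
    content = ET.SubElement(package, "content", {"encoding": "base64"})
    content.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(package, encoding="unicode")
```

Storing the checksum alongside the content lets a later process recompute the hash and confirm the record has not changed, which supports the integrity and authenticity goals listed earlier.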
12. Measurement of Success
- 1) Ability to extract MARC records from ARCAT and insert them into a database
- 2) Ability to create an external web services pipeline to transfer records to WHS
- 3) Ability to create an external file pipeline to DHS Quest Archives Manager to transfer records to WHS
- 4) Ability to create an external file pipeline to DWD to transfer records to WHS
- 5) Ability to wrap electronic records with preservation metadata inside of BizTalk
13. Process to Write Code
- Iterative process to:
- 1) Write BizTalk programming code
- 2) Test BizTalk programming code
- 3) Revise BizTalk programming code
- 4) Retest BizTalk programming code
14. Pre-BizTalk Training Development Plans: Initial Thoughts on How I Would Get Objects into BizTalk (pre-September 2008)
- Initially, PeDALS was to use FTP to receive electronic records
- Authentication, integrity, security, and user-friendliness issues
- Now a generic Drop Box (probably a web service)
- Initial knowledge of BizTalk
- A middleware application that at its core is an XML message queue
- Uses XML to complete the connections to and from external applications
- Needed automated processes to provide BizTalk with XML objects
15. Pre-BizTalk Training Development Plans
- Use of third-party open source code to convert/wrap in XML
- MARC21 to MARCXML Converter: http://www.loc.gov/standards/marcxml/
- MarcEdit: http://oregonstate.edu/reeset/marcedit/html/index.php
- JHOVE: http://hul.harvard.edu/jhove/
- MetaExtractor: http://meta-extractor.sourceforge.net/
16. Pre-BizTalk Training Development Plans
- MARC21 to MARCXML Converter: http://www.loc.gov/standards/marcxml/
- The MARCXML toolkit is a set of Java programs that allow users to convert to and from the MARC file format (including full character set conversion) and other formats available in the MARCXML architecture. The toolkit requires Java and works best with Java 1.4. If using an earlier version of Java, you need to modify the marcxml.bat file to include an XML parser in the classpath. Unzip the marcxml.zip file in a directory and run marcxml.bat for more instructions. Make sure Java is in your PATH. In this version, the stylesheets and character conversion mappings are downloaded via HTTP from LC's website; therefore, Internet access is required when using these utilities.
17. Pre-BizTalk Training Development Plans
- MarcEdit: http://oregonstate.edu/reeset/marcedit/html/index.php
- A MARC editing tool with a native Z39.50 client and automatic batch conversions to/from:
- Comma/tab-delimited files
- Dublin Core
- EAD
- MARC
- OAI
- XML
18. Pre-BizTalk Training Development Plans
- JHOVE: http://hul.harvard.edu/jhove/
- JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects.
- Format identification is the process of determining the format to which a digital object conforms; in other words, it answers the question "I have a digital object; what format is it?"
- Format validation is the process of determining the level of compliance of a digital object to the specification for its purported format, e.g., "I have an object purportedly of format F; is it?" Format validation conformance is determined at two levels: well-formedness and validity.
- A digital object is well-formed if it meets the purely syntactic requirements for its format.
- An object is valid if it is well-formed and it meets additional semantic-level requirements.
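The two levels of validation above can be illustrated with XML as the example format. In this Python sketch, "well-formed" means the text parses at all, and "valid" is approximated by a toy semantic rule (a hypothetical `<record>` root with required `seriesId` and `title` children) standing in for a real schema. This is an analogy only, not how JHOVE's format modules are implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical "semantic" rule for this sketch: a <record> must contain these children.
REQUIRED_CHILDREN = {"seriesId", "title"}


def check_xml(text):
    """Classify an XML document as not well-formed, well-formed only, or valid."""
    try:
        root = ET.fromstring(text)  # syntactic check: does it parse at all?
    except ET.ParseError:
        return "not well-formed"
    children = {child.tag for child in root}
    if root.tag != "record" or not REQUIRED_CHILDREN <= children:
        return "well-formed, not valid"
    return "well-formed and valid"
```

A missing closing tag fails the first (syntactic) level; a document that parses but lacks a required element passes the first level and fails the second.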
19. Pre-BizTalk Training Development Plans
- MetaExtractor: http://meta-extractor.sourceforge.net/
- The Metadata Extraction Tool was developed by the National Library of New Zealand to programmatically extract preservation metadata from a range of file formats:
- Images: BMP, GIF, JPEG, and TIFF
- Office documents: MS Word (versions 2 and 6), WordPerfect, OpenOffice (version 1), MS Works, MS Excel, MS PowerPoint, and PDF
- Audio and video: WAV and MP3
- Markup languages: HTML and XML
- The Metadata Extraction Tool:
- Automatically extracts preservation-related metadata from digital files
- Outputs that metadata in a standard format (XML) for use in preservation activities
- The tool was designed for preservation processes and activities, but can be used for other tasks, such as the extraction of metadata for resource discovery
20. Pre-BizTalk Training Development Plans
- MarcEdit: ARCAT MARC Catalog Records
- 1) Use a Z39.50 gateway to retrieve records as .mrc files
- 2) Use MarcEdit to convert the .mrc files to XML
- 3) BizTalk receives the XML files
- 4) BizTalk performs logic
- 5) BizTalk inserts/updates the SQL Server database
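Step 2 above relies on MarcEdit (or the LC MARCXML toolkit) to turn binary .mrc records into XML. To show what that conversion does, here is a minimal Python sketch of reading one ISO 2709 record (24-character leader, 12-character directory entries, field and subfield delimiters) and emitting MARCXML elements. It assumes UTF-8 records (leader position 09 = "a"), so it does not handle MARC-8 character conversion as the real tools do, and it processes one record at a time.

```python
import xml.etree.ElementTree as ET

FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = "\x1f"
MARCXML_NS = "http://www.loc.gov/MARC21/slim"


def parse_mrc(record):
    """Split one ISO 2709 (.mrc) record into its leader and (tag, data) fields."""
    leader = record[:24].decode("ascii")
    base_address = int(leader[12:17])  # where the field data begins
    directory_end = record.index(FIELD_TERMINATOR, 24)
    directory = record[24:directory_end].decode("ascii")
    fields = []
    for i in range(0, len(directory), 12):  # each directory entry is 12 characters
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = base_address + int(directory[i + 7:i + 12])
        # Lengths and offsets count bytes, so slice before decoding (assumes UTF-8)
        data = record[start:start + length].rstrip(FIELD_TERMINATOR).decode("utf-8")
        fields.append((tag, data))
    return leader, fields


def mrc_to_marcxml(record):
    """Convert one .mrc record to a MARCXML <record> string."""
    leader, fields = parse_mrc(record)
    root = ET.Element("record", {"xmlns": MARCXML_NS})
    ET.SubElement(root, "leader").text = leader
    for tag, data in fields:
        if tag.startswith("00"):  # control fields 001-009: no indicators or subfields
            ET.SubElement(root, "controlfield", {"tag": tag}).text = data
            continue
        datafield = ET.SubElement(root, "datafield", {"tag": tag, "ind1": data[0], "ind2": data[1]})
        for chunk in data[2:].split(SUBFIELD_DELIMITER)[1:]:
            if chunk:  # first character is the subfield code, the rest is its value
                ET.SubElement(datafield, "subfield", {"code": chunk[0]}).text = chunk[1:]
    return ET.tostring(root, encoding="unicode")
```

A .mrc file retrieved through Z39.50 usually holds many records, each ending with the record terminator byte `0x1D`; splitting the file on that byte and passing each piece to `mrc_to_marcxml` would handle a batch.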
21. Post-September BizTalk Training Development Plans
- Pipelines can connect directly to:
- Web services like ARCAT or OCLC, or even HTTP
- File servers, like at DWD
- Databases, like DHS Quest Archives Manager
- Orchestrations can:
- Call other orchestrations
- Call other executable programs
- Call other applications written in various programming languages (C or Java)
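Calling an external executable from an orchestration follows a common pattern: run the program, wait for it, capture its output, and treat a non-zero exit code as a failure. The Python sketch below shows that pattern with `subprocess.run`; it is a stand-in for whatever mechanism a BizTalk orchestration would actually use, and the demo runs a Python one-liner rather than a real tool such as JHOVE, whose command-line options are not assumed here.

```python
import subprocess
import sys


def call_external(args, timeout=300):
    """Run an external program, wait for it, and return its standard output.

    Raises subprocess.CalledProcessError on a non-zero exit code and
    subprocess.TimeoutExpired if the program runs longer than `timeout` seconds.
    """
    result = subprocess.run(args, capture_output=True, text=True, timeout=timeout, check=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a real tool: run a tiny Python one-liner and print its output
    print(call_external([sys.executable, "-c", "print('ok')"]), end="")
```

Passing the command as a list of arguments (rather than a single shell string) avoids shell-quoting problems with file names that contain spaces.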
22. Post-BizTalk Training Development Plans
- ARCAT MARC Catalog Records
- 1) Create a pipeline
- From ARCAT
- To the PeDALS database
- 2) Create a search page to enter variables or a list of series to retrieve from ARCAT
- Automates the process
- Decreases the manual labor needed compared to using MarcEdit
- Reduces duplication of work
23. Post-BizTalk Training Development Plans
- ARCAT MARC Catalog Records
- 3) Create an orchestration:
- To automatically map data from MARC to the PeDALS database
- To execute MarcEdit (if necessary)
- That will insert or update the PeDALS database
- Then export from the PeDALS database to ARCAT, a file, or OCLC
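The map-then-insert-or-update step above can be sketched in Python, with SQLite standing in for the SQL Server database. The `series` table, its columns, and the mapping of MARC 001 to a series ID and 245 $a to a title are all hypothetical choices for illustration, not the actual PeDALS schema. The function takes (tag, data) pairs of the kind a MARC parser would produce.

```python
import sqlite3

SUBFIELD_DELIMITER = "\x1f"


def open_catalog(path=":memory:"):
    """Open (or create) a stand-in series table; SQLite substitutes for SQL Server here."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS series (series_id TEXT PRIMARY KEY, title TEXT)")
    return conn


def map_marc_fields(fields):
    """Hypothetical mapping: MARC 001 -> series_id, 245 $a -> title."""
    row = {}
    for tag, data in fields:
        if tag == "001":
            row["series_id"] = data
        elif tag == "245":
            # Skip the two indicator characters, then split into subfields
            for chunk in data[2:].split(SUBFIELD_DELIMITER)[1:]:
                if chunk.startswith("a"):
                    row["title"] = chunk[1:]
    return row


def insert_or_update(conn, row):
    """Update the series if it already exists, otherwise insert it."""
    cur = conn.execute("UPDATE series SET title = ? WHERE series_id = ?",
                       (row["title"], row["series_id"]))
    if cur.rowcount == 0:  # no existing row was updated, so this is a new series
        conn.execute("INSERT INTO series (series_id, title) VALUES (?, ?)",
                     (row["series_id"], row["title"]))
    conn.commit()
```

Trying the update first and inserting only when no row changed keeps repeated imports of the same ARCAT record from creating duplicates.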
24. Possible Involvements (After Initial Development)
- State Archivist Peter Gottlieb
- Ultimate sign-off on development
- Collection Development Archivist Helmut Knies
- Initial sign-off on development
- Electronic Records Archivist Dennis Bitterlich
- Programming, testing, verification
- Public Records Accessioner Abbie Norderhaug
- Testing and verification
- Head of Cataloging and Collections Management Services Maija Cravens
- Policies and procedures
25. Possible Involvements (After Initial Development)
- Archivist Jacquelyn Ferry
- Policies and procedures
- Testing and verification
- Information Technology Director Paul Hedges
- Hardware, networks, security
- WI State Government Publications Librarian Nancy Knies
- State publications to store in LOCKSS
- DHS Records Officer Steve Bose
- Transfer of records
- DHS IT Jovy Swanton
- Hardware, network, programming, security
26. Possible Involvements (After Initial Development)
- DPI WDDP Abby Swanton
- State publications to store in LOCKSS
- DWD Records Officer Dawn Bluma
- Transfer of records
- DWD IT
- Hardware, network, programming, security
- UW IT
- Hardware, network, programming, security
27. Thank You!
- Collecting, Preserving and Sharing Stories Since 1846