Initial BizTalk Programming Development Objectives for PeDALS

1
Initial BizTalk Programming Development
Objectives for PeDALS
  • Dennis Bitterlich, Electronic Records Archivist

2
What is PeDALS? Persistent Digital Archives and
Library System
  • A grant-funded multi-state project financed by
    the Library of Congress (National Digital
    Information Infrastructure and Preservation
    Program (NDIIPP)) and the Institute of Museum and
    Library Services
  • Includes five state partners: Arizona, Florida,
    New York, South Carolina and Wisconsin, with
    Arizona as the lead partner
  • Project will run 18 months, until the middle of
    2009; if successful, WHS intends to continue
    participation beyond this period
  • At the end of the project each partner will have
    a functioning electronic records repository

3
Why is PeDALS Needed?
  • An increasing number of state government records
    of long-term value are created in electronic-only
    format
  • Due to the large and increasing volume of
    electronic records in varied formats, traditional
    appraisal and acquisition practices are no longer
    effective; an automated, rules-based system like
    PeDALS is one possible response to this new
    reality
  • PeDALS is not an electronic records management
    system, but rather a way to acquire electronic
    records already scheduled for transfer
  • PeDALS is both a learning opportunity and a
    chance to implement a functioning system

4
Goals of the Project
  • Develop a methodology to support an automated,
    integrated workflow to process collections of
    electronic records
  • Implement an inexpensive storage system that can
    preserve the integrity and authenticity of
    electronic records over time
  • Remove barriers to adoption by keeping costs of
    the system as low as possible
  • Work with the Wisconsin Document Depository Program
    to develop ways to integrate digital-format state
    agency publications into PeDALS processes; since
    2005 the Depository has worked to preserve
    e-publications acquired from state websites

5
Microsoft BizTalk Overview
  • BizTalk is a middleware application which at its
    core is an XML Message Queue which will:
  • Receive Objects → Convert / Perform Logic on
    Objects → Send Objects
  • Completed by BizTalk using XML

6
BizTalk Pipelines
  • Pipelines
  • Connections between systems
  • Connect BizTalk to databases
  • Connect BizTalk to web
  • Connect BizTalk to file servers
  • Connect BizTalk to programs

7
BizTalk Business Rules
  • Business rules
  • BizTalk-speak for high-level processes that
    determine what orchestrations will be performed
  • Example: if a record series is confidential or
    restricted, then go to the orchestration that
    populates restrictions
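
The rule on this slide can be sketched outside BizTalk as a small routing function. This is only an illustrative Python sketch, not BizTalk Business Rules syntax; the `access_status` field and both orchestration names are hypothetical.

```python
# Hypothetical sketch of the slide's business rule; not BizTalk rule syntax.
# The field name "access_status" and the orchestration names are assumptions.

RESTRICTED_STATUSES = {"confidential", "restricted"}


def choose_orchestration(record_series: dict) -> str:
    """Return the name of the orchestration that should handle this series."""
    status = record_series.get("access_status", "").lower()
    if status in RESTRICTED_STATUSES:
        # Confidential or restricted series go to the restrictions orchestration.
        return "populate_restrictions"
    return "standard_ingest"


if __name__ == "__main__":
    print(choose_orchestration({"series": "A", "access_status": "Confidential"}))
    print(choose_orchestration({"series": "B", "access_status": "open"}))
```

Keeping the rule separate from the processing logic mirrors the slide's split between business rules (which orchestration runs) and orchestrations (what it does).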

8
BizTalk Orchestrations
  • Orchestrations
  • BizTalk-speak for the logic to process objects
  • Example: build in logic to calculate the length
    of restrictions and the database fields to
    populate
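
As a rough illustration of the kind of logic such an orchestration might hold, the Python sketch below computes when a restriction ends and returns the fields to store. The field names and the idea of a restriction expressed in whole years are assumptions, not details from the presentation.

```python
from datetime import date

# Hypothetical sketch: field names and year-based restrictions are assumptions.


def restriction_fields(created: date, restriction_years: int) -> dict:
    """Compute the database fields that describe an access restriction."""
    target_year = created.year + restriction_years
    try:
        ends = created.replace(year=target_year)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year; use 28 February.
        ends = created.replace(year=target_year, day=28)
    return {
        "restricted": restriction_years > 0,
        "restriction_years": restriction_years,
        "restriction_ends": ends.isoformat(),
    }


if __name__ == "__main__":
    print(restriction_fields(date(2008, 9, 1), 75))
```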

9
Initial BizTalk Development Goals & Objectives
  • 1) Write ARCAT BizTalk Code (pipeline)
  • Series already cataloged
  • Reduced duplication of work and manual data entry
  • Pipeline will work for CGI/BIN Web Service
  • Copy programming code to create next pipelines
  • 2) Write Web Services BizTalk Code (pipeline)
  • Copied from CGI/BIN ARCAT Service pipeline
  • Generic HTTP pipeline to Agencies' Web Pages
  • Can use for PeDALS Drop Box

10
Initial BizTalk Development Goals & Objectives
  • 3) Write DHS BizTalk Code (pipeline)
  • Code copied from prior pipelines
  • Connect to a database
  • Solve issues related to external networks
  • 4) Write DWD BizTalk Code (pipeline)
  • Connect to a file server
  • Issues related to external networks should be
    solved, but may be different for file server
    connection

11
Initial BizTalk Development Goals & Objectives
  • 5) Write/Call JHOVE, MetaExtractor, or C Code
    in BizTalk to wrap records with preservation
    metadata (orchestration)
  • Once we can receive records through pipelines
  • Create logic to perform in BizTalk
  • Wrap records in XML with preservation metadata
  • First, execute a third party open source program
    such as JHOVE or MetaExtractor
  • Second, write code to interact with software
    programming languages such as C
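
To make "wrapping a record in XML with preservation metadata" concrete, here is a minimal Python sketch. It does not call JHOVE or MetaExtractor; it uses file size and a SHA-256 checksum from the standard library as stand-ins for the metadata those tools would supply. All element names are hypothetical.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical envelope: element names are assumptions, not a PeDALS schema.


def wrap_record(filename: str, content: bytes) -> str:
    """Wrap a record's bytes in an XML envelope with basic preservation metadata."""
    root = ET.Element("record")
    meta = ET.SubElement(root, "preservationMetadata")
    ET.SubElement(meta, "filename").text = filename
    ET.SubElement(meta, "sizeBytes").text = str(len(content))
    checksum = ET.SubElement(meta, "checksum", algorithm="SHA-256")
    checksum.text = hashlib.sha256(content).hexdigest()
    # Base64 keeps arbitrary binary content safe inside the XML document.
    payload = ET.SubElement(root, "content", encoding="base64")
    payload.text = base64.b64encode(content).decode("ascii")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(wrap_record("memo.txt", b"hello"))
```

A real orchestration would likely replace the stand-in metadata with the output of a characterization tool, but the envelope idea stays the same.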

12
Measurement of Success
  • 1) Ability to extract MARC records from ARCAT
    and insert into database
  • 2) Ability to create external web services
    pipeline to transfer records to WHS
  • 3) Ability to create external file pipeline to
    DHS Quest Archives Manager to transfer records to
    WHS
  • 4) Ability to create external file pipeline to
    DWD to transfer records to WHS
  • 5) Ability to wrap electronic records with
    preservation metadata inside of BizTalk

13
Process to Write Code
  • Iterative process to:
  • 1) Write BizTalk programming code
  • 2) Test BizTalk programming code
  • 3) Revise BizTalk programming code
  • 4) Retest BizTalk programming code

14
Pre-BizTalk Training Development Plans: Initial
Thoughts on How I Would Get Objects into BizTalk
(pre-September 2008)
  • Initially PeDALS to use FTP to Receive Electronic
    Records
  • Authentication, integrity, security, and user
    friendliness issues
  • Now a generic Drop Box (probably a Web service)
  • Initial Knowledge of BizTalk
  • A middleware application which at its core is an
    XML Message Queue
  • Uses XML to complete the connections to and from
    external applications
  • Needed automated processes to provide BizTalk
    with XML objects

15
Pre-BizTalk Training Development Plans
  • Use of Third Party Open Source Code to
    convert/wrap in XML
  • MARC21 to MARCXML Converter:
    http://www.loc.gov/standards/marcxml/
  • MarcEdit:
    http://oregonstate.edu/reeset/marcedit/html/index.php
  • JHOVE: http://hul.harvard.edu/jhove/
  • MetaExtractor:
    http://meta-extractor.sourceforge.net/

16
Pre-BizTalk Training Development Plans
  • MARC21 to MARCXML Converter:
    http://www.loc.gov/standards/marcxml/
  • The MARCXML toolkit is a set of Java programs
    which allow users to convert to and from the MARC
    file format (including full character set
    conversion) and other formats available in the
    MARCXML architecture. The toolkit requires Java
    and works best with Java 1.4. If using an earlier
    version of Java, you need to modify the
    marcxml.bat file to include an XML parser in the
    classpath. Unzip the marcxml.zip file in a
    directory and run marcxml.bat for more
    instructions. Make sure java is in your PATH. In
    this version the stylesheets and character
    conversion mappings are downloaded via HTTP from
    LC's website; therefore Internet access is
    required when using these utilities.

17
Pre-BizTalk Training Development Plans
  • MarcEdit:
    http://oregonstate.edu/reeset/marcedit/html/index.php
  • Is a MARC editing tool with a native Z39.50
    client and automatic batch conversions to/from:
  • Comma/Tab Delimited Files
  • Dublin Core
  • EAD
  • MARC
  • OAI
  • XML

18
Pre-BizTalk Training Development Plans
  • JHOVE: http://hul.harvard.edu/jhove/
  • JHOVE provides functions to perform
    format-specific identification, validation, and
    characterization of digital objects.
  • Format identification is the process of
    determining the format to which a digital object
    conforms; in other words, it answers the
    question "I have a digital object; what format
    is it?"
  • Format validation is the process of determining
    the level of compliance of a digital object to
    the specification for its purported format, e.g.
    "I have an object purportedly of format F; is
    it?" Format validation conformance is determined
    at two levels: well-formedness and validity.
  • A digital object is well-formed if it meets the
    purely syntactic requirements for its format.
  • An object is valid if it is well-formed and it
    meets additional semantic-level requirements.
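
The difference between the two levels can be shown with a small Python sketch that uses XML as the example format. This is not JHOVE's API; the semantic rule (a `record` root with a non-empty `title`) is a hypothetical stand-in for a real format specification.

```python
import xml.etree.ElementTree as ET

# Illustration only; not JHOVE. The "valid" rule below is a made-up example.


def is_well_formed(xml_text: str) -> bool:
    """Syntactic check: does the text parse as XML at all?"""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def is_valid(xml_text: str) -> bool:
    """Well-formed AND meets the hypothetical semantic rule."""
    if not is_well_formed(xml_text):
        return False
    root = ET.fromstring(xml_text)
    title = root.find("title")
    has_title = title is not None and bool((title.text or "").strip())
    return root.tag == "record" and has_title


if __name__ == "__main__":
    for sample in ("<record><title>A</title></record>",
                   "<record></record>",
                   "<record><title>A</record>"):
        print(is_well_formed(sample), is_valid(sample))
```

The second sample parses but lacks a title, so it is well-formed without being valid; the third fails at the syntactic level.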

19
Pre-BizTalk Training Development Plans
  • MetaExtractor:
    http://meta-extractor.sourceforge.net/
  • The Metadata Extraction Tool was developed by the
    National Library of New Zealand to
    programmatically extract preservation metadata
    from a range of file formats:
  • Images: BMP, GIF, JPEG and TIFF
  • Office documents: MS Word (version 2, 6), Word
    Perfect, Open Office (version 1), MS Works, MS
    Excel, MS PowerPoint, and PDF
  • Audio and Video: WAV and MP3
  • Markup languages: HTML and XML
  • The Metadata Extraction Tool:
  • Automatically extracts preservation-related
    metadata from digital files
  • Outputs that metadata in a standard format (XML)
    for use in preservation activities
  • The Tool was designed for preservation processes
    and activities, but can be used for other
    tasks, such as the extraction of metadata for
    resource discovery

20
Pre-BizTalk Training Development Plans
  • MarcEdit: ARCAT MARC Catalog Records
  • 1) Use Z39.50 gateway to retrieve records as
    .mrc files
  • 2) Use MarcEdit to convert .mrc files to XML
  • 3) BizTalk receives XML files
  • 4) BizTalk performs logic
  • 5) BizTalk inserts/updates SQL Server Database
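
Steps 3 to 5 above can be approximated in a few lines of Python. This sketch assumes the XML produced in step 2 is MARCXML with a single `record` root; SQLite stands in for SQL Server, and the `series` table, its columns, and the sample record are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}  # MARCXML namespace

# Made-up sample record for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">example-001</controlfield>
  <datafield tag="245" ind1="0" ind2="0">
    <subfield code="a">Example record series</subfield>
  </datafield>
</record>"""


def extract_fields(marcxml: str):
    """Steps 3-4: receive the XML and pull out the control number and title."""
    root = ET.fromstring(marcxml)
    record_id = root.findtext("marc:controlfield[@tag='001']",
                              default="", namespaces=NS)
    title = root.findtext("marc:datafield[@tag='245']/marc:subfield[@code='a']",
                          default="", namespaces=NS)
    return record_id, title


def upsert(conn: sqlite3.Connection, record_id: str, title: str) -> None:
    """Step 5: insert the row, or replace it if the ID already exists."""
    conn.execute("CREATE TABLE IF NOT EXISTS series "
                 "(record_id TEXT PRIMARY KEY, title TEXT)")
    conn.execute("INSERT OR REPLACE INTO series (record_id, title) VALUES (?, ?)",
                 (record_id, title))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    upsert(conn, *extract_fields(SAMPLE))
    print(conn.execute("SELECT * FROM series").fetchall())
```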

21
Post-September BizTalk Training Development Plans
  • Pipelines can connect directly to:
  • Web services like ARCAT or OCLC or even HTTP
  • File servers like at DWD
  • Databases like DHS Quest Archives Manager
  • Orchestrations can:
  • Call other orchestrations
  • Call other executable programs
  • Call other applications written in various
    software languages (C or Java)

22
Post-BizTalk Training Development Plans
  • ARCAT MARC Catalog Records
  • 1) Create pipeline
  • From ARCAT
  • To PeDALS Database
  • 2) Create search page to enter variables or a
    list of series to retrieve from ARCAT
  • Automates process
  • Decreases manual labor needed compared to using
    MarcEdit
  • Reduced duplication of work

23
Post-BizTalk Training Development Plans
  • ARCAT MARC Catalog Records
  • 3) Create Orchestration
  • - To automatically map data from MARC to PeDALS
    database
  • - To execute MarcEdit (if necessary)
  • - That will insert or update PeDALS database
  • - Then export from PeDALS database to ARCAT,
    file, or OCLC

24
Possible Involvements (After Initial Development)
  • State Archivist Peter Gottlieb
  • Ultimate sign off on development
  • Collection Development Archivist Helmut Knies
  • Initial sign off on development
  • Electronic Records Archivist Dennis Bitterlich
  • Programming, testing, verification
  • Public Records Accessioner Abbie Norderhaug
  • Testing and verification
  • Head of Cataloging and Collections Mgmt Services
    Maija Cravens
  • Policies and procedures

25
Possible Involvements (After Initial Development)
  • Archivist Jacquelyn Ferry
  • Policies and procedures
  • Testing and verification
  • Information Technology Director Paul Hedges
  • Hardware, networks, security
  • WI State Government Publications Librarian Nancy
    Knies
  • State publications to store in LOCKSS
  • DHS Records Officer Steve Bose
  • Transfer of records
  • DHS IT Jovy Swanton
  • Hardware, network, programming, security

26
Possible Involvements (After Initial Development)
  • DPI WDDP Abby Swanton
  • State publications to store in LOCKSS
  • DWD Records Officer Dawn Bluma
  • Transfer of records
  • DWD IT
  • Hardware, network, programming, security
  • UW IT
  • Hardware, network, programming, security

27
Thank You!
  • Collecting, Preserving and Sharing Stories Since
    1846