Digitising Hansard: putting 200 years of parliamentary debates online


1
Digitising Hansard: putting 200 years of
parliamentary debates online
  • Edward Wood
  • Programme Director
  • House of Commons


2
Digitising Hansard
  • the history of Hansard
  • digitising Hansard: scanning and OCR
  • database and front end


3
Hansard
  • the official report of debates in Parliament
  • actually an unofficial private enterprise at
    first
  • nationalised in 1909
  • early reports written in the third person
  • eventually developed into a (nearly) verbatim
    account


4
  • "though not strictly verbatim, it is
    substantially the verbatim report, with
    repetitions and redundancies omitted and with
    obvious mistakes corrected, but ... on the
    other hand leaves out nothing that adds to the
    meaning of the speech or illustrates the
    argument."


5
why digitise?
  • enable preservation
  • conservation is expensive
  • increase access
  • increase usability
  • re-use physical storage space
  • costs have fallen significantly
  • quality improving steadily


6
doing the work
  • overseas contractor chosen for cost and quality
  • scanning from disbound originals
  • image, text and metadata captured
  • data capture automated but with significant
    manual intervention to increase quality
  • triple-compare process improves OCR quality
  • volumes from 1803 to 2005 were digitised
  • nearly 3 million pages
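The slide does not say how the triple-compare process works. One common reading is that three independent readings of each page (separate OCR passes or keyings) are compared and the majority reading wins, with any disagreement sent to the manual checking mentioned above. The sketch below assumes exactly that, and also assumes the three readings are already aligned word by word (real OCR output would need aligning first, for example with `difflib.SequenceMatcher`). The function name `triple_compare` and the sample text are hypothetical.

```python
from collections import Counter


def triple_compare(a, b, c):
    """Merge three word-aligned readings of the same text by majority vote.

    Returns (merged_words, disputed_indices); disputed positions are those
    where all three readings differ, so a person has to decide.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("the three readings must be aligned to the same length")
    merged, disputed = [], []
    for i, readings in enumerate(zip(a, b, c)):
        word, votes = Counter(readings).most_common(1)[0]
        if votes >= 2:
            merged.append(word)          # at least two readings agree
        else:
            merged.append(readings[0])   # no majority: keep first reading, flag it
            disputed.append(i)
    return merged, disputed


# Hypothetical OCR readings, each with a different single-word error.
ocr_1 = "the Honourable Member for Bath".split()
ocr_2 = "the Hononrable Member for Bath".split()
ocr_3 = "the Honourable Member fer Bath".split()
merged, disputed = triple_compare(ocr_1, ocr_2, ocr_3)
print(" ".join(merged), disputed)  # the Honourable Member for Bath []
```

Voting only fixes errors that the readings do not share; positions where all three differ are returned rather than guessed, which is where the manual intervention would come in.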


7
developing a web interface
  • drivers
    • free access
    • meaningful search across a large amount of data
    • work closely with users
    • keep costs down


8
developing a web interface
  • solution
    • experimental approach: web 2.0 feel
    • text only
    • faceting for intuitive navigation and search
    • clean look and feel
    • give it away!?
  • http://www.parliament.uk/publications/archives.cfm


9
methodology and progress
  • small team of developers working closely with
    users (inside and outside Parliament)
  • database and front end use a microformats
    approach
  • XML is parsed into HTML before loading into the
    database
  • data back to 1803 has been loaded (around 50
    volumes missing)
  • public discussion group and issues log
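The slides say only that the XML is parsed into HTML before loading, and that the database and front end follow a microformats approach. A minimal sketch of such a conversion, assuming a hypothetical source schema (a `<debate>` element with a `title` attribute, containing `<speech>` elements with a `speaker` attribute and `<p>` children); the real Hansard XML schema is not described in the slides. The speaker markup uses the hCard class names `vcard` and `fn`; the other class names (`debate`, `debate-title`, `speech`) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; element and attribute names are assumptions.
SAMPLE_XML = """<debate title="Example Debate">
  <speech speaker="Mr A. Member">
    <p>First paragraph of the speech.</p>
    <p>Second paragraph.</p>
  </speech>
</debate>"""


def debate_to_html(xml_text: str) -> str:
    """Turn a <debate> fragment into HTML whose class names carry the metadata."""
    debate = ET.fromstring(xml_text)
    root = ET.Element("div", {"class": "debate"})
    heading = ET.SubElement(root, "h2", {"class": "debate-title"})
    heading.text = debate.get("title", "")
    for speech in debate.iter("speech"):
        block = ET.SubElement(root, "div", {"class": "speech"})
        # hCard-style speaker markup: a "vcard" wrapper with the name in "fn".
        card = ET.SubElement(block, "span", {"class": "vcard"})
        name = ET.SubElement(card, "span", {"class": "fn"})
        name.text = speech.get("speaker", "")
        for para in speech.iter("p"):
            ET.SubElement(block, "p").text = para.text or ""
    # Serialise with HTML rules (empty elements become <p></p>, not <p />).
    return ET.tostring(root, encoding="unicode", method="html")


print(debate_to_html(SAMPLE_XML))
```

Keeping the metadata in class attributes means the same HTML can be stored, displayed and later scraped for structure without a separate metadata format, which is the general appeal of microformats.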


14
faceted classification
  • assignment of multiple classifications to an
    object
  • the material can thus be presented in a variety
    of ways
  • facets include
    • date
    • volume number
    • monarch
    • chamber
    • content type (debates or questions)
    • constituencies
    • Members of Parliament
    • offices held
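Faceted navigation can be illustrated by counting, for the current set of records, how many carry each value of each facet, and narrowing the set when the user picks a value. A minimal in-memory sketch: the field names echo a few of the facets listed above but are hypothetical, the records are invented for illustration, and a production system would compute these counts in the database or search engine rather than in Python.

```python
from collections import Counter, defaultdict

# Hypothetical field names chosen to echo the facets on the slide.
FACETS = ("chamber", "monarch", "content_type", "year")

# Invented sample records for illustration only.
records = [
    {"chamber": "Commons", "monarch": "George VI", "content_type": "debates", "year": 1941},
    {"chamber": "Lords", "monarch": "George VI", "content_type": "debates", "year": 1941},
    {"chamber": "Commons", "monarch": "Elizabeth II", "content_type": "questions", "year": 2002},
]


def facet_counts(items, facets=FACETS):
    """For each facet, count how many items carry each value."""
    counts = defaultdict(Counter)
    for item in items:
        for facet in facets:
            if facet in item:
                counts[facet][item[facet]] += 1
    return counts


def refine(items, **selected):
    """Keep only the items that match every selected facet value."""
    return [item for item in items
            if all(item.get(facet) == value for facet, value in selected.items())]


commons = refine(records, chamber="Commons")
print(dict(facet_counts(commons)["content_type"]))  # {'debates': 1, 'questions': 1}
```

Because every record carries several classifications at once, the same set can be browsed by date, by monarch or by chamber, and the counts tell the user how many results each further choice would leave.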


16
other features
  • references using the standard format can be
    located using the search box
    • HC Deb Vol 385 13 May 2002 c498
  • predictable URLs
    • http://hansard.millbanksystems.com/commons/1941/may/07/war-situation
  • pages created for
    • individual Members of Parliament
    • constituencies
    • acts
    • bills
    • divisions
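Recognising a standard reference typed into the search box comes down to matching a pattern and pulling out the house, volume, sitting date and column. The sketch below accepts the form shown on the slide (`HC Deb Vol 385 13 May 2002 c498`), tolerating optional commas and full stops, with HC for the Commons and HL for the Lords. It also shows how a predictable URL of the form on the slide could be built from a house, a date and a debate title. The regular expression, the slug rule and the function names are assumptions for illustration, not the site's actual code; the slug rule simply reproduces the one example URL.

```python
import re
from datetime import date

# Assumed pattern for references such as "HC Deb Vol 385 13 May 2002 c498".
REFERENCE = re.compile(
    r"(?P<house>HC|HL)\s+Deb\.?,?\s+vol\.?\s*(?P<volume>\d+),?\s+"
    r"(?P<day>\d{1,2}\s+[A-Za-z]+\s+\d{4}),?\s+c\.?\s*(?P<column>\d+)",
    re.IGNORECASE,
)

MONTHS = ("january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december")


def parse_reference(text):
    """Extract house, volume, sitting date and column from a reference, or None."""
    m = REFERENCE.search(text)
    if m is None:
        return None
    day_str, month_str, year_str = m["day"].split()
    if month_str.lower() not in MONTHS:
        return None
    # date() raises ValueError for impossible dates such as 31 February.
    sitting = date(int(year_str), MONTHS.index(month_str.lower()) + 1, int(day_str))
    return {
        "house": "commons" if m["house"].upper() == "HC" else "lords",
        "volume": int(m["volume"]),
        "date": sitting,
        "column": int(m["column"]),
    }


def slugify(title):
    """Lower-case the title and join its words with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def debate_url(house, day, title, base="http://hansard.millbanksystems.com"):
    """Build a /house/year/month/day/slug URL like the one on the slide."""
    month = MONTHS[day.month - 1]
    return f"{base}/{house}/{day.year}/{month}/{day.day:02d}/{slugify(title)}"


print(parse_reference("HC Deb Vol 385 13 May 2002 c498"))
# {'house': 'commons', 'volume': 385, 'date': datetime.date(2002, 5, 13), 'column': 498}
print(debate_url("commons", date(1941, 5, 7), "War Situation"))
# http://hansard.millbanksystems.com/commons/1941/may/07/war-situation
```

Month names are matched against a fixed tuple rather than with `strptime("%B")`, so the parsing does not depend on the machine's locale.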


22
http://hansard.millbanksystems.com


23
developing the functionality...
  • disambiguating the metadata
    • (Mr Smith, Mr John Smith, John Smith...)
  • tag clouds
  • machine-readable divisions
  • geographical interface
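Disambiguating names such as Mr Smith, Mr John Smith and John Smith means mapping each form found in the text to a single Member, using whatever context is available. A deliberately simplified sketch, assuming a hypothetical list of Members with forenames, surname and the dates they first and last sat: it strips honorifics, then keeps candidates whose surname matches, whose forename or initial matches (when one is given), and who were sitting on the date of the debate. The `Member` type, the honorific list and the sample Members are invented; real disambiguation would also draw on constituency, offices held and the surrounding debate, and any name with more than one candidate is left for manual review rather than guessed.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical, incomplete list of honorifics to ignore.
HONORIFICS = {"mr", "mrs", "miss", "ms", "dr", "sir", "dame"}


@dataclass(frozen=True)
class Member:
    member_id: int
    forenames: str
    surname: str
    first_sat: date
    last_sat: date


def candidates(raw_name, sitting, members):
    """Return the Members who could be meant by raw_name on the sitting date."""
    words = [w.strip(".,").lower() for w in raw_name.split()]
    words = [w for w in words if w and w not in HONORIFICS]
    if not words:
        return []
    surname, given = words[-1], words[:-1]
    found = []
    for m in members:
        if m.surname.lower() != surname:
            continue
        if not (m.first_sat <= sitting <= m.last_sat):
            continue  # not in the House on that date
        if given:
            first = m.forenames.lower().split()[0]
            # accept a full forename ("john") or an initial ("j")
            if given[0] != first and not (len(given[0]) == 1 and first.startswith(given[0])):
                continue
        found.append(m)
    return found


# Invented Members for illustration only.
members = [
    Member(1, "John", "Smith", date(1945, 7, 5), date(1950, 2, 3)),
    Member(2, "Alan", "Smith", date(1945, 7, 5), date(1955, 5, 6)),
    Member(3, "John", "Smith", date(1983, 6, 9), date(1997, 4, 8)),
]
for name in ("Mr John Smith", "Mr Smith", "J. Smith"):
    ids = [m.member_id for m in candidates(name, date(1948, 1, 1), members)]
    print(name, ids)  # more than one id means the name needs manual review
```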


29
http://hansard.millbanksystems.com


34
http://hansard.millbanksystems.com
