Title: Digital Preservation (E-Archiving)
1Digital Preservation (E-Archiving)
Marta Melgar García mmelgar_at_ine.es
2Presentation Index
- Introduction
- Digital Preservation Strategies
- Digital Preservation Problems
- INE Journals digital repository
- INEBase History
- Our Virtual Library
- Project Phases
- The Technical Process in 3 steps
- The Publisher
- Visualization On Internet
- Interesting Data
- IT Data
3Introduction
Digital Preservation definition
- Digital preservation combines policies, strategies and actions that ensure access to information in digital formats over time.
- Publications will be available and accessible for generations to come.
- Source: American Library Association
4Digital Preservation strategies
- Digital preservation strategies and actions address content creation, integrity and maintenance.
- Planning
- Content creation
- Content integrity
- Content maintenance
- Problems
- Source: ALA
5Digital Preservation strategies
- Clear and complete technical specifications
- Production of reliable master files
- Sufficient descriptive, administrative and structural metadata to ensure future access
- Detailed quality control of processes
6Digital Preservation strategies
- Program planning, management and evaluation should consider:
- Risk assessment and management.
- Cost benefit analysis.
- Legal issues.
- The role of file formats, standards and metadata.
- Storage and maintenance.
- Disaster planning.
- The relationship between preservation and access.
- Preservation strategies, approaches, and methodologies.
- Technology forecasting for preservation.
- Source: Cornell University Library
7Digital Preservation strategies
- Content integrity includes:
- Documentation of all policies, strategies and procedures
- Use of persistent identifiers
- Recorded provenance and change history for all objects
- Verification mechanisms
- Attention to security requirements
- Routine audits
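One common way to implement the "verification mechanisms" and "routine audits" listed above is a fixity check: record a checksum for every file once, then recompute and compare on a schedule. The sketch below is only an illustration of that idea, not the procedure used by any institution named in these slides. The function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently under root with a stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

A manifest built at ingest time would be stored away from the files themselves. Each routine audit then calls `audit` and reports any non-empty list.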
8Digital Preservation strategies
- Content maintenance includes:
- A computing and networking infrastructure
- Storage and synchronization of files at multiple sites
- Continuous monitoring and management of files
- Programs for refreshing, migration and emulation
- Written disaster prevention and recovery plans
- Periodic review and updating of policies and procedures
9Digital Preservation problems
- We have to preserve records in an electronic era where change and speed are valued more highly than conservation and longevity.
- Enormous amounts of digital information are already lost forever.
- Information technologies become obsolete within a short period of time. This dynamic creates an unstable and unpredictable environment for the continuance of hardware and software.
- There is a proliferation of document and media formats, each one potentially carrying its own software and hardware dependencies. Copying these formats from one storage device to another is simple. However, merely copying bits is not sufficient for preservation purposes: if the software is not available, the information will be lost. There is also the complexity of maintaining the integrity of links, embedded objects, etc.
- Digital preservation is expensive.
- Intellectual property and licensing regimes are increasingly restrictive.
- Source: http://www.ifla.org
10INE Journals digital repository
In our Library we have created a digital repository of printed journals.
- Process steps
- 1. In our OPAC (Online Public Access Catalogue), we select the 856 field (for electronic resources).
- 2. We create a fixed URL. This URL is on our server.
- 3. We scan the journals into PDF format.
- 4. We upload the PDF files to the server via FTP.
- 5. We use the fixed URL as the root and add each PDF file under it.
- 6. We link every file to the OPAC Web.
- 7. We view the digitized file in our OPAC Web.
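As a rough illustration of steps 2, 5 and 6, the sketch below derives a stable URL for each scanned PDF from a fixed base URL and formats it as a MARC 856 field (indicator 4 = HTTP, subfield $u = URI, subfield $q = electronic format type). The base URL and file names are hypothetical, and the second indicator "1" (version of resource) is an assumption about how a digitized copy of a printed journal would be coded. None of these values come from the INE catalogue.

```python
from urllib.parse import quote, urljoin

# Hypothetical fixed URL on the library's server (step 2).
BASE_URL = "https://library.example.org/journals/"


def pdf_url(file_name: str, base_url: str = BASE_URL) -> str:
    """Build the public URL of an uploaded PDF under the fixed base URL (step 5)."""
    return urljoin(base_url, quote(file_name))


def field_856(url: str) -> str:
    """Format a display form of a MARC 856 field linking the record to the PDF (step 6)."""
    return f"856 41 $u {url} $q application/pdf"


if __name__ == "__main__":
    for name in ["revista_1950_01.pdf", "boletin 12.pdf"]:
        print(field_856(pdf_url(name)))
```

Keeping the base URL in one place means the catalogue links stay valid as long as that single address is maintained, which is the point of using a fixed URL.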
11INE Journals digital repository
Field 856
17INE Journals digital repository
- Some interesting data
- No cost of implementation
- Personnel involved: 2 people
- Project duration: one and a half years
- Current status: more than 1,000 journal issues digitized and published
18INEbase history
Statistical books 1858-1997 available on the web
- Background
- 1996: The INE joins the Internet
- 2000: Birth of INEbase → all statistical production offered on the Internet
- 2004: What shall we do with past information only available in printed format? → Target: opening up to the public the historical collection of INE publications only available on paper
19INEbase history a new section of INEbase
- We had to choose between different alternatives
- Tables in pc-axis format
- Complete PDF versions of the books
- INEbase history
20INEBase History Our Virtual Library
22INEBase History Our Virtual Library
1858 Yearbook
23INEBase History Our Virtual Library
Population (28 tables)
26INEbase history Project Phases
- Phase 1.
- What should be published? Most symbolic and representative volumes of public statistical activity
- Statistical Yearbooks (1858-1997)
- Population Censuses (1900-1970)
- Outsource scanning (over 100,000 pages)
- Outsource the software development
- Phase 2.
- Cataloguing starts
- Software improvements suggested by use
- 20 publications catalogued before publishing
27INEbase history Project Phases
- Phase 3.
- Internet launch takes place with 20 Yearbooks and
1 Census
- Phase 4.
- Cataloguing and web publication of 78 Yearbooks and 9 Censuses (34 volumes)
- Phase 5.
- Incorporation of new publications
- Scan the Agrarian Census and VS statistics
- Programme adaptation
- Cataloguing publication
28INEbase history The Technical Process in 3 steps
- 1. Scanning and OCR
- Scanning using the originals
- Unbinding (old and non-unique)
- Guillotining (repeated and unimportant)
- Microfiche (rare, old copies)
- TIFF files obtained
- OCR programme used to generate txt files → used for search engine
- Once PDF file is obtained → ready to be catalogued
29INEbase history The Technical Process in 3 steps
- 2. Cataloguing books into the system (cataloguer role)
- 1st step: create an index with categories until we get to the final node, the statistical tables
- 2nd step: associate one or more PDF documents to each node
30INEbase history The Technical Process in 3 steps
How is cataloguing done? Practical example
Creation of a virtual book Statistical Yearbook
2010
Node blocked
31INEbase history The Technical Process in 3 steps
Creation of the index publication
Creating as many chapters as needed
32INEbase history The Technical Process in 3 steps
Creation of the tables and association to the
corresponding PDF-doc.
33INEbase history The Technical Process in 3 steps
Recreating the hierarchical tree: all the publication's documents appear associated to their corresponding table
Nodes unblocked
Cataloguer's work ends here
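The cataloguing model described in these slides (an index of categories ending in table nodes, one or more PDF documents per node, and nodes that are blocked while being worked on) can be pictured as a small tree structure. The sketch below is only a guess at such a structure for illustration. The class, field and state names, the table title and the PDF file name are all hypothetical and do not describe the actual INEbase history software.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator


class State(Enum):
    BLOCKED = "blocked"      # being catalogued, not yet available
    UNBLOCKED = "unblocked"  # cataloguing finished
    PUBLISHED = "published"  # released by the publisher


@dataclass
class Node:
    title: str
    state: State = State.BLOCKED
    pdfs: list[str] = field(default_factory=list)
    children: list[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        """Attach a child node (chapter or table) and return it."""
        self.children.append(child)
        return child

    def set_state(self, state: State) -> None:
        """Apply a state to this node and everything below it."""
        self.state = state
        for child in self.children:
            child.set_state(state)

    def tables(self, path: tuple[str, ...] = ()) -> Iterator[tuple[str, Node]]:
        """Yield (full path, node) for every final node, i.e. every table."""
        here = path + (self.title,)
        if not self.children:
            yield " / ".join(here), self
        for child in self.children:
            yield from child.tables(here)


if __name__ == "__main__":
    book = Node("Statistical Yearbook 2010")
    chapter = book.add(Node("Population"))
    chapter.add(Node("Population by province", pdfs=["p001.pdf"]))
    book.set_state(State.UNBLOCKED)  # the cataloguer's work ends here
    for path, table in book.tables():
        print(path, table.pdfs, table.state.value)
```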
34INEbase history The Technical Process in 3 steps
- 3. Revision before publishing
- Cataloguing should be revised before being published
- Who revises? → There is a specific role, the proof-reader, but this role has not really been used; in reality another cataloguer does the revision
- Once the proof-reading work is finished, the book is ready for publication
- Proof-reader's work ends here
35INEbase history The Publisher
Main task: to publish books. Other tasks: user and transmission control, node translation
Blocked node
Published node
Unblocked node
Book ready to be shown on the Internet
And the translation process begins
36Transmission process: synchronization of servers
Cataloguing Server
Dissemination Server
This step might not be needed
37INEbase history Visualisation on the Internet
38INEbase history Visualisation on the Internet
Yearbooks ordered by decades
39INEbase history The hierarchical tree....
On the dissemination server
On the cataloguing programme
40And just a click on the required table
And a 9 page PDF document is shown
41INEbase history Anything else to be taken into account
Search engine
Change language
No. of tables
Size of pdf file
42INEbase history The search engine
Direct access to the pdf document
43The search engine is based on the table titles
(sorry, only in Spanish) and the hierarchical
tree (in English as well)
INEbase history The search engine
Of course, you can also use INE's general search engine
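A search "based on the table titles" can be approximated with a small inverted index: each word of a title points to the PDF documents whose title contains it, and a query returns the documents matching every query word. This is a minimal sketch of that general technique. The real INEbase history search engine is not described in these slides, and the titles and file names below are made up.

```python
import re
from collections import defaultdict


def tokens(text: str) -> list[str]:
    """Split text into lowercase words (accented letters are kept)."""
    return re.findall(r"\w+", text.lower())


def build_index(titles: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of PDF names whose table title contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for pdf_name, title in titles.items():
        for word in tokens(title):
            index[word].add(pdf_name)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the PDFs whose title contains every word of the query."""
    words = tokens(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical table titles, in Spanish as on the real site.
    titles = {
        "t1.pdf": "Población por provincias",
        "t2.pdf": "Población por sexo",
        "t3.pdf": "Superficie por provincias",
    }
    index = build_index(titles)
    print(search(index, "población provincias"))
```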
44Population censuses Everything is also valid
INEbase history The search engine
45INEbase history Some Interesting Data
- 1- Economic data
- Initial scanning stage: 12,000 Euros, 110,000 pages
- External development: 90,000 Euros
- 2- Deadlines
- Scanning and programme development: 6 months
- Cataloguing: 20 months
- 3- Amount of scanned pages
- Yearbook: 70,000 pages
- Census: 30,000 pages
- Total: 100,000 pages
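For readers who want unit costs, the figures above can be combined in a few lines. This is only arithmetic on the numbers shown on the slide. Treating the sum of scanning and external development as a "total" is an assumption, and the slide itself gives two page counts (110,000 in the economic data, 100,000 as the total of scanned pages), so the script keeps them separate.

```python
# Figures copied from the slide.
SCANNING_EUR = 12_000
DEVELOPMENT_EUR = 90_000
PAGES_COSTED = 110_000  # page count quoted with the scanning cost
PAGES_BY_TYPE = {"Yearbook": 70_000, "Census": 30_000}


def cost_per_page(cost_eur: float, pages: int) -> float:
    """Return the cost in Euros for one page."""
    return cost_eur / pages


total_pages = sum(PAGES_BY_TYPE.values())
total_eur = SCANNING_EUR + DEVELOPMENT_EUR  # assumption: both items added together

if __name__ == "__main__":
    print(f"Scanning only: {cost_per_page(SCANNING_EUR, PAGES_COSTED):.3f} EUR/page")
    print(f"Scanning plus development: {cost_per_page(total_eur, total_pages):.2f} EUR/page")
```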
46INEbase history Some Interesting Data
- 4- Personnel used
- Cataloguing: 0-3 recording assistants
- Indexes translator: 1 trainee
- Publisher: 1-2 statisticians
- IT support team
- 5- How many people use INEbase History?
- Page views in October: 77,623 (1.2% of total)
47INEbase history IT DATA
- IT infrastructure: a reasonably simple system
- A cataloguing server houses a copy of the work from the database and the collection of PDF pages; multiple cataloguer PCs provided with a "client" application connect to the server
- One of the components of the family of web servers at www.ine.es houses the dissemination server (the software, plus a copy of the database and a copy of the collection of PDF pages). This is the system that serves Internet files
- There are copy and safety mechanisms between one environment and the other
- The environment is similar to a content management programme
48INEbase history IT DATA
- IT infrastructure: a reasonably simple system
- Client programmes developed with Microsoft .NET.
- Server programme developed with Java.
- Catalogue and dissemination database: Oracle 9i.
- Programmes for working with PDF files obtained from a manufacturer specialised in this kind of software.
- Conceptual design, setting requirements, selection of platforms: National Statistics Institute.
- Scanning of originals: Proco S.A.
- Technological partner, development: Sopra Group.
49- Thank you very much for your attention