Title: FACIT Tools For Distributed Collections
1FACIT Tools For Distributed Collections
- A breakout session at the NDIIPP Partners meeting
- July 09, 2008
- Terry Moore, University of Tennessee
- Scott Smith/Justin Mathena, National Geospatial
Digital Archive (UCSB)
- Santiago de Ledesma, ACCRE, Vanderbilt
2Overview
- FACIT project
- Basic idea
- Application context: NGDA
- FACIT Technology
- Logistical Networking inside
- LoDN
- L-Store
- FACIT technology and the problem of long-term
preservation of bits
3 Our Sponsors
4What is FACIT?
- FACIT: Federated Archive Cyberinfrastructure
Testbed
- Goal of FACIT: Create a testbed to experiment
with a different approach to federated resource
sharing for access and preservation
- FACIT partners
- National Geospatial Digital Archive (NGDA: UCSB and
Stanford). The NGDA is an NDIIPP partner
- Logistical Networking (UTK): network storage tech
- REDDnet (Vanderbilt): NSF-funded infrastructure
using LN for data-intensive collaboration
5NGDA Overview
- National Geospatial Digital Archive
- Focus: long-term archiving (the 100-year problem)
- Emphasis on geospatial data
- Policy-level archive, not architecture specific
- Based on 20 years of experience at UCSB
6Pertinent Details
- Preservation through Simplicity
- Key component of architecture is the Data Model;
all other parts considered disposable
- Objects maintain self-descriptive 'manifests'
- Archive organization and object structure both
based on file systems
- Data Model allows easy tie-in to LN
7Detail View: A Manifest
8Detail View: An Object
13NGDA and Logistical Networking
- Logistical Networking as an abstracted storage
layer
- Increase download speeds
- Logistical Networking used as a tool for
custodianship
- Logistical Networking as a content transfer
solution
- Logistical Networking as a backup for at-risk
data (temporary stewardship)
15The Future of NGDA and LN
- Initial trials have met with mixed success
- Many mixed-size objects in test sets (30,000)
- Upload data set size of 1 TB
- Moderate Download Speeds to LC over WAN
- Roughly 1 day per TB download
- The near future
- Middleware to bridge search client and LN cloud
- Adjustments to handle mixed data sets
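As a rough check on the "1 day per TB" figure, the sustained rate it implies can be worked out directly. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes), which the slide does not specify; the function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s

def sustained_rate_mb_per_s(terabytes: float, days: float) -> float:
    """Average throughput in MB/s, assuming decimal units (1 TB = 1e12 bytes)."""
    total_bytes = terabytes * 1e12
    return total_bytes / (days * SECONDS_PER_DAY) / 1e6

rate = sustained_rate_mb_per_s(1, 1)
print(f"{rate:.1f} MB/s, {rate * 8:.0f} Mbit/s")  # 11.6 MB/s, 93 Mbit/s
```

In other words, 1 TB per day is under a tenth of a gigabit link, which is consistent with the slide calling the download speeds "moderate".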
16What is data logistics?
- Data logistics is concerned with the time-related
positioning of data resources. Main
question: How can you arrange things so that the
data you have will be there when you need it?
- When the data gets big, its physicality can
become a problem: It does matter where you are
and when you need it!
- Logistical Networking focuses on the problem of
creating storage infrastructure scalable enough
to address problems of data logistics.
- Initial focus: Access to data for wide-area,
data-intensive collaboration
- FACIT leverages this work and explores its
application to long-term preservation
17Basic elements of the LN stack
18Sample exNodes
Partial exNode encoding
Crossing administrative domains, sharing resources
19New federation members?
LoC
20Basic elements of the LN stack
21LoDN - Network File Manager
- Store files into the Logistical Network using
Java upload/download tools.
- Manages exNode maintenance and replication
- Provides account (single user or group) as well
as world permissions.
22Accessing distributed collection with LN
23What is L-Store?
- Goal of L-Store: Use LN to provide a generic,
high-performance, wide-area-capable storage
virtualization service
- Provides a file system interface to (globally)
distributed IBP depots (e.g. currently uses
WebDAV and CIFS)
- Flexible role-based AuthZ (work in progress)
24L-Store and Logistical Networking
- L-Store adds a name space on top of the exNode
layer
- Allows for LN operations on the name space.
- Uses LN's parallelism for high performance and
reliability, e.g. parallel transfers to improve
performance (3 GB/s during SC06 demo)
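The idea of parallel transfers for performance and reliability can be illustrated with a toy sketch. It assumes a simplified exNode-like map in which a file is a list of extents, each replicated on more than one depot; the depots here are in-memory dictionaries, and none of the names below belong to the real IBP, exNode, or L-Store interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for storage depots: depot name -> {allocation id: bytes}.
DEPOTS = {
    "depot-a": {"x0": b"FACIT ", "x1": b"test"},
    "depot-b": {"x0": b"FACIT ", "x2": b"bed"},
    "depot-c": {"x1": b"test", "x2": b"bed"},
}

# Simplified exNode-like map: ordered extents, each with (depot, allocation id) replicas.
EXNODE = [
    [("depot-a", "x0"), ("depot-b", "x0")],
    [("depot-c", "x1"), ("depot-a", "x1")],
    [("depot-b", "x2"), ("depot-c", "x2")],
]

def fetch_extent(replicas, offline=frozenset()):
    """Return the extent's bytes from the first replica whose depot is reachable."""
    for depot, alloc in replicas:
        if depot in offline:
            continue  # simulate an unreachable depot and try the next replica
        return DEPOTS[depot][alloc]
    raise IOError("no reachable replica for extent")

def download(exnode, offline=frozenset(), workers=4):
    """Fetch all extents concurrently and reassemble them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: fetch_extent(r, offline), exnode)
        return b"".join(parts)

print(download(EXNODE))                       # b'FACIT testbed'
print(download(EXNODE, offline={"depot-a"}))  # b'FACIT testbed'
```

Fetching extents concurrently is where the performance gain would come from; falling back to a second replica when a depot is unreachable is the reliability side.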
25L-Store scalability
- L-Store uses a Distributed Hash Table to store
all its structural metadata (i.e. metadata
about how the bits are stored)
- DHTs provide a highly scalable way of storing
metadata.
- Metadata and data can be scaled independently.
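To make the DHT point concrete, here is a minimal sketch of one common DHT technique, consistent hashing, used to decide which metadata node owns a given key. The slides do not say which DHT design L-Store uses, so the ring layout, node names, and keys below are all assumptions for illustration.

```python
import bisect
import hashlib

def _h(key: str) -> int:
    """Map a string to a point on the hash ring."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class TinyDHT:
    """Toy consistent-hashing table; node names and keys are hypothetical."""

    def __init__(self, nodes):
        self._ring = sorted((_h(n), n) for n in nodes)
        self._points = [p for p, _ in self._ring]
        self._stores = {n: {} for n in nodes}

    def node_for(self, key: str) -> str:
        """The first node clockwise from the key's hash owns the key."""
        i = bisect.bisect_right(self._points, _h(key)) % len(self._ring)
        return self._ring[i][1]

    def put(self, key, value):
        self._stores[self.node_for(key)][key] = value

    def get(self, key):
        return self._stores[self.node_for(key)].get(key)

dht = TinyDHT(["meta-1", "meta-2", "meta-3"])
dht.put("/ngda/maps/tile-001", {"extents": 3, "replicas": 2})
print(dht.node_for("/ngda/maps/tile-001"), dht.get("/ngda/maps/tile-001"))
```

Because each key's owner is computed from its hash, metadata nodes can be added without touching the depots that hold the data, which is one way to read "metadata and data can be scaled independently".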
26Storage Management
- Nevoa Networks (Brazilian company based on LN)
provides management of remote/distributed storage
via StorCore
- Provides resource discovery for L-Store.
- Allows depots to be grouped into logical units.
- It can create dynamic logical units based on
queries.
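One way to picture a query-based logical unit is as a predicate evaluated against the current depot list, so that membership changes as depots change. The sketch below is only an illustration of that idea; the `Depot` fields and the `logical_unit` function are hypothetical and are not StorCore's interface.

```python
from dataclasses import dataclass

@dataclass
class Depot:
    name: str
    site: str
    free_gb: int
    online: bool = True

def logical_unit(depots, query):
    """Evaluate the query now; membership follows current depot state."""
    return [d.name for d in depots if query(d)]

depots = [
    Depot("d1", "site-a", 800),
    Depot("d2", "site-b", 120),
    Depot("d3", "site-c", 950),
]

def roomy(d):
    """Example query: online depots with at least 500 GB free."""
    return d.online and d.free_gb >= 500

print(logical_unit(depots, roomy))  # ['d1', 'd3']
depots[0].online = False            # a depot drops out
print(logical_unit(depots, roomy))  # ['d3']
```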
27L-Store and FACIT
- FACIT drives L-Store development
- L-Sync: An rsync-like tool that uses L-Store as
intermediate storage.
- Extended metadata attributes.
- A flexible policy framework.
28Questions about
- NGDA?
- Logistical Networking?
- LoDN?
- L-Store?
29Discussion: Preservation's storage problem
- Long-term preservation is a relay: Repeated
migrations across storage media/systems, archive
systems, institutions
- Begin with the bits: Storage technology changes
every 3-5 yrs
- During some periods of time data will be in
steady state
- But during a century, there will be 20-30
handoffs!
- How can we create a handoff process that can be
sustained for a century or more? Can we create a
technical process or will a social process have
to do?
- Complicating factor: We're drowning in data
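The handoff count follows from dividing the century by the technology cycle. The sketch below assumes exactly one handoff per cycle, which is a simplification; the function name is illustrative.

```python
def handoffs(horizon_years: int, cycle_years: int) -> int:
    """Whole technology cycles that fit in the horizon (one handoff per cycle assumed)."""
    return horizon_years // cycle_years

print(handoffs(100, 5), handoffs(100, 3))  # 20 33
```

A 5-year cycle gives 20 handoffs and a 3-year cycle gives 33, so the slide's "20-30" is a rounded version of the same range.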
30Framing The Issue Globally
- World data expected to total 2 zettabytes by 2011
(IDC Whitepaper)
- As forecast, the amount of information created,
captured, or replicated exceeded available
storage for the first time in 2007. Not all
information created and transmitted gets stored,
but by 2011, almost half of the digital universe
will not have a permanent home.
31What does experience show?
SDSC's archive shows exponential growth with a
consistent doubling period of 15 months
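A 15-month doubling period can be converted into more familiar growth figures. This is a minimal sketch of the arithmetic, assuming the doubling period stays constant over the whole interval; the function name is illustrative.

```python
def growth_factor(months: float, doubling_months: float = 15.0) -> float:
    """Multiplicative growth over `months` for a fixed doubling period."""
    return 2.0 ** (months / doubling_months)

print(f"{growth_factor(12):.2f}x per year")     # 1.74x per year
print(f"{growth_factor(120):.0f}x per decade")  # 256x per decade
```

At that rate an archive grows by about 74% a year, and by a factor of 256 (eight doublings) over ten years.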
32 If preservation is a relay, then
- The key preservation problem at the bit layer is
- Choice 1: steady-state data storage
- Choice 2: copying data to different systems
- Impression: De facto choice is 1
- When you have to hand off data, is it
sufficient to have
- Choice 1: A social solution
- Choice 2: A technical solution
- Impression: De facto choice is 1
- Contention: Neither of these de facto choices is
adequate