FACIT Tools For Distributed Collections

Title: FACIT Tools For Distributed Collections


1
FACIT Tools For Distributed Collections
  • A breakout session at the NDIIPP Partners meeting
  • July 09, 2008
  • Terry Moore, University of Tennessee
  • Scott Smith/Justin Mathena, National Geospatial
    Digital Archive (UCSB)
  • Santiago de Ledesma, ACCRE, Vanderbilt

2
Overview
  • FACIT project
  • Basic idea
  • Application context: NGDA
  • FACIT Technology
  • Logistical Networking inside
  • LoDN
  • L-Store
  • FACIT technology and the problem of long-term
    preservation of bits

3
Our Sponsors
4
What is FACIT
  • FACIT: Federated Archive Cyberinfrastructure
    Testbed
  • Goal of FACIT: Create a testbed to experiment
    with a different approach to federated resource
    sharing for access and preservation
  • FACIT partners:
  • National Geospatial Digital Archive (NGDA; UCSB and
    Stanford): the NGDA is an NDIIPP partner
  • Logistical Networking (UTK): network storage
    technology
  • REDDnet (Vanderbilt): NSF-funded infrastructure
    using LN for data-intensive collaboration

5
NGDA Overview
  • National Geospatial Digital Archive
  • Focus: long-term archiving (the 100-year problem)
  • Emphasis on geospatial data
  • Policy level archive - not architecture specific
  • Based on 20 years of experience at UCSB

6
Pertinent Details
  • Preservation through Simplicity
  • Key component of the architecture is the Data
    Model; all other parts are considered disposable
  • Objects maintain self-descriptive 'manifests'
  • Archive organization and object structure both
    based on file systems
  • Data Model allows easy tie-in to LN

7
Detail View: A Manifest
8
Detail View: An Object
13
NGDA and Logistical Networking
  • Logistical Networking as an abstracted storage
    layer
  • Increase download speeds
  • Logistical Networking used as a Tool for
    custodianship
  • Logistical Networking as content transfer
    solution
  • Logistical Networking as a backup for at-risk
    data (temporary stewardship)

15
The Future of NGDA and LN
  • Initial trials have met with mixed success
  • Lots of mixed-size objects in test sets (30,000)
  • Upload data set size of 1 TB
  • Moderate download speeds to LC over the WAN
  • Roughly 1 day per TB downloaded
  • The near future:
  • Middleware to bridge search client and LN cloud
  • Adjustments to handle mixed data sets
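
As a rough check on the "1 day per TB" figure above, the short Python sketch below converts it into a sustained transfer rate. It assumes decimal units (1 TB = 10**12 bytes) and a transfer that runs without pauses for the full 24 hours; neither assumption comes from the slides.

```python
# Back-of-the-envelope check of "roughly 1 day per TB".
# Assumptions (not from the slides): 1 TB = 10**12 bytes, and the
# transfer is sustained for the whole day.
TB_BYTES = 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate(total_bytes, seconds):
    """Return (megabytes per second, megabits per second)."""
    bytes_per_second = total_bytes / seconds
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


mb_per_s, mbit_per_s = sustained_rate(TB_BYTES, SECONDS_PER_DAY)
print(f"{mb_per_s:.1f} MB/s, about {mbit_per_s:.0f} Mbit/s")
# prints: 11.6 MB/s, about 93 Mbit/s
```

Under those assumptions the reported download speed corresponds to a bit under 100 Mbit/s sustained, which fits the "moderate" label on the slide.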

16
What is data logistics?
  • Data logistics is concerned with the time-related
    positioning of data resources. Main question: How
    can you arrange things so that the data you have
    will be there when you need it?
  • When the data gets big, its physicality can
    become a problem: it does matter where you are
    and when you need it!
  • Logistical Networking focuses on the problem of
    creating storage infrastructure scalable enough
    to address problems of data logistics.
  • Initial focus: Access to data for wide-area,
    data-intensive collaboration
  • FACIT leverages this work and explores its
    application to long-term preservation

17
Basic elements of the LN stack
18
Sample exNodes
Partial exNode encoding
Crossing administrative domains, sharing resources
19
New federation members?
LoC
  • Add new depots
  • Copy the data
  • Rewrite the exNodes

20
Basic elements of the LN stack
21
LoDN - Network File Manager
  • Stores files in the Logistical Network using
    Java upload/download tools.
  • Manages exNode maintenance and replication.
  • Provides account-level (single user or group) as
    well as world permissions.

22
Accessing distributed collection with LN
23
What is L-Store?
  • Goal of L-Store: Use LN to provide a generic,
    high-performance, wide-area-capable storage
    virtualization service
  • Provides a file system interface to (globally)
    distributed IBP depots (e.g. currently uses
    WebDAV and CIFS)
  • Flexible role-based AuthZ (work in progress)

24
L-Store and Logistical Networking
  • L-Store adds a name space on top of the exNode
    layer
  • Allows for LN operations on the name space.
  • Uses LN's parallelism for high performance and
    reliability, e.g. parallel transfers to improve
    performance (3GB/s during SC06 demo)

25
L-Store scalability
  • L-Store uses a Distributed Hash Table to store
    all its structural metadata (i.e. metadata
    about how the bits are stored)
  • DHTs provide a highly scalable way of storing
    metadata.
  • Metadata and data can be scaled independently.
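
To illustrate why a distributed hash table scales well for this kind of metadata, here is a minimal consistent-hashing sketch in Python. It is not L-Store's implementation, which the slides do not describe; the `MetadataRing` class, the node names, and the metadata record are all assumptions made for illustration. Each key hashes to a point on a ring, and the next node point clockwise owns it, so lookups need no central index.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class MetadataRing:
    """Toy DHT: consistent hashing over a fixed set of metadata nodes."""

    def __init__(self, nodes, replicas=64):
        # Each node gets `replicas` virtual points to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]
        self._tables = {node: {} for node in nodes}  # per-node key/value store

    def node_for(self, key):
        """Return the node owning `key`: first ring point past its hash."""
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]

    def put(self, key, value):
        self._tables[self.node_for(key)][key] = value

    def get(self, key):
        return self._tables[self.node_for(key)].get(key)


# Usage with made-up node names and a made-up structural-metadata record.
ring = MetadataRing(["meta-1", "meta-2", "meta-3"])
ring.put("/ngda/map.tif", {"depots": ["depot-a", "depot-b"], "size": 11})
print(ring.get("/ngda/map.tif"))
```

Because only metadata passes through the ring while file bytes stay on the depots, the number of metadata nodes can be chosen independently of the amount of storage, which is the point the last bullet makes.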

26
Storage Management
  • Nevoa Networks (Brazilian company based on LN)
    provides management of remote/distributed storage
    via StorCore
  • Provides resource discovery for L-Store.
  • Allows depots to be grouped into logical units.
  • Can create dynamic logical units based on
    queries.
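
One way to picture a query-based logical unit is as a stored predicate that is re-evaluated against the current depot list each time it is used. The Python sketch below is only that picture: the slides do not describe StorCore's query language or data model, so the `Depot` fields, the `logical_unit` function, and the example hosts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Depot:
    """Hypothetical depot record; the attributes are made up for illustration."""
    host: str
    site: str
    free_gb: int


def logical_unit(depots, query):
    """Return the depots that currently satisfy `query`."""
    return [d for d in depots if query(d)]


depots = [
    Depot("depot-1.example", "UCSB", 500),
    Depot("depot-2.example", "Vanderbilt", 50),
    Depot("depot-3.example", "Vanderbilt", 900),
]


def roomy_vanderbilt(d):
    """Example query: Vanderbilt depots with at least 100 GB free."""
    return d.site == "Vanderbilt" and d.free_gb >= 100


print([d.host for d in logical_unit(depots, roomy_vanderbilt)])
# prints: ['depot-3.example']

# The unit is dynamic: a newly registered depot that matches the query
# appears the next time the unit is evaluated.
depots.append(Depot("depot-4.example", "Vanderbilt", 300))
print([d.host for d in logical_unit(depots, roomy_vanderbilt)])
# prints: ['depot-3.example', 'depot-4.example']
```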

27
L-Store and FACIT
  • FACIT drives L-Store development
  • L-Sync: an rsync-like tool that uses L-Store as
    intermediate storage.
  • Extended metadata attributes.
  • A flexible policy framework.

28
Questions about
  • NGDA?
  • Logistical Networking?
  • LoDN?
  • L-Store?

29
Discussion: Preservation's storage problem
  • Long-term preservation is a relay: repeated
    migrations across storage media/systems, archive
    systems, institutions
  • Begin with the bits: storage technology changes
    every 3-5 yrs
  • During some periods of time data will be in
    steady state
  • But during a century, there will be 20-30
    handoffs!
  • How can we create a handoff process that can be
    sustained for a century or more? Can we create a
    technical process or will a social process have
    to do?
  • Complicating factor: We're drowning in data
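
The handoff estimate follows from simple division, sketched below in Python. The only inputs are the slide's own figures (a 100-year horizon and a 3-5 year technology cycle); the assumption that exactly one handoff happens per cycle is ours.

```python
def handoffs(horizon_years, refresh_years):
    """Number of complete technology cycles within the horizon."""
    return horizon_years // refresh_years


for refresh_years in (3, 4, 5):
    print(refresh_years, handoffs(100, refresh_years))
# prints:
# 3 33
# 4 25
# 5 20
```

The result, 20 to 33 handoffs per century, is consistent with the rounded "20-30" quoted above.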

30
Framing The Issue Globally
  • World data expected to total 2 zettabytes by 2011
    (IDC Whitepaper)
  • As forecast, the amount of information created,
    captured, or replicated exceeded available
    storage for the first time in 2007. Not all
    information created and transmitted gets stored,
    but by 2011, almost half of the digital universe
    will not have a permanent home.

31
What does experience show?
SDSC's archive shows exponential growth with a
consistent doubling period of 15 months
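
To make the 15-month doubling period concrete, the Python sketch below computes the implied growth factor over a few intervals. It assumes the doubling period stays constant, which the slide reports for the past but which is an extrapolation for the future.

```python
def growth_factor(months, doubling_months=15):
    """Multiplicative growth after `months`, given a fixed doubling period."""
    return 2 ** (months / doubling_months)


print(round(growth_factor(12), 2))  # one year: prints 1.74
print(round(growth_factor(36), 1))  # three years: prints 5.3
print(round(growth_factor(60)))     # five years: 2**4, prints 16
```

Read against a 3-5 year storage technology cycle, an archive growing at this rate would be roughly 5 to 16 times larger at each handoff than at the previous one.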
32
If preservation is a relay, then
  • The key preservation problem at the bit layer is:
  • Choice 1: steady-state data storage
  • Choice 2: copying data to different systems
  • Impression: De facto choice is 1
  • When you have to hand off data, is it
    sufficient to have:
  • Choice 1: A social solution
  • Choice 2: A technical solution
  • Impression: De facto choice is 1
  • Contention: Neither of these de facto choices is
    adequate