The Vesta Parallel File System - PowerPoint PPT Presentation

Provided by: stevebi


1
The Vesta Parallel File System
  • Peter F. Corbett, Dror G. Feitelson

2
Outline
  • Introduction
  • Motivation and Design Guidelines
  • Abstractions and Interface
  • Implementation
  • Conclusion

3
Introduction
  • The Vesta parallel file system
  • For AIX on the IBM SP2
  • Designed to provide parallel file access
  • Can achieve high efficiency on parallel I/O
    hardware
  • Deals exclusively with persistent on-line storage
    of files, particularly files that must be
    accessed by parallel applications

4
Introduction cont.
5
Introduction cont.
  • Method of the Vesta file system
  • Introduces a new abstraction of parallel files, by
    which application programmers can express the
    required partitioning of file data among the
    processes of a parallel application
  • Reduces the need for synchronization and
    concurrency control, and allows for a more
    streamlined implementation
  • Provides explicit control over the way data is
    distributed across the I/O nodes, and allows the
    distribution to be tailored for the expected
    access patterns

6
Motivation and Design Guidelines
  • Motivation
  • Users are able to create distributed files
    without full control over the mapping of data to
    disks
  • Design Guidelines
  • Parallelism
  • Scalability
  • Layering
  • Providing commonly expected services

7
Simple striping method to get a parallel view
Simple striping technique: assuming that the
number of I/O nodes is N, block i of the file is
located on I/O node i mod N.
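A minimal sketch of the simple striping rule above (function names are hypothetical). It makes the drawback visible: the set of blocks each node holds depends directly on N, which is the dependency Vesta abstracts away.

```python
def io_node_for_block(block: int, num_io_nodes: int) -> int:
    """Simple striping: block i of the file lives on I/O node i mod N."""
    if num_io_nodes <= 0:
        raise ValueError("need at least one I/O node")
    return block % num_io_nodes


def blocks_on_node(node: int, num_blocks: int, num_io_nodes: int) -> list[int]:
    """All blocks (out of num_blocks) that simple striping places on one node."""
    return [b for b in range(num_blocks) if io_node_for_block(b, num_io_nodes) == node]


# With N = 4 I/O nodes, blocks 0..9 land on nodes 0,1,2,3,0,1,2,3,0,1.
print([io_node_for_block(b, 4) for b in range(10)])
print(blocks_on_node(1, 10, 4))  # [1, 5, 9]
```

Rerunning with a different N reshuffles every node's blocks, so a "parallel view" defined this way changes whenever the number of I/O nodes does.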
8
Method of Vesta to get a parallel view
  • Two steps
  • Abstract away from a direct dependency on the
    number of I/O nodes
  • Allow a variety of partitioned views of the data,
    in addition to partitioning according to the
    physical distribution of data to the I/O nodes
  • All these parallel views partition the file into
    disjoint subfiles, which are typically accessed by
    different processes of a parallel application
  • Guarantees that the accesses by the different
    processes are non-overlapping at the byte level
  • Allows each process to access its data directly

9
Cell abstraction of Vesta
  • Abstracting away from I/O nodes is done by
    introducing the notion of cells
  • Cells can be thought of as containers where data
    can be deposited
  • When a file is created, the number of cells is
    given as a parameter
  • If the number of cells is no more than the number
    of I/O nodes, then each cell will reside on a
    different I/O node
  • If there are more cells than I/O nodes, the cells
    will be distributed to the I/O nodes in a
    round-robin manner
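The placement rule on this slide can be sketched as follows. The metadata slide mentions a "base" I/O node per file; starting the round-robin at that base, as `(base_node + cell) mod N`, is an assumption of this sketch, and the function names are hypothetical.

```python
def cell_to_io_node(cell: int, num_cells: int, num_io_nodes: int, base_node: int = 0) -> int:
    """Place a cell on an I/O node, round-robin from an assumed base node.

    If num_cells <= num_io_nodes, consecutive cells land on distinct nodes;
    otherwise the assignment wraps around the nodes.
    """
    if not 0 <= cell < num_cells:
        raise ValueError("cell index out of range")
    return (base_node + cell) % num_io_nodes


def cell_placement(num_cells: int, num_io_nodes: int, base_node: int = 0) -> list[int]:
    """I/O node of every cell in the file, in cell order."""
    return [cell_to_io_node(c, num_cells, num_io_nodes, base_node) for c in range(num_cells)]


print(cell_placement(3, 8, base_node=5))  # [5, 6, 7]: fewer cells than nodes
print(cell_placement(6, 4))               # [0, 1, 2, 3, 0, 1]: more cells than nodes
```

Because the application addresses cells rather than I/O nodes, the same file description works whether the machine has 4 or 8 I/O nodes; only the placement changes.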

10
2-d structure of Vesta
  • 2-dimensional structure
  • Cell dimension (horizontal) specifies the
    parallelism in accessing the data
  • Data dimension (vertical): the data within each cell
  • The data in each cell is viewed as a sequence of
    basic striping units (BSUs).
  • The BSU size can be an arbitrary number of bytes,
    and should be chosen to reflect the minimal unit
    of data access
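Given the cell/BSU structure above, a byte position inside one cell splits into a BSU index (the vertical coordinate) and an offset inside that BSU. The sketch shows only that arithmetic; the type and function names are hypothetical.

```python
from typing import NamedTuple


class BsuAddress(NamedTuple):
    cell: int          # horizontal coordinate: which cell (parallelism)
    bsu: int           # vertical coordinate: which BSU within the cell
    byte_in_bsu: int   # byte offset inside that BSU


def locate_in_cell(cell: int, byte_offset_in_cell: int, bsu_size: int) -> BsuAddress:
    """Split a byte offset inside one cell into (BSU index, offset within the BSU)."""
    if bsu_size <= 0 or byte_offset_in_cell < 0:
        raise ValueError("bsu_size must be positive and the offset non-negative")
    bsu, byte_in_bsu = divmod(byte_offset_in_cell, bsu_size)
    return BsuAddress(cell, bsu, byte_in_bsu)


# BSU size of 256 bytes: byte 1000 of cell 2 is byte 232 of that cell's BSU 3.
print(locate_in_cell(2, 1000, 256))  # BsuAddress(cell=2, bsu=3, byte_in_bsu=232)
```

Since the BSU size can be any byte count, it can match an application record size, so a BSU boundary never splits a record.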

11
Two parameters to define the structure
  • The number of cells
  • The BSU size
  • The two parameters are defined when the file is
    created, and can't be changed thereafter.
  • Attach: a new call that retrieves these
    parameters for the application
  • Every process in the application must attach
    every file before it can open the file.
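The "fixed at creation" rule can be modelled with an immutable record. This is an illustrative sketch using a frozen dataclass; the class name is hypothetical and says nothing about how Vesta stores the parameters.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class VestaFileStructure:
    """The two structural parameters chosen when a file is created (illustrative)."""
    num_cells: int
    bsu_size: int  # bytes per basic striping unit

    def __post_init__(self) -> None:
        if self.num_cells < 1 or self.bsu_size < 1:
            raise ValueError("num_cells and bsu_size must both be positive")


layout = VestaFileStructure(num_cells=16, bsu_size=4096)
rejected = False
try:
    layout.num_cells = 32  # not allowed: the structure is fixed at creation
except FrozenInstanceError:
    rejected = True
    print("structure cannot be changed after creation")
```

Keeping the structure immutable means any process that has read the two parameters once can keep computing data locations without rechecking them.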

12
Partition files for parallel access
13
Partition files for parallel access
  • Define the template of Vesta subfiles
  • Define the block size used to distribute the data
  • Data decomposition scheme
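One common way to express such a decomposition is block-cyclic: runs of Hbs consecutive cells are dealt to Hn horizontal groups in turn. The parameter names Hbs and Hn appear on the next slide; treating them as a block-cyclic assignment of cells to groups is an assumption of this sketch, which ignores the vertical (BSU) dimension entirely.

```python
def horizontal_group_of_cell(cell: int, hbs: int, hn: int) -> int:
    """Block-cyclic: cells [0, hbs) go to group 0, [hbs, 2*hbs) to group 1, and so on."""
    if hbs < 1 or hn < 1 or cell < 0:
        raise ValueError("hbs and hn must be positive and cell non-negative")
    return (cell // hbs) % hn


def cells_in_group(group: int, num_cells: int, hbs: int, hn: int) -> list[int]:
    """Cells that make up one horizontal group (subfile) under this sketch."""
    return [c for c in range(num_cells) if horizontal_group_of_cell(c, hbs, hn) == group]


# 12 cells, Hbs = 2, Hn = 3: group 0 gets cells 0, 1, 6, 7.
print(cells_in_group(0, 12, hbs=2, hn=3))                   # [0, 1, 6, 7]
print([horizontal_group_of_cell(c, 1, 4) for c in range(8)])  # purely cyclic when Hbs = 1
```

Hbs = 1 gives a purely cyclic decomposition, and Hbs = num_cells / Hn gives one contiguous block per group, so a single pair of parameters covers both classic layouts.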

14
Handling awkward cases
  • Ghost cells: extra cells are added to make the
    total number of cells a multiple of Hbs × Hn
  • Ghost cells have no effect on reading and writing
  • Holes: because cells can have different lengths,
    a hole can be left in the middle of a cell
  • Writing to a hole causes it to be filled with
    valid data
  • Call the Vesta stat function to find how much
    data the whole file contains
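The number of ghost cells follows from rounding the cell count up to the next multiple of Hbs × Hn. A small sketch of that arithmetic (function name hypothetical):

```python
def ghost_cells_needed(num_cells: int, hbs: int, hn: int) -> int:
    """Ghost cells required to round num_cells up to a multiple of hbs * hn."""
    if min(num_cells, hbs, hn) < 1:
        raise ValueError("all arguments must be positive")
    period = hbs * hn
    remainder = num_cells % period
    return 0 if remainder == 0 else period - remainder


# 10 real cells with Hbs = 2 and Hn = 3: pad to 12, i.e. 2 ghost cells.
print(ghost_cells_needed(10, hbs=2, hn=3))  # 2
```

After padding, each of the Hn groups receives the same number of Hbs-wide runs of cells (under a block-cyclic reading of the partitioning), and since ghost cells hold no data, reads and writes are unaffected.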

15
Data ordering
16
Features of the Vesta system
  • Key feature: the capability to perform direct
    access from a compute node to an I/O node without
    referencing any centralized metadata
  • The form of the abstraction
  • The 2-d structure of BSUs within cells
  • The interface used to access the abstraction
  • Partitioning is also an innovative feature
  • The partitioning is defined in advance, and then
    processes can perform independent accesses to any
    part of their partition (subfile)

17
Implementation
  • Uses dedicated I/O nodes
  • A client library linked with application code
    running on the compute nodes
  • A server that runs on the I/O nodes
  • Achieves direct access from a compute node to the
    I/O node
  • Metadata is distributed among all the I/O nodes
  • The I/O nodes involved in an access can be
    identified from a combination of the metadata,
    the partitioning parameters, and the offset and
    count of the data
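Identifying the I/O nodes involved in an access reduces to local arithmetic, with no metadata server in the loop. This sketch assumes a hypothetical view in which logical BSU k of the file is row k // C of cell k mod C (C cells), and cell c lives on I/O node (base + c) mod N. Neither rule is stated on the slides.

```python
def io_nodes_for_access(offset: int, count: int, *, num_cells: int, bsu_size: int,
                        num_io_nodes: int, base_node: int = 0) -> list[int]:
    """Sorted I/O nodes touched by `count` bytes at byte `offset`, under an assumed view.

    Assumed view: logical BSU k is row k // num_cells of cell k % num_cells, and
    cell c resides on I/O node (base_node + c) % num_io_nodes.
    """
    if count <= 0:
        return []
    first_bsu = offset // bsu_size
    last_bsu = (offset + count - 1) // bsu_size
    # After num_cells consecutive BSUs every cell has been visited, so stop there.
    last_needed = min(last_bsu, first_bsu + num_cells - 1)
    nodes = {(base_node + k % num_cells) % num_io_nodes
             for k in range(first_bsu, last_needed + 1)}
    return sorted(nodes)


# 4 cells of 1 KiB BSUs on 4 I/O nodes: bytes 1500..3499 cover BSUs 1..3.
print(io_nodes_for_access(1500, 2000, num_cells=4, bsu_size=1024, num_io_nodes=4))  # [1, 2, 3]
```

Every compute node that knows the same parameters computes the same answer, which is what lets it contact the right I/O nodes directly.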

18
Access to Metadata
  • Vesta objects: files, cells, and Xrefs
  • Each I/O node maintains the Vesta objects in a
    memory-mapped table.
  • The I/O nodes are logically numbered
  • Each entry in the table contains the file name,
    its owner ID, group and access permissions,
    creation, access, and last modification times,
    the number of cells, the BSU size, the base and
    highest numbered I/O nodes used, and the current
    file status.
  • A 7-bit uniquifier field distinguishes two files
    or Xrefs with different names whose pathnames
    hash to the same value
  • A 1-bit field distinguishes files from Xrefs
  • An 8-bit level field is used to number the cells
    of a file
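The three small fields can be pictured as parts of one packed identifier. The 7-, 1- and 8-bit widths come from the slide; the 48-bit pathname-hash field and the field order are hypothetical, chosen only to make the sketch complete. An 8-bit level field allows 256 distinct cell numbers.

```python
# Field widths for uniquifier, type bit, and level come from the slide; the
# 48-bit hash field and the field order are hypothetical.
HASH_BITS = 48
UNIQ_BITS = 7
TYPE_BITS = 1
LEVEL_BITS = 8


def pack_id(path_hash: int, uniquifier: int, is_xref: bool, level: int) -> int:
    """Pack the fields into one integer, hash in the high bits."""
    for value, bits in ((path_hash, HASH_BITS), (uniquifier, UNIQ_BITS), (level, LEVEL_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    word = path_hash
    word = (word << UNIQ_BITS) | uniquifier
    word = (word << TYPE_BITS) | int(is_xref)
    word = (word << LEVEL_BITS) | level
    return word


def unpack_id(word: int) -> tuple[int, int, bool, int]:
    """Inverse of pack_id."""
    level = word & ((1 << LEVEL_BITS) - 1)
    word >>= LEVEL_BITS
    is_xref = bool(word & 1)
    word >>= TYPE_BITS
    uniquifier = word & ((1 << UNIQ_BITS) - 1)
    return word >> UNIQ_BITS, uniquifier, is_xref, level


cell_id = pack_id(0xABCDEF, uniquifier=5, is_xref=False, level=17)
print(unpack_id(cell_id))  # (11259375, 5, False, 17)
```

Two files whose names collide in the hash differ in the uniquifier, and the cells of one file differ only in the level field, so a cell's identifier can be derived from its file's identifier.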

19
Attaching and opening
  • Attaching: the file is attached to the
    application by accessing its metadata to get
    parameters such as the base and maximal I/O
    nodes, the number of cells, and the BSU size
  • Opening: to open a subfile, call the open
    function, which sets the partitioning parameters
    that define which subfile is being accessed
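The two-step protocol can be modelled on the client side as below. Everything here is hypothetical: the class and field names, the in-memory dictionary standing in for the I/O nodes' metadata tables, and the simplified partition (only a horizontal group index). It shows the ordering rule, not Vesta's real interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetadata:
    """Parameters that attach fetches (field names are illustrative)."""
    base_node: int
    max_node: int
    num_cells: int
    bsu_size: int


@dataclass(frozen=True)
class Partition:
    """Hypothetical partitioning parameters passed to open."""
    hbs: int
    hn: int
    subfile: int


class VestaClient:
    """Hypothetical client-side model of attach/open; not Vesta's actual interface."""

    def __init__(self, metadata_store: dict[str, FileMetadata]) -> None:
        self._store = metadata_store  # stands in for the I/O nodes' metadata tables
        self._attached: dict[str, FileMetadata] = {}

    def attach(self, path: str) -> FileMetadata:
        """Fetch the file's structural parameters once and remember them."""
        meta = self._store[path]
        self._attached[path] = meta
        return meta

    def open(self, path: str, partition: Partition) -> tuple[FileMetadata, Partition]:
        """Select a subfile; refuses files that were not attached first."""
        if path not in self._attached:
            raise RuntimeError(f"{path} must be attached before it is opened")
        if not 0 <= partition.subfile < partition.hn:
            raise ValueError("subfile index out of range")
        return self._attached[path], partition


client = VestaClient({
    "/data/grid": FileMetadata(base_node=0, max_node=7, num_cells=16, bsu_size=4096),
})
client.attach("/data/grid")
meta, part = client.open("/data/grid", Partition(hbs=2, hn=4, subfile=1))
print(meta.num_cells, part.subfile)  # 16 1
```

After attach, every later open and access uses only locally held parameters, which matches the goal of not consulting centralized metadata on each access.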

20
Directory Structure
  • Vesta files are accessed directly by hashing
    their pathnames, so Vesta doesn't need to
    maintain directories to find files.
  • To make it easy for users to organize their
    files, a hierarchical structure of directories is
    created using Xrefs.
  • Xrefs simply contain lists of internal IDs of
    files and other Xrefs.
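Locating a file by hashing its pathname can be sketched as follows. SHA-1, the 48-bit truncation, and mapping the hash to a logical I/O node number with `mod` are all assumptions made for illustration; the slides only say that pathnames are hashed and that Xrefs hold lists of internal IDs.

```python
import hashlib
from dataclasses import dataclass, field

ID_HASH_BITS = 48  # hypothetical width of the pathname hash


def path_hash(path: str) -> int:
    """Deterministic pathname hash; SHA-1 is an arbitrary choice for this sketch."""
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << ID_HASH_BITS) - 1)


def metadata_node(path: str, num_io_nodes: int) -> int:
    """Logical number of the I/O node whose table would hold this path's entry."""
    return path_hash(path) % num_io_nodes


@dataclass
class Xref:
    """Directory-like object: only a list of internal IDs of files and other Xrefs."""
    name: str
    member_ids: list[int] = field(default_factory=list)


run = "/home/alice/run1.dat"
home = Xref("/home/alice", [path_hash(run)])
print(metadata_node(run, 8))  # some node in 0..7, identical on every compute node
```

Lookup needs no directory walk; the Xref exists only so that users can list and organize files, not so that Vesta can find them.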

21
Access to File Data
  • Access is done by providing a byte offset and a
    byte count
  • Vesta does not have a separate seek function
  • File data is not cached on compute nodes
  • Three mechanisms for reducing access latency
  • Use of buffer caches on the I/O node
  • Asynchronous I/O operations
  • Explicit prefetch and flush operations
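Because every request carries its own byte offset and count, the access interface needs no hidden file position, much like POSIX `pread`/`pwrite`. The in-memory store below is a hypothetical stand-in for a subfile, used only to show that style of interface; its zero-filling of gaps is a simplification, not Vesta's hole semantics.

```python
class SubfileStore:
    """In-memory stand-in for a subfile's bytes, served by explicit (offset, count) calls."""

    def __init__(self) -> None:
        self._data = bytearray()

    def write(self, offset: int, data: bytes) -> int:
        """Write data at offset; there is no file position to update."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        end = offset + len(data)
        if end > len(self._data):
            # Simplification: grow with zero bytes. This is not a claim about
            # how Vesta represents holes.
            self._data.extend(bytes(end - len(self._data)))
        self._data[offset:end] = data
        return len(data)

    def read(self, offset: int, count: int) -> bytes:
        """Return up to count bytes starting at offset."""
        if offset < 0 or count < 0:
            raise ValueError("offset and count must be non-negative")
        return bytes(self._data[offset:offset + count])


store = SubfileStore()
store.write(0, b"hello")
store.write(5, b" world")
print(store.read(6, 5))  # b'world'
```

With no seek state to keep consistent, independent processes can issue requests in any order, which also makes it natural to issue them asynchronously.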

22
Access to File Data
23
Sharing
  • Vesta supports sharing in two main ways
  • Partition the file into disjoint subfiles that
    can be accessed with no synchronization among the
    sharing processes
  • Sharing a subfile
  • Each process can have an independent file pointer
    into the shared subfile
  • Alternatively, the processes can share a single
    pointer
  • When an application process opens a subfile for
    the first time, it gets a local, private pointer.
  • When a pointer is shared, a random I/O node is
    chosen, and the pointer is moved to that I/O
    node. The identity of this node and the pointer's
    ID on that node are passed to all processes that
    share its use. When a data access based on a
    shared pointer is performed, the accessing node
    first communicates with the I/O node holding the
    pointer, and the current pointer value is
    returned to the accessing node.
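The shared-pointer exchange behaves like an atomic fetch-and-add on the I/O node that holds the pointer. The slide says only that the current value is returned; advancing it by the request's byte count in the same step is an assumption of this sketch, and the class and method names are hypothetical. Threads stand in for compute-node processes.

```python
import threading


class PointerServer:
    """Hypothetical service on the chosen I/O node that owns shared file pointers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pointers: dict[int, int] = {}
        self._next_id = 0

    def create_pointer(self, initial: int = 0) -> int:
        """Register a shared pointer and return its ID on this node."""
        with self._lock:
            pointer_id = self._next_id
            self._next_id += 1
            self._pointers[pointer_id] = initial
            return pointer_id

    def fetch_and_advance(self, pointer_id: int, count: int) -> int:
        """Atomically return the current value and move the pointer past `count` bytes."""
        with self._lock:
            current = self._pointers[pointer_id]
            self._pointers[pointer_id] = current + count
            return current


server = PointerServer()
pid = server.create_pointer()
offsets: list[int] = []
offsets_lock = threading.Lock()


def worker() -> None:
    # Each simulated process performs ten 100-byte accesses through the shared pointer.
    for _ in range(10):
        offset = server.fetch_and_advance(pid, 100)
        with offsets_lock:
            offsets.append(offset)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(offsets) == list(range(0, 4000, 100)))  # True: no offset handed out twice
```

Keeping the pointer on a single node makes that node the one place where the pointer is serialized, so the processes never need to agree on its value among themselves.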

24
Concurrency Control
  • Concurrency control is needed when:
  • Processes write data to a shared subfile
  • Processes access overlapping subfiles using
    independent offsets
  • An application interleaves file metadata
    operations that also affect the file data
  • One application writes a file while others read
    it
  • Vesta uses a fast token-passing mechanism among
    the I/O nodes to guarantee concurrency atomicity
    of requests that span multiple I/O nodes, and to
    provide sequential consistency and
    linearizability among requests
  • When the token reaches the last I/O node, it
    sends an acknowledgement to the requesting
    compute node.
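A toy simulation of one request's token traversal. Visiting the involved I/O nodes in increasing logical number is an assumption of this sketch (the slides say the nodes are logically numbered but give no order); the function name and log format are hypothetical.

```python
def pass_token(request_id: int, involved_nodes: list[int]) -> list[str]:
    """Simulate a token visiting each involved I/O node in (assumed) logical order.

    Returns an event log; the last node sends the acknowledgement.
    """
    order = sorted(set(involved_nodes))
    events = []
    for position, node in enumerate(order):
        events.append(f"node {node}: token for request {request_id}")
        if position == len(order) - 1:
            events.append(f"node {node}: ack request {request_id} to compute node")
    return events


for line in pass_token(7, [3, 1, 2]):
    print(line)
```

A fixed visiting order means that two requests sharing several I/O nodes, provided tokens are also processed in arrival order at each node, are seen in the same relative order everywhere, which is the property atomicity across nodes relies on.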

25
Concurrency Control
  • Each I/O node maintains a set of 64 token
    buckets, each with an in counter and an out
    counter
  • Each file is assigned to one bucket of the set
  • When each token is sent, the out counter is
    incremented
  • When a node receives a token, it first tries to
    match the token's value with the value of the
    bucket's in counter. Tokens that do not match are
    delayed until the tokens that should be processed
    before them arrive and increment the in counter.
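The bucket counters work like sequence numbers. In this sketch an outgoing token is stamped with the bucket's out counter, and the receiver processes tokens only in in-counter order, holding early arrivals back. The 64 buckets and the in/out counters come from the slide; the file-to-bucket rule (`file_id mod 64`) and the single sender feeding a single receiver are simplifying assumptions.

```python
from collections import defaultdict

NUM_BUCKETS = 64  # from the slide: 64 token buckets per I/O node


def bucket_for_file(file_id: int) -> int:
    """Hypothetical rule assigning a file to one bucket (the slide gives no rule)."""
    return file_id % NUM_BUCKETS


class IONode:
    """One I/O node's token bookkeeping: an in and an out counter per bucket."""

    def __init__(self) -> None:
        self.out_counter = [0] * NUM_BUCKETS
        self.in_counter = [0] * NUM_BUCKETS
        # Tokens that arrived early, keyed by bucket and then by token value.
        self._delayed = defaultdict(dict)
        self.processed = []

    def send_token(self, bucket: int) -> int:
        """Stamp an outgoing token with the out counter, then increment it."""
        value = self.out_counter[bucket]
        self.out_counter[bucket] += 1
        return value

    def receive_token(self, bucket: int, value: int, request: str) -> None:
        """Hold the token until its value matches the in counter, then process in order."""
        self._delayed[bucket][value] = request
        while self.in_counter[bucket] in self._delayed[bucket]:
            ready = self._delayed[bucket].pop(self.in_counter[bucket])
            self.processed.append(ready)
            self.in_counter[bucket] += 1


upstream, downstream = IONode(), IONode()
bucket = bucket_for_file(130)  # 130 % 64 == 2
tokens = [(upstream.send_token(bucket), name) for name in ("A", "B", "C")]
# Deliver the tokens out of order: C, then A, then B.
for value, name in (tokens[2], tokens[0], tokens[1]):
    downstream.receive_token(bucket, value, name)
print(downstream.processed)  # ['A', 'B', 'C']
```

Files in different buckets never wait on each other, so the ordering constraint applies only to tokens that share a bucket.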

26
Structures for Storing Data
  • Block lists for cells are maintained at the I/O
    nodes
  • All I/O node metadata, including the block lists,
    is pinned into memory
  • The block list of each cell is organized as a
    16-ary tree.
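With a fanout of 16, each tree level consumes one hexadecimal digit of a block number, so three levels index 16^3 = 4096 blocks. The sketch assumes a radix-tree layout indexed by block number within the cell; the slide states only that the block list is a 16-ary tree.

```python
FANOUT = 16  # from the slide: each cell's block list is a 16-ary tree


def depth_needed(num_blocks: int) -> int:
    """Smallest tree depth whose leaves can index num_blocks blocks."""
    depth = 1
    while FANOUT ** depth < num_blocks:
        depth += 1
    return depth


def tree_path(block_number: int, depth: int) -> list[int]:
    """Child index to follow at each level, root first (radix-tree layout assumed)."""
    if not 0 <= block_number < FANOUT ** depth:
        raise ValueError("block number does not fit in a tree of this depth")
    return [(block_number // FANOUT ** level) % FANOUT
            for level in reversed(range(depth))]


print(depth_needed(4096))   # 3, since 16 ** 3 == 4096
print(tree_path(0x2A7, 3))  # [2, 10, 7]: the hex digits of the block number
```

A lookup touches only one node per level, and since the block lists are pinned in memory, finding a block's disk address costs no extra disk reads.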

27
(No Transcript)
28
Conclusion
  • Vesta is a new approach to parallel I/O file
    systems
  • The basis of this approach is the 2-d structure
    of Vesta files: one dimension represents the
    parallelism and the other represents sequential
    data
  • Vesta introduces the notion of partitioning the
    data
  • Vesta is fully implemented on an IBM SP1
    multicomputer, using the EUI-H message passing
    library and the MPX job control facility
  • Vesta is the base technology for the AIX Parallel
    I/O File System used with the IBM SP2

29
Question
  • What is the 2-dimensional structure of Vesta
    files?
  • What is the key feature of the Vesta Parallel File
    System?
  • What mechanism does the Vesta file system use to
    control concurrency?