Title: Ceph: A Scalable, High-Performance Distributed File System
- Sage Weil
- Scott Brandt
- Ethan Miller
- Darrell Long
- Carlos Maltzahn
- University of California, Santa Cruz
Project Goal
- Reliable, high-performance distributed file system with excellent scalability
  - Petabytes to exabytes, multi-terabyte files, billions of files
  - Tens or hundreds of thousands of clients simultaneously accessing the same files or directories
  - POSIX interface
- Storage systems have long promised scalability, but have failed to deliver
  - Continued reliance on traditional file system principles
    - Inode tables
    - Block (or object) allocation list metadata
    - Passive storage devices
Ceph: Key Design Principles
- Maximal separation of data and metadata
  - Object-based storage
  - Independent metadata management
  - CRUSH data distribution function
- Intelligent disks
  - Reliable Autonomic Distributed Object Store
- Dynamic metadata management
  - Adaptive and scalable
Outline
- Maximal separation of data and metadata
  - Object-based storage
  - Independent metadata management
  - CRUSH data distribution function
- Intelligent disks
  - Reliable Autonomic Distributed Object Store
- Dynamic metadata management
  - Adaptive and scalable
Object-based Storage Paradigm
- Figure: two storage stacks compared
  - Traditional storage: Applications → File System → Logical Block Interface → Hard Drive
  - Object-based storage: Applications → File System → Object Interface → Object-based Storage Device (OSD); the file system's storage component moves down into the device
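To make the contrast concrete, here is a minimal sketch of the two interfaces; BlockDevice and ObjectStore are illustrative names, not the actual OSD command set.

```python
# Minimal sketch of the two interfaces (illustrative names, not a real
# OSD command set). A block device exposes fixed-size numbered blocks and
# leaves all allocation to the host file system; an object store exposes
# named, variable-length objects and handles allocation internally.

class BlockDevice:
    """Passive device: addressed only by logical block number."""
    def __init__(self, num_blocks, block_size=4096):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks

    def read_block(self, lba):
        return self.blocks[lba]

    def write_block(self, lba, data):
        self.blocks[lba] = data[:self.block_size].ljust(self.block_size, b"\0")


class ObjectStore:
    """Object interface: the device tracks its own space, so the file
    system above it no longer keeps per-file block lists."""
    def __init__(self):
        self.objects = {}

    def write(self, oid, data):
        self.objects[oid] = data

    def read(self, oid):
        return self.objects.get(oid, b"")
```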
Ceph: Decoupled Data and Metadata
- Figure: Applications → File System client; the client goes to the Metadata Manager for metadata operations and to the object store directly for file I/O (see the sketch below)
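A rough sketch of that decoupled path, with hypothetical names (mds, osds, place) rather than Ceph's real client API: the MDS is consulted only for metadata, and the client computes data locations itself.

```python
import hashlib

# Hypothetical names, not Ceph's real client API. The MDS answers the
# metadata question (what is this file?); the client answers the location
# question (where are its objects?) with a CRUSH-like function, so file
# data never flows through the metadata server.

def place(object_name, num_osds):
    """Stand-in for CRUSH: deterministically map an object name to an OSD."""
    digest = hashlib.sha1(object_name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_osds

def read_file(mds, osds, path):
    inode = mds.lookup(path)                 # metadata: one MDS round trip
    data = b""
    for stripe in range(inode["num_objects"]):
        oid = f"{inode['ino']}.{stripe}"     # object names derived from the inode
        data += osds[place(oid, len(osds))].read(oid)   # direct client-OSD I/O
    return data
```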
CRUSH: Simplifying Metadata
- Conventionally
  - Directory contents (filenames)
  - File inodes
    - Ownership, permissions
    - File size
    - Block list
- CRUSH
  - A small map completely specifies the data distribution
  - The function can be calculated anywhere it is needed to locate objects (sketched below)
  - Eliminates allocation lists
    - Inodes collapse back into small, nearly fixed-size structures
    - Embed inodes in the directories that contain them
    - No more large, cumbersome inode tables
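A toy stand-in for CRUSH (not the real algorithm, which also accounts for device weights and the cluster hierarchy), showing the key property: given only the small cluster map and an object name, anyone can compute the object's replica locations, so no per-file allocation list is ever stored.

```python
import hashlib

# Toy placement function with the same shape as CRUSH: input is a small,
# cluster-wide map plus an object name; output is an ordered list of OSDs.
# Nothing here is per-file state.

CLUSTER_MAP = {"epoch": 7, "osds": ["osd0", "osd1", "osd2", "osd3", "osd4"]}

def locate(cluster_map, object_name, num_replicas=3):
    """Deterministically pick num_replicas distinct OSDs for an object."""
    osds = cluster_map["osds"]
    chosen, attempt = [], 0
    while len(chosen) < num_replicas:
        key = f"{object_name}:{attempt}".encode()
        idx = int.from_bytes(hashlib.sha1(key).digest()[:4], "big") % len(osds)
        if osds[idx] not in chosen:      # re-draw on collision
            chosen.append(osds[idx])
        attempt += 1
    return chosen

# Every client, OSD, and MDS that evaluates locate() with the same map and
# object name computes the same answer, so locations never need to be stored.
```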
Outline
- Maximal separation of data and metadata
  - Object-based storage
  - Independent metadata management
  - CRUSH data distribution function
- Intelligent disks
  - Reliable Autonomic Distributed Object Store
- Dynamic metadata management
  - Adaptive and scalable
RADOS: Reliable Autonomic Distributed Object Store
- Ceph OSDs are intelligent
  - Conventional drives only respond to commands
  - OSDs communicate and collaborate with their peers
- CRUSH allows us to delegate (see the replication sketch below)
  - Data replication
  - Failure detection
  - Failure recovery
  - Data migration
- OSDs collectively form a single logical object store
  - Reliable
  - Self-managing (autonomic)
  - Distributed
- RADOS manages peer and client interaction
- EBOFS manages local object storage
- Figure: each OSD runs RADOS layered on top of EBOFS
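A minimal sketch of delegated, primary-copy replication, under assumed names (OSD, client_write, replica_write) rather than RADOS's actual message types: the client writes only to the first OSD that CRUSH returns, and that OSD forwards the write to its peers.

```python
# Assumed names, not RADOS's real protocol: the point is that replication
# is carried out OSD-to-OSD, with no central server on the data path.

class OSD:
    def __init__(self, name):
        self.name = name
        self.store = {}                  # stands in for the local EBOFS store

    def client_write(self, oid, data, replicas):
        """Called by a client on the primary OSD for this object."""
        self.store[oid] = data
        for peer in replicas:            # primary forwards to the other replicas
            peer.replica_write(oid, data)
        return "ack"                     # client hears back from the primary only

    def replica_write(self, oid, data):
        self.store[oid] = data

# Usage: the client computes [primary, r1, r2] with CRUSH, then calls
# primary.client_write(oid, data, [r1, r2]).
```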
RADOS Scalability
- Failure detection and recovery are distributed
- Centralized monitors are used only to update the map
  - Map updates are propagated by the OSDs themselves (see the sketch below)
  - No monitor broadcast necessary
- The identical recovery procedure is used to respond to all map updates
  - OSD failure
  - Cluster expansion
- OSDs always collaborate to realize the newly specified data distribution
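A rough sketch of epidemic map propagation, using invented structures rather than the real RADOS messages: each message carries the sender's map epoch, and whichever side is behind gets the newer map, so updates spread without any broadcast.

```python
# Invented structures, not the real RADOS messages. Maps carry an epoch;
# peers compare epochs on every interaction and the newer map wins.

class OSDNode:
    def __init__(self, name, cluster_map):
        self.name = name
        self.cluster_map = dict(cluster_map)     # contains an "epoch" field

    def handle_message(self, peer, payload):
        # Piggyback map exchange on ordinary traffic: whichever side is
        # behind is brought up to date, so new epochs spread epidemically.
        if peer.cluster_map["epoch"] > self.cluster_map["epoch"]:
            self.apply_map(peer.cluster_map)
        elif peer.cluster_map["epoch"] < self.cluster_map["epoch"]:
            peer.apply_map(self.cluster_map)
        # ... then handle the payload itself ...

    def apply_map(self, new_map):
        self.cluster_map = dict(new_map)
        # The same recovery procedure runs for every map change (failure,
        # expansion, ...): re-evaluate CRUSH and migrate any data that moved.
```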
EBOFS: Low-level Object Storage
- Extent- and B-tree-based Object File System
- Non-standard interface and semantics
  - Asynchronous notification of commits to disk (sketched below)
  - Atomic compound data/metadata updates
- Extensive use of copy-on-write
  - Revert to a consistent state after failure
- User-space implementation
  - We define our own interface, so we are not limited by an ill-suited kernel file system interface
  - Avoids the Linux VFS and page cache, which were designed under different usage assumptions
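A minimal sketch of the asynchronous-commit idea, with invented names (AsyncObjectStore, on_commit) rather than EBOFS's actual interface: writes become visible immediately, and a callback fires later once the data is safely on disk.

```python
import threading

# Invented names, not EBOFS's actual interface. write() returns as soon as
# the update is buffered and readable; the on_commit callback fires later,
# standing in for the point where the data is durable on disk.

class AsyncObjectStore:
    def __init__(self):
        self.cache = {}

    def write(self, oid, data, on_commit):
        self.cache[oid] = data                                   # immediately visible to reads
        threading.Timer(0.01, on_commit, args=(oid,)).start()    # simulated disk commit

    def read(self, oid):
        return self.cache.get(oid)

# Usage: a caller such as RADOS can acknowledge a write right away and send
# a second, final acknowledgement when on_commit fires.
store = AsyncObjectStore()
store.write("obj1", b"hello", on_commit=lambda oid: print(oid, "committed"))
```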
OSD Performance: EBOFS vs. ext3, ReiserFS v3, XFS
- EBOFS writes saturate the disk for request sizes over 32 KB
- Reads perform significantly better for data written with large write sizes
Outline
- Maximal separation of data and metadata
  - Object-based storage
  - Independent metadata management
  - CRUSH data distribution function
- Intelligent disks
  - Reliable Autonomic Distributed Object Store
- Dynamic metadata management
  - Adaptive and scalable
Metadata: Traditional Partitioning
- Static subtree partitioning (coarse partition)
  - Portions of the file hierarchy are statically assigned to MDS nodes
  - (NFS, AFS, etc.)
- File hashing (fine partition)
  - Metadata distributed based on a hash of the full path (or inode number)
- Directory hashing
  - Hash on the directory portion of the path only (both hashing schemes are contrasted in the sketch below)
- Coarse distribution (static subtree partitioning)
  - Hierarchical partition preserves locality
  - High management overhead: the distribution becomes imbalanced as the file system and workload change
- Finer distribution (hash-based partitioning)
  - Probabilistically less vulnerable to hot spots and workload change
  - Destroys locality (ignores the underlying hierarchical structure)
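An illustrative comparison of the two hashing schemes (these are not Ceph's MDS placement rules, just a sketch of the trade-off): the only difference is what is fed to the hash.

```python
import hashlib

# Illustrative only. File hashing spreads siblings across MDS nodes
# (good spread, poor locality); directory hashing keeps a directory's
# entries together (better locality, but a hot directory pins one server).

def mds_for(key, num_mds):
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_mds

def file_hashing(path, num_mds):
    return mds_for(path, num_mds)              # hash of the full path

def directory_hashing(path, num_mds):
    parent = path.rsplit("/", 1)[0] or "/"
    return mds_for(parent, num_mds)            # hash of the parent directory only
```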
Dynamic Subtree Partitioning
- Figure: the hierarchy under Root is divided among MDS 0 through MDS 4, with a busy directory hashed across many MDSs
- Scalability
  - Arbitrarily partitioned metadata
- Adaptability
  - Cope with workload changes over time, and with hot spots (a toy rebalancing sketch follows)
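A toy sketch of the rebalancing idea (not Ceph's actual MDS load balancer): each MDS tracks per-subtree load, and when one server is much busier than another it hands its hottest subtree to the idler peer.

```python
# Toy load balancer, not Ceph's actual MDS balancer. Inputs are invented
# load counters; the output is a list of subtree migrations to perform.

def rebalance(mds_loads, subtree_loads, threshold=1.5):
    """
    mds_loads:     {mds_id: total load on that MDS}
    subtree_loads: {mds_id: {subtree_path: load}}
    Returns (subtree_path, from_mds, to_mds) migrations.
    """
    migrations = []
    busiest = max(mds_loads, key=mds_loads.get)
    idlest = min(mds_loads, key=mds_loads.get)
    if mds_loads[busiest] > threshold * mds_loads[idlest]:
        hot = max(subtree_loads[busiest], key=subtree_loads[busiest].get)
        migrations.append((hot, busiest, idlest))
    return migrations

# Example: rebalance({0: 900, 1: 100}, {0: {"/home": 700, "/var": 200}, 1: {"/tmp": 100}})
# suggests moving "/home" from MDS 0 to MDS 1.
```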
Metadata Scalability
- Up to 128 MDS nodes, and 250,000 metadata ops/second
- I/O rates of potentially many terabytes/second
- File systems containing many petabytes (or exabytes?) of data
Conclusions
- Decoupled metadata improves scalability
  - Eliminating allocation lists makes metadata simple
  - MDS stays out of the I/O path
- Intelligent OSDs
  - Manage replication, failure detection, and recovery
  - The CRUSH distribution function makes this possible
    - Global knowledge of the complete data distribution
    - Data locations calculated when needed
- Dynamic metadata management
  - Preserves locality, improves performance
  - Adapts to varying workloads and hot spots
  - Scales
- High performance and reliability with excellent scalability!
Ongoing and Future Work
- Completion of prototype
  - MDS failure recovery
  - Scalable security architecture [Leung, StorageSS '06]
  - Quality of service
  - Time travel (snapshots)
- RADOS improvements
  - Dynamic replication of objects based on workload
  - Reliability mechanisms: scrubbing, etc.
Thanks!
- http://ceph.sourceforge.net/
- Support from
  - Lawrence Livermore, Los Alamos, and Sandia National Laboratories