Title: The Hadoop Distributed File System: Architecture and Design by Dhruba Borthakur
1. The Hadoop Distributed File System: Architecture and Design
by Dhruba Borthakur
2. Introduction
- What is it? It's a file system!
- Supports most of the operations a normal file system would.
- Open-source implementation of GFS (the Google File System).
- Written in Java
- Designed primarily for GNU/Linux
- Some support for Windows
3. Design Goals
- HDFS is designed to store large files (think TB or PB).
- HDFS is designed for computer clusters made up of racks.
- Write once, read many model
- Useful for reading many files at once, but not single files.
- Streaming access to data
- Data arrives in a constant stream, not in waves.
- Make use of commodity computers
- Expect hardware to fail
- Moving computation is cheaper than moving data
[Diagram: a cluster composed of Rack 1 and Rack 2]
4. Master/Slave Architecture
[Diagram: one namenode managing multiple datanodes]
5. Master/Slave Architecture (cont.)
- 1 master, many slaves
- The master manages the file system namespace and regulates access to files by clients.
- Data is distributed across the slaves, which store the data as blocks.
- What is a block?
- A portion of a file.
- Files are broken down into and stored as a sequence of blocks (see the sketch below).
[Diagram: File 1 broken down into blocks A, B, and C]
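A minimal, illustrative sketch (not HDFS's actual code) of splitting a byte stream into fixed-size blocks; the class name and holding blocks in memory are simplifications of ours:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: how a file maps onto a sequence of fixed-size blocks.
// The 64 MB figure matches HDFS's typical default block size.
public class BlockSplitter {
    static final int BLOCK_SIZE = 64 * 1024 * 1024;

    public static List<byte[]> split(InputStream in) throws IOException {
        List<byte[]> blocks = new ArrayList<>();
        byte[] buf = new byte[BLOCK_SIZE];
        int n;
        // readNBytes fills the buffer unless the stream ends first,
        // so only the final block may be shorter than BLOCK_SIZE.
        while ((n = in.readNBytes(buf, 0, BLOCK_SIZE)) > 0) {
            blocks.add(Arrays.copyOf(buf, n));
        }
        return blocks;
    }
}
```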
6. Task Flow
7. Namenode
- Master
- Handles metadata operations
- Stored in a transaction log called the EditLog
- Manages datanodes
- Passes I/O requests to datanodes
- Informs the datanodes when to perform block operations
- Maintains a BlockMap which keeps track of which blocks each datanode is responsible for (see the sketch after this list)
- Stores all file metadata in memory
- File attributes, number of replicas, a file's blocks, block locations, and block checksums
- Stores a copy of the namespace in the FsImage on disk
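A rough sketch of what a BlockMap might look like (illustrative, not the namenode's actual data structure):

```java
import java.util.*;

// Illustrative: map each datanode to the set of block IDs it holds,
// plus a reverse index from block ID to the datanodes hosting it.
public class BlockMap {
    private final Map<String, Set<Long>> blocksByDatanode = new HashMap<>();
    private final Map<Long, Set<String>> datanodesByBlock = new HashMap<>();

    public void addBlock(String datanodeId, long blockId) {
        blocksByDatanode.computeIfAbsent(datanodeId, k -> new HashSet<>()).add(blockId);
        datanodesByBlock.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanodeId);
    }

    // Which datanodes host this block?
    public Set<String> locate(long blockId) {
        return datanodesByBlock.getOrDefault(blockId, Collections.emptySet());
    }
}
```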
8. Datanode
- Slave
- Handles data I/O
- Handles block creation, deletion, and replication
- Local storage is optimized so blocks are stored over multiple directories rather than in a single directory
9. Data Replication
- Makes copies of the data!
- The replication factor determines the number of copies.
- Specified by the namenode or at file creation time (see the example below)
- Replication is pipelined!
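For example, using Hadoop's Java client API (the path and the factor of 3 are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Change the replication factor of an existing file to 3.
        // (FileSystem.create also has overloads that take a replication factor.)
        Path file = new Path("/user/example/data.txt"); // illustrative path
        fs.setReplication(file, (short) 3);
        fs.close();
    }
}
```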
10. Pipelining Data Replication
- Blocks are split into small portions (4 KB) for transfer (a code sketch follows the diagram below).
Assume a block is split into 3 portions A, B, and C.
[Diagram: portion A flows from datanode 1 to 2 to 3; while datanode 1 forwards A, it is already receiving B, with C following behind in the pipeline]
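A conceptual sketch of one pipeline stage (illustrative; real datanodes stream portions over TCP sockets):

```java
import java.io.*;

// Illustrative: a datanode writes each received portion to local disk,
// then immediately forwards it downstream while the rest of the block
// is still arriving.
public class PipelineStage {
    static final int PORTION_SIZE = 4 * 1024; // 4 KB portions

    public static void relay(InputStream fromUpstream,
                             OutputStream toLocalDisk,
                             OutputStream toDownstream) throws IOException {
        byte[] portion = new byte[PORTION_SIZE];
        int n;
        while ((n = fromUpstream.read(portion)) > 0) {
            toLocalDisk.write(portion, 0, n);      // persist locally
            if (toDownstream != null) {            // last stage has no downstream
                toDownstream.write(portion, 0, n); // forward down the pipeline
            }
        }
    }
}
```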
11. Replication Policy
- Communication bandwidth between computers within a rack is greater than between computers in different racks.
- We could replicate data across racks, but this would consume the most bandwidth.
- We could replicate data across all computers in a rack, but if the rack dies we're in the same position as before.
12. Replication Policy (cont.)
- Assume only three replicas are created.
- Split the replicas between 2 racks.
- Rack failure is rare, so we're still able to maintain good data reliability while minimizing bandwidth cost.
- Version 0.18.0
- 2 replicas in the current rack (on 2 different nodes)
- 1 replica in a remote rack
- Version 0.20.3.x
- 1 replica in the current rack
- 2 replicas in a remote rack (on 2 different nodes)
- What happens if the replication factor is 2 or greater than 3?
- No answer in this paper.
- Some other papers state that the minimum is 3.
- The author wrote a separate paper stating that every replica after the 3rd is placed randomly (see the sketch below).
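A toy sketch of the 0.20-style placement described above (node and rack representations are ours; the real policy also weighs node load and disk capacity):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Illustrative rack-aware placement. Assumes the cluster has enough
// distinct nodes and racks to satisfy the requested replica count.
public class PlacementSketch {
    public static List<String> chooseTargets(String localRack,
                                             Map<String, List<String>> nodesByRack,
                                             int replicas, Random rnd) {
        List<String> targets = new ArrayList<>();

        // 1st replica: a node on the writer's own rack.
        List<String> local = nodesByRack.get(localRack);
        targets.add(local.get(rnd.nextInt(local.size())));

        // 2nd and 3rd replicas: two different nodes on one remote rack.
        List<String> racks = new ArrayList<>(nodesByRack.keySet());
        racks.remove(localRack);
        if (replicas >= 2 && !racks.isEmpty()) {
            List<String> remote = nodesByRack.get(racks.get(rnd.nextInt(racks.size())));
            for (int i = 1; i < Math.min(replicas, 3); i++) {
                String node;
                do { node = remote.get(rnd.nextInt(remote.size())); } while (targets.contains(node));
                targets.add(node);
            }
        }

        // Replicas beyond the 3rd: random nodes anywhere in the cluster.
        List<String> all = new ArrayList<>();
        nodesByRack.values().forEach(all::addAll);
        while (targets.size() < replicas) {
            String node = all.get(rnd.nextInt(all.size()));
            if (!targets.contains(node)) targets.add(node);
        }
        return targets;
    }
}
```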
13. Reading Data
- Read the data that's closest to you!
- If the block/replica of data you want is on the datanode/rack/data center you're on, read it from there!
- Read from datanodes directly.
- Can be done in parallel.
- The namenode is used to generate the list of datanodes which host a requested file, as well as to provide checksum values to validate blocks retrieved from the datanodes (see the example below).
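A minimal read example using Hadoop's Java client API (the path is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // The client asks the namenode for block locations, then
        // streams the bytes directly from the datanodes.
        try (FSDataInputStream in = fs.open(new Path("/user/example/data.txt"))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                System.out.write(buf, 0, n);
            }
        }
        fs.close();
    }
}
```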
14. Writing Data
- Data is written once
- Split into blocks, typically of size 64 MB
- The larger the block size, the less metadata the namenode stores
- Data is written to a temporary local block on the client side and then flushed to a datanode once the block is full (see the example below).
- If a file is closed while the temporary block isn't full, the remaining data is flushed to the datanode.
- If the namenode dies during file creation, the file is lost!
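A minimal write example using Hadoop's Java client API (path and contents are illustrative):

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Bytes are buffered client-side and shipped to datanodes
        // block by block; the file becomes visible once closed.
        try (FSDataOutputStream out = fs.create(new Path("/user/example/output.txt"))) {
            out.write("hello, HDFS".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```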
15. Hardware Failure
Imagine a file is broken into blocks A, B, and C, spread over three datanodes.
[Diagram: datanodes 1, 2, and 3 holding blocks A, B, and C]
If the third datanode died, we would have no access to block C, and we can't retrieve the file.
[Diagram: datanode 3 down, block C unavailable]
16. Designing for Hardware Failure
- Data replication
- Safemode
- Heartbeat
- Block report
- Checkpoints
- Re-replication
17. Checkpoints
[Diagram: the EditLog and the FsImage combine to form the file system namespace]
18. Checkpoints (cont.)
- The FsImage is a copy of the namespace taken before any changes have occurred.
- The EditLog is a log of all changes to the namenode since its startup.
- Upon startup, the namenode applies all the changes in the EditLog to the FsImage to create an up-to-date version of itself.
- The resulting FsImage is the checkpoint (see the sketch below).
- If either the FsImage or the EditLog is corrupt, HDFS will not start!
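A toy sketch of checkpointing, assuming a namespace of paths and only create/delete edits (both simplifications of ours):

```java
import java.util.*;

// Illustrative: a checkpoint is the old snapshot (FsImage) with the
// logged edits (EditLog) replayed on top of it.
public class CheckpointSketch {
    record Edit(String op, String path) {} // e.g. ("create", "/a"), ("delete", "/a")

    public static Set<String> checkpoint(Set<String> fsImage, List<Edit> editLog) {
        Set<String> namespace = new HashSet<>(fsImage);
        for (Edit e : editLog) {
            switch (e.op()) {
                case "create" -> namespace.add(e.path());
                case "delete" -> namespace.remove(e.path());
            }
        }
        return namespace; // the new FsImage; the EditLog can now be truncated
    }
}
```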
19. Heartbeat and Blockreport
- A heartbeat is a message sent from a datanode to the namenode.
- Sent periodically, letting the namenode know the datanode is alive (see the sketch after this list).
- If a datanode is dead, assume you can't use it.
- Blockreport
- A list of blocks the datanode is handling.
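A minimal sketch of heartbeat-based liveness tracking (class name and timeout handling are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative: the namenode marks a datanode dead if no heartbeat
// has arrived within some timeout window.
public class HeartbeatMonitor {
    private final Map<String, Long> lastHeartbeatMillis = new HashMap<>();
    private final long timeoutMillis;

    public HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    public void onHeartbeat(String datanodeId) {
        lastHeartbeatMillis.put(datanodeId, System.currentTimeMillis());
    }

    public boolean isDead(String datanodeId) {
        Long last = lastHeartbeatMillis.get(datanodeId);
        return last == null || System.currentTimeMillis() - last > timeoutMillis;
    }
}
```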
20. Safemode
- Upon startup, the namenode enters safemode to check the health of the cluster. This is only done once.
- Heartbeats are used to ensure all datanodes are available.
- Blockreports are used to check data integrity.
- If the number of replicas found differs from the number of replicas expected, there is a problem (see the sketch below).
[Diagram: replicas of block A expected vs. replicas of block A actually found]
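A minimal sketch of the comparison performed during safemode (illustrative names; the real check runs over blockreport data):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative safemode check: compare the replica count reported in
// blockreports against the expected replication factor per block.
public class SafemodeCheck {
    public static List<Long> underReplicated(Map<Long, Integer> expected,
                                             Map<Long, Integer> found) {
        List<Long> problems = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : expected.entrySet()) {
            int have = found.getOrDefault(e.getKey(), 0);
            if (have < e.getValue()) {
                problems.add(e.getKey()); // block needs re-replication
            }
        }
        return problems;
    }
}
```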
21. Re-replication/De-replication
- During startup and when receiving heartbeats, the namenode checks whether the number of replicas for each block is satisfied.
- If the number of replicas found is lower than expected, it performs data replication for each block that does not satisfy the criterion.
- If the number of replicas found is higher than expected, the namenode randomly selects datanodes to remove the block from, for each block that exceeds its replication factor (see the sketch below).
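A sketch of both actions in one place (illustrative; scheduleReplication and scheduleRemoval are hypothetical stand-ins for the namenode's internal scheduling):

```java
import java.util.List;
import java.util.Random;

// Illustrative: for each block, compare found vs. expected replicas and
// either schedule re-replication or randomly drop surplus replicas.
public class ReplicationManager {
    private final Random rnd = new Random();

    // holders is a mutable list of datanode IDs currently storing the block.
    public void reconcile(long blockId, int expected, List<String> holders) {
        if (holders.size() < expected) {
            // Under-replicated: copy the block to more datanodes.
            scheduleReplication(blockId, expected - holders.size());
        } else {
            // Over-replicated: remove the block from randomly chosen datanodes.
            while (holders.size() > expected) {
                String victim = holders.remove(rnd.nextInt(holders.size()));
                scheduleRemoval(blockId, victim);
            }
        }
    }

    private void scheduleReplication(long blockId, int copies) { /* ... */ }
    private void scheduleRemoval(long blockId, String datanodeId) { /* ... */ }
}
```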
22. Other
- The file system can be viewed through the FS Shell or the web interface.
- Communicates over TCP/IP.
- File deletes are a move to a trash folder, which auto-deletes files after a specified time (default is 6 hours).
- The Rebalancer moves data off datanodes that are close to filling up their local storage.
23. Relation with Search Engines
- Originally built for Nutch.
- Intended to be the backbone of a search engine.
- HDFS is the file system used by Hadoop.
- Hadoop also contains MapReduce, which has many applications, like indexing the web!
- Analyzing large amounts of data.
- Used by many, many companies
- Google, Yahoo!, Facebook, etc.
- It can store the web!
- Just kidding!
24. Pros/Cons
- The goal of this paper is to describe the system, not analyze it. It gives a great beginning overview.
- It probably could've been condensed and organized better.
- Some information is missing
- SecondaryNameNode
- CheckpointNode
- Etc.
25. Pros/Cons of HDFS: In and Beyond the Paper
- Pros
- It accomplishes everything it set out to do.
- Horizontally scalable: just add a new datanode!
- Cheap, cheap, cheap to build.
- Good for reading and storing large amounts of data.
- Cons
- Security
- No redundancy of namenode
- Single point of failure
- The namenode is not scalable
- Doesn't handle small files well
- Still in development, many features missing
26. Questions?