15-440, Hadoop Distributed File System Allison Naaktgeboren - PowerPoint PPT Presentation

Slides: 16

Provided by: andre4

Learn more at: https://www.andrew.cmu.edu

Transcript and Presenter's Notes

1
15-440, Hadoop Distributed File System
Allison Naaktgeboren

Ur doin' it rong kitteh

Wut u mean? I iz loadin a HA-doop fileh

2
Announcements

Go Vote!
Interpretive Dances happen only after Lecture
Office Hour Change
Mon 6:30-9:30
Tues 6-7:30
Exams are graded

3
Hadoop Core at 30,000 ft
4
Back to the Map Reduce Model

Recall that
map (in_key, in_value) -> (inter_key, inter_value) list
combine (inter_key, inter_value) -> (inter_key, inter_value)
reduce (inter_key, inter_value list) -> (out_key, out_value)
What resource are we most constrained by?
Oceans of Data, Skinny pipes
How many types of data will the file system care
about?
How long will we need each kind?
What is the common case for each?
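The map / combine / reduce signatures recalled above can be sketched as a tiny in-memory word count. This is only an illustration: the function names `map_fn`, `combine_fn`, `reduce_fn`, `group_by_key`, and `run_job` are made up for this sketch and are not Hadoop APIs, and a real job would run the map and reduce phases on different machines.

```python
from collections import defaultdict


def map_fn(in_key, in_value):
    # in_key: a (hypothetical) file name; in_value: that file's text.
    # Emits a list of (inter_key, inter_value) pairs.
    return [(word, 1) for word in in_value.split()]


def combine_fn(inter_key, inter_values):
    # Runs on the mapper's machine, so fewer pairs cross the "skinny pipes".
    return (inter_key, sum(inter_values))


def reduce_fn(inter_key, inter_values):
    # Produces the final (out_key, out_value) pair for one key.
    return (inter_key, sum(inter_values))


def group_by_key(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def run_job(documents):
    shuffled = []
    for name, text in documents.items():
        # Map, then combine locally, once per input document.
        local = group_by_key(map_fn(name, text))
        shuffled.extend(combine_fn(k, vs) for k, vs in local.items())
    # "Shuffle": group the combined pairs by key, then reduce each group.
    grouped = group_by_key(shuffled)
    return dict(reduce_fn(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    print(run_job({"a.txt": "cat dog cat", "b.txt": "dog"}))
```

Note how the combiner shrinks the three pairs from `a.txt` to two before anything is shuffled, which is the point of combining when bandwidth is the limiting resource.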

5
(No Transcript)
6
What would an MR Filesystem need?

General use case: large files
Mostly append to end, long sequential reads, few
deletes
Appends might be concurrent
Scalability
Adding (or losing) machines should be relatively
painless
Nodes work on nearby data
Minimize moving data between machines
Bandwidth is our limiting resource
Remember how much data
Failure (handling) is common
Yea, yea we know, we took 213, we know hardware
sucks
No, really, failure (handling) is common
(constant)
Disks, processors, whole nodes, racks, and
datacenters

7
Addressing Those Concerns

Sequential Reads, appends need to be fast
Deletes can be painful
Hot plug machines
Add or lose machines while system is running jobs
System should auto detect the change
HDFS should distribute data somewhat evenly
So that all workers have a reasonable amount of
data to chew on
And coordinating with the Jobtracker (job
master)
Data Replication
Should be spread out. Why?
What type of problems could arise?

8
Moving into the Details

Nodes in HDFS
NameNode (master) (like GFS Master)
DataNodes (slaves) (like GFS chunkservers)
NB: Hadoop and HDFS are closely paired
careful use of jargon defines the true expert
worker node A and data node 1 are frequently
the same machine
Two types of Masters
Jobtracker (Hadoop Job Master)
NameNode (file system Master)
What I mean by 'master' for the rest of the
lecture

9
Your Data goes in ....

Files are divided into Chunks
64 MB
The mapping between filename and chunks goes to
the Master
Each chunk is replicated and sent off to
DataNodes
By default, 3 replicas
The master determines which DataNodes get them
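The chunking and replication described on this slide can be sketched with a little arithmetic. The 64 MB chunk size and the replication factor of 3 come from the slide; the round-robin placement in `place_replicas` is purely an assumption for illustration, since the slide only says that the master decides (real HDFS uses a rack-aware policy, which this sketch does not model).

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as stated on the slide
REPLICATION = 3                # default replica count, as stated on the slide


def split_into_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Return an (offset, length) pair for each chunk of a file."""
    return [
        (offset, min(chunk_size, file_size - offset))
        for offset in range(0, file_size, chunk_size)
    ]


def place_replicas(chunk_index, datanodes, replication=REPLICATION):
    """Hypothetical round-robin placement; NOT the real HDFS policy."""
    if len(datanodes) < replication:
        raise ValueError("not enough DataNodes for the requested replication")
    start = chunk_index % len(datanodes)
    return [datanodes[(start + i) % len(datanodes)] for i in range(replication)]


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    for index, (offset, length) in enumerate(split_into_chunks(200 * 1024 * 1024)):
        print(index, offset, length, place_replicas(index, nodes))
```

A 200 MB file yields three full chunks plus one 8 MB tail chunk; only the last chunk of a file may be short.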

10
What the Clients Do

Where the data starts
On file creation, creates a separate file
w/ checksum
When data is fetched back from a DataNode, the checksum
is computed again
Cache file data
Avoid bothering the Master too often
When a Client has 1 chunk's worth of data
Contacts the Master
Master sends names of DataNodes to send it to
Client ONLY sends it to the 1st
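The client behaviour above (buffer locally, contact the Master once per full chunk, send only to the first DataNode, checksum the data) might look roughly like this. `ChunkingClient`, `ask_master`, and `send` are hypothetical names invented for the sketch, not the HDFS client API, and CRC32 via `zlib.crc32` is just one convenient checksum choice.

```python
import zlib


class ChunkingClient:
    """Hypothetical client-side write buffer; not the real HDFS client."""

    def __init__(self, chunk_size, ask_master, send):
        self.chunk_size = chunk_size
        self.ask_master = ask_master  # () -> ordered list of DataNode names
        self.send = send              # (first_node, rest_of_chain, chunk, checksum)
        self.buffer = bytearray()

    def write(self, data):
        # Cache data locally so the Master is not bothered on every write.
        self.buffer.extend(data)
        while len(self.buffer) >= self.chunk_size:
            chunk = bytes(self.buffer[:self.chunk_size])
            del self.buffer[:self.chunk_size]
            targets = self.ask_master()  # one Master round trip per full chunk
            # Send ONLY to the first DataNode; it is handed the rest of the chain.
            self.send(targets[0], targets[1:], chunk, zlib.crc32(chunk))


def verify(chunk, expected_checksum):
    """Recompute the checksum when data comes back from a DataNode."""
    return zlib.crc32(chunk) == expected_checksum
```

In this sketch a partially filled buffer is simply kept until more data arrives; a real client would also flush the tail when the file is closed.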

11
What the DataNodes Do

Heartbeat to the Master
Opens, closes, or replicates a chunk if requested
by the Master
During replication, sends data to next dataNode
in chain
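The "sends data to next dataNode in chain" step can be sketched as a simple recursive hand-off. The `DataNode` class and its `receive` method are illustrative names only, and the direct method call stands in for what would be a network transfer in a real cluster.

```python
class DataNode:
    """Toy DataNode that stores chunks in memory (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}

    def receive(self, chunk_id, data, downstream):
        # Store the local replica first.
        self.chunks[chunk_id] = data
        # Then forward to the next node, handing it the remainder of the chain.
        if downstream:
            downstream[0].receive(chunk_id, data, downstream[1:])


if __name__ == "__main__":
    a, b, c = DataNode("dn1"), DataNode("dn2"), DataNode("dn3")
    a.receive("chunk-0", b"payload", [b, c])  # the client talks only to dn1
    print([node.name for node in (a, b, c) if "chunk-0" in node.chunks])
```

The client pays for one upload; each DataNode pays for at most one forward, so no single link carries all three copies.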

12
What the Namespace Node Does

System metadata!
Holds Name->ID mapping
Chunk replica locations
Transaction Logs
EditLog
FSImage
It is responsible for coherency
Uses the logs atomically
Addresses the concurrent writes issue
It is checkpointed
Similar to AFS volume snapshots
Will pull last consistent log upon restart
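One way to picture how the EditLog and FSImage work together: the image is the last checkpoint of the namespace, the log records every change since then, and a restart rebuilds the namespace by replaying the log on top of the image. The class below is a minimal in-memory sketch under that assumption; `NamespaceNode`, `mutate`, `checkpoint`, and `restart` are names chosen for the example, and nothing here is written to disk.

```python
class NamespaceNode:
    """Toy metadata store: name -> list of chunk IDs (illustration only)."""

    def __init__(self):
        self.fsimage = {}  # last checkpointed namespace
        self.editlog = []  # operations applied since that checkpoint
        self.live = {}     # current in-memory namespace

    @staticmethod
    def _apply(state, op, name, chunk_ids):
        if op == "create":
            state[name] = list(chunk_ids)
        elif op == "delete":
            state.pop(name, None)
        else:
            raise ValueError("unknown op: " + op)

    def mutate(self, op, name, chunk_ids=()):
        if op not in ("create", "delete"):
            raise ValueError("unknown op: " + op)
        # Log first, then apply, so a crash can never lose an applied change.
        self.editlog.append((op, name, tuple(chunk_ids)))
        self._apply(self.live, op, name, chunk_ids)

    def checkpoint(self):
        # Fold the log into a fresh image and start an empty log.
        self.fsimage = {n: list(c) for n, c in self.live.items()}
        self.editlog = []

    def restart(self):
        # Rebuild the namespace: last image plus replay of the log.
        state = {n: list(c) for n, c in self.fsimage.items()}
        for op, name, chunk_ids in self.editlog:
            self._apply(state, op, name, chunk_ids)
        self.live = state
```

Because every change goes through the single log in order, two concurrent requests end up serialized in one well-defined sequence.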

13
What the Namespace Node Does

Listens for Heartbeats
Listens for Client Requests
If no heartbeat
marks a node as dead
Its data is deregistered
It selects dataNodes
Which nodes get which chunks
Signals creating, opening, closing
Deletes
Orders move to /trash
Starts delete timer
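The heartbeat bookkeeping and the delete timer on this slide can be sketched as follows. The timeout and retention values, and the names `HeartbeatMonitor`, `sweep`, `Trash`, and `purge`, are assumptions made for the example; the slide does not say how long either timer runs. Time is passed in explicitly as `now` so the logic is easy to follow.

```python
class HeartbeatMonitor:
    """Toy liveness tracker for DataNodes (illustration only)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}        # node name -> time of last heartbeat
        self.chunk_locations = {}  # chunk ID -> set of node names

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def register(self, chunk_id, node):
        self.chunk_locations.setdefault(chunk_id, set()).add(node)

    def sweep(self, now):
        # Any node silent for longer than the timeout is marked dead...
        dead = [n for n, t in self.last_seen.items() if now - t > self.timeout]
        for node in dead:
            del self.last_seen[node]
            # ...and its replicas are deregistered.
            for nodes in self.chunk_locations.values():
                nodes.discard(node)
        return dead

    def under_replicated(self, replication=3):
        return sorted(c for c, n in self.chunk_locations.items() if len(n) < replication)


class Trash:
    """Toy delayed delete: files move to /trash and expire later."""

    def __init__(self, retention):
        self.retention = retention
        self.deleted_at = {}  # file name -> time it was moved to /trash

    def delete(self, name, now):
        self.deleted_at[name] = now  # starts the delete timer

    def purge(self, now):
        expired = sorted(n for n, t in self.deleted_at.items() if now - t > self.retention)
        for name in expired:
            del self.deleted_at[name]
        return expired
```

After a sweep, the chunks reported by `under_replicated` are the ones the master would ask surviving DataNodes to copy elsewhere.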

14
All together Now!
15
Additional Resources

Hadoop wiki
YouTube -> Hadoop -> Google developer videos (1-3
will be helpful)
Google University
Includes UW course, the other UW course, a couple
others
Use at your own risk
The Google File System paper is rather readable
as research papers go
