Big Data Hadoop Tutorial PPT for Beginners

About This Presentation

Title: Big Data Hadoop Tutorial PPT for Beginners

Description: DataFlair takes you through various concepts of Hadoop. This Hadoop tutorial PPT covers:
1. Introduction to Hadoop
2. What is Hadoop
3. Hadoop history
4. Why Hadoop
5. Hadoop nodes
6. Hadoop architecture
7. Hadoop data flow
8. Hadoop components: HDFS, MapReduce, YARN
9. Hadoop daemons
10. Hadoop characteristics and features

Related blog: Hadoop Introduction – A Comprehensive Guide. To learn Hadoop and carve your career in Big Data, contact us: info@data-flair.training, +91-7718877477

Transcript and Presenter's Notes



1
Hadoop Tutorial
2
Agenda
  • Introduction to Hadoop
  • Hadoop nodes and daemons
  • Hadoop Architecture
  • Characteristics
  • Hadoop Features

3
What is Hadoop?
  • The technology that powers Yahoo, Facebook,
    Twitter, Walmart, and others

Hadoop
4
What is Hadoop?
  • An Open Source framework that allows distributed
    processing of large data-sets across the cluster
    of commodity hardware

5
What is Hadoop?
  • An Open Source framework that allows distributed
    processing of large data-sets across the cluster
    of commodity hardware
  • Open Source
  • Source code is freely available
  • It may be redistributed and modified

6
What is Hadoop?
  • An open source framework that allows Distributed
    Processing of large data-sets across the cluster
    of commodity hardware
  • Distributed Processing
  • Data is processed in a distributed manner on
    multiple nodes / servers
  • Multiple machines process the data independently

7
What is Hadoop?
  • An open source framework that allows distributed
    processing of large data-sets across the Cluster
    of commodity hardware
  • Cluster
  • Multiple machines connected together
  • Nodes are connected via LAN

8
What is Hadoop?
  • An open source framework that allows distributed
    processing of large data-sets across the cluster
    of Commodity Hardware
  • Commodity Hardware
  • Economic / affordable machines
  • Typically low performance hardware

9
What is Hadoop?
  • Open source framework written in Java
  • Inspired by Google's Map-Reduce programming model
    as well as its file system (GFS)
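The map-reduce model the slide refers to can be sketched in plain Python. This is a toy simulation of the map, shuffle, and reduce phases, not real Hadoop API code; actual Hadoop jobs are normally written in Java against the org.apache.hadoop.mapreduce API.

```python
from collections import defaultdict
from itertools import chain

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word, as a Hadoop mapper would.
    return chain.from_iterable(((w, 1) for w in line.split()) for line in lines)

def shuffle(pairs):
    # Shuffle: group all values by key, mimicking the framework's
    # sort-and-shuffle step between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop stores big data", "hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'hadoop': 2, 'stores': 1, 'big': 2, 'data': 2, 'processes': 1}
```

In real Hadoop the three phases run on different machines and the shuffle moves data over the network; here everything runs in one process purely to show the data flow.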

10
Hadoop History
2002 – Doug Cutting and Mike Cafarella start the Apache Nutch project
2003 – Google publishes the Google File System (GFS) paper
2004 – Google publishes the MapReduce paper
2005 – Nutch gains a MapReduce implementation and a distributed file system (NDFS)
2006 – Hadoop becomes an independent Apache subproject; Doug Cutting joins Yahoo
2007 – Yahoo runs Hadoop on a 1000-node cluster
2008 – Hadoop becomes a top-level Apache project and wins the terabyte sort benchmark
2009 – Yahoo uses Hadoop to sort a petabyte of data
11
Hadoop Components
  • Hadoop consists of three key parts
  • HDFS – the storage layer
  • MapReduce – the processing layer
  • YARN – the resource-management layer
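As a rough illustration of what the storage layer does: HDFS splits every file into fixed-size blocks (128 MB by default in Hadoop 2.x) and spreads them over the cluster. The helper below is illustrative Python, not part of any Hadoop API; it only shows the block arithmetic.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size: 128 MB

def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of the given size is cut into."""
    full_blocks = file_size_bytes // block_size
    sizes = [block_size] * full_blocks
    remainder = file_size_bytes % block_size
    if remainder:
        sizes.append(remainder)  # the last block may be smaller than block_size
    return sizes

# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))  # 3
```

Each of these blocks is then stored (and replicated) on different DataNodes, which is what lets MapReduce and YARN process them in parallel.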

12
Hadoop Nodes
  • Nodes are of two types: Master Node and Slave Node
13
Hadoop Daemons
  • Master Node daemons: NameNode (HDFS), Resource Manager (YARN)
  • Slave Node daemons: DataNode (HDFS), Node Manager (YARN)
14
Basic Hadoop Architecture
[Diagram: the master splits a work item into many sub-works and distributes
them across 100 slave nodes, which process the sub-works in parallel]
15
Hadoop Characteristics
Distributed Processing
Open Source
Fault Tolerance
Easy to use
Reliability
Economic
High Availability
Scalability
16
Open Source
  • Source code is freely available
  • Can be redistributed
  • Can be modified

[Word cloud: Free, Transparent, Affordable, Interoperable, Community,
No vendor lock-in]
17
Distributed Processing
  • Data is processed in a distributed manner on the cluster
  • Multiple nodes in the cluster process the data
    independently

[Diagram: centralized processing vs. distributed processing]
18
Fault Tolerance
  • Node failures are recovered automatically
  • The framework takes care of failures of hardware
    as well as of tasks
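The main mechanism behind this fault tolerance is block replication: HDFS keeps three copies of every block on different nodes by default, so losing a node loses no data. A minimal Python sketch of the idea follows; the function and node names are hypothetical, not Hadoop code.

```python
def survives(replica_nodes, failed_nodes):
    """A block stays readable if at least one replica is on a healthy node."""
    return any(node not in failed_nodes for node in replica_nodes)

# One block, replicated on three nodes (the HDFS default replication factor).
replicas = ["node1", "node3", "node4"]

print(survives(replicas, failed_nodes={"node1"}))                      # True
print(survives(replicas, failed_nodes={"node1", "node3", "node4"}))    # False
```

In the real system the NameNode notices under-replicated blocks after a failure and re-replicates them onto healthy DataNodes, restoring the replication factor automatically.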

19
Reliability
  • Data is reliably stored on the cluster of
    machines despite machine failures
  • Failure of nodes doesn't cause data loss

20
High Availability
  • Data is highly available and accessible despite
    hardware failure
  • There is no downtime for end-user applications
    due to unavailable data

21
Scalability
  • Vertical scalability: new hardware can be added
    to existing nodes
  • Horizontal scalability: new nodes can be added
    on the fly

22
Economic
  • No need to purchase costly license
  • No need to purchase costly hardware

[Diagram: Economic = Open Source + Commodity Hardware]


23
Easy to Use
  • Distributed computing challenges are handled by
    the framework
  • The client just needs to concentrate on business logic

24
Data Locality
  • Move computation to data instead of data to
    computation
  • Data is processed on the nodes where it is stored
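Data locality can be sketched as a tiny scheduler in Python: given which nodes hold a block's replicas, prefer running the task on one of them. All names here are illustrative; the real placement logic lives in YARN's schedulers.

```python
def schedule(block, block_locations, free_nodes):
    """Prefer a free node that already stores the block; else pick any free node."""
    local = [n for n in block_locations.get(block, []) if n in free_nodes]
    if local:
        return local[0], "data-local"        # computation moves to the data
    return next(iter(free_nodes)), "remote"  # data must travel over the network

block_locations = {"block-7": ["node1", "node3"]}
node, kind = schedule("block-7", block_locations, free_nodes={"node3", "node5"})
print(node, kind)  # node3 data-local
```

Running the task on node3 means the block is read from the local disk instead of being shipped across the LAN, which is exactly the "move computation to data" principle on this slide.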

[Diagram: the algorithm is shipped to the nodes where the data resides,
instead of moving the data to a central node]
25
Summary
  • Every day we generate 2.3 trillion GBs of data
  • Hadoop handles huge volumes of data efficiently
  • Hadoop uses the power of distributed computing
  • HDFS and YARN are two main components of Hadoop
  • It is highly fault tolerant, reliable, and available

26
Thank You
  • DataFlair