Hadoop Course Content | Hadoop Online Training in Hyderabad - PowerPoint PPT Presentation

Description:

RVH Technologies is an online training brand. Honesty, dedication, and hard work are the secret of our institute's success. Believe in us and join us; we will make you experts. We concentrate mainly on online training, and all courses are conducted in the latest versions. We provide online training based on the user's requirements: either the full course or selected modules of the course. Please request a FREE DEMO, check the standards, and then choose the best training center. We are 100% sure you will come back to us after the demo class. For further queries, please contact us on 91 8790137293, Email: info@rvhtech.com, Web: www.rvhtech.com. Exclusive offer: one referral earns a 20% discount, two referrals earn 30%, and more than two earn 40%.


Transcript and Presenter's Notes

3
Hadoop Technical Introduction from RVH
Technologies.
4
Hadoop is a free, Java-based programming
framework that supports the processing of large
data sets in a distributed computing environment.
It is an Apache project, sponsored by the
Apache Software Foundation.
5
Terminology
Google calls it    Hadoop equivalent
MapReduce          Hadoop
GFS                HDFS
Bigtable           HBase
Chubby             ZooKeeper
6
Some MapReduce Terminology
  • Job: a full program, i.e. an execution of a Mapper
    and Reducer across a data set
  • Task: an execution of a Mapper or a Reducer on a
    slice of data, a.k.a. Task-In-Progress (TIP)
  • Task Attempt: a particular instance of an
    attempt to execute a task on a machine

7
Task Attempts
  • A particular task will be attempted at least
    once, possibly more times if it crashes
  • If the same input causes crashes over and over,
    that input will eventually be abandoned
  • Multiple attempts at one task may occur in
    parallel with speculative execution turned on
  • The Task ID from TaskInProgress is not a unique
    identifier; don't use it that way
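
The retry behaviour described above can be sketched in plain Java, with no Hadoop dependency. Everything in this block is hypothetical and for illustration only: the class name TaskAttemptSketch, the runWithRetries helper, the attempt limit of 4 passed in main, and the attempt-ID strings, which only loosely imitate the idea of appending an attempt number to a task ID. Speculative (parallel) attempts are not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of task attempts; not Hadoop's actual scheduler code.
class TaskAttemptSketch {

    /** Result of one task: the output (null if abandoned) and the attempt IDs used. */
    static final class Outcome {
        final String output;
        final List<String> attemptIds;

        Outcome(String output, List<String> attemptIds) {
            this.output = output;
            this.attemptIds = attemptIds;
        }
    }

    /**
     * Runs the task body on the input, retrying after a crash (an exception)
     * until it succeeds or maxAttempts is reached; then the input is abandoned.
     */
    static Outcome runWithRetries(String taskId, String input,
                                  Function<String, String> body, int maxAttempts) {
        List<String> attemptIds = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Each attempt gets its own ID: the task ID alone does not identify an attempt.
            attemptIds.add(taskId + "_" + attempt);
            try {
                return new Outcome(body.apply(input), attemptIds);
            } catch (RuntimeException crash) {
                // Fall through and try again with a fresh attempt.
            }
        }
        return new Outcome(null, attemptIds); // input abandoned
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // A task body that crashes on its first call and succeeds on the second.
        Outcome flaky = runWithRetries("task_m_000001", "some input", in -> {
            if (calls[0]++ == 0) {
                throw new RuntimeException("simulated crash");
            }
            return in.toUpperCase();
        }, 4);
        System.out.println(flaky.output + " after " + flaky.attemptIds);
    }
}
```

Two attempt IDs are recorded for the one task in main, which is the reason a task ID on its own cannot serve as a unique identifier for work done on a machine.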

9
Nodes, Trackers, Tasks
  • The master node runs a JobTracker instance, which
    accepts Job requests from clients
  • TaskTracker instances run on slave nodes
  • A TaskTracker forks a separate Java process for
    each task instance

10
Job Distribution
  • MapReduce programs are contained in a Java jar
    file plus an XML file containing serialized program
    configuration options
  • Running a MapReduce job places these files into
    the HDFS and notifies TaskTrackers where to
    retrieve the relevant program code

11
Creating the Mapper
  • You provide the instance of Mapper
  • It should extend MapReduceBase
  • One instance of your Mapper is initialized by the
    MapTaskRunner for a TaskInProgress
  • It exists in a separate process from all other
    instances of Mapper: no data sharing!

12
Mapper
  • void map(K1 key,
             V1 value,
             OutputCollector<K2, V2> output,
             Reporter reporter)
  • K types implement WritableComparable
  • V types implement Writable
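
The shape of this signature can be tried out without Hadoop on the classpath. The block below is a minimal sketch under stated assumptions: the nested OutputCollector, Reporter, and Mapper interfaces are simplified stand-ins that only mirror the shape of the org.apache.hadoop.mapred types, Long/String/Integer are used in place of Writable types, and LineLengthMapper and run are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-ins that mirror the shape of the old org.apache.hadoop.mapred API.
// They are simplified for illustration and are NOT the real Hadoop types.
class MapperShapeSketch {

    interface OutputCollector<K, V> {
        void collect(K key, V value);
    }

    interface Reporter {
        void setStatus(String msg);
    }

    interface Mapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, OutputCollector<K2, V2> output, Reporter reporter);
    }

    /** Hypothetical mapper: input is (offset, line), output is (line, line length). */
    static final class LineLengthMapper implements Mapper<Long, String, String, Integer> {
        @Override
        public void map(Long key, String value,
                        OutputCollector<String, Integer> output, Reporter reporter) {
            output.collect(value, value.length());
            reporter.setStatus("processed offset " + key);
        }
    }

    /** Feeds the lines to the mapper and returns the emitted pairs as "key=value" strings. */
    static List<String> run(List<String> lines) {
        List<String> emitted = new ArrayList<>();
        Mapper<Long, String, String, Integer> mapper = new LineLengthMapper();
        long offset = 0;
        for (String line : lines) {
            // The collector appends to a list; the reporter ignores status messages.
            mapper.map(offset, line, (k, v) -> emitted.add(k + "=" + v), status -> { });
            offset += line.length() + 1; // +1 for the newline separator
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hello", "hadoop")));
    }
}
```

The mapper never returns a value: every result leaves through the collector, which is what lets one map() call emit zero, one, or many pairs.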

13
Getting Data To The Mapper
14
Reading Data
  • Data sets are specified by InputFormats
  • An InputFormat defines the input data (e.g., a directory)
  • It identifies the partitions of the data that form
    each InputSplit
  • It is a factory for RecordReader objects, which
    extract (k, v) records from the input source
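
The three roles listed above can be illustrated with a small plain-Java sketch. It rests on loud assumptions: real Hadoop splits files by byte ranges, whereas this example splits an in-memory list of lines, and the names InputFormatSketch, Split, getSplits, and readRecords are hypothetical stand-ins rather than the Hadoop API. The linesPerSplit argument is assumed to be positive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the InputFormat / InputSplit / RecordReader roles.
// Real Hadoop splits files by byte ranges; this sketch splits a list of lines.
class InputFormatSketch {

    /** A contiguous slice [start, end) of the input lines. */
    static final class Split {
        final int start;
        final int end;

        Split(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Partitions numLines lines into splits of at most linesPerSplit lines each. */
    static List<Split> getSplits(int numLines, int linesPerSplit) {
        List<Split> splits = new ArrayList<>();
        for (int start = 0; start < numLines; start += linesPerSplit) {
            splits.add(new Split(start, Math.min(start + linesPerSplit, numLines)));
        }
        return splits;
    }

    /** Plays the RecordReader role: turns one split into (line number, line) records. */
    static List<String> readRecords(List<String> lines, Split split) {
        List<String> records = new ArrayList<>();
        for (int i = split.start; i < split.end; i++) {
            records.add("(" + i + ", " + lines.get(i) + ")");
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b", "c d", "e f");
        for (Split split : getSplits(lines.size(), 2)) {
            System.out.println(readRecords(lines, split));
        }
    }
}
```

Each split can be handed to a different map task, and the reader for that split produces only the records that fall inside it.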

15
Sending Data To The Client
  • The Reporter object passed to the Mapper allows
    simple asynchronous feedback
  • incrCounter(Enum key, long amount)
  • setStatus(String msg)
  • It also allows self-identification of the input
  • InputSplit getInputSplit()
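
How counters and status messages might be used from map-side code is sketched below in plain Java. This is an illustration under assumptions, not the Hadoop interface: RecordingReporter is a hypothetical stand-in that merely stores what a real Reporter would send back, the RecordCounter enum and its constants are invented for the example, and getInputSplit() is left out.

```java
import java.util.EnumMap;
import java.util.Map;

// A stand-in for the Reporter role: it only records what the real Reporter
// would send back to the framework. Not the actual Hadoop interface.
class ReporterSketch {

    /** Hypothetical counter names chosen for this example. */
    enum RecordCounter { GOOD_RECORDS, BAD_RECORDS }

    static final class RecordingReporter {
        final Map<RecordCounter, Long> counters = new EnumMap<>(RecordCounter.class);
        String status = "";

        void incrCounter(RecordCounter key, long amount) {
            // Add the amount to the running total for this counter.
            counters.merge(key, amount, Long::sum);
        }

        void setStatus(String msg) {
            status = msg;
        }
    }

    /** Map-side logic: counts records that parse as integers and records that do not. */
    static RecordingReporter process(String[] records) {
        RecordingReporter reporter = new RecordingReporter();
        for (String record : records) {
            try {
                Integer.parseInt(record.trim());
                reporter.incrCounter(RecordCounter.GOOD_RECORDS, 1);
            } catch (NumberFormatException e) {
                reporter.incrCounter(RecordCounter.BAD_RECORDS, 1);
            }
            reporter.setStatus("last record seen: " + record);
        }
        return reporter;
    }

    public static void main(String[] args) {
        RecordingReporter r = process(new String[] {"1", "x", "42"});
        System.out.println(r.counters + " / " + r.status);
    }
}
```

Keying counters by an enum, as the incrCounter(Enum key, long amount) signature suggests, keeps counter names checked by the compiler instead of relying on free-form strings.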

16
Example Program - Wordcount
  • map()
  • Receives a chunk of text
  • Outputs a set of word/count pairs
  • reduce()
  • Receives a key and all its associated values
  • Outputs the key and the sum of the values
  • package org.myorg;
  • import java.io.IOException;
  • import java.util.*;
  • import org.apache.hadoop.fs.Path;
  • import org.apache.hadoop.conf.*;
  • import org.apache.hadoop.io.*;
  • import org.apache.hadoop.mapred.*;
  • import org.apache.hadoop.util.*;
  • public class WordCount
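
The body of the WordCount class is not part of this transcript. As a rough stand-in, the map() and reduce() roles described on this slide can be sketched in plain Java with no Hadoop dependency. The class WordCountSketch and its run helper are hypothetical, and the in-memory grouping step only approximates the shuffle that the framework performs between the two phases.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Word count expressed as map -> group by key -> reduce, in plain Java.
// The grouping step stands in for the shuffle performed between the phases.
class WordCountSketch {

    /** map(): receives a chunk of text, outputs one (word, 1) pair per token. */
    static List<Map.Entry<String, Integer>> map(String chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(chunk);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    /** reduce(): receives a key and all its associated values, outputs their sum. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    /** Runs map over every chunk, groups values by key, then reduces each group. */
    static Map<String, Integer> run(List<String> chunks) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String chunk : chunks) {
            for (Map.Entry<String, Integer> pair : map(chunk)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            counts.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(run(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

Because map() only sees its own chunk and reduce() only sees one key's values, both steps can run on separate machines; only the grouping in between needs data from everywhere.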
