Title: Engineering BIG DATA with HADOOP
ENGINEERING BIG DATA WITH HADOOP
By International School of Engineering
We Are Applied Engineering
Disclaimer: Some of the images and content have been taken from multiple online sources. This presentation is intended only for knowledge sharing, not for any commercial purpose.
OVERVIEW
- WHAT IS BIG DATA?
- EXPLOSION OF DATA
- DATA CONTRIBUTIONS
- DATA EXPLOSION
- WHO ARE THE PLAYERS?
- BIG DATA: BIG PICTURE LANDSCAPE
- BIG DATA ENTERPRISE ROLES
- WHAT IS HADOOP?
- EVOLUTION OF HADOOP
- HADOOP ECOSYSTEM
- HADOOP ECOSYSTEM MAP
- HADOOP 30,000 FEET VIEW
- BIG DATA ANALYTICS: CASE STUDIES
- VIDEO OF HADOOP ECOSYSTEM
WHAT IS BIG DATA?
- "High-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." (Gartner)
HIGH VOLUME
HIGH VARIETY
HIGH VELOCITY
EXPLOSION OF DATA
Source: http://www.emc.com/leadership/digital-universe/iview/index.htm
DATA CONTRIBUTIONS
DATA EXPLOSION
Source: http://www.emc.com/collateral/about/news/idc-emc-digital-universe-2011-infographic.pdf
WHO ARE THE PLAYERS?
BIG DATA: BIG PICTURE LANDSCAPE
BIG DATA ENTERPRISE ROLES
INTRODUCTION TO HADOOP
WHAT IS HADOOP?
- Flexible
  - Structured/Unstructured
  - Text/Binary
  - Schema/Schema-less
- 100% Open Source
- Scalable
  - Petabytes of Data
  - Thousands of Nodes
Source: http://cloudtimes.org/2013/06/25/hadoop-as-a-service-market-growing/
EVOLUTION OF HADOOP
How does an elephant sneak up on you?
HADOOP ECOSYSTEM
Chukwa
Sqoop
Zookeeper
Pig
HBase
Avro
Mahout
Flume
Whirr
Map Reduce Engine
Hama
Hadoop Distributed File System
Hive
Hadoop Common
HADOOP ECOSYSTEM MAP
Source: http://indoos.wordpress.com/2010/08/16/hadoop-ecosystem-world-map/
Hadoop Evolution Map Explained!
- How did it all start? Huge data on the web!
- Nutch was built to crawl this web data.
- The huge data had to be stored: HDFS was born!
- How to use this data? The MapReduce framework was built for coding and running analytics in Java, or in any language via Hadoop Streaming.
- How to get unstructured data in (web logs, click streams, Apache logs, server logs)? FUSE, WebDAV, Chukwa, Flume, Scribe.
- Hiho and Sqoop load data from an RDBMS into HDFS, so relational databases can join the Hadoop bandwagon!
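The MapReduce and Hadoop Streaming points above can be illustrated with the classic word count. The sketch below is in Python as an assumption (Streaming accepts any executable that reads records on stdin and writes tab-separated key/value lines on stdout); the script name wordcount.py, the map/reduce command-line argument, and the helper run_locally are hypothetical names chosen for illustration. run_locally only imitates the sort that Hadoop performs between the map and reduce phases, so the logic can be checked without a cluster.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming: a minimal sketch.

Hypothetical usage: "wordcount.py map" for the map phase and
"wordcount.py reduce" for the reduce phase. Streaming feeds input lines
on stdin and reads tab-separated key/value lines from stdout; between
the phases the framework sorts mapper output by key.
"""
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit "word<TAB>1" for every whitespace-separated word."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of consecutive (already sorted) lines sharing a key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> list[str]:
    """Hypothetical helper: simulate map -> sort (shuffle) -> reduce in one process."""
    return list(reducer(sorted(mapper(lines))))


# Only act as a Streaming stage when a phase argument is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

For example, run_locally(["the cat", "the dog"]) returns ["cat\t1", "dog\t1", "the\t2"]. On a cluster, the same file would be passed to the Hadoop Streaming jar through its -mapper and -reducer options; the jar's location and how the script is shipped to the nodes vary by Hadoop version and distribution.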
Continued
- High-level interfaces over low-level MapReduce programming: Pig, Hive, Jaql
- BI tools with advanced UI reporting (drill-down, etc.): Intellicus
- Workflow tools over MapReduce processes and high-level languages: Oozie
- Monitor and manage Hadoop, run jobs and Hive queries, get a high-level view of HDFS: Hue, Karmasphere, Eclipse plugin, Cacti, Ganglia
- Support frameworks: Avro (serialization), ZooKeeper (coordination)
- More high-level interfaces/uses: Mahout, Elastic MapReduce
- OLTP is also possible: HBase
HADOOP 30,000 FEET VIEW
- Distribute data initially
- Let processors/nodes work on local data
- Minimize data transfer over the network
- Replicate data multiple times for increased availability
- Write applications at a high level
- Programmers should not have to worry about network programming, temporal dependencies, low-level infrastructure, etc.
- Minimize talking between nodes (share-nothing)
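The data-placement principles above (distribute data up front, replicate it, and let nodes work on the data they already hold) can be modelled in a few lines. This is a toy Python sketch, not HDFS's actual behaviour: real HDFS replica placement is rack-aware and the real scheduler weighs many more factors. The function names and the round-robin placement are hypothetical; the example values of 3 replicas and 128 MB blocks mirror common HDFS defaults (dfs.replication and dfs.blocksize in Hadoop 2 and later).

```python
"""Toy model of data distribution, replication and data locality.

Assumptions (not HDFS's real policy): replicas are placed round-robin
with no rack awareness, and each map task goes to the least-loaded
node among those that already store its block.
"""
from collections import defaultdict


def split_into_blocks(size: int, block_size: int) -> list[int]:
    """Return the sizes of the fixed-size blocks a file is cut into."""
    full, rest = divmod(size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int) -> dict[int, list[str]]:
    """Store each block on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def schedule_local(placement: dict[int, list[str]]) -> dict[int, str]:
    """Assign each block's map task to a node that already holds the block."""
    load: dict[str, int] = defaultdict(int)
    assignment: dict[int, str] = {}
    for block, holders in sorted(placement.items()):
        # Least busy holder first; ties broken by node name.
        node = min(holders, key=lambda n: (load[n], n))
        load[node] += 1
        assignment[block] = node
    return assignment
```

For a 300 MB file, split_into_blocks(300, 128) gives [128, 128, 44]; placing those three blocks on four nodes with 3 replicas and calling schedule_local sends every map task to a node that stores its block, so no block crosses the network. Losing any single node still leaves two copies of every block.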
BIG DATA ANALYTICS
YAHOO: PERSONALIZATION
YAHOO SEARCH ASSIST
For a detailed description of the HADOOP ECOSYSTEM components, check out our video on
International School of Engineering
Plot no 63/A, 1st Floor, Road No 13, Film Nagar, Jubilee Hills, Hyderabad-500033
For Individuals: (91) 9502334561/62
For Corporates: (91) 9618 483 483
Facebook: www.facebook.com/insofe
SlideShare: www.slideshare.net/INSOFE