Hadoop vs Apache Spark - PowerPoint PPT Presentation

About This Presentation
Title:

Hadoop vs Apache Spark

Description:

Hadoop and Spark are two of the most prominent platforms for big data storage and analysis. Here are some essentials of Hadoop vs Apache Spark.

Number of Views:675
Slides: 18
Provided by: valuecodersvc

Transcript and Presenter's Notes


1
Hadoop Vs Apache Spark
2
Hadoop Introduction
  • Hadoop helps in storing large data sets and in
    running distributed analytics processes on them.
    It is an open source framework that can be used
    freely. Large data sets can be stored quickly
    and easily using Hadoop. Hadoop is efficient
    because it runs the processing on the nodes that
    hold the data, so it does not require large
    amounts of data transfer.
  • Hadoop processes data in batch jobs rather than
    record by record. Data warehousing is one of the
    common uses of Hadoop. The framework ensures
    that big data applications continue to run if
    individual servers fail.
  • Hadoop is a framework that is highly preferred
    for batch processing. The Hadoop framework is
    written in Java. Developers also use Hive on top
    of Hadoop to add SQL compatibility.
  • Hadoop can be used without any programming,
    because numerous integration services are
    available.
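
To make the batch model concrete, here is a minimal sketch in plain Python of the map, shuffle and reduce phases that a MapReduce-style job goes through. It runs in a single process and does not use Hadoop itself; the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over a whole batch of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

On a real cluster the map and reduce steps would run in parallel on many machines; the sketch only shows the order of the phases.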

3
Hadoop Advantages
4
Scalability
  • One of the key advantages of developing with
    Hadoop is scalability. Because large data sets
    can be easily stored and distributed across
    machines, Hadoop is highly scalable.
  • Hadoop clusters can grow to a large number of
    nodes, allowing large amounts of data to be
    stored and distributed. In comparison to a
    traditional RDBMS, Hadoop is highly scalable.
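
The idea of spreading a data set over many nodes can be sketched as follows. This is a simplified illustration in plain Python, assuming fixed-size blocks and a round-robin placement; the real placement policy in Hadoop's file system is more involved, and all function names here are hypothetical.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(num_blocks, nodes):
    """Spread block indexes over the nodes in round-robin order."""
    placement = {node: [] for node in nodes}
    for block_id in range(num_blocks):
        placement[nodes[block_id % len(nodes)]].append(block_id)
    return placement


def max_blocks_per_node(num_blocks, nodes):
    """Return the heaviest per-node load for a given cluster size."""
    return max(len(ids) for ids in assign_blocks(num_blocks, nodes).values())
```

Calling max_blocks_per_node with the same number of blocks and a longer node list shows the per-node load dropping as the cluster grows, which is the scaling property the slide describes.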

5
Cost Effective
  • Today's big data requirements are enormous, and
    they can be fulfilled in a cost-effective manner
    using Hadoop. The cost of data processing is
    much higher with traditional database management
    systems.
  • The simplified processing of complex data helps
    make Hadoop a cost-effective framework.

6
Flexible Solution
  • Hadoop can access and operate on different types
    of data, which makes it a very flexible
    solution. This helps in generating value from
    all sorts of gathered data.
  • One could use a variety of data sources, such as
    social media and email, to gather as much useful
    data as possible.

7
Speed
  • Hadoop uses a distributed file system in which
    the processing servers and the storage servers
    are the same machines, which makes processing
    extremely fast.
  • The processing of data is highly efficient using
    the Hadoop framework.
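
Because storage and processing share the same servers, a scheduler can send each task to a machine that already holds the data. The following is a minimal sketch of that preference in plain Python, not Hadoop's actual scheduler; the function name and data layout are assumptions made for illustration, and it assumes at least one free node.

```python
def schedule_tasks(block_locations, free_nodes):
    """Assign each block's task to a free node, preferring one that stores the block.

    block_locations maps a block id to the list of nodes holding a copy.
    Returns a dict of block id -> (chosen node, whether the read is local).
    """
    assignments = {}
    for block_id, holders in block_locations.items():
        local = [node for node in holders if node in free_nodes]
        if local:
            assignments[block_id] = (local[0], True)
        else:
            # No holder is free: fall back to any free node and read remotely.
            assignments[block_id] = (sorted(free_nodes)[0], False)
    return assignments
```

Tasks marked as local read their block from the same machine, so no data has to cross the network for them.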

8
Reliable
  • Hadoop offers a high level of fault tolerance.
    Replicating data across different nodes ensures
    that a backup copy is available.
  • This minimizes the chances of data loss. Hadoop
    is quite a reliable framework and helps in
    withstanding both single and multiple failures.
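
How replication protects against node failures can be sketched in a few lines of plain Python. The example assumes three copies of every block, which is the default replication factor in Hadoop's file system, and a simple round-robin choice of nodes, which is a simplification; the function names are hypothetical.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Store each block on `replication` distinct nodes, chosen round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block_id in enumerate(block_ids):
        placement[block_id] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def lost_blocks(placement, failed_nodes):
    """Return the blocks whose every replica sits on a failed node."""
    failed = set(failed_nodes)
    return [block for block, holders in placement.items() if set(holders) <= failed]
```

With three replicas, a block is only lost when all three of the nodes holding it fail at once; one or two failures leave at least one copy readable.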

9
  • Looking for Agile teams for your big data
    project? Trust ValueCoders for all kinds of
    software development and big data projects.

10
Spark Introduction
  • Spark is a tool for processing data that has
    been distributed using the Hadoop framework. The
    Spark platform has been designed to run on top
    of Hadoop. It works as an alternative to the
    batch model. It can be used for speeding up
    interactive queries and processing real time
    data. Spark does not have its own file
    management system, but can be integrated with
    one.
  • Spark is considerably faster than Hadoop when it
    comes to processing data. Spark is different
    from Hadoop because it supports complete
    analytics of real time as well as stored data.
    Spark does not have its own distributed storage
    system, which is essential for big data
    projects. Spark is also known for its advanced
    data processing and machine learning.

11
Spark Advantages
12
Faster
  • Spark places the data into Resilient Distributed
    Datasets (RDDs). This data is stored in memory,
    making it easily accessible.
  • Since the data is easily accessed from memory,
    MapReduce-style jobs can be carried out very
    quickly.
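
The benefit of keeping data in memory can be illustrated with a small stand-in class in plain Python. This is not Spark's RDD API: CachedDataset and its methods are hypothetical names, and the sketch always caches after the first read, whereas in Spark caching is requested explicitly (for example with cache() or persist() on an RDD).

```python
class CachedDataset:
    """A tiny stand-in for a dataset that stays in memory after first use."""

    def __init__(self, load):
        self._load = load   # function that reads the records from slow storage
        self._cache = None  # filled on the first call to collect()
        self.loads = 0      # how many times slow storage was actually read

    def collect(self):
        """Return the records, reading from storage only on the first call."""
        if self._cache is None:
            self.loads += 1
            self._cache = list(self._load())
        return self._cache

    def map(self, func):
        """Return a new dataset whose records are func(record)."""
        return CachedDataset(lambda: (func(x) for x in self.collect()))
```

Several derived datasets can be built from the same source with map, and the loads counter shows that the source is read from storage only once.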

13
Real Time Processing
  • Real time data is growing continuously.
    Processing large quantities of real time data
    can be a big challenge.
  • Spark can help in processing logs for live
    streaming sites, and also in fraud detection and
    in handling electronic trading data.
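
One common way to handle a continuous stream is to cut it into small batches by time and process each batch as it closes. The sketch below shows that idea in plain Python with a toy fraud rule; the function names, the event layout (timestamp, (card id, amount)) and the spending limit are all assumptions for illustration, not Spark Streaming code.

```python
from collections import defaultdict


def micro_batches(events, interval):
    """Group (timestamp, payload) events into consecutive fixed-length batches."""
    batches = defaultdict(list)
    for timestamp, payload in events:
        batches[int(timestamp // interval)].append(payload)
    # Intervals without any events are simply skipped.
    return [batches[key] for key in sorted(batches)]


def flag_suspicious(batch, limit):
    """Return the card ids whose total spend within one batch exceeds `limit`."""
    totals = defaultdict(float)
    for card_id, amount in batch:
        totals[card_id] += amount
    return sorted(card for card, total in totals.items() if total > limit)
```

Each batch is checked on its own, so a card is flagged only when its spending inside a single time window goes over the limit.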

14
Using Big Data Effectively
  • Big data needs to be used effectively to reach
    the right set of people with the right messaging.
    Targeting very specific audiences with big data
    brings out the best conversion rate for a retail
    business. Many retail marketers fail to bring out
    the right results for the business because of a
    lack of understanding of how to make the data
    usable and how to analyse it.
  • Technology has to be fully prepared and used for
    big data usage and integration.

15
Processing of Graphs
  • Graph processing helps in capturing the
    relationship between data and entities.
  • The process helps in analysing social as well as
    advertising data. Machine learning helps in
    carrying out advanced analytics and in
    understanding consumers.
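
A graph of relationships can be represented as vertices and directed edges, for example "who follows whom" in social data. The following minimal sketch in plain Python builds such a graph and counts incoming edges per vertex; it is not Spark's graph library, and the function names are hypothetical.

```python
def build_graph(edges):
    """Turn (source, target) pairs into a mapping of vertex -> set of targets."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())  # keep vertices with no outgoing edges
    return graph


def in_degrees(graph):
    """Count how many edges point at each vertex."""
    counts = {vertex: 0 for vertex in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts
```

In a "follows" graph the in-degree of a vertex is its follower count, a simple example of the kind of relationship measure that graph processing produces.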

16
Power
  • Most companies need two systems: one for storing
    and streaming data and another for analyzing the
    data.
  • Spark helps simplify application development,
    maintenance and deployment.

17
Get in Touch
  • sales_at_valuecoders.com
    www.valuecoders.com
  • www.facebook.com/valuecoders
  • www.twitter.com/valuecoders
  • www.linkedin.com/valuecoders