How to Speed Up Ad-hoc Analytics with SparkSQL, Parquet, and Alluxio

Slides: 6
Provided by: Jaya001

Transcript and Presenter's Notes


1
Building and Deploying Custom Applications with
Apache Bigtop and Amazon EMR
2
What is Apache Bigtop?
  • Apache Bigtop is a community-maintained
    repository that supports a wide range of
    components and projects, including, but not
    limited to, Hadoop, HBase, and Spark.
  • Bigtop supports various Linux packaging systems,
    such as RPM and Deb, for packaging applications,
    and it handles application deployment and
    configuration on clusters using Puppet.

3
Walkthrough
4
To create a Bigtop package for EMR, follow these
steps:
  • Launch a development EMR cluster.
  • Clone the Bigtop public repository.
  • Add the application definition to bigtop.bom.
  • Create directories and configuration files for
    the application.
  • Create an RPM package.
  • Create a Yum repository.
  • Move the output repository to S3 to make it
    available for any new cluster where you want to
    install the new application.
  • Test the application.
  • Create a bootstrap script.
  • Launch an EMR cluster with the bootstrap script.
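
The build, publish, and launch steps above can be sketched as a small Python helper that only assembles the commands and the bootstrap script as text, without running anything. This is a rough sketch under stated assumptions, not the presenter's method: the application name `myapp`, the S3 location, the local repository path, and the release label are hypothetical, and the Gradle task names (`<app>-rpm`, `yum`) follow Bigtop's usual conventions and should be checked against the Bigtop checkout in use.

```python
import shlex

# Hypothetical values for illustration only; none of them come from the slides.
APP = "myapp"                      # assumed name of the application added to bigtop.bom
BUCKET = "s3://my-bucket/bigtop"   # assumed S3 prefix that will hold the Yum repository


def build_and_publish_commands(app=APP, bucket=BUCKET):
    """Return the commands for the clone, build, and publish steps, in order."""
    return [
        ["git", "clone", "https://github.com/apache/bigtop.git"],
        ["./gradlew", f"{app}-rpm"],   # assumed task name: build the RPM for the new component
        ["./gradlew", "yum"],          # assumed task name: create a Yum repository under output/
        ["aws", "s3", "sync", "output/", f"{bucket}/"],  # copy the repository to S3
    ]


def bootstrap_script(app=APP, bucket=BUCKET, local_repo="/tmp/bigtop-repo"):
    """Render a bootstrap script that installs the package on each new node.

    The script copies the repository from S3 to a local directory (an assumed
    path), registers it with Yum through a file:// URL, and installs the package.
    """
    repo_file = f"/etc/yum.repos.d/{app}.repo"
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        f"aws s3 sync {shlex.quote(bucket + '/')} {shlex.quote(local_repo)}",
        f"sudo tee {shlex.quote(repo_file)} >/dev/null <<EOF",
        f"[{app}]",
        f"name={app}",
        f"baseurl=file://{local_repo}",
        "enabled=1",
        "gpgcheck=0",
        "EOF",
        f"sudo yum install -y {shlex.quote(app)}",
    ]
    return "\n".join(lines) + "\n"


def launch_command(bucket, release_label, instance_type="m4.large", instance_count=3):
    """Return an `aws emr create-cluster` command line that runs the bootstrap script.

    Assumes the rendered script was uploaded to <bucket>/bootstrap.sh beforehand.
    """
    args = [
        "aws", "emr", "create-cluster",
        "--name", "bigtop-custom-app",
        "--release-label", release_label,
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        "--use-default-roles",
        "--bootstrap-actions", f"Path={bucket}/bootstrap.sh",
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    for cmd in build_and_publish_commands():
        print(" ".join(cmd))
    print(bootstrap_script())
    # The release label below is only an example value.
    print(launch_command(BUCKET, "emr-5.30.0"))
```

Keeping the commands as data makes it easy to review them before running anything on the development cluster. The steps that edit bigtop.bom and create the application's directories and configuration files are left out because their contents depend on the application being packaged.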

5
Peridot Systems, Ground Floor, Kamatchi Krupa
Apts, No 84/8, Venkatarathinam Main Street, LB
Road, Venkatarathinam Nagar, Adyar, Chennai,
Tamil Nadu - 600020
EMAIL: hr_at_peridotsystems.in
PHONE: 044 4211 5526
www.peridotsystems.in