Title: Scala and Spark Training

What is Scala?
Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. Its name comes from "Scalable Language". Scala is a hybrid functional programming language that smoothly integrates the features of object-oriented and functional programming, and it is compiled to run on the Java Virtual Machine. Scala was created by Martin Odersky. After an internal release in late 2003, it was released publicly in early 2004.
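The following sketch illustrates that mix. A small class hierarchy represents the object-oriented side. An immutable List processed with map, filter, and sum represents the functional side. The Shape, Circle, and Rectangle names are hypothetical and not part of any library. The code assumes Scala 2.13 or Scala 3.

```scala
// Illustrative sketch: Shape, Circle and Rectangle are hypothetical example types.
object MultiParadigmDemo {
  // Object-oriented side: a common supertype with subclasses that implement `area`
  sealed trait Shape { def area: Double }

  final case class Circle(radius: Double) extends Shape {
    def area: Double = math.Pi * radius * radius
  }

  final case class Rectangle(width: Double, height: Double) extends Shape {
    def area: Double = width * height
  }

  // Functional side: immutable data transformed with higher-order functions
  def totalArea(shapes: List[Shape]): Double =
    shapes.map(_.area).sum

  def largerThan(shapes: List[Shape], limit: Double): List[Shape] =
    shapes.filter(_.area > limit)

  def main(args: Array[String]): Unit = {
    val shapes = List(Circle(1.0), Rectangle(2.0, 3.0), Rectangle(1.0, 1.0))
    println(f"Total area: ${totalArea(shapes)}%.2f")
    println(s"Shapes larger than 2.0: ${largerThan(shapes, 2.0)}")
  }
}
```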

Why Scala?
There are several reasons to learn Scala. Many companies that depend on Java for business-critical applications are turning to Scala to boost their development productivity, application scalability, and overall reliability. Scala is a type-safe JVM language that combines object-oriented and functional programming features in a concise, logical, simple, and powerful language.

Scala offers a better alternative to Java while keeping its syntax close to Java's, which reduces the learning curve. It was created specifically to improve on the restrictive, tedious, or frustrating features of Java. The result is a cleaner, well-organized language that is easier to use and increases productivity.

What is Spark?
Spark is a fast cluster computing technology designed for fast computation, for example on Hadoop clusters. It builds on the MapReduce programming model used by Hadoop and extends it to handle more types of computation efficiently, such as interactive queries and stream processing. Spark can use Hadoop in two ways: for storage and for processing. Because Spark has its own cluster management, it can use Hadoop for storage only.
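The MapReduce model that Spark extends can be sketched on a single machine with ordinary Scala collections. A map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase combines each group. This word count is only a local analogy: it does not use Hadoop or Spark, and the object and method names are hypothetical.

```scala
// Local, single-machine analogy of the MapReduce model (not Hadoop or Spark code).
object MapReduceSketch {
  // Map phase: split each line into words and emit (word, 1) pairs
  def mapPhase(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.toLowerCase.split("\\s+").toSeq)
      .filter(_.nonEmpty)
      .map(word => (word, 1))

  // Shuffle phase: bring together all pairs that share the same key
  def shufflePhase(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (word, ps) => word -> ps.map(_._2) }

  // Reduce phase: combine the values collected for each key
  def reducePhase(grouped: Map[String, Seq[Int]]): Map[String, Int] =
    grouped.map { case (word, counts) => word -> counts.sum }

  def wordCount(lines: Seq[String]): Map[String, Int] =
    reducePhase(shufflePhase(mapPhase(lines)))

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("spark extends mapreduce", "spark is fast")))
}
```

In Spark, the same pipeline is usually expressed with RDD operations such as flatMap, map, and reduceByKey. The engine then distributes each phase across the cluster.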

Spark started in 2009 as a project in UC Berkeley's AMPLab, developed by Matei Zaharia. It was open-sourced in 2010 under a BSD license, donated to the Apache Software Foundation in 2013, and has been a top-level Apache project since February 2014.

Why Spark?
Spark was designed to speed up the Hadoop computing process. Its main feature is in-memory cluster computing, which greatly increases application processing speed. Spark covers a wide range of workloads, such as batch applications, iterative algorithms, interactive queries, and streaming applications, in one engine. This reduces the management burden of maintaining separate tools.

Apache Spark also has the following features:
- Speed: Spark can run an application in a Hadoop cluster up to 100 times faster in memory and 10 times faster on disk. It does this by reducing the number of read/write operations to disk and by storing intermediate processing data in memory.
- Support for multiple languages: Spark provides built-in APIs in Java, Scala, and Python, and comes with 80 high-level operators for interactive querying.
- Advanced analytics: Spark supports not only map and reduce operations but also SQL queries, streaming data, machine learning (ML), and graph algorithms.

The following topics will be covered in our Scala and Spark Training.

Introduction to Scala
- Overview of Scala
- Installing Scala
- Scala basics
- IDE for Scala
- Scala worksheet

Scala Programming
- Variables
- Methods
- Literals
- Reserved words
- Operators
- Precedence rules
- Operator associativity
- Ways of executing a Scala program

Expressions and Loops
- If expression
- For expression
- Usage of the yield keyword in for expressions
- Exception handling with the Try expression
- Match expression
- While loops
- Do-while loops
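Several of these topics fit into one short sketch: if as an expression, for with yield, exception handling with Try, match, and a while loop. Only the standard library is used. The ExpressionsDemo object and its methods are hypothetical names. Do-while is left out because Scala 3 removed the do-while syntax.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical demo object covering if, for/yield, Try, match and while.
object ExpressionsDemo {
  // `if` is an expression: it produces a value
  def sign(n: Int): String =
    if (n > 0) "positive" else if (n < 0) "negative" else "zero"

  // `for` with `yield` builds a new collection; the `if` is a filter (guard)
  def evenSquares(limit: Int): List[Int] =
    for (i <- (1 to limit).toList if i % 2 == 0) yield i * i

  // `Try` captures an exception thrown by the code as a Failure value
  def parse(s: String): Try[Int] = Try(s.trim.toInt)

  // `match` is also an expression; here it inspects the Try result
  def describe(s: String): String = parse(s) match {
    case Success(n) => s"parsed $n"
    case Failure(e) => s"failed: ${e.getClass.getSimpleName}"
  }

  // A classic while loop using local vars
  def sumTo(n: Int): Int = {
    var total = 0
    var i = 1
    while (i <= n) {
      total += i
      i += 1
    }
    total
  }
}
```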

Functions in Scala
- Methods
- Nested methods
- First-class functions
- Higher-order methods
- Function literals
- Partially applied functions
- Tail recursion
- Closures
- Currying
- Control abstraction
- Call-by-name vs call-by-value
- Repeated parameter passing mechanism
- Named parameter mechanism
- Default parameter mechanism
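The sketch below shows several function features from this list in one place. It covers tail recursion with a nested method, a higher-order method, a closure, currying with partial application, a call-by-name parameter, and named, default, and repeated parameters. The FunctionsDemo object and all of its members are hypothetical names. The code assumes Scala 2.13 or Scala 3.

```scala
import scala.annotation.tailrec

// Hypothetical demo object; each member illustrates one function feature.
object FunctionsDemo {
  // Tail recursion: the recursive call is the last action in `loop`, so the
  // compiler turns it into a loop; @tailrec makes compilation fail if it cannot.
  def factorial(n: Int): BigInt = {
    @tailrec
    def loop(k: Int, acc: BigInt): BigInt = // nested method
      if (k <= 1) acc else loop(k - 1, acc * BigInt(k))
    loop(n, BigInt(1))
  }

  // Higher-order method: takes a function as a parameter
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Closure: the returned function literal captures `factor`
  def multiplier(factor: Int): Int => Int = (x: Int) => x * factor

  // Currying: parameters in separate lists; supplying only the first list
  // yields a function of the remaining parameter (partial application)
  def add(a: Int)(b: Int): Int = a + b
  val addTen: Int => Int = add(10)

  // Call-by-name: `body` is evaluated each time it is referenced
  def twice(body: => Unit): Unit = { body; body }

  // Default parameter; callers may also pass arguments by name
  def greet(name: String, greeting: String = "Hello"): String = s"$greeting, $name!"

  // Repeated parameters: `xs` arrives as a Seq[Int]
  def sumAll(xs: Int*): Int = xs.sum
}
```

For example, `FunctionsDemo.greet(greeting = "Hi", name = "Bob")` uses named arguments in a different order from the declaration.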

OOP in Scala
- Classes
- Objects
- Defining a constructor
- Constructor parameter vs class parameter
- Singleton object
- Companion object
- Abstract class
- Uniform access principle
- Access modifiers
- Extending a class
- Namespaces in Scala
- Calling a superclass constructor
- Dynamic binding in Scala
- Final members in Scala classes
- Scala class hierarchy
- Object equality in Scala
- Factory design pattern in Scala
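The sketch below touches several of these OOP topics:
- an abstract class
- subclasses that call the superclass constructor
- dynamic binding
- the uniform access principle (a val implementing an abstract parameterless def)
- a companion object used as a simple factory

The Animal, Dog, and Cat classes are hypothetical. A string-keyed factory is just one way to apply the pattern.

```scala
// Hypothetical class hierarchy illustrating several OOP features.
object OopDemo {
  // Abstract class with a constructor parameter exposed as a val
  abstract class Animal(val name: String) {
    def sound: String                        // abstract member
    def speak: String = s"$name says $sound" // dynamic binding: uses the subclass's `sound`
  }

  // Subclasses pass `name` to the superclass constructor
  class Dog(name: String) extends Animal(name) {
    val sound = "Woof" // uniform access: a val implements a parameterless def
  }

  class Cat(name: String) extends Animal(name) {
    def sound = "Meow"
  }

  // Companion object acting as a factory: callers ask for a kind, not a class
  object Animal {
    def apply(kind: String, name: String): Animal = kind match {
      case "dog" => new Dog(name)
      case "cat" => new Cat(name)
      case other => throw new IllegalArgumentException(s"Unknown animal kind: $other")
    }
  }
}
```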

Traits
- Introduction to traits
- Inheritance in traits
- Mixing in a trait
- Trait vs class
- Ordered trait
- Example of the Ordered trait
- Stackable modification behavior of traits
- Example of stackable modification
- Rules for mixing in multiple traits
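Stackable modification means that each trait overrides a method and calls `super`. When several such traits are mixed in, their behavior stacks, and the rightmost trait runs first. The sketch below shows this with an integer queue. The IntQueue, BasicIntQueue, Doubling, Incrementing, and Filtering names are illustrative.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative stackable-trait example built around a simple integer queue.
object StackableTraitsDemo {
  abstract class IntQueue {
    def get(): Int
    def put(x: Int): Unit
  }

  class BasicIntQueue extends IntQueue {
    private val buf = new ArrayBuffer[Int]
    def get(): Int = buf.remove(0)
    def put(x: Int): Unit = buf += x
  }

  // `abstract override` lets a trait call super.put, which is resolved
  // only when the trait is mixed into a concrete class
  trait Doubling extends IntQueue {
    abstract override def put(x: Int): Unit = super.put(2 * x)
  }

  trait Incrementing extends IntQueue {
    abstract override def put(x: Int): Unit = super.put(x + 1)
  }

  trait Filtering extends IntQueue {
    abstract override def put(x: Int): Unit = if (x >= 0) super.put(x)
  }

  def main(args: Array[String]): Unit = {
    // Filtering (rightmost) runs first, then Incrementing, then BasicIntQueue
    val queue = new BasicIntQueue with Incrementing with Filtering
    queue.put(-1); queue.put(0); queue.put(1)
    println(s"${queue.get()} ${queue.get()}") // prints "1 2"
  }
}
```

Reversing the mix-in order (`with Filtering with Incrementing`) increments first. The -1 then becomes 0 and passes the filter, which illustrates the ordering rule for multiple traits.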

Scala Programming: Packaging
- Packages
- Different forms of Scala packages
- Import statements
- Different forms of import
- Package objects
- Implicit imports
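The sketch below shows several forms of import:
- importing a package and referring to members relative to it
- importing selected members
- renaming a member on import
- a local wildcard import inside a block
- relying on the implicit imports

The ImportsDemo object and its methods are hypothetical. The rename and wildcard forms shown use Scala 2 syntax, which Scala 3 also accepts.

```scala
// Import a package, then refer to its members relative to it (mutable.Map)
import scala.collection.mutable
// Import selected members only
import scala.math.{Pi, sqrt}
// Import a member under a new name
import scala.util.{Random => Rng}

// Hypothetical object using the imports above.
object ImportsDemo {
  val counts: mutable.Map[String, Int] = mutable.Map("a" -> 1)

  def circleArea(r: Double): Double = Pi * r * r
  def hypotenuse(a: Double, b: Double): Double = sqrt(a * a + b * b)

  // `Rng` is the renamed scala.util.Random
  def seededDraw(seed: Long): Int = new Rng(seed).nextInt(100)

  // Imports can be local to a block; `_` imports every member (wildcard)
  def maxOf(a: Int, b: Int): Int = {
    import scala.math._
    max(a, b)
  }

  // No import is needed for String or its methods: java.lang._, scala._ and
  // scala.Predef._ are imported implicitly into every Scala source file
  def shout(s: String): String = s.toUpperCase
}
```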

Case Classes and Pattern Matching
- Introduction to case classes
- Introduction to pattern matching
- Example of pattern matching
- Wildcard pattern
- Constant pattern
- Variable pattern
- Constructor pattern
- Sequence pattern
- Tuple pattern
- Type pattern
- Variable binding
- Pattern guard
- Sealed class
- Option data type
- Usage of the Option data type
- Pattern usage
- Partial functions
- Case classes and partial functions
- Usage of patterns in for expressions
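Most of the patterns listed above can be shown in one compact sketch:
- a sealed hierarchy of case classes
- constructor, constant, type, sequence, tuple, and wildcard patterns
- a guard and a variable binding
- Option
- a partial function
- a tuple pattern in a for expression

The Expr, Num, Add, and Neg types and the helper methods are hypothetical.

```scala
// Hypothetical example types and helpers for pattern matching.
object PatternMatchingDemo {
  // Sealed hierarchy of case classes: the compiler can warn about missing cases
  sealed trait Expr
  final case class Num(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class Neg(inner: Expr) extends Expr

  // Constructor patterns, including a nested one
  def eval(e: Expr): Int = e match {
    case Num(v)      => v
    case Add(l, r)   => eval(l) + eval(r)
    case Neg(Neg(x)) => eval(x) // nested constructor pattern: double negation
    case Neg(x)      => -eval(x)
  }

  def describe(x: Any): String = x match {
    case 0                => "zero"                                // constant pattern
    case n: Int if n < 0  => "negative int"                        // type pattern + guard
    case xs @ List(1, _*) => s"list of ${xs.size} starting with 1" // variable binding + sequence pattern
    case (a, b)           => s"pair of $a and $b"                  // tuple pattern
    case _                => "something else"                      // wildcard pattern
  }

  // Option instead of null: Map.get returns Some(value) or None
  def lookup(m: Map[String, Int], key: String): String = m.get(key) match {
    case Some(v) => s"found $v"
    case None    => "missing"
  }

  // A partial function is defined only where one of its cases matches
  val positiveDoubled: PartialFunction[Int, Int] = { case n if n > 0 => n * 2 }

  // Tuple pattern in a for expression; `v <- opt` skips entries that are None
  def present(entries: List[(String, Option[Int])]): List[String] =
    for ((k, opt) <- entries; v <- opt) yield s"$k=$v"
}
```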

Scala Collections
- Immutable and mutable collections
- Constructing Array, Set, List, Tuple, and Map objects
- Detailed discussion of various methods in the List class and List object
- List construction
- Basic operations like head, tail, and isEmpty on List
- List patterns
- Example of using list patterns
- Categories of methods in List
- First-order methods in List
- Higher-order methods in List
- map vs flatMap
- Filtering a list
- Example of takeWhile, dropWhile, span, and partition
- Predicates over lists
- Folding over lists
- foldLeft vs foldRight
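The sketch below exercises the List methods named above on small, fixed inputs. It covers list patterns, map vs flatMap, takeWhile, dropWhile, span, partition, predicates, and foldLeft vs foldRight. The ListMethodsDemo object and its value names are hypothetical. Each comment describes what that standard-library call computes for these inputs.

```scala
// Hypothetical demo object exercising standard List methods.
object ListMethodsDemo {
  // List patterns: Nil and the :: (cons) constructor
  def describe(xs: List[Int]): String = xs match {
    case Nil          => "empty"
    case head :: Nil  => s"one element: $head"
    case head :: tail => s"head $head, ${tail.size} more"
  }

  val words = List("spark", "scala")
  // map produces one result per element; flatMap flattens the inner lists
  val mapped: List[List[Char]] = words.map(_.toList.take(2))
  val flatMapped: List[Char]   = words.flatMap(_.toList.take(2))

  val nums = List(1, 2, 3, 10, 4, 5)
  val taken       = nums.takeWhile(_ < 5)      // longest prefix satisfying the predicate
  val dropped     = nums.dropWhile(_ < 5)      // everything after that prefix
  val spanned     = nums.span(_ < 5)           // (takeWhile, dropWhile) in one call
  val partitioned = nums.partition(_ % 2 == 0) // (all matching, all non-matching)

  // Predicates over a list
  val anyAboveNine = nums.exists(_ > 9)
  val allPositive  = nums.forall(_ > 0)

  // Subtraction is not associative, so the fold direction changes the result
  val leftFold  = nums.foldLeft(0)(_ - _)  // ((((((0 - 1) - 2) - 3) - 10) - 4) - 5)
  val rightFold = nums.foldRight(0)(_ - _) // 1 - (2 - (3 - (10 - (4 - (5 - 0)))))
}
```

Note the difference between span and partition. The call `span` stops at the first element that fails the predicate, so 4 ends up in the second list. The call `partition` tests every element.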

Introduction to Spark
- Introduction to Big Data
- The Big Data problem
- Scale-up vs scale-out architecture
- Characteristics of scale-out
- Introduction to Hadoop, MapReduce, and HDFS
- Introducing Spark

Hortonworks Data Platform (HDP) Using VirtualBox
- Importing the HDP VM image using VirtualBox on a local machine
- Configuring HDP
- Overview of Ambari and its components
- Overview of service configuration using Ambari
- Overview of Apache Zeppelin
- Creating, importing, and executing notebooks in Apache Zeppelin

IDEs for Spark Applications
- SBT and its overview
- IntelliJ
- Eclipse
- Resolving dependencies for Spark applications

Spark Basics
- Spark shell
- Overview of Spark architecture
- Storage layers for Spark
- Initializing a SparkContext and building applications
- Submitting a Spark application
- Use of the Spark History Server
- Spark components
- Spark driver process
- Spark executor
- SparkConf and SparkContext
- SparkSession object
- Overview of the spark-submit command
- Spark UI

RDDs
- Overview of RDDs
- RDDs and partitions
- Ways of creating RDDs
- RDD transformations and actions
- Lazy evaluation
- RDD lineage graph (DAG)
- Element-wise transformations
- map vs flatMap transformations
- Set transformations
- RDD actions
- Overview of RDD persistence
- Methods for persisting RDDs
- Persisting an RDD with a storage option
- Illustration of caching an RDD in the DAG
- Removal of a cached RDD

Pair RDDs
- Overview of key-value pair RDDs
- Ways of creating pair RDDs
- Transformations on pair RDDs
- reduceByKey(), foldByKey(), mapValues(), flatMapValues(), keys(), and values() transformations
- Grouping, joining, and sorting on pair RDDs
- reduceByKey() vs groupByKey()
- Pair RDD actions

Launching Spark on a Cluster
- Configuring and launching a Spark cluster on Google Cloud
- Configuring and launching a Spark cluster on Microsoft Azure

Logging and Debugging a Spark Application
- Setting up a Windows environment for executing Spark applications using an IDE
- Steps for using the SLF4J logging mechanism in a Spark application
- Attaching a debugger to a Spark application
- Example of debugging a Spark application running inside a cluster

Spark Application Architecture
- Spark application distributed architecture
- Spark application submission modes
- Overview of cluster managers
- Example of using the Standalone cluster manager
- The driver and its responsibilities
- Overview of jobs, stages, and tasks
- Spark job hierarchy
- Executors
- The spark-submit command and various submission options
- YARN cluster manager
- YARN architecture
- Client and cluster deploy modes

Advanced Concepts in Spark
- Accumulators
- Broadcast variables
- RDD partitioning
- Repartitioning an RDD
- Determining an RDD's partitioner

Spark SQL
- Introduction to Spark SQL
- Creating a SparkSession with Hive support
- DataFrames
- Ways of creating a DataFrame
- Registering a DataFrame as a view
- DataFrame transformations API
- DataFrame SQL statements
- Aggregate operations
- DataFrame actions
- Catalyst optimizer
- Catalog API
- Limitations of DataFrames
- Introduction to Datasets
- Introduction to encoders
- Creating a Dataset
- Functional transformations on Datasets
- Loading CSV, JSON, and Parquet files in Spark SQL
- Loading and saving data from/to Hive, JDBC, HDFS, and Cassandra
- Introduction to user-defined functions (UDFs)
- Customizing a UDF
- Usage of UDFs in the DataFrame transformations API
- Usage of UDFs in Spark SQL statements
- Introduction to window functions
- Steps for defining a window function
- Illustration of window function usage
- Introduction to UDAFs
- Customizing a UDAF
- Illustration of customized UDAF usage

Spark Streaming
- Introduction to data streaming
- Spark Streaming framework
- Spark Streaming and micro-batches
- Introduction to DStreams
- DStreams and RDDs
- Word count example using socketTextStream
- Streaming with Twitter feeds
- Setting up a Twitter app
- Resolving the Twitter dependency in a Spark Streaming application
- Steps for creating an uber JAR
- Example of extracting hashtags from tweet data
- Troubleshooting Twitter streaming issues in a Spark application
- Steps for creating a Spark Streaming application
- Architecture of Spark Streaming
- Stateless transformations
- Twitter streaming examples using stateless transformations
- Introduction to stateful transformations
- Window transformations
- Window duration and slide duration
- Window operations
- Naive and inverse window reduce operations
- Checkpointing
- Tracking the state of an event using the updateStateByKey operation
- Interacting directly with RDDs using the transform() operation
- Example of HDFS file streaming
- Example of Spark-Kafka interaction
- Saving DStreams to an external file system

For more training information, contact us:
Email: info@learntek.org
USA: 1734 418 2465
India: 40 4018 1306, 7799713624