Title: Machine Learning Using Spark Online Training
1- MACHINE LEARNING USING SPARK
2- The following topics will be covered in our Machine Learning Using Spark Online Training.
3- What is Machine Learning?
- Machine learning is an application of artificial
intelligence (AI) that gives systems the ability to
learn and improve from experience automatically,
without being explicitly programmed. Machine
learning focuses on developing computer programs
that can access data and use it to learn for
themselves.
4- Intro to Machine Learning Using Spark
- MLlib is Spark's machine learning (ML) library.
Its goal is to make practical machine learning
scalable and easy. At a high level, it provides
tools such as:
- ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering
- Featurization: feature extraction, transformation, dimensionality reduction, and selection
- Pipelines: tools for constructing, evaluating, and tuning ML Pipelines
- Persistence: saving and loading algorithms, models, and Pipelines
- Utilities: linear algebra, statistics, data handling, etc.
5- Tools
- This course will be delivered using the Scala and
Python APIs. The R language will also be used to
explain statistical concepts. Visualization will
be covered using the Bokeh and ggplot libraries.
6- Introduction to Apache Spark
- Spark programming model
- RDD and DataFrame
- Transformations and Actions
- Broadcast variables and Accumulators
- Running HDP (Hortonworks Data Platform) on a local machine
- Launching a Spark cluster
7- Basic Statistics
- Mean, Mode, Median, Range, Variance, Standard Deviation, Quartiles, Percentiles
- Sampling: Sampling Methods, Sampling Errors
- Probability Distributions: Normal distribution, t-distribution, Chi-square, F-distribution
- Margin of Error, Confidence Interval, Significance Level, Degrees of Freedom
- Hypothesis concept, Type I and Type II errors
- P-value, t-Test, Chi-square Test
- Correlation Coefficient
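As a concrete reference for the descriptive statistics listed above, the following sketch uses only Python's standard `statistics` module. It computes the sample (n - 1) variance and standard deviation. The 95% confidence interval assumes the normal approximation (z = 1.96), which is only reasonable for larger samples. The helpers `describe` and `pearson` are hypothetical names, not part of Spark. In PySpark, comparable summaries are available through DataFrame methods such as `describe()` and `stat.corr()`.

```python
import math
import statistics


def describe(sample):
    """Descriptive statistics for a list of numbers treated as a sample."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    # Quartile cut points; statistics.quantiles defaults to the "exclusive" method.
    q1, q2, q3 = statistics.quantiles(sample, n=4)
    # 95% confidence interval for the mean, assuming the normal approximation
    # (z = 1.96); small samples would call for a t critical value instead.
    margin = 1.96 * sd / math.sqrt(len(sample))
    return {
        "mean": mean,
        "median": statistics.median(sample),
        "mode": statistics.mode(sample),
        "range": max(sample) - min(sample),
        "variance": statistics.variance(sample),  # sample variance
        "stdev": sd,
        "quartiles": (q1, q2, q3),
        "ci95": (mean - margin, mean + margin),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
```

For the sample above, the mean is 5, the median is 4.5, and the mode is 4. The sample variance is 32/7, because the squared deviations sum to 32 and are divided by n - 1 = 7.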
8- Machine Learning Using Spark
- Introduction to Spark MLlib
- Data types: Vector, LabeledPoint
- Feature Extraction
- Feature Transformation, Normalization
- Feature Selectors
- Locality Sensitive Hashing (LSH)
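The normalization topic can be illustrated with three common rescalings, written here in plain Python so the arithmetic is visible. This is a sketch, not Spark code. MLlib provides corresponding transformers (`Normalizer`, `MinMaxScaler`, and `StandardScaler` in `pyspark.ml.feature`), and their defaults may differ from these hypothetical helpers. For example, Spark's `StandardScaler` does not subtract the mean unless `withMean=True`.

```python
import math


def l2_normalize(vec):
    """Scale one vector to unit Euclidean (L2) norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return list(vec) if norm == 0 else [v / norm for v in vec]


def min_max_scale(column, lo=0.0, hi=1.0):
    """Linearly rescale one feature column into [lo, hi]."""
    cmin, cmax = min(column), max(column)
    if cmax == cmin:
        # Constant column: map every value to the midpoint of the target range
        return [(lo + hi) / 2] * len(column)
    return [lo + (x - cmin) * (hi - lo) / (cmax - cmin) for x in column]


def standardize(column):
    """Z-score one feature column: subtract the mean, divide by the sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


print(l2_normalize([3, 4]))        # [0.6, 0.8]
print(min_max_scale([1, 2, 3]))    # [0.0, 0.5, 1.0]
print(standardize([1, 2, 3]))      # [-1.0, 0.0, 1.0]
```

`l2_normalize` works on one row (one example) at a time. The other two helpers work on one feature column across all examples, which mirrors the distinction between Spark's `Normalizer` and its scalers.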
9- Regression Analysis with Spark
- Types of Regression Models
- Gradient Descent
- Linear Regression, Generalized Linear Regression
- MSE, RMSE, MAE, R-squared Coefficient
- Transforming the target variable
- Tuning Model Parameters
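To make the gradient descent and error-metric topics concrete, the sketch below fits a one-feature linear model y = w*x + b by batch gradient descent on the mean squared error. It then computes MSE, RMSE, MAE, and R-squared. The learning rate and epoch count are assumed values chosen for this toy data. Spark's `LinearRegression` uses its own solvers (such as L-BFGS or the normal equations), so treat this only as an illustration of the idea.

```python
import math


def fit_linear_gd(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and R-squared for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)
    mse = sse / n
    mae = sum(abs(r) for r in residuals) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": 1.0 - sse / ss_tot}


w, b = fit_linear_gd([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
metrics = regression_metrics([1, 2, 3], [1, 2, 4])
```

On noise-free data generated by y = 2x + 1, the fitted parameters converge to w = 2 and b = 1. R-squared compares the residual error with the spread of the true targets around their mean, so it equals 1 only for a perfect fit.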
10- Classification Models with Spark
- Linear Models, Naive Bayes Model, Decision Tree
- Logistic Regression
- Linear Support Vector Machine
- Random Forest
- Gradient-Boosted Trees
- Training Classification Models
- Accuracy and prediction error
- Precision and Recall
- ROC curve and AUC
- Cross validation
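The evaluation metrics in the classification module reduce to a few counts over a confusion matrix. The sketch below assumes binary labels coded as 0/1, plus real-valued scores for the ROC/AUC part. It computes AUC through the rank-based (Mann-Whitney) interpretation rather than by tracing the ROC curve. The helper names are hypothetical. In Spark, `BinaryClassificationEvaluator` and `MulticlassClassificationEvaluator` in `pyspark.ml.evaluation` cover these metrics.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, prediction error, precision and recall from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many are found
    return {"accuracy": accuracy, "error": 1 - accuracy,
            "precision": precision, "recall": recall}


def roc_auc(y_true, scores):
    """AUC = probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise AUC loop is quadratic in the number of examples. That is fine for a demonstration, but it is one reason production libraries compute AUC from sorted scores instead.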
11- Clustering
- Hierarchical clustering
- K-means clustering
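K-means clustering alternates between two steps. First, assign every point to its nearest center; then move each center to the mean of its assigned points. The plain-Python sketch below (Lloyd's algorithm) takes the initial centers as an argument to stay deterministic. Spark's `KMeans` instead picks its initial centers itself, by default with the k-means|| scheme, so this illustrates the idea rather than reproducing MLlib. It requires Python 3.8+ for `math.dist`.

```python
import math


def assign(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    return [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points]


def kmeans(points, initial_centers, max_iter=20):
    """Lloyd's algorithm: repeat assignment and mean-update until centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                # New center = coordinate-wise mean of the member points
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(old)  # empty cluster: keep its previous center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers = kmeans(points, [(0, 0), (10, 10)])  # [(0.0, 0.5), (10.0, 10.5)]
labels = assign(points, centers)              # [0, 0, 1, 1]
```

The result depends on the starting centers, which is why smarter initialization schemes such as k-means|| exist.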
12- Dimensionality Reduction
- Principal Component Analysis
- Singular Value Decomposition
- Clustering as dimensionality reduction
- Training a dimensionality reduction model
- Evaluating dimensionality reduction models
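Principal Component Analysis looks for the directions of greatest variance in the data, which are the eigenvectors of the covariance matrix. The plain-Python sketch below finds only the first component, using power iteration, and projects the centered rows onto it, reducing d features to one. It assumes the all-ones starting vector is not orthogonal to the leading eigenvector and that the data is not constant. Spark offers PCA as a feature transformer (`pyspark.ml.feature.PCA`) and SVD through the RDD-based `RowMatrix.computeSVD`.

```python
import math


def center_and_covariance(rows):
    """Mean-center the rows and return (centered_rows, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centered, cov


def top_principal_component(rows, iterations=200):
    """First principal component via power iteration, plus the 1-D projection of each row."""
    centered, cov = center_and_covariance(rows)
    d = len(cov)
    v = [1.0] * d  # starting vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        # Multiply by the covariance matrix and renormalize to unit length
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Reduce each centered row from d features to a single coordinate
    projected = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projected


component, coords = top_principal_component([(2, 0), (0, 0), (-2, 0), (0, 1), (0, -1)])
```

In this toy data the x-axis has variance 2 and the y-axis 0.5, so the component converges to (1, 0). A simple way to evaluate such a reduction is to compare the variance kept along the retained components with the total variance.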
13- Recommendation Engine
- Content-based filtering
- Collaborative filtering
- Overview of the MovieLens data
- Training a recommendation model
- Using the recommendation model
- Performance Evaluation
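Collaborative filtering predicts a user's rating for an item from the ratings of similar users. The sketch below is a simple user-based neighborhood method that measures cosine similarity over co-rated items. It runs on a tiny hypothetical ratings dictionary, not MovieLens data. This is a swapped-in technique for illustration: Spark MLlib's collaborative filtering is based on matrix factorization with alternating least squares (`ALS` in `pyspark.ml.recommendation`).

```python
import math

# Hypothetical toy ratings: user -> {movie_id: rating}
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 3, "m3": 4, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}


def cosine(u, v):
    """Cosine similarity between two users, computed over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item` (None if nobody rated it)."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None


estimate = predict("alice", "m4", ratings)
```

Here Alice's tastes are closer to Bob's (who rated m4 highly) than to Carol's, so the estimate lands nearer Bob's rating. Recommendation models are usually evaluated by comparing predicted and held-out ratings with RMSE, as in the regression module.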
14- Text Processing
- Feature Hashing
- TF-IDF model
- Tokenization
- Stop words
- TF-IDF weightings
- Training a TF-IDF model
- Using a TF-IDF model
- Evaluating TF-IDF models
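The text-processing steps (tokenize, remove stop words, hash terms into a fixed-size vector, weight by IDF) fit in a short plain-Python sketch. Two assumptions need flagging. First, `zlib.crc32` stands in for the hash function; Spark's `HashingTF` uses MurmurHash3. Second, the stop-word list is a tiny illustrative set. The IDF uses the smoothed form log((m + 1) / (df + 1)), the formula the Spark ML documentation gives for `IDF`.

```python
import math
import re
import zlib

STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}  # small illustrative list


def tokenize(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def hashing_tf(tokens, num_features=32):
    """Term frequencies indexed by hash(term) mod num_features (the hashing trick).
    crc32 is a stand-in hash, not the one Spark uses."""
    vec = [0.0] * num_features
    for t in tokens:
        vec[zlib.crc32(t.encode("utf-8")) % num_features] += 1.0
    return vec


def idf(tf_vectors):
    """Smoothed IDF per bucket: log((m + 1) / (df + 1)), m = number of documents."""
    m = len(tf_vectors)
    df = [sum(1 for v in tf_vectors if v[j] > 0) for j in range(len(tf_vectors[0]))]
    return [math.log((m + 1) / (d + 1)) for d in df]


def tf_idf(tf_vectors, idf_weights):
    """Multiply each term frequency by its bucket's IDF weight."""
    return [[tf * w for tf, w in zip(v, idf_weights)] for v in tf_vectors]


docs = ["Spark is fast", "Spark and Hadoop", "Spark ML"]
tfs = [hashing_tf(tokenize(d)) for d in docs]
vectors = tf_idf(tfs, idf(tfs))
```

A term such as "spark" that appears in every document gets an IDF weight of log(4/4) = 0, so it contributes nothing to the TF-IDF vectors. Because unrelated terms can land in the same bucket, a larger `num_features` reduces collisions at the cost of longer vectors.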
15- Prerequisites
- Prior understanding of exploratory data analysis
and data visualization will help immensely in
learning machine learning concepts and
applications, including basic statistical
techniques for data analysis. Some knowledge of
R programming or of Python packages such as
scikit-learn and NumPy will be useful. However,
we will cover basic statistical techniques as part
of this course before going deep into machine
learning, so that everyone can get the most out of
the course.