Big Data and Predictive Analytics - PowerPoint PPT Presentation

1
Big Data and Predictive Analytics
  • Unravel the BIG mystery

In God we trust, all others must bring data
Antarip Biswas Sept 26th 2013
2
Agenda / Table of Contents
3
Use Cases and Success Stories
4
Success Stories - FareCast
  • Air fare prediction
  • For an online airfare, predicts whether the fare
    will go UP, go DOWN, or STAY THE SAME in the future
  • Acquired by Microsoft for roughly $100M
  • Employed machine learning technologies over big
    data

5
Tesco Loyalty Program
  • Done by Dunnhumby
  • Data
  • Data for Loyalty Program
  • Basic demographic information such as address,
    age, gender, the number of members in a household
    and their ages, dietary habits.
  • Purchase history appended
  • Summary attributes
  • Cluster analysis
  • Crucible
  • A massive database holding not only applicant
    information and purchase history, but also
    information about participating consumers that was
    purchased and collected elsewhere. Credit reports,
    loan applications, magazine subscription lists,
    the Office for National Statistics, and the Land
    Registry are all sources of the additional
    information stored in Crucible.

6
Tesco Loyalty Program - Benefits
1. Loyalty
2. Cross-sells
3. Inventory, distribution and store network planning
4. Optimal targeting and use of manufacturer promotions
5. Consumer insight generation and marketing those insights
Tesco achieved a 3.6x increase in coupon redemption
rates by using big-data predictive analytics to
predict which consumers are more likely to redeem
which coupons!
7
Big Data Success Story
8
Netflix Recommendations
  • Existing recommendation system: Cinematch
  • Winning team: KorBell
  • 107 algorithms explored
  • Machine learning and data mining
  • Employed SVD and RBM
  • Achieved an 8.43% improvement in recommendations
    over the existing system

9
Google Flu Spread Prediction
  • Prediction of the spread of flu in real time
    during H1N1 2009
  • Google tested a mammoth 450 million different
    mathematical models against the search terms,
    comparing their predictions with the actual
    flu cases; 45 important parameters were found
  • The model was tested when the H1N1 crisis struck in
    2009 and gave more meaningful and valuable real-time
    information than any official public health system

10
Prediction: High Frequency Trading
  • Objective: predict the impact of earnings
    announcements on stock prices
  • Use historical financial data to get a time
    series of quarterly expected and actual earnings
    announcements
  • Use historical financial data on stock price
    movements after the announcement
  • Approach
  • Categorize stocks by market capitalization so that
    similar-sized companies are grouped together
  • Split the historical data into an in-sample
    (training) set and an out-of-sample (validation) set
  • Fit a linear regression model on the in-sample data,
    where the independent variable (feature) is the
    difference between the actual and estimated
    earnings and the dependent variable is the impact on
    stock price

Achieved a return of 1%, or 100 basis points
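
The approach on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the names `surprise` and `move` are hypothetical, and a single market-cap group is fitted with ordinary least squares. The slide does not say which tools or data were actually used.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between predicted and observed price moves."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic stand-in for one market-cap group (assumption, not real data):
# surprise = actual earnings minus estimated earnings, move = price impact.
rng = random.Random(0)
surprise = [rng.uniform(-0.5, 0.5) for _ in range(200)]
move = [0.04 * s + rng.gauss(0, 0.005) for s in surprise]

# In-sample (training) set and out-of-sample (validation) set.
split = 150
slope, intercept = fit_line(surprise[:split], move[:split])
val_mse = mean_squared_error(surprise[split:], move[split:], slope, intercept)
print(f"slope={slope:.4f} intercept={intercept:.4f} validation MSE={val_mse:.6f}")
```

In practice one such model would be fitted per market-cap group, and the out-of-sample error decides whether the fitted relationship is worth trading on.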
11
Predictive Analytics for Couponing
Run the same campaign on both lists
Test Group: list of households from the analytic engine
Control Group: list of households getting the same offer
Evaluate impact: Control Group vs. Test Group
Measure results: redemption (primary), clips (secondary)
Verify the efficacy of the household recommendation by
demonstrating a significant difference from the Control
Group
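
One common way to check whether the Test Group's redemption rate differs significantly from the Control Group's is a two-proportion z-test. The sketch below is an assumption about how that check could be done, not the method named on the slide, and the household counts are invented for illustration.

```python
import math

def two_proportion_z_test(redeemed_test, n_test, redeemed_control, n_control):
    """Return (z, two-sided p-value) for the difference in redemption rates."""
    p_test = redeemed_test / n_test
    p_control = redeemed_control / n_control
    # Pooled rate under the null hypothesis that both groups redeem equally.
    pooled = (redeemed_test + redeemed_control) / (n_test + n_control)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical campaign: 1000 households per list.
z, p_value = two_proportion_z_test(120, 1000, 80, 1000)
print(f"z={z:.2f} p={p_value:.4f}")
```

The same function can be applied to clips, the secondary measure, by passing clip counts instead of redemption counts.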
12
Improve Recommendations/Allocations
Customer deviation in buying behavior refined by
customer profile changes
Customer Transactions
Customer 360
Association Clustering
Time Series
  • Taxonomy-based approach to identify business
    semantics
  • Major events that determine a change in buying
    pattern: location change, change in marital
    status, change in income group, birth of a child, etc.
  • Sources for this information: social channels,
    purchase deviation, etc.
  • Identify specific product categories relevant for
    the major event
  • Association of product categories with various
    customer classifications
  • For instance, customers with kids buy candies and
    customers with pets buy pet food

Diagram labels: exploratory techniques; cluster assignments; products eligible for recommendation; refine classifiers; time-specific product and associated products; customer groups based on classifications; product classification and customer segment association; products list for target customer cluster; campaign results; matching / filtering; probabilistic product affinities based on segment behavior; personalized recommendation list; target recommendation
13
Improve Recommendations/Allocations
Products bought by similar customers, but not by
current customer
  • More accurate identification of similar customers,
    made possible by the availability of extensive
    profile information
  • Classification of customers by predetermined
    attributes
  • Use of exploratory techniques to identify
    clusters of similar customers
  • Identify product propensity for specific segments
  • Determined by clustering and classification
    techniques
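
The idea of recommending "products bought by similar customers, but not by current customer" can be illustrated with a small neighbour-based sketch. Jaccard similarity over purchase sets is an assumption chosen for brevity. The slide only mentions clustering and classification in general terms, and the customer names and baskets are made up.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of purchased products."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_from_similar(baskets, customer, top_k=2):
    """Suggest products bought by the most similar customers but not by `customer`."""
    own = baskets[customer]
    # Rank the other customers by purchase overlap with the target customer.
    neighbours = sorted(
        (other for other in baskets if other != customer),
        key=lambda other: jaccard(own, baskets[other]),
        reverse=True,
    )[:top_k]
    # Score each unseen product by the summed similarity of neighbours who bought it.
    scores = {}
    for other in neighbours:
        weight = jaccard(own, baskets[other])
        for product in baskets[other] - own:
            scores[product] = scores.get(product, 0.0) + weight
    return sorted(scores, key=lambda p: (-scores[p], p))

# Hypothetical purchase histories.
baskets = {
    "alice": {"bread", "butter", "jam"},
    "bob": {"bread", "butter", "milk"},
    "carol": {"bread", "jam", "honey"},
    "dave": {"pet food", "cat litter"},
}
suggestions = recommend_from_similar(baskets, "alice")
print(suggestions)
```

With richer profile data, the similarity function could combine purchase overlap with profile attributes, which is the refinement the slide points to.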

Diagram labels: Customer Transactions; Customer 360 - NoSQL; Association Clustering; exploratory techniques; cluster assignments; products eligible for recommendation; refine classifiers; segment-specific product lists; customer groups based on classifications; products list for target customer cluster; campaign results; matching / filtering; probabilistic product affinities based on segment behavior; personalized recommendation list; target recommendation
14
Improve Recommendations/Allocations
Determine correlated items not bought by current
customer
Customer Transactions
Customer 360 - NoSQL
Association Clustering
Association rules
  • Link association to determine products that are
    bought together: bread and butter, wine and
    cheese, etc.
  • Identify products bought by the customer without the
    correlated item
  • Recommendation based on the absence of a product
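
A minimal sketch of pairwise association rules follows, assuming simple support and confidence thresholds. The threshold values, function names, and sample transactions are illustrative assumptions. The slide does not specify the algorithm used.

```python
from itertools import permutations

def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return {(a, b): confidence} for rules "a -> b" meeting both thresholds."""
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for basket in transactions:
        for item in basket:
            item_count[item] = item_count.get(item, 0) + 1
        # Count every ordered pair of distinct items bought together.
        for a, b in permutations(sorted(basket), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n               # share of baskets containing both
        confidence = count / item_count[a]  # share of a-baskets that also hold b
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules

def missing_partners(rules, bought):
    """Correlated items the customer has not bought, given what they did buy."""
    return sorted({b for (a, b) in rules if a in bought and b not in bought})

# Hypothetical transaction history.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"wine", "cheese"},
    {"wine", "cheese", "bread"},
    {"bread", "butter", "wine"},
]
rules = pair_rules(transactions)
print(missing_partners(rules, {"bread", "wine"}))
```

A customer who bought bread and wine is offered the partners that usually accompany them, which is the "recommendation based on absence" step.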

Diagram labels: exploratory techniques; cluster assignments; products eligible for recommendation; refine classifiers; segment-specific product and associated products; customer groups based on classifications; products list for target customer cluster; campaign results; matching / filtering; probabilistic product affinities based on segment behavior; personalized recommendation list; target recommendation
15
Identify what customers want and when
Sample technique
Cross-tabulated data
  • Salary
  • Zipcode
  • No. of kids
  • House owner
  • Gender
  • Brand1, Brand2, Brandn
  • Weight, Size, Volume
  • Brand
  • Category1, Category2, ..
  • Offer clipped category1

Transaction details merged with customer data to
provide contextual information as required for
inference
Transaction details for filtered customer list
Buyers of Cat food / Cat food Generic 4 oz
Affinity models: models generated by the analytic
engine from historical data to identify the affinity
of specific variables
Associated variables: single or multiple
variables by different segments,
using a multi-model approach
Prediction models: application of variable affinity
to the customer list to identify the probability that
non-purchasers will purchase cat food / cat food
Generic 4 oz
Customer list by probability
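
As a rough illustration of going from cross-tabulated data to a "customer list by probability", the sketch below uses each segment's historical purchase rate as a crude stand-in for the affinity and prediction models. The segment keys, field names, and customer records are hypothetical, and a real analytic engine would likely use a proper statistical model instead.

```python
from collections import defaultdict

def segment_purchase_rates(customers, segment_keys):
    """Cross-tabulate: share of customers in each segment who bought the product."""
    totals = defaultdict(int)
    buyers = defaultdict(int)
    for c in customers:
        segment = tuple(c[k] for k in segment_keys)
        totals[segment] += 1
        buyers[segment] += c["bought"]
    return {seg: buyers[seg] / totals[seg] for seg in totals}

def rank_non_purchasers(customers, rates, segment_keys):
    """Non-purchasers sorted by their segment's purchase rate, highest first."""
    scored = [
        (c["id"], rates[tuple(c[k] for k in segment_keys)])
        for c in customers
        if not c["bought"]
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical customers; "bought" marks past purchase of the target product.
customers = [
    {"id": 1, "house_owner": True, "kids": 2, "bought": 1},
    {"id": 2, "house_owner": True, "kids": 2, "bought": 1},
    {"id": 3, "house_owner": True, "kids": 2, "bought": 0},
    {"id": 4, "house_owner": False, "kids": 0, "bought": 0},
    {"id": 5, "house_owner": False, "kids": 0, "bought": 1},
    {"id": 6, "house_owner": False, "kids": 0, "bought": 0},
    {"id": 7, "house_owner": False, "kids": 0, "bought": 0},
]
keys = ("house_owner", "kids")
rates = segment_purchase_rates(customers, keys)
ranking = rank_non_purchasers(customers, rates, keys)
print(ranking)
```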
16
Contextualize information, correlate facts,
predict and improve
Information from multiple operational and data
warehousing systems that contain customer data,
purchase details, etc.
Information from social channels that provides
supporting information to create a detailed
customer profile
Rule sets from a knowledge base accumulated over the
years
Advanced Analytics - Product association
Filter
Customer list, probability
Buyer of Cat Food / Generic Cat Food 4 ounce
Transaction details for this customer list
Filtered high-volume categories
Associated products by affinity confidence
Inferred rules
17
Obama for America Campaign 2012
Canvassing from older generation
Canvassing from youth
18
Obama for America Campaign 2012
  • The Obama for America data science team used social
    media as a tool to efficiently recruit the people
    it needed heading into the election's home stretch
  • Primary objective: determine who the best
    messengers were, whom they might be able to persuade,
    and what actions they might be willing to take
  • Reason to harness social media:
  • Most young voters were unreachable by phone calls or
    neighborhood canvassing, but were always connected to
    some form of social media
  • Optimize resources by turning voter intelligence
    into actionable intelligence.

19
Traffic Congestion Control
  • Big Data Analytics used for traffic congestion
    control
  • Enables travellers to plan their routes to their
    destinations
  • Enables traffic controllers to effectively route
    cars in order to avoid as much congestion as
    possible
  • Implemented in LA by a joint initiative of Xerox
    and the LA transport department

20
DNA Sequencing and Cancer Therapies
  • Previously, only small portions of people's genes
    were sequenced
  • Big Data technology enables a person's entire DNA to
    be sequenced, which is especially helpful for cancer
    patients
  • Enables selecting therapies based on genetic
    markers and person-specific genetic makeup
  • If one treatment becomes ineffective due to cancer
    mutation, different therapies can be used based on
    other gene markers.
  • Steve Jobs was one of the first people in the world
    to have his entire DNA sequenced

21
Thank You