Title: Consumer Behavior Prediction using Parametric and Nonparametric Methods
Slide 1: Consumer Behavior Prediction using Parametric and Nonparametric Methods
- Elena Eneva
- Carnegie Mellon University
- 25 November 2002
- eneva_at_cs.cmu.edu
Slide 2: Recent Research Projects
- Dimensionality Reduction Methods and Fractal Dimension (with Christos Faloutsos)
- Learning to Change Taxonomies (with Valery Petrushin, Accenture Technology Labs)
- Text Re-Classification Using Existing Schemas (with Yiming Yang)
- Learning Within-Sentence Semantic Coherence (with Roni Rosenfeld)
- Automatic Document Summarization (with John Lafferty)
- Consumer Behavior Prediction (with Alan Montgomery, Business School, and Rich Caruana, SCS)
Slide 3: Outline
- Introduction and Motivation
- Dataset
- Baseline Models
- New Hybrid Models
- Results
- Summary and Work in Progress
Slide 4: How to increase profits?
- Without raising the overall price level?
- Without more advertising?
- Without attracting new customers?
Slide 5: A Better Pricing Strategy
- Encourage the demand for products which are most profitable for the store
- Recent trend to consolidate independent stores into chains
- Pricing doesn't take into account the variability of demand due to neighborhood differences
Slide 6: Micro-Marketing
- Pricing strategies should adapt to the neighborhood demand
- The basis: the difference in interbrand competition in different stores
- Stores can increase operating profit margins by 33% to 83% [Montgomery 1997]
Slide 7: Understanding Demand
- Need to understand the relationship between the prices of products in a category and the demand for these products
- Price Elasticity of Demand
Slide 8: Price Elasticity
- The consumers' response to a price change:
  E = (dQ/Q) / (dP/P)
- Q is the quantity purchased, P is the price of the product
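As an illustration (not from the talk), the elasticity definition above can be computed directly over a price change; the function name and the numbers are made up:

```python
def price_elasticity(q0, q1, p0, p1):
    """Arc elasticity: percent change in quantity demanded
    divided by percent change in price."""
    pct_dq = (q1 - q0) / q0
    pct_dp = (p1 - p0) / p0
    return pct_dq / pct_dp

# a 10% price increase that loses 10% of sales gives elasticity -1
elasticity = price_elasticity(100, 90, 2.00, 2.20)
```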
Slide 9: Prices and Quantities
- The quantity demanded of a specific product is a function of the prices of all the products in that category
- This function is different for every store, for every category
Slide 10: The Function
- Need to multiply this across many stores, many categories.
Slide 11: How to find this function?
- Traditionally: using parametric models (linear regression)
Slide 12: Data Example
Slide 13: Data Example (Log Space)
Slide 14: The Function
- Need to multiply this across many stores, many categories.
Slide 15: How to find this function?
- Traditionally: using parametric models (linear regression)
- Recently: using non-parametric models (neural networks)
Slide 16: Our Goal
- Advantage of LR: known functional form (linear in log space), extrapolation ability
- Advantage of NN: flexibility, accuracy
Slide 17: Evaluation Measure
- Root Mean Squared Error (RMS)
- The average deviation between the true quantity and the predicted quantity
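A minimal sketch of this error measure in plain Python (names illustrative):

```python
import math

def rms_error(true_q, pred_q):
    """Root mean squared error between true and predicted quantities."""
    n = len(true_q)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_q, pred_q)) / n)
```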
Slide 18: Error Measure (Unbiased Model)
- The models predict log(q), but q_hat = exp(y_hat) is a biased estimator for q
- By computing the integral over the error distribution, we correct the bias by using q_hat = exp(y_hat + sigma^2/2)
- which is an unbiased estimator for q
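Assuming the standard lognormal setting (residuals normal in log space, which is what the correction above requires), a small simulation illustrates why exp(y_hat) is biased and why the sigma^2/2 term fixes it; all numbers here are made up:

```python
import math
import random

random.seed(0)

# suppose the model predicts log-quantity y_hat = mu, with residuals N(0, sigma^2)
mu, sigma = 2.0, 0.5

# Monte Carlo estimate of the true expected quantity E[q]
samples = [math.exp(mu + random.gauss(0.0, sigma)) for _ in range(200_000)]
true_mean = sum(samples) / len(samples)

naive = math.exp(mu)                       # biased: underestimates E[q]
corrected = math.exp(mu + sigma ** 2 / 2)  # unbiased under the lognormal model
```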
Slide 19: Dataset
- Store-level cash register data at the product level for 100 stores
- Store prices updated every week
- Two years of transactions
- Chilled Orange Juice category (12 products)
Slide 20: Models
- Hybrids
  - Smart Prior
  - MultiTask Learning
  - Jumping Connections
  - Frozen Jumping Connections
- Baselines
  - Linear Regression
  - Neural Networks
Slide 21: Baselines
- Linear Regression
- Neural Networks
Slide 22: Linear Regression
  log(q) = a + sum_{i=1..K} b_i log(p_i)
- q is the quantity demanded
- p_i is the price of the ith product
- K products overall
- The coefficients a and b_i are determined by the condition that the sum of the squared residuals is as small as possible
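A sketch of this fit for the single-price case; the talk's model uses all K prices in the category, and the closed-form OLS below is the standard textbook formula, not code from the slides:

```python
import math

def fit_loglog(prices, quantities):
    """Fit log(q) = a + b*log(p) by ordinary least squares.
    Single-price version of the slide's K-price model."""
    x = [math.log(p) for p in prices]
    y = [math.log(q) for q in quantities]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    a = ybar - b * xbar
    return a, b
```

On noise-free data generated from known coefficients, the fit recovers them exactly.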
Slide 23: Linear Regression
Slide 24: Results (RMS Error)
- [chart: RMS error comparison]
Slide 25: Neural Networks
- Generic nonlinear function approximators
- Collection of basic units (neurons), each computing a (non)linear function of its input
- Random initialization
- Backpropagation
- Early stopping to prevent overfitting
Slide 26: Neural Networks
- 1 hidden layer, 100 units, sigmoid activation function
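A forward pass through such a network can be sketched in plain Python; the weights are randomly initialized and untrained, as on the previous slide, and the sizes follow this slide's configuration:

```python
import math
import random

random.seed(42)
N_IN, N_HID = 12, 100  # 12 product log-prices in, 100 sigmoid hidden units

# random initialization
W1 = [[random.gauss(0.0, 0.1) for _ in range(N_IN)] for _ in range(N_HID)]
b1 = [0.0] * N_HID
W2 = [random.gauss(0.0, 0.1) for _ in range(N_HID)]
b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(log_prices):
    """One forward pass: log-prices in, predicted log-quantity out."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, log_prices)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2
```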
Slide 27: Results (RMS)
- [chart: RMS error comparison]
Slide 28: Hybrid Models
- Smart Prior
- MultiTask Learning
- Jumping Connections
- Frozen Jumping Connections
Slide 29: Smart Prior
- Idea: initialize the NN with a good set of weights, so it starts from a smart prior
- Start the search in a state which already gives a linear approximation
- NN training in 2 stages
  - First, on synthetic data (generated by the LR model)
  - Second, on the real data
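Stage one can be sketched as follows; the LR coefficients here are made-up placeholders, not the fitted values from the talk:

```python
import random

random.seed(1)

# hypothetical LR fit: log(q) = a + sum_i b_i * log(p_i)  (coefficients illustrative)
a, b = 3.0, [-2.0, 0.5]

def lr_log_quantity(log_prices):
    return a + sum(bi * lp for bi, lp in zip(b, log_prices))

# Stage 1: a synthetic training set generated by the LR model.
# Pretraining the NN on these pairs starts its search from a linear prior;
# stage 2 then continues training on the real data.
synthetic = [(lp, lr_log_quantity(lp))
             for lp in ([random.uniform(-0.5, 0.5) for _ in b]
                        for _ in range(1000))]
```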
Slide 30: Smart Prior
- [diagram: the LR model generating the NN's pretraining data]
Slide 31: Results (RMS)
- [chart: RMS error comparison]
Slide 32: Multitask Learning [Caruana 1997]
- Idea: learn an additional related task in parallel, using a shared representation
- Add the output of the LR model (built over the same inputs) as an extra output of the NN
- Make the NN share its hidden nodes between both tasks
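The shared-representation idea can be sketched as a two-headed forward pass; the layer sizes and names here are illustrative, not the talk's configuration:

```python
import math
import random

random.seed(7)
N_IN, N_HID = 12, 8  # small sizes for illustration

W1 = [[random.gauss(0.0, 0.2) for _ in range(N_IN)] for _ in range(N_HID)]
W_main = [random.gauss(0.0, 0.2) for _ in range(N_HID)]  # head 1: real log-quantity
W_aux = [random.gauss(0.0, 0.2) for _ in range(N_HID)]   # head 2: the LR model's output

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x):
    """Both outputs share one hidden layer, so the auxiliary (LR-prediction)
    task shapes the representation used by the main task."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return (sum(w * hi for w, hi in zip(W_main, h)),
            sum(w * hi for w, hi in zip(W_aux, h)))
```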
Slide 33: MultiTask Learning
- Custom halting function
- Custom RMS function
Slide 34: Results (RMS)
- [chart: RMS error comparison]
Slide 35: Jumping Connections
- Idea: fuse LR and NN
- Modify the architecture of the NN
- Add connections which "jump" over the hidden layer
- Gives the effect of simulating a LR and a NN together
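The effect of the added connections can be sketched as a sum of a direct linear term and the hidden-layer output; function and argument names are illustrative:

```python
def jumping_forward(x, w_jump, b_jump, nn_out):
    """Output = direct "jumping" input-to-output connections (a linear term,
    as in LR) plus the usual hidden-layer path; nn_out stands in for the
    hidden layer's contribution."""
    linear_part = sum(w * xi for w, xi in zip(w_jump, x)) + b_jump
    return linear_part + nn_out
```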
Slide 36: Jumping Connections
Slide 37: Results (RMS)
- [chart: RMS error comparison]
Slide 38: Frozen Jumping Connections
- Idea: show the model what the jump is for
- Same architecture as Jumping Connections, but two training stages
- Freeze the weights of the jumping layer, so the network can't forget about the linearity
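The freezing step can be sketched as a gradient update that simply skips the jumping-layer weights; the parameter names are illustrative:

```python
def sgd_step(params, grads, lr, frozen):
    """One gradient step that leaves any parameter group named in `frozen`
    untouched (here, the jumping layer), so its linearity cannot be unlearned."""
    updated = {}
    for name, weights in params.items():
        if name in frozen:
            updated[name] = list(weights)  # frozen: copied unchanged
        else:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return updated
```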
Slides 39-41: Frozen Jumping Connections
- [diagrams: the jumping-connection architecture with the jumping layer frozen]
Slide 42: Results (RMS)
- [chart: RMS error comparison]
Slide 43: Models
- Hybrids
  - Smart Prior
  - MultiTask Learning
  - Jumping Connections
  - Frozen Jumping Connections
- Baselines
  - Linear Regression
  - Neural Networks
- Combinations
  - Voting
  - Weighted Average
Slide 44: Combining Models
- Idea: Ensemble Learning
- Use all models and then combine their predictions
  - Committee Voting
  - Weighted Average
- 2 baseline and 3 hybrid models (Smart Prior, MultiTask Learning, Frozen Jumping Connections)
Slide 45: Committee Voting
- Average the predictions of the models
Slide 46: Results (RMS)
- [chart: RMS error comparison]
Slide 47: Weighted Average Model (Regression)
- Optimal weights determined by a linear regression model over the predictions
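Both combination schemes can be sketched in a few lines; the weights below are made up, whereas the talk fits them by linear regression over the models' predictions:

```python
def committee_vote(model_preds):
    """Simple average of each model's predictions (one list per model)."""
    return [sum(col) / len(col) for col in zip(*model_preds)]

def weighted_average(model_preds, weights):
    """Weighted combination of the models' predictions."""
    return [sum(w * p for w, p in zip(weights, col)) for col in zip(*model_preds)]
```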
Slide 48: Results (RMS)
- [chart: RMS error comparison]
Slide 49: Normalized RMS Error
- Compare model performance across stores with different
  - sizes
  - ages
  - locations
- Need to normalize
- Compare to baselines
- Take the error of the LR benchmark as unit error
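The normalization can be sketched as follows (names illustrative):

```python
def normalized_rms(model_rms, lr_rms):
    """Express a model's RMS error in units of the LR benchmark's error
    (LR itself scores 1.0), making stores of different sizes, ages and
    locations comparable."""
    return model_rms / lr_rms
```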
Slide 50: Normalized RMS Error
Slide 51: Summary
- Built new models for better pricing strategies for individual stores and categories
- Hybrid models clearly superior to baselines for customer choice prediction
- Incorporated domain knowledge (linearity) into Neural Networks
- New models allow stores to
  - price the products more strategically and optimize profits
  - maintain better inventories
  - understand product interaction
- www.cs.cmu.edu/eneva
Slide 52: References
- Montgomery, A. (1997). Creating Micro-Marketing Pricing Strategies Using Supermarket Scanner Data.
- West, P., Brockett, P. and Golden, L. (1997). A Comparative Analysis of Neural Networks and Statistical Methods for Predicting Consumer Choice.
- Guadagni, P. and Little, J. (1983). A Logit Model of Brand Choice Calibrated on Scanner Data.
- Rossi, P. and Allenby, G. (1993). A Bayesian Approach to Estimating Household Parameters.
Slide 53: Work In Progress
- Analyze the Weighted Average model
- Compare the extrapolation ability of the new models
- Other MTL tasks
  - Shrinkage model: a "super store" model with data pooled across all stores
  - Store zones
Slide 54: On one hand
- In log space, the Price-Quantity relationship is fairly linear
Slide 55: On the other hand
- ...the derivation of consumers' demand responses to price changes without the need to write down and rely upon particular mathematical models for demand
Slide 56: The Model
- Need to multiply this across many stores, many categories.
Slide 57: Problem Definition
- For a set of products:
  - Given the price distribution
  - Predict the consumption distribution
- A change in the price of one product affects the consumption of all other products
Slide 58: Assumptions
- Independence
  - Substitutes: fresh fruit, other juices
  - Other stores
- Stationarity
  - Change over time
  - Holidays
Slide 59: The Most Important Slide
- ...for this presentation and the paper
- www.cs.cmu.edu/eneva/
- eneva_at_cs.cmu.edu
Slide 60: Converting Predictions to Original Space