Title: Sales Forecasting using Dynamic Bayesian Networks
Sales Forecasting using Dynamic Bayesian Networks
- Steve Djajasaputra
- SNN Nijmegen
- The Netherlands
Table of Contents
- Why Sales Forecasting?
- Method
- Results & Discussion
- Conclusions
- Further Research
- Acknowledgements
1. Why Sales Forecasting?
- Sales forecasting brings advantages for your business:
- reducing logistics costs
- improving your services
- targeted marketing
- lower backorders
But does this really happen in practice?
The Answer is YES! An Example of a Success Story
- Bayesian statistical technology for predicting newspaper sales:
- 1 to 4 more sales with the same deliveries
- 3 to 12 fewer deliveries to achieve the same total amount of sales
But time-series forecasting is not always easy!
- So…
- Searching for better forecasting technology
- Aggregation of different groups of products can be helpful
- Clustering methodology for aggregation
- Bayesian methodology: generative model
2. Method
- Dynamic Bayesian Networks
- Forecasting
- The Inputs
Dynamic Bayesian Networks
- Y is our observation
- e.g. sales of different products: beer y1, beer y2, …
- X-axis: the time t (e.g. weeks)
Dynamic Bayesian Networks
- In our model, we assume that our observation Y is generated by these dynamics (i.e. Y_t = X_t · θ_t + ε)
- X are inputs, for example sales of beer last week, weather information, prices, day labeling
- θ are hidden variables, which are unobserved/unknown
- ε is noise, N(0, σ²)
Dynamic Bayesian Networks
- Hierarchical model (see the sketch below)
- Our hidden variables θ depend on other unobserved/unknown hidden variables M.
- Several θ from different products share the same M.
- A is a transition matrix for θ
- G is a transition matrix for M
- η is noise, N(0, Σ)
- ζ is noise, N(0, Σ_M)
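Below is a minimal generative sketch (Python/NumPy) of this hierarchical model. The symbols θ, M, A, G, Σ, Σ_M follow the slides; the coupling matrix B (how θ depends on M), the dimensions, and all numeric values are illustrative assumptions, not the fitted model.

```python
# Minimal sketch: sampling from the hierarchical linear-Gaussian DBN.
# theta, M, A, G, Sigma, Sigma_M are from the slides; B (coupling of theta
# to the shared level M) and all sizes/values are assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, n_products, n_inputs, n_shared = 104, 10, 4, 2

A = 0.95 * np.eye(n_inputs)                 # transition matrix for theta
G = 0.90 * np.eye(n_shared)                 # transition matrix for M
B = rng.normal(size=(n_inputs, n_shared))   # assumed theta <- M coupling
Sigma = 0.01 * np.eye(n_inputs)             # covariance of eta (theta noise)
Sigma_M = 0.01 * np.eye(n_shared)           # covariance of zeta (M noise)
sigma2 = 0.1                                # observation noise variance

M = np.zeros(n_shared)
theta = [rng.normal(size=n_inputs) for _ in range(n_products)]
X = rng.normal(size=(T, n_products, n_inputs))  # inputs (lagged sales, ...)
Y = np.zeros((T, n_products))

for t in range(T):
    # shared level: M_t = G M_{t-1} + zeta
    M = G @ M + rng.multivariate_normal(np.zeros(n_shared), Sigma_M)
    for p in range(n_products):
        # product level: several theta share the same M
        theta[p] = (A @ theta[p] + B @ M
                    + rng.multivariate_normal(np.zeros(n_inputs), Sigma))
        # observation: Y_t = X_t . theta_t + eps,  eps ~ N(0, sigma^2)
        Y[t, p] = X[t, p] @ theta[p] + rng.normal(0.0, np.sqrt(sigma2))
```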
Dynamic Bayesian Networks
- Inference & Learning
- We have data for Y and X in our model.
- But we don't know the values of the hidden variables θ, M or their initial values.
- We also don't know the correct values of the parameters Σ, Σ_M, A, G or their initial values.
- We solve these problems in the Bayesian paradigm, using the EM algorithm.
Forecasting
- Steps (a sketch of this cycle follows below):
- Training step: find the model parameters and hidden variables θ_1:T, given the data from the observation window X_1:T, Y_1:T, using the EM algorithm and Kalman smoothing.
- Forecasting step: predict θ_T+h and Y_T+h, where h is the horizon of forecasting.
- Updating step: update the hidden variables θ_1:T+h given the real value Y_T+h.
- Repeat the forecasting & updating steps above in iterations.
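A sketch of this train / forecast / update cycle for a single product, assuming the simplified single-level model y_t = X_t · θ_t + ε with a random-walk transition for θ (the shared level M is omitted). It uses the pykalman library as a stand-in for the software behind these slides; all names are illustrative.

```python
# Sketch of the train / forecast / update cycle for one product, assuming
# y_t = X_t . theta_t + eps with random-walk theta (shared level M omitted).
import numpy as np
from pykalman import KalmanFilter

def rolling_forecast(X, Y, T_train):
    """X: (T, n_inputs) inputs; Y: (T,) sales. One-step-ahead forecasts."""
    n = X.shape[1]
    obs_mats = X[:T_train].reshape(T_train, 1, n)  # time-varying obs. matrix
    kf = KalmanFilter(transition_matrices=np.eye(n),  # assumed A = I
                      observation_matrices=obs_mats,
                      n_dim_state=n, n_dim_obs=1)
    # Training step: EM (which runs Kalman smoothing internally) on the
    # observation window to estimate noise covariances and initial state.
    kf = kf.em(Y[:T_train].reshape(-1, 1), n_iter=10,
               em_vars=['transition_covariance', 'observation_covariance',
                        'initial_state_mean', 'initial_state_covariance'])
    means, covs = kf.filter(Y[:T_train].reshape(-1, 1))
    m, P = means[-1], covs[-1]
    forecasts = []
    for t in range(T_train, len(Y)):
        H = X[t].reshape(1, n)
        # Forecasting step: predict Y_t before it is observed (horizon h = 1).
        forecasts.append((H @ kf.transition_matrices @ m).item())
        # Updating step: fold the realised value Y_t into the hidden state.
        m, P = kf.filter_update(m, P, observation=np.array([Y[t]]),
                                observation_matrix=H)
    return np.array(forecasts)
```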
The Inputs
- Based on autocorrelation & FFT spectrum analysis (sketched below), I decided to use the following inputs (X_i,t):
- Seasonality markers
- Recent sales (1 week ago)
- Last month's sales (4 weeks ago)
- We need to keep the number of inputs as small as possible to avoid over-fitting.
- Since I consider seasonality & recent sales, my model is somewhat comparable with the SC model used by Pim Ouwehand.
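A sketch of this analysis and of one possible input construction. The sine/cosine encoding of the seasonality markers and the 52-week period are assumptions; `sales` is a hypothetical one-dimensional weekly series.

```python
# Sketch: autocorrelation & FFT spectrum to justify the inputs, then the
# input matrix itself. The sin/cos seasonality encoding is an assumption.
import numpy as np

def autocorr(x, max_lag=60):
    x = x - x.mean()
    acf = np.correlate(x, x, mode='full')[len(x) - 1:]
    return acf[:max_lag + 1] / acf[0]        # peaks reveal seasonal lags

def spectrum(x):
    x = x - x.mean()
    freqs = np.fft.rfftfreq(len(x), d=1.0)   # cycles per week
    power = np.abs(np.fft.rfft(x)) ** 2      # peaks reveal seasonal periods
    return freqs, power

def build_inputs(sales, period=52):
    t = np.arange(len(sales))
    X = np.column_stack([
        np.sin(2 * np.pi * t / period),      # seasonality markers
        np.cos(2 * np.pi * t / period),
        np.roll(sales, 1),                   # recent sales (1 week ago)
        np.roll(sales, 4),                   # last month's sales (4 weeks ago)
    ])
    return X[4:], sales[4:]                  # drop rows with wrapped lags
```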
3. Results & Discussion
- An Example of Result
- Residual Analysis
- Nonlinear Transformation
- The Offset Problem
- Removing Outliers
- Our Bayesian Approach vs. Conventional Econometric Methods
- Need for More Informative Inputs
An Example of a Result
- Mean Absolute Deviation (MAD) is 2346 beers
- Legend: O is the real value, X is the prediction; x-axis: weeks
- Training steps: weeks 5..204
- Forecasting steps: weeks 205..260, 1-week horizon
- This result is about in the range of the Winters method used by Pim Ouwehand.
Residual Analysis
- To validate our model.
- It showed that the residues (errors) are noise, as we assumed:
- Predicted Y vs. error (figure on top)
- Error vs. time (figure on bottom)
- Autocorrelation and FFT of the error
- Cross-correlation of error vs. inputs (a sketch of these checks follows below)
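A sketch of these checks, assuming hypothetical arrays `y_true`, `y_pred` (from the forecasting steps) and the input matrix `X`:

```python
# Sketch of the residual checks: the errors should behave like the assumed
# white noise, with no remaining structure in time or against the inputs.
import numpy as np

def residual_checks(y_true, y_pred, X, max_lag=26):
    err = y_true - y_pred
    e = err - err.mean()
    # Autocorrelation of the error: should be ~0 for all lags > 0.
    acf = np.correlate(e, e, mode='full')[len(e) - 1:]
    acf = acf[:max_lag + 1] / acf[0]
    # FFT of the error: no seasonal peak should remain.
    power = np.abs(np.fft.rfft(e)) ** 2
    # Cross-correlation of error vs. each input: should also be ~0;
    # otherwise that input carries information the model has missed.
    xcorr = [np.corrcoef(e, X[:, j])[0, 1] for j in range(X.shape[1])]
    return err.mean(), err.std(), acf, power, xcorr
```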
Nonlinear Transformation
- To make the data more linear & Gaussian, since we assume our model is linear and the data is assumed to be Gaussian distributed.
- e.g. log, sigmoid (see the sketch below)
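A minimal sketch of the log option, on toy numbers (log1p/expm1 are used so that zero-sales weeks stay well defined):

```python
# Sketch: model in log space, forecast, then map back to units of beers.
import numpy as np

sales = np.array([120.0, 95.0, 0.0, 340.0, 180.0])  # toy weekly sales
z = np.log1p(sales)          # transform: compresses the large peaks
# ... fit the linear-Gaussian model on z instead of the raw sales ...
z_hat = z.mean()             # stand-in for a model forecast in log space
forecast = np.expm1(z_hat)   # inverse transform back to beers
```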
The Offset Problem
- Due to the stationarity assumption, the software gives over- or underestimated forecasts if a trend exists.
- Solutions:
- Removing the trend (e.g. taking differences; sketched below)
- Updating the parameters after the forecasting step
- Figure: left, moving-averaged Beer-2 vs. weeks; right, predicted Beer-2 vs. weeks
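A minimal sketch of the first solution (differencing), on toy numbers:

```python
# Sketch: forecast the differenced series, then integrate back to the level.
import numpy as np

sales = np.array([100.0, 104.0, 110.0, 113.0, 121.0, 125.0])  # toy trend
diff = np.diff(sales)         # removes a (locally) linear trend
d_hat = diff.mean()           # stand-in for a model forecast of the difference
forecast = sales[-1] + d_hat  # integrate back: forecast of next week's level
```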
Removing Outliers
- The plot shows that the data is very noisy.
- Most of the outliers are below the mean, perhaps due to out-of-stock problems. Thus it would be helpful if we could get an out-of-stock label as an input to our forecasting model. (One way to flag such points is sketched below.)
Figure: sales of 10 beers (normalized) vs. weeks
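Since out-of-stock labels are not available, one possible heuristic is to flag weeks that fall far below a local median and treat them as missing; this MAD-based rule is an assumption, not the procedure used in the slides.

```python
# Sketch: flag suspected out-of-stock weeks (far below the local median).
import numpy as np

def mask_stockout_outliers(sales, window=9, k=3.0):
    pad = window // 2
    padded = np.pad(sales, pad, mode='edge')
    med = np.array([np.median(padded[i:i + window])
                    for i in range(len(sales))])
    resid = sales - med
    mad = np.median(np.abs(resid)) + 1e-9    # robust scale of the residuals
    # Only flag points far *below* the local median: the out-of-stock
    # signature noted above. Flagged weeks can then be treated as missing
    # observations (e.g. via a masked array) when fitting the model.
    return resid < -k * mad
```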
Our Bayesian Approach vs. Conventional Econometric Methods
- Econometric regression methods (e.g. the Winters method used by Pim Ouwehand) work well to fit the data.
- However, we don't want to just fit the data. We want to understand the process behind the data that we observe (i.e. hidden/unobserved variables).
- We want to have a generative model of the beer buyers.
- This generative model helps you to understand the hidden process in the market. This is a valuable insight for business decisions, e.g. via simulation.
We Need More Informative Inputs
4. Conclusions
- This preliminary research (using only the sales data, without other informative inputs) showed that the result is about in the range of the Winters method.
- We need more informative input data for a better model.
- Hacking the data (e.g. removing the trend, nonlinear transformation) slightly improves the result, but this is not the main purpose of this research.
Conclusions (continued)
- We are not just fitting the data but constructing a generative model, which is useful for understanding the business process behind the sales.
- This understanding helps you to shape your strategy to achieve more profit.
5. Further Research
- Clustering and Structural Learning
- Non-stationary processes
- Non-linear models
- Approximations
- Variational
- Factorial
- Monte Carlo (MCMC)
6. Acknowledgements
- Our sponsor STW
- Tom Heskes (KUN)
- Pim Ouwehand (TUE)
- Bart Bakker (Philips, formerly at KUN)
- Data providers / business partners: Schuitema, Technische Unie, OPG.
Appendix: Clustering Insights
- On hidden variables:
- 1, 2: seasonality
- 3: last month
- 4: last week