Title: Extended Bayesian Statistical Inference and Renormalization Group
- Toshiaki Aida, Faculty of Engineering, Okayama University, Japan
The plan of our talk
(1) We show that the notion of the renormalization group arises naturally in the Bayesian framework of statistical inference, and that this accounts for the effectiveness of the framework.
(2) We extend the framework to obtain better prediction performance.
(3) We report the results of a numerical analysis of the extended framework.
1. Scaling notion in the Bayesian framework
- Two examples of non-parametric models
- (1) Density estimation (Bialek, Callan and Strong, 1996)
- (2) Regression
- In both examples, the unknown is a function to be inferred, and the given data enter through their distribution.
- Bayes' formula enables us to predict the function in a probabilistic sense, through its posterior distribution.
- As N changes, the terms proportional to N behave as follows.
- i) One kind of term shifts the expectation value of the function.
- ii) The other kind of term sets a long-distance length scale (or bin size) within which a datum can affect the function.
- (1) Density estimation
- (2) Regression
- The number of effective degrees of freedom is naturally reduced in the Bayesian framework.
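The reduction of effective degrees of freedom can be illustrated with a minimal sketch for the regression case. It assumes a Gaussian prior with a squared-exponential covariance and Gaussian observation noise, which the slides do not specify; the names `rbf_kernel`, `effective_dof`, `length_scale` and `noise_var` are hypothetical. The trace of the linear smoother is one common way to count effective degrees of freedom, and it is not necessarily the measure used in the talk.

```python
import numpy as np

def rbf_kernel(x, length_scale):
    """Squared-exponential prior covariance between all pairs of inputs (an assumed prior)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def effective_dof(x, length_scale, noise_var):
    """Trace of the linear smoother that maps observed outputs to the posterior mean."""
    K = rbf_kernel(x, length_scale)
    # K and (K + noise_var * I) commute, so this equals K (K + noise_var * I)^{-1}.
    S = np.linalg.solve(K + noise_var * np.eye(len(x)), K)
    return float(np.trace(S))

x = np.linspace(0.0, 1.0, 50)             # 50 inputs on the unit interval
dof_long = effective_dof(x, 0.5, 0.1)     # long prior length scale
dof_short = effective_dof(x, 0.05, 0.1)   # short prior length scale
print(dof_long, dof_short, len(x))
```

In this sketch both counts stay below the number of data points, and the longer length scale gives the smaller count, because each datum then influences the function over a wider region.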
- The connection to the renormalization group
- i) This length scale sets a long-distance cutoff scale in momentum space.
- ii) The response of a system to the change of this scale, induced by the change of N, is crucial to inference problems.
- Such a response is described by a type of renormalization group equation, called the exact renormalization group equation.
(1) When the number of examples N is small,
(2) When the number of examples N is large,
2. Prediction in the Bayesian framework and its extension
- Our problem
- After we have observed N data, each an input-output pair generated independently of the others according to a probability determined by a function, we want to predict an output for a new input.
- Predictive distribution in the Bayesian framework
- The predictive distribution is the average, over the posterior, of the probability of the new output given the new input and the function.
- Here, the posterior is built from a prior distribution for the function.
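The posterior average can be made concrete with a minimal sketch. It assumes a finite set of candidate functions in place of the full function space, a uniform prior and Gaussian output noise; none of these choices come from the slides, and the names `posterior_weights`, `predictive_density` and `noise_std` are hypothetical.

```python
import numpy as np

def posterior_weights(candidates, prior, xs, ys, noise_std):
    """Bayes formula over a finite set of candidate functions (toy stand-in for a function space)."""
    log_post = np.log(np.asarray(prior, dtype=float))
    for k, f in enumerate(candidates):
        resid = ys - f(xs)
        # Gaussian log-likelihood up to a constant shared by all candidates.
        log_post[k] += -0.5 * np.sum((resid / noise_std) ** 2)
    log_post -= log_post.max()            # stabilize before exponentiating
    w = np.exp(log_post)
    return w / w.sum()

def predictive_density(y_new, x_new, candidates, weights, noise_std):
    """Average of p(y_new | x_new, f) over the posterior weights."""
    means = np.array([f(x_new) for f in candidates])
    lik = np.exp(-0.5 * ((y_new - means) / noise_std) ** 2) / (noise_std * np.sqrt(2.0 * np.pi))
    return float(np.dot(weights, lik))

candidates = [lambda x: 0.0 * x, lambda x: x, lambda x: x ** 2]
prior = [1.0 / 3.0] * 3
xs = np.linspace(0.0, 1.0, 20)
ys = xs.copy()                            # noise-free data from the second candidate
weights = posterior_weights(candidates, prior, xs, ys, noise_std=0.1)
p_near = predictive_density(0.5, 0.5, candidates, weights, noise_std=0.1)
p_far = predictive_density(2.0, 0.5, candidates, weights, noise_std=0.1)
print(weights, p_near, p_far)
```

With these data the posterior weight concentrates on the candidate that generated them, so the predictive density at the new input is much larger near that candidate's output than far from it.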
- The performance for prediction
- It is measured by the Kullback-Leibler divergence between the predictive distribution and the true one, averaged over all the data and the true function.
- The relation to RG can be seen easily by rewriting this as follows.
- Here, the rewritten expression involves a difference equation for an effective action. The solution of the associated equation is known to be equal to the expectation value of the function.
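The averaged Kullback-Leibler divergence can be estimated numerically in a toy setting. The sketch below assumes a one-parameter Bernoulli model with a uniform prior instead of a non-parametric model, and takes the divergence from the true distribution to the predictive one; both choices are assumptions made for illustration, and the names `kl_divergence` and `average_kl` are hypothetical. Under a uniform prior, the Bayesian predictive probability after k successes in N trials is (k + 1) / (N + 2).

```python
import numpy as np

def kl_divergence(p_true, p_pred):
    """KL(p_true || p_pred) for two discrete distributions on the same support."""
    p_true = np.asarray(p_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    mask = p_true > 0                     # terms with p_true = 0 contribute nothing
    return float(np.sum(p_true[mask] * np.log(p_true[mask] / p_pred[mask])))

def average_kl(n_examples, n_trials, rng):
    """Monte Carlo average over the true parameter (drawn from the prior) and over the data."""
    total = 0.0
    for _ in range(n_trials):
        theta = rng.uniform(0.0, 1.0)               # true parameter
        k = rng.binomial(n_examples, theta)         # observed number of successes
        pred = (k + 1) / (n_examples + 2)           # Bayesian predictive probability
        total += kl_divergence([theta, 1.0 - theta], [pred, 1.0 - pred])
    return total / n_trials

rng = np.random.default_rng(0)
kl_small_n = average_kl(5, 2000, rng)
kl_large_n = average_kl(50, 2000, rng)
print(kl_small_n, kl_large_n)
```

In this toy model the averaged divergence shrinks as the number of examples N grows, which is the quantity whose dependence on N the difference equation tracks.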
- The RG equation in statistical inference
- The difference equation is a kind of exact renormalization group equation, because an increase in the number of examples N leads to an increase of the long-distance cutoff scale, as discussed previously.
- Predictive distribution in the extended Bayesian framework
- Going beyond the Bayesian framework, if we introduce a scaling part into the prior distribution, we are able to obtain better prediction performance.
- This generalization is natural from the point of view of the renormalization group equation.
- The difference equation for the scaling part of a prior distribution
- To achieve better prediction performance, we impose a condition that leads to a difference equation for the scaling part of the prior distribution.
- This procedure is especially effective in non-parametric Bayesian statistical inference, because the corresponding quantity is much bigger than in parametric cases.
- We can obtain a universal form of the difference equation in the asymptotic regime.
- We also report numerical results for a density estimation problem, among others.