Sieci neuronowe (Neural Networks) - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Sieci neuronowe (Neural Networks)


1
Neural networks: model-free data analysis?
  • K. M. Graczyk
  • IFT, Uniwersytet Wroclawski
  • Poland

2
Abstract
  • In this seminar I will discuss the application of
    feed-forward neural networks to the analysis of
    experimental data. In particular, I will focus on
    the Bayesian approach, which allows for the
    classification and selection of the best research
    hypothesis. The method has a naturally built-in
    Occam's razor criterion, which prefers models of
    lower complexity. An additional advantage of the
    approach is that no test set is required to
    verify the learning process.
  • In the second part of the seminar I will discuss
    my own implementation of a neural network, which
    includes Bayesian learning methods. Finally, I
    will show my first applications to the analysis
    of scattering data.

3
Why Neural Networks?
  • Look at electromagnetic form factor data
  • Simple
  • Straightforward
  • Then attack more serious problems
  • Inspired by C. Giunti (Torino)
  • Papers of Forte et al. (JHEP 0205:062, 2002;
    JHEP 0503:080, 2005; JHEP 0703:039, 2007;
    Nucl. Phys. B809:1-63, 2009)
  • A kind of model-independent way of fitting data
    and computing the associated uncertainty
  • Cooperation with R. Sulej (IPJ, Warszawa) and P.
    Plonski (Politechnika Warszawska)
  • NetMaker
  • GrANNet, my own C library

4
Road map
  • Artificial Neural Networks (NN) idea
  • FeedForward NN
  • Bayesian statistics
  • Bayesian approach to NN
  • PDFs by NN
  • GrANNet
  • Form Factors by NN

5
Inspired by Nature
6
Applications, general list
  • Function approximation, or regression analysis,
    including time series prediction, fitness
    approximation and modeling.
  • Classification, including pattern and sequence
    recognition, novelty detection and sequential
    decision making.
  • Data processing, including filtering, clustering,
    blind source separation and compression.
  • Robotics, including directing manipulators,
    Computer numerical control.

7
Artificial Neural Network
Output, target
Input layer
Hidden layer
8
threshold
9
A map from one vector space to another
10
Neural Networks
  • The universal approximation theorem for neural
    networks states that every continuous function
    that maps intervals of real numbers to some
    output interval of real numbers can be
    approximated arbitrarily closely by a multi-layer
    perceptron with just one hidden layer. This
    result holds only for restricted classes of
    activation functions, e.g. for the sigmoidal
    functions. (Wikipedia.org)

11
Feed-Forward-Network
activation function
  • Heaviside function θ(x)
  • → 0 or 1 signal
  • Sigmoid function
  • tanh()
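The activation functions listed above can be written down directly; a minimal NumPy sketch (illustrative only, not tied to NetMaker or GrANNet):

```python
import numpy as np

def heaviside(x):
    """Step (Heaviside) function: a 0-or-1 signal."""
    return np.where(x >= 0.0, 1.0, 0.0)

def sigmoid(x):
    """Smooth squashing function with values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# tanh is the symmetric counterpart with values in (-1, 1):
outputs = (heaviside(0.5), sigmoid(0.0), np.tanh(0.0))
```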

12
architecture
  • 3-layer network, two hidden layers
  • 1-2-1-1 architecture
  • 9 parameters (weights plus bias neurons)

Bias neurons, instead of thresholds
[Figure: network mapping input Q2 to output G(Q2); linear output function, symmetric sigmoid hidden activations]
13
Supervised Learning
  • Propose the error function (standard error
    function, chi2, etc.; any continuous function
    which has a global minimum)
  • Consider a set of data
  • Train the given network with the data →
    minimize the error function
  • Back-propagation algorithms
  • An iterative procedure which fixes the weights
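The loop above (propose an error function, then iteratively fix the weights by back-propagation) can be sketched for a tiny 1-5-1 network; the data set, network size, and learning rate are all invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of a smooth target function
# (a stand-in for experimental points; all values invented).
x = np.linspace(-1.0, 1.0, 30).reshape(-1, 1)
t = np.sin(np.pi * x) + 0.05 * rng.standard_normal(x.shape)

# A 1-5-1 network: one hidden layer of tanh units, linear output.
W1 = rng.standard_normal((1, 5)); b1 = np.zeros(5)
W2 = rng.standard_normal((5, 1)); b2 = np.zeros(1)

eta = 0.1  # learning rate (assumed)
n = len(x)
for epoch in range(5000):
    # forward pass
    h = np.tanh(x @ W1 + b1)
    y = h @ W2 + b2
    # error function: sum of squares (the "standard" choice)
    err = y - t
    # backward pass: propagate the error through the chain rule
    gW2 = h.T @ err
    gb2 = err.sum(axis=0)
    dh = (err @ W2.T) * (1.0 - h**2)   # tanh'(a) = 1 - tanh(a)^2
    gW1 = x.T @ dh
    gb1 = dh.sum(axis=0)
    # gradient-descent update: the iterative procedure fixing the weights
    W1 -= eta * gW1 / n; b1 -= eta * gb1 / n
    W2 -= eta * gW2 / n; b2 -= eta * gb2 / n

mse = float((err**2).mean())  # should end up near the noise level
```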

14
Learning
  • Gradient algorithms
  • Gradient descent
  • QuickProp (Fahlman)
  • RPROP (Riedmiller, Braun)
  • Conjugate gradients
  • Levenberg-Marquardt (Hessian)
  • Newton's method (Hessian)
  • Monte Carlo algorithms (based on Markov chain
    methods)

15
Overfitting
  • More complex models describe the data better,
    but lose generality
  • bias-variance trade-off
  • After fitting one needs to compare with a test
    set (which must be twice as large as the
    original)
  • Overfitting → large values of the weights
  • Regularization → an additional penalty term in
    the error function
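The penalty term mentioned above is typically weight decay, E = E_D + α·E_W with E_W = ½Σw²; a minimal sketch with invented numbers:

```python
import numpy as np

def regularized_error(w, e_data, alpha):
    """E(w) = E_D + alpha * E_W with E_W = 0.5 * sum(w^2) (weight decay)."""
    e_w = 0.5 * np.sum(w**2)
    return e_data + alpha * e_w

# The extra alpha * w term in the gradient pushes the weights toward
# zero, counteracting the large weights that signal overfitting.
w = np.array([3.0, -2.0])
e = regularized_error(w, e_data=1.0, alpha=0.1)  # 1.0 + 0.1 * 6.5
```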

16
Fitting data with Artificial Neural Networks
  • "The goal of network training is not to learn an
    exact representation of the training data
    itself, but rather to build a statistical model
    of the process which generates the data"
  • C. Bishop, Neural Networks for Pattern
    Recognition

17
Parton Distribution Function with NN
  • Some method but

18
Parton Distribution Functions S. Forte, L.
Garrido, J. I. Latorre and A. Piccione, JHEP 0205
(2002) 062
  • A kind of model-independent analysis of the data
  • Construction of the probability density P[G(Q2)]
    in the space of the structure functions
  • In practice only one neural network architecture
  • Probability density in the space of parameters
    of one particular NN

But in reality Forte et al. did
19
The idea comes from W. T. Giele and S. Keller
Training Nrep neural networks, one for each set
of Ndat pseudo-data
The Nrep trained neural networks → provide a
representation of the probability measure in the
space of the structure functions
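The Giele-Keller replica procedure can be illustrated in a few lines; here a quadratic polynomial fit stands in for each trained neural network, and every data value is invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy pseudo-data: Ndat points with Gaussian errors (all values invented).
q2 = np.linspace(1.0, 10.0, 20)
f_true = 1.0 / q2
sigma = 0.05 * np.ones_like(q2)
data = f_true + sigma * rng.standard_normal(q2.shape)

# Generate Nrep Monte Carlo replicas of the data and fit each one.
# Forte et al. train one neural network per replica; here a quadratic
# polynomial fit stands in for the network, purely for illustration.
n_rep = 100
fits = [np.polyfit(q2, data + sigma * rng.standard_normal(q2.shape), deg=2)
        for _ in range(n_rep)]

# The ensemble of fits represents the probability measure in the
# space of functions:
preds = np.array([np.polyval(c, q2) for c in fits])
central = preds.mean(axis=0)       # central value
uncertainty = preds.std(axis=0)    # pointwise 1-sigma band
```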
20
uncertainty
correlation
21
10, 100 and 1000 replicas
22
[Figure: fits after short, sufficiently long, and too long training; 30 data points, overfitting]
23
(No Transcript)
24
My criticism
  • Artificial data and the chi2 error function →
    an overestimated error function?
  • They do not discuss other architectures
  • Problems with overfitting?

25
Form Factors with NN, done with FANN library
  • Applying Forte et al.

26
How to apply NN to the ep data
  • First stage: checking whether the NNs are able
    to work at a reasonable level
  • GE, GM and Ratio separately
  • Input Q2 → output form factor
  • The standard error function
  • GE: 200 points
  • GM: 86 points
  • Ratio: 152 points
  • Combination of GE, GM, and Ratio
  • Input Q2 → output GM and GE
  • The standard error function: a sum of three
    functions
  • GE + GM + Ratio: around 260 points
  • One needs to constrain the fits by adding some
    artificial points with GE(0) = GM(0)/μp = 1

27
GMp
28
GMp
29
GMp
Neural Networks
Fit with TPE (our work)
30
GEp
31
GEp
32
GEp
33
Ratio
34
GEn
35
GEn
36
GEn
37
GMn
38
GMn
39
Bayesian Approach
  • common sense reduced to calculations

40
Bayesian Framework for BackProp NN (MacKay,
Bishop)
  • Objective criteria for comparing alternative
    network solutions, in particular with different
    architectures
  • Objective criteria for setting the decay rate α
  • Objective choice of the regularising function Ew
  • Comparison with test data is not required

41
Notation and Conventions
42
Model Classification
  • A collection of models: H1, H2, ..., Hk
  • We believe that the models are classified by
    P(H1), P(H2), ..., P(Hk) (summing to 1)
  • After observing data D → Bayes' rule →
  • Usually at the beginning P(H1) = P(H2) = ... = P(Hk)
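Bayes' rule over the model collection amounts to multiplying priors by evidences and normalizing; a minimal sketch with invented evidence values:

```python
import numpy as np

# A collection of models H1, H2, H3 with equal priors at the beginning.
prior = np.array([1.0, 1.0, 1.0]) / 3.0

# P(D | Hi): the evidence for each model (invented numbers).
evidence = np.array([0.02, 0.10, 0.04])

# Bayes' rule: P(Hi | D) ∝ P(D | Hi) P(Hi), normalized to sum to 1.
posterior = prior * evidence
posterior /= posterior.sum()
# With equal priors the posterior is just the normalized evidence:
# here [0.125, 0.625, 0.25], so model H2 is preferred.
```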

43
Single Model Statistics
  • Assume that model Hi is the correct one
  • The neural network A with weights w is considered
  • Task 1: assuming some prior probability of w,
    construct the posterior after including the data

44
Hierarchy
45
Constructing prior and posterior function
Weight distribution!!!
likelihood
Prior
Posterior probability
w0
46
Computing Posterior
Hessian
Covariance matrix
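In the Gaussian approximation of the posterior around the minimum, the covariance matrix of the weights is the inverse of the Hessian of the error function; a toy example with an assumed 2x2 Hessian:

```python
import numpy as np

# Toy quadratic error surface
# E(w) ≈ E(w_MP) + 0.5 (w - w_MP)^T A (w - w_MP),
# where A is the Hessian at the minimum (values assumed here).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Posterior covariance matrix = inverse Hessian.
covariance = np.linalg.inv(A)

# 1-sigma uncertainties of the individual weights:
sigmas = np.sqrt(np.diag(covariance))
```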
47
How to fix the proper α
  • Two ideas:
  • Evidence approximation (MacKay)
  • Hierarchical
  • Find wMP
  • Find αMP
  • Perform the integrals over α analytically
If sharply peaked!!!
48
Getting αMP
The effective number of well-determined parameters
Iterative procedure during training
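The iterative re-estimation of α sketched above uses γ = Σ λi/(λi + α) as the effective number of well-determined parameters and updates α = γ/(2EW) each cycle; the Hessian eigenvalues and EW below are invented numbers:

```python
import numpy as np

# Assumed inputs: eigenvalues of the Hessian of the data error E_D at
# the minimum, and E_W = 0.5 * sum(w_MP^2); both invented for the sketch.
lambdas = np.array([50.0, 10.0, 0.1, 0.01])
e_w = 2.5

alpha = 1.0  # initial guess for the decay rate
for _ in range(50):  # iterate to self-consistency during training
    # gamma: the effective number of well-determined parameters
    gamma = np.sum(lambdas / (lambdas + alpha))
    # evidence-framework re-estimate of alpha
    alpha = gamma / (2.0 * e_w)
```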
49
Bayesian Model Comparison: Occam Factor
Occam Factor
  • The log of the Occam factor → the amount of
    information we gain after the data have arrived
  • Small Occam factor ↔ complex models
  • larger accessible phase space (wider prior
    range, of which the posterior occupies only a
    small fraction)
  • Large Occam factor ↔ simple models
  • smaller accessible phase space

Best fit likelihood
50
Evidence
51
What about cross sections?
  • GE and GM simultaneously
  • Input Q2 and ε → cross sections
  • Standard error function
  • the chi2-like function, with the covariance
    matrix obtained from the Rosenbluth separation
  • Possibilities:
  • The set of neural networks becomes a natural
    distribution of the differential cross sections
  • One can produce artificial data in a wide range
    of ε and perform the Rosenbluth separation,
    searching for nonlinearities of σR in the ε
    dependence

52
What about TPE?
  • Q2, ε → GE, GM and TPE?
  • In the perfect case a change of ε should not
    affect GE and GM.
  • Training the NN on series of artificial cross
    section data with fixed ε?
  • Collecting the data in ε bins and Q2 bins, then
    showing the network the set of data with a
    particular ε over a wide range of Q2.

53
Constraining the error function
Every cycle is computed with a different ε!
54
One network!
GM
GE
TPE
Yellow lines have vanishing weights; they do not
transfer any signal
55
(No Transcript)
56
GEn
57
GEn
58
(No Transcript)
59
(No Transcript)
60
(No Transcript)
61
GMn
  • results

62
(No Transcript)
63
(No Transcript)
64
(No Transcript)
65
GEp

66
(No Transcript)
67
GMp
68
(No Transcript)
69
(No Transcript)
70
(No Transcript)