1
A New Learning Paradigm: LtDAHP (Learning through Deterministic Assignment of Hidden Parameters)
Zongben Xu (Xi'an Jiaotong University)
Email: zbxu@mail.xjtu.edu.cn
Homepage: http://zbxu.gr.xjtu.edu.cn
2
Contents
  • Some Related Concepts
  • LtRAHP: Learning through Random Assignment of Hidden Parameters
  • LtDAHP: Learning through Deterministic Assignment of Hidden Parameters
  • Concluding Remarks

3
A New Learning Paradigm: LtDAHP (Learning through Deterministic Assignment of Hidden Parameters)
Zongben Xu (Xi'an Jiaotong University, Xi'an, China)
Email: zbxu@mail.xjtu.edu.cn
Homepage: http://zbxu.gr.xjtu.edu.cn
4
  • Is a supervised learning problem difficult or easy?
  • Can a difficult learning problem be solved more simply?
  • Is a linear machine universal?

5
Outline
  • Some Related Concepts
  • LtRAHP: Learning through Random Assignment of Hidden Parameters
  • LtDAHP: Learning through Deterministic Assignment of Hidden Parameters
  • Concluding Remarks

6
Outline
  • Some Related Concepts
  • LtRAHP: Learning through Random Assignment of Hidden Parameters
  • LtDAHP: Learning through Deterministic Assignment of Hidden Parameters
  • Concluding Remarks

7
Some Related Concepts: Supervised Learning
Supervised Learning: given a finite number of input/output samples from an unknown (black-box) relation between the input and output spaces, find a function f in a machine (hypothesis space) H that approximates this relation, e.g., by empirical risk minimization (ERM).
(Example applications: social networks, face recognition, stock index tracking.)
8
Some Related Concepts: HP vs BP
  • Hidden Parameters (HPs): determine the hidden predictors (the non-linear mechanism).
  • Bright Parameters (BPs): determine how the hidden predictors are linearly combined (the linear mechanism).

For the FNN machine f(x) = Σ a_i σ(w_i · x + b_i), for example, the HPs are the inner weights and thresholds (w_i, b_i), and the BPs are the outer weights a_i.
9
Some Related Concepts: OSL vs TSL
One-Stage Learning (OSL): HPs and BPs are trained simultaneously, in one stage.
Two-Stage Learning (TSL): HPs and BPs are trained separately, in two stages: first the HPs are assigned, then the BPs are solved.
10
Some Related Concepts: Main Concerns
Q1: How to specify the assignment function?
  • T_assign(a) (ADM)
  • T_assign(µ): random assignment (LtRAHP)
  • T_assign(n): deterministic assignment (LtDAHP)

Q2: Can TSL work?
  • Universal approximation?
  • Does it degrade the generalization ability?
  • Consistency/convergence?
  • Effectiveness and efficiency?

11
Outline
  • Some Related Concepts
  • LtRAHP: Learning through Random Assignment of Hidden Parameters
  • LtDAHP: Learning through Deterministic Assignment of Hidden Parameters
  • Concluding Remarks

12
LtRAHP: An Overview
Typical LtRAHP models:
  • Random vector functional-link networks (RVFLs): Y. H. Pao. Adaptive Pattern Recognition and Neural Networks. Reading, MA: Addison-Wesley, 1989.
  • Echo-state neural networks (ESNs): H. Jaeger and H. Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304: 78-80, 2004.
  • Extreme learning machines (ELMs): G. B. Huang, Q. Y. Zhu and C. K. Siew. Extreme learning machine: Theory and applications. Neurocomputing, 70: 489-501, 2006.

LtRAHP training: in Stage 1 the HPs are assigned at random; in Stage 2 the BPs are solved.
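To make the two-stage mechanics concrete, here is a minimal sketch of the LtRAHP/ELM procedure in Python with numpy, assuming a single-hidden-layer sigmoid network; the function names and the toy sine-regression example are illustrative, not taken from the cited works.

```python
import numpy as np

def elm_fit(X, y, n_hidden=100, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Stage 1: random assignment of the hidden parameters (input weights, biases).
    W = rng.standard_normal((d, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden-layer output matrix
    # Stage 2: bright parameters by linear least squares (no iterative training).
    a, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, a

def elm_predict(X, W, b, a):
    return (1.0 / (1.0 + np.exp(-(X @ W + b)))) @ a

# Toy usage: regress y = sin(x) on [0, 2*pi].
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, a = elm_fit(X, y, n_hidden=50)
rmse = np.sqrt(np.mean((elm_predict(X, W, b, a) - y) ** 2))
print(f"training RMSE: {rmse:.4f}")
```

Stage 2 is a single linear least-squares solve, which is why the ELM training times reported on the next slide sit far below those of BP and SVM.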
13
LtRAHP: Experimental Evidence
Experimental support (Huang et al., 2006) on UCI data:

Training time (s):
Data sets   BP      SVM     ELM
Triazines   0.5484  0.0086  <10^-4
Housing     6.532   74.184  1.1177
Abalone     1.7562  1.6123  0.0125
Ailerons    2.7525  0.6726  0.0591
Census      8.0647  11.251  1.0795

TestRMSE:
Data sets   BP      SVM     ELM
Triazines   0.2197  0.1289  0.2002
Housing     0.1285  0.1180  0.1267
Abalone     0.0874  0.0784  0.0824
Ailerons    0.0481  0.0429  0.0431
Census      0.0685  0.0746  0.0660

Application support:
  • Object recognition: Xu et al., 2012
  • Handwritten character recognition: Chacko et al., 2012
  • Face recognition: Marques et al., 2012
14
LtRAHP: Really Feasible?
A precise theoretical assessment (Liu et al., 2014):

                               FNN Learning     ELM Learning
Approximation: density         universal        universal
Approximation: complexity      (rate omitted)   (rate omitted)
Generalization: consistency    universal        universal
Generalization: learning rate  (rate omitted)   (rate omitted)
Computational complexity       very high        (omitted)

Xia Liu, Shao-Bo Lin and Zongben Xu. Is extreme learning machine feasible? A theoretical assessment (Part I, Part II). IEEE TNNLS, 2014.
15
LtRAHP: The Uncertainty Problem
  • Difference in theoretical assertions: the OSL learning-rate bound is deterministic, whereas the LtRAHP bound applies only when the HPs are randomly assigned according to a given distribution, so it holds merely with high probability; this is the uncertainty problem. (Bounds omitted.)
16
LtRAHP: The Uncertainty Problem
  • Experimental implication: (Figure: test error plotted against the number of samples (m) and the number of hidden nodes (N); panels labeled "Uncertainty" and "Non-uncertainty".)
17
LtRAHP: The Uncertainty Problem
Is there another TSL scheme that has the same complexity as LtRAHP but in which the uncertainty problem does not occur?
18
Outline
  • Some Related Concepts
  • LtRAHP: Learning through Random Assignment of Hidden Parameters
  • LtDAHP: Learning through Deterministic Assignment of Hidden Parameters
  • Concluding Remarks

19
LtDAHP: Main Idea
Replace the uniformly random assignment of HPs with a deterministic one: assign the HPs as Equally Spaced Points (ESPs), i.e., point sets with bounded mesh ratio (Wendland, Scattered Data Approximation, 2005). The mesh ratio is the ratio of the covering radius (the radius of the largest ball containing no hidden parameters) to the separation radius (the radius of the smallest ball containing at least two hidden parameters).

Key question: can ESPs be practically constructed for any subset of an arbitrarily high-dimensional space?
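The mesh-ratio definition above translates directly into code. The following numpy sketch estimates the mesh ratio of a point set on the unit sphere, approximating the covering radius with random probes; all names and the probe-based estimate are illustrative assumptions.

```python
import numpy as np

def mesh_ratio(points, n_probe=20000, seed=0):
    """Estimate the mesh ratio h/q of a point set on the unit sphere."""
    rng = np.random.default_rng(seed)
    # Separation radius q: half the minimal pairwise distance
    # (radius of the smallest ball containing at least two points).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    q = dist.min() / 2.0
    # Covering radius h: estimated with random probes on the sphere
    # (radius of the largest ball containing no points).
    probes = rng.standard_normal((n_probe, points.shape[1]))
    probes /= np.linalg.norm(probes, axis=1, keepdims=True)
    d2 = np.linalg.norm(probes[:, None, :] - points[None, :, :], axis=-1)
    h = d2.min(axis=1).max()
    return h / q

# Equally spaced points on S^1 keep the mesh ratio small; random points do not.
N = 64
theta = 2 * np.pi * np.arange(N) / N
esp = np.stack([np.cos(theta), np.sin(theta)], axis=1)
rnd = np.random.default_rng(1).standard_normal((N, 2))
rnd /= np.linalg.norm(rnd, axis=1, keepdims=True)
print(mesh_ratio(esp), mesh_ratio(rnd))   # ESP ratio is much smaller
```

Running it shows the equally spaced configuration keeps the mesh ratio near 1 while a random configuration's ratio is much larger, which is precisely the uniformity that ESPs are designed to guarantee.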
20
LtDAHP: Mathematical Foundations (I)
Homeomorphism: a continuous function between two topological spaces that has a continuous inverse.
ESP decomposition: via such homeomorphisms, constructing ESPs on a general domain reduces to distributing points on spheres and balls (Y. Xu. Orthogonal polynomials and cubature formulae on spheres and on balls. SIAM J. Math. Anal., 1998), which leads to the hard sphere problem discussed next.
21
LtDAHP: Mathematical Foundations (II)
  • Hard sphere problem: given an integer N, find a configuration of N points that maximizes the smallest distance among the points. (W. Habicht and B. L. van der Waerden, Math. Ann., 1951)
  • Minimal Riesz t-energy configuration problem. (B. Dahlberg, Duke Math. J., 1978)

Smale's 7th problem: how to solve the N-point minimal Riesz t-energy problem over S^(d-1) in polynomial time, for arbitrary N and t. (S. Smale. Mathematical problems for the next century. Math. Intel., 1998.)
22
LtDAHP: Mathematical Foundations (II)
The minimal Riesz t-energy (t > d-1) configuration problem can be approximately solved by:
  • Equal-area partition (EAP): D. Hardin and E. Saff. Discretizing manifolds via minimum energy points. Notices of Amer. Math. Soc., 2004.
  • Recursive zonal sphere partition (RZSP): P. Leopardi. Distributing points on the sphere: partitions, separation, quadrature and energy. Doctoral dissertation, University of New South Wales, 2007. Toolbox: http://www.mathworks.com/matlabcentral/fileexchange/13356-eqsp-recursive-zonal-sphere-partitioning-toolbox

(Computational complexity: formula omitted.)
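For illustration only, the Riesz-energy objective can also be attacked directly by normalized projected gradient descent on S^2. This crude numpy sketch is far slower than the EAP/RZSP constructions cited above and makes no optimality claim; the step-size choice and all names are assumptions.

```python
import numpy as np

def riesz_points(N, t=2.0, steps=3000, lr=0.01, seed=0):
    """Approximate minimal Riesz t-energy points on S^2 by normalized
    projected gradient descent on E = sum_{i != j} |x_i - x_j|^(-t)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((N, 3))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    for _ in range(steps):
        diff = x[:, None, :] - x[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, np.inf)            # inf**(-t-2) -> 0: no self term
        coef = -t * dist ** (-t - 2.0)
        grad = 2.0 * (coef[:, :, None] * diff).sum(axis=1)
        grad /= np.linalg.norm(grad, axis=1, keepdims=True) + 1e-12
        x -= lr * grad                            # normalized step, for stability
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # project back onto S^2
    return x

pts = riesz_points(30)
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)
print("smallest pairwise distance:", d.min())   # points end up well separated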
23
LtDAHP: FNN Instance
(Diagrams: architecture of a conventional FNN vs. the architecture of an LtDAHP-based FNN.)
24
LtDAHP: Learning Procedure (FNN Instance)
LtDAHP algorithm:
  • Stage 1: deterministically assign the HPs, as minimal Riesz (d-1)-energy points on S^(d-1) (via RZSP) or, for d = 2, as best-packing points on S^1.
  • Stage 2: solve the BPs (formula omitted).
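Below is a minimal sketch of one plausible instantiation of this procedure for scalar inputs, assuming the hidden parameters (w_i, b_i) are taken as equally spaced (best-packing) points on S^1 and the bright parameters are solved by linear least squares; it illustrates the idea rather than reproducing the authors' exact construction.

```python
import numpy as np

def ltdahp_fit(x, y, n_hidden=50):
    # Stage 1: deterministic assignment, equally spaced points on S^1.
    theta = 2 * np.pi * np.arange(n_hidden) / n_hidden
    W, b = np.cos(theta), np.sin(theta)          # hidden parameters (w_i, b_i)
    H = np.tanh(np.outer(x, W) + b)              # hidden-layer outputs
    # Stage 2: bright parameters by linear least squares.
    a, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, a

x = np.linspace(-1.0, 1.0, 300)
y = np.sin(4.0 * x)
W, b, a = ltdahp_fit(x, y)
pred = np.tanh(np.outer(x, W) + b) @ a
print("training RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```

Because Stage 1 is deterministic, repeated runs give identical models, removing the trial-to-trial variance that creates the uncertainty problem in LtRAHP.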
25
LtDAHP: Theoretical Assessment (FNN Instance)
Generalization capability: LtDAHP attains a generalization bound comparable to that of OSL (bounds omitted), so preassigning the HPs deterministically does not sacrifice generalization.
26
LtDAHP: Theoretical Assessment (FNN Instance)
Generalization capability: the LtDAHP bound holds deterministically, whereas the ELM bound holds only when the HPs are randomly fixed according to a given distribution, so multiple trials are required for ELM (bounds omitted).
27
(Figure: curves plotted against the number of samples (m) and the number of hidden nodes (N).)
28
LtDAHP: Toy Simulations (FNN Instance)
(Figure: test error and training time of ELM (LtRAHP) vs. LtDAHP, plotted against the number of samples (m) and the number of hidden nodes (N).)
29
LtDAHP: Simulations on UCI Data Sets

Data sets        Training samples  Testing samples  Attributes
Auto_Price       106               53               15
Stock            633               317              9
Bank(Bank8FM)    2999              1500             8
Delta_ailerons   3565              3564             5
Delta_Elevators  4759              4758             6

                 TestRMSE                  TrainMT                 MSparsity
Data sets        SVM     ELM     LtDAHP    SVM    ELM    LtDAHP    SVM    ELM    LtDAHP
Auto_Price       0.0427  0.0324  0.0357    160    3.22   3.22      116.2  240.1  72.2
Stock            0.0478  0.0347  0.0306    5.64   0.325  0.325     26.7   108.1  148.3
Bank8FM          0.0454  0.0446  0.0421    82.1   1.42   1.42      112.9  88.4   60.5
Delta_ailerons   0.0422  0.0387  0.0399    60.1   2.32   2.32      169.3  56.2   48.1
Delta_Elevators  0.0534  0.0535  0.0537    684    3.10   3.10      597.6  52.6   52.1



30
LtDAHP: Real-World Data Experiments (Million Song Dataset)

Methods  TestRMSE  TrainMT  MSparsity
ELM      10.89     1989     601
LtDAHP   9.21      1989     512

The Million Song Dataset (Bertin et al., 2011) poses the task of predicting the year in which a song was released from audio features associated with the song. The dataset consists of 463,715 training examples and 51,630 testing examples with d = 90. Each example is a song released between 1922 and 2011, represented as a vector of timbre information computed from the song.

31
LtDAHP: Real-World Data Experiments (Buzz in Social Media)

Methods  TestRMSE  TrainMT  MSparsity
ELM      0.0037    1523     534
LtDAHP   0.0017    1523     186

The Buzz prediction dataset is collected from Twitter, a well-known social network and micro-blogging platform with exponential growth and extremely fast dynamics. The task is to predict the mean number of active discussions (NAD) from d = 77 primary features, including the number of created discussions, the average number of author interactions, and the average discussion length. The dataset contains m = 583,250 samples, making it a genuinely large-scale problem.

32
Concluding Remarks
  • LtDAHP provides a very efficient way of overcoming both the high computational burden of OSL and the uncertainty difficulty of LtRAHP.
  • LtDAHP establishes a new paradigm in which supervised learning problems can be solved very simply yet still effectively: the hidden parameters are preassigned and only the bright parameters are solved for, without sacrificing generalization capability.
  • Many problems on LtDAHP remain open and deserve further study.

33
Thank You!