Title: Environmental applications of neural networks
1. Environmental applications of neural networks

2. Forecast schema

PREDICTOR
- y(t), y(t-1), ... : autoregressive terms (past values)
- u1(t-τ1), u2(t-τ2), ... : exogenous terms (meteo conditions, other pollutants, ...)
- τ : delay (corrivation time, reaction time, ...)
3. K-step recursive forecast

[Figure: at each step the predictor receives the latest autoregressive terms y(t), y(t-1), ..., y(t-L) and the delayed exogenous terms u1(t-τ), u1(t-τ-1), ...; at the next step the inputs shift by one]

- The forecasted value at a given time step becomes the first autoregressive term for the next forecast (see the sketch below)
- Maximum forecast horizon: Kmax = min(τ1, τ2, ..., τM) + 1
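The recursion can be sketched as follows; the one-step predictor, the number of autoregressive terms, and the variable names are illustrative assumptions, not the slide's exact implementation.

```python
import numpy as np

def recursive_forecast(one_step_model, y_past, u_delayed, K, n_ar=3):
    """K-step recursive forecast: each predicted value is fed back as the
    first autoregressive term of the next step (illustrative sketch)."""
    y = list(y_past)                       # observed past values of the output
    forecasts = []
    for k in range(K):
        ar_terms = y[-n_ar:][::-1]         # y(t), y(t-1), ..., most recent first
        x = np.concatenate([ar_terms, u_delayed[k]])
        y_next = one_step_model(x)         # one-step-ahead prediction
        forecasts.append(y_next)
        y.append(y_next)                   # feedback: prediction becomes an AR term
    return forecasts
```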
4. ARX linear models

[Figure: ARX structure combining an autoregressive (AR) part with exogenous inputs X1 and X2]

- n exogenous inputs, each related to a different variable (2 in the example)
- Parameter estimation via linear least squares (sketched below)
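As a concrete illustration of the least-squares estimation, here is a minimal sketch for a single exogenous input; the lag orders and the delay are assumptions made only for the example.

```python
import numpy as np

def fit_arx(y, u, na=2, nb=2, delay=1):
    """Fit a linear ARX model
        y(t) = a1*y(t-1) + ... + a_na*y(t-na)
             + b1*u(t-delay) + ... + b_nb*u(t-delay-nb+1)
    by ordinary least squares (illustrative sketch, single exogenous input)."""
    start = max(na, delay + nb - 1)
    rows, targets = [], []
    for t in range(start, len(y)):
        ar_terms = [y[t - i] for i in range(1, na + 1)]
        ex_terms = [u[t - delay - j] for j in range(nb)]
        rows.append(ar_terms + ex_terms)
        targets.append(y[t])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta   # [a1, ..., a_na, b1, ..., b_nb]
```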
5. ARX with threshold (piecewise linear)

- Refinement of the simple ARX: different parameter estimates for different system conditions
- Main drawback: abrupt model switch at the thresholds, hardly acceptable from a physical viewpoint
- Possible solution: combine (weighted sum) the outputs of the predictors, as sketched below
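One way to realize the weighted combination is a smooth (e.g., sigmoid) weight around the threshold; the following sketch blends two linear predictors and is only an assumed illustration of the idea, not the slide's exact formulation.

```python
import numpy as np

def smooth_switch(x, threshold, width=1.0):
    """Sigmoid weight that blends two regime models around a threshold
    instead of switching abruptly (illustrative sketch)."""
    return 1.0 / (1.0 + np.exp(-(x - threshold) / width))

def blended_prediction(x_regressor, regime_var, theta_low, theta_high, threshold):
    """Weighted sum of the outputs of two linear ARX predictors, one per regime."""
    y_low = x_regressor @ theta_low
    y_high = x_regressor @ theta_high
    w = smooth_switch(regime_var, threshold)
    return (1 - w) * y_low + w * y_high
```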
6. A different modelling paradigm: Artificial Neural Networks (ANN)

- Combines the outputs of many linear models and introduces non-linearity
7. The biological inspiration (1)

8. The biological inspiration (2)

- The brain is:
  - intrinsically parallel among simple units (the neurons)
  - distributed
  - redundant, and thus fault tolerant
  - adaptive (synapses are reinforced by learning)
- In the brain we have:
  - neurons: about 10^11
  - synapses per neuron: about 10^4
  - 223 km
9. Artificial neurons (1)

10. Artificial neurons (2)

[Figure: the artificial neuron applies a weighting to its inputs, then a non-linearity to the weighted sum]

11. Artificial neurons (3)

12. Artificial neurons (4)

The final structure is thus a weighted sum of the inputs (plus a bias), passed through a non-linear activation function.
13. Artificial Neural Network

[Figure: feed-forward network with one hidden layer]

- The structure may be replicated (more hidden layers, more outputs)
- As the information always moves forward within the network, such an architecture is called feed-forward; a minimal forward pass is sketched below
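The following sketch shows the forward pass of a one-hidden-layer feed-forward network; the dimensions and the tanh activation are assumptions made only for the example.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer feed-forward network
    (illustrative sketch; tanh as the non-linear activation)."""
    hidden = np.tanh(W1 @ x + b1)   # hidden layer: weighting + non-linearity
    return W2 @ hidden + b2         # linear output layer

# Hypothetical dimensions: 4 inputs, 6 hidden neurons, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(6, 4)), np.zeros(6)
W2, b2 = rng.normal(size=(1, 6)), np.zeros(1)
y = forward(rng.normal(size=4), W1, b1, W2, b2)
```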
14. Artificial Neural Network (2)

An important result:
A standard multilayer feedforward network with a
locally bounded piecewise continuous activation
function can approximate any continuous function
to any degree of accuracy if and only if the
network's activation function is not a
polynomial. Without the threshold, the last
theorem does not hold. (see
e.g. Leshno et al., Neural Networks, 1993)
This means that we can in principle replace any mapping (model) from some input variables to some output variables with a neural network of sufficient complexity, with no loss of accuracy.
15. Artificial Neural Network (3)

In practice, we have to fix the network structure and weights in order to accomplish the task.
- We operate in the following way:
  - fix a structure (architecture)
  - determine the weights that optimize the network performance (training)
  - test the generalization ability of the network
- We thus subdivide the available data into 3 different sets:
  - training set (to determine the parameters)
  - testing set (to evaluate the architecture)
  - validation set (to test generalization ability)
16. ANN training

Training the network means determining the values of the weights wij (including the bias).
- EXAMPLE (the perceptron, binary output):
  - Set arbitrary initial values for the weights and compute the output
  - If the output is not correct, the weights are adjusted according to the formula
    w_new = w_old + α·(desired - output)·input, where α is the learning rate
  - Inputs (1, 0, 1) with weights (0.5, 0.2, 0.8) give a weighted sum 1·0.5 + 0·0.2 + 1·0.8 = 1.3
  - Assuming an output threshold of 1.2, the output is 1 (since 1.3 > 1.2)
  - If the output was supposed to be 0, assume α = 1 and update the weights:
    - W1_new = 0.5 + 1·(0-1)·1 = -0.5
    - W2_new = 0.2 + 1·(0-1)·0 = 0.2
    - W3_new = 0.8 + 1·(0-1)·1 = -0.2
  - Iterate the process several times (epochs) till the correct result is obtained (this example is reproduced in code below)
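The worked example above can be reproduced with a few lines of code; the threshold and the learning rate are those of the slide.

```python
import numpy as np

def perceptron_update(w, x, desired, lr=1.0, threshold=1.2):
    """One perceptron learning step, as in the slide's worked example:
    w_new = w_old + lr * (desired - output) * input."""
    output = 1 if w @ x > threshold else 0
    return w + lr * (desired - output) * x, output

w = np.array([0.5, 0.2, 0.8])        # initial weights
x = np.array([1.0, 0.0, 1.0])        # inputs
w_new, out = perceptron_update(w, x, desired=0)
print(out, w_new)                    # 1 [-0.5  0.2 -0.2]
```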
17. ANN training (2)

The most common algorithm for network training is backpropagation.
- The most common way to measure the error is E = Σ (target - output)²
- The partial derivatives of the error with respect to the weights can be computed
- The calculation of these derivatives flows backwards through the network, hence the name backpropagation
- The weights are updated with the (gradient) formula w_new = w_old - α·∂E/∂w_old, where α is the learning rate
- The procedure is repeated through many epochs; a minimal sketch follows
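Below is a minimal sketch of one training epoch by gradient descent for the one-hidden-layer network sketched earlier; the squared error and the tanh activation are assumptions, and real backpropagation implementations are more general.

```python
import numpy as np

def train_epoch(X, targets, W1, b1, W2, b2, lr=0.01):
    """One epoch of gradient-descent training (illustrative sketch).
    Gradients of E = (y - target)^2 are obtained by propagating the
    output error backwards through the network."""
    for x, t in zip(X, targets):
        h = np.tanh(W1 @ x + b1)                     # forward pass
        y = W2 @ h + b2
        e = y - t                                    # output error
        # backward pass: chain rule, layer by layer
        grad_W2 = np.outer(2 * e, h)
        grad_b2 = 2 * e
        delta_h = (W2.T @ (2 * e)) * (1 - h ** 2)    # tanh derivative
        grad_W1 = np.outer(delta_h, x)
        grad_b1 = delta_h
        # gradient step: w_new = w_old - lr * dE/dw
        W2 -= lr * grad_W2; b2 -= lr * grad_b2
        W1 -= lr * grad_W1; b1 -= lr * grad_b1
    return W1, b1, W2, b2
```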
18. ANN training (3)

- Problems with training:
  - α too small → slow convergence
  - α too large → may not converge
  - E depends on many variables → local minima
  - too few epochs → low precision
  - too many epochs → overfitting (low generalization ability)
- Overfitting may also depend on a redundant structure (too many neurons)
19. ANN training improvement

- The number of parameters (weights) can easily reach some hundreds.
- Overfitting problems can be reduced by early stopping:
  - at each epoch, compute the error on both the training and the validation set
  - use the weights that minimize the error on the validation set (see the sketch below)

[Figure: training and validation error vs. epochs; the selected weights correspond to the epoch with the minimum validation error]
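A minimal early-stopping loop might look as follows; `net`, `train_one_epoch`, `error`, and `weights` are hypothetical names used only for illustration.

```python
import copy

def train_with_early_stopping(net, train_data, val_data, max_epochs=500):
    """Early stopping (illustrative sketch): keep the weights of the epoch
    with the lowest validation error."""
    best_err, best_weights = float("inf"), None
    for epoch in range(max_epochs):
        net.train_one_epoch(train_data)        # one pass of weight updates
        val_err = net.error(val_data)          # error on the validation set
        if val_err < best_err:                 # remember the best weights so far
            best_err = val_err
            best_weights = copy.deepcopy(net.weights)
    net.weights = best_weights                 # roll back to the selected epoch
    return net
```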
20. ANN training improvement (2)

Another possible improvement is PRUNING, i.e. automatically reducing the network complexity.
- Pruning provides a method to identify and remove redundant/irrelevant parameters, thus reducing the overfitting problems.
- It also provides a framework for the automatic determination of a (sub)optimal neural network architecture.
21. ANN training improvement (3)

- Pruning is based on the concept of saliency of a parameter, i.e.
  - sj measures how much the training error would increase if parameter j were removed from the network architecture
- The parameter having the lowest saliency is tentatively removed from the network architecture
22. Pruning procedure

1. Train the fully connected (possibly overfitted) neural network
2. Assess the error on the training data set
3. Evaluate the saliency of each parameter
4. Remove the parameter with the lowest saliency
5. Train the new architecture (1 parameter less than the previous one) and go back to point 2

A sketch of this loop is given below.
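The procedure can be sketched as a loop; `train`, `error`, `saliencies`, `remove_parameter`, and `num_params` are hypothetical method names, not the actual software used in the study.

```python
def prune(net, train_data, min_params=10):
    """Saliency-based pruning loop (illustrative sketch): repeatedly retrain,
    estimate each parameter's saliency, and drop the least salient one."""
    history = []
    while net.num_params() > min_params:
        net.train(train_data)                  # step 1/5: (re)train current architecture
        err = net.error(train_data)            # step 2: error on the training set
        history.append((net.num_params(), err))
        s = net.saliencies()                   # step 3: saliency s_j of each parameter
        j = min(s, key=s.get)                  # step 4: lowest-saliency parameter
        net.remove_parameter(j)                # tentatively remove it, then loop back
    return net, history
```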
23. Flood forecast: Tagliamento case study

- Area: 2480 km²
- Average flow Q: 90 m³/s
- Max flow (1966): 4000 m³/s
- 5 rain gauges
- Dataset: 2000 hourly records (floods)
24. Standard network

- Fully connected network (5 rain gauges: if one or more are not available during the flood, the forecast cannot be issued)
- Efficiency range: it is 0 if the prediction is issued as the average of the time series, and it can rise up to 1 for a perfect predictor (see the formula below)
- Forecast efficiency 5 h ahead: 85%

[Figure: network inputs are the rain-gauge terms, each with a certain delay, plus the autoregressive terms]
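The efficiency described above matches the usual Nash-Sutcliffe form; the following is a minimal sketch under that assumption.

```python
import numpy as np

def efficiency(observed, predicted):
    """Forecast efficiency (Nash-Sutcliffe form, assumed here):
    0 when predicting the mean of the series, 1 for a perfect predictor."""
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    sse = np.sum((observed - predicted) ** 2)          # squared errors of the model
    ssm = np.sum((observed - observed.mean()) ** 2)    # squared errors of the mean predictor
    return 1.0 - sse / ssm
```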
25. Pruned architecture

- Many connections removed
- Only 3 rain gauges considered
- 5-hours-ahead efficiency: 84.5%
26. Results

Also adding an input with the total rainfall of the preceding 5 days.
27. Conclusions of the flood case study

- ANNs allow better accuracy than linear ARX models
- Pruning allows detecting a parameter-parsimonious, yet effective, architecture for the neural network
- Pruning allows reducing the use of (redundant) rain gauges without worsening the predictive accuracy (more robust measurement network)
28. PM10 in Milan

- Significant reduction of the yearly average of pollutants such as SO2, NOx, CO, TSP (-90%, -50%, -65%, -60% in the period 1989-2001).
- A major concern is constituted by PM10. Its yearly average has been stable (about 45 µg/m³) since the beginning of the monitoring (1998).
- The limit value on the daily average (50 µg/m³) is exceeded about 100 days every year.
- The application: prediction at 9 a.m. of the PM10 daily average concentration of the current (and the following) day.
29. Air pollutant trends in Milan

- SO2, NOx and CO: decreasing trends (catalytic converters, improved heating oils)
- PM10 and O3: increasing from the early '90s
30. Prediction methodology: FFNN

- The input set contains both pollutants (PM10, NOx, SO2) and meteorological data (pressure, temperature, etc.).
- The input hourly time series are aggregated to daily ones as averages over given hourly time windows (chosen by means of cross-correlation analysis).
- The architecture is selected via trial and error and trained using the LM algorithm and early stopping.
31. PM10 time series analysis

- Available dataset: 1999-2002
- Winter concentrations are about twice the summer ones, both because of unfavorable dispersion conditions and higher emissions
- On average, concentrations are about 25% lower on Sundays than on other days
32. Deseasonalization

- Yearly and weekly PM10 periodicities are clearly detected also in the frequency domain
- The same periodicities are also detected on NOx and SO2
- On each pollutant, we fit a periodic regressor R(ω,t) before training the predictors (a fitting sketch is given after this list).
- PM10_pred(t) = R(ω,t) + y(t), where y(t) is the actual output of the ANN
- R(ω,t) = c + f(ω1,t) + f(ω2,t), where
  - f(ω,t) = Σ_k [a_k·sin(kωt) + b_k·cos(kωt)]
  - ω1 = 2π/365 day⁻¹, ω2 = 2π/7 day⁻¹
- Meteorological data are standardized as usual.
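A minimal sketch of fitting the periodic regressor by least squares follows; the number of harmonics and the synthetic data are assumptions made only for illustration.

```python
import numpy as np

def periodic_regressor(t, y, n_harmonics=2):
    """Fit R(omega, t) = c + sum_k [a_k*sin(k*omega*t) + b_k*cos(k*omega*t)]
    over the yearly and weekly frequencies by least squares
    (illustrative sketch of the deseasonalization step)."""
    omegas = [2 * np.pi / 365.0, 2 * np.pi / 7.0]    # yearly and weekly, day^-1
    cols = [np.ones_like(t, dtype=float)]            # constant term c
    for w in omegas:
        for k in range(1, n_harmonics + 1):
            cols += [np.sin(k * w * t), np.cos(k * w * t)]
    A = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coeffs                                # R(omega, t) on the same days

# The ANN is then trained on the residual, and the final prediction is
# PM10_pred(t) = R(omega, t) + ANN output. Hypothetical data for illustration:
t = np.arange(365 * 3, dtype=float)
pm10 = 45 + 20 * np.cos(2 * np.pi * t / 365) + np.random.default_rng(0).normal(0, 5, t.size)
residual = pm10 - periodic_regressor(t, pm10)
```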
33. Prediction at 9 a.m. for the current day t

- Deseasonalization allows increasing the average goodness-of-fit indicators
- As a term of comparison, a linear ARX predictor results in ρ = 0.89 and MAE = 11 µg/m³
34. Prediction for the following day (t+1)

- To meet such an ambitious target, we added further "improper" meteorological input variables (i.e., unknown at 9 a.m. of day t), such as rainfall, temperature, pressure, etc., measured over both day t and day t+1
- The performances obtained in this way can be considered as an upper bound of what can be achieved by inserting actual meteorological forecasts in the predictor
- Pollutant time series have again been deseasonalized via the periodic regressor
- Besides trial and error, we also tried a different identification approach for neural networks, namely pruning.
35. Pruned ANNs

- The network showing the lowest validation error is finally chosen as optimal
- Pruned ANNs are parsimonious: they contain one order of magnitude fewer parameters than fully-connected ones
36. Results

- Performances of the two models are very close to each other, decreasing strongly with respect to the 1-day case
- As a term of comparison, the network trained without improper meteorological information loses just a few percent over the different indicators, showing an almost irrelevant gap
37. Conclusions on PM10

- Performance on the 1-day prediction appears to be satisfactory; in this case, the system can really be operated as a support to daily decisions (traffic blocks, alarm diffusion, ...).
- Deseasonalization of the data before training the predictors seems to be helpful in improving the performance.
- 2-day forecasts are disappointing, even if improper meteo data are introduced. Performance differences between pruned and fully connected neural networks are negligible.
- More comprehensive meteorological data (vertical profiles, mixing layer) may matter more than training methods in improving the quality of longer-term forecasts.
38. Other network architectures

RECURRENT NETWORKS: some of the outputs are fed back to the input at the following iteration. Used in various fields (see for instance www.idsia.ch/juergen/rnn.html).

PROBLEM: how to train the network? Possible solution: limit the number of possible iterations.
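A minimal sketch of the feedback idea follows, with unrolling over a limited number of iterations as suggested above; the weight shapes and the tanh activation are assumptions for the example.

```python
import numpy as np

def recurrent_forward(x_seq, W_in, W_rec, W_out, n_unroll=None):
    """Minimal recurrent-network sketch: the output is fed back to the
    input at the next iteration. Unrolling for a limited number of steps
    (n_unroll) is one way to keep training tractable."""
    y = np.zeros(W_out.shape[0])
    steps = x_seq if n_unroll is None else x_seq[:n_unroll]
    for x in steps:
        h = np.tanh(W_in @ x + W_rec @ y)   # feedback of the previous output
        y = W_out @ h
    return y
```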
39. AUTOASSOCIATIVE NETWORKS

- They are trained to reproduce the input with very few neurons in one hidden layer.
- They may be used to detect input characteristics (like Principal Components).
- They can highlight non-linear links between input variables.
- They can also be useful, for instance, to diagnose faults in a sensor network (a broken sensor gives values different from those of the network output); a sketch is given below.
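A sketch of the reconstruction-based fault check described above; the encoder/decoder weights are assumed to have already been trained to reproduce the input.

```python
import numpy as np

def autoassociative_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Autoassociative network sketch: compress the input through a small
    hidden layer and reconstruct it; a large reconstruction error on one
    sensor may indicate a fault (illustrative only)."""
    code = np.tanh(W_enc @ x + b_enc)       # very few hidden neurons
    x_hat = W_dec @ code + b_dec            # reconstruction of the input
    return x_hat, np.abs(x - x_hat)         # per-sensor reconstruction error
```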