Datamining Methods for Demand Forecasting at National Grid Transco

1
Datamining Methods for Demand Forecasting at
National Grid Transco
  • David Esp
  • A presentation to the Royal Statistical Society
    local meeting of 24 February 2005 at the
    University of Reading, UK.

2
Contents
  • Introduction
  • National Grid Transco
  • The Company
  • Gas Demand Forecasting
  • Datamining
  • Especially Adaptive Logic Networks
  • Datamining for Gas Demand Forecasting
  • Framing the Problem
  • Data Cleaning
  • Model Inputs
  • Model Production
  • Scope for Improvement
  • Conclusions

3
Introduction to National Grid Transco
4
National Grid Transco (NGT)
  • Part of the NGT Group (www.ngtgroup.com)
  • NGT Group has interests around the globe,
    particularly the US
  • NGT-UK consists of
  • National Grid (NG): electricity transmission
    (not generation or distribution)
  • Transco (T): gas transmission

5
Introduction to Gas Demand and its Forecasting
at National Grid Transco
6
Breakdown of Demand
  • National Transmission System (NTS)
  • Large industrials
  • Gas-fired power stations
  • 13 Local Distribution Zones (LDZs)
  • Mostly domestic
  • The presentation will focus on models for this
    level only.

7
Forecasting Horizons
  • Within day - at five different times
  • Day Ahead
  • Up to one week ahead

8
Gas Demand Daily Profiles
9
What Factors Drive Gas Demand?
  • Weather
  • Thermostats
  • Heat leakage from buildings
  • Heat distribution in buildings (hot air rises)
  • Gas-powered plant efficiencies
  • Consumer Behaviour
  • Season (e.g. stay indoors when dark)
  • Holidays
  • Weather-Influenced Consumer Behaviour
  • Perception of weather (actual or forecast)
  • Adjustment of thermostats

10
Weather
  • Temperature (1 ºC → 5 to 6% change in demand)
  • Wind (above 10 knots: 1 knot → 0.5%)
  • Cooling Power - wind-chill (a function of wind
    and temperature)
  • (Straight, delayed and moving-average
    derivations of all the above.)

11
Demand-Temperature Relationships
12
Temperature Effects
13
Seasonal Temperature Sensitivity of Gas Demand
14
Consumer Behaviour
  • Seasonal Transitions (Autumn and Spring)
  • Bank Holidays (typically -5 to -20% variation)
  • Adjust thermostats and timers in (delayed)
    response to weather.
  • e.g. protracted or extreme cold spells
  • Weather Forecast Effects
  • Special Events

15
Introduction to Datamining: What & Why
16
Datamining
  • A generally accepted definition
  • The non-trivial extraction of implicit,
    previously unknown and potentially useful
    information from data (Frawley,
    Piatetsky-Shapiro & Matheus)
  • In practice
  • The use of novel computational tools (algorithms
    + machine power).
  • Information may include models, such as neural
    networks.
  • A higher-level concept, of which Datamining forms
    a (key) part
  • Knowledge Discovery from Databases (KDD)
  • Relationship: Knowledge > Information > Data

17
Datamining Techniques
  • What are they?
  • Relatively novel computer-based data analysis
    and modelling algorithms.
  • Examples: neural nets, genetic algorithms, rule
    induction, clustering.
  • In existence since the 1960s, popular since 1995.
  • What advantages do they have over traditional
    methods?
  • More automatic
  • Less reliance on forecasting expertise.
  • Fewer man-hours (more computer-hours)
  • Potentially more accurate
  • New kinds of model, more accurate than existing
    ones
  • Greater accuracy overall, when used in
    combination with existing models
  • Knowledge discovery might lead to improvements in
    existing models.

18
Core Methods & Tools
  • Data Cleaning
  • Self-Organizing Map
  • Used to highlight atypical demand profiles and
    cluster typical ones
  • Adaptable (Nonlinear & Nonparametric) Modelling
  • Adaptive Logic Network (ALN)
  • Automatically produces models from data.
  • Better than a Neural Network
  • Input Selection
  • Genetic Algorithm (GA)
  • Selects best combination of input variables for
    model
  • Also optimizes an ALN training parameter -
    learning rate

19
Experience
  • 1995-1999: Financial, electrical and chemical
    problems.
  • 1999: Diagnosis of Oil-Filled Equipment (e.g.
    supergrid transformers) by Kohonen SOM.
  • 2000: Electricity Demand Forecasting
  • Encouraging results
  • Business need disappeared
  • 2001-2: EUNITE Datamining competitions
  • 2003: Gas Demand Forecasting Experiments
  • 2004: Gas Demand Forecasting models in service
  • 2005: More gas models, also focusing on wind
    power.

20
Introduction to Datamining: Nonlinear
Nonparametric Models
  • The core datamining method applied to gas demand
    forecasting.

21
Some Types of Problem
  • Linear - e.g. y = mx + c
  • Non-Linear and Smooth
  • Monotonic - e.g. y = x³
  • Non-Monotonic - e.g. y = x²
  • Discontinuous - e.g. y = max(0, x)
  • We might not know the type of function in
    advance.

22
Parametric Modelling
Linear (1st Order Polynomial) Fit
3rd Order Polynomial Fit
23
Non-Parametric Modelling
One Linear Segment
Two Linear Segments
  • Linear Segmentation is not the only
    non-parameterised technique.
  • The key feature is growth - hence no constraint
    on degrees of freedom.

24
Non-Parametric Modelling
Three Linear Segments
Four Linear Segments
  • No need for prior knowledge of the nature of the
    underlying function.
  • The underlying function does not have to be
    smooth, monotonic etc.
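As a loose illustration of this growth idea (not the tool used in the work), here is a minimal piecewise-linear fit in Python; the equal-width splitting rule and all names are assumptions of the sketch.

import numpy as np

def fit_piecewise_linear(x, y, n_segments):
    # Fit one least-squares line per equal-width slice of x.
    # More segments = more degrees of freedom: the model "grows".
    edges = np.linspace(x.min(), x.max(), n_segments + 1)
    segments = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        slope, intercept = np.polyfit(x[mask], y[mask], 1)
        segments.append((lo, hi, slope, intercept))
    return segments

# A kinked target such as y = max(0, x - 5) needs at least 2 segments.
x = np.linspace(0.0, 10.0, 200)
y = np.maximum(0.0, x - 5.0)
for n in (1, 2, 3, 4):
    print(n, fit_piecewise_linear(x, y, n))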

25
Parametric Modelling Method
  • A formula is known or at least assumed
  • Typically a polynomial (e.g. linear).
  • May be any kind of formula.
  • Can be discontinuous.
  • Model complexity is constrained
  • Tends to make the training process robust and
    data-thrifty.
  • A model of complexity exactly as required by the
    problem should be slightly more accurate than a
    non-parametric model, which can only approximate
    this degree of complexity.
  • Specialist regression tools can be applied for
    different classes of function
  • linear (or linearizable), smooth, discontinuous...

26
Parametric Modelling Method: e.g. Multiple Linear
Regression
  • Advantages
  • Extremely fast both to train and use
  • If well-tailored to the problem, should give
    optimal results.
  • Disadvantages
  • Requires uncorrelated inputs
  • Assumptions about data distributions

27
Non Parametric Modelling Benefits
  • Advance knowledge of the problem is not required
  • Domain-specific knowledge, though helpful, is not
    vital.
  • No assumptions about population density or
    independence of inputs.
  • Model complexity is unconstrained
  • Advantage: Model may capture unimagined
    subtleties.
  • Disadvantages
  • Training demands greater time, data volume and
    quality.
  • Model may grow to become over-complex, e.g.
    fitting every data point.
  • Additional possibilities
  • Feasibility Study
  • Determine if any model is possible at all.
  • Knowledge Discovery
  • Analyze the model to determine an equivalent
    parametric model.

28
Non-Parametric Modelling Issues
  • Might not be completely flexible: the learning
    algorithm may have limitations.
  • We may need to partition the problem manually.
  • The model might not generalize to the extent
    theoretically possible.
  • Much greater need for training data.
  • Can over-fit (resulting in errors); extra
    measures are needed to prevent this.
  • Longer training time (may not be an issue).

29
Introduction to Datamining: Nonlinear
Nonparametric Models: Under-, Optimal and
Over-Fitting
  • This section applies to many nonlinear
    nonparametric modelling methods, not just neural
    networks.

30
Example Underlying (2-D) Function
A privileged view - we would not normally know
what the function looked like...
z = 1000 sin(0.125 x) cos(4π / (0.2 y + 1))
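A minimal Python sketch to reproduce this demo surface, assuming the garbled symbol in the transcript is π and picking an arbitrary grid range (the slides do not state one):

import numpy as np

x, y = np.meshgrid(np.linspace(0, 50, 200), np.linspace(0, 50, 200))
z = 1000 * np.sin(0.125 * x) * np.cos(4 * np.pi / (0.2 * y + 1))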
31
Undertrained Model: ALN model with 24 segments,
i.e. planes. Too angular (from privileged
knowledge).
32
Optimally Trained Model: ALN model with 300
planes. Looks very similar to our defined
function.
33
Overtrained Model: An ALN with 1500 planes joins
the dots of the data instead of generalising.
34
Determining Optimality of Fit
  • The function is not known in advance
  • Might be smooth, might be wrinkly - we don't
    know.
  • What are our requirements on the model?
  • What degree of accuracy is needed?
  • Any constraints on shape or rates-of-change?
  • How do we assess the model's quality?
  • Test against a held-back set of data
  • Analyze the models characteristics
  • Assumes we know what to require or expect.
  • e.g. Sensitivity to inputs (at various parts of
    the data space)
  • e.g. Cross-sections (of each variable, for
    different set-points of the other variables)

35
Traditional Cross-Validation: Validate on data
that is randomly or systematically selected from
the same period as the training data.
Train on the training data (grey) until error is
least on the cross-validation data (blue). Actual
use will be in the future (green), on data which
is not yet available.
36
Back-Validation: Validate on data that, relative
to the training data, is as old as the future is
new.
Train on the training data (grey) until error is
least on the back-validation data (blue). Reason:
like the future data, the back-validation data is
at an edge.
Back-val. data
Training (regression) data
Future data (unavailable)
This method has been proven by experiment to be
superior to traditional cross-validation for both
gas and electricity problems.
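A sketch in Python of the chronological split that back-validation implies; the frame layout, column name and one-year validation slice are assumptions (the ALN Training slide later notes that the oldest year of data was used for validation).

import pandas as pd

def back_validation_split(df, date_col="date", val_years=1):
    # Oldest slice validates; everything after it trains.
    # Like the unseen future, the validation set sits at an edge.
    df = df.sort_values(date_col)
    cutoff = df[date_col].min() + pd.DateOffset(years=val_years)
    return df[df[date_col] >= cutoff], df[df[date_col] < cutoff]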
37
Optimal and Over Training
This is deliberate over-training. The optimum
point is where the (purple) Back-Validation
(Backval) error curve is at a minimum, namely
Epoch 30. This agrees well with that of the
Holdback (pseudo future) data.
38
Introduction to Datamining: Nonlinear
Nonparametric Models: Example Algorithms
39
Machine Learning / Natural Computing / Basis
Function Techniques
  • Derive models more from data (examples) than from
    knowledge.
  • Roots in nature and philosophy (e.g. artificial
    intelligence and robotics), but converging with
    traditional maths and stats.
  • Many types of algorithm.
  • Evolutionary / Genetic Algorithms
  • Neural Network (e.g. MLP-BP or RBF) - popular
  • Support Vector Machine - fashionable
  • Adaptive Logic Network - experience
  • Regression Tree
  • Rule Induction
  • Instance (Case) and Cluster Based

40
Introduction to Datamining: Nonlinear
Nonparametric Models: Example Algorithms
Neural Networks (ANNs), focussing on the
Multi-Layer Perceptron (MLP)
41
Neural Networks - Brief Overview (1)
  • But how many neurons or layers? Repeatedly
    experiment (grow, prune)

42
Neural Networks - Brief Overview (2)
  • Inspired by nature (and used to test it).
  • Output is sum of many (basis-) functions,
    typically S-shaped.
  • Each function is offset and scaled by a different
    amount.
  • Very broadly analogous to Fourier etc.
  • Given data, produce its underlying model.

43
Neural Networks - Brief Overview (3)
44
Introduction to Datamining: Nonlinear
Nonparametric Models: Example Algorithms
Adaptive Logic Networks (ALNs)

45
Main Advantages over ANNs
  • Theoretical
  • No need to define anything like a number of
    neurons or layers
  • ALNs automatically grow to the required extent.
  • No need for outer loop of experimentation (e.g.
    pruning)
  • Basis functions are more independent, hence
  • easier and faster learning
  • greater accuracy
  • faster execution.
  • Less black-box - can be understood.
  • Function inversion - can run backwards.

46
Main Advantages over ANNs
  • Observed
  • Better accuracy sharper detail.
  • Better training faster, more reliable and more
    controllable.

47
Adaptive Logic Networks: How they Work: ALN
Structure
48
What is an ALN?
  • A proprietary technique developed by William
    Armstrong, formerly of the University of Alberta,
    founder of Dendronic Decisions Limited in Canada.
  • WWW.DENDRONIC.COM
  • A combined set of Linear Forms (LFs)
  • An LF: y = offset + a1*x1 + a2*x2 + ...
  • An ALN initially has one LF - making it the same
    as normal linear regression.
  • After optimizing its own fit, each LF can divide
    into independent LFs.
  • ALNs are generated in a descriptive form that can
    be translated into various programming languages
    (e.g. VBA, C or Matlab).
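An illustrative evaluation of an ALN-like tree (Min/Max branches over linear-form leaves) in Python; this is a toy structure for intuition, not Dendronic's engine or its descriptive form.

import numpy as np

def eval_aln(node, x):
    # node is ("lf", offset, coeffs) or ("min"/"max", [children]).
    if node[0] == "lf":
        _, offset, coeffs = node  # y = offset + a1*x1 + a2*x2 + ...
        return offset + float(np.dot(coeffs, x))
    op, children = node
    vals = [eval_aln(c, x) for c in children]
    return min(vals) if op == "min" else max(vals)

# One LF alone is plain linear regression; branches add pieces,
# e.g. the two-hump example of the following slides.
lf = lambda o, a: ("lf", o, np.array([a]))
tree = ("max", [("min", [lf(0, 1), lf(5, 0), lf(10, -1)]),
                ("min", [lf(-4, 1), lf(1, 0), lf(6, -1)])])
print(eval_aln(tree, np.array([3.0])))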

49
Minimum (Min) & Maximum (Max) Operators in ALNs
y = Min(a,b,c) - lines cut down
y = Max(a,b,c,d) - lines cut up
Output
Linear Forms (regressions)
...
50
Min & Max Combined
Output
LeftHump = Min(a,b,c); RightHump = Min(d,e,f,g);
y = Max(LeftHump, RightHump)
Linear Forms
...
Inputs
51
ALNs are Trees of Linear Forms
  • More Complex Trees are Possible
  • Can grow to any number of layers, any number of
    linear forms.
  • During training, each leaf - linear form - can
    split into a min or max branch.
  • Later in training, leaves can be recombined as
    necessary.
  • Tree complexity can be limited by
  • Tolerance - a sufficiently accurate leaf won't
    split any further.
  • Can be fixed or varying across the data space
  • Direct constraint - e.g. max. depth = 5.
  • Indirectly, by stopping training at minimum
    validation error

52
Introduction to Datamining: Nonlinear
Nonparametric Models: Example Algorithms:
ALNs vs. MLPs - Simple Demo
  • Demonstration of ALN benefits through a trivial
    example.

53
Artificial Problem: With smooth regions and a
sharp point
54
Neural Net - 4 Hidden Neurons
55
Handicapped ALN: Tolerance 0.6 → 4 Linear Forms
56
Neural Net - Further Training
57
Unhandicapped ALN (offset is simply for clarity
of presentation)
58
Adaptive Logic Networks: How they Work: Further
Details
59
A Snapshot of Training
y = Max(LF1, LF2, LF3)
A data point is presented. It pulls the linear
form it influences towards itself (by a
learning-factor proportion).
Side-effect: the orange points no longer influence
that LF, but will now pull up the other two LFs.
60
ALN Learning: LF Splitting
If repeated adjustments of a given LF fail to
reduce error below Tolerance, the LF splits into
two and the process is repeated for each one
independently. Due to random elements of
training, they wander apart to cover different
portions of the data space.
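A rough Python sketch of the "pull" update and tolerance-triggered split just described; the learning factor, jitter and function names are invented for illustration, and the real learning engine differs.

import numpy as np

def pull_lf(lf, x, y, learning_factor=0.1):
    # Move the responsible linear form a proportion of its
    # error towards the presented data point (x, y).
    offset, coeffs = lf
    err = y - (offset + np.dot(coeffs, x))
    return offset + learning_factor * err, coeffs + learning_factor * err * x

def maybe_split(lf, residual, tolerance, rng=None):
    # If adjustment cannot get error below Tolerance, clone the LF;
    # random jitter lets the two copies wander apart in training.
    if abs(residual) <= tolerance:
        return [lf]
    if rng is None:
        rng = np.random.default_rng()
    offset, coeffs = lf
    jitter = rng.normal(0.0, 0.01)
    return [(offset + jitter, coeffs.copy()),
            (offset - jitter, coeffs.copy())]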
61
Recap: ALN Structure
  • During training ALNs can grow into complex trees.
  • Branches are Max and Min operators.
  • Leaves are Linear Forms.
  • Trees can be of any depth. The one shown here is
    just a simple example.
  • Transformation may be possible into a more
    efficient form where initial branches are
    if...then rules.

62
ALNs can be Compiled into DTRs
Example: for x in this interval, only pieces 4
and 5 play a role.
(Figure: six numbered linear pieces along the
input axis x; successive intervals of x are
covered by Min(1,2), Min(2,3,4), Min(4,5) and
Min(5,6).)
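A sketch of the compilation idea for a pure-Min tree: partition the input axis into cells and record which pieces can win in each, so evaluation only tests those. All names and the cell rule are illustrative assumptions.

import numpy as np

def compile_dtr(lfs, lo, hi, n_cells=8):
    # lfs: list of (slope, intercept) lines combined by an overall
    # Min. For each x-cell keep the pieces achieving the minimum
    # somewhere in it, e.g. "for x here only pieces 4 and 5 matter".
    edges = np.linspace(lo, hi, n_cells + 1)
    table = []
    for a, b in zip(edges[:-1], edges[1:]):
        xs = np.linspace(a, b, 16)
        vals = np.array([m * xs + c for m, c in lfs])
        table.append((a, b, sorted(set(vals.argmin(axis=0)))))
    return table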
63
Bagging - Averaging Several ALNs
  • A very simple way to improve accuracy
  • Applicable to any set of diverse models having
    the same goal
  • For example, standard MLP neural nets
  • For ALNs, diversity arises through the random
    number generator affecting the training process,
    e.g. the order in which data are presented.
  • BestMean is a proven refinement (sketched below)
  • e.g. reject results outside ±2 stdev, then
    compute the new mean
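A minimal BestMean sketch, assuming "2 stdev" means two standard deviations about the raw mean of the bag:

import numpy as np

def best_mean(outputs, k=2.0):
    # Reject bagged outputs more than k standard deviations from
    # the raw mean, then return the mean of the survivors.
    p = np.asarray(outputs, dtype=float)
    keep = np.abs(p - p.mean()) <= k * p.std()
    return p[keep].mean() if keep.any() else p.mean()

# With a bag of 10, the stray 130 is rejected; the result is 100.0.
print(best_mean([100, 101, 99, 100, 102, 98, 101, 100, 99, 130]))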

64
Model Development
How datamining methods were brought to bear on
our gas demand forecasting problem.
65
Stages of Model Production
  • Framing the Problem
  • Data Preparation
  • Data Cleaning
  • Derived Variables, Partitioning.
  • Input Selection
  • ALN Training
  • Implementation in Code
  • Conversion of the ALN to a convenient programming
    language.
  • Quality Assessment
  • User-testing in the target environment.

66
Model Development: Framing the Problem
67
How should we frame the problem? We are in a
vacuum here, so we need to guess or, preferably,
experiment.
  • Hourly or daily?
  • The main requirement is for daily total demand
  • Summing hourly demands tends to give greater
    accuracy.
  • Absolute or relative?
  • But d(Demand)/d(Temperature) varies with
    Temperature
  • One big model for all LDZs, all-year round?
  • Separate models for each LDZ?
  • Split the year into parts or just flag or
    normalize each part?
  • What parts? GMT/BST Seasons? Christmas? Easter?
  • Try clustering, make a model for each cluster
  • Also try experiments based on intuition and
    guesswork

68
Traditional framing of the problem
  • Daily totals
  • Linear relationships
  • Only model standard days - employ normalization
    (adjustment factors) for special days such as
    bank holidays.
  • Compute the change in demand

69
New framing of the problem: Based on experience
and intuition
  • Hourly totals (daily sum of hourlies)
  • Nonlinear relationships
  • Model all days - no need for normalization
    (adjustment factors).
  • Absolute demand

70
Experience: Clustering of Electricity Profiles
Kohonen SOM - as implemented in Eudaptics'
Viscovery SOM-Mine
Coloured areas are clusters, each with a
distinctive daily demand profile. Red text is our
interpretation.
71
Clustering of Gas Profiles
Not such a detailed picture as for electricity...
(Figure: SOM map labelled by month groups -
Jan/Dec; Jan/Feb/Mar/Nov; Apr/May/Oct;
June/July/Aug/Sept.)
Yellow-ish areas indicate similar profiles,
red-ish areas indicate more varying profiles.
72
Find the Best Structure for the Model: By
experiment...
  • Experiments (on one typical LDZ)
  • One model for the whole year
  • Separate models for each of four clusters
  • Separate models for the GMT, BST and Xmas & New
    Year periods
  • Separate models for GMT and BST, experimenting
    with various types of indicator for the Xmas-NY
    period
  • straight flags and fuzzy flags
  • THIS PRODUCED THE BEST RESULTS

73
Final Structure for the Model
  • Produce separate models for each season of each
    LDZ.
  • Two seasons: GMT & BST
  • The Easter and Xmas-NY periods are indicated by
    separate fuzzy flags (see the sketch below).
  • 13 LDZs
  • Each model will contain a Bag of 10 ALNs
  • Bag returns BestMean of the 10 ALNs
  • BestMean rejects results outside ±2 stdev
  • Thus 2 × 13 × 10 = 260 ALNs need to be produced.
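As referenced above, one plausible reading of a fuzzy flag is a 0-to-1 indicator that ramps in and out of the special period; the dates and ramp width below are assumptions, not the values used in the production models.

import numpy as np
import pandas as pd

def fuzzy_flag(dates, start, end, ramp_days=3):
    # 0 outside the period, 1 inside, linear ramps at both edges.
    d = pd.DatetimeIndex(dates)
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    ramp_in = (d - (start - pd.Timedelta(days=ramp_days))).days / ramp_days
    ramp_out = ((end + pd.Timedelta(days=ramp_days)) - d).days / ramp_days
    return np.clip(np.minimum(ramp_in, ramp_out), 0.0, 1.0)

# Hypothetical Xmas-NY flag for winter 2004/5.
flag = fuzzy_flag(pd.date_range("2004-12-15", "2005-01-10"),
                  "2004-12-24", "2005-01-02")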

74
Model Development: Data Preparation: Data Cleaning
75
Data Cleaning
  • Data Problems
  • Some actual demands are unrealistic.
  • Atypical demands are not useful for training.
  • Detection Method
  • Viscovery - commercial Kohonen / SOM tool
  • Was used to highlight unusual profiles.
  • Also manually checked and plotted ranges and
    profiles in Excel.

76
Greater Requirement for Data Quality
  • Our models may be more demanding than traditional
    ones in terms of data quality.
  • Since our models are non-parametric, they may be
    more susceptible to glitches in the data (they
    may try to model them).
  • It is possible that the available data will not
    meet our quality requirements.
  • The existing data is clean in respect of daily
    totals, but hourly figures are traditionally less
    important.

77
Bad Profile Detection: Once again, making use of
Eudaptics' Viscovery SOM-Mine
  • Arguably the best possible two-dimensional
    representation of an n-dimensional problem.
  • The aspect ratio is based on the 1st two
    principal components. It shows the main shape of
    the problem.
  • Outlier profiles (possible errors) show up as red
    blemishes
  • Yellow-ish areas are groups of similar profiles
  • Red-ish areas indicate abnormalities.

78
Bad Profile - Positive Glitch
79
Bad Profile - Negative Glitch
80
Bad Profile - Wobble
81
Bad Profile - Clock-change Artefact
82
Model Development: Data Preparation: Model Inputs
83
Data Preparation: Derive additional variables as
possible inputs
  • Think up as many candidate inputs as possible
  • Anthropomorphize: think like an ALN
  • Sine and Cosine of Day and of Year.
  • Represent and maintain cyclic nature of diurnal
    and annual cycles.
  • Annual gas cycle is approximately a sine wave
    (obvious knowledge).
  • Moving-average of Temperature
  • Cooling Power (wind chill)
  • Days Since 1 April 1990 (basis for spotting
    long-term trends)
  • Fuzzy-Flags (special periods)
  • These merely highlight the incidences of special
    days
  • They do not indicate demand effects
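A sketch of a few of the listed derivations in Python; the frame layout and column names are hypothetical.

import numpy as np
import pandas as pd

def derive_inputs(df):
    # df: daily frame with a DatetimeIndex and a 'temp' column.
    doy = df.index.dayofyear
    df["sin_year"] = np.sin(2 * np.pi * doy / 365.25)  # annual cycle,
    df["cos_year"] = np.cos(2 * np.pi * doy / 365.25)  # kept cyclic
    df["temp_ma7"] = df["temp"].rolling(7).mean()      # moving average
    df["days_since"] = (df.index - pd.Timestamp("1990-04-01")).days
    return df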

84
Input Selection (1)
  • Around 60 potential inputs
  • Implies 2^60 possible choices.
  • Too many for exhaustive search.
  • Systematic search may be infeasible
  • The search-space may be rough.
  • Inputs may interact, especially in an unknown
    nonlinear model.
  • In previous projects, standard methods such as
    correlation-based input selection or adding or
    pruning inputs one at a time have failed to find
    the optimum selection.
  • The chosen selection method
  • Genetic Algorithm
  • A proven "jack of all trades" discrete
    optimization method
  • Fitness function based on training and testing
    disposable ALNs.

85
Input Selection (2): No simple consistent method
- too many interactions and nonlinearities - so
use a genetic algorithm.
Unsurprisingly, inputs having the greatest
correlation to the output were chosen by the GA.
However, below a certain threshold of
correlation, the correspondence is less: the GA
chose some inputs having tiny correlation
instead of other inputs of greater correlation.
Only 32 choices in this example. The small black
stumps indicate inputs chosen by the GA.
86
Input Selection (3): Genetic Algorithm (GA):
Inspired by Darwin's Theory of Evolution
  • Our GA
  • Around 100 generations of 50 individuals,
    initially random.
  • An individual is a specific choice of inputs.
  • Reproduction
  • Crossover (mating)
  • Make a new individual by combining randomly
    selected features from some of the fittest
    existing individuals.
  • Mutation (small random changes)
  • Invert one or more decisions as to which inputs
    to use.
  • Survival of the Fittest
  • The fitness of an individual is assessed by
    training an ALN with the given input selection,
    then testing it on separate test data.
  • Actually we train and average the results of a
    few ALNs (a sketch of the loop follows).
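A compact sketch of that loop (population 50, ~100 generations, crossover of fit parents, bit-flip mutation); the fitness function shown is only a stand-in for training and testing disposable ALNs.

import random

def run_ga(n_inputs, fitness, pop_size=50, generations=100, p_mut=0.02):
    # An individual is a mask saying which candidate inputs to use.
    pop = [[random.random() < 0.5 for _ in range(n_inputs)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)   # survival of the fittest
        pop = pop[: pop_size // 2]
        while len(pop) < pop_size:
            a, b = random.sample(pop[:10], 2)         # two fit parents
            child = [random.choice(g) for g in zip(a, b)]   # crossover
            child = [(not g) if random.random() < p_mut else g
                     for g in child]                        # mutation
            pop.append(child)
    return max(pop, key=fitness)

# Stand-in fitness; the real one trains/tests a few disposable ALNs.
best = run_ga(60, lambda mask: -abs(sum(mask) - 10))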

87
Input Selection (4a): Genetic Algorithm - The
Principles: Survival of the Fittest
Survivors plus their offspring (produced by
crossover & mutation)
88
Input Selection (4b): Genetic Algorithm - The
Principles: Crossover
89
Input Selection (4c): Genetic Algorithm - The
Principles: Mutation
90
Input Selection (4d): Genetic Algorithm - The
Principles: Overall Loop
91
Model Development: Model Production
92
ALN Training
  • Tool: AlnFit-NGT
  • Source code adapted from Dendronic Decisions
    Limited.
  • Underlying Dendronic Learning Engine (a standard
    DLL).
  • Method: Back-Validation
  • Oldest year of data used for validation.
  • Most recent years of data used for training.
  • Train to the point (epoch) of least error on
    validation data.

93
Implementation in Code
  • Automatically translate descriptive form to VBA
  • Ultimately implement as a set of ActiveX DLLs
  • Topmost: a Wrapper DLL
  • Provides a standard interface to the
    user-program.
  • Generates derived inputs
  • Decides which model to run (based on LDZ and
    time of year).
  • ALN DLLs (one for GMT, one for BST)
  • Contain LDZ-specific models as Classes
  • Type Definitions DLL

94
Scope for Improvement
95
Remaining Technical Issues - 1
  • Knowledge Refinement
  • Find the best way to use recent demand or demand
    error
  • Improved Weather Inputs
  • Wind direction
  • >1 weather station in the same LDZ
  • Refinement of our Methods and Tools
  • Automatic data error detection
  • Genetic Algorithm - make it more robust and
    efficient (e.g. distributed)
  • ALN training improvements

96
Remaining Technical Issues - 2
  • Metrics
  • Needed for model optimization and quality
    assessment
  • Different metrics targeted at model developer
    and user?
  • Kinds of Metrics
  • Traditional: MAPE and Max. Abs. Error
  • Proposed: Median Abs. Error and Average of the
    top-10 Abs. Errors (sketched below)
  • For comparability, normalize by St. Dev.?
  • Data Sampling and Input Selection
  • Is there a better way? WAID?
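Sketches of the named error metrics in Python, reading "top-10" as the ten largest absolute errors:

import numpy as np

def forecast_metrics(actual, forecast):
    a = np.asarray(actual, dtype=float)
    abs_e = np.abs(np.asarray(forecast, dtype=float) - a)
    return {
        "MAPE": 100.0 * np.mean(abs_e / np.abs(a)),
        "MaxAbsErr": abs_e.max(),
        "MedianAbsErr": np.median(abs_e),            # robust to outliers
        "AvgTop10AbsErr": np.sort(abs_e)[-10:].mean(),
    }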

97
Future Development
  • Refinements
  • Within-Day Fixer (part-developed).
  • Arbitrary-Horizon Fixer.
  • Kalman Filter (on-line adaptation).
  • Future Problems
  • National gas demand
  • Windpower
  • Wish
  • Hands-off Model Development Server

98
Conclusions

99
Conclusions
  • Regarding NGT
  • NGT have made effective use of datamining methods
    for electricity and gas demand forecasting.
  • Quick & dirty feasibility models
  • Longer-development, high-accuracy production
    models
  • When run in combination with existing models, the
    overall accuracy is improved
  • With financial benefits!
  • More General Lessons
  • ALNs are great!
  • For such problems, back-validation is better than
    cross-validation.

100
- End -
Any Questions?
101
Datamining-Based Gas Demand Forecasting Models
  • Phase-I Models in service since July 2004
  • Phase-II Models
  • GMT Models in service since January 2005
  • BST Models currently under development (for March
    05)
  • Phase II Enhancements
  • More intensive Genetic Algorithm (GA) runs
  • Greater number of generations
  • Greater mutation probability
  • Greater choice of inputs
  • Individual GA runs for each LDZ (hence
    potentially different input variables)
  • Methodology verified by experiment