Investment Science Corp. - PowerPoint PPT Presentation

About This Presentation
Title:

Investment Science Corp.

Slides: 18
Provided by: Michae1747

Transcript and Presenter's Notes


1
Investment Science Corp.
2
Symbolic Regression
  • Large Scale
      • 1M rows x 20 columns
      • Single computer
      • Less than 50 hours computation time
  • Server Farm Scaling
      • Assume a 100-server farm
      • The farm can manage 1000 symbolic regressions of
        1M rows x 20 columns
      • Elapsed computation time will be 500 hours
  • Trading System Deployment
      • Weekly training requires one large-scale symbolic
        regression
      • To deploy, must get approval from the trading
        committee
      • A blind-forward 20-year history requires 1000
        large-scale symbolic regressions
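
The farm-scaling figure above follows from simple arithmetic. A minimal sketch, assuming each regression takes the full 50-hour upper bound on one server and that jobs run one per server in equal batches (the function name is hypothetical, not from the presentation):

```python
import math

def farm_elapsed_hours(regressions, hours_each, servers):
    """Elapsed wall-clock hours when jobs run one per server, in batches."""
    batches = math.ceil(regressions / servers)  # rounds of jobs needed
    return batches * hours_each

# 1000 regressions at 50 hours each on a 100-server farm
print(farm_elapsed_hours(1000, 50, 100))  # 500
```

Ten batches of 100 regressions at 50 hours each give the 500 elapsed hours quoted on the slide.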

3
DeepGreen™ Workflow
The CTO provides hands-on direction to Tiger
teams developing advanced analytics.
  • Weekly Securities Buy-Sell: Account Traders
  • Weekly Production Run: Green Team
  • Weekly HTML Reports: Account Traders
  • What-If Blind-Forward Testing: Investment Science
    Department, Green Team
  • New Algorithm Integration and Scoring: Investment
    Science Department, Green Team
  • New Algorithm Research and Development:
    Engineering Department, Green Team,
    Development Teams

Account Traders see only the weekly HTML reports.
Development Teams see only their own projects.
4
Weekly Production Run
  • Produces Thousands of HTML Pages
      • Analysis pages for each of 1,500 stocks
      • Performance history for each of the top 150
        trader agents
      • Product strategy performance for each of the
        top 35 product strategies
  • Requires Retraining of all 40M Agents
      • Currently uses 50 weekend hours for computation
      • Deployment of new models requires a server farm
  • Data Collection
      • Weekly data feed from Valueline
      • Weekly data feed from DownloadQuotes.com
      • Weekly data feed from First Call
      • Weekly data feed from Standard & Poor's
      • Data cleansing is automated but requires some
        human intervention
      • Valueline once asked to re-purchase our cleansed
        data (we declined)

5
Automating New Product Development
  • Marketing Defines New Product Requirements
      • Type of risk management required (Structural,
        Statistical)
      • Competitive rates of return available in market
      • Competitive risk levels available in market
  • Engineering Product Mock-Ups
      • Review top trader performance (each of 40M
        trader agents)
      • Review product strategy performance (each of 35
        product strategies)
      • Can we fulfill with a mixture of existing product
        strategies and traders?
  • Product Research (fulfilling the future)
      • Review top academic algorithm performance
      • Review additional data requirements for market
        penetration
      • Can we fulfill with a mixture of new algorithms
        and new data?
      • Acquire and test additional data and new academic
        algorithms

6
Symbolic Regression
  • Linear Regression
      • Gaussian substitution
      • Least Squares
      • Multivariate polynomial models
  • Non-Linear Regression
      • Logit regression
      • Support Vector Regression
      • A growing but very limited set of tools and models
  • The Growing Need
      • Scientific problems are growing in complexity
      • All current regression tools are O(N^2)
        computational complexity or greater
      • Generalized, scalable, symbolic regression is
        badly needed
      • It could change the practice of science

7
Large Scale Symbolic Regression
  • Why choose evolutionary technology?
      • Algorithms are O(N) computational complexity
        (they scale well)
      • Just-in-time algorithms
      • Algorithms are creative
  • Multiple techniques available
      • Genetic Programming
      • Grammatical Evolution
      • Grammatical Swarm Optimization
  • Reaching scalability
      • Challenge basic assumptions
      • Recombine disparate techniques into powerful
        partnerships
      • Use statistics
      • Use computer science

8
Experimental Combinations
  • Hybrid combination of particle swarm agents and GP
  • Hybrid combination of grammar and tree-based GP
  • Hybrid fitness measure supporting symbolic
    regression and classification of long/short
    candidates
  • Hybrid combination of multiple island populations
    and boosting with GP

9
Techniques Employed
  • Experimental setup with separate training and
    testing data sets
  • Generate training data with simple and complex
    models plus noise
  • High speed compiler generating register speed
    individuals
  • Fitness measure rewards accuracy first then tail
    classification
  • Standard GP using the abstract grammar
  • Abstract grammars implemented as particle swarm
    agents
  • Vertical slicing (sort by Y then use every nth
    training example)
  • Hill climbing mutation added to crossover
    operator
  • Context Aware Crossover added to crossover
    operator
  • Standard GP using the MVL grammar
  • Island GP using multiple grammars
  • Separate island for each boosting run
  • Exhaustively search abstract roots in each run
  • Standard GP using abstract grammar
  • Tournament-of-Champions every five training runs
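
Vertical slicing, as described in the list above, sorts the training examples by the dependent variable Y and keeps every nth one. A minimal sketch of that step, assuming the data sits in plain Python lists (the function name and the sample data are hypothetical):

```python
def vertical_slice(X, y, n):
    """Sort examples by target y, then keep every n-th one."""
    order = sorted(range(len(y)), key=lambda i: y[i])  # indices sorted by y
    keep = order[::n]                                  # every n-th example
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical data: 10 examples, 2 columns each
X = [[i, i * 2] for i in range(10)]
y = [9, 3, 7, 1, 5, 0, 8, 2, 6, 4]

Xs, ys = vertical_slice(X, y, 3)
print(ys)  # [0, 3, 6, 9]
```

Because the sample is taken at even steps through the sorted targets, the reduced set still spans the full range of Y, including both tails, while cutting the number of fitness evaluations by roughly a factor of n.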

10
Simplified Concept Flow
11
Abstract Grammar
  • Little Fine-Grain Control: log(x3.2392)/sin(x10 56.341)
  • More Fine-Grain Control: log(V1C1)/sin(V2C2)
  • Abstract Substitution: Vi choose from X1 thru
    XN, Ci choose any real number
  • Swarm Intelligence: use particle swarm or
    differential evolution for fine-grain control
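
The abstract substitution idea can be sketched as a two-level search: enumerate the variable choices Vi, and tune the real constants Ci with differential evolution. The operators inside the slide's expression did not survive in this transcript, so the sketch below assumes a different, purely hypothetical template, `c0*x[v0] + c1*x[v1]`, along with made-up data and parameter settings. It illustrates the technique and is not the presenter's implementation.

```python
import random

def template(row, v, c):
    # Hypothetical abstract expression: C0*V0 + C1*V1
    return c[0] * row[v[0]] + c[1] * row[v[1]]

def mse(rows, ys, v, c):
    return sum((template(r, v, c) - y) ** 2 for r, y in zip(rows, ys)) / len(ys)

def de_fit(rows, ys, v, rng, pop_size=20, gens=150, F=0.6, CR=0.9):
    """Tune the constants for fixed variable choices v (DE/rand/1/bin)."""
    pop = [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(pop_size)]
    errs = [mse(rows, ys, v, c) for c in pop]
    for _ in range(gens):
        for i in range(pop_size):
            others = [p for j, p in enumerate(pop) if j != i]
            a, b, d = rng.sample(others, 3)
            jrand = rng.randrange(2)  # force at least one mutated gene
            trial = [
                a[k] + F * (b[k] - d[k])
                if (rng.random() < CR or k == jrand) else pop[i][k]
                for k in range(2)
            ]
            e = mse(rows, ys, v, trial)
            if e <= errs[i]:          # greedy selection
                pop[i], errs[i] = trial, e
    best = min(range(pop_size), key=errs.__getitem__)
    return errs[best], pop[best]

rng = random.Random(0)
rows = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(40)]
ys = [2.5 * r[0] - 1.5 * r[2] for r in rows]  # hidden target model

# Outer search: try every (V0, V1) pair; inner search: fit C0, C1
err, consts, variables = min(
    (de_fit(rows, ys, (i, j), rng) + ((i, j),)
     for i in range(3) for j in range(3)),
    key=lambda t: t[0],
)
print(variables, [round(c, 3) for c in consts], err)
```

Separating the discrete choice of variables from the continuous choice of constants is what lets a swarm or differential evolution method handle the fine-grain control while the evolutionary search works on the expression's shape.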

12
Remaining Big Issues
  • Time Constraint: Abstract grammars produce better
    results when given more time.
  • Poor Performance on Multi-model Test Cases: GP
    is still having problems on more difficult cases.
  • Search Coverage: GP is still prematurely converging
    on local minima.

13
Future Research Steps
  • Age-Layered Population Structure: an attempt to
    avoid premature convergence?
  • A Posteriori Fitness Subsets: a directed search
    for better performance on more difficult
    multi-model test cases?
  • Information Theoretic Fitness Measures: a tool
    for improving performance on more difficult
    multi-model test cases?

14
Reviewer Questions, Part 1
  • Which market index was used for this market
    neutral study? (and all other questions of this
    nature)
  • Is a fixed five-year training window adequate
    for all changing market regimes?
  • Are there other advantages of Vertical Slicing
    beyond saving training time?
  • Why do you claim classification was good in most
    cases?

15
Reviewer Questions, Part 2
  • What is the time scale used? Which is better: a
    one-month hold, a one-quarter hold, etc.?
  • Is the possibility under consideration that the
    tool could evolve itself or that it could adapt?
  • What is the motivation for retraining on all
    1250 samples when only the latest 5 of them are
    new each week?
  • Does this extended context-aware crossover yield
    a number of evaluations significantly less than
    that of enumerative search by a constructive
    procedure that simply builds successively more
    complex canonical forms?
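
The retraining question above turns on how little of the training window changes from week to week. A small sketch of a rolling window with the sizes quoted in that question (1250 samples, 5 new per week); the integer sample ids are placeholders, and the presentation does not say how the window is actually stored:

```python
from collections import deque

WINDOW = 1250       # samples in the training window (from the slide)
NEW_PER_WEEK = 5    # new samples each week (from the slide)

window = deque(range(WINDOW), maxlen=WINDOW)  # placeholder ids 0..1249
before = set(window)

# One weekly update: appending 5 new samples evicts the 5 oldest
window.extend(range(WINDOW, WINDOW + NEW_PER_WEEK))
after = set(window)

new_fraction = len(after - before) / WINDOW
print(len(window), min(window), max(window))  # 1250 5 1254
print(new_fraction)                           # 0.004
```

Only 0.4% of the window is new after each weekly update, which is the basis of the reviewer's question about full retraining.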

16
Audience Q&A
  • What are the audience questions?

17
Investment Science Corp.