Title: Analysis of Complex Systems
Abstract
- My project applies data mining techniques to the
internet to gather enough information for a
genetic algorithm to perform trend analysis of a
complex system, e.g. the stock market.
Scope
- The most fundamental element of my program is
finding a correlation between news about a
company and the price of its stock. To do this,
a large amount of data on both stock prices and
company news must be converted into a
quantitative format and then analyzed
extensively.
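One simple way to quantify such a relationship is a Pearson correlation coefficient. The sketch below is only illustrative: the function name `pearson`, the news scores, and the price changes are all hypothetical, and the slides do not say which statistic the project actually uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical daily news scores and same-day price changes (made-up values).
news_scores = [0.2, -0.5, 0.9, 0.1, -0.3]
price_changes = [0.4, -1.1, 1.6, 0.0, -0.7]
r = pearson(news_scores, price_changes)
```

A value of `r` near +1 or -1 would suggest a strong linear relationship, while a value near 0 would suggest none; it says nothing about causation.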
Expected Results
- At the very least, I expect this project to
produce a useful genetic algorithm that, given
a list of independent and dependent data, can
generate equations expressing a tentative
correlation. While the extremely chaotic nature
of the stock market may prevent quantitative
success in this instance, I do expect the
algorithm to succeed in general terms.
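To make the idea concrete, here is a minimal sketch of a genetic algorithm that fits an equation to independent and dependent data. It assumes the equation has the fixed form y = a*x + b and evolves only the two coefficients; the project's own algorithm is not shown on the slides and may generate equations quite differently. All names (`evolve_linear_fit`, `mean_squared_error`) and the data are hypothetical.

```python
import random


def mean_squared_error(coeffs, xs, ys):
    """Fitness: average squared gap between a*x + b and the observed y."""
    a, b = coeffs
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def evolve_linear_fit(xs, ys, generations=200, pop_size=40, seed=0):
    """Evolve (a, b) for y = a*x + b; returns the best pair and error history."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    population = [(rng.uniform(-10, 10), rng.uniform(-10, 10))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda c: mean_squared_error(c, xs, ys))
        history.append(mean_squared_error(population[0], xs, ys))
        survivors = population[:pop_size // 4]  # selection: keep the best quarter
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            child = (mom[0], dad[1])  # crossover: one coefficient from each parent
            child = tuple(g + rng.gauss(0, 0.3) for g in child)  # mutation
            children.append(child)
        population = survivors + children
    best = min(population, key=lambda c: mean_squared_error(c, xs, ys))
    history.append(mean_squared_error(best, xs, ys))
    return best, history


xs = list(range(10))
ys = [2 * x + 1 for x in xs]  # hypothetical data following y = 2x + 1
best, history = evolve_linear_fit(xs, ys)
```

Because the best quarter of each generation survives unchanged, the best error never gets worse from one generation to the next. Real market data would be far noisier than this toy input.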
Others' Work
- A program that could predict the stock market
would be very lucrative
- Many have tried
- All have failed
Procedures
- Differ for each part of the program
- Quantitative tests of success
- XML parser
- Data classification
- Trial and error tests
- Evaluation algorithms
- Discriminant generation
Design
- Several program segments
- Data miner
- Uses XML parser
- Stock Robots
- Discriminant generator
- Prediction shell
- Equation Refiner Module
Program Tests
- Bot Success
- Early program stage involved writing subroutines
that kept portfolios and invested
- Evaluated by profit
- Prediction Algorithm
- Not working yet
- Evaluation to be based on accuracy of predictions
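The two evaluation criteria above could be computed along these lines. This is a sketch under assumptions: the slides do not define "profit" or "accuracy" precisely, so percentage profit and up/down directional accuracy are used here as plausible stand-ins, and the function names and numbers are hypothetical.

```python
def percent_profit(start_value, end_value):
    """Profit of a portfolio as a percentage of its starting value."""
    return (end_value - start_value) / start_value * 100.0


def directional_accuracy(predicted_changes, actual_changes):
    """Fraction of predictions whose direction (up or not up) matched reality."""
    hits = sum(1 for p, a in zip(predicted_changes, actual_changes)
               if (p > 0) == (a > 0))
    return hits / len(actual_changes)


# Made-up numbers purely for illustration.
profit = percent_profit(1000.0, 1100.0)
accuracy = directional_accuracy([1.2, -0.4, 0.3, -0.8], [0.9, 0.5, 0.1, -0.2])
```

Directional accuracy ignores the size of each move, so a predictor could score well on it and still lose money; that is one reason to track profit alongside it.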
Algorithms
- Different program segments use different
algorithms
- Data mining algorithm
- Discriminant generation algorithm
- XML parsing algorithm
- Equation refinement algorithm
Data Mining
- 1st Generation Data Miner: Albert
- Used hardcoded processing of RSS feeds to gather
news on stocks
- Data was hard to classify and to score
quantitatively
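Albert's Algorithm
# Extracts each <item> from a raw RSS string and stores unseen stories.
# Assumes news (RSS text), stock, table, and config (an open MySQL-style
# connection) are defined elsewhere.
# Loop header assumed; the slide shows only the loop body.
while "<item>" in news:
    start_item = news.index("<item>")
    end_item = news.index("</item>") + 7           # 7 == len("</item>")
    item = news[start_item:end_item]
    news = news[end_item:]
    if len(item) == 0:
        break
    title = item[13:item.index("</title>")]        # 13 == len("<item><title>")
    item = item[item.index("</title>") + 8:]
    # Keep the part after the last "*"; this separator is an assumption,
    # since the character is missing from the slide.
    link = item[6:item.index("</link>")].split("*")[-1]
    item = item[item.index("</link>") + 7:]
    pubdate = item[item.index("<pubDate>") + 9:item.index("</pubDate>")]
    if "<description>" in item:
        description = item[item.index("<description>") + 13:
                           item.index("</description>")]
    else:
        description = ""
    cursor = config.cursor()
    # Parameterized queries (%s placeholders assume a MySQLdb-style driver)
    # keep quotes in titles from breaking the SQL.
    if cursor.execute("SELECT * FROM " + table + " WHERE stock=%s"
                      " AND pubdate=%s AND title=%s",
                      (stock, pubdate, title)) < 1:
        print("%s\n\t%s\n\t%s" % (title, pubdate, description))
        cursor.execute("INSERT INTO " + table + " SET stock=%s, pubdate=%s,"
                       " title=%s, description=%s, link=%s",
                       (stock, pubdate, title, description, link))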
Data Mining, cont.
- 2nd Generation Algorithm: Beatrice
- Planned to use an XML parser to go through online
search engine results
- Should generate much more data than Albert did,
as well as more detailed data
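As a rough illustration of what parser-based extraction looks like compared with hardcoded offsets, the sketch below pulls stories out of an RSS feed with Python's standard `xml.etree.ElementTree` module. This is a swapped-in stand-in, not Beatrice itself: the project planned its own XML parser and targeted search engine results, and the function name `parse_rss_items` and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_text):
    """Return one dict per <item>, using an XML parser instead of offsets."""
    root = ET.fromstring(rss_text)
    stories = []
    for item in root.iter("item"):
        stories.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubdate": item.findtext("pubDate", default=""),
            "description": item.findtext("description", default=""),
        })
    return stories


# A tiny hand-written feed (hypothetical content) to exercise the function.
sample_feed = """<rss><channel>
<item><title>Example Corp beats estimates</title>
<link>http://example.com/story1</link>
<pubDate>Mon, 01 Jan 2007 09:00:00 GMT</pubDate></item>
</channel></rss>"""
stories = parse_rss_items(sample_feed)
```

Missing fields simply come back as empty strings, so the order and spacing of tags no longer matter. Note that `ET.fromstring` raises `ParseError` on malformed XML, so badly formed pages would still need separate handling.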
XML Parser
- Two potential methods
- Iterative
- Uses a set of flags to determine what action to
take with each character
- Recursive
- Splits the XML document into sets of tags and
processes each tag's child elements
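Both ideas can be sketched in a few lines. For brevity the example combines them: a flag-driven character scan produces tokens, and a recursive function then groups each tag with its child elements. This is a simplified illustration, not the project's parser: the names `tokenize` and `build_tree` are hypothetical, attributes and self-closing tags are ignored, and closing tags are not checked against their opening tags, so malformed XML is not detected.

```python
def tokenize(xml_text):
    """Iterative pass: a single in_tag flag decides what each character does."""
    tokens, buffer, in_tag = [], [], False
    for ch in xml_text:
        if ch == "<" and not in_tag:
            text = "".join(buffer).strip()
            if text:
                tokens.append(("text", text))
            buffer, in_tag = [], True
        elif ch == ">" and in_tag:
            name = "".join(buffer).strip()
            kind = "close" if name.startswith("/") else "open"
            tokens.append((kind, name.lstrip("/")))
            buffer, in_tag = [], False
        else:
            buffer.append(ch)
    return tokens


def build_tree(tokens, pos=0):
    """Recursive pass: each open tag collects children until a close tag."""
    children = []
    while pos < len(tokens):
        kind, value = tokens[pos]
        if kind == "open":
            grandchildren, pos = build_tree(tokens, pos + 1)
            children.append((value, grandchildren))
        elif kind == "close":
            return children, pos + 1
        else:
            children.append(value)
            pos += 1
    return children, pos


tokens = tokenize("<item><title>Hi</title><link>x</link></item>")
tree, _ = build_tree(tokens)
```

Each element comes back as a `(tag_name, children)` pair, with plain strings for text, so the tree can be walked with ordinary loops or further recursion.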
Problems
- Main holdup on the predictor is malformed XML
- The bots, however, are working very well (over a
period of 4 weeks the 4 bots averaged a 7% profit,
with all 4 making a profit and the highest profit
being 19%)
Results and Conclusions
- The success of bots with very primitive
prediction algorithms is encouraging for what a
more sophisticated algorithm might achieve
- Success or failure hinges on the analysis of data