Title: Moving towards the Artificial Hacker
1. Moving towards the Artificial Hacker
2. Who am I?
- Member of Felinemenace
- University Student
- Computer Security enthusiast for many years
- By no means an AI expert!!
3. Talk outline
- Why Artificial Intelligence?
- What is Artificial Intelligence?
- Introduction to Genetic Algorithms
- Introduction to Genetic Programming
- Introduction to Artificial Neural Networks
- Introduction to Intelligent Agents
http://www.felinemenace.org
4. Why Artificial Intelligence?
- Interesting field.
- AI techniques can be used to partly or wholly automate tasks.
- Striving towards software tools that can act with some degree of intelligence.
- There are a lot of tasks performed by security professionals that lend themselves well to automation.
5. What is Artificial Intelligence?
- Umbrella term encompassing a lot of techniques.
- Adapted from Arthur Samuel's definition: AI aims to create software or machines to perform tasks that, if performed by humans, would be assumed to involve the use of intelligence.
- Includes fuzzy systems, Bayesian networks, neural networks, genetic algorithms, genetic programming, expert systems; the list goes on.
- Other disciplines, such as psychology, study AI with the aim of better understanding our thought processes. Not really of interest to us.
6. Genetic Algorithms - Introduction
- A type of informed search
- Suitable for problems in which only the solution state matters, not the path(s) that lead to it
- Search for the optimal value within a search space
- Inspired by evolutionary biology
7. Genetic Algorithms - Introduction continued
- Attempt to mimic the natural process of evolution
- Survival of the fittest
8. Genetic Algorithms - Data structures
- Chromosomes are used to represent potential solutions to the problem (think of a chromosome as a string)
- Each chromosome has an assigned fitness produced by a fitness function (a measure of how close the chromosome is to solving the problem)
- The population is the entire set of chromosomes at a point in evolution
9. Genetic Algorithms - Operations
- The crossover operation attempts to mimic reproduction. In its basic form a crossover point within the two chromosomes is selected and the two chromosomes are combined at that point
- The selection operation selects pairs of chromosomes from the population to cross over
- The mutation operation randomly changes a random element of a chromosome
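The three operations can be sketched in a few lines of Python. This is only an illustration under assumptions: chromosomes are equal-length strings of at least two characters, and selection is done by a two-way tournament, which the slides do not specify. The function names are made up for this example.

```python
import random

def crossover(parent_a, parent_b, rng=random):
    """Single-point crossover: head of one parent joined to the tail of the other."""
    point = rng.randrange(1, len(parent_a))  # never cut at the very ends
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome, alphabet, rng=random):
    """Replace one randomly chosen element with a random symbol from the alphabet."""
    index = rng.randrange(len(chromosome))
    return chromosome[:index] + rng.choice(alphabet) + chromosome[index + 1:]

def select_pair(population, fitness, rng=random):
    """Tournament selection (an assumption): each parent is the fitter
    of two randomly drawn candidates."""
    def pick():
        return max(rng.sample(population, 2), key=fitness)
    return pick(), pick()

child_a, child_b = crossover("AAAAAA", "BBBBBB")
print(child_a, child_b)
print(mutate("AAAAAA", "AB"))
```

Both children keep the parents' length, and between them they carry every element of both parents exactly once.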
10. Genetic Algorithms - Steps
- A random population of chromosomes is generated.
- The fitness function is applied to each chromosome.
- The selection operation is applied, favoring the fittest chromosomes.
- The crossover operation is applied to the selected pairs to generate a new population.
- Mutation is applied to some offspring.
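Put together, the steps form a loop that repeats once per generation. The sketch below evolves a string towards a fixed target. The target, population size, mutation rate and tournament selection are all assumptions chosen for illustration, not values from the talk.

```python
import random
import string

TARGET = "felinemenace"            # hypothetical goal string
ALPHABET = string.ascii_lowercase

def fitness(chromosome):
    """Number of positions that already match the target."""
    return sum(a == b for a, b in zip(chromosome, TARGET))

def evolve(pop_size=100, mutation_rate=0.2, max_generations=1000, seed=1):
    rng = random.Random(seed)
    # Step 1: random initial population.
    population = ["".join(rng.choice(ALPHABET) for _ in TARGET)
                  for _ in range(pop_size)]

    def pick():
        # Step 3: selection favoring the fittest (tournament of two).
        return max(rng.sample(population, 2), key=fitness)

    for generation in range(max_generations):
        # Step 2: apply the fitness function to every chromosome.
        best = max(population, key=fitness)
        if fitness(best) == len(TARGET):
            return generation, best
        offspring = []
        while len(offspring) < pop_size:
            mother, father = pick(), pick()
            # Step 4: single-point crossover builds the new population.
            point = rng.randrange(1, len(TARGET))
            child = mother[:point] + father[point:]
            # Step 5: mutate only some of the offspring.
            if rng.random() < mutation_rate:
                i = rng.randrange(len(child))
                child = child[:i] + rng.choice(ALPHABET) + child[i + 1:]
            offspring.append(child)
        population = offspring
    return max_generations, max(population, key=fitness)

if __name__ == "__main__":
    generation, best = evolve()
    print(generation, best)
```

Note that only the final string matters here, not the route taken to reach it, which is what makes the problem a good fit for a GA.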
11. Genetic Algorithms - Example
12. Genetic Algorithms - Constructing a GA based fuzzer
- Chromosome represents our input into the application (file, packet, argument)
- Genesis state must be chosen carefully
  - If the input is all rejected we will have difficulty evolving
  - If we lack diversity we experience convergence
- Depending on the application, our mutation and crossover functions may have to generate valid data.
- What is our measure of fitness when fuzzing?
13. Genetic Algorithms - Constructing a GA based fuzzer cont
- Code coverage!
- The more code we can test the more bugs we (hopefully) expose.
- We have a few options for a fitness function
  - Instruction stepping
  - Produce a call-flow graph - assess fitness based on coverage
  - Profile the target application and mark interesting code blocks by breakpointing - assess fitness based on number of blocks hit
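As a rough illustration of coverage as fitness, the sketch below scores an input by how many distinct source lines of a target function it executes. It swaps in Python's line tracing (`sys.settrace`) for the debugger-based stepping or breakpointing a real fuzzer would use against a binary. The `target` function is a hypothetical stand-in for the application under test.

```python
import sys

def target(data):
    """Toy stand-in for the application under test (hypothetical)."""
    if data.startswith(b"HDR"):
        if len(data) > 8:
            if data[3] == 0xFF:
                return "deep path"
            return "long input"
        return "short input"
    return "rejected"

def coverage_fitness(func, data):
    """Fitness = number of distinct source lines of func executed for this input."""
    code = func.__code__
    lines_hit = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None              # do not trace other functions
        if event == "line":
            lines_hit.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(data)
    except Exception:
        pass                         # crashes would be recorded separately
    finally:
        sys.settrace(previous)
    return len(lines_hit)

for sample in (b"xyz", b"HDR", b"HDR\xff" + b"A" * 8):
    print(sample, coverage_fitness(target, sample))
```

Inputs that get past more checks execute more lines and so score higher, which gives the GA a gradient to climb instead of a flat accept/reject signal.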
14. Genetic Algorithms - Applications
- Brute-force problems for which we can obtain hints as to how successful each attempt is.
- Fyodor/Mikasoft's talk used them to breed overflow strings.
- Evolve firewall or IDS rule-sets.
15. Genetic Programming - Introduction
- Genetic Programming is an adaptation of genetic algorithm techniques.
- Instead of evolving strings we evolve programs.
16. Genetic Programming - Basics
- We are no longer confined to evolving string chromosomes.
- Our chromosomes are now represented in the form of a syntax tree
- Languages that use prefix notation (such as Lisp) are ideal for genetic programming as they lend themselves well to this tree notation.
- We are not restricted to evolving Lisp programs, however.
17. Genetic Programming - Basics cont
- A syntax tree consists of nodes and links.
- Each node represents an operation
- The links from a node represent that node's parameters
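A minimal sketch of such a tree, assuming nested Python tuples of the form `(operation, child, child)` with numbers and variable names as terminals. The representation and the `evaluate` helper are illustrative choices, not something the talk prescribes.

```python
import operator

# Each operation node is a tuple: (operation, child, child).
# Terminals are numbers or variable names.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(node, env):
    """Recursively evaluate a syntax tree against the variable bindings in env."""
    if isinstance(node, tuple):
        op, *args = node
        return OPS[op](*(evaluate(arg, env) for arg in args))
    if isinstance(node, str):
        return env[node]        # variable terminal
    return node                 # constant terminal

# (+ (* x x) (- y 3)) in Lisp-style prefix notation
tree = ("+", ("*", "x", "x"), ("-", "y", 3))
print(evaluate(tree, {"x": 4, "y": 5}))  # 4*4 + (5-3) = 18
```

The tuple form mirrors prefix notation directly, which is why Lisp-like languages map so easily onto this structure.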
18. Genetic Programming - Syntax Tree
19. Genetic Programming - More Basics
- We generally build our software using subroutines
- Using syntax trees we represent this with a branch to a sub-tree
20. Genetic Programming - Syntax Trees
- Branches can also represent
  - Iteration
  - Recursion
  - Conditionals
  - Predefined functions
- At times we must also enforce a constrained syntactic structure, which restricts what types can be used as arguments for specific nodes
21. Genetic Programming - Basics Basics Basics
- The syntax tree now forms the basis for our chromosome
- We still have populations (of chromosomes)
- Each tree still has a fitness value assigned by a fitness function
- The fitness is evaluated based upon how well a chromosome performs at achieving the particular goal
22. Genetic Programming - Basics cont
- In order to evaluate the fitness, the code is built from the syntax tree and interpreted or run under a virtual machine.
23. Genetic Programming - Operations
- We still have the same operations as when dealing with Genetic Algorithms.
  - Mutate, Crossover, Selection.
- One new operation, the Architecture Altering operation
- The only significant change is the data structure we're applying these to (the syntax tree).
24. Genetic Programming - Operations cont
- The crossover operation selects random branches of two directed graphs and grafts them.
- Mutation alters a branch of the graph.
- Selection still works (pretty much) the same.
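One way to picture crossover and mutation on trees, assuming the trees are nested tuples of the form `(operation, child, child)` with strings and numbers as terminals. The helper names (`paths`, `get`, `replace`) are invented for this sketch, and picking branch points uniformly at random is an assumption.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (sequence of child indexes) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    """Return a copy of tree with the node at path swapped for subtree."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], subtree),) + tree[i + 1:]

def crossover(a, b, rng=random):
    """Pick one random branch in each tree and graft it onto the other."""
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return replace(a, pa, get(b, pb)), replace(b, pb, get(a, pa))

def mutate(tree, random_subtree, rng=random):
    """Replace one randomly chosen branch with a freshly generated one."""
    return replace(tree, rng.choice(list(paths(tree))), random_subtree())

a = ("+", ("*", "x", "x"), ("-", "y", 3))
b = ("*", "y", ("+", 1, "x"))
print(crossover(a, b))
print(mutate(a, lambda: ("+", "x", 1)))
```

Unlike string crossover, the offspring can differ in size and shape from both parents, since whole branches move between trees.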
25. Genetic Programming - Program Architecture and altering operation
- The arrangement, number and types of branches present in the syntax tree dictate the program architecture
- The architecture altering operation is introduced so that the underlying architecture of the chromosome can be dynamic rather than fixed
26. Genetic Programming - Example
27. Genetic Programming - Breeding shellcode
- Use PPC instruction set
- Generally 3 register operands per instruction
  - mnemonic dst, operand, operand
  - add r6, r11, r10
28. Genetic Programming - Breeding shellcode cont
- Each node in our syntax tree represents an instruction.
- Each node has two links representing its operands. The destination register of a child node becomes a source operand of its parent node.
- Links can represent conditional instructions.
- Terminals with a constant value can evaluate to an li or equivalent.
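The mapping from tree to instruction listing can be sketched as a post-order walk. This is only a guess at the general shape: the tuple layout, the naive register numbering starting at `r3`, and the lack of any collision handling are all assumptions for illustration, and conditional links are left out. Only plain arithmetic is emitted.

```python
from itertools import count

def emit(tree, regs, out):
    """Post-order walk: emit the children first, then the node itself.
    Returns the register that holds this subtree's result."""
    if isinstance(tree, int):          # constant terminal -> load immediate
        dst = f"r{next(regs)}"
        out.append(f"li {dst}, {tree}")
        return dst
    if isinstance(tree, str):          # register terminal, e.g. "r11"
        return tree
    mnemonic, left, right = tree
    a = emit(left, regs, out)
    b = emit(right, regs, out)
    dst = f"r{next(regs)}"             # naive allocation, no reuse (assumption)
    out.append(f"{mnemonic} {dst}, {a}, {b}")
    return dst

def linearize(tree, first_free_reg=3):
    """Flatten a syntax tree into a list of three-operand instructions."""
    out = []
    emit(tree, count(first_free_reg), out)
    return out

tree = ("add", ("mullw", "r11", 4), "r10")
for line in linearize(tree):
    print(line)
```

For the tree above this prints `li r3, 4`, then `mullw r4, r11, r3`, then `add r5, r4, r10`: each child's destination register shows up as an operand of its parent.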
29. Genetic Programming - Breeding shellcode cont
30. Genetic Programming - Breeding shellcode cont
- There are many different aims when writing shellcode.
- Some measures of fitness may be
  - Length
    - Architecture alteration can play a big part in this
  - Absence of illegal characters
    - Nulls etc.
  - Variance from previous shellcode.
  - Success against a known firewall or IDS rule-set
31. Genetic Programming - Breeding shellcode cont
- We can emulate code using something like QEMU.
- Base fitness on
  - required register states or
  - required system calls hit
32. Artificial Neural Networks - Introduction
- An artificial neural network attempts to mimic the workings of the human brain.
- Based upon connectionism.
- A parallel collection of small processing units, with the emphasis on the interconnections between these processing elements.
- ANNs are able to perform pattern matching and classification (amongst other tasks).
33. Artificial Neural Networks - Data structures
- ANNs are composed of synapses and neurons.
- Each neuron performs very simple processing of its input and produces output.
- Synapses are used to connect the output of one neuron to the input of another.
- Each synapse has an associated weight; the input value is multiplied by this weight and fed to the destination neuron.
34. Artificial Neural Networks - Data structures cont
- Each neuron has a transfer or activation function.
- The most common function is the sigmoid function.
- The sigmoid function basically squashes the input into the range of values between 0 and 1.
- The tanh function can also be used; it squashes the input into the range of values between -1 and 1.
- There is an additional bias/weight input that acts as a threshold for each neuron. If the sum of the weighted inputs from the synapses exceeds this bias then the neuron fires.
- Some simpler topologies use a basic threshold function. If the sum of inputs is greater than 0, fire the value 1. Otherwise fire the deactivated value -1.
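A single neuron along these lines can be sketched as follows. The function names and example weights are made up for illustration. The bias is written here as a term added to the weighted sum, which is equivalent to comparing the sum against a threshold of the opposite sign.

```python
import math

def sigmoid(x):
    """Squash x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def threshold(x):
    """Fire 1 if the summed input is positive, otherwise the deactivated value -1."""
    return 1 if x > 0 else -1

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(total)

inputs, weights = [1.0, 0.5], [0.4, -0.2]
print(neuron(inputs, weights, 0.0))                         # sigmoid output in (0, 1)
print(neuron(inputs, weights, 0.0, activation=math.tanh))   # tanh output in (-1, 1)
print(neuron(inputs, weights, -0.5, activation=threshold))  # -1
```

With a bias of -0.5 the weighted sum of about 0.3 stays below the firing point, so the threshold neuron outputs -1.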
35. Artificial Neural Networks - Topologies
- Many variations upon the concept of an artificial neural network
- For the purpose of this presentation we're only concerned with feed-forward networks.
- Again there are several variations on feed-forward networks.
- The main ones (especially for the purpose of today) are single and multi-layer perceptrons.
36. Artificial Neural Networks - Single layer perceptrons
- A single layer perceptron contains a layer of input neurons fed directly to an output layer.
- The input layer does no processing but passes the inputs to the network directly to the output layer.
- The output layer does the processing within the network and outputs the result.
- These types of networks are simplistic and cannot perform well on more complex problems
- Single layer perceptrons generally use the threshold activation function (input > 0 fire 1, else fire -1).
- Some single layer perceptron networks work with continuous output using activation functions such as the sigmoid function.
- Single layer perceptrons can only represent linearly separable functions (AND and OR, for example, but not XOR).
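To make the limitation concrete, here is a small sketch with one threshold output neuron and hand-picked weights (assumed values, not learned ones). AND and OR each need only one suitable line through the input space. The grid search at the end is a demonstration rather than a proof, but it finds no weights that give XOR.

```python
from itertools import product

def perceptron(inputs, weights, bias):
    """One output neuron with the threshold activation (fires 1 or -1)."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else -1

CASES = [(0, 0), (0, 1), (1, 0), (1, 1)]

AND = [perceptron(c, [1, 1], -1.5) for c in CASES]
OR = [perceptron(c, [1, 1], -0.5) for c in CASES]
print(AND)  # [-1, -1, -1, 1]
print(OR)   # [-1, 1, 1, 1]

# Try every weight/bias combination on a coarse grid from -4 to 4.
XOR = [-1, 1, 1, -1]
grid = [x / 2 for x in range(-8, 9)]
found = any([perceptron(c, [w1, w2], b) for c in CASES] == XOR
            for w1, w2, b in product(grid, repeat=3))
print(found)  # False
```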
37. Artificial Neural Networks - Multi layer perceptrons
- Multi layer perceptrons have an input layer, one or more hidden layers and an output layer.
- The hidden and output layers perform processing whilst the input layer provides input into the network. The output layer still provides the output for the network.
- The sigmoid function is generally used as the activation function.
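A forward pass through a network with one hidden layer might look like the sketch below. The weights are hand-picked for illustration (an assumption, not the result of training) so that the two hidden neurons behave roughly like OR and NAND and the output neuron like AND, which together give XOR.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Compute one layer: each row of weights feeds one neuron."""
    return [sigmoid(sum(i * w for i, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

# Hand-picked weights (assumed for illustration, not learned values).
HIDDEN_W, HIDDEN_B = [[20, 20], [-20, -20]], [-10, 30]   # roughly OR and NAND
OUTPUT_W, OUTPUT_B = [[20, 20]], [-30]                   # roughly AND

def forward(inputs):
    """The input layer just passes values on; hidden and output layers process."""
    hidden = layer(inputs, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]

for case in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(case, round(forward(case)))
```

The rounded outputs are 0, 1, 1, 0: the hidden layer lets the network represent a function that a single layer cannot.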
38. Artificial Neural Networks - Topologies cont
39. Artificial Neural Networks - Topologies cont
40. Artificial Neural Networks - How do they learn?
- After constructing a suitable network topology, a set of training input data as well as the expected outputs are provided.
- The weights within the network are initialized to small random values (the network knows nothing).
- The network is fed the training data and the output vs the expected output is measured.
- The weights are adjusted so as to minimize the degree of error in the output.
- Issues can arise when the network overfits the data. In this case the network essentially becomes a lookup-table of the training input and output. The statistical nature of the data is not learnt by the network and it will not perform well on data outside of the training set.
- The right amount of training must be performed. This is sometimes difficult to predict.
- The correct network topology must be chosen. This is sometimes guesswork and experimentation.
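The training loop can be sketched for the simplest possible case, a single sigmoid neuron learning OR. The talk does not name a training algorithm, so plain gradient descent on the squared error is assumed here. The learning rate, epoch count and seed are arbitrary illustrative values.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(inputs, weights, bias):
    return sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)

def train(samples, epochs=5000, rate=0.5, seed=0):
    rng = random.Random(seed)
    # Small random starting weights: the network knows nothing yet.
    weights = [rng.uniform(-0.1, 0.1) for _ in samples[0][0]]
    bias = rng.uniform(-0.1, 0.1)
    for _ in range(epochs):
        for inputs, expected in samples:
            output = predict(inputs, weights, bias)
            # Gradient of the squared error through the sigmoid.
            delta = (expected - output) * output * (1.0 - output)
            # Nudge each weight in the direction that reduces the error.
            weights = [w + rate * delta * i for w, i in zip(weights, inputs)]
            bias += rate * delta
    return weights, bias

OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train(OR_SAMPLES)
for inputs, expected in OR_SAMPLES:
    print(inputs, expected, round(predict(inputs, weights, bias), 2))
```

With only four samples there is nothing held back to test generalization; on real data a separate validation set is the usual way to notice overfitting.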
41. Artificial Neural Networks - Applications
- Anything requiring classification or pattern matching.
- IDS/IPS traffic classification?
- Virus/Worm classification?
- Identifying code constructs within binaries?
42. Intelligent Agents - Introduction
- An agent is simply something that acts.
- Agents distinguish themselves from general programs by
  - Running under autonomous control
  - Perceiving their environment
  - Adapting to changes within their environment
  - Acting rationally
43. Intelligent Agents - Autonomy
- In saying that intelligent agents are autonomous we mean that they operate independently and without user control.
- An intelligent agent can make its own decisions and work towards its own goals.
- No (or very little) guidance required by an operator.
44. Intelligent Agents - Rationality
- What does it mean to act rationally?
- For the purpose of working with intelligent agents, rationality refers to an agent's ability to
  - Act so as to achieve the best outcome.
  - When there is a lack of certainty, achieve the best expected outcome.
45. Intelligent Agents - Task Environment
- The environment in which an agent operates dictates its design, architecture and the types of AI techniques it employs.
- The most important aspects of an agent's task environment are
  - The performance measure
  - The environment
  - Actuators
  - Sensors
46. Intelligent Agents - Task Environment cont
- There are obviously near-infinite task environments that an agent can operate within.
- There are, however, various properties we can record that are relevant to each environment and agent design.
- Each environment is either
  - Fully or partially observable
  - Deterministic or stochastic
  - Episodic or sequential
  - Static or dynamic
  - Discrete or continuous
  - Single or multi agent
47. Intelligent Agents - Agent variations
- There are many variations in agent software and its design.
- There are four main types that summarize most of them
  - Simple reflexive agents
  - Model-based reflexive agents
  - Goal-based agents
  - Utility-based agents
48. Intelligent Agents - Agent variations cont
- Simple reflexive agents
  - Receive a percept from the environment and perform an appropriate action.
  - Constructed with condition-action rules in which a change of state triggers an action.
  - Ignore previous percepts and actions.
- Model-based reflexive agents
  - Attempt to compensate for partial observability within an environment by constructing an internal model of their belief of the state of the environment.
  - Model is generally constructed based upon percept history (states that have been perceived up to the present)
- Goal-based agents
  - Take into account their performance and current goals when decision making
- Utility-based agents
  - Utility function that maps the current state into a numerical measure of success.
  - Distinctly aware of how well they are performing
  - Utility functions can be constructed based upon many measures of performance. These can be hard or soft goals for the agent.
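The difference between the first two agent types can be shown in a toy two-room cleaning world. The environment, percept format `(location, dirty)` and action names are all assumptions invented for this sketch. The simple reflexive agent looks only at the current percept, while the model-based agent remembers which rooms it believes are clean.

```python
def simple_reflexive_agent(percept):
    """Condition-action rules on the current percept only."""
    location, dirty = percept
    if dirty:
        return "clean"
    return "move_right" if location == "A" else "move_left"

class ModelBasedAgent:
    """Keeps an internal model of the environment built from percept history."""

    def __init__(self, locations=("A", "B")):
        self.locations = set(locations)
        self.believed_clean = set()

    def act(self, percept):
        location, dirty = percept
        # Either the room is clean now, or it will be once we act.
        self.believed_clean.add(location)
        if dirty:
            return "clean"
        if self.believed_clean == self.locations:
            return "idle"            # model says there is nothing left to do
        return "move_right" if location == "A" else "move_left"

percepts = [("A", True), ("A", False), ("B", False)]
agent = ModelBasedAgent()
for percept in percepts:
    print(percept, simple_reflexive_agent(percept), agent.act(percept))
```

On the last percept the simple reflexive agent keeps moving back and forth, while the model-based agent stops because its model says both rooms are clean.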
49. Intelligent Agents - Learning
- Agents that can learn themselves or that can be taught are more desirable than having to explicitly program an agent.
- In order to learn we add two more elements
  - The learning element
  - The problem generator
- The learning element's responsibility is to assess how well the agent is performing and make modifications to its actions (actuators).
- The problem generator's responsibility is to suggest new actions for the agent that can result in it experiencing new states (experiences) that it can learn from
50. Intelligent Worms?
- Thus far we have not seen intelligent worms.
- Worms
  - Are not aware of their performance
  - Do not perceive their environment
  - Do not adapt their behavior
- What if a worm
  - Perceived its environment - operating environment and network
  - Adapted its behavior
  - Operated in a cooperative multi-agent fashion
  - Coordinated in a multi-agent fashion
- Such a worm could
  - Intelligently select targets - DNS / Mail servers?
  - Modify its propagation parameters
  - Communicate its experiences with other instances of itself
- Worms do not necessarily have to perform evil
  - The Nachi/Welchia family of worms patched otherwise vulnerable machines
51. Intelligent Worms?
- Worms could also use genetic operations
  - Selection
  - Crossover
  - Mutation
- Variations of a worm are unleashed into the wild
- The worms perceive their environment and adapt accordingly
- The fittest worms survive and continue to propagate
- Worms locate their brethren and produce offspring (crossover)
- Some of the offspring are subject to mutation
- Many variants in the wild with different propagation parameters - difficult to isolate
52. Useful links
- Gaul - genetic algorithm utility library
  - http://gaul.sourceforge.net
- Fann - Fast Artificial Neural Network library
  - http://fann.sourceforge.net
- OpenAI Project
  - http://openai.sourceforge.net
53. References
- Artificial Intelligence: A Modern Approach, 2nd Edition - Stuart Russell and Peter Norvig
- A Genetic Programming Tutorial - John Koza and Riccardo Poli
54. Questions? Comments? Flames?
- Hope you enjoyed the talk.
- If you think of any questions later on I'm sure you can find me at the bar :)