Title: Moving towards the Artificial Hacker

1
Moving towards the Artificial Hacker

RUXCON 2005
Ashley Fox

2
Who am I?

Member of Felinemenace
University Student
Computer Security enthusiast for many years
By no means an AI expert!!

3
Talk outline

Why Artificial Intelligence?
What is Artificial Intelligence?
Introduction to Genetic Algorithms
Introduction to Genetic Programming
Introduction to Artificial Neural Networks
Introduction to Intelligent Agents

http://www.felinemenace.org
4
Why Artificial Intelligence?

Interesting Field.
AI techniques can be used to partly or wholly
automate tasks.
Striving towards software tools that can act with
some degree of intelligence.
There are a lot of tasks performed by security
professionals that lend themselves well to
automation.

5
What is Artificial Intelligence?

Umbrella term encompassing a lot of techniques.
Adapted from Arthur Samuel's definition: AI aims
to create software or machines to perform tasks
that, if performed by humans, would be assumed to
involve the use of intelligence.
Includes fuzzy systems, Bayesian networks, neural
networks, genetic algorithms, genetic
programming, expert systems; the list goes on.
Other disciplines, such as psychology, study
AI with the aim of better understanding our
thought processes. Not really of interest to us.

6
Genetic Algorithms - Introduction

A type of informed search
Suitable for problems in which only the solution
state matters, not the path(s) that leads to it
Search for the optimal value within a search
space
Inspired by evolutionary biology

7
Genetic Algorithms - Introduction continued

Attempt to mimic the natural process of evolution
Survival of the fittest

8
Genetic Algorithms - Data structures

Chromosomes are used to represent potential
solutions to the problem (think of a chromosome
as a string)
Each chromosome has an assigned fitness produced
by a fitness function (measure of how close to
meeting the solution the chromosome is)
The population is the entire set of chromosomes
at a point in evolution

9
Genetic Algorithms - Operations

Crossover operation attempts to mimic
reproduction. In its basic form a crossover
point within the two chromosomes is selected and
the two chromosomes are combined
Selection operation selects pairs of chromosomes
from the population to crossover
Mutation operation randomly changes a random
element of a chromosome
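
A minimal sketch of the three operations, assuming chromosomes are equal-length Python strings. The function names, the alphabet and the choice of roulette-wheel selection are assumptions made for illustration; the slides do not prescribe any of them.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed gene alphabet


def crossover(a, b, rng):
    """Single-point crossover: the head of a joined to the tail of b."""
    point = rng.randrange(1, len(a))  # never 0, so both parents contribute
    return a[:point] + b[point:]


def mutate(chromosome, rng):
    """Replace one randomly chosen element with a random symbol."""
    i = rng.randrange(len(chromosome))
    return chromosome[:i] + rng.choice(ALPHABET) + chromosome[i + 1:]


def select_pair(population, fitnesses, rng):
    """Roulette-wheel selection: fitter chromosomes are picked more often.

    Sampling is with replacement, so the same chromosome may appear twice.
    """
    return rng.choices(population, weights=fitnesses, k=2)


rng = random.Random(1)  # seeded so runs are repeatable
child = crossover("aaaaaa", "bbbbbb", rng)
print(child, mutate(child, rng))
```

The fitness values passed to select_pair must be non-negative and not all zero, otherwise the weighted draw fails.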

10
Genetic Algorithms - Steps

A random population of chromosomes is generated.
The fitness function is applied to each
chromosome.
The selection operation is applied favoring the
fittest chromosomes.
The crossover operation is applied on the
selected pairs to generate a new population.
Mutation is applied to some offspring
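
The steps above can be sketched as one generational loop. This is a toy under stated assumptions: chromosomes are lists of bits, fitness is simply the number of 1s (the "OneMax" problem), and the population size, mutation rate and generation count are arbitrary values picked for the example.

```python
import random


def fitness(bits):
    """Toy fitness function: count of 1 bits in the chromosome."""
    return sum(bits)


def evolve(pop_size=30, length=20, generations=40, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: generate a random population of chromosomes.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: apply the fitness function to each chromosome.
        scores = [fitness(c) + 1 for c in population]  # +1 keeps weights positive
        next_generation = []
        while len(next_generation) < pop_size:
            # Step 3: selection, favoring the fittest chromosomes.
            mum, dad = rng.choices(population, weights=scores, k=2)
            # Step 4: single-point crossover produces a new chromosome.
            point = rng.randrange(1, length)
            child = mum[:point] + dad[point:]
            # Step 5: mutation is applied to some offspring only.
            if rng.random() < mutation_rate:
                child[rng.randrange(length)] ^= 1  # flip one bit
            next_generation.append(child)
        population = next_generation
    return max(population, key=fitness)


best = evolve()
print(fitness(best), best)
```

Real problems differ mainly in the chromosome encoding and the fitness function; the loop itself stays much the same.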

11
Genetic Algorithms - Example
12
Genetic Algorithms - Constructing a GA based
fuzzer

Chromosome represents our input into the
application (File, Packet, argument)
Genesis state must be chosen carefully
If the input is all rejected we will have
difficulty evolving
If we lack diversity we experience convergence
Depending on the application our mutation and
crossover functions may have to generate valid
data.
What is our measure of fitness when fuzzing?

13
Genetic Algorithms - Constructing a GA based
fuzzer cont

Code coverage!
The more code we can test the more bugs we
(hopefully) expose.
We have a few options for a fitness function
Instruction stepping
Produce a call-flow-graph - assess fitness based
on coverage
Profile the target application and mark
interesting code blocks by breakpointing - assess
fitness based on number of blocks hit
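
As a rough illustration of coverage as fitness, the sketch below scores an input by how many distinct source lines of a target it executes. It is only an analogy for instruction stepping or breakpointing a native binary: the target here is a hypothetical toy Python parser, and the measurement uses Python's own sys.settrace hook rather than a debugger.

```python
import sys


def target(data):
    """Hypothetical toy parser standing in for the application under test."""
    if len(data) < 4:
        return "too short"
    if data[:2] != b"FM":
        return "bad magic"
    if data[2] > 0x7F:
        return "bad version"
    return "parsed"


def coverage_fitness(func, data):
    """Fitness = number of distinct lines of func executed for this input."""
    hit = set()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is code:
            if event == "line":
                hit.add(frame.f_lineno)
            return tracer  # keep receiving line events for this frame
        return None  # do not trace into other functions

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(data)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return len(hit)


for sample in (b"", b"XXXX", b"FM\x01\x00"):
    print(sample, coverage_fitness(target, sample))
```

Inputs that get past more of the parser's checks score higher, which is the pressure a GA needs in order to evolve inputs that reach deeper code.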

14
Genetic Algorithms - Applications

Brute force instances for which we can obtain
hints as to how successful attempts are.
Fyodor/Mikasoft's talk used them to breed
overflow strings.
Evolve firewall or IDS rule-sets.

15
Genetic Programming - Introduction

Genetic Programming is an adaptation of genetic
algorithm techniques.
Instead of evolving strings we evolve programs.

16
Genetic Programming - Basics

We are no longer confined to evolving string
chromosomes.
Our chromosomes are now represented in the form
of a syntax tree
Languages that use prefix notation (such as Lisp)
are ideal for genetic programming as they lend
themselves well to this tree notation.
We are not restricted to evolving lisp programs
however.

17
Genetic Programming - Basics cont

A syntax tree consists of nodes and links.
Each node represents an operation
Each link from a node represents one of that
node's parameters
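
A small sketch of such a tree, assuming arithmetic operations only. The Node class, the "const" and "var" terminal markers and the OPS table are names invented for this example.

```python
import operator
from dataclasses import dataclass, field

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


@dataclass
class Node:
    op: str                                        # operation, or "const"/"var"
    children: list = field(default_factory=list)   # links to parameter nodes
    value: object = None                           # payload for terminals


def evaluate(node, env):
    """Evaluate the tree bottom-up; env maps variable names to values."""
    if node.op == "const":
        return node.value
    if node.op == "var":
        return env[node.value]
    args = [evaluate(child, env) for child in node.children]
    return OPS[node.op](*args)


def to_prefix(node):
    """Render the tree in Lisp-style prefix notation."""
    if node.op in ("const", "var"):
        return str(node.value)
    return "(" + " ".join([node.op] + [to_prefix(c) for c in node.children]) + ")"


# The expression x * 3 + 2 as a tree.
tree = Node("+", [
    Node("*", [Node("var", value="x"), Node("const", value=3)]),
    Node("const", value=2),
])
print(to_prefix(tree), "=", evaluate(tree, {"x": 4}))
```

The prefix rendering shows why Lisp-like languages map so directly onto this structure: the printed form is the tree.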

18
Genetic Programming - Syntax Tree
19
Genetic Programming - More Basics

We generally build our software using subroutines
Using syntax trees we represent this with a
branch to a sub-tree

20
Genetic Programming - Syntax Trees

Branches can also represent
Iteration
Recursion
Conditionals
Predefined functions
At times we must also enforce a constrained
syntactic structure - this restricts which types
can be used as arguments for specific nodes

21
Genetic Programming - Basics Basics Basics

The syntax tree now forms a basis for our
chromosome
We still have populations (of chromosomes)
Each tree still has a fitness value assigned by a
fitness function
The fitness is evaluated based upon how well a
chromosome performs at achieving the particular
goal

22
Genetic Programming - Basics cont

In order to evaluate the fitness the code is
built from the syntax tree and interpreted or run
under a virtual machine.

23
Genetic Programming - Operations

We still have the same operations as when dealing
with Genetic Algorithms.
Mutate, Crossover, Selection.
One new operation, the Architecture Altering
operation
The only significant change is the data structure
we're applying these to (the syntax tree).

24
Genetic Programming - Operations cont

The crossover operation selects random branches
of two directed graphs and grafts them.
Mutation alters a branch of the graph.
Selection still works (pretty much) the same.
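
A hedged sketch of crossover and mutation on trees, assuming programs are nested Python lists of the form [operation, argument, argument] with plain values as terminals. The helper names and the terminal set are made up for the example, and the crossover shown grafts a branch one way rather than swapping branches between both parents.

```python
import random


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of tree with the subtree at path swapped for new."""
    if not path:
        return new
    copy = list(tree)
    copy[path[0]] = replace(tree[path[0]], path[1:], new)
    return copy


def crossover(a, b, rng):
    """Graft a random branch of b onto a random point of a."""
    target = rng.choice(list(paths(a)))
    donor = get(b, rng.choice(list(paths(b))))
    return replace(a, target, donor)


def mutate(tree, rng, terminals=("x", 0, 1, 2)):
    """Replace a random branch with a random terminal."""
    return replace(tree, rng.choice(list(paths(tree))), rng.choice(terminals))


def size(tree):
    """Number of nodes in the tree."""
    return sum(1 for _ in paths(tree))


rng = random.Random(7)
parent_a = ["+", ["*", "x", 3], 2]
parent_b = ["-", "x", ["+", 1, "x"]]
print(crossover(parent_a, parent_b, rng))
print(mutate(parent_a, rng))
```

Neither operation modifies its parents, so the same chromosomes can be selected again within a generation.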

25
Genetic Programming - Program Architecture and
altering operation

The arrangement, number and types of branches
present in the syntax tree dictate the program
architecture
The architecture altering operation is introduced
so that the underlying architecture of the
chromosome can be dynamic rather than fixed

26
Genetic Programming - Example
27
Genetic Programming - Breeding shellcode

Use PPC instruction set
Generally 3 register operands per instruction
mnemonic dst, operand, operand
add r6, r11, r10

28
Genetic Programming - Breeding shellcode cont

Each node in our syntax tree represents an
instruction.
Each node has two links representing its
operands. Child node destination operands
evaluate to parent node operands.
Links can represent conditional instructions.
Terminals with a constant value can evaluate to
an li or equivalent.

29
Genetic Programming - Breeding shellcode cont
30
Genetic Programming - Breeding shellcode cont

There are many different aims when writing
shellcode.
Some measures of fitness may be
Length
Architecture alteration can play a big part in
this
Absence of Illegal characters
Nulls etc.
Variance from previous shellcode.
Success against a known firewall or IDS rule-set

31
Genetic Programming - Breeding shellcode cont

We can emulate code using something like qemu.
Base fitness on
required register states or
required system calls hit

32
Artificial Neural Networks - Introduction

An artificial neural network attempts to mimic
the workings of the human brain.
Based upon connectionism.
A parallel collection of small processing units,
with the emphasis on the interconnections between
these processing elements.
ANNs are able to perform pattern matching and
classification (amongst other tasks).

33
Artificial Neural Networks - Data structures

ANNs are composed of synapses and neurons.
Each neuron performs very simple processing of
its inputs and produces an output.
Synapses are used to connect the output of one
neuron to the input of another.
Each synapse has an associated weight that is
multiplied by the input value and fed to the
destination neuron.

34
Artificial Neural Networks - Data structures cont

Each neuron has a transfer or activation
function.
The most common function is the sigmoid function.
The sigmoid function basically squashes the input
into the range of values between 0 and 1.
The tanh function can also be used; it squashes
the input into the range of values between -1 and
1.
There is an additional bias/weight input that
acts as a threshold for each neuron. If the
addition of the weighted inputs from the synapses
exceeds this bias then the neuron fires.
Some simpler topologies use a basic threshold
function. If the sum of inputs is greater than 0
fire the value 1. Otherwise fire the deactivated
value -1.
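
A single neuron with the three activation functions mentioned can be sketched as below. One assumption to note: the bias here is added to the weighted sum, so a bias of -1.5 corresponds to a firing threshold of 1.5. Function and parameter names are illustrative.

```python
import math


def sigmoid(x):
    """Squash x into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def threshold(x):
    """Fire 1 if the input is greater than 0, otherwise -1."""
    return 1 if x > 0 else -1


def neuron(inputs, weights, bias, activation=sigmoid):
    """Sum each input multiplied by its synapse weight, add bias, activate."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(total)


print(neuron([1, 1], [1, 1], -1.5))              # sigmoid output in (0, 1)
print(neuron([1, 1], [1, 1], -1.5, math.tanh))   # tanh output in (-1, 1)
print(neuron([1, 0], [1, 1], -1.5, threshold))   # hard threshold output
```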

35
Artificial Neural Networks - Topologies

Many variations upon the concept of an artificial
neural network
For the purpose of this presentation we're only
concerned with feed-forward networks.
Again there are several variations on
feed-forward networks.
The main ones (especially for the purpose of
today) are single and multi-layer perceptrons.

36
Artificial Neural Networks - Single layer
perceptrons

A Single layer perceptron contains a layer of
input neurons fed directly to an output layer.
The input layer does no processing but passes the
inputs to the network directly to the outputs.
The output layer does the processing within the
network and outputs the result.
These types of networks are simplistic and cannot
perform well on more complex problems
Single layer perceptrons generally use the
threshold activation function (input > 0 fire 1,
else fire -1).
Some Single layer perceptron networks work with
continuous output using activation functions such
as the sigmoid function.
Single layer perceptrons can only compute a
limited set of functions: those that are linearly
separable (XOR, for example, is out of reach).
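
To make the limitation concrete, the sketch below brute-forces a small grid of weights and biases for a two-input threshold perceptron. The grid values and helper names are assumptions for the example. A setting that computes AND turns up, while none computes XOR (and none exists for any weights, since XOR is not linearly separable).

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [-1, -1, -1, 1]   # desired outputs, using the 1 / -1 convention
XOR = [-1, 1, 1, -1]


def perceptron(inputs, weights, bias):
    """Threshold unit: fire 1 if the weighted sum plus bias exceeds 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else -1


def computes(weights, bias, targets):
    """True if this perceptron reproduces targets on every input pair."""
    return all(perceptron(x, weights, bias) == t
               for x, t in zip(INPUTS, targets))


def search(targets, grid=(-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2)):
    """Try every (w1, w2, bias) on the grid; return the first that works."""
    for w1, w2, b in product(grid, repeat=3):
        if computes((w1, w2), b, targets):
            return (w1, w2), b
    return None


print("AND:", search(AND))
print("XOR:", search(XOR))
```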

37
Artificial Neural Networks - Multi layer
perceptrons

Multi layer perceptrons have an input layer, one
or more hidden layers and an output layer.
The hidden and output layers perform processing
whilst the input layer provides input into the
network. The output layer still provides the
output for the network.
The sigmoid function is generally used as the
activation function.
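
A forward pass through a multi layer perceptron can be sketched as follows. The 2-2-1 topology and the hand-picked weights are assumptions chosen so that the network approximates XOR; they are not learnt and are not from the talk.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One layer: each row of weights feeds one sigmoid neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Feed the inputs through each layer in turn."""
    for weights, biases in layers:
        inputs = layer(inputs, weights, biases)
    return inputs


# Hand-picked weights for a 2-2-1 network (illustrative only).
XOR_NET = [
    ([[20, 20], [-20, -20]], [-10, 30]),  # hidden layer: OR-like, NAND-like
    ([[20, 20]], [-30]),                  # output layer: AND of the two
]

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, round(forward(list(pair), XOR_NET)[0], 3))
```

The hidden layer is what lets the network combine two simple decisions into one that no single layer could make.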

38
Artificial Neural Networks - Topologies cont
39
Artificial Neural Networks - Topologies cont
40
Artificial Neural Networks - How do they learn?

After constructing a suitable network topology a
set of training input data as well as the
expected outputs are provided.
The weights within the network are initialized to
small random values (the network knows nothing).
The network is fed the training data and the
output vs the expected output is measured.
The weights are adjusted so as to minimize the
degree of error in the output.
Issues can arise when the network overfits the
data. In this case the network essentially
becomes a lookup-table of the training input and
output. The statistical nature of the data is not
learnt by the network and it will not perform
well on data outside of the training set.
The right amount of training must be performed.
This is sometimes difficult to predict.
The correct network topology must be chosen. This
is sometimes guesswork and experimentation.
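
The training loop can be sketched for the simplest case, a single sigmoid neuron learning logical OR. The slides do not name a training algorithm; this example assumes the delta rule (gradient descent on squared error), and the learning rate, epoch count and seed are arbitrary.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def mean_squared_error(weights, bias, data):
    return sum((t - predict(weights, bias, x)) ** 2 for x, t in data) / len(data)


def train(data, epochs=2000, rate=0.5, seed=0):
    rng = random.Random(seed)
    # Weights start as small random values: the network knows nothing.
    weights = [rng.uniform(-0.1, 0.1) for _ in data[0][0]]
    bias = rng.uniform(-0.1, 0.1)
    before = mean_squared_error(weights, bias, data)
    for _ in range(epochs):
        for x, target in data:
            out = predict(weights, bias, x)
            # Error scaled by the sigmoid's slope at the current output.
            delta = (target - out) * out * (1 - out)
            weights = [w + rate * delta * xi for w, xi in zip(weights, x)]
            bias += rate * delta
    return weights, bias, before, mean_squared_error(weights, bias, data)


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias, before, after = train(OR_DATA)
print("error before:", before, "after:", after)
```

Overfitting does not show up in a four-row example like this; detecting it needs a held-out set of data that the network never trains on.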

41
Artificial Neural Networks - Applications

Anything requiring classification or pattern
matching.
IDS/IPS traffic classification?
Virus/Worm classification?
Identifying code constructs within binaries?

42
Intelligent Agents - Introduction

An agent is simply something that acts.
Agents distinguish themselves from general
programs by
Running under autonomous control
Perceiving their environment
Adapting to changes within their environment
Acting Rationally

43
Intelligent Agents - Autonomy

In saying that intelligent agents are autonomous
we mean that they operate independently and
without user control.
An Intelligent Agent can make its own decisions
and work towards its own goals.
No (or very little) guidance required by an
operator.

44
Intelligent Agents - Rationality

What does it mean to act rationally?
For the purpose of working with intelligent
agents, rationality refers to an agent's ability to
Act so as to achieve the best outcome.
When there is lack of certainty achieve the best
expected outcome.
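
"Best expected outcome" can be read as picking the action with the highest probability-weighted value. A minimal sketch, with made-up action names, probabilities and payoffs:

```python
def expected_value(outcomes):
    """outcomes: list of (probability, value) pairs for one action."""
    return sum(p * v for p, v in outcomes)


def rational_choice(actions):
    """Return the name of the action with the highest expected value."""
    return max(actions, key=lambda name: expected_value(actions[name]))


# Hypothetical actions; the numbers are invented for illustration.
ACTIONS = {
    "cautious": [(0.9, 5), (0.1, -1)],
    "risky": [(0.5, 10), (0.5, -20)],
}

for name, outcomes in ACTIONS.items():
    print(name, expected_value(outcomes))
print("choice:", rational_choice(ACTIONS))
```

The risky action has the larger best case, but the cautious one wins on expectation, so a rational agent in this sense picks it.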

45
Intelligent Agents - Task Environment

The environment in which an agent operates
dictates its design, architecture and the types
of AI techniques it employs.
The most important aspects of an agent's task
environment are
The performance measure
The Environment
Actuators
Sensors

46
Intelligent Agents - Task Environment cont

There are obviously near-infinite task
environments that an agent can operate within.
There are, however, various properties we can
record that are relevant to each environment and
agent design.
Each environment is either
Fully or partially observable
Deterministic or stochastic
Episodic or sequential
Static or Dynamic
Discrete or Continuous
Single or multi agent

47
Intelligent Agents - Agent variations

There are many variations in agent software and
its design.
There are four main types that summarize most of
them
Simple reflexive agents
Model-based reflexive agents
Goal-based agents
Utility-based agents

48
Intelligent Agents - Agent variations cont

Simple reflexive agents
Receive a percept from the environment and
perform an appropriate action.
Constructed with condition-action rules in which
a change of state triggers an action.
Ignore previous percepts and actions.
Model based reflexive agents
Attempt to compensate for partial observability
within an environment by constructing an
internal model of their belief of the state of
the environment.
Model is generally constructed based upon percept
history (the states perceived up to the present)
Goal-based agents
Take into account their performance and current
goals when making decisions
Utility-based agents
Utility function that maps the current state into
a numerical measure of success.
Distinctly aware of how well they are performing
Utility functions can be constructed based upon many
measures of performance. These can be hard or
soft goals for the agent.
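
The difference between the first two agent types can be sketched with a toy log-watching agent. The percept format (event, host), the rule table and the three-failure limit are all assumptions made for the example.

```python
from collections import Counter


class SimpleReflexAgent:
    """Maps the current percept straight to an action; keeps no history."""

    RULES = {"login_failed": "log", "login_ok": "ignore"}  # condition-action

    def act(self, percept):
        event, _host = percept
        return self.RULES.get(event, "ignore")


class ModelBasedReflexAgent:
    """Keeps an internal model of the environment built from past percepts."""

    def __init__(self, limit=3):
        self.failures = Counter()  # the model: consecutive failures per host
        self.limit = limit

    def act(self, percept):
        event, host = percept
        if event == "login_failed":
            self.failures[host] += 1
        elif event == "login_ok":
            self.failures[host] = 0
        return "alert" if self.failures[host] >= self.limit else "log"


simple = SimpleReflexAgent()
model = ModelBasedReflexAgent()
for percept in [("login_failed", "10.0.0.5")] * 3:
    print(simple.act(percept), model.act(percept))
```

The simple agent reacts identically every time, while the model-based agent changes its action once its accumulated picture of the host crosses the limit.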

49
Intelligent Agents - Learning

Agents that can learn by themselves or that can
be taught are more desirable than having to
explicitly program an agent.
In order to learn we add two more elements
The learning element
The problem generator
The learning element's responsibility is to
assess how well the agent is performing and make
modifications to its actions (actuators).
The problem generator's responsibility is to
suggest new actions for the agent that can result
in it experiencing new states (experiences) that
it can learn from

50
Intelligent Worms?

Thus far we have not seen intelligent worms.
Worms
Are not aware of their performance
Do not perceive their environment
Do not adapt their behavior
What if a worm
Perceived its environment - operating environment
and network
Adapted its behavior
Operated in a cooperative multi-agent fashion
Coordinated in a multi-agent fashion
Such a worm could
Intelligently select targets - DNS / Mail servers?
Modify its propagation parameters
Communicate its experiences with other instances
of itself
Worms do not necessarily have to perform evil
The Nachi/Welchia family of worms patched
otherwise vulnerable machines

51
Intelligent Worms?

Worms could also use genetic operations
Selection
Crossover
Mutation
Variations of a worm are unleashed into the wild
The worms perceive their environment and adapt
accordingly
The fittest worms survive and continue to
propagate
Worms locate their brethren and produce offspring
(cross-over)
Some of the offspring are subject to mutation
Many variants in the wild with different
propagation parameters - difficult to isolate