Title: Introduction to graph theory and applications
1Introduction to graph theory and applications
- Introduction and definitions
- Typical network modelling applications
- Classic graph theory problems and proofs
- The Seven Bridges of Königsburg
- The four colour map colouring theorem
- The three cottage problem
- Data structures used for storing graphs
- Incidence and adjancancy lists and matrixes
2Definitions
- Graph theory concerns the study of networks based
on a mathematical abstraction of the form of a
graph. A graph is made up of vertices (singular
vertex) and edges. An edge connects exactly 2
vertices. These are the same vertex in the
special case where the edge is called a loop or
loopback. - A vertex can have any number of edges connected
to it. - Edges might be directed, and drawn as an arrow to
indicate that network flow or traffic or the
connection represented by the edge works in one
direction only. In an undirected graph the edges
are bidirectional. Edges may be associated with a
numeric cost. The meaning of edge cost will
depend upon the graph application.
3Vertices and bidirectional edges
4More definitions
- A path is a route through a graph visiting
vertices and edges in turn. A cycle is a path
that ends at the starting vertex. A Hamiltonian
path or cycle visits all vertices exactly once. A
Eulerian path or cycle visits all edges exactly
once. - A graph is simple if no edge is a loop, and no 2
edges have the same endpoints. - A planar graph can be drawn on a plane surface,
e.g. a piece of paper, so that no edges cross
over other edges.
5Even more definitions ...
- A directed acyclic graph is a directed graph
without cycles, i.e. you can't get back to the
vertex where you started by following edges in
their defined direction. This is a way to map a
forest of trees so that subtrees can be shared
between trees. Sources are vertices all of whose
edges lead out from them and sinks are vertices
all of whose edges lead into them. A tree is a
graph which is connected, acyclic and simple.
6Terms to remember
- graph
- vertex (pl vertices)?
- edge
- edge cost
- undirected graph
- directed graph
- simple graph
- loop
- path
- cycle
- Hamiltonian
- Eulerian
- planar graph
- acyclic directed graph
- forest
- tree
7Typical Graph Applications 1
- Modelling a road network with vertexes as towns
and edge costs as distances. - Modelling a water supply network. A cost might
relate to current or a function of capacity and
length. As water flows in only 1 direction, from
higher to lower pressure connections or downhill,
such a network is inherently an acyclic directed
graph. - Modelling the recent contacts of someone who has
become ill with a notifiable illness, e.g. Sars
or Meningitis. Edge costs might be a function of
the probability that the contact resulted in an
infection.
8Typical Graph Applications 2
- Dynamically modelling the status of a set of
routes by which traffic might be directed over
the Internet. - Modelling the connections between a number of
potential witnesses or suspects who were reported
or came forward as having been within the
vicinity of a serious crime within an hour of of
when it occurred. - Minimising the cost and time taken for air travel
when direct flights don't exist between starting
and ending airports. - Using a directed graph to map the links between
pages within a website and to analyse ease of
navigation between different parts of the site.
9Classic Graph Theory Problems
- Graph theory started from a mathematical
curiosity. - "The Seven Bridges of Königsberg is a problem
inspired by an actual place and situation. The
city of Kaliningrad, Russia (at the time,
Königsberg, Germany) is set on the Pregolya
River, and included two large islands which were
connected to each other and the mainland by seven
bridges. The question is whether it is possible
to walk with a route that crosses each bridge
exactly once, and return to the starting point.
In 1736, Leonhard Euler proved that it was not
possible." - Source http//en.wikipedia.org/wiki/Seven_Bridges
_of_Koenigsberg
10Seven Bridges of Königsberg 2
- "In proving the result, Euler formulated the
problem in terms of graph theory, by abstracting
the case of Königsberg -- first, by eliminating
all features except the landmasses and the
bridges connecting them second, by replacing
each landmass with a dot, called a vertex or
node, and each bridge with a line, called an edge
or link. The resulting mathematical structure is
called a graph."
11Seven Bridges of Königsberg 3
- "The shape of a graph may be distorted in any way
without changing the graph itself, so long as the
links between nodes are unchanged. It does not
matter whether the links are straight or curved,
or whether one node is to the left of another. - Euler realized that the problem could be solved
in terms of the degrees of the nodes. The degree
of a node is the number of edges touching it in
the Königsberg bridge graph, three nodes have
degree 3 and one has degree 5. Euler proved that
a circuit of the desired form is possible if and
only if there are no nodes of odd degree. Such a
walk is called an Eulerian circuit or an Euler
tour. Since the graph corresponding to Königsberg
has four nodes of odd degree, it cannot have an
Eulerian circuit."
12Seven Bridges of Königsberg 4
- "The problem can be modified to ask for a path
that traverses all bridges but does not have the
same starting and ending point. Such a walk is
called an Eulerian trail or Euler walk. Such a
path exists if and only if the graph has exactly
two nodes of odd degree, those nodes being the
starting and ending points. (So this too was
impossible for the seven bridges of Königsberg.)"
13The four colour theorem
- This theorem states that given any set of areas
on a planar surface, e.g. representing political
regions on a map, it is possible to colour every
area so that no 2 regions sharing a border need
use the same colour if 4 colours are used. - Sharing a border means a linear border between 2
areas of more than zero length e.g. many regions
can meet at a point but this doesn't count as a
border. Also an area within the set must be
single and contiguous.
14The four colour theorem 2
- Political geographic entities such as countries
sometimes break this requirement by having
enclaves, e.g. the USA includes Alaska which is
not part of the largest geographically contiguous
region of the USA. Old maps of English and Welsh
counties split the county of Flint into 2
seperate areas. - This theorem comes from a question (conjecture)
asked by a student Francis Guthrie of his maths
lecturer Augustus De Morgan in 1852. It was
finally proved in 1976.
15Graph theory presentation of the theorem
- "To formally state the theorem, it is easiest to
rephrase it in graph theory. It then states that
the vertices of every planar graph can be colored
with at most four colors so that no two adjacent
vertices receive the same color. Or "every planar
graph is four-colorable" for short. Here, every
region of the map is replaced by a vertex of the
graph, and two vertices are connected by an edge
if and only if the two regions share a border
segment (not just a corner)" - Source http//en.wikipedia.org/wiki/Four-color_th
eorem
16Three cottage problemSource http//en.wikipedia
.org/wiki/Three_cottage_problem
- "The three cottage problem is a well-known
mathematical puzzle. It can be stated like this - Suppose there are three cottages on a plane (or
sphere) and each needs to be connected to the
gas, water, and electric companies. Is there a
way to do so without any of the lines crossing
each other?"
17Three cottage problem 2
- "The problem is part of the mathematical field of
topological graph theory which studies the
embedding of graphs on surfaces. In more formal
graph theoretic terms the problem asks whether
the complete bipartite graph K3,3 is planar.
Kazimierz Kuratowski proved in 1930 that K3,3 is
nonplanar, and thus that the three cottage
problem has no solution. - But K3,3 is toroidal, that is it can be embedded
on the torus. In terms of the three cottage
problem this means the problem can be solved by
punching a hole through the plane (or the
sphere). This changes the topological properties
of the surface and using the hole we can connect
the three cottages without crossing lines."
18Three cottage problem 3
- From this description note that the term
"complete bipartite graph" means a graph with 2
sets of vertices where every vertex in one set is
connected by an edge to every vertex in the other
set. The designation K3,3 for this graph
indicates that there are 3 vertices in each of
the 2 sets, representing utilities and cottages
respectively. K5 is diagrammed on the right.
19Three cottage problem 4
- Kuratowski's proof is based on a process
involving simplification of complex graphs into
simpler graphs by removing vertices and edges in
certain situations and finding instances of one
of the 2 simplest non planar graphs as subsets of
the more complex graph. The other simplest
non-planar graph is a complete graph known as K5,
because it has a single set of 5 vertices with
each vertex connected to every other vertices.
20Data structures used for storing graphs
- Before choosing one of various options or
designing a customised approach, programmers need
to consider application data, and the means by
which this will be attached to vertices and
edges. Records will need to be attached either to
vertices, or to edges, or one type of record
might need attachment to vertices and another to
edges. The choice and design of data structure(s)
will influence and be influenced by program
performance and memory usage requirements.
21Incidence list
- The connectivity of an edge is regular in that
each represents a connection between exactly 2
vertices. This lends itself to fixed-size record
designs, e.g. an array of records known as an
incidence list. - The coding in the next slide uses pointers to the
vertex records, as a vertex may appear in any
number of edge records. Alternatively, if the
vertices are stored in an array which is
populated before the edge records are created,
and unchanged during the runtime of the
application, the edge records could store the
integer array indices. Which approach is optimal
depends upon how vertex records are stored and
how this collection of data is searched and
updated.
22Incidence list structure coding
23Adjacency list
- The connectivity of a vertex is irregular, in the
sense each can be connected using a variable
number of edges. Vertex information can be stored
in a fixed width record containing the head
pointer of a linked list. Each linked list item
will represent an edge. The latter will contains
references (e.g. array indices, keys or pointers)
to edge record storage so this isn't duplicated.
If this isn't done within the edge records
themselves, the linked list nodes must also
reference other vertices connected via the edges.
This approach is known as an adjacency list.
24Adjacency list structure coding
25Adjacency list coding comments
- The above structure is better suited to a
directed graph than an undirected graph. A link
from a webpage to another page (i.e. a directed
edge as in the above example ) does not require a
link in the opposite direction. Using this kind
of approach for an undirected graph would either
lead to duplication of information, in the sense
that each edge would have to be stored and
updated twice, or added complexity of path access
and traversal. Edge costs, if required, could be
stored as an additional field within the url_list
structure, for which each node in the list is an
edge. A reason for having edge costs in this
example might be if not all links between pages
are equally easy for a user to find.
26Incidence matrix
- This structure is a 2 dimensional array where the
rows index the edges, and columns index the
vertices. In the simplest implementation the
array element is boolean, with a 1 indicating a
connection and a 0 indicating no connection. The
advantage of this structure is that it enables
data to be rapidly indexed, either for vertices
or for edges. The disadvantage is that memory
proportional to V x E is required, where V is the
number of vertices, and E the number of edges.
List or array based structures require memory
proportional to V E.
27Adjacency matrix
- This is a V x V 2D array where V is the number of
vertices. In the simplest case, if the data
present at index ij is a 1 this indicates a
(possibly directed) edge from i to j, while a 0
indicates there being no such edge. If the edges
have costs, this could be a floating point number
present at index ij, with a sentinel value,
e.g. 0 or -1, indicating lack of an edge from
vertex i to vertex j. This data structure makes
it easier to search for subgraphs of particular
patterns or identities, e.g. K3,3 or K5.
28Conclusions
- Graph theory enables us to study and model
networks and solve some difficult problems
inherently capable of being modelled using
networks. - Various terms e.g. vertex and edge, are
associated with graph theory which gives these
terms special meanings. These meanings need to be
understood and remembered in order to apply graph
theoretic approaches to solving problems. - When solving a problem by developing a
graph-based program, careful attention must be
given at the design stage to the structuring of
data to help make solving the problem tractable,
to enable linkages to be traced efficiently and
to avoid duplication of data.