Title: Strings
1Strings
- Basic data type in computational biology
- A string is an ordered succession of characters or symbols from a finite set called an alphabet
- Sequence is synonymous with string
- s = AATGCA
- Length |s| = 6, first character s_1 = A
- Empty string
2Strings
- Substring: t is a string of consecutive characters of the parent string s
- Superstring: s is the parent string of substring t
- s_{i,j} denotes the characters of string s between indices i and j
- Concatenation of two strings s and t is st
- Prefix and suffix
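The string operations above can be sketched in Python. This is an illustration only: Python indexes from 0 while the slides index from 1, and the helper `substring` and its inclusive 1-based convention are assumptions, not something the slides define.

```python
# Illustrative only: Python indexes from 0, whereas the slides index from 1.
s = "AATGCA"

length = len(s)          # |s| = 6
first = s[0]             # s_1 in the slides' 1-based notation
empty = ""               # the empty string has length 0

def substring(s: str, i: int, j: int) -> str:
    """Return s_{i,j}: characters i..j of s, 1-based and inclusive (assumed convention)."""
    return s[i - 1:j]

t = substring(s, 2, 4)   # consecutive characters of the parent string s
prefix = s[:3]           # a substring that starts at the first character
suffix = s[-2:]          # a substring that ends at the last character
st = s + t               # concatenation of s and t
```

Here `t` is a substring of `s`, and `s` is a superstring of `t`.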
3Graphs
- A graph consists of two sets
- V the set of nodes or vertices
- E the set of edges (pair of vertices)
- G(V,E)
- Simple graph: no loops
- Directed graphs: directed edges
- Valence (in and out degree of a vertex)
- Weighted Graphs
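As a minimal sketch of G(V, E), the following represents a directed, weighted graph as a vertex set and an edge list, and computes in and out degrees. The vertex names, weights, and the helper `is_simple` are hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical directed, weighted graph G(V, E); edges are (tail, head, weight).
V = {"a", "b", "c", "d"}
E = [("a", "b", 2.0), ("a", "c", 1.5), ("b", "c", 0.5), ("c", "d", 3.0)]

out_degree = defaultdict(int)   # number of edges leaving each vertex
in_degree = defaultdict(int)    # number of edges entering each vertex
for u, v, _w in E:
    out_degree[u] += 1
    in_degree[v] += 1

def is_simple(edges) -> bool:
    """Check the slide's criterion: no loops, i.e. no edge from a vertex to itself."""
    return all(u != v for u, v, _w in edges)
```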
5
- Connectedness
- Cycles: no edge repeated, and return to start
- Acyclic: no cycles
- Complete: every possible edge
- Bipartite: separated into two disjoint subsets
- Tree: acyclic and connected graph (root, leaves)
- Interval graphs: collection of intervals of the real line, with an edge if the intersection is nonempty
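The interval graph definition can be made concrete with a short sketch. The interval endpoints below are made up, and treating the intervals as closed is an assumption.

```python
from itertools import combinations

# Hypothetical closed intervals [lo, hi] on the real line, keyed by vertex name.
intervals = {"A": (0.0, 2.0), "B": (1.0, 3.0), "C": (2.5, 4.0), "D": (5.0, 6.0)}

def intersects(p, q) -> bool:
    """Closed intervals overlap when each starts no later than the other ends."""
    return p[0] <= q[1] and q[0] <= p[1]

# One edge per pair of intervals whose intersection is nonempty.
edges = {
    frozenset((u, v))
    for u, v in combinations(intervals, 2)
    if intersects(intervals[u], intervals[v])
}
```

With these intervals, A overlaps B and B overlaps C, while D is an isolated vertex.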
6Graph Problems
- Hamiltonian: cycle with every vertex on it
- Eulerian: every edge in the cycle, but only once
- Coloring: minimum number of colors so that no two adjacent vertices have the same color
- Matching: subset M of edges such that no two edges in M share an endpoint
- Adjacency matrix
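A small sketch, on a hypothetical 4-cycle, of an adjacency matrix together with checks for the matching and coloring conditions stated above. The function names are illustrative; the checks only verify a given matching or coloring and do not search for an optimal one.

```python
# Hypothetical undirected graph: a 4-cycle 0-1-2-3-0.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Adjacency matrix: A[i][j] == 1 exactly when {i, j} is an edge.
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

def is_matching(M) -> bool:
    """True if no two edges in M share an endpoint."""
    endpoints = [v for e in M for v in e]
    return len(endpoints) == len(set(endpoints))

def is_proper_coloring(color) -> bool:
    """True if no two adjacent vertices have the same color."""
    return all(color[i] != color[j] for i, j in edges)

degrees = [sum(row) for row in A]   # row sums give the vertex degrees
```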
7Finite Automata
- A Finite Collection of States Q
- A finite alphabet Σ of input signals
- A function δ which for every possible combination of current state and input determines a new state
- Two special states: the initial state and the final (accepting) state
8
- The FA accepts any sequence of symbols that puts it in an accepting state
- The set of all such sequences is the language of the automaton
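The definition above (states, alphabet, transition function, initial and accepting states) can be sketched as a small simulator. The particular automaton, which accepts binary strings containing an even number of 1s, is a hypothetical example and not the one drawn on the slides.

```python
# Hypothetical automaton: accepts binary strings with an even number of 1s.
Q = {"even", "odd"}            # finite collection of states
SIGMA = {"0", "1"}             # finite input alphabet
DELTA = {                      # transition function: (state, symbol) -> new state
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
INITIAL = "even"
ACCEPTING = {"even"}

def accepts(word: str) -> bool:
    """Run the automaton on word and report whether it ends in an accepting state."""
    state = INITIAL
    for symbol in word:
        if symbol not in SIGMA:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet")
        state = DELTA[(state, symbol)]
    return state in ACCEPTING
```

The language of this automaton is the set of all words for which `accepts` returns `True`.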
[Figure: finite automaton; labels Input, Accept, Reset]
9State Transition Diagram
[Figure: state transition diagram; node and edge labels not recoverable]
10Regular Expressions
- 01(001)01
- Language accepted by a FA
- Pumping Lemma: If L is a regular language, then there is a constant n such that for each word W in L with length >= n, there are words X, Y, Z such that W = XYZ, length of XY <= n, length of Y >= 1, and XY^kZ is in L for every integer k >= 0.
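11
- Used to tell when a language is not in a particular class
- Let L be the language of all palindromes over {a, b}, e.g. abbababba (symmetric about the midpoint)
- Is L regular?
- Take W = a^n b a^n (a palindrome by definition)
- W = XYZ with length of XY <= n, so X and Y lie in the leading a^n; in the simplest case XY = a^n and Z = b a^n
- By the pumping lemma, XY^2Z = a^m b a^n with m > n would be in L
- But a^m b a^n is not a palindrome, so it is not in L; hence L is not regular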
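The argument can be checked mechanically for one small value of n. The sketch below uses n = 5 as a stand-in for the unknown pumping constant and tries every admissible split of W. It illustrates the proof for that one value and is not a proof in itself.

```python
def is_palindrome(w: str) -> bool:
    """A palindrome reads the same forwards and backwards."""
    return w == w[::-1]

def pump(x: str, y: str, z: str, k: int) -> str:
    """Return X Y^k Z."""
    return x + y * k + z

n = 5                               # stand-in for the pumping constant
w = "a" * n + "b" + "a" * n         # W = a^n b a^n, a palindrome

# Any split with |XY| <= n and |Y| >= 1 puts Y inside the leading run of a's.
pumped_is_palindrome = []
for xy_len in range(1, n + 1):
    for y_len in range(1, xy_len + 1):
        x = w[:xy_len - y_len]
        y = w[xy_len - y_len:xy_len]
        z = w[xy_len:]
        pumped_is_palindrome.append(is_palindrome(pump(x, y, z, 2)))
```

Every entry of `pumped_is_palindrome` is `False`: pumping Y once more lengthens only the left run of a's, so no split satisfies the lemma.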
12Chomsky Hierarchy
13Turing Machine
[Figure: Turing machine with a Read/Write head over a tape of 0s and 1s; control labels Start and Reset]
14
- Turing machine M
- x is a string over M's alphabet Σ
- R/W head over the leftmost symbol in x, M in start state
- R/W head communicates the symbol on the tape to the control mechanism in the box
- M can read a symbol, replace a symbol, and move the tape right or left one cell at a time
- If M halts (final state), the string y on the tape is M's output corresponding to input x
- Doesn't necessarily halt for every x
- Computes a partial function f: Σ* -> Σ*
- M is the same thing as its program, which is a set of quintuples
- (q, s, q', s', d) where q is the current state, s is the current symbol, q' is the next state, s' is the symbol to be written, and d is the direction to move
- Turing machines compute a particular class of functions over the integers called partial recursive functions
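A minimal simulator for a machine given as a set of quintuples might look like the sketch below. The bit-flipping program, the blank symbol `_`, the state names, and the `max_steps` budget are all assumptions made for illustration. The step budget is needed because M need not halt on every input.

```python
from collections import defaultdict

BLANK = "_"

# Hypothetical program: flip every bit of the input, then halt on the first blank.
# Each quintuple (q, s, q', s', d) is stored as (q, s) -> (q', s', d).
PROGRAM = {
    ("start", "0"): ("start", "1", "R"),
    ("start", "1"): ("start", "0", "R"),
    ("start", BLANK): ("halt", BLANK, "R"),
}

def run(program, x: str, start="start", final="halt", max_steps=10_000):
    """Simulate M on input x; return the output y, or None if M has not halted in the final state."""
    tape = defaultdict(lambda: BLANK, enumerate(x))
    state, head = start, 0            # head over the leftmost symbol of x
    for _ in range(max_steps):
        if state == final:
            # Visited cells are contiguous, so sorting the keys recovers the tape.
            return "".join(tape[i] for i in sorted(tape)).strip(BLANK)
        key = (state, tape[head])
        if key not in program:
            return None               # no applicable quintuple outside the final state
        state, tape[head], direction = program[key]
        head += 1 if direction == "R" else -1
    return None                       # step budget exhausted
```

Returning `None` when the budget runs out mirrors the fact that f is only a partial function: for some inputs there is no output at all.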
15Church-Turing Thesis
- All notions of effective computability are equivalent.
- Therefore, all computers are created equal.
- Other schemes: lambda calculus, general recursive functions, etc.
16Universal Turing Machine
- Fixed program in finite control
- Program reads a description of a Turing machine from one tape and simulates its behavior on another tape (two tapes)
- Universal machine U, machine to be simulated T
17
- Fixed program for U is like an interpreter
- Tape 1 contains the quintuples defining T
- Tape 2 is initially blank; U produces the same output as T here
- Given T's current state and input symbol, find the quintuple (q, s, q', s', d) in the description of T that applies
- Record the new state q', write the new symbol s' on tape 2, move in direction d, read the new symbol on tape 2, and record it beside q'
18Halting Problem
- What is not effectively computable?
- Is there a TM, M, that does the following?
- Given an arbitrary TM, T, as input, and an equally arbitrary tape, t, decide whether T halts on t
- Equivalent to: does T accept t?
- Undecidable
19Diagonalization
20
Diagonal set:   _ X X _ X _
Its complement: X _ _ X _ X
- The complement of the diagonal differs from every row.
- Can be extended to infinite sets.
- Used to show that there are languages that are not accepted by any TM.
- Therefore, there can be no TM that decides whether arbitrary strings are accepted by arbitrary Turing machines.
- Since we can represent TMs by strings, after some work, it follows that there can be no TM that decides the halting problem.
- Therefore, there are problems that admit no algorithmic solution.
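The finite version of the diagonal argument can be sketched in a few lines. The 6 x 6 table below is hypothetical: only its diagonal is taken from the slide, and the off-diagonal entries are made up.

```python
# Hypothetical 6 x 6 table: row i marks which of the items 0..5 belong to set i
# ("X" = member, "_" = not a member). Only the diagonal comes from the slide.
rows = [
    "_XX_X_",
    "XXX___",
    "__X_XX",
    "X_X__X",
    "_X_XX_",
    "XX____",
]

# Read off entry i of row i, then flip every entry.
diagonal = "".join(rows[i][i] for i in range(len(rows)))
complement = "".join("_" if c == "X" else "X" for c in diagonal)

# The complement disagrees with row i in position i, so it equals no row.
differs_from_every_row = all(
    complement[i] != rows[i][i] for i in range(len(rows))
)
```

The same construction works whatever the off-diagonal entries are, which is why it carries over to infinite tables.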
21Complexity Classes
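- P: efficient (polynomial-time) algorithms
- NP: a solution can be checked in polynomial time, but no efficient algorithms have been found for the hardest problems in the class
- Any problem in NP (P is a subset) can be transformed to an NP-complete problem in polynomial time
- P = NP ???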
22Satisfiability (SAT)
- Boolean Expression
- (x1x3x4)(x1x2x4)(x2x3)(x1x2x4)
- What combination of variable values (0,1) makes the statement true or false (1,0)?
- 2^n combinations
- Decision problem: is the formula satisfiable?
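A brute-force sketch of the decision problem: try all 2^n assignments and check each one. The formula below is a hypothetical CNF expression chosen for illustration, and the signed-integer encoding of literals is an assumption of this sketch.

```python
from itertools import product

# Hypothetical CNF formula chosen for illustration:
# (x1 or not x3 or x4) and (not x1 or x2) and (not x2 or x3) and (not x4)
# A positive integer k means x_k; a negative integer -k means "not x_k".
FORMULA = [[1, -3, 4], [-1, 2], [-2, 3], [-4]]
N_VARS = 4

def satisfies(formula, assignment) -> bool:
    """Check one assignment (tuple of 0/1 values, x1 first) against every clause."""
    return all(
        any(assignment[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)
        for clause in formula
    )

def brute_force_sat(formula, n_vars):
    """Try all 2^n assignments; return the first satisfying one, or None."""
    for assignment in product((0, 1), repeat=n_vars):
        if satisfies(formula, assignment):
            return assignment
    return None
```

Checking a single assignment with `satisfies` is fast. The cost lies in the loop over up to 2^n candidates.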
23NP-complete
- NP: Nondeterministic Polynomial Time
- 1971: Cook found a way to transform every problem in NP to a single, complete problem (satisfiability)
- Transform in polynomial time
- An instance of one problem has a solution if and only if the corresponding instance of the other problem does
- Solving any instance of any problem in NP is equivalent to solving some instance of SAT
24NP-Complete
- P and NP are classes of decision problems (answer yes or no)
- Optimization problems (minimize or maximize an objective function)
- NP-hard
- At least as hard as NP-complete decision problems
25What to do?
- Solve efficiently or prove NP-complete
- Is X in NP? Check a solution in polynomial time
- Reduce a known NP-complete problem Y to X: if X can be solved in polynomial time, then so can Y
- Solve on specific, easier instances
- Exhaustive search
- Approximate in polynomial time
- Heuristics
- Quantum Computer
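As one illustration of the heuristic option, the sketch below applies a greedy rule to graph coloring. It runs in polynomial time and always returns a proper coloring, but it may use more colors than the minimum. The function name and the example graph are hypothetical.

```python
def greedy_coloring(adjacency):
    """Give each vertex the smallest color not used by an already-colored neighbor."""
    color = {}
    for v in adjacency:                      # visiting order affects the result
        used = {color[u] for u in adjacency[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# Hypothetical graph: a path 1-2-3-4, which needs only 2 colors.
good_order = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
bad_order = {1: [2], 4: [3], 2: [1, 3], 3: [2, 4]}   # same graph, different visiting order

colors_good = len(set(greedy_coloring(good_order).values()))
colors_bad = len(set(greedy_coloring(bad_order).values()))
```

The same graph is colored with 2 colors in one visiting order and 3 in the other, so the heuristic gives a valid answer quickly but offers no guarantee of optimality.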