Title: Computational Scientific Discovery and the Robot Scientist
1 Computational Scientific Discovery and the Robot Scientist
- Philosophy of Science: hypothetico-deductive theory of scientific discovery
- Robot Scientist is an automatic discovery method, currently applied to a functional genomics task in yeast microbiology
- Yeast is a model organism: completely sequenced genome, more than 6000 genes, c. 40% have unknown function
- Robot Scientist can automatically carry out auxotrophic experiments using a laboratory robot (basically a great big mouth that sucks and squirts liquids)
- RS can interpret results of experiments and suggest hypotheses that explain the observations
- RS can select/design experiments to confirm/refute competing hypotheses
2 Hypothetico-deductive Science and the Robot Scientist
[Diagram of the hypothetico-deductive cycle. Labels: observations; abductive / inductive inference (imagination, machine learning); hypotheses/theory; deductive inference; (intelligent) experiment selection (machine learning); experiments; consistent or inconsistent?; rubbish bin; final (reasonably good?) theory.]
3 Genes, Enzymes and Biochemical Networks
- Gene: (vaguely) a protein-coding region of a chromosome
- DNA molecule: compressed instructions for building organisms; true for viruses, yeast, bacteria, animals, plants, humans
- Base sequences of DNA code for 20 amino acids: direct translation from DNA sequence to amino acid sequence in the protein
- Protein folds into a 3-D molecule
- Active regions often have affinity for other molecules, bringing reactive compounds into close proximity and allowing (metabolic) reactions to take place (catalysis)
- Example reaction (from the diagram): D-glucose 1-phosphate and alpha-D-glucose 6-phosphate, linked by YKL127W / YMR105C (phosphoglucomutase)
4 Auxotrophic Experiments to determine gene function
- Create a deletion mutant: a yeast strain with a gene removed from the genome
- Mutant strain struggles to grow on minimal growth medium: an essential biological function is no longer performed
- Add various biochemical molecules to the growth medium: if the yeast cell can grow properly again, then there is some relationship between the deleted gene and the added chemical
- Biochemical networks can be abstracted as connected graphs (compounds = nodes, reactions = edges)
- Growth recovery may be explained by many competing hypotheses (i.e. many reactions may be catalysed by the deleted gene product)
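The graph abstraction and the growth-recovery logic can be sketched in a few lines of Python. This is only an illustration: the compounds `A` to `D`, the gene names `gene1` to `gene3`, and the functions `reachable` and `grows` are hypothetical and are not taken from the Robot Scientist's yeast model.

```python
from collections import deque

# Hypothetical toy network: each reaction is (substrate, product, gene).
# Compounds are nodes, reactions are edges labelled with the catalysing gene.
REACTIONS = [
    ("A", "B", "gene1"),
    ("B", "C", "gene2"),
    ("C", "D", "gene3"),
]

def reachable(medium, deleted_gene=None):
    """Compounds the cell can obtain from the medium without the deleted gene's reactions."""
    graph = {}
    for substrate, product, gene in REACTIONS:
        if gene != deleted_gene:
            graph.setdefault(substrate, []).append(product)
    seen = set(medium)
    queue = deque(medium)
    while queue:  # breadth-first search over the remaining reactions
        compound = queue.popleft()
        for product in graph.get(compound, []):
            if product not in seen:
                seen.add(product)
                queue.append(product)
    return seen

def grows(medium, deleted_gene=None, essential="D"):
    """Assumption: the cell grows only if the essential compound is reachable."""
    return essential in reachable(medium, deleted_gene)

wild_type = grows({"A"})                            # no deletion
mutant = grows({"A"}, deleted_gene="gene2")         # pathway broken at B -> C
rescued = grows({"A", "C"}, deleted_gene="gene2")   # C added to the medium
```

In this toy case the mutant fails on the minimal medium and recovers when `C` is supplied, which is the kind of observation an auxotrophic experiment produces.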
6 (Supervised) Machine Learning
- Training set (seen observations): induce (learn) a classifier / theory / equation system
- Determine rules/trees/equations that correctly explain the training set (search limited to the hypothesis space defined by a grammar)
- Test set (unseen observations): determine performance of the classifier on unseen observations
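The train/test procedure can be illustrated with a minimal sketch. The data, the single-threshold rule, and the names `induce`, `predict` and `accuracy` are all hypothetical; the hypothesis space here is simply "one threshold on one feature", standing in for the grammar-defined space mentioned on the slide.

```python
# Hypothetical toy data: (feature value, class label).
training_set = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
test_set = [(1.5, 0), (2.5, 0), (6.5, 1), (9.0, 1)]

def predict(threshold, x):
    """Rule of the form: class 1 if x > threshold, else class 0."""
    return 1 if x > threshold else 0

def accuracy(threshold, data):
    """Fraction of observations the rule classifies correctly."""
    return sum(predict(threshold, x) == y for x, y in data) / len(data)

def induce(data):
    """Search the (tiny) hypothesis space: one candidate threshold per seen value."""
    candidates = sorted(x for x, _ in data)
    return max(candidates, key=lambda t: accuracy(t, data))

threshold = induce(training_set)                    # learned from seen observations
train_accuracy = accuracy(threshold, training_set)
test_accuracy = accuracy(threshold, test_set)       # measured on unseen observations
```

Performance is reported on the test set because a rule that explains the training set perfectly may still fail on observations it has not seen.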
7 Logical Inference
Deduction
Rule: If a cell grows then it can synthesise tryptophan.
Fact: Cell cannot synthesise tryptophan.
∴ Cell cannot grow.
Given the rule P → Q, and the fact ¬Q, infer the fact ¬P (deduction, modus tollens).
Abduction
Rule: If a cell grows then it can synthesise tryptophan.
Fact: Cell cannot grow.
∴ Cell cannot synthesise tryptophan.
Given the rule P → Q, and the fact ¬P, infer the fact ¬Q (abduction).
Induction
Fact: Cell grows.
Background: Cell can synthesise tryptophan.
Rule (?): If a cell grows then it can synthesise tryptophan.
Given the fact P and background knowledge Q, infer the rule P → Q.
Deduction is sound in the logic we use (first-order predicate logic). Informally, this means that if the rule and fact used in an inference of the above form are true, then the inferred fact must also be true. However, abduction is generally not sound: in the abduction example, there could be many other reasons why the cell cannot grow. Despite this, abduction is required to infer new scientific knowledge.
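The difference in soundness can be checked by brute force over the four truth assignments of P and Q. The sketch below is propositional only (not full first-order logic), and the helper names `implies` and `is_sound` are hypothetical.

```python
from itertools import product

def implies(p, q):
    """Material implication P -> Q."""
    return (not p) or q

def is_sound(premises, conclusion):
    """True if the conclusion holds in every assignment that satisfies all premises."""
    return all(
        conclusion(p, q)
        for p, q in product([True, False], repeat=2)
        if all(premise(p, q) for premise in premises)
    )

rule = lambda p, q: implies(p, q)   # "if a cell grows then it can synthesise tryptophan"

# Deduction (modus tollens): from P -> Q and not Q, infer not P.
modus_tollens_sound = is_sound([rule, lambda p, q: not q], lambda p, q: not p)

# Abduction as stated on the slide: from P -> Q and not P, infer not Q.
abduction_sound = is_sound([rule, lambda p, q: not p], lambda p, q: not q)
```

The abductive step fails the check because of the assignment P false, Q true: a cell that can synthesise tryptophan but does not grow for some other reason.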
8 Experiment Details
In Silico Experiments
- 8 genes, all processed at once: YBR166C, YDR007W, YDR035W, YDR354W, YER090W, YGL026C, YKL211C, YNL316C
- 9 metabolites, single and double permutations of C00074, C00078, C00079, C00082, C00108, C00166, C00463, C00493, C01179
- 45 possible experiments
- 5 experiment days (simulated)
- Execution time c. 24 hrs
Robot Experiments
- Experiments given to the robot in two arms: first 4 genes on 1 day, remaining 4 the next day
- 5 experiment days
- c. 30 hrs for each experiment: 6 hrs creation, 24 hrs incubation
- More than 2 weeks execution time
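The figure of 45 possible in silico experiments is reproduced if each experiment adds either one metabolite or an unordered pair of metabolites to the medium. That reading is an assumption (the slide says "permutations"), as is writing the first identifier in the same five-digit form as the others.

```python
from itertools import combinations

# Metabolite identifiers from the slide (first one assumed to be five-digit).
metabolites = ["C00074", "C00078", "C00079", "C00082", "C00108",
               "C00166", "C00463", "C00493", "C01179"]

singles = list(combinations(metabolites, 1))   # one metabolite added
doubles = list(combinations(metabolites, 2))   # an unordered pair added
experiments = singles + doubles

n_experiments = len(experiments)               # 9 singles + 36 pairs
```

Ordered pairs would give 9 + 72 = 81 experiments instead, so the unordered reading is the one that matches the slide's count.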
9 Bio-Logical Intelligent Database
[Diagram. Labels:]
- Query concerning gene function; query answers; suggested answers
- Genome-wide prediction of yeast gene functions: combination of ILP and decision tree learning; sequence statistics, secondary structures, protein homology
- Robot Scientist: yeast cell model (mostly metabolism); update
- Underlying database of current knowledge for yeast: known genes, enzymes, sequences, protein interactions, reactions,
10 Computer Models: Simulation, Identification
- Abstraction of knowledge from the real world: scale of fidelity and computational intractability
- Concerned with synergistic interactions of components: opposite and complementary to reductionist science
- Common in many diverse fields: Climate Change, Population Ecology, Economics, Systems Biology, Engineering
- Many underlying mathematical knowledge representations: Partial/Ordinary Differential Equations, First Order Logic, Graph theory, regression/polynomial equations, matrices etc.
- Used as background knowledge for Scientific Discovery tasks
- Theory revision performed to alter existing models so they better fit real-world observations (experimental results): identifying and correcting inaccuracies, and extending the model to unobserved phenomena
- Uses Machine Learning: ILP, Equation Discovery, Genetic Programming (GAs) etc.