Title: Automated Experimentation and Computational Thinking
1. Automated Experimentation and Computational Thinking

2. Technologies and Methods
- Formal process languages allow rapid replication and enactment of experimental designs.
- Service and grid architectures permit experimental data to be shared in large volume.
- Roboticised laboratory systems permit fast translation of experiments to the physical world.
- Verification methods allow experimental protocols to be more rigorously tested.
- Systems are available for maintaining experimental context, such as provenance of data.
- Systems exist to chart broader requirements and argumentation surrounding experiments.
3. Protocols (Example in LCC)
A requester will ask about something from an informer, then get an answer from it, then continue as a requester. An informer will be asked by a requester, then should tell the requester if it knows.
a(requester, A) ::
  ask(X) => a(informer, B) ← query_from(X, B) then
  tell(X) <= a(informer, B) then
  a(requester, A)

a(informer, B) ::
  ask(X) <= a(requester, A) then
  tell(X) => a(requester, A) ← know(X)
- Variables begin upper case.
- Constants begin lower case.
- Variables are local to a clause.
- Data structures are as in Prolog.
- Principal operators:
  - Messages: <= (incoming), => (outgoing)
  - Conditional: ←
  - Sequence: then
  - Committed choice: or
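The two clauses can be read operationally. Below is a minimal Python sketch of the exchange, assuming a synchronous in-memory "network" (lists used as mailboxes); the names ask, tell, know and query_from mirror the LCC clauses, while KNOWLEDGE and the example queries are invented for illustration.

KNOWLEDGE = {"h2o_is_water"}                      # the informer's know(X) facts

def informer(inbox, outbox):
    # ask(X) <= a(requester, A) then tell(X) => a(requester, A) <- know(X)
    kind, x = inbox.pop(0)
    if kind == "ask" and x in KNOWLEDGE:          # know(X)
        outbox.append(("tell", x))

def requester(queries):
    # ask(X) => a(informer, B) <- query_from(X, B) then
    # tell(X) <= a(informer, B) then a(requester, A)
    answers = []
    for x in queries:                             # query_from(X, B) per query
        to_informer, to_requester = [("ask", x)], []
        informer(to_informer, to_requester)
        answers += [m[1] for m in to_requester]   # collect tell(X) replies
    return answers                                # LCC recurses as requester

print(requester(["h2o_is_water", "cold_fusion"]))  # -> ['h2o_is_water']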
4. Conventional Experiment
Predictive model:

  ∀S ∈ subjects, X ∈ independent-vars.
    f(S, X) = Y ← ci(S) ∧ cj(S)
    f(S, X) = Y ← ck(S) ∧ cl(S)

For every individual, S, we can predict their response, Y, to any setting of the independent variable, X.

[Figure: response Y plotted against X for all S; generalisation runs from the observations up to the predictive model, and refutation from the model back down to the observations.]

Observations:
We can observe the effect of X on Y for some subset of the individuals, selected from the population under controlled conditions.

  f(s1, x1) = y1 ← c1(s1) ∧ c2(s1)
  f(s2, x2) = y2 ← c2(s1) ∧ c3(s1) ∧ c4(s1)
  f(s2, x2) = y2 ← c1(s2) ∧ c2(s2)
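One way to read the predictive model is as a set of guarded clauses, each mapping a subject and an independent-variable setting to a response when its conditions hold. The Python sketch below follows that reading; the concrete conditions and responses are invented for illustration.

def predict(s, x, clauses):
    # f(S, X) = Y <- c_i(S) /\ c_j(S): try each guarded clause in order.
    for conditions, f in clauses:
        if all(c(s) for c in conditions):   # c_i(S) /\ c_j(S)
            return f(s, x)                  # f(S, X) = Y
    return None                             # no clause covers this subject

# Two illustrative clauses over a toy subject record:
clauses = [
    ([lambda s: s["adult"], lambda s: s["healthy"]], lambda s, x: 2 * x),
    ([lambda s: not s["adult"]],                     lambda s, x: x + 1),
]
print(predict({"adult": True, "healthy": True}, 3, clauses))   # -> 6
print(predict({"adult": False, "healthy": True}, 3, clauses))  # -> 4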
5. Traditional Problem (1): Experiment Repeatability

The experiment cannot be repeated, and the generalisation from observations to predictive model requires time-consuming analysis.

[Figure: the predictive model and observations of slide 4, with the generalisation step annotated "time consuming analysis".]
6. Role of Automation (1): Executable Specification

[Figure: the diagram of slide 4, extended with an executable experiment model that is interpreted to produce the observations, making the analysis behind generalisation fast.]
7. Example: Protocols as Experiment Design

Replicate someone else's experiment on my data sets.

You can be a scientist for an experiment if you ask a data finder for sky area data, then you acquire data sources from a database extractor, then you analyse the results if your goal is achieved, or you continue as a scientist with a revised experiment. You can be a data finder if you are asked for sky area data by a scientist, then you send a data request to a database extractor.
a(scientist(De), S) ::
  sky_area_data => a(data_finder, D) then
  data_sources(So) <= a(database_extractor(S), E) then
  ( a(analyser(So), S) ← goal_achieved(So)
    or
    a(scientist(NewDe), S) ← revise_description(De, NewDe) )

a(data_finder, D) ::
  sky_area_data <= a(scientist(De), S) then
  data_request(De, DD) => a(database_extractor(S), E) ← match(De, DD)

a(database_extractor(S), E) ::
  data_request(De, DD) <= a(data_finder, D) then
  a(data_negotiator(DD, So), E) then
  data_sources(So) => a(scientist(De), S)

a(data_negotiator(DD, So), E) ::
  setup_sources(DD) => a(storage_utility, U) then
  sources_set_up(DD, So) <= a(storage_utility, U)

De: description; So: set of sources; DD: data descriptors.
You can be a database extractor for a scientist if you receive a data request from a data finder, then you negotiate to obtain your data sources, then you inform the scientist of the data sources. You can be a negotiator to obtain data sources if you ask a storage utility to set up data sources, then it confirms which sources are set up.
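Operationally, the scientist clause is a retry loop: acquire sources, analyse if the goal is achieved, otherwise revise the description and try again. Below is a hedged Python sketch of that loop; find_sources stands in for the whole data_finder/database_extractor exchange, and goal_achieved, revise_description and the toy catalogue are assumptions for illustration.

def scientist(description, find_sources, goal_achieved, revise_description):
    while True:
        sources = find_sources(description)     # sky_area_data => data_finder;
                                                # data_sources(So) <= extractor
        if goal_achieved(sources):              # <- goal_achieved(So)
            return sources                      # a(analyser(So), S)
        description = revise_description(description)  # a(scientist(NewDe), S)

# Toy instantiation: widen the sky area until at least two sources match.
catalogue = {1: ["sdss"], 2: ["sdss", "2mass"]}
result = scientist(
    1,
    find_sources=lambda de: catalogue.get(de, []),
    goal_achieved=lambda so: len(so) >= 2,
    revise_description=lambda de: de + 1,
)
print(result)  # -> ['sdss', '2mass']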
8. Traditional Problem (2): Over-generalisation

The model is never extended: after one round of time-consuming analysis, the generalisation stands unrevised.

[Figure: the diagram of slide 4, with the generalisation step annotated "time consuming analysis" and no route for extending the model.]
9. Role of Automation (2): Progressive Generalisation

[Figure: the diagram of slide 6, with an "adapt" step feeding back into the experiment model; interpretation of the model and rapidly repeatable analysis let the generalisation grow progressively.]
10. Example: Protocols as Parameterised Experiments

If the experiment works, then run it again on a larger data segment.

a(scientist(Protocol), S) ::
  null ← completed(Protocol)
  or
  ( Protocol then
    a(scientist(NewProtocol), S) ← reparameterise(Protocol, NewProtocol) )

You can be a scientist working with a protocol if the protocol is completed, or you follow the protocol and then become a scientist with a new protocol if the old one can be adapted to the new.
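Read operationally, the clause is a loop that runs the protocol and reparameterises it until completed(Protocol) holds. A minimal Python sketch of that loop, where a "protocol" is reduced to a data-segment size that grows tenfold per run (all names and numbers are illustrative):

def run_experiments(protocol, run, completed, reparameterise):
    while not completed(protocol):           # null <- completed(Protocol)
        run(protocol)                        # Protocol then
        protocol = reparameterise(protocol)  # <- reparameterise(P, NewP)
    return protocol

final = run_experiments(
    protocol=1_000,                              # initial segment size
    run=lambda n: print(f"analysing {n} rows"),  # the experiment itself
    completed=lambda n: n > 100_000,             # stop condition
    reparameterise=lambda n: n * 10,             # larger data segment
)
print("final protocol:", final)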
11. Traditional Problem (3): Lack of Replication

The model is never independently tested: refuting it requires time-consuming reconstruction of the original experiment.

[Figure: the diagram of slide 4, with the refutation step annotated "time consuming reconstruction".]
12. Role of Automation (3): Replication of Experiments

[Figure: the diagram of slide 4, extended with an experiment model that others can interpret to reproduce the observations, supporting refutation.]
13. Example: Protocols as Experiment Replication

Poll others to see if they can replicate an experimental result I've obtained.

a(scientist(Protocol), S) ::
  Protocol then
  replicate => a(replicator, R) then
  result(Res) <= a(replicator, R)

a(replicator, R) ::
  replicate <= a(scientist(Protocol), S) then
  Protocol then
  result(Res) => a(scientist(Protocol), S)

You can be a scientist with a protocol if you follow the protocol, then ask a peer to perform replication, then receive the result from the replicator. You can be a replicator for a protocol if you are asked to replicate it, then you follow that protocol, then you send the result to the scientist.
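The same pattern in Python, assuming a protocol is a deterministic callable and replication means re-running it: the scientist runs the protocol, asks each peer to run it too, and compares results. The peer list and the result value are invented for illustration.

def replicator(protocol):
    # replicate <= a(scientist(Protocol), S) then Protocol then
    # result(Res) => a(scientist(Protocol), S)
    return protocol()

def scientist_with_replication(protocol, peers):
    my_result = protocol()                           # Protocol then
    replies = [replicator(protocol) for _ in peers]  # replicate => ... then
    agree = sum(r == my_result for r in replies)     # result(Res) <= ...
    return my_result, agree, len(peers)

res, agree, total = scientist_with_replication(lambda: 42, peers=["r1", "r2"])
print(f"result {res} replicated by {agree}/{total} peers")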
14. Traditional Problem (4): Distributed Data

The model is disconnected from the original data: once acquired, observations such as f(s1, x1) = y1 and f(s2, x2) = y2 lose their link to the sources they came from.

[Figure: the diagram of slide 4, with a data acquisition step feeding the observations but no record of where the data came from.]
15. Role of Automation (4): Embedding Curation in the Experiment

[Figure: the diagram of slide 4, extended with a data collection/curation model governing data acquisition, so the experiment model stays connected to its data.]
16. Example: Data Curation

[Figure: scientists invoke curated database services; the size of the community database is proportional to the number of curators, and the volume of useful acquired data is proportional to the database access rate.]

"Faced with the avalanche of genomic sequences and data on messenger RNA expression, biological scientists are confronting a frightening prospect: piles of information but only flakes of knowledge. How can the thousands of sequences being determined and deposited, and the thousands of expression profiles being generated by the new array methods, be synthesised into useful knowledge?"

Eisenberg et al., "Protein Function in the Post-genomic Era", Nature, Vol. 405, 2000.
17. Example: Yeast Protein Data

You can be a data collator for a sequence, seeking its best matches, if you can filter the results from polling your peers for their best matches. You can poll a set of your peers for their best results if you become a data seeker for the first of these peers and the matches from that peer are merged with the matches you get from polling the rest of the set of peers, or, if the set of peers is empty, you have no matches. You can be a data seeker asking a peer for a set of matches if you send a message to that peer asking it to be a data source, then filter the matches it sends back to you in its response.
a(data_collator(Seq, Best), C) ::
  filter_results(Seq, Results, Best) ←
    a(poller(Seq, Peers, Results), C) ← sources(Peers)

a(poller(Seq, Peers, Results), C) ::
  ( a(data_seeker(Seq, D, Matches), C)
      ← Peers = [D|RestPeers] and Results = [r(D, Matches)|RestResults] then
    a(poller(Seq, RestPeers, RestResults), C) )
  or
  null ← Peers = [] and Results = []

a(data_seeker(Seq, D, Matches), S) ::
  query(Seq) => a(data_source, D) then
  filter_matches(Seq, Results, Matches) ← matched(Results) <= a(data_source, D)
[Figure: peer databases (SWISS, SAM, ModBase) sharing data and consistency checking.]
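A Python sketch of the collator/poller/seeker recursion above, assuming peers are simple in-memory databases keyed by sequence; the score threshold, match records and peer names are invented for illustration.

def data_seeker(seq, peer_db):
    # query(Seq) => a(data_source, D) then
    # filter_matches(Seq, Results, Matches) <- matched(Results) <= ...
    results = peer_db.get(seq, [])                     # matched(Results)
    return [m for m in results if m["score"] > 0.5]    # filter_matches

def poller(seq, peers):
    # The Peers = [D|RestPeers] recursion, written here as a loop.
    return [(name, data_seeker(seq, db)) for name, db in peers]

def data_collator(seq, peers, best_n=2):
    replies = poller(seq, peers)                       # poll your peers
    merged = [m for _, matches in replies for m in matches]
    return sorted(merged, key=lambda m: -m["score"])[:best_n]  # filter_results

peers = [
    ("swiss",   {"seq1": [{"id": "P1", "score": 0.9},
                          {"id": "P2", "score": 0.4}]}),
    ("modbase", {"seq1": [{"id": "P3", "score": 0.7}]}),
]
print(data_collator("seq1", peers))
# -> [{'id': 'P1', 'score': 0.9}, {'id': 'P3', 'score': 0.7}]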
18. Traditional Problem (5): Distributed Sub-experiments

The model lacks key structure: sub-models produced by distributed sub-experiments are not connected to the main experiment model.

[Figure: the diagram of slide 4, with a detached sub-model alongside the experiment model.]
19. Role of Automation (5): Embedding Model Synthesis in the Experiment

[Figure: the diagram of slide 4, extended with a synthesis model that synthesises sub-models into the experiment model.]
20. Example: Distributed Synthesis

Three roles cooperate to assemble models from the network (see the synthesis clauses on the next slide):

- Modeller: generates a set of models from the network that might provide a given set of output attributes.
- Generator: talks with repositories of models to find out where a model might be located for a given output attribute.
- Repository: stores information about local models and may know of other related repositories.

Messages: request(Output), available(Output, Model, Peer, Inputs), relay(Peers).
21. Example: Synthesis Model

a(modeller(Os, Ms), X) ::
  ( a(generator(Ps, O, M, P, Is), X) ← Os = [O|ROs] and peers(Ps) then
    a(modeller(NOs, RM), X) ← Ms = [m(O, M, P)|RM] and merge(Is, ROs, NOs) )
  or
  null ← Os = [] and Ms = []

a(repository, X) ::
  request(O) <= a(generator(_, _, _, _, _), Y) then
  ( available(O, M, X, Is) => a(generator(_, _, _, _, _), Y) ← a_model(O, M, Is)
    or
    relay(Ps) => a(generator(_, _, _, _, _), Y) ← not(a_model(O, _, _)) and peers(Ps) ) then
  a(repository, X)

a(generator(Ps, O, M, P, Is), X) ::
  request(O) => a(repository, Y) ← Ps = [Y|_] then
  ( available(O, M, P, Is) <= a(repository, Y)
    or
    ( relay(PPs) <= a(repository, Y) then
      a(generator(NPs, O, M, P, Is), X) ← Ps = [_|RPs] and merge(PPs, RPs, NPs) ) )
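A Python sketch of the modeller/generator/repository recursion, with repositories as in-memory tables; relay referrals are followed via a work queue rather than LCC recursion. The repository contents mirror the interaction example on the next slide; everything else is an assumption for illustration.

def generator(output, peers, repositories):
    # request(O) => a(repository, Y) <- Ps = [Y|_], following relay(Ps)
    # referrals breadth-first until some repository has a_model(O, M, Is).
    queue, seen = list(peers), set()
    while queue:
        peer = queue.pop(0)
        if peer in seen:
            continue
        seen.add(peer)
        repo = repositories[peer]
        if output in repo["models"]:                 # a_model(O, M, Is)
            model, inputs = repo["models"][output]
            return output, model, peer, inputs       # available(O, M, P, Is)
        queue += repo["peers"]                       # relay(Ps), merged in
    return None

def modeller(outputs, peers, repositories):
    # Os = [O|ROs]: find a model for each needed output, merging that model's
    # inputs Is into the outstanding outputs (merge(Is, ROs, NOs)).
    models, pending = [], list(outputs)
    while pending:
        o = pending.pop(0)
        found = generator(o, peers, repositories)
        if found:
            o, m, p, inputs = found
            models.append((o, m, p))                 # Ms = [m(O, M, P)|RM]
            pending += [i for i in inputs if i not in pending]
    return models

repositories = {
    "p2": {"models": {"a1": ("m1", ["a3"])}, "peers": ["p4"]},
    "p3": {"models": {"a3": ("m3", [])},     "peers": ["p5"]},
    "p4": {"models": {},                     "peers": ["p2", "p3"]},
    "p5": {"models": {"a2": ("m2", [])},     "peers": []},
}
print(modeller(["a1", "a2"], ["p2", "p3"], repositories))
# -> [('a1', 'm1', 'p2'), ('a2', 'm2', 'p5'), ('a3', 'm3', 'p3')]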
22. Example: Interaction

[Figure: interaction that builds a model for a1 and a2. Node contents: repository p2 has model(a1, m1, a3) and peers(p4); repository p3 has model(a3, m3, []) and peers(p5); repository p4 has peers(p2, p3); repository p5 has model(a2, m2, []). The modeller p1's generator sends request(a1), request(a2) and request(a3) to the repositories; the replies are available(a1, m1, p2, a3), available(a3, m3, p3, []) and available(a2, m2, p5, []), with relay(p4) and relay() referrals along the way. The modeller accumulates m(a1, m1, p2, a3), m(a3, m3, p3, []) and m(a2, m2, p5, []).]
23. Traditional Problem (6): Peer Review

Peer review of competing models is not systematic.
24. Role of Automation (6): Large-scale Systematic Reviewing

[Figure: a review model governs the reviewing step, making systematic review of competing models possible at scale.]
25. Old Problems, New Setting

- Variables are too numerous or complex to control.
- The experiment is not extensive enough to be convincing.
- Generalisation from specific experiments is unjustified.

Note that these are not old-fashioned problems confined to traditional experiments. They also occur in computational experiments (e.g. the issue of reproducibility of results derived from Web service orchestration).
26. Computational Thinking Changes the Nature of Experiments

- Rapid synthesis and verification of experimental designs allow more complex experiments to be built and deployed while keeping experimental complexity under control.
- By bringing experiments closer to their subjects on a large scale, for example by establishing "living laboratories" within natural systems, more extensive experiments become possible, exploiting the scale of current computational infrastructure.
- Generalisation problems can be controlled by augmenting results with meta-data on provenance and argumentation, or avoided by rapidly re-creating experiments to cover new situations as they arise.