Title: BioMAS: A Multi-Agent System for Automated Genomic Annotation
1 BioMAS: A Multi-Agent System for Automated Genomic Annotation
- Keith Decker
- Department of Computer and Information Sciences, University of Delaware
- Salim Khan, Ravi Makkena, Gang Situ (Computer and Information Sciences)
- Dr. Carl Schmidt, Heebal Kim (Animal and Food Sciences)
2 Outline
- General class of problems and MAS solution approach
- BioMAS: Automated Genomic Annotation
- HVDB: Herpesvirus Database
- ChickDB: Gallus gallus Database
- GOFigure!
- CoPrDom
- Signal Transduction Pathway Discovery
3 What problems are we addressing?
- Huge, dynamic primary source databases
- Highly distributed, overlapping
- Heterogeneous content, structure, curation
- Multitude of analysis algorithms
- Different interfaces, output formats
- Create contingent process plans chaining many analyses together
- Individual PIs, working on non-model organisms
- Learn, then hand-navigate sea of DBs and analysis tools
- Easily overwhelmed by new sequence and EST data
- Struggle to make results available usefully to others
4 Approach: Multi-Agent Information Gathering
- Software agents for information retrieval, filtering, integration, analysis, and display
- Embody heterogeneous database technology (wrappers, mediators, ...)
- Deal with dynamic data and changing data sources
- Efficient and robust distributed computation (for both info retrieval and analysis)
- Deal with issues of data organization and ownership
- Natural approach to providing integrated information
- To humans via web
- To other agents via semantic markup (XML/OIL/DAML)
5 Example: Multi-Agent System for Automated Herpesvirus Annotation
- Input: raw sequence data
- Output: an annotated database that allows fairly complex queries
- BLAST homologs
- Motifs
- Protein domains (Prodomain records)
- PSORT sub-cellular location predictions
- GO (Gene Ontology) electronic annotation
- Example query: "Show me all the genes in Marek's Disease virus with a tyrosine phosphorylation motif and a transmembrane domain value 2"
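The example query above can be read as a filter over annotation records. The following is a minimal sketch under stated assumptions, not the BioMAS query interface: the `GeneAnnotation` record, its field names, and the sample data are all hypothetical, and treating "value 2" as a minimum count of transmembrane domains is an assumption, since the slide gives only the number.

```python
from dataclasses import dataclass, field


@dataclass
class GeneAnnotation:
    """Hypothetical annotation record; field names are illustrative only."""
    gene: str
    organism: str
    motifs: set = field(default_factory=set)
    transmembrane_domains: int = 0


def find_genes(records, organism, motif, min_tm_domains):
    """Return names of genes in `organism` that carry `motif` and have
    at least `min_tm_domains` transmembrane domains (assumed comparison)."""
    return [
        r.gene
        for r in records
        if r.organism == organism
        and motif in r.motifs
        and r.transmembrane_domains >= min_tm_domains
    ]


# Made-up records, purely to exercise the filter.
records = [
    GeneAnnotation("geneA", "Marek's Disease virus", {"tyrosine phosphorylation"}, 3),
    GeneAnnotation("geneB", "Marek's Disease virus", {"tyrosine phosphorylation"}, 0),
    GeneAnnotation("geneC", "Marek's Disease virus", {"glycosylation"}, 4),
    GeneAnnotation("geneD", "other virus", {"tyrosine phosphorylation"}, 5),
]

hits = find_genes(records, "Marek's Disease virus", "tyrosine phosphorylation", 2)
print(hits)
```

Only the record that satisfies all three conditions is returned; in a real system the records would come from the annotated database rather than an in-memory list.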
9 How does this help?
- Automates collection of information from various primary source databases
- If the info changes, it can be updated automatically, and the PI can be notified
- Allows various analyses to be done automatically
- Can encode complex (contingent) sequences of info retrieval and linked analyses, reporting only interesting results
- New data sources, annotations, and analyses can be applied automatically as they are developed (open system)
- Results can be made available on the internet to others, or kept as private data
- Much more sophisticated queries than keyword search
- Dynamic menu of keys
- Concept hierarchies (ontology) allow more concise queries
- Query planning (e.g., time, resource usage)
- Can search across multiple databases (i.e., from other researchers)
10 How does it work?
RETSINA-style Multi-Agent Organization
11 DECAF: A multi-agent system toolkit
- Focus on programming agents, not designing internal architecture
- Programming at the multi-agent level
- Value-added architecture
- Support for persistent, flexible, robust actions
12 DECAF
- Focus on programming agents, not designing internal architecture
- Avoiding the API approach
- DECAF as agent operating system; programmers have strictly limited access
- Communication, planning, scheduling, coordination, execution
- Graphical dataflow plan editor
13 DECAF
- Programming at the multi-agent level
- Standardized, domain-independent, reusable middle agents
- Agent Name Server (white pages)
- Matchmaker (yellow pages/directory service)
- Brokers (managers)
- Information extraction (learning: STALKER; knowledge base: PARKA)
- Proxy (web interfaces)
- Agent Management Agent (debugging, demos, external control)
- Note: heterogeneous architectures are OK!
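The white-pages and yellow-pages roles of the middle agents can be illustrated with two small lookup tables. This is a minimal sketch and not DECAF's actual interface: the class names, method names, agent name, capability string, host, and port below are all hypothetical.

```python
class AgentNameServer:
    """White pages: maps an agent name to its network address."""

    def __init__(self):
        self._addresses = {}

    def register(self, name, host, port):
        self._addresses[name] = (host, port)

    def unregister(self, name):
        self._addresses.pop(name, None)

    def lookup(self, name):
        """Return (host, port), or None if the agent is not registered."""
        return self._addresses.get(name)


class Matchmaker:
    """Yellow pages: maps an advertised capability to agent names."""

    def __init__(self):
        self._providers = {}

    def advertise(self, name, capability):
        self._providers.setdefault(capability, set()).add(name)

    def find(self, capability):
        """Return a sorted list of agents advertising `capability`."""
        return sorted(self._providers.get(capability, set()))


# Hypothetical agent and capability names.
ans = AgentNameServer()
mm = Matchmaker()
ans.register("blast-agent", "localhost", 9001)
mm.advertise("blast-agent", "blast-homologs")

providers = mm.find("blast-homologs")          # who can do this?
address = ans.lookup(providers[0])             # where is that agent?
print(providers, address)
```

A requester first asks the matchmaker which agents offer a service, then asks the name server where to reach one of them, so neither side needs hard-coded addresses.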
14 DECAF
- Value-added architecture
- Taking care of details (social/individual)
- ANS registration/deregistration (eventually MM)
- Standard behaviors (AMA, error, FIPA, libraries)
- Message dispatching (ontology, conversation)
- Coordination (GPGP)
- Efficient use of computational resources
- Highly threaded (internally, domain actions)
- Memory efficient (ran systems for weeks, hundreds of thousands of messages)
15 DECAF
- Support for persistent, flexible, robust actions
- HTN-style programming
- Task alternatives and contingencies
- RETSINA-style dataflow
- Provisions/parameters determine task activation
- Multiple outcomes, loops
- TÆMS-style task network annotations
- Dynamic overall utility: quality, cost, and duration task characteristics
- Explicit representation of non-local tasks
- Example: time/quality tradeoff
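The time/quality tradeoff can be illustrated by choosing among task alternatives annotated with quality, cost, and duration. This is a minimal sketch only: the weighted-sum utility, the weights, the alternative names, and the numbers are assumptions made for illustration, not the utility function used by TÆMS or DECAF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    """One way of achieving a task, with assumed numeric annotations."""
    name: str
    quality: float
    cost: float
    duration: float


def choose(alternatives, deadline, w_quality=1.0, w_cost=0.1):
    """Pick the feasible alternative with the highest weighted utility.

    An alternative is feasible if its duration fits within the deadline.
    Returns None when nothing fits.
    """
    feasible = [a for a in alternatives if a.duration <= deadline]
    if not feasible:
        return None
    return max(feasible, key=lambda a: w_quality * a.quality - w_cost * a.cost)


# Made-up alternatives for a single retrieval task.
alternatives = [
    Alternative("full_search", quality=10.0, cost=5.0, duration=60.0),
    Alternative("cached_lookup", quality=4.0, cost=1.0, duration=5.0),
]

relaxed = choose(alternatives, deadline=120.0)
tight = choose(alternatives, deadline=10.0)
print(relaxed.name, tight.name)
```

With a generous deadline the higher-quality alternative wins; with a tight deadline only the faster, lower-quality one is feasible.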
16 DECAF Architecture
[Architecture diagram; labels recovered from the figure]
- Inputs: incoming KQML/FIPA messages, plan file
- Modules: Agent Initialization, Dispatcher, Planner, Scheduler, Executor
- Queues: Incoming Message Queue, Objectives Queue, Task Queue, Agenda Queue, Pending Action Queue, Action Results Queue
- Stores: Task Templates Hash Table, Domain Facts and Beliefs
- Output: outgoing KQML/FIPA messages
- Annotation on the diagram: "concurrent"
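One plausible reading of the diagram is a pipeline in which each module drains one queue and fills the next. The sketch below assumes that flow (dispatcher, then planner, then scheduler, then executor) and runs the stages one after another in a single thread, whereas the diagram marks the real modules as concurrent. The task template, action names, and message fields are hypothetical.

```python
from collections import deque

# Hypothetical task templates: objective name -> ordered action names.
TASK_TEMPLATES = {"annotate": ["run_blast", "find_motifs"]}

# Hypothetical domain actions, each mapping an input string to a result string.
ACTIONS = {
    "run_blast": lambda seq: f"blast({seq})",
    "find_motifs": lambda seq: f"motifs({seq})",
}


def run_agent(messages):
    """Push messages through the assumed queue pipeline and return results."""
    incoming = deque(messages)  # Incoming Message Queue
    objectives = deque()        # Objectives Queue
    tasks = deque()             # Task Queue
    agenda = deque()            # Agenda Queue
    results = []                # Action Results Queue

    # Dispatcher: turn each message into an objective.
    while incoming:
        msg = incoming.popleft()
        objectives.append((msg["objective"], msg["content"]))

    # Planner: expand each objective using its task template.
    while objectives:
        name, content = objectives.popleft()
        for action in TASK_TEMPLATES[name]:
            tasks.append((action, content))

    # Scheduler: here simply first-in, first-out ordering onto the agenda.
    while tasks:
        agenda.append(tasks.popleft())

    # Executor: run each scheduled action and record its result.
    while agenda:
        action, content = agenda.popleft()
        results.append(ACTIONS[action](content))

    return results


output = run_agent([{"objective": "annotate", "content": "ATGC"}])
print(output)
```

The scheduler stage is deliberately trivial here; it is the place where a real agent would reorder actions according to utility and deadlines.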
17 Plan Editor
18 Expanding the Genomic Annotation System
19 Functional Annotation Suborganization
Gene Ontology Consortium (www.geneontology.org): Biological Process, Molecular Function, Cellular Component
22 Co-present Domain Networks (CoPrDom)
- Proteins can be viewed as conserved sets of domains
- Vertex: a domain; edge: two domains co-present in some protein; edge weight: number of proteins in which the two domains are co-present
- Network constructed from InterPro domain markup of proteins in 10 species (human, Drosophila, C. elegans, and S. cerevisiae among them)
- Functional characterization via InterPro-to-GO mapping
- Network constructed per organism per functional group, e.g., apoptosis regulation in human
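The network definition above (vertices are domains, edges join domains found in the same protein, weights count such proteins) can be sketched in a few lines. This is an illustrative sketch, not the CoPrDom implementation: the function name is hypothetical and the protein-to-domain assignments are made up rather than taken from InterPro.

```python
from collections import Counter
from itertools import combinations


def build_copresence_network(protein_domains):
    """Map each unordered domain pair to the number of proteins containing both.

    `protein_domains` maps a protein identifier to an iterable of domain names.
    Pairs are stored as alphabetically sorted tuples so (a, b) == (b, a).
    """
    edge_weights = Counter()
    for domains in protein_domains.values():
        # Deduplicate so a repeated domain in one protein counts once.
        for a, b in combinations(sorted(set(domains)), 2):
            edge_weights[(a, b)] += 1
    return edge_weights


# Made-up domain assignments, for illustration only.
proteins = {
    "P1": ["SH2", "SH3", "Kinase"],
    "P2": ["SH2", "SH3"],
    "P3": ["Kinase"],
}

network = build_copresence_network(proteins)
print(network[("SH2", "SH3")])
```

Building one such network per organism and functional group, as the slide describes, would amount to calling the function on the corresponding subset of proteins.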
24 Uses for CoPrDom
- Functional characterization of unknown domains
- Identification of core domains/groups in a functional group
- Tracking domain evolution through species evolution
- Predicting protein-protein interaction by identifying evolutionary merging of domain groups
25 Biological Pathway Discovery through AI Planning Techniques
- AI planning is a computational method for developing complex plans of action from a representation of the initial states, the actions that manipulate those states, and the goal states to be achieved.
- Initial states: the initial state representation of objects in the "plan world"
- Actions: logical descriptions of preconditions and effects
- Goals: the end states desired
- HTN (Hierarchical Task Network) planning proceeds by task decomposition of networks, and a successful plan is one that satisfies a task network.
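The three ingredients listed above (initial states, actions with preconditions and effects, goals) can be made concrete with a very small planner. Note the swapped-in technique: this sketch uses plain breadth-first forward search over states, not HTN decomposition and not O-Plan. The facts and action names are toy placeholders, not a validated model of any pathway.

```python
from collections import deque


def plan(initial, actions, goal):
    """Breadth-first search for a shortest action sequence reaching the goal.

    `initial` and `goal` are sets of facts. `actions` maps an action name to
    a (preconditions, add_effects, delete_effects) triple of sets.
    Returns a list of action names, or None if the goal is unreachable.
    """
    start = frozenset(initial)
    goal = frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, (pre, add, delete) in actions.items():
            if pre <= state:  # action is applicable in this state
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None


# Toy "plan world"; facts and actions are illustrative only.
initial = {"ligand_present", "receptor_inactive"}
actions = {
    "bind_ligand": (
        {"ligand_present", "receptor_inactive"},
        {"receptor_active"},
        {"receptor_inactive"},
    ),
    "activate_ras": ({"receptor_active"}, {"ras_active"}, set()),
}

steps = plan(initial, actions, {"ras_active"})
print(steps)
```

An unreachable goal yields `None`, which is one simple way a planner can expose a gap in the encoded knowledge.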
26 Uses of the Signal Transduction Planner
- To produce computer-interpretable plans capturing relevant qualitative information regarding signal transduction pathways.
- To produce testable hypotheses regarding gaps in knowledge of the pathway, and drive future signal transduction research in an ordered manner.
- To identify key nodes where many pathways are regulated by a node with only 1 functional protein serving as a critical checkpoint.
- To perform in silico experiments of hyper-expression and deletion mutation.
- To enable pathway visualization tools by providing human- and machine-readable pathway descriptions.
27 Advantages of Planning
- Operator schema: abstracted axiomatic definitions of sub-cellular processes, understandable to both human and computer
- Task abstraction: decomposition of a complex task into simpler, interchangeable actions
- Reduces search space, conflicts
- Modeling of pathways at different levels of biochemical detail
- Search conducted in plan space: most planners perform bi-directional search (vs. Pathway Tools, Prolog implementations, etc.)
- Partial-order planning: succinct representation of multiple pathways helps identify key causal relationships
28 Advantages of Planning (contd.)
- Conditional effects can be used to model special cases ("exceptions") when applying operator schema
- Resource utilization can be used to model quantitative aspects such as amplification of a signal, feedback and feed-forward loops
- Plan re-use: old plans can be successfully inserted into new ones (if initial and final conditions are met) without additional computation
29 (Ontologically driven) Operator Schema Example: Transport
(action transport
  parameters (?mol - macromolecule,
              ?compfrom, ?compto - compartment)
  condition (and (in ?mol ?compfrom)
                 (open ?compfrom ?compto))
  effects (and (in ?mol ?compto)
               (not (in ?mol ?compfrom))))
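To show what the transport schema means operationally, the sketch below applies it to a state represented as a set of fact tuples. It is an illustration under assumptions, not the planner's own machinery: the Python function, the tuple encoding of facts, and the molecule and compartment names are hypothetical, and the type constraints (macromolecule, compartment) from the schema are not checked.

```python
def transport(state, mol, comp_from, comp_to):
    """Apply the transport operator to a set of fact tuples.

    Condition: the molecule is in `comp_from` and a route from `comp_from`
    to `comp_to` is open. Effects: the molecule is in `comp_to` and no longer
    in `comp_from`. Raises ValueError if the condition does not hold.
    """
    condition = {("in", mol, comp_from), ("open", comp_from, comp_to)}
    if not condition <= state:
        raise ValueError("transport condition not satisfied")
    return (state - {("in", mol, comp_from)}) | {("in", mol, comp_to)}


# Hypothetical starting state.
state = frozenset({
    ("in", "proteinX", "cytoplasm"),
    ("open", "cytoplasm", "nucleus"),
})

new_state = transport(state, "proteinX", "cytoplasm", "nucleus")
print(sorted(new_state))
```

The original state is left unchanged and a new one is returned, which keeps earlier states available when a planner needs to backtrack.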
30 RTK-MAPK pathway
Activation of Ras following binding of a hormone (e.g., EGF) to a receptor
31 RTK-MAPK pathway step: O-Plan Output
Phosphorylation of GRB2 at domain SH2 by the RTK receptor
32 Summary
- Bioinformatics has many features amenable to a multi-agent information gathering approach
- BioMAS: automated analysis, from EST processing to functional annotation ontologies
- DECAF / RETSINA / TÆMS
- GOFigure! and electronic GO annotation
- CoPrDom: Co-Present Domain Analysis
- Signal Transduction Pathway Discovery
33 BioMAS Future Work
- Sophisticated queries are possible, but how to make them available to biologists?
- "Show me all glycoproteins in Marek's Disease virus with a tyrosine phosphorylation motif and a transmembrane domain value 2 that are expressed in feather follicles"
- Robustness, efficiency, scale, data materialization issues
- Automating and integrating more complex analysis processes (using existing software!)
- Estimating physical location of genes by synteny
- Integrate new data sources
- Microarray and other gene expression data
- And thus, more analyses: QTL mapping, metabolic pathway learning
- New off-site organism databases and analysis agents
http://udgenome.ags.udel.edu/
http://www.cis.udel.edu/decaf/