BioMAS: A Multi-Agent System for Automated Genomic Annotation

Learn more at: http://www.nettab.org

1
BioMAS: A Multi-Agent System for Automated
Genomic Annotation
  • Keith Decker
  • Department of Computer and Information
    Sciences, University of Delaware

Salim Khan, Ravi Makkena, Gang Situ (Computer and
Information Sciences)
Dr. Carl Schmidt, Heebal Kim (Animal and Food
Sciences)
2
Outline
  • General class of problems and MAS solution
    approach
  • BioMAS: Automated Genomic Annotation
  • HVDB: HerpesVirus Database
  • ChickDB: Gallus gallus Database
  • GOFigure!
  • CoPrDom
  • Signal Transduction Pathway Discovery

3
What problems are we addressing?
  • Huge, dynamic Primary Source Databases
  • Highly distributed, overlapping
  • Heterogeneous content, structure, curation
  • Multitude of analysis algorithms
  • Different interfaces, output formats
  • Create contingent process plans chaining many
    analyses together
  • Individual PIs, working on non-model organisms
  • Learn, then hand-navigate sea of DBs and analysis
    tools
  • Easily overwhelmed by new sequence and EST data
  • Struggle to make results available usefully to
    others

4
Approach: Multi-Agent Information Gathering
  • Software agents for information retrieval,
    filtering, integration, analysis, and display
  • Embody heterogeneous database technology
    (wrappers, mediators, ...)
  • Deal with dynamic data and changing data sources
  • Efficient and robust distributed computation (for
    both info retrieval and analysis)
  • Deal with issues of data organization and
    ownership
  • Natural approach to providing integrated
    information
  • To humans via web
  • To other agents via semantic markup (XML/OIL/DAML)

5
Example: Multi-Agent System for Automated
Herpesvirus Annotation
  • Input: raw sequence data
  • Output: an annotated database that allows fairly
    complex queries
  • BLAST homologs
  • Motifs
  • Protein domains: Prodomain records
  • PSORT: sub-cellular location predictions
  • GO: Gene Ontology electronic annotation
  • "Show me all the genes in Marek's Disease virus
    with a tyrosine phosphorylation motif and a
    transmembrane domain value 2"
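
A rough sketch of the kind of query this slide describes, written against a toy in-memory record type. The `GeneRecord` schema, the gene identifiers, and the `find_genes` helper are all hypothetical; none of them reflect the actual HVDB layout. The transcript's "transmembrane domain value 2" is ambiguous, so the sketch treats it as a minimum domain count, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class GeneRecord:
    """One annotated gene (hypothetical schema, not the HVDB layout)."""
    gene_id: str
    organism: str
    motifs: Set[str] = field(default_factory=set)
    transmembrane_domains: int = 0


def find_genes(records: List[GeneRecord], organism: str,
               motif: str, min_tm: int) -> List[str]:
    """Return IDs of genes in `organism` that carry `motif` and have
    at least `min_tm` transmembrane domains (assumed reading of the query)."""
    return [
        r.gene_id
        for r in records
        if r.organism == organism
        and motif in r.motifs
        and r.transmembrane_domains >= min_tm
    ]


# Toy data: every identifier and value below is made up for illustration.
RECORDS = [
    GeneRecord("gene_a", "Marek's Disease virus", {"tyrosine phosphorylation"}, 2),
    GeneRecord("gene_b", "Marek's Disease virus", {"tyrosine phosphorylation"}, 0),
    GeneRecord("gene_c", "Marek's Disease virus", set(), 3),
    GeneRecord("gene_d", "other virus", {"tyrosine phosphorylation"}, 4),
]

hits = find_genes(RECORDS, "Marek's Disease virus", "tyrosine phosphorylation", 2)
print(hits)
```

Only `gene_a` satisfies all three conditions in this toy data set.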

9
How does this help?
  • Automates collection of information from various
    primary source databases
  • If the info changes, can be updated
    automatically. PI can be notified.
  • Allows various analyses to be done automatically
  • Can encode complex (contingent) sequences of info
    retrieval and linked analyses, report interesting
    results only
  • New data sources, annotation, analyses can be
    applied as they are developed, automatically
    (open system)
  • Results can be made available on the Internet to
    others, or kept as private data
  • Much more sophisticated queries than keyword
    search
  • Dynamic menu of keys
  • Concept hierarchies (ontology) allow more
    concise queries
  • Query planning (e.g., time, resource usage)
  • Can search across multiple databases (i.e., from
    other researchers)
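
To illustrate how a concept hierarchy can make queries more concise, the sketch below expands a single term into itself plus all of its descendants, so that one query term stands in for the whole subtree. The `CHILDREN` table is a three-term toy hierarchy and `expand_term` is a hypothetical helper; this is not how BioMAS implements ontology queries.

```python
from typing import Dict, List, Set

# Toy is-a hierarchy (parent -> children); a real ontology is far larger.
CHILDREN: Dict[str, List[str]] = {
    "kinase activity": ["protein kinase activity"],
    "protein kinase activity": ["protein tyrosine kinase activity"],
}


def expand_term(term: str, children: Dict[str, List[str]]) -> Set[str]:
    """Return `term` together with every term below it in the hierarchy."""
    found = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


expanded = expand_term("kinase activity", CHILDREN)
print(sorted(expanded))
```

A query for the single term "kinase activity" then matches genes annotated with any of the more specific terms as well.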

10
How does it work?
RETSINA-style Multi-Agent Organization
11
DECAF: A multi-agent system toolkit
  • Focus on programming agents, not designing
    internal architecture
  • Programming at the multi-agent level
  • Value-added architecture
  • Support for persistent, flexible, robust actions

12
DECAF
  • Focus on programming agents, not designing
    internal architecture
  • Avoiding the API approach
  • DECAF as agent operating system, programmers
    have strictly limited access
  • Communication, planning, scheduling,
    coordination, execution
  • Graphical dataflow plan editor

13
DECAF
  • Programming at the multi-agent level
  • Standardized, domain-independent, reusable
    middle agents
  • Agent Name Server (white pages)
  • Matchmaker (yellow pages/directory service)
  • Brokers (managers)
  • Information extraction (learning: STALKER;
    knowledge base: PARKA)
  • Proxy (web interfaces)
  • Agent Management Agent (debugging, demos,
    external control)
  • Note: heterogeneous architectures are OK!
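
The white-pages and yellow-pages roles can be pictured with a minimal sketch. The class names, method names, agent names, and addresses below are hypothetical and do not mirror DECAF's actual interfaces or message formats; the point is only the difference between looking up an agent by name and looking up agents by capability.

```python
from typing import Dict, List, Optional, Set


class AgentNameServer:
    """White pages: maps an agent name to its address."""

    def __init__(self) -> None:
        self._addresses: Dict[str, str] = {}

    def register(self, name: str, address: str) -> None:
        self._addresses[name] = address

    def unregister(self, name: str) -> None:
        self._addresses.pop(name, None)

    def resolve(self, name: str) -> Optional[str]:
        return self._addresses.get(name)


class Matchmaker:
    """Yellow pages: maps a capability to the agents advertising it."""

    def __init__(self) -> None:
        self._providers: Dict[str, Set[str]] = {}

    def advertise(self, agent: str, capability: str) -> None:
        self._providers.setdefault(capability, set()).add(agent)

    def unadvertise(self, agent: str, capability: str) -> None:
        self._providers.get(capability, set()).discard(agent)

    def lookup(self, capability: str) -> List[str]:
        return sorted(self._providers.get(capability, ()))


# Hypothetical agents and addresses, for illustration only.
ans = AgentNameServer()
mm = Matchmaker()
ans.register("blast_agent", "host1:5001")
ans.register("motif_agent", "host2:5002")
mm.advertise("blast_agent", "homology-search")
mm.advertise("motif_agent", "motif-search")

providers = mm.lookup("homology-search")
addresses = [ans.resolve(name) for name in providers]
print(providers, addresses)
```

A requester first asks the matchmaker who can do the job, then asks the name server where that agent lives.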

14
DECAF
  • Value-added architecture
  • Taking care of details (social/individual)
  • ANS registration/dereg (eventually MM)
  • Standard behaviors (AMA, error, FIPA, libraries)
  • Message dispatching (ontology, conversation)
  • Coordination (GPGP)
  • Efficient use of computational resources
  • Highly threaded (internally and for domain actions)
  • Memory efficient (ran systems for weeks, hundreds
    of thousands of messages)

15
DECAF
  • Support for persistent, flexible, robust actions
  • HTN-style programming
  • Task alternatives and contingencies
  • RETSINA-style dataflow
  • Provisions/Parameters determine task activation
  • Multiple outcomes, Loops
  • TÆMS-style task network annotations
  • Dynamic overall utility: quality, cost, duration
    task characteristics
  • Explicit representation of non-local tasks
  • Example: Time/Quality tradeoff
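
A time/quality tradeoff between task alternatives might be sketched as follows. This is a deliberately simple selection rule (highest quality among alternatives that fit the deadline), not the TÆMS or DECAF scheduling algorithm; the `Alternative` type, the alternative names, and all numbers are made up for illustration.

```python
from typing import List, NamedTuple, Optional


class Alternative(NamedTuple):
    """One way of achieving a task, annotated with quality, cost, duration."""
    name: str
    quality: float
    cost: float
    duration: float


def choose(alternatives: List[Alternative],
           deadline: float) -> Optional[Alternative]:
    """Pick the highest-quality alternative that finishes by the deadline.

    Returns None when no alternative is fast enough.
    """
    feasible = [a for a in alternatives if a.duration <= deadline]
    return max(feasible, key=lambda a: a.quality, default=None)


# Hypothetical alternatives for the same annotation task.
ALTERNATIVES = [
    Alternative("quick_search", quality=0.6, cost=1.0, duration=5.0),
    Alternative("thorough_search", quality=0.9, cost=4.0, duration=60.0),
]

tight = choose(ALTERNATIVES, deadline=10.0)
relaxed = choose(ALTERNATIVES, deadline=120.0)
print(tight.name, relaxed.name)
```

With a tight deadline the faster, lower-quality alternative wins; with a relaxed deadline the slower, higher-quality one is preferred.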

16
DECAF Architecture
[Architecture diagram; labels recovered from the figure]
  • Inputs and outputs: incoming KQML/FIPA messages,
    outgoing KQML/FIPA messages, plan file
  • Modules: Agent Initialization, Dispatcher,
    Planner, Scheduler, Executor
  • Queues: Incoming Message Queue, Objectives Queue,
    Task Queue, Agenda Queue, Pending Action Queue,
    Action Results Queue
  • Data stores: Task Templates Hash Table, Domain
    Facts and Beliefs
  • Annotation: concurrent
17
Plan Editor
18
Expanding the Genomic Annotation System
19
Functional Annotation Suborganization
Gene Ontology Consortium (www.geneontology.org)
Biological Process, Molecular Function,
Cellular Component
22
Co-present Domain Networks (CoPrDom)
  • Proteins can be viewed as conserved sets of
    domains
  • Vertex = domain; edge = co-present in some
    protein; edge weight = number of proteins
    co-present in
  • Network constructed from InterPro domain markup
    of proteins in 10 species (human, Drosophila,
    C. elegans, S. cerevisiae among them)
  • Functional characterization via InterPro to GO
    mapping
  • Network constructed per organism per functional
    group, e.g., apoptosis regulation in human
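
The network definition on this slide can be sketched directly: every domain is a vertex, two domains share an edge if they occur together in at least one protein, and the edge weight counts the proteins in which they co-occur. The `build_coprdom` function and the protein-to-domain assignments below are hypothetical toy inputs, not InterPro data.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple


def build_coprdom(
    protein_domains: Dict[str, Iterable[str]]
) -> Counter:
    """Build a co-present domain network.

    Keys of the returned Counter are undirected edges stored as
    alphabetically ordered (domain, domain) pairs; values are the
    number of proteins containing both domains.
    """
    edges: Counter = Counter()
    for domains in protein_domains.values():
        # Deduplicate so a repeated domain counts once per protein.
        for a, b in combinations(sorted(set(domains)), 2):
            edges[(a, b)] += 1
    return edges


# Toy input: protein names and domain assignments are made up.
PROTEINS = {
    "protein_1": ["SH2", "SH3", "Kinase"],
    "protein_2": ["SH2", "SH3"],
    "protein_3": ["Kinase"],
}

network = build_coprdom(PROTEINS)
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

Building one such network per organism and per functional group, as the slide describes, would amount to filtering the input proteins before calling the function.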

24
Uses for CoPrDom
  • Functional characterization of unknown domains
  • Identification of core domains/groups in a
    functional group
  • Tracking domain evolution through species
    evolution
  • Predicting protein-protein interaction by
    identifying evolutionary merging of domain groups

25
Biological Pathway Discovery through AI Planning
Techniques
  • AI planning is a computational method for
    developing complex plans of action from a
    representation of the initial states, the actions
    that manipulate those states, and the specified
    goal states.
  • Initial states: the initial state representation
    of objects in the "plan world"
  • Actions: logical descriptions of preconditions
    and effects
  • Goals: the end states desired
  • HTN (Hierarchical Task Network) planning proceeds
    by task decomposition of networks, and a
    successful plan is one that satisfies a task network.
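
The three ingredients listed above (initial states, actions with preconditions and effects, goals) can be seen working together in a small sketch. This is a plain breadth-first, STRIPS-style state-space search, chosen for brevity; it is not HTN planning and not the planner used in this work. The `Action` type, the `plan` function, and the predicate names are hypothetical simplifications.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, NamedTuple, Optional


class Action(NamedTuple):
    """A ground action: preconditions plus add and delete effects."""
    name: str
    preconditions: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def plan(initial: Iterable[str], goal: Iterable[str],
         actions: List[Action]) -> Optional[List[str]]:
    """Breadth-first search from the initial state to any state
    containing every goal fact. Returns action names, or None."""
    start = frozenset(initial)
    goal_facts = frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal_facts <= state:
            return steps
        for act in actions:
            if act.preconditions <= state:
                nxt = (state - act.delete) | act.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [act.name]))
    return None


# Toy signalling chain; predicates and actions are illustrative only.
ACTIONS = [
    Action("bind_ligand", frozenset({"ligand_present"}),
           frozenset({"receptor_bound"}), frozenset()),
    Action("activate_receptor", frozenset({"receptor_bound"}),
           frozenset({"receptor_active"}), frozenset()),
    Action("activate_ras", frozenset({"receptor_active"}),
           frozenset({"ras_active"}), frozenset()),
]

steps = plan({"ligand_present"}, {"ras_active"}, ACTIONS)
print(steps)
```

The returned list is one ordered sequence of actions leading from the initial state to the goal; an unreachable goal yields `None`.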

26
Uses of the Signal Transduction Planner
  • To produce computer interpretable plans capturing
    relevant qualitative information regarding signal
    transduction pathways.
  • To produce testable hypotheses regarding gaps in
    knowledge of the pathway, and drive future signal
    transduction research in an ordered manner.
  • To identify key nodes: points where many pathways
    are regulated by a node with only one functional
    protein, which serves as a critical checkpoint.
  • To perform in silico experiments of hyper
    expression and deletion mutation.
  • To enable pathway visualization tools by
    providing human- and machine-readable pathway
    description.

27
Advantages of Planning
  • Operator schema: abstracted axiomatic definitions
    of sub-cellular processes, understandable to
    both human and computer
  • Task abstraction: decomposition of complex tasks
    into simpler, interchangeable actions
  • Reduces search space, conflicts
  • Modeling of pathways at different levels of
    biochemical detail
  • Search conducted in plan space: most planners
    perform bi-directional search (vs. Pathway Tools,
    Prolog implementations, etc.)
  • Partial-order planning: succinct representation
    of multiple pathways helps identify key causal
    relationships

28
Advantages of Planning (contd.)
  • Conditional effects can be used to model special
    cases ("exceptions") when applying operator
    schema
  • Resource Utilization can be used to model
    quantitative aspects such as amplification of a
    signal, feedback and feed-forward loops
  • Plan re-use: old plans can be successfully
    inserted into new ones (if initial and final
    conditions are met) without additional
    computation

29
(Ontologically driven) Operator Schema Example:
Transport
    (action transport
      parameters (?mol - macromolecule,
                  ?compfrom, ?compto - compartment)
      condition (and (in ?mol ?compfrom)
                     (open ?compfrom ?compto))
      effects (and (in ?mol ?compto)
                   (not (in ?mol ?compfrom))))
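
To make the transport schema concrete, the sketch below applies it to a state represented as a set of ground facts. The tuple encoding, the `transport` function signature, and the molecule and compartment names are assumptions made for illustration; type checking of `macromolecule` and `compartment` is omitted.

```python
from typing import FrozenSet, Optional, Tuple

Fact = Tuple[str, ...]
State = FrozenSet[Fact]


def transport(state: State, mol: str,
              comp_from: str, comp_to: str) -> Optional[State]:
    """Apply the transport operator to `state`.

    Condition: (in mol comp_from) and (open comp_from comp_to).
    Effects:   add (in mol comp_to), remove (in mol comp_from).
    Returns the new state, or None if the condition does not hold.
    """
    condition = {("in", mol, comp_from), ("open", comp_from, comp_to)}
    if not condition <= state:
        return None
    return (state - {("in", mol, comp_from)}) | {("in", mol, comp_to)}


# Hypothetical starting state.
before: State = frozenset({
    ("in", "mapk", "cytoplasm"),
    ("open", "cytoplasm", "nucleus"),
})

after = transport(before, "mapk", "cytoplasm", "nucleus")
blocked = transport(before, "mapk", "nucleus", "cytoplasm")
print(sorted(after), blocked)
```

The second call returns `None` because the molecule is not in the source compartment, so the operator's condition fails.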

30
RTK-MAPK pathway
Activation of Ras following binding of a hormone
(e.g., EGF) to a receptor
31
RTK-MAPK pathway step: O-Plan Output
Phosphorylation of GRB2 at domain Sh2 by the RTK
receptor
32
Summary
  • Bioinformatics has many features amenable to
    multi-agent information gathering approach
  • BioMAS Automated Analysis: EST processing to
    functional annotation ontologies
  • DECAF / RETSINA / TÆMS
  • GOFigure! and electronic GO annotation
  • CoPrDom: Co-Present Domain Analysis
  • Signal Transduction Pathway Discovery

33
BioMAS Future Work
  • Sophisticated queries are possible, but how to
    make them available to biologists?
  • "Show me all glycoproteins in Marek's Disease
    virus with a tyrosine phosphorylation motif and a
    transmembrane domain value 2 that are expressed
    in feather follicles"
  • Robustness, efficiency, scale, data
    materialization issues
  • Automating and integrating more complex analysis
    processes (using existing software!)
  • Estimating physical location of genes by synteny
  • Integrate new data sources
  • Microarray and other gene expression data
  • And thus, more analyses: QTL mapping, metabolic
    pathway learning
  • New off-site organism databases and analysis
    agents

http://udgenome.ags.udel.edu/
http://www.cis.udel.edu/decaf/