From Biological Data to Biological Knowledge - PowerPoint PPT Presentation

1
From Biological Data to Biological Knowledge
  • Volker Stümpflen
  • Group for Biological Information Systems
  • MIPS / Institute for Bioinformatics
  • GSF National Research Center for Environment
    and Health

2
Something About Our Problem
  • For a long time we focused on individual genes /
    proteins
  • but, e.g., humans do not have many more genes than
    simple organisms
  • because complexity arises at the level of
    biological networks
  • We can't understand anything without
    understanding the context

3
Small Scale Knowledge Generation
  • Accessing some of the several hundred (web)
    resources (publicly available data > 2 petabytes)

→ Compilation of required knowledge by hand
4
Large Scale Assessment of Information and Knowledge
  • R. Shamir et al., "Revealing modularity and
    organization in the yeast molecular network by
    integrated analysis of highly heterogeneous
    genomewide data", PNAS, Vol. 101, No. 9, 2004,
    pp. 2981-2986
  • To gain deeper understanding of the biological
    systems, it is pertinent to analyze heterogeneous
    data sources in a truly integrated fashion and
    shape the analysis results into one body of
    knowledge.
  • By integrating experimental data of
    heterogeneous sources and types, we are able to
    perform analysis on a much broader scope than
    previous studies.

5
Technical Problems
  • Information integration from heterogeneous and
    distributed data sources (databases AND
    applications)
  • Solvable with n-tier architectures
  • e.g. GenRE at MIPS
  • J2EE-based middleware
  • Enterprise JavaBeans (EJBs) and Web Services (WS)
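The n-tier idea can be illustrated in a few lines. This is not GenRE or J2EE code. It is a minimal Python sketch built on two hypothetical sources: an in-memory table stands in for a database, and a function stands in for an analysis application. A middle tier exposes both behind one uniform call.

```python
from typing import Callable, Dict

# Data tier: two heterogeneous, hypothetical sources.
# A "database" is modelled here as an in-memory table ...
PROTEIN_TABLE: Dict[str, Dict[str, str]] = {
    "P1": {"name": "hypothetical protein 1", "sequence": "MKTAYIAK"},
}


# ... and an "application" as a function that computes a value on demand.
def sequence_length_app(sequence: str) -> int:
    return len(sequence)


class MiddleTier:
    """Business tier: clients never see which back end produced which value."""

    def __init__(self, table: Dict[str, Dict[str, str]],
                 length_app: Callable[[str], int]) -> None:
        self._table = table
        self._length_app = length_app

    def protein_view(self, protein_id: str) -> Dict[str, object]:
        record = self._table[protein_id]               # query the "database"
        length = self._length_app(record["sequence"])  # call the "application"
        return {"id": protein_id, "name": record["name"], "length": length}


# Presentation tier: one uniform call instead of two different back ends.
tier = MiddleTier(PROTEIN_TABLE, sequence_length_app)
print(tier.protein_view("P1"))
# {'id': 'P1', 'name': 'hypothetical protein 1', 'length': 8}
```

Replacing the in-memory table with a real database, or the function with a remote service, changes only the middle tier. Clients keep the same call.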

6
Semantic Problems
  • Sloppy definitions, e.g. "gene has function"
  • Homonym / synonym problems, e.g. gene identifiers
  • Ambiguity of terms
  • Differences in the meaning of terms between
    biological communities
  • Results of in-vitro experiments often differ
    depending on the experimental scope (e.g. protein
    interactions)
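A common way to attack the identifier problem is an index from every known alias to its canonical identifier. The same index exposes homonyms: any alias that points to more than one canonical entry is ambiguous. The sketch uses a small, hypothetical alias table. Real gene nomenclature is considerably messier.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical alias table: canonical identifier -> names used in various sources.
ALIASES: Dict[str, List[str]] = {
    "GENE_0001": ["abc1", "geneA"],
    "GENE_0002": ["abc2", "geneA"],  # "geneA" is a homonym: it names two genes
    "GENE_0003": ["xyz7"],
}


def build_index(aliases: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert the table so every alias (case-insensitive) maps to canonical IDs."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for canonical, names in aliases.items():
        index[canonical.lower()].add(canonical)
        for name in names:
            index[name.lower()].add(canonical)
    return dict(index)


def resolve(name: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Return every canonical ID a name may refer to (empty set if unknown)."""
    return index.get(name.lower(), set())


index = build_index(ALIASES)
print(resolve("ABC1", index))  # synonym resolved: {'GENE_0001'}
homonyms = {alias for alias, ids in index.items() if len(ids) > 1}
print(homonyms)                # {'genea'}
```

An index like this cannot settle which gene an ambiguous name refers to. It only makes the ambiguity explicit, so it can be flagged instead of silently merged.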

7
Strategies
  • Complete semantic annotation of (all) resources
  • Funding?
  • Data models?
  • Modeling of individual domains
  • Suited for biologists (Topic Maps)
  • Access to relevant data sources
  • Merging of individual domains to obtain the
    complete picture

8
Static Generation of Topic Maps
  • Highly flexible data model
  • Straightforward process
  • Intuitive user interface
  • Finding the right information is easy
  • Topic maps tend to be very large
  • Redundant information in DBs and Topic Map files
  • Update problems
  • Solution: dynamic generation of topic maps
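To make the trade-off concrete, here is a minimal sketch of static generation. It uses a simplified topic map model: topics with occurrences, plus binary associations that carry one label per reading direction. It is not the full ISO/IEC 13250 / XTM model. Every database row is copied into the map when the map is generated. That up-front copy is the source of the size, redundancy and update problems listed above. The rows and identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Topic:
    topic_id: str
    topic_type: str
    occurrences: Dict[str, str] = field(default_factory=dict)


@dataclass
class Association:
    # Binary association with one label per direction, e.g.
    # Protein "has" EC Number / EC Number "is associated to" Protein.
    source: str
    target: str
    forward: str
    backward: str


@dataclass
class TopicMap:
    topics: Dict[str, Topic] = field(default_factory=dict)
    associations: List[Association] = field(default_factory=list)


def generate_static_map(proteins: List[Dict[str, str]],
                        links: List[Tuple[str, str]]) -> TopicMap:
    """Copy every database row into one map up front (static generation)."""
    tm = TopicMap()
    for row in proteins:
        tm.topics[row["id"]] = Topic(row["id"], "Protein",
                                     {"Description": row["description"]})
    for protein_id, ec_number in links:
        tm.topics.setdefault(ec_number, Topic(ec_number, "EC Number"))
        tm.associations.append(
            Association(protein_id, ec_number, "has", "is associated to"))
    return tm


# Hypothetical rows standing in for a database export.
rows = [{"id": "P1", "description": "hypothetical protein 1"}]
tm = generate_static_map(rows, [("P1", "EC-A")])
print(len(tm.topics), len(tm.associations))  # 2 1
```

If a source database changes, the map is stale until it is regenerated in full.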

9
Dynamic Topic Map Generation
  • Dynamic information retrieval via EJBs / Web
    Services
  • Each topic type is mapped to an EJB / Web Service
  • Each association is also represented by an EJB /
    Web Service
  • Straightforward extension of the data model
  • User adjustments remain possible afterwards
  • Intuitive navigation of related information

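[Figure: the topic types Protein and EC Number are linked by the Protein ECNum Association (Protein "has" EC Number; EC Number "is associated to" Protein). Each is backed by its own service: Protein Web Services, EC Number Web Services and Protein ECNum Association Web Services.]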
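Dynamic generation can be sketched as follows. The three functions are hypothetical stand-ins for the Protein, EC Number and Protein ECNum Association Web Services named in the diagram. Local Python calls replace EJB or Web Service calls. Only the fragment around the topic the user is currently viewing gets built, so no data is copied out of the source databases ahead of time.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical back-end data; in the DTMG setting each lookup below would be
# a call to a separate EJB / Web Service instead of a local function.
_PROTEINS = {"P1": {"Description": "hypothetical protein 1"}}
_EC_NUMBERS = {"EC-A": {"Description": "hypothetical enzyme class"}}
_PROTEIN_EC = [("P1", "EC-A")]


def protein_service(topic_id: str) -> Dict[str, str]:
    return _PROTEINS[topic_id]


def ec_number_service(topic_id: str) -> Dict[str, str]:
    return _EC_NUMBERS[topic_id]


def protein_ec_association_service(protein_id: str) -> List[str]:
    return [ec for protein, ec in _PROTEIN_EC if protein == protein_id]


# One service per topic type ...
TOPIC_SERVICES: Dict[str, Callable[[str], Dict[str, str]]] = {
    "Protein": protein_service,
    "EC Number": ec_number_service,
}
# ... and one per association type: (label, target topic type, service).
ASSOCIATION_SERVICES: Dict[str, List[Tuple[str, str, Callable[[str], List[str]]]]] = {
    "Protein": [("has", "EC Number", protein_ec_association_service)],
}


def fragment(topic_type: str, topic_id: str) -> Dict[str, object]:
    """Build only the map fragment around one topic, when it is requested."""
    associations = [
        (label, target_type, target_id)
        for label, target_type, service in ASSOCIATION_SERVICES.get(topic_type, [])
        for target_id in service(topic_id)
    ]
    return {
        "topic": (topic_type, topic_id),
        "occurrences": TOPIC_SERVICES[topic_type](topic_id),
        "associations": associations,
    }


# Navigating to P1 pulls in its EC numbers; following one of them would
# trigger the next fragment request.
print(fragment("Protein", "P1")["associations"])
# [('has', 'EC Number', 'EC-A')]
```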
10
Interface Definition
  • Information retrieval via EJBs (Web Services)
  • Each topic type is mapped to an EJB / WS
  • Each association type is also represented by an
    EJB / WS
  • Straightforward extension of the data model
  • User adjustments remain possible afterwards
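One way to write down such an interface definition is an abstract contract per topic type and per association type. Extending the data model then means implementing the contract once more and registering the result. The sketch uses Python's `abc` module instead of EJB interfaces; `abc` refuses to instantiate a class whose abstract methods are not implemented. The class names, the "has domain" label and the data are hypothetical. PFAM Domain is taken from the topic map schema.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class TopicTypeService(ABC):
    """Contract that every topic type service fulfils (hypothetical interface)."""

    topic_type: str = ""

    @abstractmethod
    def occurrences(self, topic_id: str) -> Dict[str, str]:
        """Return the occurrences (descriptions, URLs, ...) of one topic."""


class AssociationTypeService(ABC):
    """Contract for one association type between two topic types."""

    label: str = ""
    source_type: str = ""
    target_type: str = ""

    @abstractmethod
    def targets(self, source_id: str) -> List[str]:
        """Return the IDs of topics associated with the given source topic."""


# Extending the data model: implement the contracts for a new topic type.
class PfamDomainService(TopicTypeService):
    topic_type = "PFAM Domain"

    def __init__(self, table: Dict[str, Dict[str, str]]) -> None:
        self._table = table

    def occurrences(self, topic_id: str) -> Dict[str, str]:
        return self._table.get(topic_id, {})


class ProteinDomainService(AssociationTypeService):
    label = "has domain"  # hypothetical association label
    source_type = "Protein"
    target_type = "PFAM Domain"

    def __init__(self, links: Dict[str, List[str]]) -> None:
        self._links = links

    def targets(self, source_id: str) -> List[str]:
        return self._links.get(source_id, [])


topic_registry: Dict[str, TopicTypeService] = {}
association_registry: Dict[str, AssociationTypeService] = {}


def register_topic_type(service: TopicTypeService) -> None:
    topic_registry[service.topic_type] = service


def register_association_type(service: AssociationTypeService) -> None:
    association_registry[service.label] = service


register_topic_type(PfamDomainService({"PF-A": {"Description": "hypothetical domain"}}))
register_association_type(ProteinDomainService({"P1": ["PF-A"]}))
print(association_registry["has domain"].targets("P1"))  # ['PF-A']
```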



11
DTMG Architecture (Extension of GenRE)
K. Nenova
12
Worst Case Example
  • Combination of two large resources at MIPS
  • Annotated proteins: calculated properties of
    genes / proteins from various organisms
  • Orthologs: calculated similarities of proteins
    (all against all)

K. Nenova / R. Gregory
13
Large Scale Annotation with PEDANT (Protein
Extraction, Description and Analysis Tool)
  • Currently covers > 400 genomes
  • 1,000 by the end of this year

14
SIMAP Precalculated Sequence Homologies
  • 450 proteomes
  • 4 sequence collections
  • 7.5 million protein entries
  • 3.5 million sequences
  • 8 billion FASTA hits
  • BOINC: 12,600 hosts, 2.3 TeraFLOPS

[Figure: SIMAP architecture. Components: SIMAP database, database and file server (NFS server), grid master and grid execution hosts on the LAN; BOINC core and BOINC daemons serving Linux, Mac and Windows clients over the Internet; web server and SIMAP client for external users and MIPS WWW users.]
R. Arnold, T. Rattei, P. Tischler, V. Stümpflen,
M.-D. Truong and H.-W. Mewes, Bioinformatics (in press)
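A back-of-the-envelope count shows the scale of an all-against-all comparison. Assume each of the N = 3.5 million sequences listed above is compared once with every other sequence, with no self-comparisons and each unordered pair counted once. The number of comparisons is then:

```latex
\[
\binom{N}{2} = \frac{N(N-1)}{2}
  = \frac{3.5\times 10^{6}\,\left(3.5\times 10^{6}-1\right)}{2}
  \approx 6.1\times 10^{12}
\]
```

This figure describes only the naive approach under that assumption. It does not describe how SIMAP actually schedules its computations on the grid and BOINC hosts.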
15
Topic Map Schema
[Figure: topic map schema. Topic types: Genome, Protein, EC Number, PFAM Domain, FunCat. Association labels: "contains" / "belongs", "has" / "is associated to", "is represented by", "has orthologs". Occurrences shown: Pedant URL, Classification, Description, Length, Molecular Weight, Contig Name, Sequence, Pfam URL, KEGG URL, URL, FunCat URL, Taxonomy Id, Strain, Status.]
16
Some Screenshots
17
Improvements
  • Parallel searches based on Message Driven Beans

R. Gregory
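Message Driven Beans are a Java EE mechanism, and the sketch below does not use them. It only shows the fan-out/fan-in idea behind parallel searches, with Python's `concurrent.futures` as a stand-in. One query goes to several hypothetical search services at once, and the answers are merged once all services have replied. The service names, entries and delays are invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def make_search(source: str, entries: List[str], delay: float) -> Callable[[str], List[str]]:
    """Create a hypothetical search service that needs `delay` seconds to answer."""
    def search(query: str) -> List[str]:
        time.sleep(delay)  # simulate database / network latency
        return [f"{source}:{entry}" for entry in entries if query.lower() in entry.lower()]
    return search


SERVICES: Dict[str, Callable[[str], List[str]]] = {
    "proteins": make_search("proteins", ["kinase P1", "transporter P2"], 0.2),
    "domains": make_search("domains", ["kinase domain D1"], 0.2),
    "funcat": make_search("funcat", ["signal transduction F1"], 0.2),
}


def parallel_search(query: str) -> List[str]:
    """Send the query to all services at once and merge their answers."""
    with ThreadPoolExecutor(max_workers=len(SERVICES)) as pool:
        futures = [pool.submit(search, query) for search in SERVICES.values()]
        merged: List[str] = []
        for future in futures:  # merge in registration order, not completion order
            merged.extend(future.result())
    return merged


start = time.perf_counter()
print(parallel_search("kinase"))
# ['proteins:kinase P1', 'domains:kinase domain D1']
elapsed = time.perf_counter() - start
print(f"{elapsed:.2f} s")  # close to one service's delay, not the sum of all three
```

With message-driven middleware, the requests would be queued as messages instead of submitted to a thread pool, but the merge step is the same in principle.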
18
Further Improvements
  • More Maps
  • Diseases, Metabolism
  • Combination with Text Mining
  • Inference Engines, Reasoners

"Computer: show me all proteins in Mus musculus
involved in transmembrane signal transduction and
show me the orthologs in Rattus norvegicus."
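Given associations like those in the topic map schema (Genome "contains" Protein, Protein "has orthologs" Protein) plus a link from proteins to FunCat categories, a question like this becomes a chain of traversals. The sketch assumes a tiny, hypothetical in-memory map, and the "belongs to" label between proteins and FunCat categories is an assumption. A reasoner or inference engine would derive such chains automatically; here the chain is hand-coded.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy topic map as (source, association label, target) triples.
TRIPLES: List[Tuple[str, str, str]] = [
    ("Mus musculus", "contains", "mP1"),
    ("Mus musculus", "contains", "mP2"),
    ("Rattus norvegicus", "contains", "rP1"),
    ("mP1", "belongs to", "transmembrane signal transduction"),
    ("mP2", "belongs to", "metabolism"),
    ("mP1", "has orthologs", "rP1"),
]


def targets(source: str, rel: str) -> Set[str]:
    return {t for s, r, t in TRIPLES if s == source and r == rel}


def sources(rel: str, target: str) -> Set[str]:
    return {s for s, r, t in TRIPLES if r == rel and t == target}


def query(genome: str, category: str, ortholog_genome: str) -> Dict[str, Set[str]]:
    """Proteins of `genome` in `category`, each with its orthologs in `ortholog_genome`."""
    in_genome = targets(genome, "contains")
    in_category = sources("belongs to", category)
    other_genome = targets(ortholog_genome, "contains")
    return {p: targets(p, "has orthologs") & other_genome
            for p in sorted(in_genome & in_category)}


print(query("Mus musculus", "transmembrane signal transduction", "Rattus norvegicus"))
# {'mP1': {'rP1'}}
```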
19
Conclusion
  • Topic Maps are suitable for semantic information
    integration
  • Development of a Dynamic Topic Map Generation
    (DTMG) framework
  • Generation of fragments based on component- and
    service-oriented architectures
  • Enables a deeper understanding of biological
    entities and systems in a truly integrated fashion

20
Acknowledgements
  • Filka Nenova, Richard Gregory, Matthias Oesterheld,
    Roland Arnold, Octave Noubibou, Marisa Thoma,
    Konrad Schreiber
  • Thomas Rattei
  • Ulrich Güldener, Martin Münsterkötter
  • Funding: Impuls- und Vernetzungsfonds der
    Helmholtz-Gemeinschaft Deutscher
    Forschungszentren e.V. (Initiative and Networking
    Fund of the Helmholtz Association of German
    Research Centres)