Bioinformatics: building bridges to biology information

1
Bioinformatics: building bridges to biology information
  • Biology Information Access
  • Genome Information Systems
  • Bio-Grids and Bio-Directories

Don Gilbert, gilbertd_at_bio.indiana.edu, November 2002
2
Biology information access projects
  • Bio-info archiving and distribution
      • IUBio Archive, http://iubio.bio.indiana.edu/ --
        public molecular biology data / software archive
      • Bio-Mirrors, http://www.bio-mirror.net/ --
        sequence and related biology databanks
  • Genome information systems
      • FlyBase, http://flybase.bio.indiana.edu/ --
        genome infosystem of Drosophila fruitfly
      • euGenes, http://eugenes.org/ -- infosystem for 8
        important eukaryotes with 180,000 genes
  • Bioinformatics services
      • http://sunflower.bio.indiana.edu/bioweb/ --
        molecular biology program use via web
  • Bio-Data Grids
      • http://iubio.bio.indiana.edu/grid/ --
        experimental distributed computing

4
BioData
  • BioData size, contents, dispersion, uses
  • Genome data
      • very important, highly complex, harder to find,
        long lived
      • Literature (abstracted and curated), sequence and
        feature analyses, maps, controlled
        vocabulary/ontologies, people, biologics,
        contacts, etc.
  • BioData access
      • Need to find and use the best data
      • New data kinds and sources: bio-information is
        very fluid
      • Need current data: updated monthly, weekly, daily
      • Distributed widely around the world among 1000s of
        national and regional centers and labs

5
Bio Databanks, EBI, Sept. 2002
6
Constellation of Bio-Data (SRS - Lion Bioscience)
8
FlyBase and euGenes
9
Genome Databases
  • Drosophila: FlyBase, http://flybase.net/ (Indiana
    Univ.)
  • C. elegans: WormBase, http://www.wormbase.org/
  • Mouse: MGD, http://www.informatics.jax.org/
  • Saccharomyces: SGD,
    http://genome-www.stanford.edu/Saccharomyces/
  • Human: LocusLink,
    http://www.ncbi.nlm.nih.gov/LocusLink/
  • Human: GeneCards,
    http://bioinfo.weizmann.ac.il/cards/
  • Various eukaryotes: Ensembl, http://www.ensembl.org/
  • Various eukaryotes: euGenes, http://eugenes.org/
    (Indiana Univ.)
  • Many new organism genome systems for Daphnia,
    insects, vertebrates, and others with complete genome
    data

10
FlyBase.net
  • Distributed project (4 sites, 6 PIs, 15
    curators, 15 informaticians), 10 years old
  • Multiple databases; project data flow and
    exchange are critical
  • Curated and computed data, from experimental
    literature and genome sequence
  • Integrated database modules (for generic use w/
    GMOD)
      • Genetics, Sequences, Maps, Expression
      • Controlled vocabularies, Ontologies
      • Computational analyses
      • Organism, taxonomy, phylogenetic/comparative
      • Publications, General

11
euGenes.org
  • Automated genome summaries for Human, Fruitfly,
    Mouse, Mosquito, Arabidopsis, C. elegans,
    Saccharomyces, Zebrafish
  • 3-year computational DB project, 1 part-time
    informatician (dgg ?)
  • Genome maps, sequences, gene reports, external
    database links
  • Cross-species comparisons: similar genes, genome
    features, gene function

13
Genome Data Objects

Drosophila genome, FlyBase, Sept. 2002
8 eukaryote genomes, euGenes, July 2002
14
Genome attributes in euGenes, July 2002
Genes as extracted from genome project sources.
These differ from true gene numbers by orphan
gene records, prediction artifacts, unmerged
predicted/expt. records, and unfinished
sequencing gaps.
15
Anatomy of genome database info system
16
FlyBase/euGenes Query System
17
FlyBase Query Results
FlyBase Genes query results
Query: ( libsFBgn PFgn-allwing or libs-synwing ) and libs-orgDmel
No. matches: 1437
Bookmark: FBquery ( libsFBgn PFgn-allwing libs-synwing ) libs-orgDmel

  #   Symbol  Name        Map    Alleles  Stocks  Refs  DNA  Date
  1   18w     18 wheeler  56F11  16       2       56    13   31 May 02
  2   2R-F    -           -      2        1       3     -    31 May 02
  ...
  19  Act42A  Actin 42A   42A2   2        -       73    23   31 May 02
  20  Act5C   Actin 5C    5C7    14       1       129   43   31 May 02

Page and Sort results
Batch Download: Fetch items (x All Items); Format (Spreadsheet);
Report content (Summary); Report only (Select fields, Field list)
Refine query or find items in related data:
Refine query ( libsFBgn PFgn-allwing or libs-synwing ) and
libs-orgDmel and other fields matches ..
Search Genes, retrieve Related Data Classes (alleles, aberrations,
transcripts, insertions, sequences)
18
GMOD - Generic genome database tools
  • Generic Model Organism Database Construction Set,
    http://www.gmod.org/
  • Database schemas
  • Literature curation tools
  • Gene ontology management tools
  • Visualization tools
  • Data processing pipelines

19
Bio-grids - what might they be ?
  • transparent use of available workstations and
    commodity grid resources (commercial, academic)
  • find biodata, computing resources easily and
    automatically via directories
  • personal/project resources and peer-peer sharing
  • less reliance, less cost for centralized services
    or building local IT centers
  • Power grid - plug in your toaster, ignore the
    power sources and grid. Bio grid - plug in
    workstation, ignore where data and compute power
    comes from -- eventually!

20
EU DataGrid Interfaces (from Bob Jones, CERN)
Computing Elements
Mass Storage Systems HPSS, Castor
21
BioGrid Schematic
  • Grid-aware client software
  • Data and software resource directories
  • Grid of processing computers

22
Directories of Genome Data
  • For genome data, "broad and shallow" directories
    federate the "narrow and deep" databases
  • BioData access tools
      • SRS (Sequence Retrieval System), Entrez, AceDB
      • RDBMS, Ensembl, IBM DiscoveryLink, BioSQL,
        BioDAS
  • Directory services and data tools: LDAP, Web
    Services
      • LDAP: mature, efficient for high volumes, allows
        federated queries over distributed directories,
        and works well for SRS databanks and genome
        annotations
      • Web Services: new, simple and complex, for XML
        messages over the Web; they have wide industry
        support, but their many standards are in flux
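
The "broad and shallow directories federate narrow and deep databases" idea can be sketched in a few lines. This is only an in-memory illustration, not the LDAP or SRS software described in these slides: the `Directory` class, the `federated_search` function, the attribute names and the `ref` pointers are all hypothetical. Each directory holds only a few searchable attributes per entry plus a pointer back to the full database record.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """A 'broad and shallow' index: a few searchable attributes per entry."""
    name: str
    entries: list = field(default_factory=list)

    def search(self, **criteria):
        # Case-insensitive exact match on every requested attribute.
        return [
            e for e in self.entries
            if all(str(e.get(k, "")).lower() == str(v).lower()
                   for k, v in criteria.items())
        ]


def federated_search(directories, **criteria):
    """Ask every directory the same question; tag each hit with its origin."""
    hits = []
    for d in directories:
        for entry in d.search(**criteria):
            hits.append({"directory": d.name, **entry})
    return hits


# Hypothetical entries: 'ref' stands for a pointer into the
# "narrow and deep" database that holds the full record.
flybase = Directory("flybase", [
    {"symbol": "Act5C", "organism": "Dmel", "ref": "flybase:Act5C"},
    {"symbol": "Act42A", "organism": "Dmel", "ref": "flybase:Act42A"},
])
eugenes = Directory("eugenes", [
    {"symbol": "Act5C", "organism": "Dmel", "ref": "eugenes:Act5C"},
    {"symbol": "ACTB", "organism": "Hsap", "ref": "eugenes:ACTB"},
])

results = federated_search([flybase, eugenes], symbol="act5c")
for r in results:
    print(r["directory"], r["ref"])
```

A real deployment would replace `Directory.search` with a query against a directory server; the client-side loop over several directories stays the same.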

23
Bio-directories Technology
  • Technology for finding bio-data
      • Current: Web pages, Web indexers (Google), FTP
        servers
      • Sometimes: CORBA, Java RMI
      • Usable: Lightweight directories (LDAP)
      • Developing: Web Services (XML on Web: SOAP, WSDL,
        UDDI, ...)
      • Related: BioDAS, BioMoby, Life Sciences ID
        (LSID)

24
SRS - LDAP and Web-XML gateways
  • Sequence Retrieval System (SRS) knows millions of
    bio-objects: a good start for bio-directories
  • OpenLDAP server combined with SRS6
  • WebService SOAP server/client with SRS6
  • SRS-LDAP-SOAP software is available at
    http://iubio.bio.indiana.edu/biogrid/directories/
  • Compare LDAP, SOAP, Wgetz, FTP for Grid uses
  • LDAP is 5x faster than SOAP, Wgetz
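
To make the two gateway styles concrete, here is a minimal sketch of how one lookup might be written as an LDAP search filter and as a SOAP 1.1 message. It is not the SRS-LDAP-SOAP software itself: the attribute names `databank` and `entryId` and the operation name `getEntry` are assumptions made for illustration. Only the filter escaping rules (RFC 4515) and the SOAP 1.1 envelope namespace are standard.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def ldap_escape(value):
    """Escape the characters that RFC 4515 reserves inside filter values."""
    out = []
    for ch in value:
        if ch in "\\*()\x00":
            out.append("\\%02x" % ord(ch))  # e.g. '*' becomes \2a
        else:
            out.append(ch)
    return "".join(out)


def ldap_filter(**attrs):
    """AND together attribute=value tests into one LDAP search filter."""
    parts = "".join(
        "(%s=%s)" % (name, ldap_escape(value))
        for name, value in sorted(attrs.items())
    )
    return "(&%s)" % parts


def soap_request(operation, **params):
    """Wrap the same parameters in a SOAP envelope (hypothetical operation)."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    op = ET.SubElement(body, operation)
    for name, value in sorted(params.items()):
        ET.SubElement(op, name).text = value
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical lookup of one databank entry, expressed both ways.
query = {"databank": "swissprot", "entryId": "ACT1"}
print(ldap_filter(**query))
print(soap_request("getEntry", **query))
```

Because `ldap_escape` escapes `*`, values are matched literally; a wildcard search would need the `*` added to the filter after escaping.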

25
BioDirectory tests for Grid
26
Using Bio Directories
  • Simple client software
  • Automated use
  • People use
  • Discovery
  • Search by many criteria
  • Retrieve bulk subsets

27
Genome Feature Directory
  • Genome objects fit in directories; search &
    retrieve as annotated sequence (like BioDAS)
  • Same directory methods work for objects,
    databases, software and other resources
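
As a rough sketch of how a genome object might fit in a directory, the example below gives each feature a distinguished name, an LDIF-style text form, and a region query that returns the features overlapping a stretch of sequence. The schema (`genomeFeature`, `ou=` for chromosome arm, `o=` for organism), the gene names and all coordinates are invented for illustration and are not taken from FlyBase or euGenes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    symbol: str
    organism: str
    chromosome: str
    start: int
    end: int
    kind: str = "gene"

    def dn(self):
        # Hypothetical naming scheme: feature under chromosome under organism.
        # DN special characters are not escaped in this sketch.
        return "cn=%s,ou=%s,o=%s" % (self.symbol, self.chromosome, self.organism)

    def to_ldif(self):
        """Render the entry as LDIF-style 'attribute: value' lines."""
        lines = [
            "dn: " + self.dn(),
            "objectClass: genomeFeature",  # hypothetical object class
            "cn: " + self.symbol,
            "kind: " + self.kind,
            "start: %d" % self.start,
            "end: %d" % self.end,
        ]
        return "\n".join(lines)


def overlapping(features, chromosome, start, end):
    """Return features on `chromosome` that overlap the closed range start..end."""
    return [
        f for f in features
        if f.chromosome == chromosome and f.start <= end and f.end >= start
    ]


# Invented features and coordinates, for illustration only.
features = [
    Feature("geneA", "Dmel", "2R", 1000, 2000),
    Feature("geneB", "Dmel", "2R", 2400, 3000),
    Feature("geneC", "Dmel", "3L", 1500, 2500),
]

for f in overlapping(features, "2R", 1500, 2500):
    print(f.to_ldif())
    print()
```

The same entry shape (a name, a class, a few attributes) could describe a databank or a software package instead of a gene, which is the point of the second bullet above.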

28
BioGrid Runner
29
Wrap up
  • Future of Bio-data distribution
      • Computationally find and use dispersed, complex
        data
  • Best methods for Bio-data Grids
      • High volume and complex data
      • Efficient selection and transport to grid
        computers
      • LDAP works well; Web-XML is usable
  • Community needs and uses
      • Shared data descriptions, schema, ontologies
        (Semantic web)
      • Simple, practical, flexible grid methods; use
        existing dbs
      • Use common developing standards

30
Thanks to the work, help and support of these folks and others:
  • IUBio Archive -- Danfeng Yao, Paul Poole
  • Bio-Mirrors -- Yoshihiro Ugawa, Tin Tan Wee, Markus
    Buchhorn, Akira Mizushima, Juncai MA, and others
  • FlyBase -- Victor Strelets, Gary Grumbling, Nihar
    Sheth, Manish Anand, Edwin Wang, Thom Kaufman, Kathy
    Matthews, and a crowd at other sites
  • euGenes -- original idea of and work with Bill
    Gelbart and others
  • Bioinformatics web -- Sue Olson

Eugenes fulgens (Magnificent Hummingbird, Costa Rica)