Analysis Environments For Functional Genomics

1
Analysis Environments For Functional Genomics
Bruce R. Schatz
Institute for Genomic Biology
University of Illinois at Urbana-Champaign
schatz_at_uiuc.edu
www.beespace.uiuc.edu
Genomic Ecology seminar, December 2, 2005, UIUC
2
What are Analysis Environments?
  • Functional Analysis
  • Find the underlying Mechanisms
  • Of Genes, Behaviors, Diseases
  • Comparative Analysis
  • Top-down data mining (vs Bottom-up)
  • Multiple Sources especially literature

3
Building Analysis Environments
  • Manual by Humans
  • Interaction: user navigation
  • Classification: collection indexing
  • Automatic by Computers
  • Federation: search bridges
  • Integration: results links

4
Needles and Haystacks
  • Genes
  • Honey Bees have 13K genes
  • Perhaps 100 have known functions
  • Paths
  • Perhaps 30K protein families exist
  • KEGG has 200 known pathways
  • Statistical Clustering for Interactive Discovery
  • Across Two Orders of Magnitude!
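
The "two orders of magnitude" gap can be checked from the rounded counts on this slide. A minimal sketch; the variable and function names are illustrative, not from the talk:

```python
import math

# Rounded counts quoted on the slide.
bee_genes_total = 13_000   # honey bee genes
bee_genes_known = 100      # genes with known functions
protein_families = 30_000  # estimated protein families
kegg_pathways = 200        # known KEGG pathways


def orders_of_magnitude(total: int, known: int) -> float:
    """Return log10 of the ratio between what exists and what is characterized."""
    return math.log10(total / known)


gene_gap = orders_of_magnitude(bee_genes_total, bee_genes_known)
path_gap = orders_of_magnitude(protein_families, kegg_pathways)
print(f"genes: {gene_gap:.2f} orders, paths: {path_gap:.2f} orders")
# prints "genes: 2.11 orders, paths: 2.18 orders"
```

Both ratios (130x and 150x) sit just above two orders of magnitude.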

5
Trends in Analysis Environments
  • Central versus Distributed Viewpoints
  • The 90s: Pre-Genome
  • Entrez (NIH NCBI) versus WCS (NSF Arizona)
  • The 00s: Post-Genome
  • GO (NIH curators) versus BeeSpace (NSF Illinois)

6
Pre-Genome Environments
  • Focused on Syntax pre-Web
  • WCS (Worm Community System)
  • Search words across sources
  • Follow links across sources
  • Words automatic, Links manual
  • Towards Integrated Searching

7
Post-Genome Environments
  • Focused on Semantics post-Web
  • BeeSpace (Honey Bee Inter Space)
  • Navigate concepts across sources
  • Integrate data across sources
  • Concepts automatic, Links automatic
  • Towards Conceptual Navigation

8
Worm Community System
  • WCS Information
  • Literature: BIOSIS, MEDLINE, newsletters,
    meetings
  • Data: Genes, Maps, Sequences, strains, cells
  • WCS Functionality
  • Browsing: search, navigation
  • Filtering: selection, analysis
  • Sharing: linking, publishing
  • WCS: 250 users at 50 labs across the Internet (1991)

9
WCS Molecular
10
WCS Cellular
11
WCS invokes gm
12
WCS vis-à-vis acedb
13
Towards the Interspace
  • from Objects to Concepts
  • from Syntax to Semantics
  • Infrastructure is Interaction with Abstraction

Internet is packet transmission across computers.
Interspace is concept navigation across repositories.
14
THE THIRD WAVE OF NET EVOLUTION
CONCEPTS
OBJECTS
PACKETS
15
LEVELS OF INDEXES
16
Navigation in MEDSPACE
  • For a patient with Rheumatoid Arthritis
  • Find a drug that reduces the pain (analgesic)
    but does not cause stomach (gastrointestinal)
    bleeding

Choose Domain
17
Concept Search
18
Concept Navigation
19
Retrieve Document
20
Navigate Document
21
Post-Genome Informatics I
  • Comparative Analysis within the
    Dry Lab of Biological Knowledge
  • Classical Organisms have Genetic Descriptions.
  • There will be NO more classical organisms beyond
    Mice and Men, Worms and Flies, Yeasts and Weeds.
  • Must use comparative genomics on classical
    organisms, via sequence homologies and
    literature analysis.

22
Post-Genome Informatics II
  • Functional Analysis within the
    Dry Lab of Biological Knowledge
  • Automatic annotation of genes to standard
    classifications, e.g. Gene Ontology via homology
    on computed protein sequences.
  • Automatic analysis of functions to scientific
    literature, e.g. concept spaces via text
    extractions. Thus must use functions in
    literature descriptions.

23
BeeSpace FIBR Project
  • BeeSpace project is NSF FIBR flagship
  • Frontiers in Integrative Biological Research,
    $5M for 5 years at University of Illinois
  • Analyzing Nature and Nurture in Societal Roles
    using honey bee as model
  • (Functional Analysis of Social Behavior)
  • Genomic technologies in wet lab and dry lab
  • Bee Biology: gene expressions
  • Space Informatics: concept navigations

24
(No Transcript)
25
(No Transcript)
26
System Architecture
  • [Architecture diagram; labels: BeeSpace, Concepts,
    SEQ, Expressions, Databases, Bees, Flies,
    Documents, Community]

27
Conceptual Navigation in BeeSpace
28
BeeSpace Analysis Environment
  • Build Concept Space of Biomedical Literature for
    Functional Analysis of Bee Genes
  • -Partition Literature into Community Collections
  • -Extract and Index Concepts within Collections
  • -Navigate Concepts within Documents
  • -Follow Links from Documents into Databases
  • Locate Candidate Genes in Related Literatures
    then follow links into Genome Databases
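
The partition and index steps listed above can be sketched on toy data. This is an illustration under stated assumptions, not the BeeSpace implementation: the abstracts, the core terms, and the helper names `partition` and `index_concepts` are all hypothetical, and plain substring matching stands in for real concept extraction.

```python
from collections import defaultdict

# Toy abstracts; IDs and text are invented for illustration.
abstracts = {
    "a1": "foraging gene encodes protein kinase in honey bee",
    "a2": "honey bee foraging behavior and age at onset",
    "a3": "yeast flora of cactus breeding sites in drosophila",
}

# Hypothetical community definitions: a collection is every abstract
# that mentions the community's core term.
core_terms = {"Bee": "honey bee", "Fly": "drosophila"}


def partition(docs, cores):
    """Assign each abstract to every community whose core term it contains."""
    collections = defaultdict(list)
    for doc_id, text in docs.items():
        for name, term in cores.items():
            if term in text:
                collections[name].append(doc_id)
    return dict(collections)


def index_concepts(docs, doc_ids, vocabulary):
    """Build an inverted index: concept phrase -> IDs of abstracts containing it."""
    index = defaultdict(list)
    for doc_id in sorted(doc_ids):
        for concept in vocabulary:
            if concept in docs[doc_id]:
                index[concept].append(doc_id)
    return dict(index)


collections = partition(abstracts, core_terms)
bee_index = index_concepts(abstracts, collections["Bee"], ["foraging", "protein kinase"])
print(collections)  # {'Bee': ['a1', 'a2'], 'Fly': ['a3']}
print(bee_index)    # {'foraging': ['a1', 'a2'], 'protein kinase': ['a1']}
```

Navigating from a concept to its documents is then a dictionary lookup; following links into databases would need identifiers that this toy data does not model.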

29
Concept Extraction
30
Functional Phrases
  • <gene> encodes <chemical>
  • Sokolowski and colleagues demonstrated in
    Drosophila melanogaster that the foraging gene
    (for) encodes a cGMP dependent protein kinase
    (PKG).
  • The dg2 gene encodes a cyclic guanosine
    monophosphate (cGMP)-dependent protein kinase
    (PKG).
  • <chemical> affects/causes <behavior>
  • Thus, PKG levels affected food-search behavior.
  • cGMP treatment elevated PKG activity and caused
    foraging behavior.
  • <gene> regulates <behavior>
  • Amfor, an ortholog of the Drosophila for gene, is
    involved in the regulation of age at onset of
    foraging in honey bees.
  • This idea is supported by results for malvolio
    (mvl), which encodes a manganese transporter and
    is involved in regulating Drosophila feeding and
    age at onset of foraging in honey bees.
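
A template such as "<gene> encodes <chemical>" can be approximated with a regular expression over small lexicons. This is a sketch under stated assumptions: the `GENES` and `CHEMICALS` lists and the function names are hypothetical, and a real system would rely on entity recognition rather than fixed word lists.

```python
import re

# Hypothetical mini-lexicons standing in for recognized entities.
GENES = ["foraging gene", "dg2 gene", "malvolio"]
CHEMICALS = ["cGMP dependent protein kinase", "manganese transporter", "PKG"]


def alternation(terms):
    """Regex alternation, longest term first so longer names are tried first."""
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


# Template: <gene> ... encodes ... <chemical>, without crossing a period.
ENCODES = re.compile(
    rf"(?P<gene>{alternation(GENES)})[^.]*?\bencodes\b[^.]*?"
    rf"(?P<chemical>{alternation(CHEMICALS)})"
)


def extract_encodes(text):
    """Return (gene, chemical) pairs matched by the 'encodes' template."""
    return [(m.group("gene"), m.group("chemical")) for m in ENCODES.finditer(text)]


sentences = (
    "Sokolowski and colleagues demonstrated in Drosophila melanogaster that "
    "the foraging gene (for) encodes a cGMP dependent protein kinase (PKG). "
    "The dg2 gene encodes a cyclic guanosine monophosphate (cGMP)-dependent "
    "protein kinase (PKG)."
)
pairs = extract_encodes(sentences)
print(pairs)
# [('foraging gene', 'cGMP dependent protein kinase'), ('dg2 gene', 'PKG')]
```

The second sentence matches only through the abbreviation PKG, which shows why fixed lexicons are brittle and why normalization of name variants matters.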

31
Gene Summarization
  • D. melanogaster gene foraging, abbreviated as
    for, is reported here. It has also been known
    in FlyBase as BcDNAGM08338, CG10033 and
    l(2)06860. It encodes a product with
    cGMP-dependent protein kinase activity
    (EC2.7.1.-) involved in protein amino acid
    phosphorylation which is a component of the
    cellular_component unknown. It has been
    sequenced and its amino acid sequence contains a
    eukaryotic protein kinase, a protein kinase
    C-terminal domain, a tyrosine kinase catalytic
    domain, a serine/threonine protein kinase family
    active site, a cAMP-dependent protein kinase and
    a cGMP-dependent protein kinase. It has been
    mapped by recombination to 2-10 and cytologically
    to 24A2--4. It interacts genetically with Csr.
    There are 27 recorded alleles: 1 in vitro
    construct (not available from the public stock
    centers), 25 classical mutants (3 available from
    the public stock centers) and 1 wild-type.
    Mutations have been isolated which affect the
    larval nerve terminal and are behavioral, pupal
    recessive lethal, hyperactive, larval
    neurophysiology defective and larval neuroanatomy
    defective. for is discussed in 80 references
    (excluding sequence accessions), dated between
    1988 and 2003. These include at least 6 studies
    of mutant phenotypes, 2 studies of wild-type
    function, 3 studies of natural polymorphisms and
    7 molecular studies. Among findings on for
    function, for activity levels influence adult
    olfactory trap response to a food medium
    attractant. Among findings on for polymorphisms,
    the frequency of for^R and for^s strains in three
    natural populations are studied to determine the
    contribution of the local parasitoid community to
    the differences in for^R and for^s frequencies.

32
Well Characterized Gene
  • Ling et al., PSB 2006

33
Poorly Characterized Gene
  • Ling et al., PSB 2006

34
BeeSpace Information Sources
  • Biomedical Literature
  • Medline (medicine)
  • Biosis (biology)
  • Agricola, CAB Abstracts, Agris (agriculture)
  • Model Organisms (heredity)
  • -Gene Descriptions (FlyBase, WormBase)
  • Natural Histories (environment)
  • -BeeKeeping Books (Cornell, Harvard)

35
Medical Concept Spaces (1998)
  • Medical Literature (Medline, 10M abstracts)
  • Partition with Medical Subject Headings (MeSH)
  • Community is all abstracts classified by core
    term
  • 40M abstracts containing 280M concepts
  • computation is 2 days on NCSA Origin 2000
  • Simulating World of Medical Communities
  • 10K repositories with > 1K abstracts
  • (1K with > 10K)
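
Counting communities by size, as in the repository figures above, can be sketched as follows. The records and headings are invented, and `community_sizes` and `communities_larger_than` are hypothetical helper names. The one idea taken from the slide is that each core term defines one community.

```python
from collections import Counter

# Toy records: (abstract_id, [assigned core headings]); values are invented.
records = [
    ("m1", ["Arthritis, Rheumatoid", "Analgesics"]),
    ("m2", ["Analgesics"]),
    ("m3", ["Analgesics", "Gastrointestinal Hemorrhage"]),
    ("m4", ["Arthritis, Rheumatoid"]),
]


def community_sizes(recs):
    """Count abstracts per heading; each heading defines one community repository."""
    sizes = Counter()
    for _, headings in recs:
        sizes.update(set(headings))  # set() so a repeated heading counts once
    return sizes


def communities_larger_than(sizes, threshold):
    """Return the sorted headings whose community has more than `threshold` abstracts."""
    return sorted(h for h, n in sizes.items() if n > threshold)


sizes = community_sizes(records)
print(communities_larger_than(sizes, 1))
# ['Analgesics', 'Arthritis, Rheumatoid']
```

An abstract with several headings belongs to several communities, so the community sizes can sum to more than the number of distinct abstracts.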

36
Biological Concept Spaces (2006)
  • Compute concept spaces for All of Biology
  • BioSpace across entire biomedical literature
  • 50M abstracts across 50K repositories
  • Use Gene Ontology to partition literature into
    biological communities for functional analysis
  • GO same scale as MeSH but adequate coverage?
  • GO light on social behavior (biological process)

37
BeeSpace Prototype Collections
  • Organism
  • Bee: Apis mellifera
  • FlyEEB: Fly Ecology, Evolution and Behavior
  • Bird: Bird Communication
  • Development
  • Behavioral: Maturation
  • Development: Development of insects
  • Communication: Communication by insects
  • Behavior
  • Agonistic: Agonistic and Territorial Behaviors
  • Forage: Behavior of Resource Acquisition
  • Nest: Home Maintenance and Defense
  • Social: Behavior of Social Integration in Insects

38
Semantic Concept Clustering
  • 558 clusters from 90K BeeSpace abstracts
  • Cluster 336
  • cactus opuntia sonoran_desert
  • drosophila_aldrichi lophocereu
  • senita pachea d_aldrichi
  • barker-j-s-f mettleri d_buzzatii
  • schottii cactaceae rot
  • cacti aldrichi nigrospiracula
  • yeast_species cladode

39
Community Collection Clustering
  • Ecology and population genetics of Sonoran Desert
    Drosophila
  • VECTORING OF CACTOPHILIC YEASTS BY DROSOPHILA
  • ATTRACTION OF LARVAE OF DROSOPHILA-BUZZATII AND
    DROSOPHILA-ALDRICHI TO YEAST SPECIES ISOLATED
    FROM THEIR NATURAL ENVIRONMENT
  • Coexistence of ecologically similar colonising
    species Drosophila aldrichi and Drosophila
    buzzatii Larval performance on, and adult
    preference for, three Opuntia cactus species
  • HOST-PLANT SPECIFICITY IN THE CACTOPHILIC
    DROSOPHILA-MULLERI SPECIES COMPLEX
  • YEAST COMMUNITIES FROM HOST PLANTS AND ASSOCIATED
    DROSOPHILA IN SOUTHERN ARIZONA USA ANALYSIS OF
    THE RELATIVE IMPORTANCE OF HOSTS AND VECTORS ON
    COMMUNITY COMPOSITION
  • HETEROGENEITY OF THE YEAST FLORA IN THE BREEDING
    SITES OF CACTOPHILIC DROSOPHILA
  • AN ANALYSIS OF THE YEAST FLORA ASSOCIATED WITH
    CACTIPHILIC DROSOPHILA AND THEIR HOST PLANTS IN
    THE SONORAN DESERT AND ITS RELATION TO TEMPERATE
    AND TROPICAL ASSOCIATIONS

40
Concept Switching
  • In the Interspace
  • each Community maintains its own repository
  • Switching is navigating Across repositories
  • use your specialty vocabulary to search another
    specialty

41
CONCEPT SWITCHING
  • Concept versus Term
  • set of semantically equivalent terms
  • Concept switching
  • region to region (set to set) match
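
A set-to-set match can be sketched by treating each concept as a set of terms and picking the target concept whose set overlaps most. This is a sketch under stated assumptions: Jaccard overlap is a stand-in measure chosen for illustration, the slide does not name the actual matching function, and the term sets below are made up.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two term sets: |intersection| / |union| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def switch_concept(region: set, target_space: dict) -> tuple:
    """Return (concept_name, score) for the target region that best overlaps `region`."""
    best = max(target_space, key=lambda name: jaccard(region, target_space[name]))
    return best, jaccard(region, target_space[best])


# Made-up regions: a concept from one community and two from another.
bee_region = {"foraging", "food search", "resource acquisition", "pkg"}
fly_space = {
    "feeding": {"food search", "feeding", "foraging", "rover", "sitter"},
    "courtship": {"mating", "song", "courtship"},
}

name, score = switch_concept(bee_region, fly_space)
print(name, round(score, 3))  # feeding 0.286
```

Matching whole regions rather than single terms lets a query phrased in one specialty's vocabulary land on a related concept even when no single term is an exact synonym.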

42
Biomedical Session
43
Categories and Concepts
44
Concept Switching
45
Document Retrieval
46
Prototype System
  • Overall Architecture and Interface -- Todd
    Littell
  • Language Parsing and Entity Recognition -- Jing
    Jiang
  • Normalization and Theme Clustering -- Qiaozhu Mei
  • Concept Navigation and Switching -- Azadeh Shakery
  • Document Clustering and Partitioning -- Brant Chee
  • Annotation Pipeline and Classification -- Xin He
  • Gene Summarization and Integration -- Xu Ling

47
(No Transcript)
48
(No Transcript)
49
(No Transcript)
50
(No Transcript)
51
(No Transcript)
52
Interactive Functional Analysis
  • BeeSpace will enable users to navigate a uniform
    space of diverse databases and literature sources
    for hypothesis development and testing, with a
    software system beyond a searchable database,
    using literature analyses to discover functional
    relationships between genes and behavior.
  • Genes to Behaviors
  • Behaviors to Genes
  • Concepts to Concepts
  • Clusters to Clusters
  • Navigation across Sources

53
BeeSpace Information Sources
  • General for All Spaces
  • Scientific Literature
  • -Medline, Biosis, Agricola
  • Genome Databases
  • -GenBank, ProteinDataBank, ArrayExpress
  • Special for BeeSpace
  • Model Organisms (heredity)
  • -Gene Descriptions (FlyBase, WormBase)
  • Natural Histories (environment)
  • -BeeKeeping Books (Cornell, Harvard)

54
XSpace Information Sources
  • Organize Genome Databases (XBase)
  • Compute Gene Descriptions from Model Organisms
  • Partition Scientific Literature for Organism X
  • Compute XSpace using Semantic Indexing
  • Boost the Functional Analysis from Special
    Sources
  • Collecting Useful Data about Natural Histories
  • e.g. CowSpace: Leverage in AIPL Databases

55
Towards SoySpace
  • Organize Genome Databases (SoyBase)
  • Partition Scientific Literature for SoyBean
  • Gene Descriptions from Models (TAIR)
  • Natural Histories from Population Databases
  • Key to Functional Analysis is Special Sources
  • Collecting Appropriate Text about Genes
  • Extracting Adequate Data about Histories
  • Leverage is National Archives of germplasm and
    Historical Records for soybean crops

56
Towards the Interspace
  • The Analysis Environment technology is GENERAL!
  • BirdSpace? BeeSpace?
  • PigSpace? CowSpace?
  • BehaviorSpace? BrainSpace?
  • SoySpace? CropSpace?
  • BioSpace
  • Interspace