Title: BeeSpace Informatics: Interactive System for Functional Analysis
1 BeeSpace Informatics: Interactive System for Functional Analysis
- Bruce Schatz
- Institute for Genomic Biology
- University of Illinois at Urbana-Champaign
- www.beespace.uiuc.edu
- Fifth Annual Project Workshop
- IGB, Urbana IL May 22, 2009
2 Concept Navigation in BeeSpace
3 Informatics: From Bases to Spaces
- Data Bases support genome data
- e.g., FlyBase has sequences and maps
- Genes are annotated by Gene Ontology and linked to biological literature
- Information Spaces support biological literature
- e.g., BeeSpace uses automatically generated conceptual relationships to navigate functions
4 System Architecture
5 System Versions
- V1 Filter: Concept Graph
- Search, Expand, Merge, Switch, Visualize
- V2 Cluster: Conceptual Groupings
- Small Worlds (Natural), Language Model (Steerable), Concepts/Documents
- V3 Summarize: Gene Descriptions
- Gene Extraction, Sentence Classification
- V4 Analyze: Functional Concepts
- Concept Identification, Category Grouping
- V5 Answer: Entity Relationships
- Entities, Relations, Templates
6 Informatics Researchers (Faculty)
- Investigators
- Bruce Schatz, systems (Medical Information Science)
- ChengXiang Zhai, algorithms (Computer Science)
- Collaborators (students)
- Saurabh Sinha, Computer Science
- Jiawei Han, Computer Science
- Sheng Zhong, Bioengineering
- Nathan Price, Chemical and Biomolecular Engineering
- Collaborators (advice)
- John MacMullen, Library and Information Science
- Dan Roth, Computer Science
- Roxana Girju, Linguistics
- Karrie Karahalios, Computer Science
7 Informatics Researchers (Staff)
- V1-V3
- Todd Littell, research programmer
- Jim Buell, research coordinator
- Nyla Ismail, biology postdoc
- Moushumi Sen Sarma, biology postdoc
- V4-V5
- David Arcoleo, research programmer
- Barry Sanders, research programmer
- Moushumi Sen Sarma, biology postdoc
- Radhika Khetani, biology postdoc
8 Informatics Researchers (Students)
- V1 Filter (parse)
- Jing Jiang, Azadeh Shakery, Yuanhua Lv
- V2 Cluster (group)
- Brant Chee, Qiaozhu Mei, Peixiang Zhao
- V3 Summarize (classify)
- Xu Ling, Jing Jiang, Qiaozhu Mei, Xin He
- V4 Analyze (annotate)
- Xin He, Brant Chee, Moushumi Sarma, Xu Ling
- V5 Answer (extract)
- Xu Ling, Xin He, Yanen Li, Yue Lu
9 Analysis Environment Features
- SPACE is a Paradigm, not a Metaphor!
- Point of View for YOUR Problem
- Externally:
- Dynamically describe a custom Region of Space
- Merge Regions to form a Hypothesis Space
- Differentially express genes against a Space
10 Analysis Environment System
- Concepts and Genes are Universal Entities!
- Uniformly Represented
- Uniformly Manipulated
- Internally:
- Extract and Index Concepts within Collections
- Navigate Concepts within Documents
- Follow Genes from Documents into Databases
11 Automatic Categorization v2
- Sorting of Spaces based on Metadata
- Sorting of Spaces based on Ontology
- MeSH for Medline Abstracts
- Gene Ontology computed for documents
- Sorting of Spaces based on Clustering
- Natural Maps from Small Worlds
- Steerable Maps from Language Models
- Semantic Indexing of Dynamic Spaces
- Fast System enables Interactive Sorting!
12 Small World Graph
13 Semantics: Deeper and Faster
- Semantic Indexing across all of Medline
- Previous Attempts used Word Co-Occurrence
- Now a general-purpose Phrase Parser
- Now full differential Mutual Information
- Parallel Optimization of the MI Graph
- Real-time Computation on a Shared Memory Cluster
- Interactive on workerbee, our 16PC, 256 GB RAM system
- Dynamic Spaces, then Dynamic Semantic Indexing
- Interactive Clustering: Natural Map
- Heuristic Approximation of Small World Graphs
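The slide does not spell out how the MI Graph is computed. The following is a minimal sketch that assumes pointwise mutual information (PMI) over document-level co-occurrence of already-extracted phrases. The helper `mi_graph` and its thresholds `min_count` and `min_pmi` are hypothetical. BeeSpace's "full differential" formulation and its parallel optimization are not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def mi_graph(docs, min_count=2, min_pmi=0.0):
    """Hypothetical helper: build a term graph weighted by pointwise mutual information.

    docs: list of documents, each a list of already-extracted terms or phrases.
    An edge (a, b) is kept when a and b co-occur in at least min_count documents
    and their PMI is greater than min_pmi.
    """
    n_docs = len(docs)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing each term pair
    for doc in docs:
        terms = sorted(set(doc))  # presence per document, pairs in a canonical order
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    graph = {}
    for (a, b), n_ab in pair_df.items():
        if n_ab < min_count:
            continue
        # PMI = log[P(a, b) / (P(a) P(b))], probabilities estimated over documents
        pmi = math.log(n_ab * n_docs / (term_df[a] * term_df[b]))
        if pmi > min_pmi:
            graph[(a, b)] = pmi
    return graph
```

Counting each term once per document keeps one long abstract from dominating the estimates. The pairwise counting loop is the natural candidate for parallelizing on a shared-memory machine.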
14 Dynamic Clustering
- Community Structure enables Dynamic Clustering with Large Vectors
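To illustrate community-based clustering on a term graph, here is a minimal label-propagation sketch. Label propagation is a standard community-detection method swapped in for this example, since the slide does not say which algorithm BeeSpace used. The name `label_propagation` is hypothetical, and edges are assumed to be `(u, v) -> weight` pairs such as a co-occurrence graph would supply.

```python
from collections import defaultdict


def label_propagation(edges, max_iter=100):
    """Hypothetical helper: detect communities in a weighted undirected graph.

    edges: dict mapping (u, v) -> weight.
    Returns dict node -> community label (each label is the name of some node).
    """
    nbrs = defaultdict(dict)
    for (u, v), w in edges.items():
        nbrs[u][v] = w
        nbrs[v][u] = w
    labels = {node: node for node in nbrs}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(nbrs):  # fixed update order keeps the sketch deterministic
            score = defaultdict(float)
            for nb, w in nbrs[node].items():
                score[labels[nb]] += w  # total edge weight toward each neighboring label
            best = max(score.values())
            # adopt the heaviest label; ties go to the smallest label name
            new = min(lab for lab, s in score.items() if s == best)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels
```

Each node only consults its neighbors on each pass, so the cost per pass grows with the number of edges rather than with the square of the number of terms.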
15 Automatic Curation v3
- Automatic Summarization of Genes
- Retrieve relevant sentences about gene
- Classify sentences into important aspects
- protein domain, homolog/ortholog
- expression pattern, phenotype function
- regulatory element, genetic interaction
- Generalizing to Biology Entities
- Genes, anatomical, behavior, chemical
- Question answering from biology factoids
- Computed Curation from Literature
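The two curation steps are retrieving sentences about a gene and classifying them into aspects. They can be sketched with simple rules as a hedged toy. The slide does not say how BeeSpace classifies sentences. The cue-word lists in `ASPECT_CUES` and the helper `summarize_gene` are hypothetical, and a practical summarizer would more likely use a trained classifier than keyword matching.

```python
import re

# Hypothetical cue words for each aspect listed on the slide (illustrative only).
ASPECT_CUES = {
    "protein domain": ["domain", "motif"],
    "homolog/ortholog": ["homolog", "ortholog"],
    "expression pattern": ["expressed", "expression"],
    "phenotype function": ["mutant", "phenotype", "required for"],
    "regulatory element": ["promoter", "enhancer", "binding site"],
    "genetic interaction": ["interacts", "suppresses", "enhances"],
}


def summarize_gene(gene, sentences):
    """Hypothetical helper: group sentences mentioning `gene` by aspect."""
    mention = re.compile(r"\b" + re.escape(gene) + r"\b")
    summary = {aspect: [] for aspect in ASPECT_CUES}
    for sentence in sentences:
        if not mention.search(sentence):
            continue  # retrieval: keep only sentences about this gene
        lowered = sentence.lower()
        # classification: the first aspect whose cue appears wins
        for aspect, cues in ASPECT_CUES.items():
            if any(cue in lowered for cue in cues):
                summary[aspect].append(sentence)
                break
    return summary
```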
16 Gene Summary (FlyBase)
17 Gene Summary (BeeSpace)
- Structured summary consists of relevant sentences covering 6 aspects of a gene
- Gene Products (GP)
- Expression Location (EL)
- Sequence Information (SI)
- Wild-type Function and Phenotypic Information (WFPI)
- Mutant Phenotype (MP)
- Genetic Interaction (GI)
18 Drosophila gene Abelson (Abl) tyrosine kinase
19 Tribolium gene Scr
20 Gene Summarizer: New Aspects
- New categories (proposed by FlyBase curators)
- GP, SI -> PS (protein domain or structure)
- SI -> HO (homologs or orthologs)
- EL -> EP (spatial/temporal expression patterns)
- SI -> RE (regulatory element information)
- WFPI, MP -> PF (wild-type or mutant phenotype and function)
- GI -> IT (genetic or physical interaction)
- New (beyond FlyBase) -> PG (population genetics)
- Use cross-domain information to improve the Gene Summarizer (GS) for other organisms
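Reading each line above as "old aspects -> new aspect", SI feeds three new categories (PS, HO, RE). Sentences labeled under the old scheme therefore cannot simply be relabeled; each needs reclassification among its candidates. Below is a small sketch of the mapping as transcribed from the slide. The names `NEW_FROM_OLD` and `candidates_for` are hypothetical.

```python
# New aspect -> old aspects it is derived from, transcribed from the slide.
NEW_FROM_OLD = {
    "PS": ["GP", "SI"],    # protein domain or structure
    "HO": ["SI"],          # homologs or orthologs
    "EP": ["EL"],          # spatial/temporal expression patterns
    "RE": ["SI"],          # regulatory element information
    "PF": ["WFPI", "MP"],  # wild-type or mutant phenotype and function
    "IT": ["GI"],          # genetic or physical interaction
    "PG": [],              # population genetics: new, beyond FlyBase
}


def candidates_for(old_aspect):
    """Hypothetical helper: new aspects an old-scheme label may map to, in slide order."""
    return [new for new, olds in NEW_FROM_OLD.items() if old_aspect in olds]
```

For example, `candidates_for("SI")` returns `["PS", "HO", "RE"]`, while `candidates_for("EL")` returns only `["EP"]`.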
22 BeeSpace System v3
- SPACES and REGIONS
- Dynamic and Relative
- A Space is a collection of documents
- A Region is a collection of terms
- Extract creates a new Region from an old Space
- Map creates a new Space from an old Region
- Merges create new Spaces and Regions from old ones
- Summarize classifies a Gene within a Space
- Annotate finds differential functional expression
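The Space and Region operations can be sketched as a small data model. All of the following are hypothetical assumptions:

- Documents are represented as lists of terms.
- Extract keeps the terms found in the most documents.
- Map retrieves every document mentioning a Region term from a larger collection (for example, all of Medline).
- Merge is a union or intersection of documents.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Space:
    """A Space is a collection of documents: doc id -> list of terms."""
    docs: dict


@dataclass(frozen=True)
class Region:
    """A Region is a collection of terms."""
    terms: frozenset


def extract(space, k=3):
    """Hypothetical Extract: the k terms found in the most documents of the Space."""
    df = Counter(t for terms in space.docs.values() for t in set(terms))
    top = sorted(df, key=lambda t: (-df[t], t))[:k]  # ties broken alphabetically
    return Region(frozenset(top))


def map_region(region, collection):
    """Hypothetical Map: documents of `collection` that mention any Region term."""
    return Space({d: terms for d, terms in collection.docs.items()
                  if region.terms & set(terms)})


def merge(s1, s2, mode="union"):
    """Hypothetical Merge (S1, S2) into S3: union or intersection of documents."""
    if mode == "union":
        return Space({**s1.docs, **s2.docs})
    if mode == "intersection":
        shared = s1.docs.keys() & s2.docs.keys()
        return Space({d: s1.docs[d] for d in shared})
    raise ValueError(f"unknown merge mode: {mode}")
```

Because every operation returns a new Space or Region rather than modifying its inputs, results can be chained freely, e.g. Extract a Region, Map it back to a Space, then Merge.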
23 BeeSpace Semantic Operations
- Merge (S1, S2) into S3
- Summarize (S) into Gene classification
40 New Interface v4
- Single Window, Multiple Panes
- Space Panel, Service Tabs
- SPACES: custom, system
- FILTER: searching, sorting
- CLUSTER: map natural and steerable
- SUMMARIZE: categorize using space
- ANALYZE: annotate using space
48 Functional Analysis v4
- The software system goes beyond a searchable database, using statistical literature analyses to discover functional relationships between genes and behavior.
- This research will enable all scientists who study bee genes to live on the frontier of integrative biology, where biotechnology enables routine expression analysis and bioinformatics enables functional analysis, unconstrained by pre-existing categories.
- Genelist Analyzer v4
- Differential Expression of Gene Names against a Space
- Background is a custom-made Literature Space
- Produces a Concept List from a Gene List
- Analyze using Concept Navigation and Gene Summarization
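One standard way to realize "differential expression of gene names against a Space" is an over-representation test. It compares how often each concept appears in documents mentioning the gene list with the concept's frequency in the background Space. The sketch below swaps in a one-sided hypergeometric test, since the statistic BeeSpace actually uses is not stated on the slide. The names `hypergeom_sf`, `concept_list`, and `alpha` are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def concept_list(gene_list, background, alpha=0.05):
    """Hypothetical helper: concepts over-represented where the genes are mentioned.

    background: dict doc id -> set of terms (genes and concepts) for a literature Space.
    Returns (concept, p-value) pairs with p <= alpha, sorted by p-value.
    """
    genes = set(gene_list)
    fg = [d for d, terms in background.items() if terms & genes]  # foreground documents
    N, n = len(background), len(fg)
    concepts = {t for terms in background.values() for t in terms} - genes
    results = []
    for c in concepts:
        K = sum(1 for terms in background.values() if c in terms)  # background hits
        k = sum(1 for d in fg if c in background[d])  # foreground hits
        if k == 0:
            continue
        p = hypergeom_sf(k, N, K, n)
        if p <= alpha:
            results.append((c, p))
    return sorted(results, key=lambda cp: (cp[1], cp[0]))
```

With many candidate concepts, the raw p-values should be adjusted for multiple testing (for example, with Benjamini-Hochberg) before the concept list is trusted.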
59 Question Answering v5
- Entities and Relations
- Question Answering templates
- Entity
- Gene, Anatomical
- Behavior, Chemical
- Relation
- Regulation (Gene-Gene)
- Expression (Gene-Anatomy)
- Function (Gene-Behavior): Biological Process
- Function (Gene-Chemical): Molecular Function
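Below is a toy sketch of template-based relation extraction and question answering over the entity and relation types listed above. The lexicon entries (`geneA`, `geneB`, and so on), trigger words, and helper names are hypothetical. A practical system would likely rely on trained entity recognizers and relation classifiers rather than string matching.

```python
import re
from itertools import combinations

# Hypothetical lexicon: surface string -> entity type from the slide.
ENTITY_TYPES = {
    "geneA": "Gene", "geneB": "Gene",
    "mushroom body": "Anatomical",
    "foraging": "Behavior",
    "octopamine": "Chemical",
}

# Relation templates keyed by (subject type, object type); trigger words are hypothetical.
TEMPLATES = {
    ("Gene", "Gene"): ("Regulation", ["regulates", "activates", "represses"]),
    ("Gene", "Anatomical"): ("Expression", ["expressed"]),
    ("Gene", "Behavior"): ("Function (Biological Process)", ["required for", "involved in"]),
    ("Gene", "Chemical"): ("Function (Molecular Function)", ["binds", "transports"]),
}


def extract_relations(sentence):
    """Hypothetical helper: (subject, relation, object) triples matching a template."""
    hits = []
    for entity in ENTITY_TYPES:
        m = re.search(r"\b" + re.escape(entity) + r"\b", sentence)
        if m:
            hits.append((m.start(), entity))
    hits.sort()  # reading order: the earlier mention is treated as the subject
    triples = []
    for (_, a), (_, b) in combinations(hits, 2):
        template = TEMPLATES.get((ENTITY_TYPES[a], ENTITY_TYPES[b]))
        if template and any(trigger in sentence for trigger in template[1]):
            triples.append((a, template[0], b))
    return triples


def answer(triples, subject, relation):
    """Hypothetical helper: answer 'subject --relation--> ?' from extracted triples."""
    return sorted(obj for subj, rel, obj in triples if subj == subject and rel == relation)
```

Keying templates by entity-type pairs means a new question type only needs a new entry in `TEMPLATES`, with no change to the extraction loop.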
66 Towards the Interspace
- The Analysis Environment technology is GENERAL!
- BirdSpace? BeeSpace?
- PigSpace? CowSpace?
- ArthropodSpace? AnimalSpace?
- BioSpace? MedSpace?