Title: Towards BioMedSpace: Infrastructure for Integrative Biomedical Informatics
1Towards BioMedSpace: Infrastructure for Integrative Biomedical Informatics
Bruce R. Schatz
Department of Medical Information Science
Institute for Genomic Biology
University of Illinois at Urbana-Champaign
schatz_at_uiuc.edu, www.canis.uiuc.edu
Center for Computational Medicine and Bioinformatics, University of Michigan, November 4, 2009
2The Grand Vision
3Linguistics Levels and Universal Units
- 1985: Syntax, Files (wholes)
- 1995: Structure, Objects (parts)
- 2005: Semantics, Concepts (meaning)
- 2015: Pragmatics, Features (reality)
4Building Analysis Environments
- Biomedical Informatics
- Bioinformatics: Concept Summarization
- Medinformatics: Cohort Identification
- Integrative Biology
- Federation: search, similarity
- Integration: navigation, links
5BeeSpace Goals
- Bioinformatics Flagship within NSF
- Frontiers Integrative Biological Research
- BEE BIOLOGY
- Experimentally measure gene expression in the brain for important societal roles during normal behavior, varying heredity (nature) and environment (nurture)
- SPACE INFORMATICS
- Interactively annotate functions for differential expression using concept-based navigation of biological literature and gene-based summarization analysis
6Conceptual Navigation in BeeSpace
7From Bases to Spaces
- Comparative Genomics using Classical Models
- data Bases support genome data
- Sequence-based gene analysis using FlyBase
- To standard classifications such as Gene Ontology
- Based on manual annotation by human curators
- information Spaces support biological information
- Literature-based gene analysis using BeeSpace
- To computed classifications via extracted concepts
- Based on automatic annotation by conceptual relationships
- Descriptions in Literature MUST be used in future interactive environments for functional analysis!
9System Versions
- V1 Filter Concept Graph
- Search, Expand, Merge, Switch, Visualize
- V2 Cluster Conceptual Groupings
- Small Worlds (Natural), Language Model (Steerable), Concepts/Documents
- V3 Summarize Gene Descriptions
- Gene Extraction, Sentence Classification
- V4 Analyze Functional Concepts
- Concept Identification, Category Grouping
- V5 Answer Entity Relationships
- Entities, Relations, Templates
10Automatic Categorization v2
- Sorting of Spaces based on Metadata
- Sorting of Spaces based on Ontology
- MeSH for Medline Abstracts
- Gene Ontology computed for documents
- Sorting of Spaces based on Clustering
- Natural Maps from Small Worlds
- Steerable Maps from Language Models
- Semantic Indexing of Dynamic Spaces
- Fast System enables Interactive Sorting!
11Small World Graph
12Semantics Deeper and Faster
- Semantic Indexing across all of Medline
- Previous Attempts used Word Co-Occurrence
- Now Phrase Parser works general-purpose
- Now Mutual Information full differential
- Parallel Optimization of MI Graph
- Real-time Computation Shared Memory Cluster
- Interactive on our 16PC 256GB RAM workerbee
- Dynamic Spaces then Dynamic Semantic Indexing
- Interactive Clustering Natural Map
- Heuristic Approximation Small Worlds Graphs
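A minimal sketch of the mutual-information indexing step named above. The slides give no formula, so this assumes pointwise mutual information (PMI) estimated from document-level phrase co-occurrence; the function name `pmi_graph`, the `min_count` parameter, and the toy phrase lists are hypothetical and do not come from BeeSpace.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_graph(docs, min_count=1):
    """Return {(phrase_a, phrase_b): pmi} for pairs co-occurring in >= min_count docs.

    Assumption: each document is already parsed into a list of phrases.
    """
    n_docs = len(docs)
    phrase_df = Counter()  # number of documents containing each phrase
    pair_df = Counter()    # number of documents containing both phrases
    for phrases in docs:
        unique = sorted(set(phrases))
        phrase_df.update(unique)
        pair_df.update(combinations(unique, 2))

    graph = {}
    for (a, b), n_ab in pair_df.items():
        if n_ab < min_count:
            continue
        # PMI = log2( P(a, b) / (P(a) * P(b)) ), probabilities estimated per document
        graph[(a, b)] = math.log2(n_ab * n_docs / (phrase_df[a] * phrase_df[b]))
    return graph


# Toy input: four "abstracts", each reduced to its extracted phrases.
docs = [
    ["foraging behavior", "gene expression", "honey bee"],
    ["foraging behavior", "gene expression"],
    ["honey bee", "brood care"],
    ["brood care", "juvenile hormone"],
]
graph = pmi_graph(docs)
for pair, score in sorted(graph.items()):
    print(pair, round(score, 3))
```

Each document contributes its pairs independently, so the counting loop could be split across workers and the counters summed, which is one plausible reading of the parallel optimization mentioned on the slide.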
13Dynamic Clustering
- Community Structure enables Dynamic Clustering
with Large Vectors
14Automatic Curation v3
- Automatic Summarization of Genes
- Retrieve relevant sentences about gene
- Classify sentences into important aspects
- protein domain, homolog/ortholog
- expression pattern, phenotype function
- regulatory element, genetic interaction
- Generalizing to Biology Entities
- Genes, anatomical, behavior, chemical
- Question answering from biology factoids
- Computed Curation from Literature
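The two summarization steps listed above (retrieve sentences about a gene, then classify them into aspects) can be illustrated with a toy keyword matcher. This is only a sketch under stated assumptions: the keyword sets, the gene name `geneX`, and the example sentences are invented for illustration, and the actual BeeSpace classifier is not described in the slides.

```python
import re

# Hypothetical keyword sets, one per aspect named on the slide.
ASPECT_KEYWORDS = {
    "protein domain": {"domain", "motif", "kinase"},
    "homolog/ortholog": {"homolog", "ortholog", "homologous", "orthologous"},
    "expression pattern": {"expressed", "expression", "embryo"},
    "phenotype/function": {"mutant", "phenotype", "required", "lethal"},
    "regulatory element": {"enhancer", "promoter"},
    "genetic interaction": {"interacts", "suppressor", "suppresses", "enhances"},
}


def summarize_gene(gene, sentences):
    """Group sentences that mention `gene` under the best-matching aspect."""
    summary = {aspect: [] for aspect in ASPECT_KEYWORDS}
    gene_pattern = re.compile(r"\b" + re.escape(gene) + r"\b", re.IGNORECASE)
    for sentence in sentences:
        if not gene_pattern.search(sentence):
            continue  # retrieval step: keep only sentences that mention the gene
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        # classification step: choose the aspect with the most keyword hits
        scores = {a: len(tokens & kws) for a, kws in ASPECT_KEYWORDS.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:
            summary[best].append(sentence)
    return summary


# Made-up sentences about a made-up gene.
sentences = [
    "geneX encodes a protein with a kinase domain.",
    "geneX is expressed in the nervous system.",
    "The promoter of another gene was cloned.",
    "geneX mutant animals show a lethal phenotype.",
]
summary = summarize_gene("geneX", sentences)
for aspect, hits in summary.items():
    if hits:
        print(aspect, "->", hits)
```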
15 Gene Summary (FlyBase)
16Gene Summary (BeeSpace)
- Structured summary consists of relevant sentences covering 6 aspects of a gene
- Gene Products (GP)
- Expression Location (EL)
- Sequence Information (SI)
- Wild-type Function and Phenotypic Information (WFPI)
- Mutant Phenotype (MP)
- Genetical Interaction (GI)
17Drosophila gene Abelson (Abl) tyrosine kinase
18Tribolium gene Scr
19Gene Summarizer New Aspects
- New categories (proposed by FlyBase curators)
- GP, SI -> PS (protein domain or structure)
- SI -> HO (homologs or orthologs)
- EL -> EP (spatial/temporal expression patterns)
- SI -> RE (regulatory element information)
- WFPI, MP -> PF (wild-type or mutant phenotype and function)
- GI -> IT (genetic or physical interaction)
- New (beyond FlyBase) -> PG (population genetics)
- Utilize cross-domain information for improving the GS on other organisms.
21Semantics Deeper and Faster
- Semantic Indexing across all of Medline
- Previous Attempts used Word Co-Occurrence
- Entity Recognition works general-purpose
- Function Categorization works general-purpose
- Parallel Optimization of Entity Summarization
- Batch Computation on national Cloud Cluster
- Yahoo/HP/Intel 1000 processor cloud computer
- Largest job thus far (10hrs, 512cores)
- Interactive Clustering underway Steerable Map
- Hybrid of Language Model and Small Worlds
22BeeSpace System v3
- SPACES and REGIONS
- Dynamic and Relative
- Space is collection of documents
- Region is collection of terms
- Extract creates new Region from old Space
- Map creates new Space from old Region
- New from Old Spaces and Regions via merges
- Summarize classifies Gene within Space
- Annotate finds differential functional expression
23BeeSpace Semantic Operations
- Merge (S1,S2) into S3
- Summarize (S) into Gene classify
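The Space and Region operations can be sketched with plain Python collections. This is an illustrative reading of the slide definitions, not the BeeSpace implementation: a Space is modeled as a dict from document id to term list, a Region as a set of terms, and the names `extract`, `map_region`, `merge`, and `top_k` are assumptions.

```python
from collections import Counter


def extract(space, top_k=3):
    """Extract: create a Region (set of terms) from a Space (documents).

    Assumption: the region is the top_k terms by document frequency.
    """
    counts = Counter(term for terms in space.values() for term in set(terms))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:top_k]}


def map_region(region, corpus):
    """Map: create a Space from a Region by keeping documents that mention any region term."""
    return {doc_id: terms for doc_id, terms in corpus.items() if region & set(terms)}


def merge(space_a, space_b):
    """Merge (S1, S2) into S3: union of two document collections."""
    return {**space_a, **space_b}


# Toy corpus: document id -> terms (all values are made up).
corpus = {
    "d1": ["foraging", "brain", "gene"],
    "d2": ["foraging", "gene"],
    "d3": ["nursing", "brain"],
    "d4": ["pollen", "flower"],
}
s1 = {doc_id: corpus[doc_id] for doc_id in ("d1", "d2")}
region = extract(s1, top_k=2)          # terms that characterize s1
s2 = map_region({"brain"}, corpus)     # documents reachable from a region
s3 = merge(s1, s2)
print(sorted(region), sorted(s2), sorted(s3))
```

Because every operation returns a new Space or Region, the results can be chained, which matches the "New from Old" wording on the previous slide.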
40New Interface v4
- Single Window, Multiple Panes
- Space Panel, Service Tabs
- SPACES custom, system
- FILTER searching, sorting
- CLUSTER map natural and steerable
- SUMMARIZE categorize using space
- ANALYZE annotate using space
41Functional Analysis v4
- The software system goes beyond a searchable database, using statistical literature analyses to discover functional relationships between genes and behavior.
- This research will enable all scientists who study bee genes to live on the frontier of integrative biology, where biotechnology enables routine expression analysis and bioinformatics enables functional analysis, unconstrained by pre-existing categories.
- Genelist Analyzer v4
- Differential Expression of Gene Names against Space
- Background is custom made Literature Space
- Produces Concept List from Gene List
- Analyze using Concept Navigation and Gene Summarization
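One way to picture the Genelist Analyzer step (gene list in, concept list out, scored against a background literature space) is a smoothed log-ratio of concept frequencies. The slides do not state which statistic BeeSpace uses, so the scoring below is an assumption, as are the function name `concept_list`, the `smoothing` parameter, and the toy gene and concept labels.

```python
import math
from collections import Counter


def concept_list(gene_list, space, top_k=3, smoothing=1.0):
    """Rank concepts over-represented in documents mentioning the genes.

    `space` is the background literature space: a list of documents, each a
    dict with "genes" and "concepts" lists (a hypothetical representation).
    """
    genes = set(gene_list)
    foreground = [doc for doc in space if genes & set(doc["genes"])]
    fg_counts = Counter(c for doc in foreground for c in set(doc["concepts"]))
    bg_counts = Counter(c for doc in space for c in set(doc["concepts"]))

    scores = {}
    for concept, fg in fg_counts.items():
        # Smoothed document rates in the gene-list documents vs. the whole space
        fg_rate = (fg + smoothing) / (len(foreground) + 2 * smoothing)
        bg_rate = (bg_counts[concept] + smoothing) / (len(space) + 2 * smoothing)
        scores[concept] = math.log2(fg_rate / bg_rate)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy background space with made-up gene and concept labels.
space = [
    {"genes": ["g1"], "concepts": ["foraging", "brain"]},
    {"genes": ["g2"], "concepts": ["foraging"]},
    {"genes": ["g3"], "concepts": ["brain", "immunity"]},
    {"genes": ["g4"], "concepts": ["immunity"]},
    {"genes": [], "concepts": ["brain"]},
    {"genes": [], "concepts": ["immunity"]},
]
ranked = concept_list(["g1", "g2"], space)
print(ranked)
```

A positive score means the concept appears more often alongside the listed genes than in the background space as a whole; a real analysis would likely replace the log-ratio with a proper significance test.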
62Towards the Interspace
- The Analysis Environment technology is GENERAL!
- BirdSpace? BeeSpace?
- PigSpace? CowSpace?
- ArthropodSpace? AnimalSpace?
- BioSpace? MedSpace?
63Question Answering v5
- Entities and Relations
- Question Answering templates
- Entity
- Gene, Anatomical
- Behavior, Chemical
- Relation
- Regulation (Gene-Gene)
- Expression (Gene-Anatomy)
- Function (Gene-Behavior) Biological Process
- Function (Gene-Chemical) Molecular Function
72Towards Question Answering
- Merging Filter and Summarize
- Extract Entities from Literature
- Generate Relations from Entities
- Generate Answers from Relations
- Question Answering is
- Multiple Steps of
- Entity Relation Semantics
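The three steps above (entities from literature, relations from entities, answers from relations) can be sketched as a small pipeline. Everything concrete here is assumed for illustration: the lexicon entries, the co-occurrence rule for proposing relations, and the names `extract_entities`, `generate_relations`, and `answer` are hypothetical; only the entity types and relation types are taken from the slides.

```python
from dataclasses import dataclass

# Hypothetical entity lexicon: surface form -> entity type.
ENTITY_LEXICON = {
    "geneA": "Gene",
    "geneB": "Gene",
    "mushroom body": "Anatomical",
    "foraging": "Behavior",
}

# Relation type implied by the types of the two entities.
RELATION_TYPES = {
    ("Gene", "Gene"): "Regulation",
    ("Gene", "Anatomical"): "Expression",
    ("Gene", "Behavior"): "Function (Biological Process)",
    ("Gene", "Chemical"): "Function (Molecular Function)",
}


@dataclass(frozen=True)
class Relation:
    kind: str
    subject: str
    target: str


def extract_entities(sentence):
    """Return (name, type) pairs in order of first appearance (lexicon lookup)."""
    lowered = sentence.lower()
    found = [
        (lowered.index(name.lower()), name, etype)
        for name, etype in ENTITY_LEXICON.items()
        if name.lower() in lowered
    ]
    return [(name, etype) for _, name, etype in sorted(found)]


def generate_relations(entities):
    """Propose a typed relation for each ordered entity pair in one sentence."""
    relations = []
    for i, (name_a, type_a) in enumerate(entities):
        for name_b, type_b in entities[i + 1:]:
            kind = RELATION_TYPES.get((type_a, type_b))
            if kind:
                relations.append(Relation(kind, name_a, name_b))
    return relations


def answer(relations, kind, subject):
    """Fill a question template such as 'Where is <gene> expressed?'."""
    return [r.target for r in relations if r.kind == kind and r.subject == subject]


# Made-up sentences standing in for literature.
literature = [
    "geneA is expressed in the mushroom body and affects foraging.",
    "geneA regulates geneB.",
]
relations = [r for s in literature for r in generate_relations(extract_entities(s))]
print(answer(relations, "Expression", "geneA"))
print(answer(relations, "Regulation", "geneA"))
```

Sentence-level co-occurrence is a deliberately crude relation rule; it stands in for whatever extraction method the system actually uses.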
75Healthcare Infrastructure
- Provider Pyramids
- Scale to Volumes for Chronic Illness
- Population Health with Patient Tracking
- Risk Assessment
- Cohort Cluster by Treatment Effects
- Simulate with Yahoo Health Messages
76Symptom Clustering
77Condition Clustering
78Informatics Researchers (Faculty)
- Investigators
- Bruce Schatz, systems (Medical Information Science)
- ChengXiang Zhai, algorithms (Computer Science)
- Collaborators (students)
- Saurabh Sinha, Computer Science
- Jiawei Han, Computer Science
- Sheng Zhong, Bioengineering
- Nathan Price, Chemical and Biomolecular Engineering
- Collaborators (advice)
- John MacMullen, Library and Information Science
- Dan Roth, Computer Science
- Roxana Girju, Linguistics
- Karrie Karahalios, Computer Science
79Informatics Researchers (Staff)
- V1-V3
- Todd Littell, research programmer
- Jim Buell, research coordinator
- Nyla Ismail, biology postdoc
- Moushumi Sen Sarma, biology postdoc
- V4-V5
- David Arcoleo, research programmer
- Barry Sanders, research programmer
- Moushumi Sen Sarma, biology postdoc
- Radhika Khetani, biology postdoc
80Informatics Researchers (Students)
- V1 Filter (parse)
- Jing Jiang, Azadeh Shakery, Yuanhua Lv
- V2 Cluster (group)
- Brant Chee, Qiaozhu Mei, Peixiang Zhao
- V3 Summarize (classify)
- Xu Ling, Jing Jiang, Qiaozhu Mei, Xin He
- V4 Analyze (annotate)
- Xin He, Brant Chee, Moushumi Sarma, Xu Ling
- V5 Answer (extract)
- Xu Ling, Xin He, Yanen Li, Yue Lu