Title: Automatically Generating Gene Pathways from Biomedical Literature
1 Automatically Generating Gene Pathways from Biomedical Literature
- Bruce Schatz, Principal Investigator
- Department of Medical Information Science
- University of Illinois at Urbana-Champaign
- schatz_at_uiuc.edu, www.canis.uiuc.edu
- Institute for Genomic Biology
- Theme for Genomics of Neural and Behavioral Plasticity
- www.beespace.uiuc.edu
- U. Michigan NCBC Collaboration Visit, February 10, 2006
2 Institute for Genomic Biology (IGB)
- Project Cost: $75 million
- Gross square feet (GSF): 174,485
- Net assignable square feet (NASF): 100,816
- Move-in Headcount: 392
- Completion Date: October 2006
3 IGB Research Schematic
Host-Microbe Systems
Microbial Genome Informatics
Genomics of Neural and Behavioral Plasticity
Mining Microbial Genomes for Novel Antibiotics
Animal Genome Informatics
Genomic Ecology of Global Change
Regenerative Biology and Tissue Engineering
Biocomplexity
Molecular Bioengineering of Biomass Conversion
Research Core Facilities
Vivarium
Bioinformatics
Precision Proteomics
Program Area 1 - Systems Biology
Program Area 2 - Cellular and Metabolic Engineering
Program Area 3 - Genome Technology
4 BeeSpace FIBR Project
- BeeSpace project is NSF FIBR flagship
- Frontiers in Integrative Biological Research
- $5M for 5 years at University of Illinois
- Nature-Nurture using honey bee as model
- Genome technologies in wet lab and dry lab biology
- Localized Gene Expression for Normal Social Behavior
- Gene Robinson, Entomology (behavioral expressions)
- Susan Fahrbach, Entomology (anatomical localization)
- Sandra Rodriguez-Zas, Animal Sciences (data analysis)
- Interactive Information System for Functional Analysis
- Bruce Schatz, Library and Information Science (info systems)
- ChengXiang Zhai, Computer Science (text analysis)
- Chip Bruce, Library and Information Science (user support)
5 Conceptual Navigation in BeeSpace
6 BeeSpace Prototype Collections
- Organism
- Bee: Apis mellifera
- Fly: Fly Ecology, Evolution and Behavior
- Bird: Bird Communication
- Development
- Behavioral Maturation
- Development: Development of insects
- Communication: Communication by insects
- Behavior
- Agonistic: Agonistic and Territorial Behaviors
- Forage: Behavior of Resource Acquisition
- Nest: Home Maintenance and Defense
- Social: Behavior of Social Integration in Insects
12 Pathway Extraction Prototype System
- Systems Integration and Data Collections -- Bruce Schatz
- Text Documents producing Relationship Graphs
- Language Based Entity Summarization: ChengXiang Zhai (CS)
- Frequent Subgraph Data Mining: Jiawei Han (CS)
- Ordering Graphs into Pathways using Biology Datasets
- Microarray Derived Signaling Pathway: Sheng Zhong (Bioeng)
- Sequence Derived Gene Regulation: Saurabh Sinha (CS)
- Microarray Derived Functional Clusters: Ping Ma (Statistics)
13 From Text to Entity-Relation Graph
(Figure: biomedical text turned into an entity-relation graph. Diagram labels: Biomedical Text, Gene, Enhance, Gene)
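As a rough illustration of the text-to-graph idea on this slide, the sketch below pulls (gene, relation, gene) triples out of sentences and collects them into a small graph. The gene names, the verb list, and the helper names `extract_relations` and `build_graph` are all hypothetical; the slides do not describe the actual extraction method, which is likely far more sophisticated than this dictionary lookup.

```python
import re
from collections import defaultdict

def extract_relations(sentence, genes, verbs):
    """Return (source gene, relation, target gene) triples found in one sentence.

    Assumption: a relation is a known verb with a known gene name somewhere
    before it and somewhere after it. This is a toy rule, not the BeeSpace method.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    triples = []
    for i, token in enumerate(tokens):
        if token in verbs:
            left = [t for t in tokens[:i] if t in genes]
            right = [t for t in tokens[i + 1:] if t in genes]
            if left and right:
                # Nearest gene on each side of the verb.
                triples.append((left[-1], verbs[token], right[0]))
    return triples

def build_graph(sentences, genes, verbs):
    """Collect triples into an adjacency map: gene -> {(relation, gene), ...}."""
    graph = defaultdict(set)
    for sentence in sentences:
        for source, relation, target in extract_relations(sentence, genes, verbs):
            graph[source].add((relation, target))
    return dict(graph)

# Hypothetical lexicons and sentences, for illustration only.
GENES = {"GeneA", "GeneB", "GeneC"}
VERBS = {"enhances": "enhance", "represses": "repress"}
SENTENCES = [
    "GeneA enhances GeneB in the brain.",
    "GeneB represses GeneC.",
]

graph = build_graph(SENTENCES, GENES, VERBS)
```

Keeping the graph as a plain adjacency map makes it easy to hand the edges to a later graph-mining step.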
14 Adapting Biological Named Entity Recognizer
(Figure labels: training data T1 ... Tm, test data, E)
15 Preliminary Evaluation Results
- Recognizing gene names
- Maximum entropy/Logistic regression recognizer
- Text data from BioCreAtIvE (Medline)
- 3 organisms (Fly, Mouse, Yeast), each contributing 5,000 sentences, 2,500 of which contain gene mentions
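To make the "maximum entropy / logistic regression recognizer" bullet concrete, here is a minimal sketch of a binary logistic regression that scores a single token as gene or non-gene. The three orthographic features, the toy tokens, the learning rate, and the function names are assumptions made for illustration; they are not the features or data used in the evaluation, and a real recognizer would tag tokens in context with a much richer feature set.

```python
import math

def features(token):
    # Hypothetical orthographic features; real recognizers use many more.
    return [
        1.0,                                               # bias term
        1.0 if any(c.isdigit() for c in token) else 0.0,   # contains a digit
        1.0 if any(c.isupper() for c in token) else 0.0,   # contains an uppercase letter
    ]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, learning_rate=0.125, epochs=300):
    """Fit weights by batch gradient ascent on the log-likelihood."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        gradient = [0.0, 0.0, 0.0]
        for token, label in examples:
            x = features(token)
            error = label - sigmoid(sum(w * v for w, v in zip(weights, x)))
            for i, v in enumerate(x):
                gradient[i] += error * v
        weights = [w + learning_rate * g for w, g in zip(weights, gradient)]
    return weights

def gene_probability(weights, token):
    """Estimated probability that the token is a gene mention."""
    return sigmoid(sum(w * v for w, v in zip(weights, features(token))))

# Toy labeled tokens (1 = gene mention, 0 = other); not BioCreAtIvE data.
TOY_DATA = [("dnc2", 1), ("Wnt3", 1), ("p53", 1), ("Sir2", 1),
            ("the", 0), ("protein", 0), ("binds", 0), ("cell", 0)]

weights = train(TOY_DATA)
```

With these toy features, a token such as "abc7" scores above 0.5 and a plain word such as "the" scores below it, which is the behavior the digit feature is meant to capture.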
16 Scalable Mining and Searching of Graph Data
- Efficient frequent and closed graph mining: gSpan (#1 in Google Scholar for "graph mining") and CloseGraph (#11)
- gIndex: Graph indexing by data mining approach (SIGMOD04, invited to TODS05)
- Search graphs in massive biological and chemical databases
- Grafil: Similarity search in massive graph data sets (SIGMOD05, invited to TODS06)
- Pattern compression and profile exploration (KDD05 award, invited to Machine Learning journal, VLDB05)
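The notion of a "frequent" pattern in a graph database can be sketched with the simplest possible case: counting how many graphs contain each labeled edge and keeping those that meet a minimum support. This is only the starting step that pattern-growth miners build on; it is not gSpan or CloseGraph, which grow larger subgraphs and handle isomorphism. The function name `frequent_edges` and the toy graphs are hypothetical.

```python
from collections import Counter

def frequent_edges(graphs, min_support):
    """Return {edge: support} for undirected labeled edges seen in >= min_support graphs.

    Each graph is a list of (label_u, label_v) pairs. An edge counts at most
    once per graph, so support is the number of graphs containing it.
    """
    support = Counter()
    for edges in graphs:
        # Normalize direction so ("B", "A") and ("A", "B") are the same edge.
        support.update({tuple(sorted(edge)) for edge in edges})
    return {edge: count for edge, count in support.items() if count >= min_support}

# Toy graph database with hypothetical node labels.
GRAPHS = [
    [("A", "B"), ("B", "C")],
    [("B", "A"), ("C", "D")],
    [("A", "B"), ("B", "C")],
]

result = frequent_edges(GRAPHS, min_support=2)
```

Edges that fail the support threshold cannot be part of any larger frequent subgraph, which is why miners of this kind prune them first.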
17 CODENSE: Mine Coherent Dense Subgraphs (ISMB05)
18 Our Recent Work on Graph/Network Mining
- D. Cai, Z. Shao, X. He, X. Yan, and J. Han, Community Mining from Multi-Relational Networks, PKDD'05.
- H. Hu, X. Yan, Yu, J. Han and X. J. Zhou, Mining Coherent Dense Subgraphs across Massive Biological Networks for Functional Discovery, ISMB'05.
- C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu, Mining Behavior Graphs for "Backtrace" of Noncrashing Bugs, SDM'05.
- C. Liu, X. Yan, and J. Han, Mining Control Flow Abnormality for Logic Error Isolation, SDM'06.
- X. Yan and J. Han, gSpan: Graph-Based Substructure Pattern Mining, ICDM'02.
- X. Yan and J. Han, CloseGraph: Mining Closed Frequent Graph Patterns, KDD'03.
- X. Yan, P. S. Yu, and J. Han, Graph Indexing: A Frequent Structure-based Approach, SIGMOD'04.
- X. Yan, X. J. Zhou, and J. Han, Mining Closed Relational Graphs with Connectivity Constraints, KDD'05.
- X. Yan, P. S. Yu, and J. Han, Substructure Similarity Search in Graph Databases, SIGMOD'05.
- X. Yan, F. Zhu, J. Han, and P. S. Yu, Searching Substructures with Superimposed Distance, ICDE'06.
- D. Xin, J. Han, X. Yan and H. Cheng, Mining Compressed Frequent-Pattern Sets, VLDB'05.
- X. Yan, H. Cheng, J. Han, and D. Xin, Summarizing Itemset Patterns: A Profile-Based Approach, KDD'05.
19 From microarray to transcriptional network
http://bioinfor.bioen.uiuc.edu
Dannenberg, Zhong et al., Genes Dev. 2005
20 Revised confidence in Pathway
(Figure: a pathway and its revised version. Diagram labels: Pathway, Tr. Factor, Gene, Promoters, Orthologs, Phylogenetic Motif finding, DNA motif(s), Scan Genome (Stubb software), Gene targets, Revised pathway)
Sinha (2006) unpublished
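The "DNA motif(s)" and "Scan Genome" steps named on this slide can be illustrated with a plain position weight matrix (PWM) scan: build log-odds scores from a few example binding sites, then slide the matrix along a promoter and report windows above a threshold. This is a generic textbook technique used here as a stand-in, not the algorithm in the Stubb software; the binding sites, promoter sequence, pseudocount, threshold, and function names are all made up for the example, and sequences are assumed to contain only uppercase A, C, G, T.

```python
import math

def pwm_from_sites(sites, pseudocount=1.0):
    """Build a log-odds PWM (vs. a uniform 0.25 background) from aligned sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in "ACGT"
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

# Hypothetical binding sites and promoter, for illustration only.
SITES = ["TGACG", "TGACG", "TGACA"]
PROMOTER = "AATGACGTT"

hits = scan(PROMOTER, pwm_from_sites(SITES), threshold=3.0)
```

Genes whose promoters contain a hit would be candidate targets of the transcription factor; the threshold trades sensitivity against false positives and would need tuning on real data.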
21 Epigenetic regulatory network elucidation through multiple information integration
Ma et al (2006) unpublished data
22 Cancer Drug Response Prediction through Gene Expression Profiling
Cancer drug GI50
Ma et al (2005) unpublished manuscript