Title: Building Analysis Environments Beyond the Genome and the Web
1Building Analysis Environments Beyond the Genome and the Web
Bruce R. Schatz
CANIS Laboratory
School of Library & Information Science
School of Biomedical & Health Information Sciences
University of Illinois at Urbana-Champaign
schatz_at_uiuc.edu, www.canis.uiuc.edu
Michigan Life Sciences Corridor Bioinformatics, University of Michigan
March 14, 2001
2Technological Progress
- In the past decade, technology has created the Genome and the Web
- In 1991, these ideas were only plans
- In 2001, they have already progressed from research systems to commercial products
- In the next decade, the revolution will actually begin and the world will be completely different!
3Paradigm Shift (Pre)
- Towards Dry-Lab Biology, Walter Gilbert (Jan 1991)
- The new paradigm, now emerging, is that all the 'genes' will be known (in the sense of being resident in databases available electronically), and that the starting point of a biological investigation will be theoretical. An individual scientist will begin with a theoretical conjecture, only then turning to experiment to follow or test that hypothesis. ...
- To use this flood of knowledge, the total sequence of the human and model organisms, which will pour across the computer networks of the world, biologists not only must become computer-literate, but also change their approach to the problem of understanding life. ...
- The Coming of Informational Science
- Correlation of Information across Sources
4Paradigm Shift (Post)
- Dissecting Human Disease, Victor McKusick (Feb 2001)
- Structural genomics -> Functional genomics
- Genomics -> Proteomics
- Map-based gene discovery -> Sequence-based gene discovery
- Monogenic disorders -> Multifactorial disorders
- Specific DNA diagnosis -> Monitoring susceptibility
- Analysis of one gene -> Analysis of multi-gene pathways
- Gene action -> Gene regulation
- Etiology (mutation) -> Pathogenesis (mechanism)
- One species -> Several species
5Analysis Environments I
- The Present -- Year 2001
- Search Central Archives
- Locating a Generic (average) solution
- mining sequences from the Genome
- diagnosing diseases from the Clinical Trial
- some Problems may have point Solutions
- find the cystic fibrosis gene
- find the diabetes treatment
6Analysis Environments II
- The Future -- Year 2011
- Navigate Distributed Repositories
- Locating a Specific (situational) solution
- correlating sequences, genes, expressions
- correlating diagnoses, treatments, lifestyles
- most Problems have cluster Solutions
- find genes for Heart Disease
- find treatments for Arthritis
7Testbeds of the Future
- WCS -- a testbed for the world of 2001
- community repositories before the Web
- in 1991, a distributed analysis environment
- MCS -- a testbed for the world of 2011
- concept navigation before the Interspace
- in 2001, a biomedical analysis environment
- to enable Michigan Corridor faculty and students to live in the world of the future (information space)
8Community Systems
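[Diagram: kinds of community knowledge, each with its supporting technology, arranged from Formal to Informal]
- data (database management)
- results (electronic mail)
- knowledge (hypertext annotations)
- literature (information retrieval)
- news (bulletin boards)
- browse and share all the knowledge of a community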
9Worm Community System
- WCS Information
- Literature: BIOSIS, MEDLINE, newsletters, meetings
- Data: Genes, Maps, Sequences, strains, cells
- WCS Functionality
- Browsing: search, navigation
- Filtering: selection, analysis
- Sharing: linking, publishing
- WCS: 250 users at 50 labs across Internet (1991)
10WCS Molecular
11WCS Cellular
12WCS Publishing
13WCS Linking
14WCS invokes gm
15WCS vis-à-vis acedb
16WCS PPCS demo
17A Model Community
- 1984-1988 Telesophy (Bellcore)
- prototype to federate objects
- 1989-1994 WCS (Arizona)
- testbed in molecular biology
- National Model for Biomedical Informatics
- NAS National Collaboratories report
- NIH Human Brain project
- Translational Results
- NCSA Mosaic into Web browsers
- acedb (worm) into Genome databases
- Biology Workbench, 10K users across Web
18THE THIRD WAVE OF NET EVOLUTION
CONCEPTS
OBJECTS
PACKETS
19Towards the Interspace
- from Objects to Concepts
- from Syntax to Semantics
- Infrastructure is Interaction with Abstraction
- Internet is packet transmission across computers
- Interspace is concept navigation across repositories
20COMPUTING CONCEPTS
- 1992: 4,000 (molecular biology)
- 1993: 40,000 (molecular biology)
- 1995: 400,000 (electrical engineering)
- 1996: 4,000,000 (engineering)
- 1998: 40,000,000 (medicine)
21Simulating a New World
- Obtain discipline-scale collection
- MEDLINE from NLM, 10M bibliographic abstracts
- human classification: Medical Subject Headings (MeSH)
- Partition discipline into Community Repositories
- 4 core terms per abstract for MeSH classification
- 32K nodes with core terms (classification tree)
- Community is all abstracts classified by core term
- 40M abstracts containing 280M concepts
- concept spaces took 2 days on NCSA Origin 2000
- Simulating World of Medical Communities
- 10K repositories with > 1K abstracts (1K with > 10K)
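The partitioning step on this slide can be sketched in a few lines. This is a minimal illustration, not the MEDSPACE implementation: the toy records, the function names `partition_by_core_term` and `large_repositories`, and the size threshold are all assumptions made for the example. It also suggests one reading of the numbers above: if each abstract is placed in one repository per core term, 10M abstracts with 4 core terms each would give about 40M placements.

```python
from collections import defaultdict

# Hypothetical toy records: (abstract id, core MeSH terms). Not real MEDLINE data.
abstracts = [
    ("a1", ["Arthritis, Rheumatoid", "Analgesics"]),
    ("a2", ["Analgesics", "Gastrointestinal Hemorrhage"]),
    ("a3", ["Arthritis, Rheumatoid"]),
]

def partition_by_core_term(records):
    """Place each abstract in one community repository per core term."""
    repositories = defaultdict(list)
    for abstract_id, core_terms in records:
        for term in core_terms:
            repositories[term].append(abstract_id)
    return dict(repositories)

def large_repositories(repositories, min_size):
    """Names of repositories holding more than min_size abstracts."""
    return sorted(t for t, ids in repositories.items() if len(ids) > min_size)

repos = partition_by_core_term(abstracts)
# An abstract with several core terms is counted once per repository it joins.
total_placements = sum(len(ids) for ids in repos.values())
```

Calling `large_repositories(repos, 1)` on the toy data keeps only the communities with more than one abstract, the same kind of filter as "repositories with > 1K abstracts" at full scale.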
22Concept Navigation
- Semantic Indexes for Community Repositories
- Navigating Abstractions within Repository
- concept space
- category map
- Interactive browsing by Community experts
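A concept space can be pictured as a table of which concepts tend to appear together in the same document. The sketch below is a simplified stand-in that assumes raw co-occurrence counts; the slides do not state the actual weighting used, and the function names and sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_concept_space(documents):
    """Count how often each pair of concepts appears in the same document."""
    cooccur = Counter()
    for concepts in documents:
        # Sort so each unordered pair is always stored under the same key.
        for a, b in combinations(sorted(set(concepts)), 2):
            cooccur[(a, b)] += 1
    return cooccur

def related_concepts(cooccur, concept, top_n=3):
    """Rank the concepts that most often co-occur with the given one."""
    scores = Counter()
    for (a, b), count in cooccur.items():
        if a == concept:
            scores[b] += count
        elif b == concept:
            scores[a] += count
    # Descending count, then alphabetical, so the order is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

# Hypothetical concept lists extracted from three abstracts.
docs = [
    ["arthritis", "analgesic", "pain"],
    ["analgesic", "gastrointestinal bleeding"],
    ["arthritis", "pain"],
]
space = build_concept_space(docs)
suggestions = related_concepts(space, "arthritis")
```

Navigating the space then means starting from one concept, reading off its suggestions, and moving to whichever one is closest to the need at hand.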
23Interspace Remote Access Client
24Navigation in MEDSPACE
- For a patient with Rheumatoid Arthritis
- Find a drug that reduces the pain (analgesic) but does not cause stomach (gastrointestinal) bleeding
Choose Domain
25Concept Search
26Concept Navigation
27Retrieve Document
28Navigate Document
29Retrieve Document
30Concept Switching
- In the Interspace
- each Community maintains its own repository
- Switching is navigating across repositories
- use your specialty vocabulary to search another specialty
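One way to picture concept switching is as a bridge through shared terms: look up what a term is related to in its home repository, then keep the related terms that the other repository also uses. The sketch below assumes exactly that simplification. The function `switch_concepts`, the weights, and both vocabularies are hypothetical, and the Interspace algorithm itself is not described on these slides.

```python
def switch_concepts(term, source_space, target_vocabulary, top_n=3):
    """Suggest target-repository terms for a source-repository term.

    source_space maps each term to {related term: co-occurrence weight}.
    """
    # A term both communities share needs no translation.
    if term in target_vocabulary:
        return [term]
    related = source_space.get(term, {})
    # Keep only related terms that the target community actually uses.
    bridges = [(t, w) for t, w in related.items() if t in target_vocabulary]
    bridges.sort(key=lambda tw: (-tw[1], tw[0]))
    return [t for t, _ in bridges[:top_n]]

# Hypothetical concept space for one specialty and vocabulary for another.
rheumatology_space = {
    "joint pain": {"analgesic": 5, "NSAID": 4, "inflammation": 3},
}
gastroenterology_vocabulary = {"NSAID", "gastrointestinal bleeding", "inflammation"}

suggestions = switch_concepts(
    "joint pain", rheumatology_space, gastroenterology_vocabulary
)
```

The suggested terms can then be used as the query in the second repository, which is how a specialist's own vocabulary reaches another specialty's literature.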
31Biomedical Session
32Categories and Concepts
33Concept Switching
34Document Retrieval
35Towards A Model Discipline
- 1995-1999 Interspace (Illinois, Urbana)
- prototype to federate concepts
- 2000-2004 MEDSPACE (Illinois, Chicago)
- testbed in clinical medicine (plan, demo)
- National Model for Biomedical Informatics
- lead news in Science on MEDLINE dry-run
- Best Paper at AMIA (Medical Informatics)
- 2001-2005 MCS (Michigan)
- testbed in biomedical research
36Michigan Interspace
- Gather the Information Sources
- Michigan Corridor System (MCS)
- each (department, institute, lab) has repository
- Generate the Community Repositories
- text documents with articles and annotations
- specialty datatypes databases and motifs
- Construct the Analysis Environment
- federated concept navigation across repositories
- type-dependent parsing for text/data interlinks
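Text/data interlinks can be approximated by scanning text for names that already have database records. The sketch below assumes simple exact-name matching with a regular expression; the record identifiers, the sample annotation, and the function `find_interlinks` are hypothetical, and real name recognition would also need to handle synonyms and ambiguous names.

```python
import re

def find_interlinks(text, name_to_record):
    """Return (name, record id, start offset) for each known name in text."""
    if not name_to_record:
        return []
    # Longest names first so a longer name wins over any shorter prefix.
    names = sorted(name_to_record, key=len, reverse=True)
    # The lookarounds stop a match inside a longer word or identifier.
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(map(re.escape, names)) + r")(?![\w-])"
    )
    return [
        (m.group(1), name_to_record[m.group(1)], m.start())
        for m in pattern.finditer(text)
    ]

# Hypothetical record ids for two C. elegans gene names.
gene_records = {"unc-22": "gene:0001", "lin-12": "gene:0002"}
note = "Mutations in unc-22 cause twitching; lin-12 affects cell fate."
links = find_interlinks(note, gene_records)
```

Each returned tuple gives enough information to turn the matched name in a lab annotation into a link to its database entry.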
37MCS Sources
- Literature
- Journals: MEDLINE, BIOSIS, full-text
- Specialty Conferences (e.g. Neuroscience)
- Community Newsletters, Lab Annotations
- Databases
- Sequences: GENBANK, Celera
- Genes and Maps from Model Organisms
- Microarray Expressions, Protein Structures
- Gene Pathways, Cellular Anatomy
38Ten Steps from Here to There
- Determine Users (range of needs)
- Develop Hardware (networks)
- Determine Collections (range of types)
- Develop Software (databases)
- Interlinks Automatic (name recognition)
- Interlinks Manual (distributed annotation)
- Community Literature (journals, conferences)
- Concept Navigation (indexing, switching)
- Custom Databases (community datasets)
- Custom Software (specialized analysis)
39Bioinformatics Center
- Institute for Biological Information Systems
- develop new information systems
- deploy to study biological systems
- integrated analysis for biological information
- analysis environment for community repositories
- Interspace technologies support Communities
- Basic Science: Individual Genomes
- Clinical Practice: Individual Patients
40IBIS New Glory
- Institute for Biological Information Systems
- unique facility for all Michigan laboratories
- interactive systems training for all levels
- IBIS reborn
- Thoth, sacred ibis who hatched the world
- inventor of writing, keeper of divine archives
- inventor of arts & sciences, medicine & surgery
- First of the magicians, he was called the Elder
- His disciples claimed access to the crypt where he kept his books of magic, so they undertook to decipher and learn these formulas which commanded all the forces of nature and subdued the very gods themselves.