Title: Towards Telesophy: Federating All the World's Knowledge
1 Towards Telesophy: Federating All the World's Knowledge
Bruce R. Schatz
CANIS Laboratory, Department of Medical Information Science, Department of Computer Science
University of Illinois at Urbana-Champaign
schatz_at_uiuc.edu, www.canis.uiuc.edu
Google Tech Talk, Mountain View, CA, July 11, 2007
2 Telesophy Session 1985
3 Half-way to Hive-mind
- Access: Fetching
- Organization: Searching
- Analysis: Comparing
- Synthesis: Combining
4 OUTLINE
- Point of View
- Scalable Semantics
- Concept Navigation
- Hero experiment
5 Cyberspace Visions
6 THE THIRD WAVE OF NET EVOLUTION
- CONCEPTS
- OBJECTS
- PACKETS
7 Linguistics Levels and Universal Units
- 1985: Syntax, Files (wholes)
- 1995: Structure, Records (parts)
- 2005: Semantics, Concepts (meaning)
- 2015: Pragmatics, Features (reality)
8 Federation Levels and Functions
- 1985: Syntax Federation (e.g. Telesophy)
  - uniform formats, duplicate elimination
- 1995: Structure Federation (e.g. DeLIver)
  - uniform markups, tag-value equivalence
- 2005: Semantics Federation (e.g. BeeSpace)
  - phrase typing, concept switching
9 SCALABLE SEMANTICS
- Surface versus Deep Structure
- Broad Context beats Deep Meaning
- Parsing from Phrases to Entities
- Co-occurring from Pairs to Graphs
- Think Globally, Act Locally
10 LEVELS OF INDEXES
11 Towards Typed Entities
- Hand Tagged XML (Semantic Web)
- Domain Dependent DTDs (Entity Types)
- Machine Tagged with Training Sets
- Using Phrases and Parts of Speech
- Names: Persons, Places, Things
12 Functional Phrases
- <gene> encodes <chemical>
  - Sokolowski and colleagues demonstrated in Drosophila melanogaster that the foraging gene (for) encodes a cGMP dependent protein kinase (PKG).
  - The dg2 gene encodes a cyclic guanosine monophosphate (cGMP)-dependent protein kinase (PKG).
- <chemical> affects/causes <behavior>
  - Thus, PKG levels affected food-search behavior.
  - cGMP treatment elevated PKG activity and caused foraging behavior.
- <gene> regulates <behavior>
  - Amfor, an ortholog of the Drosophila for gene, is involved in the regulation of age at onset of foraging in honey bees.
  - This idea is supported by results for malvolio (mvl), which encodes a manganese transporter and is involved in regulating Drosophila feeding and age at onset of foraging in honey bees.
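The functional-phrase templates on this slide can be illustrated with a minimal sketch. It assumes a tiny hand-made lexicon of typed entities and a nearest-entity heuristic around the verb; the names `LEXICON`, `find_entities`, and `match_template` are hypothetical and are not taken from the talk or from BeeSpace. A real system would tag entities with trained models rather than a fixed word list.

```python
import re

# Hypothetical toy lexicon of typed entities (an assumption for illustration).
LEXICON = {
    "gene": ["foraging gene", "dg2 gene"],
    "chemical": ["protein kinase", "PKG", "manganese transporter"],
    "behavior": ["food-search behavior", "foraging behavior"],
}


def find_entities(sentence, entity_type):
    """Return sorted (start, end, text) hits for one entity type."""
    hits = []
    for term in LEXICON[entity_type]:
        for m in re.finditer(re.escape(term), sentence):
            hits.append((m.start(), m.end(), term))
    return sorted(hits)


def match_template(sentence, subject_type, verb_pattern, object_type):
    """Match '<subject_type> VERB <object_type>' around each verb occurrence.

    Heuristic: take the closest typed entity before the verb as the subject
    and the first typed entity after the verb as the object.
    """
    triples = []
    for verb in re.finditer(r"\b(?:%s)\b" % verb_pattern, sentence):
        subjects = [h for h in find_entities(sentence, subject_type)
                    if h[1] <= verb.start()]
        objects = [h for h in find_entities(sentence, object_type)
                   if h[0] >= verb.end()]
        if subjects and objects:
            triples.append((subjects[-1][2], verb.group(0), objects[0][2]))
    return triples


if __name__ == "__main__":
    s1 = ("the foraging gene (for) encodes a cGMP dependent "
          "protein kinase (PKG).")
    s2 = "Thus, PKG levels affected food-search behavior."
    print(match_template(s1, "gene", "encodes", "chemical"))
    print(match_template(s2, "chemical", "affected|caused", "behavior"))
```

The nearest-entity heuristic is deliberately shallow: it relies on phrase typing and word order rather than a full parse, in the spirit of "Broad Context beats Deep Meaning".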
13 Biology Entities
- Easy (Systematic Names)
  - Organism / Chemical
- Medium (Some Variations)
  - Gene / Bodypart
- Hard (Always Idiosyncratic)
  - Behavior / Phenotype
14 Towards Concept Spaces
- Extract Units automatically from text
- Compute Context Graphs from units
- Co-occurrence Frequency pairwise
- Mutual Information all pairwise links
- Bandpass Filters and Domain Weights
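A minimal sketch of the pipeline listed above, under stated assumptions: units have already been extracted per document, co-occurrence is counted at the document level, and pointwise mutual information (PMI) stands in for the "mutual information" weighting, since the slide does not give the exact formula. The "bandpass filter" is approximated by document-frequency cutoffs. The function name `concept_graph` and the parameters `min_df` and `max_df_ratio` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def concept_graph(docs, min_df=1, max_df_ratio=1.0):
    """Build a weighted co-occurrence graph from per-document unit lists.

    docs: list of lists of extracted units (e.g. noun phrases).
    min_df / max_df_ratio: a crude bandpass filter that drops units that
    are too rare or too common (assumed form of the filter).
    Returns {(unit_a, unit_b): (cooccurrence_count, pmi)}.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for units in docs:
        doc_freq.update(set(units))

    keep = {u for u, df in doc_freq.items()
            if df >= min_df and df / n_docs <= max_df_ratio}

    pair_freq = Counter()
    for units in docs:
        present = sorted(set(units) & keep)
        pair_freq.update(combinations(present, 2))

    graph = {}
    for (a, b), co in pair_freq.items():
        # PMI = log( P(a, b) / (P(a) * P(b)) ), with document-level estimates.
        pmi = math.log((co * n_docs) / (doc_freq[a] * doc_freq[b]))
        graph[(a, b)] = (co, pmi)
    return graph


if __name__ == "__main__":
    toy_docs = [
        ["foraging gene", "protein kinase", "honey bee"],
        ["foraging gene", "protein kinase"],
        ["honey bee", "behavioral maturation"],
        ["honey bee", "foraging gene"],
    ]
    for pair, (co, pmi) in sorted(concept_graph(toy_docs).items()):
        print(pair, co, round(pmi, 3))
```

Because every step is counting over local contexts, the computation parallelizes over collections, which is what makes this kind of shallow semantics scalable.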
15 COMPUTING CONCEPTS
- 1992: 4,000 (molecular biology)
- 1993: 40,000 (molecular biology)
- 1995: 400,000 (electrical engineering)
- 1996: 4,000,000 (engineering)
- 1998: 40,000,000 (medicine)
16 Medical Concept Spaces (1998)
- Medical Literature (Medline, 10M abstracts)
- Partition with Medical Subject Headings (MeSH)
- Community is all abstracts classified by core term
- 40M abstracts containing 280M concepts
- computation is 2 days on NCSA Origin 2000
- Simulating World of Medical Communities
- 10K repositories with > 1K abstracts
- (1K with > 10K)
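The partitioning step can be sketched as follows, assuming each abstract carries a list of subject headings and that a community is simply every abstract filed under one heading. An abstract with several headings then lands in several communities, which is one plausible reading of how 10M source abstracts yield 40M community members. The identifiers and headings in the example are made up, and `partition_by_heading` and `repositories` are hypothetical names.

```python
from collections import defaultdict


def partition_by_heading(abstracts):
    """Group abstract ids into communities, one per subject heading.

    abstracts: iterable of (abstract_id, [heading, ...]) pairs.
    An abstract with k headings appears in k communities.
    """
    communities = defaultdict(list)
    for abstract_id, headings in abstracts:
        for heading in headings:
            communities[heading].append(abstract_id)
    return dict(communities)


def repositories(communities, min_size):
    """Keep only communities with more than min_size abstracts."""
    return {h: ids for h, ids in communities.items() if len(ids) > min_size}


if __name__ == "__main__":
    # Made-up toy data, not real Medline records.
    toy = [
        ("a1", ["Neoplasms", "Genetics"]),
        ("a2", ["Neoplasms"]),
        ("a3", ["Genetics", "Behavior"]),
    ]
    comms = partition_by_heading(toy)
    print(sum(len(ids) for ids in comms.values()), "memberships from", len(toy), "abstracts")
    print(sorted(repositories(comms, min_size=1)))
```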
17 Small World Graph
- Community Structure enables Dynamic Clustering with High Coherence
18 CONCEPT NAVIGATION
- Manual by Humans
  - Interaction: user navigating
  - Classification: collection tagging
- Automatic by Computers
  - Federation: search bridges
  - Integration: results links
19 Towards the Interspace
- from Objects to Concepts
- from Syntax to Semantics
- Infrastructure is Interaction with Abstraction
- Internet is packet transmission across computers
- Interspace is concept navigation across repositories
20 Concept Navigation
23 Concept Navigation in BeeSpace
24 BeeSpace General Bioinformatics
- Bioinformatics of Genes and Behavior
- Using scalable semantics technology
- Using General Expressions and Literatures
- Annotation Pipelines from Sequence and Text
- Creating and Merging multiple SPACES
- Where REGIONS are semantically created
- And useful regions become shared spaces
25 Analysis Environment Functions
- SPACE is a Paradigm not a Metaphor!
- Point of View for YOUR Problem
- Externally
  - Dynamically describe custom Region of Space
  - Merge Regions to form Hypothesis Space
  - Differentially express genes against Space
26 Analysis Environment Structures
- Concepts and Genes are Universal Entities
- Uniformly Represented
- Uniformly Manipulated
- Internally
  - Extract and Index Concepts within Collections
  - Navigate Concepts within Documents
  - Follow Genes from Documents into Databases
27 BeeSpace Semantic Operations
- Merge (S1,S2) into S3
- Summarize (S) into Gene classify
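As a rough illustration of the Merge operation only, the sketch below assumes a space is a named set of document identifiers and that merging two spaces is a set union that yields a new space. The slide does not say how BeeSpace represents spaces or implements Merge, so the `Space` class and `merge` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Space:
    """A named collection of document ids (assumed representation)."""
    name: str
    doc_ids: frozenset


def merge(s1, s2, name):
    """Merge (S1, S2) into S3: here, the union of the two document sets."""
    return Space(name=name, doc_ids=s1.doc_ids | s2.doc_ids)


if __name__ == "__main__":
    # Made-up document ids for illustration.
    foraging = Space("foraging", frozenset({"d1", "d2"}))
    maturation = Space("behavioral maturation", frozenset({"d2", "d3"}))
    s3 = merge(foraging, maturation, "hypothesis space")
    print(s3.name, sorted(s3.doc_ids))
```

Keeping spaces immutable means a merged space never alters its inputs, so the same region can be reused in several hypothesis spaces.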
28 BeeSpace v3 Example
- Refining and Merging Space Regions
- Cross bee species differential gene expression for behavioral maturation into adult forager
- Comparative Analysis for Similar Situation
- Behavioral Maturation merge into
- Cross-Species Comparisons
44 Towards the Interspace
- The Analysis Environment technology is GENERAL!
- BirdSpace? BeeSpace?
- PigSpace? CowSpace?
- BrainSpace? BehaviorSpace?
- BioSpace
- Interspace
45 THE DISTRIBUTED WORLD
- Community Repositories in the Interspace
- Peer to Peer Networking Infrastructure
- Every Person performs Every Role
  - USER: request
  - LIBRARIAN: reference
  - INDEXER: classify
  - PUBLISHER: quality
  - AUTHOR: generate
46 ISPACE (Illinois Interspace)
- TEXT (library, courses)
- CONTEXT (conversations, relationships)
- Meta-Analysis to Forge useful Links
- Google Books plus Rice Connexions
- GMAIL plus GPHONE
- Text plus Message plus Voice
- Internal Federate plus Integrate
- External Science plus Scholarship
- University Environment via Social Networks
47 Today the Hive, Tomorrow the HiveMind