Title: The Evolution of the Net: Predicting Global Infrastructure
1The Evolution of the Net: Predicting Global Infrastructure
Bruce R. Schatz, CANIS Laboratory, Graduate School
of Library and Information Science, schatz@uiuc.edu,
www.canis.uiuc.edu
Department of Computer Science seminar, University
of Illinois, February 14, 2005
2Art of Physical Architecture
3Art of Logical Architecture
4The Evolution of the Net
- Niels Bohr on Quantum Theory
- Prediction is very Difficult, especially about
the Future
5THE THIRD WAVE OF NET EVOLUTION
CONCEPTS
OBJECTS
PACKETS
6Computer Science and Infrastructure
- Transparent Federation across Sources
- Generic Protocols for Global Infrastructure
- Ultimate Goal is cyberspace visions of
being one with all the world's knowledge
7Computer Science and Infrastructure
- 1985: Operating Systems (caching)
- 1995: Database Management (tagging)
- 2005: Information Retrieval (clustering)
- 2015: Artificial Intelligence (recognizing)
8Linguistics Levels and Universal Units
- 1985: Syntax, Files (wholes)
- 1995: Structure, Records (parts)
- 2005: Semantics, Concepts (meaning)
- 2015: Pragmatics, Features (reality)
9Evolution of Information Retrieval
Evolution of Information Retrieval across the Net,
from Bruce R. Schatz, "Information Retrieval in
Digital Libraries: Bringing Search to the Net,"
cover article in Science, vol. 275, Jan 17, 1997,
special issue on Bioinformatics
10- WorldWide Information Spaces
11 1985 Syntax Federation
- Same Query into Multiple Sources
- Results return Uniform Packages
- Packets are for Bits, but Objects need more
- Information Units are for Database Items
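The federation idea on this slide (one query, many sources, uniform results) can be sketched in a few lines. This is only an illustration under stated assumptions, not the Telesophy implementation: the `InformationUnit` class, the two in-memory sources, and `federated_search` are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationUnit:
    """Uniform package returned by every source (hypothetical structure)."""
    source: str
    title: str
    text: str


# Two toy sources with different native layouts (assumed for illustration).
CATALOG = [("Packet switching basics", "packets carry bits between hosts")]
ABSTRACTS = [{"head": "Hypertext browsing",
              "body": "links connect documents across sources"}]


def search_catalog(query):
    # Wrap tuple-shaped rows into uniform units.
    return [InformationUnit("catalog", title, text)
            for title, text in CATALOG if query in text.lower()]


def search_abstracts(query):
    # Wrap dict-shaped rows into the same uniform units.
    return [InformationUnit("abstracts", row["head"], row["body"])
            for row in ABSTRACTS if query in row["body"].lower()]


def federated_search(query, sources):
    """Send the same query to every source and merge the uniform results."""
    query = query.lower()
    results = []
    for search in sources:
        results.extend(search(query))
    return results


hits = federated_search("across", [search_catalog, search_abstracts])
```

The caller never sees the native layout of either source; it only handles `InformationUnit` objects, which is the sense in which results come back as uniform packages.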
12 1985 Technology Environment
- CMU Computer Science Andrew
- Apollo Domain distributed file system
- Xerox Star multimedia document system
- Bellcore Network Systems Fibers
- Telenet International Packet Switches
- Dialog Bibliographic Text Searches
13Telesophy Prototype
- Distributed Documents
- Distributed Collections
- Multimedia Documents
- Networked Hypertext
- Document Browsing (links across sources)
- Document Search (texts across sources)
14Telesophy Session
15Telesophy Implementation
- Bitmapped Workstation with Custom Software
- $30K Apollo with 10Mb/s WAN
- Windows via Brown hypertext
- Objects via Xerox Smalltalk
- Information Units and Data Items
- 300K Units across 20 sources
- Bellcore R&D, $2.5M, 1984-1988
16Operating System Research
- Browsing requires Caching across Internet
- Raw bandwidth insufficient
- 200ms Ping versus 250ms Saccade
- Lookahead: Application-Specific Protocols
- 1987 Internet Research Task Force
- 1989 ARPANET 20th Anniversary
- 1990 Dissertation on Interactive Retrieval
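One reading of the caching and lookahead bullets is that a network round trip of about 200ms per document is comparable to a 250ms saccade, so documents the user is likely to open next should be fetched before they are requested. The sketch below illustrates that idea only; `LookaheadCache`, its `fetch` callback, and the `links` table are hypothetical, and prefetching is done synchronously here where a real system would presumably do it in the background.

```python
class LookaheadCache:
    """Cache that prefetches the documents linked from each one viewed."""

    def __init__(self, fetch, links):
        self.fetch = fetch      # function: doc_id -> content (slow, remote)
        self.links = links      # dict: doc_id -> list of linked doc_ids
        self.cache = {}
        self.user_waits = 0     # times the user had to wait on the network

    def _load(self, doc_id):
        if doc_id not in self.cache:
            self.cache[doc_id] = self.fetch(doc_id)

    def get(self, doc_id):
        if doc_id not in self.cache:
            self.user_waits += 1
            self._load(doc_id)
        # Lookahead: pull in every neighbor the user might browse to next.
        for neighbor in self.links.get(doc_id, []):
            self._load(neighbor)
        return self.cache[doc_id]


links = {"a": ["b", "c"], "b": ["d"]}
cache = LookaheadCache(lambda doc_id: f"content of {doc_id}", links)
for step in ["a", "b", "d"]:
    cache.get(step)
```

In this toy browsing path the user waits only for the first document; every later step follows a link that was already prefetched.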
17- WorldWide Information Spaces
18 1995 Structure Federation
- Search using Parts of Documents
- Transparent merge of different Schemas
- Results return Complete Displays
- Displayers invoked for all types
19 1995 Technology Environment
- NCSA and the World-Wide Web
- Mosaic multimedia document browsing
- HTTP standard query protocol
- University Library and Online Retrieval
- Ovid full-text journal searching
- SGML standard document protocol
20DeLIver System
- Full Distributed Documents
- Full Displays with tables and equations
- Distributed Collections from publishers
- Single Federated Collection
- Streamlined search using tag structure
- Canonical tag schema with translation
21DeLIver Session
22DeLIver Implementation
- Desktop PC plus Custom Software Integration
- $5K IBM Personal Computer
- Mosaic via NCSA hypertext
- Displays via SoftQuad viewers
- Custom DTD and SSL for tags and styles
- 100K articles for 3000 users
- NSF DLI, $5M, 1994-1998
23Database Management Research
- Metadata Extraction for Structure Federation
- Raw schema insufficient
- Different names and different types
- Author tags in physics vs mathematics
- 1995 interactive databases using Mosaic
- 1997 Beat Elsevier using canonical tags
- 1999 production distributed XML federation
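The point about different names for the same field (for example, author tags in physics versus mathematics) can be illustrated with a small translation step into a canonical schema. This is a sketch under assumptions, not the DeLIver code: the publisher names, the tag names in `TAG_MAPS`, and both functions are made up for the example.

```python
# Hypothetical per-publisher tag names mapped onto one canonical schema.
TAG_MAPS = {
    "physics_publisher": {"au": "author", "ti": "title", "abs": "abstract"},
    "math_publisher": {"authorname": "author", "doctitle": "title",
                       "summary": "abstract"},
}


def to_canonical(record, publisher):
    """Rename a record's tags to canonical names; unknown tags pass through."""
    mapping = TAG_MAPS[publisher]
    return {mapping.get(tag, tag): value for tag, value in record.items()}


def search_by_tag(records, tag, term):
    """Search one canonical field across the federated collection."""
    term = term.lower()
    return [r for r in records if term in r.get(tag, "").lower()]


collection = [
    to_canonical({"au": "A. Physicist", "ti": "Quantum wells"},
                 "physics_publisher"),
    to_canonical({"authorname": "B. Mathematician",
                  "doctitle": "Well-posed problems"}, "math_publisher"),
]
hits = search_by_tag(collection, "title", "well")
```

Once every record uses the canonical names, a single query on `title` or `author` reaches both publishers' articles without the user knowing either native schema.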
24- Semantic Concept Switching
25 2005 Semantic Federation
- Search using Concepts above Words
- Extraction of Concepts from Documents
- Statistical Index on Community Collections
- Concept Navigation across Collections
26 2005 Technology Environment
- Web Portals and statistical NLP
- Google statistical linked contexts
- NLP statistical generic parsers
- Fast Processors and Big Disks
- Gigaflops Beowulfs and cluster computing
- Terabytes RAIDs and literature scaling
27BeeSpace System
- Fully Parsed Documents
- Concepts and Entities auto generated
- Distributed Collections from communities
- Fully Related Concepts
- Switching across Community Repositories
- Automatic Links to Entity Databases
28BeeSpace Session
29BeeSpace Implementation
- Commodity PC plus Custom Software
- $1K Dell Personal Computer
- $15K Server, 1 Gflops, 2 TBytes
- Semantic Indexing generic scalable
- Concept Extraction and Normalization
- Concept Co-occurrence on Collections
- 50M articles across 50K repositories
30Information Retrieval Research
- Statistical Clustering Equivalent Phrases
- Raw phrases insufficient
- Phrase parsing with normalization
- Entity recognition with normalization
- 1998 semantic indexing
- (concepts from terms)
- 1999 information spaceflight
- (categories from documents)
31CONCEPT SPACES
- from Objects to Concepts
- from Syntax to Semantics
- Infrastructure is Interaction with Abstraction
Internet is packet transmission across computers
Interspace is concept navigation across repositories
32LEVELS OF INDEXES
33Technology Trends
- IEEE Computer for January 2002
- Information Infrastructure for Trends issue
- Document Representation (Semantic Web)
- Language Parsing (TIPSTER)
- Statistical Indexing (TREC)
- Peer-Peer Networking (SETI@home)
- Vocabulary Switching (UMLS)
34SCALABLE SEMANTICS
- Automatic indexing
- Domain-Independent indexing
- Statistical clustering
- Compute Context of
- concepts within documents
- documents within repositories
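Computing the context of concepts within documents can be illustrated with plain co-occurrence counting. The slide does not say which statistics or weighting were used, so the sketch below uses raw pair counts as a stand-in; the function names and the toy documents are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents):
    """Count how often each pair of concepts appears in the same document."""
    pair_counts = Counter()
    for concepts in documents:
        # Sort so each unordered pair is always stored under one key.
        for a, b in combinations(sorted(set(concepts)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def related(concept, pair_counts):
    """Rank other concepts by raw co-occurrence count with `concept`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == concept:
            scores[b] += n
        elif b == concept:
            scores[a] += n
    return scores.most_common()


# Each document is represented by the concepts extracted from it (toy data).
docs = [
    ["arthritis", "analgesic", "pain"],
    ["arthritis", "pain"],
    ["analgesic", "bleeding"],
]
counts = cooccurrence(docs)
neighbors = related("arthritis", counts)
```

Because nothing in the computation depends on the subject matter, the same code runs on any collection, which matches the slide's emphasis on automatic, domain-independent indexing.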
35COMPUTING CONCEPTS
- 1992: 4,000 (molecular biology)
- 1993: 40,000 (molecular biology)
- 1995: 400,000 (electrical engineering)
- 1996: 4,000,000 (engineering)
- 1998: 40,000,000 (medicine)
36SIMULATING A NEW WORLD
- Obtain discipline-scale collection
- MEDLINE from NLM, 10M bibliographic abstracts
- human classification Medical Subject Headings
- Partition discipline into Community Repositories
- 4 core terms per abstract for MeSH classification
- 32K nodes with core terms (classification tree)
- Community is all abstracts classified by core term
- 40M abstracts containing 280M concepts
- concept spaces took 2 days on NCSA Origin 2000
- Simulating World of Medical Communities
- 10K repositories with > 1K abstracts (1K with > 10K)
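The partitioning step can be sketched as follows: each abstract is copied into the community of every core term it is classified under. That would be consistent with 10M abstracts at 4 core terms each yielding 40M classified abstracts, although the slide does not spell out this arithmetic. The function names and sample records are hypothetical.

```python
from collections import defaultdict


def partition_by_core_terms(abstracts):
    """Group abstract ids into one community repository per core term.

    `abstracts` is an iterable of (abstract_id, core_terms) pairs; an
    abstract with several core terms lands in several communities.
    """
    communities = defaultdict(list)
    for abstract_id, core_terms in abstracts:
        for term in core_terms:
            communities[term].append(abstract_id)
    return dict(communities)


def repositories_above(communities, min_size):
    """Keep only communities with more than `min_size` abstracts."""
    return {term: ids for term, ids in communities.items()
            if len(ids) > min_size}


# Toy data standing in for MeSH-classified MEDLINE abstracts.
sample = [
    (1, ["Arthritis", "Analgesics"]),
    (2, ["Arthritis"]),
    (3, ["Analgesics", "Hemorrhage"]),
]
communities = partition_by_core_terms(sample)
total_assignments = sum(len(ids) for ids in communities.values())
large = repositories_above(communities, 1)
```

The size filter mirrors the slide's counts of repositories with more than 1K or more than 10K abstracts, with a threshold of 1 used here only because the sample is tiny.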
37COMMUNITY PROCESSING
38INTERSPACE NAVIGATION
- Semantic Indexes for Community Repositories
- Navigating Abstractions within Repository
- concept space category map
- Interactive browsing by Community experts
- www.canis.uiuc.edu/interspace-prototype
39Interspace Remote Access Client
40Navigation in MEDSPACE
- For a patient with Rheumatoid Arthritis
- Find a drug that reduces the pain (analgesic)
- but does not cause stomach (gastrointestinal)
bleeding
Choose Domain
41Concept Search
42Concept Navigation
43Retrieve Document
44Navigate Document
45Retrieve Document
46Concept Navigation
48SWITCHING
- In the Interspace
- each Community maintains its own repository
- Switching is navigating Across repositories
- use your vocabulary to search another specialty
49CONCEPT SWITCHING
- Concept versus Term
- set of semantically equivalent terms
- Concept switching
- region to region (set to set) match
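If a concept is a set of semantically equivalent terms, then switching a concept from one community's space to another's becomes a set-to-set match. The slide does not name a matching measure, so the sketch below uses Jaccard overlap as an assumed stand-in; the function names and the term sets are invented for illustration.

```python
def jaccard(a, b):
    """Overlap between two term sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def switch_concept(source_terms, target_space):
    """Return (name, score) of the best-overlapping concept in target_space.

    `target_space` maps concept names to their sets of equivalent terms.
    Returns (None, 0.0) when nothing overlaps at all.
    """
    best_name, best_score = None, 0.0
    for name, terms in target_space.items():
        score = jaccard(source_terms, terms)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Toy vocabularies from two specialties.
source = {"stomach bleeding", "gi bleeding", "gastric hemorrhage"}
target_space = {
    "gastrointestinal hemorrhage": {"gi bleeding", "gastric hemorrhage",
                                    "intestinal hemorrhage"},
    "analgesics": {"painkiller", "analgesic"},
}
match = switch_concept(source, target_space)
```

Matching whole sets rather than single terms is what lets a user search another specialty in their own vocabulary: no single term has to be shared verbatim as long as the regions overlap.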
50Biomedical Session
51Categories and Concepts
52Concept Switching
53Document Retrieval
54THE NET OF THE 21st CENTURY
- Beyond Objects to Concepts
- Beyond Search to Analysis
- Problem Solving via Cross-Correlating Multimedia
Information across the Net
- Every community has its own special library
- Every community does semantic indexing
- The Interspace approximates Cyberspace
55- Semantic Concept Switching
- Continuous Feature Monitors
56 2015 Pragmatics Federation
- Beyond Words and Concepts to Reality
- Feature Vectors describing Situation
- Each Individual has Vector (< Community)
- Discrete Samples into Continuous Monitors
57 2015 Technology Environment
- Continuous Vector Recording
- Health Grid personal lifestyle monitors
- Peer-to-Peer beyond Napster and Amazon
- Individual User Modeling
- Cohort Grouping custom clustering
- Adaptable Interfaces multiple levels
58Lifestyle Monitor System
- Continuous Monitoring
- Adaptive Questionnaires full-spectrum
- Distributed Collections from individuals
- Situational Analysis
- Structured Vectors custom for Individuals
- Population Cohorts for Decision Support
59Lifestyle Monitor Questions
Sample General Health Questions for User
Modeling
60Lifestyle Monitor Session
61Artificial Intelligence Research
- Structured Vectors Individual customized
- Raw concepts insufficient
- Adaptive Concepts for individual situations
- Structured Vectors for cohort clustering
- Situational Analysis infrastructure support
- 2007 Internet Health Monitors prototypes
- 2011 Population Health Monitors for chronic
illness regionally deployed
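Grouping individuals into cohorts by their structured vectors can be illustrated with a similarity threshold. The slides do not specify a comparison method, so cosine similarity is used here purely as an assumption; the function names, the three-feature vectors, and the 0.9 threshold are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm if norm else 0.0


def cohort(target, population, threshold):
    """Names of individuals whose vectors are similar enough to `target`."""
    return [name for name, vec in population.items()
            if cosine(target, vec) >= threshold]


# Toy feature vectors, e.g. answers to three health questions.
population = {
    "p1": [1.0, 0.0, 1.0],
    "p2": [1.0, 0.0, 0.9],
    "p3": [0.0, 1.0, 0.0],
}
similar = cohort([1.0, 0.0, 1.0], population, threshold=0.9)
```

The resulting cohort is the comparison group that decision support would draw on for the target individual.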
62THE DISTRIBUTED WORLD
- Community Repositories in the Interspace
- Peer to Peer Networking Infrastructure
- Every Person performs Every Role
USER request, LIBRARIAN reference, INDEXER classify,
PUBLISHER quality, AUTHOR generate
63FEATURE VECTORS
- from Concepts to Features
- from Semantics to Pragmatics
- Infrastructure is Interaction with Abstraction
Interspace is concept navigation across repositories
Intermind is feature comparison across individuals
64Towards the Intermind
- Beyond Concepts to Features
- Beyond Analysis to Synthesis
- Problem Solving via Cross-Correlating Universal
Knowledge across the Net
- Every individual has its own special vector
- Every viewpoint does semantic clustering
- The Intermind is true Cyberspace
65Today the Hive Tomorrow the HiveMind