Title: CS652 Spring 2004 Summary
1 CS652 Spring 2004 Summary
2 Course Objectives
- Learn how to extract, structure, and integrate Web information
- Learn what the Semantic Web is
- Learn how to build ontologies for the Semantic Web
- Investigate class-related research topics
- Be introduced to Semantic Web services
3 Generally Applicable Ideas
- Semantic Understanding
- Data: attribute-value pairs
- Information: data in a conceptual model
- Knowledge: information with agreement
- Meaning: useful knowledge
- Measuring Success
- Recall = NrCorrect / TotalCorrect
- Precision = NrCorrect / (NrCorrect + NrIncorrect)
- F-measure = $(\beta^2 + 1)PR / (\beta^2 P + R)$, where $P$ is precision and $R$ is recall
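The three measures above can be computed directly from counts. The following is a minimal sketch; the function names and the sample counts are illustrative assumptions, not part of the course material.

```python
def recall(nr_correct: int, total_correct: int) -> float:
    """NrCorrect / TotalCorrect: share of the answer key that was found."""
    return nr_correct / total_correct if total_correct else 0.0


def precision(nr_correct: int, nr_incorrect: int) -> float:
    """NrCorrect / (NrCorrect + NrIncorrect): share of the output that is right."""
    returned = nr_correct + nr_incorrect
    return nr_correct / returned if returned else 0.0


def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """(beta^2 + 1)PR / (beta^2 P + R); beta = 1 weights P and R equally."""
    denom = beta**2 * p + r
    return (beta**2 + 1) * p * r / denom if denom else 0.0


# Hypothetical run: 100 facts in the answer key; 90 facts extracted,
# 80 of them correct and 10 incorrect.
r = recall(80, 100)
p = precision(80, 10)
f1 = f_measure(p, r)
```

With `beta` above 1 the measure leans toward recall, and below 1 toward precision.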
4 Information Extraction
- Get relevant information
- Not information retrieval: get relevant pages
- Not Web mining: discover unknown associations
- Wrapper: maps data to a suitable format
- Generation techniques
- Machine learning (e.g. RAPIER)
- Natural language processing (e.g. RAPIER)
- Hidden Markov Models
- By-example generation tools (e.g. Lixto)
- By-pattern generation (e.g. RoadRunner)
- Wrapper Maintenance
5 Information Extraction: BYU Ontos
- Ontology-based
- Data frames
- Strengths
- Resilient to page changes
- Robust across sites within the same domain
- Works well with all types of data-rich text
- Weaknesses
- Hand-crafted ontologies and data frames
- Requires record-boundary recognition
- Does not learn
- Applications
- Extraction
- High-precision classification
- Schema mapping
- Semantic Web annotation
- Agent communication
- Ontology generation
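To make the data-frame idea concrete, here is a minimal sketch that approximates a data frame as one regular-expression recognizer per concept. The concept names, the patterns, and the sample ad are all hypothetical; the actual BYU Ontos data frames are hand-crafted and richer than this.

```python
import re

# Hypothetical data frames for a car-ad domain: one recognizer per concept.
# The patterns are illustrative only.
DATA_FRAMES = {
    "Year": re.compile(r"\b(19[5-9]\d|20[0-2]\d)\b"),
    "Price": re.compile(r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)"),
    "Mileage": re.compile(
        r"\b(\d{1,3}(?:,\d{3})+|\d+)\s*(?:miles|mi\.?)", re.IGNORECASE
    ),
}


def extract(record: str) -> dict:
    """Return the first value each recognizer finds in one record."""
    found = {}
    for concept, pattern in DATA_FRAMES.items():
        match = pattern.search(record)
        if match:
            found[concept] = match.group(1)
    return found


ad = "1999 Honda Accord, 85,000 miles, runs great, $4,500 obo"
result = extract(ad)
```

Because the recognizers look at the text content rather than the page markup, a sketch like this does not depend on a particular page layout. It does assume the input has already been split into records, which matches the record-boundary weakness listed above.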
6 Semantic Web
- Tim Berners-Lee
- information has a well-defined meaning
- enables computers and people to work in cooperation
- Adds context and structure via metadata
- Agent computing paradigm
- Knowledge markup: semantic annotation
7 Ontologies
- "a formal, explicit specification of a shared conceptualization" [Gruber93]
- Formal: machine readable (FOL)
- Explicit: concepts and constraints explicitly defined
- Shared: community accepted
- Conceptualization: abstract model (OSM)
- shared vocabulary
8 Ontology Formalism
Ontology $O = \langle V, A \rangle$ where
- $V$ = vocabulary: predicate symbols (each with some arity)
- $A$ = axioms: formulas (constraints and rules)
Predicates: Owner(x), Vehicle(x), Car(x), Truck(x), Owner(x) owns Vehicle(y)
Formulas:
- $\forall x\,(\mathrm{Car}(x) \lor \mathrm{Truck}(x) \Rightarrow \mathrm{Vehicle}(x))$
- $\forall x\,(\mathrm{Owner}(x) \Rightarrow \exists^{\geq 1} y\,(\mathrm{Owner}(x)\ \mathrm{owns}\ \mathrm{Vehicle}(y)))$
Inference Rules:
- TruckOwner(x) :- Owner(x), Owner(x) owns Vehicle(y), Truck(y)
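The formalism on this slide can be exercised on a few facts. The sketch below uses plain Python sets as a stand-in for a reasoner; the individuals (`alice`, `bob`, `civic`, `f150`) are made up, and the participation constraint is read as "every owner owns at least one vehicle", which is an assumption about the slide's intent.

```python
# Facts over the slide's predicates; the individuals are hypothetical.
owners = {"alice", "bob"}
cars = {"civic"}
trucks = {"f150"}
owns = {("alice", "f150"), ("bob", "civic")}

# Axiom: Car(x) or Truck(x) implies Vehicle(x).
vehicles = cars | trucks

# Rule: TruckOwner(x) :- Owner(x), Owner(x) owns Vehicle(y), Truck(y)
truck_owners = {
    x for (x, y) in owns
    if x in owners and y in vehicles and y in trucks
}

# Constraint check (assumed reading): every owner owns at least one vehicle.
owners_without_vehicle = {
    x for x in owners
    if not any(o == x and v in vehicles for (o, v) in owns)
}
```

Here the rule derives `alice` as the only truck owner, and an empty `owners_without_vehicle` set means the constraint holds for these facts.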
9 Semantic Web Ontologies
10 Semantic Web Annotation with BYU Ontos
- BYU Ontos Extraction Ontology
- OWL Ontology: osm.cs.byu.edu/CS652s04/ontologies/OWL/carads.owl
- Annotated Semantic Web Page: osm.cs.byu.edu/CS652s04/ontologies/annotatedPages/carSrch1_semweb.html
11 Ontology Generation for the Semantic Web
- Necessary for the Semantic Web
- Ontology engineering
- Tools
- Methodology
- Languages (e.g. SHOE, OWL)
- Semiautomatic generation
- NLP and machine learning (e.g. OntoText)
- Create from dictionary or lexicon (e.g. Doddle)
- Generation from tables (e.g. TANGO)
- Ontology maintenance
12 Ontology Libraries for the Semantic Web
- Locating ontologies
- Indexing and organization
- Search mechanisms
- Reusing ontologies
- Find one and modify
- Find several, merge and modify
13 Ontology Mapping, Merging, and Integration for the Semantic Web
- Ontology reuse
- Heterogeneous agent communication
- Agent commitment to a new ontology
- On the fly: map, merge, integrate (nontrivial to automate)
- Can we do well enough?
- Can we synergistically involve a user?
- Information extraction wrt target
- Table extraction (BYU Ontos)
- Semiautomatic wrapper/mediator construction by automatically providing mappings
14 Schema Mapping
- Schema-level matchers
- Name matchers (dictionaries, WordNet)
- Structural context matchers
- Instance-level matchers
- Value characteristics
- Data-frame matchers
- Mapping cardinality
- 1:1 (direct)
- 1:n, n:1, n:m (indirect, complex)
- Multi-faceted mapping techniques
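A schema-level name matcher that produces direct 1:1 mappings might look like the sketch below. The small `SYNONYMS` set is a hypothetical stand-in for a dictionary such as WordNet, the string similarity comes from Python's standard `difflib`, and the greedy pairing and the 0.6 threshold are illustrative choices rather than a technique taken from the course.

```python
from difflib import SequenceMatcher

# Hypothetical synonym pairs standing in for a dictionary such as WordNet.
SYNONYMS = {
    frozenset({"price", "cost"}),
    frozenset({"make", "manufacturer"}),
}


def name_score(a: str, b: str) -> float:
    """Score two attribute names: 1.0 for equal or synonymous names,
    otherwise a character-level similarity ratio in [0, 1]."""
    a, b = a.lower(), b.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def match_1_to_1(source, target, threshold=0.6):
    """Greedy 1:1 mapping: take the best-scoring pairs first and use
    each source and target attribute at most once."""
    pairs = sorted(
        ((name_score(s, t), s, t) for s in source for t in target),
        reverse=True,
    )
    mapping, used_targets = {}, set()
    for score, s, t in pairs:
        if score >= threshold and s not in mapping and t not in used_targets:
            mapping[s] = t
            used_targets.add(t)
    return mapping


mapping = match_1_to_1(["Make", "Cost", "Yr"], ["Manufacturer", "Price", "Year"])
```

A name matcher alone cannot find indirect mappings (1:n, n:1, n:m); those typically need the instance-level evidence listed above, which is one motivation for multi-faceted techniques.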
15 Schema Integration
- FCA merge using lattices
- Global as View (GAV)
- Global mediator relations are views over source relations
- Dynamic mediator schema: changes to accommodate new sources (hard to add new sources)
- Query only requires view unfolding
- Good for static, centralized systems
- TSIMMIS
- Local as View (LAV)
- Local source relations are views over mediator relations
- Fixed mediator schema: new sources identify components covered (easy to add new sources)
- Complex query rewriting
- Good for dynamic, distributed systems
- Information Manifold
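The GAV side of this comparison can be illustrated with a small sketch in which the mediator relation is literally a view over two sources. The source schemas, the `car_view` function, and the data are hypothetical; LAV is not shown because its query rewriting step is considerably more involved.

```python
# Two hypothetical sources with different schemas.
source_a = [{"make": "Honda", "yr": 1999, "price": 4500}]
source_b = [{"manufacturer": "Ford", "model_year": 2001, "cost": 7000}]


def car_view():
    """GAV: the mediator relation Car(make, year, price) is defined as a
    view (a union of per-source queries) over the source relations."""
    for row in source_a:
        yield {"make": row["make"], "year": row["yr"], "price": row["price"]}
    for row in source_b:
        yield {
            "make": row["manufacturer"],
            "year": row["model_year"],
            "price": row["cost"],
        }


def query_cars(max_price):
    """Answer a mediator query by unfolding the view definition."""
    return [car for car in car_view() if car["price"] <= max_price]


cheap = query_cars(5000)
```

Adding a third source here means editing `car_view` itself, which mirrors the point that new sources are hard to add under GAV while queries stay simple.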
16 What is your dream for the Semantic Web?
- Intelligent personal agents that can
- Gather (just) the information we want and deliver it to us when we want it
- Help us with scheduling
- Help us buy the goods we want
- Negotiate and conduct business for us
-
- Intelligent business agents
- Intelligent discovery agents
-
What can you do to make your dreams come true?