Title: CMP 788 Distributed IR
1CMP 788 Distributed IR
- Lecture 1
- Introduction to IR
- Fall 05
- Department of Mathematics and Computer Science
- Lehman College
2The History of IR
3IR Development
4IR Development (contd)
5IR Related Areas
6IR Related Areas (contd)
7IR Related Areas (contd)
8IR Related Areas (contd)
9Brief Introduction to Ontology and Semantic Web
10Machine Process-able Knowledge on the Web
- Unique identity of resources and objects: URI
- Metadata Annotations
  - Data describing the content and meaning of resources
  - But everyone must speak the same language
- Terminologies
  - Shared and common vocabularies
  - For search engines, agents, curators, authors and users
  - But everyone must mean the same thing
- Ontologies
  - Shared and common understanding of a domain
  - Essential for exchange and discovery
- Inference
  - Apply the knowledge in the metadata and the ontology to create new metadata and new knowledge
11What is an ontology?
- An ontology is an explicit specification of a conceptualization
- An ontology is a shared understanding of some domain of interest.
- There are many definitions
  - a formal specification (EXECUTABLE)
  - of a conceptualization of a domain (COMMUNITY)
  - of some part of the world that is of interest (APPLICATION)
- Defines
  - A common vocabulary of terms
  - Some specification of the meaning of the terms
  - A shared understanding for people and machines
12Ontologies: The Semantic Backbone
13Communities of Practice Ontology
- Representations of ontologies will put semantics on the web
14XML is not good for describing ontologies
- XML defines grammars to verify and structure documents
- The grammar enforces constraints on tags
- Different grammars define the same content
- XML lacks a semantic model; it only has a surface model, which is a tree.
    <course date...>
      <title>...</title>
      <teacher>...</teacher>
      <name>...</name>
      <http>...</http>
      <students>...</students>
    </course>
- node = label + attr/values + contents
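The point that different grammars can carry the same content, while XML itself only sees a tree, can be sketched with Python's standard xml.etree.ElementTree. The two documents below, their tag names, and the helper surface_tree are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that a human reads as the same fact:
# the course "Distributed IR" is taught by "Smith".
DOC_A = (
    "<course><title>Distributed IR</title>"
    "<teacher><name>Smith</name></teacher></course>"
)
DOC_B = '<course title="Distributed IR" teacher="Smith"/>'


def surface_tree(element):
    """Return the surface model of a node: (label, attr/values, text, children)."""
    text = (element.text or "").strip()
    children = [surface_tree(child) for child in element]
    return (element.tag, dict(element.attrib), text, children)


tree_a = surface_tree(ET.fromstring(DOC_A))
tree_b = surface_tree(ET.fromstring(DOC_B))

# The parser only sees two different trees; nothing in XML says they agree.
print(tree_a == tree_b)
print(tree_a)
print(tree_b)
```

Any claim that the two trees mean the same thing has to come from an agreement made outside XML, which is the gap an ontology language is meant to fill.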
15XML is not good for describing ontologies
- Meaning of XML documents is intuitively clear
  - semantic markup tags are domain terms
- But computers do not have intuition
  - Tag names per se do not provide semantics
  - The semantics are encoded outside the XML specification
- XML makes no commitment on
  - Domain-specific ontological vocabulary
  - Ontological modelling primitives
  - So it requires pre-arranged agreement on both of the above
- Feasible for closed collaboration
  - agents in a small stable community
  - pages on a small stable intranet
16What is the Semantic Web?
- Power search
- Semantic integration
- Metadata plays the central role in the Semantic Web environment.
- Distributed agents revisited
- The mother of all databases
- The mother of all knowledge bases
- How many cows in Texas?
- Query answering over a knowledge base
All of the above
17The Semantic Web
18Language Tower
Attribution
Explanation
Rules Inference
Ontologies
Metadata annotations
Standard Syntax
Identity
19Information Aggregation
- A directed graph.
- RDF repositories and query languages.
- Graph matching and graph merging.
- Model theoretic semantics by Pat Hayes
- www.w3c.org/TR/rdf-mt
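A minimal sketch of aggregation over RDF-style data, assuming a graph is held as a plain Python set of (subject, predicate, object) triples. The ex: names and the helpers merge and match are hypothetical, and the sketch ignores blank nodes, which a real RDF merge has to keep apart.

```python
# A triple is one labelled edge of a directed graph: subject -> object.
# All ex: names are hypothetical stand-ins for full URIs.
graph_a = {
    ("ex:AndyCole", "rdf:type", "ex:SoccerPlayer"),
    ("ex:AndyCole", "ex:playsFor", "ex:BlackburnRovers"),
}
graph_b = {
    ("ex:BlackburnRovers", "rdf:type", "ex:SoccerClub"),
    ("ex:AndyCole", "rdf:type", "ex:SoccerPlayer"),  # also stated in graph_a
}


def merge(*graphs):
    """Merge graphs by set union; identical triples collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def match(graph, pattern):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        triple
        for triple in graph
        if all(want is None or want == got for want, got in zip(pattern, triple))
    )


merged = merge(graph_a, graph_b)
print(len(merged))
print(match(merged, (None, "rdf:type", None)))
```

Because both sources use the same identifiers, merging needs no schema mapping: the shared node ex:AndyCole simply gains edges from both graphs.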
20DAML+OIL / OWL
- A DAML+OIL / OWL ontology consists of a set of axioms asserting characteristics of classes, properties, and individuals, each of which can have an ID that is a URI reference
  - E.g. Person is a kind of Animal whose parents are Persons
- RDF is used for class/property membership assertions (data)
  - E.g. John is an instance of Person; the pair (John, Mary) is an instance of parent
- http://www.daml.org/
- OWL Web Ontology Language 1.0 Reference
  - http://www.w3.org/TR/owl-ref/
- DAML+OIL / OWL is designed to describe the structure of a domain (schema)
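The split between schema axioms and data assertions on this slide can be sketched as follows. This is a toy forward-chaining loop, not a DAML+OIL or OWL reasoner; the dictionaries SUBCLASS_OF and ALL_VALUES_FROM and the function infer are hypothetical encodings of the slide's Person/Animal/parent example.

```python
# Schema (hypothetical encoding of the slide's axiom):
# Person is a kind of Animal, and every parent of a Person is a Person.
SUBCLASS_OF = {"Person": "Animal"}
ALL_VALUES_FROM = {("Person", "parent"): "Person"}

# Data: class and property membership assertions.
class_assertions = {("John", "Person")}
property_assertions = {("John", "parent", "Mary")}


def infer(class_assertions, property_assertions):
    """Apply the two schema rules until no new class memberships appear."""
    known = set(class_assertions)
    changed = True
    while changed:
        changed = False
        new = set()
        for individual, cls in known:
            # Rule 1: a member of a class is a member of its superclass.
            if cls in SUBCLASS_OF:
                new.add((individual, SUBCLASS_OF[cls]))
            # Rule 2: property values of a member fall in the restricted class.
            for subject, prop, value in property_assertions:
                if subject == individual and (cls, prop) in ALL_VALUES_FROM:
                    new.add((value, ALL_VALUES_FROM[(cls, prop)]))
        if not new <= known:
            known |= new
            changed = True
    return known


inferred = infer(class_assertions, property_assertions)
print(sorted(inferred))
```

Nothing in the data says that Mary is a Person; that membership follows only from combining the schema axiom with the two assertions about John.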
21[Diagram: ontology graph. Node labels: Person, Sport, Team-based Sport, Soccer, Blackburn Rovers, Blackburn, Soccer Club, Sports Club, Club, Organisation, UK, Europe, Country. Edge labels: participants > 1, partof.]
22[Diagram: ontology graph. Node labels: Event, Competition, Tournament, Sports Tournament, Soccer Tournament, Worthington Cup, Andy Cole, Brad Friedal, Soccer Player, Sports Player, Blackburn Rovers.]
23[Diagram: ontology graph. Node labels: Blackburn Rovers, Nottingham, UK, Europe, Country, Andy Cole, Soccer Player, Sports Player, Person. Edge labels: partof, birthplace, nationality.]
24[Diagram: ontology graph. Node labels: Blackburn Rovers, Lakewood, UK, USA, Europe, Country, Brad Friedal, Soccer Player, Sports Player, Person. Edge labels: partof, birthplace, nationality.]
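The labels on the preceding diagrams suggest how an ontology supports search by meaning instead of by keyword. The sketch below is an assumed reading of those labels, not a transcription of the drawn edges: it takes Soccer Player to be a kind of Sports Player, Sports Player a kind of Person, and UK to be part of Europe. The helpers is_a and from_europe are hypothetical.

```python
# Assumed edges, read off the diagram labels; the drawings may differ.
SUBCLASS_OF = {
    "Soccer Player": "Sports Player",
    "Sports Player": "Person",
}
INSTANCE_OF = {"Andy Cole": "Soccer Player", "Brad Friedal": "Soccer Player"}
NATIONALITY = {"Andy Cole": "UK", "Brad Friedal": "USA"}
PART_OF = {"UK": "Europe"}


def is_a(cls, ancestor):
    """True if cls equals ancestor or reaches it by following subclass links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def from_europe(kind):
    """Individuals of the given kind whose nationality is part of Europe."""
    return sorted(
        name
        for name, cls in INSTANCE_OF.items()
        if is_a(cls, kind) and PART_OF.get(NATIONALITY.get(name)) == "Europe"
    )


print(from_europe("Person"))
```

A keyword search for "European person" would match neither annotation; the query succeeds only because the subclass and partof links are followed.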
25Information Age
26IR in General
27IR in General (in other words)
28Data vs. Information Retrieval
29DBMS vs. IRS
30IR Systems
31IR Systems focus on
32The User Task
33Document
34Logical View of Documents
35IR Systems in General
36The Retrieval Process
37The Retrieval Process (contd)
38The Retrieval Process (contd)
39Topics To be Covered (Topics 1)
- Automatic Indexing (a task for representing IR items automatically)
- File and access structures for IR systems (a physical layer)
  - Inverted file structure (covered)
  - B-tree, Hashing (reading materials)
- IR data modeling (logical data models)
  - Set theoretic models: Boolean, Fuzzy, Extended Boolean
  - Algebraic Model: Vector Space Model
  - Probabilistic Model based on probabilistic indexing (VSM)
- Performance Measures (Effectiveness: Recall vs. Precision)
- Retrieval Performance Enhancement Techniques (just an introduction to relevance feedback based on machine learning techniques, e.g., Artificial Neural Networks)
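Three of the topics listed above (the inverted file structure, the Boolean model, and recall vs. precision) fit together in a few lines. This is a minimal sketch under stated assumptions: the three-document collection, the relevance judgments, and all function names are hypothetical, and tokenization is a plain whitespace split.

```python
from collections import defaultdict

# Hypothetical three-document collection.
DOCS = {
    1: "distributed information retrieval",
    2: "information retrieval with inverted files",
    3: "semantic web and ontology",
}


def build_inverted_file(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Boolean model: documents that contain every query term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def recall_precision(retrieved, relevant):
    """Recall = hits / relevant; precision = hits / retrieved."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


index = build_inverted_file(DOCS)
retrieved = boolean_and(index, "information", "retrieval")
# Assume a user judged documents 1 and 3 relevant to the information need.
print(retrieved, recall_precision(retrieved, {1, 3}))
```

The Boolean query is exact on terms, yet it misses one document the user wanted and returns one the user did not, which is why effectiveness is measured against relevance judgments and not against the query.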