CMP 788 Distributed IR - PowerPoint PPT Presentation

Title:

CMP 788 Distributed IR

Description:

Machine Process-able Knowledge on the Web. Unique identity of resources and objects- URI ... of a conceptualization of a domain COMMUNITY ... – PowerPoint PPT presentation

Slides: 40
Provided by: GJu
Tags: cmp | distributed



1
CMP 788 Distributed IR
  • Lecture 1
  • Introduction to IR
  • Fall 05
  • Department of Mathematics and Computer Science
  • Lehman College

2
The History of IR
3
IR Development
4
IR Development (contd)
5
IR Related Areas
6
IR Related Areas (contd)
7
IR Related Areas (contd)
8
IR Related Areas (contd)
9
Brief Introduction to Ontology and Semantic Web
10
Machine Process-able Knowledge on the Web
  • Unique identity of resources and objects: URI
  • Metadata Annotations
  • Data describing the content and meaning of
    resources
  • But everyone must speak the same language
  • Terminologies
  • Shared and common vocabularies
  • For search engines, agents, curators, authors and
    users
  • But everyone must mean the same thing
  • Ontologies
  • Shared and common understanding of a domain
  • Essential for exchange and discovery
  • Inference
  • Apply the knowledge in the metadata and the
    ontology to create new metadata and new knowledge
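
The inference bullet above can be sketched in a few lines of Python. This is only an illustration, not the lecture's own method: the document identifier `doc42`, the `about` and `subClassOf` predicate names, and the specific subclass links are assumptions chosen for the example.

```python
# Metadata as (subject, predicate, object) triples. "doc42" is a hypothetical resource.
metadata = {
    ("doc42", "about", "Soccer"),
}

# Ontology fragment: direct "is a kind of" links (assumed for illustration).
ontology = {
    ("Soccer", "subClassOf", "Team-based Sport"),
    ("Team-based Sport", "subClassOf", "Sport"),
}


def superclasses(term, ontology):
    """Return every ancestor of `term` reached by following subClassOf links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in ontology:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def infer(metadata, ontology):
    """Add an 'about' triple for every superclass of an annotated topic."""
    inferred = set(metadata)
    for s, p, o in metadata:
        if p == "about":
            for ancestor in superclasses(o, ontology):
                inferred.add((s, "about", ancestor))
    return inferred


# Triples that were not stated explicitly but follow from the ontology.
new_metadata = infer(metadata, ontology) - metadata
```

A search for documents about "Sport" can now find `doc42`, although its original annotation only mentioned "Soccer".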

11
What is an ontology?
  • An ontology is an explicit specification of a
    conceptualization
  • An ontology is a shared understanding of some
    domain of interest.
  • There are many definitions
  • a formal specification (EXECUTABLE)
  • of a conceptualization of a domain (COMMUNITY)
  • of some part of the world that is of interest
    (APPLICATION)
  • Defines
  • A common vocabulary of terms
  • Some specification of the meaning of the terms
  • A shared understanding for people and machines

12
Ontologies: The Semantic Backbone
13
Communities of Practice Ontology
  • Representations of ontologies will put
    semantics on the web

14
XML is not good for describing ontologies
  • XML defines grammars to verify and structure
    documents
  • The grammar enforces constraints on tags
  • Different grammars define the same content
  • XML lacks a semantic model; it only has a
    surface model, which is a tree.


<course date...> <title>...</title> <teacher>
...</teacher> <name>...</name> <http>...</http>
<students>...</students></course>
  • A node consists of a label, attribute/value pairs, and contents
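
To make the point that "different grammars define the same content" concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. Both documents, the `lecturer` tag, and the teacher name "J. Doe" are invented for the example; only the course title comes from the slides.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same facts under different grammars.
doc_a = (
    "<course><title>Distributed IR</title>"
    "<teacher><name>J. Doe</name></teacher></course>"
)
doc_b = "<course title='Distributed IR'><lecturer name='J. Doe'/></course>"


def shape(element):
    """Describe the surface tree: tag, sorted attribute names, child shapes."""
    return (element.tag, sorted(element.attrib), [shape(child) for child in element])


tree_a = ET.fromstring(doc_a)
tree_b = ET.fromstring(doc_b)

# The parser only sees trees, and these two trees are different.
same_surface = shape(tree_a) == shape(tree_b)

# Only hand-written, grammar-specific code can tell that the content agrees.
facts_a = (tree_a.findtext("title"), tree_a.findtext("teacher/name"))
facts_b = (tree_b.get("title"), tree_b.find("lecturer").get("name"))
same_content = facts_a == facts_b
```

The knowledge that `teacher/name` and `lecturer/@name` mean the same thing lives in the extraction code, not in the XML itself.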

15
XML is not good for describing ontologies
  • Meaning of XML documents is intuitively clear
  • semantic markup tags are domain terms
  • But computers do not have intuition
  • Tag names per se do not provide semantics
  • The semantics are encoded outside the XML
    specification
  • XML makes no commitment on
  • Domain specific ontological vocabulary
  • Ontological modelling primitives
  • So XML requires pre-arranged agreement on both of the above
  • Feasible for closed collaboration
  • agents in a small stable community
  • pages on a small stable intranet

16
What is the Semantic Web?
  • Power search
  • Semantic integration
  • Metadata plays the central role in the Semantic Web
    environment.
  • Distributed agents revisited
  • The mother of all databases
  • The mother of all knowledge bases
  • How many cows in Texas?
  • Query answering over a knowledge base

All of the above
17
The Semantic Web
18
Language Tower
Attribution
Explanation
Rules Inference
Ontologies
Metadata annotations
Standard Syntax
Identity
19
Information Aggregation
  • A directed graph.
  • RDF repositories and query languages.
  • Graph matching and graph merging.
  • Model theoretic semantics by Pat Hayes
  • www.w3c.org/TR/rdf-mt
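
A rough sketch of graph merging and graph matching follows, with an RDF graph modeled as a plain Python set of (subject, predicate, object) triples. The specific triples are assumptions loosely based on the names in the later example slides, and the `?name` variable convention is made up for this illustration. Real RDF merging also has to keep blank nodes from different graphs apart, which this sketch ignores.

```python
# Two small graphs, e.g. collected from different sources (illustrative data).
graph_a = {
    ("AndyCole", "type", "SoccerPlayer"),
    ("AndyCole", "birthplace", "Nottingham"),
}
graph_b = {
    ("Nottingham", "partof", "UK"),
    ("UK", "partof", "Europe"),
    ("AndyCole", "type", "SoccerPlayer"),
}

# Merging: with shared identifiers (URIs), a merge is just a set union.
merged = graph_a | graph_b


def match(patterns, graph, binding=None):
    """Yield variable bindings under which every pattern is a triple in graph.

    Terms starting with '?' are variables; everything else must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, candidate)


# Matching: who was born in a city, and what is that city part of?
query = [("?who", "birthplace", "?city"), ("?city", "partof", "?where")]
results = list(match(query, merged))
```

Neither source graph can answer the query alone; the answer only appears after aggregation.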

20
DAML+OIL / OWL
  • A DAML+OIL / OWL ontology consists of a set of axioms
    asserting characteristics of classes,
    properties and individuals, each of which can
    have an ID which is a URI reference
  • E.g. Person is a kind of Animal whose parents are
    Persons
  • RDF used for class/property membership assertions
    (data)
  • E.g. John is an instance of Person; ⟨John,
    Mary⟩ is an instance of parent
  • http://www.daml.org/
  • OWL Web Ontology Language 1.0 Reference
  • http://www.w3.org/TR/owl-ref/
  • DAML+OIL / OWL designed to describe structure of
    domain (schema)
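
The split between schema (ontology axioms) and data (RDF assertions) can be illustrated with a toy Python sketch. It hard-codes the two axioms from the example on this slide as hand-written rules. It is not a DAML+OIL or OWL reasoner, and reading ⟨John, Mary⟩ as "Mary is a parent of John" is an assumption.

```python
# Schema side, reduced to two hand-written rules:
#   1. Person is a kind of Animal.
#   2. The parents of a Person are Persons.
subclass_of = {"Person": "Animal"}

# Data side: class and property membership assertions.
types = {("John", "Person")}
parent = {("John", "Mary")}  # assumed reading: Mary is a parent of John


def close(types, parent, subclass_of):
    """Apply both rules repeatedly until no new class membership appears."""
    inferred = set(types)
    changed = True
    while changed:
        new = set()
        for individual, cls in inferred:
            if cls in subclass_of:  # rule 1
                new.add((individual, subclass_of[cls]))
        for child, par in parent:
            if (child, "Person") in inferred:  # rule 2
                new.add((par, "Person"))
        changed = not new <= inferred
        inferred |= new
    return inferred


all_types = close(types, parent, subclass_of)
```

Nothing in the data says that Mary is a Person or an Animal; both facts follow from the schema.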

21
[Diagram: sports ontology fragment. Node labels: Person, Sport, Team-based Sport, Soccer, Organisation, Club, Sports Club, Soccer Club, Blackburn Rovers, Blackburn, Country, UK, Europe. Edge labels: participants > 1, partof.]
22
[Diagram: events and players. Node labels: Event, Competition, Tournament, Sports Tournament, Soccer Tournament, Worthington Cup, Sports Player, Soccer Player, Andy Cole, Brad Friedel, Blackburn Rovers.]
23
[Diagram: Andy Cole instance. Node labels: Person, Sports Player, Soccer Player, Andy Cole, Blackburn Rovers, Nottingham, Country, UK, Europe. Edge labels: birthplace, nationality, partof.]
24
[Diagram: Brad Friedel instance. Node labels: Person, Sports Player, Soccer Player, Brad Friedel, Blackburn Rovers, Lakewood, Country, UK, USA, Europe. Edge labels: birthplace, nationality, partof.]
25
Information Age
26
IR in General
27
IR in General (in other words)
28
Data vs. Information Retrieval
29
DBMS vs. IRS
30
IR Systems
31
IR Systems focus on
32
The User Task
33
Document
34
Logical View of Documents
35
IR Systems in General
36
The retrieval Process
37
The retrieval Process (contd)
38
The retrieval Process (contd)
39
Topics To be Covered (Topics 1)
  • Automatic Indexing (a task for representing IR
    items automatically)
  • File and access structures for IR system (a
    physical layer)
  • Inverted file structure (covered)
  • B-tree, Hashing (reading materials)
  • IR data modeling (logical data models)
  • Set theoretic models: Boolean, Fuzzy, Extended
    Boolean
  • Algebraic Model: Vector Space Model
  • Probabilistic Model: based on probabilistic
    indexing (VSM)
  • Performance Measures (Effectiveness: Recall vs.
    Precision)
  • Retrieval Performance Enhancement Techniques
    (just an introduction to relevance feedback based on
    machine learning techniques, e.g., Artificial
    Neural Networks)
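
As a preview of three of the listed topics (inverted file structure, the Boolean model, and recall vs. precision), here is a minimal Python sketch. The three-document collection and the relevance judgment are invented for illustration, and tokenization is reduced to whitespace splitting.

```python
from collections import defaultdict

# A tiny hypothetical collection: document id -> text.
docs = {
    1: "distributed information retrieval",
    2: "semantic web ontology",
    3: "information retrieval on the semantic web",
}


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Boolean model, AND query: intersect the posting sets of all terms."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


index = build_inverted_index(docs)
retrieved = boolean_and(index, ["information", "retrieval"])
relevant = {3}  # hypothetical relevance judgment for this query
precision, recall = precision_recall(retrieved, relevant)
```

Here the query retrieves documents 1 and 3. Only document 3 is judged relevant, so recall is perfect while precision is one half.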