Semantic integration of traditional and web-based information sources


1
Semantic integration of traditional and web-based
information sources    
  • Gergely Lukácsy, BUTE
  • Péter Szeredi, BUTE
  • Péter Krauth, IQSYS
  • Attila Bodnár, IQSYS

2
What is a mashup?
  • A mashup is a website or application that
    combines content from more than one source into
    an integrated experience.
  • The etymology of this term possibly derives from
    its similar use in pop music.
  • /Wikipedia/

3
Quotes on mashups
  • Web mashups, and other Web 2.0 development (e.g.
    Ajax) are all facets of the same phenomenon that
    information and presentation are being separated
    in ways that allow for novel forms of reuse.
  • The mash-up is the offspring of an environment
    where application developers facilitate the
    creation of integrated, yet highly derivative
    application hybrids by third parties, something
    they do by providing rich public APIs to their
    user base.

4
What's so special about mashups?
  • Content used in mashups is typically sourced from
    a third party via a public interface or API.
  • Other methods of sourcing content for mashups
    include web feeds (e.g. RSS or Atom), web
    services and screen scraping.
  • Some in the community believe that only cases
    where public interfaces are not used count as
    mashups.
  • Many people are experimenting with mashups using
    Google, eBay, Amazon, Flickr, and Yahoo's APIs.
  • Google has a mashup editor in beta.

Mashup: Application Integration à la Web 2.0
5
What are we going to speak about?
  • SINTAGMA: Semantic INtegration Technology Applied
    in Grid-like, Model-driven Architectures
  • R&D project
  • Sponsored by the National Research and
    Development Program, 2005-2007
  • Consortium
  • Coordinator: IQSYS
  • Developer organisations: IQSYS, BUTE, SZTAKI
  • User organisations: OSZK, MTI, ARECO/eBolt

6
Information Integration with Sintagma
[Figure: layered diagram. Elements shown: Database A,
Database B, Application A (web service), Application B
(traditional); source formats RDBMS, XML, RDF; SINTAGMA
in the "Data access and transformation" layer; External
Application (e.g. mashup application) in the
"Presentation and further processing" layer.]
  • Separates clearly the data access and
    transformation layers of integration from the
    presentation layer
  • Uses a comprehensive metadata repository (Model
    Warehouse)
  • Semantics of data represented in the repository
    maps local and remote metadata to each other
  • Data access and transformation driven by the
    repository

7
Search and analysis of Web data
[Figure: a search and analysis application (e.g. mashup)
uses a data service provided by SINTAGMA nodes, which
are linked to each other by metadata mappings. Legend:
SINTAGMA node.]
8
Sintagma: an approach to information integration
  • Key Principles
  • No duplication of data: Model Warehouse vs. Data
    Warehouse
  • Communication: one-way, on-line (no modification
    of data, instant access)
  • Integration of web services as information
    sources supported (no modification required)
  • Key Components
  • Manages various forms of metadata (Model Manager)
  • Accesses various structured and semi-structured
    information sources (Wrappers)
  • RDBMS
  • RDF
  • XML
  • Web Services
  • Preprocesses various unstructured information
    sources (Annotators)
  • Texts
  • Raster maps (labels and signs)
  • Excel tables
  • Optimises query execution: query planning using
    deduction (Mediator)
  • Data Quality Control
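
The wrapper and mediator roles listed above can be illustrated with a minimal Python sketch. All names here (`Wrapper`, `ListWrapper`, `mediate`) are hypothetical and the sources are in-memory stand-ins; the slides do not show SINTAGMA's actual interfaces, and no query planning by deduction is attempted.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


class Wrapper:
    """Hypothetical read-only access to one information source."""

    def rows(self) -> Iterator[Record]:
        raise NotImplementedError


class ListWrapper(Wrapper):
    """Stand-in for an RDBMS, XML, RDF or web service wrapper."""

    def __init__(self, rows: Iterable[Record]) -> None:
        self._rows = list(rows)

    def rows(self) -> Iterator[Record]:
        # Yield copies so that callers cannot modify the source (one-way access).
        for row in self._rows:
            yield dict(row)


def mediate(wrappers: List[Wrapper],
            predicate: Callable[[Record], bool]) -> List[Record]:
    """Answer a query on demand from the sources; nothing is stored centrally."""
    return [row for w in wrappers for row in w.rows() if predicate(row)]


db_a = ListWrapper([{"name": "printer", "price": 120}])
ws_b = ListWrapper([{"name": "toner", "price": 40},
                    {"name": "monitor", "price": 200}])
expensive = mediate([db_a, ws_b], lambda r: r["price"] > 100)
```

The data stays in the sources and is read only when a query arrives, which is the "no duplication of data" principle in miniature.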

9
Architecture of SINTAGMA
[Figure: SINTAGMA architecture. Components shown:
Sintagma GUI; Model Manager / Model Warehouse; Mediator
(local); Model Manager (remote).
Wrappers: JDBC Wrapper, WS Wrapper, RDF Wrapper, WD
Wrapper, XML Wrapper. Sources: RDBMS, Web Service, RDF,
HTML, XML.
Text Annotation subsystem: Text Annotator.
Map Annotation subsystem: Map Annotator, Map Server.
Data Quality Control subsystem: Data Quality Controller,
DQ Engine (meta), DQ Engine (native), DQ log.]
10
Model Warehouse of SINTAGMA
[Figure: levels of the Model Warehouse.
Conceptual Level: Integrated Conceptual Model (common,
clarified concepts; domain specific knowledge/ontologies;
conceptual views of workers in a business area; special
concepts of business areas; domain specific terminology).
Application Level: Integrated Application Model (unified,
transformed and local models); external model (e.g. BPM).
Interface Level: local models.
Source Level.
Legend: model, mapping, input.]
11
Modelling in SINTAGMA
  • The Model Warehouse
  • content of the Model Warehouse
  • interface models and abstractions
  • ontology concepts
  • Use cases
  • Product comparison
  • Workflow of Equipment purchase
  • Web service integration demos

12
Model Warehouse
  • Content of the Model Warehouse
  • Object-oriented models
  • Structural properties of sources in UML Object
    Model
  • Non-structural information given as OCL
    Constraints
  • Mapping between models as abstractions
  • Description Logic models
  • Queries: source and conceptual level
  • Classification of models
  • interface
  • unified (application)
  • conceptual
  • Modelling: SILan (Semantic Integration Language)
  • Describes content of Model Warehouse in textual
    format
  • Has well-defined semantics

13
Interface Models
14
Higher level models
  • Abstractions (data transformations)
  • populate higher level entities
  • Filter low level data (suppliers)
  • Transform data to appropriate higher level form
    (clients)
  • can have multiple suppliers and clients
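
As a rough illustration of an abstraction, the Python sketch below filters low level (supplier) entities and transforms the remaining ones into a higher level (client) entity. The classes `CatalogueItem` and `Product` and the function `abstraction` are invented for this example and are not SILan constructs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CatalogueItem:
    """Hypothetical low level (supplier) entity."""
    name: str
    price_cents: int
    in_stock: bool


@dataclass(frozen=True)
class Product:
    """Hypothetical higher level (client) entity."""
    title: str
    price: float


def abstraction(suppliers: Iterable[CatalogueItem],
                keep: Callable[[CatalogueItem], bool],
                transform: Callable[[CatalogueItem], Product]) -> List[Product]:
    """Populate the higher level entity: filter first, then transform."""
    return [transform(s) for s in suppliers if keep(s)]


items = [CatalogueItem("a4 paper", 450, True),
         CatalogueItem("toner", 3999, False)]
products = abstraction(
    items,
    keep=lambda i: i.in_stock,
    transform=lambda i: Product(i.name.title(), i.price_cents / 100),
)
```

An abstraction with several suppliers could be sketched the same way by chaining the supplier collections before filtering.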

15
Higher level models (cont'd)
  • Invariants
  • have to be satisfied by all the instances of a
    model element
  • can contain navigation
  • Queries
  • can be formulated on any model
  • interface level models: directly accessing data
    sources
  • higher level models: using mediation
  • are interchangeable with abstractions
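
The idea of an invariant that every instance must satisfy, including one that navigates an association, might look as follows in Python. In the Model Warehouse such conditions are given as OCL constraints; the `Order` and `Department` classes and both invariants are assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Department:
    name: str


@dataclass
class Order:
    amount: int
    department: Department  # association that an invariant can navigate


Invariant = Callable[[Order], bool]

INVARIANTS: Dict[str, Invariant] = {
    "positive_amount": lambda o: o.amount > 0,
    # Navigation: the check follows the link from the order to its department.
    "department_named": lambda o: o.department.name != "",
}


def violations(instances: List[Order],
               invariants: Dict[str, Invariant]) -> List[Tuple[int, str]]:
    """Return (instance index, invariant name) for every failed check."""
    return [(index, name)
            for index, instance in enumerate(instances)
            for name, check in invariants.items()
            if not check(instance)]


orders = [Order(3, Department("IT")), Order(0, Department(""))]
broken = violations(orders, INVARIANTS)
```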

16
Conceptual Models
17
Conceptual models (cont'd)
  • These models encapsulate concepts given in
    Description Logic formalism

18
Use case 1: Product comparison
  • Goal: find products that are similar to the
    products in a host system
  • Information sources
  • catalogues from various vendors in Excel
  • database of the host system
  • Problems to solve
  • heterogeneity of the catalogues: preprocessing
  • algorithm for product comparison
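
The slides do not say which comparison algorithm was used. As a stand-in only, the Python sketch below compares normalised product names with the standard library's `difflib.SequenceMatcher`; the function names, the threshold of 0.8 and the sample product names are all assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def normalise(name: str) -> str:
    """Very small preprocessing step: lower-case and collapse whitespace."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a placeholder for a real product matcher."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def similar_products(host: List[str], catalogue: List[str],
                     threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Pair host products with catalogue products that score above threshold."""
    pairs = [(h, c, similarity(h, c)) for h in host for c in catalogue]
    return sorted((p for p in pairs if p[2] >= threshold),
                  key=lambda p: -p[2])


host = ["HP LaserJet 1020"]
catalogue = ["hp  laserjet 1020", "Samsung SyncMaster 940"]
matches = similar_products(host, catalogue)
```

A real matcher would probably also compare attributes such as category or price, which is where the preprocessing of heterogeneous catalogues matters.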

19
Solution in SINTAGMA
[Figure: product comparison in the Model Warehouse.
Elements shown: sources Excel (three catalogues), XML
and MySQL; a Preprocessing step; models Catalogue, Host
Database, Unified Products, Product comparison and
Similar Products.]
20
Use case 2: Equipment purchase
21
Equipment purchase in an organisation
  • Scenario
  • Each department maintains a wish-list of
    equipment
  • There are vendors who provide products to
    departments
  • Vendors sell different types of products (Vendor
    A sells printers and toners, Vendor B monitors
    and printers, etc.)
  • The financial department dynamically designates a
    preferred vendor for each product
  • Questions: is there any expensive order? What is
    the total? etc.
  • Information Sources
  • Departments' wish-list
  • relational database with columns description,
    category, e.g. "we have run out of paper",
    15/18
  • Financial department
  • Web service with an operation determining where to
    buy a given product, e.g. (15,8) -> (A4 paper,
    4, 23)
  • Vendors
  • Heterogeneous web services which return prices,
    units and delivery date, e.g. 23 -> (12, 1,
    2007-07-01)
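
The chain of sources in this scenario can be sketched in Python with stubbed services that return the example values from the slide. How the tuples are to be read is an assumption: the financial department's answer is taken as (product, vendor id, product id), the vendor's answer as (price, units, delivery date), and the wish-list category is written as (15, 8) to match the financial department's example. The function names and the "expensive" limit of 100 are invented.

```python
from datetime import date
from typing import Dict, List, Tuple

Category = Tuple[int, int]

# Departments' wish-list: (description, category), as in the slide's example.
WISH_LIST: List[Tuple[str, Category]] = [("we have run out of paper", (15, 8))]


def where_to_buy(category: Category) -> Tuple[str, int, int]:
    """Stub for the financial department's web service (assumed field order)."""
    return {(15, 8): ("A4 paper", 4, 23)}[category]


def vendor_quote(vendor_id: int, product_id: int) -> Tuple[int, int, date]:
    """Stub for a vendor web service: (price, units, delivery date)."""
    return {(4, 23): (12, 1, date(2007, 7, 1))}[(vendor_id, product_id)]


def plan_orders(wish_list: List[Tuple[str, Category]]) -> List[Dict[str, object]]:
    """Join the three sources: wish-list -> preferred vendor -> vendor quote."""
    planned = []
    for description, category in wish_list:
        product, vendor_id, product_id = where_to_buy(category)
        price, units, delivery = vendor_quote(vendor_id, product_id)
        planned.append({"wish": description, "product": product,
                        "vendor": vendor_id, "price": price,
                        "units": units, "delivery": delivery})
    return planned


planned = plan_orders(WISH_LIST)
total = sum(order["price"] for order in planned)               # "what is the total?"
expensive = [order for order in planned if order["price"] > 100]  # assumed limit
```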

22
Event Driven Process Chain
23
Solution in Sintagma
24
Use case 3: Web Service Integration
  • Integrating Amazon and Barnes & Noble
  • Integrating RSS-sources (e.g. origo, nol, index,
    metro)
  • Integrating World Championship Results (2002 and
    2006)

25
Integrating Amazon and Barnes & Noble
[Figure: model levels for the bookshop integration.
Conceptual Level: Price comparison; Availability under
limit in HUF. Application Level: AmazonBN. Interface
Level: Amazon, BarnesNoble, currency. Source Level.
Legend: model, mapping, query input.]
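
A possible reading of the price comparison and "availability under limit in HUF" models is sketched below in Python. The offers, the exchange rate and all names are illustrative assumptions; no real shop API is called.

```python
from typing import Dict, List, NamedTuple


class Offer(NamedTuple):
    shop: str
    isbn: str
    price: float
    currency: str


# Illustrative rate only; a real system would query a currency source.
HUF_PER_UNIT: Dict[str, float] = {"USD": 200.0, "HUF": 1.0}


def price_in_huf(offer: Offer) -> float:
    return offer.price * HUF_PER_UNIT[offer.currency]


def cheapest_under_limit(offers: List[Offer], limit_huf: float) -> Dict[str, Offer]:
    """Per book, keep the cheapest offer whose HUF price is within the limit."""
    best: Dict[str, Offer] = {}
    for offer in offers:
        if price_in_huf(offer) > limit_huf:
            continue
        current = best.get(offer.isbn)
        if current is None or price_in_huf(offer) < price_in_huf(current):
            best[offer.isbn] = offer
    return best


offers = [
    Offer("Amazon", "isbn-1", 20.0, "USD"),
    Offer("BarnesNoble", "isbn-1", 18.0, "USD"),
    Offer("Amazon", "isbn-2", 60.0, "USD"),
]
best = cheapest_under_limit(offers, limit_huf=5000.0)
```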
26
Integrating results of World Championships
[Figure: model levels for the World Championship
results. Sources: 2002 WC and 2006 WC, with the source
formats "Score n-m, Match Id 0-63" and "Score1 n,
Score2 m, Match Id 1-64".
Models shown: Matches of teams; Unified WC matches;
Optimised WC matches; Matches in both WCs; Teams in both
WCs; No of matches played by teams; Team matches; Team
matches by year; Positions; Team positions by year;
First Four Teams.
Mapping labels: combination, transformation, grouping,
derivation.
Levels: Source, Interface, Application, Conceptual.
Legend: model, mapping, query input.]
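
The two source formats differ in how scores and match ids are encoded, so a unified model needs a small transformation per source. The Python sketch below assumes that the "n-m" score string and the 0-63 ids belong to the 2002 source and the separate score fields and 1-64 ids to the 2006 source; the field names, the 1-based unified id and the placeholder teams are also assumptions.

```python
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    year: int
    match_id: int  # unified to the range 1..64
    home: str
    away: str
    score1: int
    score2: int


def from_2002(row: Dict[str, object]) -> Match:
    """Split the 'n-m' score string and shift the 0-based match id."""
    n, m = str(row["score"]).split("-")
    return Match(2002, int(row["match_id"]) + 1,
                 str(row["home"]), str(row["away"]), int(n), int(m))


def from_2006(row: Dict[str, object]) -> Match:
    """This source already has separate scores and 1-based ids."""
    return Match(2006, int(row["match_id"]), str(row["home"]),
                 str(row["away"]), int(row["score1"]), int(row["score2"]))


def matches_played(matches: List[Match]) -> Dict[str, int]:
    """Grouping: number of matches played by each team."""
    counts: Dict[str, int] = {}
    for match in matches:
        for team in (match.home, match.away):
            counts[team] = counts.get(team, 0) + 1
    return counts


rows_2002 = [{"match_id": 0, "home": "Team A", "away": "Team B", "score": "2-1"}]
rows_2006 = [{"match_id": 1, "home": "Team A", "away": "Team C",
              "score1": 0, "score2": 0}]
unified = [from_2002(r) for r in rows_2002] + [from_2006(r) for r in rows_2006]
played = matches_played(unified)
```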
27
Integrating RSS-feeds
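[Figure: model levels for the RSS integration.
Sources: nol, metro, index, origo, VIP.
Application Level: Unified RSS-feeds (combination of the
interface level models).
Conceptual Level: search for occurrences of a specific
word (e.g. budapest); search for high level concepts
(e.g. political conflicts) using the TextAnnotator, with
concepts such as opposition and government ("members
of").
Levels: Source, Interface, Application, Conceptual.
Legend: model, mapping, query input.]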
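
Combining several feeds and searching them for a word can be sketched in Python with the standard library's `xml.etree.ElementTree`. The feed contents below are invented inline samples (only the source names come from the slide), and the search for high level concepts via the TextAnnotator is not covered.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def items(source: str, rss_xml: str) -> List[Dict[str, str]]:
    """Interface level: read the <item> titles of one RSS document."""
    root = ET.fromstring(rss_xml)
    return [{"source": source, "title": item.findtext("title", default="")}
            for item in root.iter("item")]


def search(feed_items: List[Dict[str, str]], word: str) -> List[Dict[str, str]]:
    """Case-insensitive search for a word in the unified items."""
    needle = word.lower()
    return [i for i in feed_items if needle in i["title"].lower()]


# Invented sample documents; real feeds would be fetched from the sources.
FEEDS: Dict[str, str] = {
    "origo": "<rss><channel><item><title>Budapest traffic news</title>"
             "</item></channel></rss>",
    "index": "<rss><channel><item><title>Weather report</title>"
             "</item></channel></rss>",
}

# Application level: one unified list built from all feeds.
unified = [i for name, xml in FEEDS.items() for i in items(name, xml)]
hits = search(unified, "budapest")
```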
28
Summary
  • The system
  • is a semantic information integration tool
  • handles various structured (relational) and
    semi-structured sources, and web services
  • preprocesses various unstructured sources:
    texts, maps, tables
  • uses logic / constraint logic programming
  • can be used in mashup creation
  • disciplined and flexible approach to data access
    in mashups
  • separates data integration from mashup
    presentation logic
  • resolves semantic and technical differences in
    sources

29
Real estate search - Trulia
  • A real estate search engine that helps you find
    homes for sale and provides real estate
    information at the local level to help you make
    better decisions in the process. Trulia pulls in
    real estate data from partnerships with thousands
    of brokers and agents and displays it on a Google
    Maps interface.
  • Trulia shows you how sales prices have been
    trending where it matters: in your county, city,
    ZIP code and neighborhood. They also offer heat
    maps and real estate guides.
  • http://www.trulia.com/start

30
Hotel Guide - Trivop
  • The self-proclaimed first videoguide for hotels
    doesn't disappoint. Locate hotels on this Google
    Maps hotel mashup and view user-created videos
    of the hotels. This gives a much better view of a
    prospective hotel before visiting.
  • Currently it looks like they only have hotels in
    England and France, but with their recruiting
    efforts one can only assume Trivop will be coming
    to a region near you.
  • http://www.trivop.com

31
Visual Music Search: Music Map
  • Visual music search application mashed with
    Amazon data. Choose an artist and album, see
    related artists in an abstract tree graph.
    Wicked.
  • http://www.dimvision.com/musicmap/

32
Search for Popular Music: Hype Machine
  • The Hype Machine follows music blog discussion.
    Every day, hundreds of people around the world
    write about music they love.
  • The Hype Machine tracks a variety of MP3 blogs.
    If a post contains MP3 links, it adds those links
    to its database and displays them on the front
    page.
  • Some of the frequently accessed tracks are cached
    by the Hype Machine server, much like Google
    Search caches web pages, to reduce load on the
    bloggers' servers and protect their bandwidth.
    Those tracks are NOT available for download, but
    you can preview them via the "listen" links that
    are next to each track or using your media
    player.
  • The blog that posted a particular track is
    identified under every track by name and with a
    "read post" link that leads to the blog post
    itself. If you enjoyed a track someone posted,
    stop by and let them know!
  • You can purchase CDs and individual tracks by
    using the "amazon" and "itunes" links that appear
    next to most tracks. Each purchase you make via
    the Amazon and iTunes links supports both the
    artists and the Hype Machine. Please buy and
    enjoy.
  • http://hypem.com/