Title: Semantic integration of traditional and web-based information sources
1. Semantic integration of traditional and web-based information sources
- Gergely Lukácsy, BUTE
- Péter Szeredi, BUTE
- Péter Krauth, IQSYS
- Attila Bodnár, IQSYS
2. What is a mashup?
- A mashup is a website or application that combines content from more than one source into an integrated experience.
- The etymology of this term possibly derives from its similar use in pop music.
- /Wikipedia/
3. Quotes on mashups
- Web mashups, and other Web 2.0 development (e.g. Ajax) are all facets of the same phenomenon: that information and presentation are being separated in ways that allow for novel forms of reuse.
- The mash-up is the offspring of an environment where application developers facilitate the creation of integrated, yet highly derivative application hybrids by third parties, something they do by providing rich public APIs to their user base.
4. What's so special about mashups?
- Content used in mashups is typically sourced from a third party via a public interface or API.
- Other methods of sourcing content for mashups include web feeds (e.g. RSS or Atom), web services and screen scraping.
- Some in the community believe that only cases where public interfaces are not used count as mashups.
- Many people are experimenting with mashups using Google, eBay, Amazon, Flickr, and Yahoo's APIs.
- Google has a mashup editor in beta.
Mashup: Application Integration à la Web 2.0
5. What are we going to speak about?
- SINTAGMA: Semantic INtegration Technology Applied in Grid-like, Model-driven Architectures
- R&D project
- Sponsored by the National Research and Development Program, 2005-2007
- Consortium
  - Coordinator: IQSYS
  - Developer organisations: IQSYS, BUTE, SZTAKI
  - User organisations: OSZK, MTI, ARECO/eBolt
6. Information Integration with Sintagma
[Diagram labels: Database A; Database B; Application A (web service); Application B (traditional); source types (RDBMS, XML, RDF); SINTAGMA (data access and transformation); External Application, e.g. mashup application (presentation and further processing).]
- Clearly separates the data access and transformation layers of integration from the presentation layer
- Uses a comprehensive metadata repository (Model Warehouse)
- The semantics of data represented in the repository map local and remote metadata to each other
- Data access and transformation are driven by the repository
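The idea of a repository that drives data access and transformation can be sketched in a few lines of Python. This is only an illustration under assumptions: the source names, the field names and the `MODEL_WAREHOUSE` dictionary are hypothetical and do not reflect SINTAGMA's actual repository format.

```python
# Hypothetical metadata repository: for each source, a mapping from
# local field names to the names used in the unified model.
MODEL_WAREHOUSE = {
    "database_a": {"prod_name": "name", "prod_price": "price"},
    "application_b": {"title": "name", "cost": "price"},
}

def to_unified(source, record):
    """Rename the fields of a local record according to the repository."""
    mapping = MODEL_WAREHOUSE[source]
    return {mapping[field]: value
            for field, value in record.items()
            if field in mapping}

def integrate(records_by_source):
    """Yield unified records on demand; nothing is copied into a warehouse."""
    for source, records in records_by_source.items():
        for record in records:
            yield to_unified(source, record)

# Hypothetical local data; fields unknown to the repository are ignored.
local_data = {
    "database_a": [{"prod_name": "toner", "prod_price": 30}],
    "application_b": [{"title": "monitor", "cost": 120, "internal_id": 7}],
}
unified = list(integrate(local_data))
print(unified)
```

A presentation layer such as a mashup would consume only the unified records, so changing a source means editing the mapping, not the consumer.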
7. Search and analysis of Web data
[Diagram labels: Search and analysis application (e.g. mashup); data service; metadata mapping (three occurrences); legend: SINTAGMA-node.]
8. Sintagma: an approach to information integration
- Key Principles
  - No duplication of data: Model Warehouse vs. Data Warehouse
  - Communication is one-way and on-line (no modification of data, instant access)
  - Integration of web services as information sources is supported (no modification required)
- Key Components
  - Manages various forms of metadata (Model Manager)
  - Accesses various structured and semi-structured information sources (Wrappers)
    - RDBMS
    - RDF
    - XML
    - Web Services
  - Preprocesses various unstructured information sources (Annotators)
    - Texts
    - Raster maps (labels and signs)
    - Excel tables
  - Optimises query execution: query planning using deduction (Mediator)
  - Data Quality Control
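The role of the wrappers can be illustrated with a minimal Python sketch in which two different source types are read through the same interface. The class names, the `rows()` method and the sample data are assumptions made for this example, not SINTAGMA's real wrapper API. Access is read-only, in line with the one-way communication principle.

```python
import sqlite3
import xml.etree.ElementTree as ET

class SqlWrapper:
    """Wraps a relational source; each row becomes a dict."""
    def __init__(self, connection, table):
        self.connection = connection
        self.table = table

    def rows(self):
        cursor = self.connection.execute(f"SELECT * FROM {self.table}")
        columns = [description[0] for description in cursor.description]
        for row in cursor:
            yield dict(zip(columns, row))

class XmlWrapper:
    """Wraps an XML source; each child of the root becomes a dict."""
    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def rows(self):
        for element in self.root:
            yield {child.tag: child.text for child in element}

# In-memory demo sources with hypothetical data.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE product (name TEXT, vendor TEXT)")
connection.execute("INSERT INTO product VALUES ('toner', 'A')")
xml_text = ("<products><product><name>monitor</name>"
            "<vendor>B</vendor></product></products>")

wrappers = [SqlWrapper(connection, "product"), XmlWrapper(xml_text)]
all_rows = [row for wrapper in wrappers for row in wrapper.rows()]
print(all_rows)
```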
9. Architecture of SINTAGMA
[Architecture diagram labels: Sintagma GUI; Model Manager / Model Warehouse; Model Manager (remote); Mediator (local); JDBC Wrapper, WS Wrapper, RDF Wrapper, WD Wrapper, XML Wrapper; sources RDBMS, Web Service, RDF, HTML, XML; Text Annotation subsystem (Text Annotator); Map Annotation subsystem (Map Annotator, Map Server); Data Quality Control subsystem (Data Quality Controller, DQ Engine (meta), DQ Engine (native), DQ log).]
10. Model Warehouse of SINTAGMA
[Diagram labels. Conceptual Level: Integrated Conceptual Model; common, clarified concepts; domain-specific knowledge/ontologies; conceptual views of workers in a business area; special concepts of business areas; domain-specific terminology. Application Level: Integrated Application Model; external model (e.g. BPM); local, unified and transformed models. Interface Level: local models. Source Level. Legend: model, mapping, input.]
11. Modelling in SINTAGMA
- The Model Warehouse
  - content of the Model Warehouse
  - interface models and abstractions
  - ontology concepts
- Use cases
  - Product comparison
  - Workflow of equipment purchase
  - Web service integration demos
12. Model Warehouse
- Content of the Model Warehouse
  - Object-oriented models
    - Structural properties of sources in UML Object Model
    - Non-structural information given as OCL constraints
    - Mapping between models as abstractions
  - Description Logic models
  - Queries: source and conceptual level
- Classification of models
  - interface
  - unified (application)
  - conceptual
- Modeling: SILan, the Semantic Integration Language
  - Describes content of Model Warehouse in textual format
  - Has well-defined semantics
13. Interface Models
14. Higher level models
- Abstractions (data transformations)
  - populate higher level entities
  - filter low level data (suppliers)
  - transform data to appropriate higher level form (clients)
  - can have multiple suppliers and clients
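As a rough illustration of what an abstraction does, the Python sketch below filters low level supplier data and transforms it into a higher level client entity. The entity names, fields and the filtering rule are hypothetical; in SINTAGMA such mappings are described in SILan, whose syntax is not shown on the slides.

```python
from dataclasses import dataclass

@dataclass
class CatalogueItem:      # low level (supplier) entity, hypothetical
    name: str
    price_cents: int
    in_stock: bool

@dataclass
class Product:            # higher level (client) entity, hypothetical
    name: str
    price: float

def abstraction(suppliers):
    """Populate Product instances from one or more CatalogueItem suppliers."""
    products = []
    for items in suppliers:
        for item in items:
            if not item.in_stock:          # filter low level data
                continue
            # transform to the form the higher level model expects
            products.append(Product(item.name.title(), item.price_cents / 100))
    return products

vendor_a = [CatalogueItem("laser printer", 19900, True),
            CatalogueItem("toner", 4500, False)]
vendor_b = [CatalogueItem("monitor", 12050, True)]
products = abstraction([vendor_a, vendor_b])
print(products)
```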
15. Higher level models (contd)
- Invariants
  - have to be satisfied by all the instances of a model element
  - can contain navigation
- Queries
  - can be formulated on any model
    - interface level models: directly accessing data sources
    - higher level models: using mediation
  - are interchangeable with abstractions
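An invariant with navigation can be pictured as a predicate that must hold for every instance and that may follow a reference to another entity. The sketch below uses plain Python predicates as a stand-in for OCL constraints; the `Order` and `Vendor` classes and both invariants are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Vendor:
    name: str
    country: str

@dataclass
class Order:
    quantity: int
    vendor: Vendor

# Hypothetical invariants: predicates over a single instance.
# The second one navigates from the order to its vendor.
INVARIANTS = {
    "positive quantity": lambda order: order.quantity > 0,
    "vendor has a country": lambda order: order.vendor.country != "",
}

def violations(instances):
    """Return (instance index, invariant name) pairs that fail."""
    return [(index, name)
            for index, instance in enumerate(instances)
            for name, check in INVARIANTS.items()
            if not check(instance)]

orders = [Order(2, Vendor("A", "HU")), Order(0, Vendor("B", ""))]
print(violations(orders))
```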
16. Conceptual Models
17. Conceptual models (contd)
- These models encapsulate concepts given in the Description Logic formalism
18. Use case 1: Product comparison
- Goal: find products that are similar to the products in a host system
- Information sources
  - catalogues from various vendors in Excel
  - database of the host system
- Problems to solve
  - heterogeneity of the catalogues: preprocessing
  - algorithm for product comparison
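The slides do not say which comparison algorithm is used. Purely as an illustration of the problem, the following Python sketch compares product descriptions by Jaccard similarity of their word sets; the threshold and the sample descriptions are assumptions.

```python
def tokens(description):
    """Lower-case word set of a product description."""
    return set(description.lower().split())

def similarity(a, b):
    """Jaccard similarity of two descriptions, between 0 and 1."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def similar_products(host_products, catalogue, threshold=0.5):
    """Pair each host product with catalogue entries at or above the threshold."""
    return [(host, entry)
            for host in host_products
            for entry in catalogue
            if similarity(host, entry) >= threshold]

# Hypothetical host database content and preprocessed catalogue entries.
host = ["HP LaserJet printer"]
catalogue = ["hp laserjet printer black", "Samsung monitor"]
matches = similar_products(host, catalogue)
print(matches)
```

Word-set overlap is cheap and ignores word order, which suits catalogue entries, but it cannot recognise synonyms or abbreviations; that is where preprocessing of the heterogeneous catalogues matters.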
19. Solution in SINTAGMA
[Diagram labels: Model Warehouse with the models Similar Products, Product comparison, Unified Products, Catalogue and Host Database; sources Excel (three catalogues), MySQL and XML; Preprocessing.]
20. Use case 2: Equipment purchase
21. Equipment purchase in an organisation
- Scenario
  - Each department maintains a wish-list of equipment
  - There are vendors who provide products to departments
  - Vendors sell different types of products (Vendor A sells printers and toners, Vendor B monitors and printers, etc.)
  - The financial department dynamically designates a preferred vendor for each product
  - Questions: is there any expensive order? what is the total? etc.
- Information Sources
  - Departments' wish-list
    - relational database with columns description, category, e.g. we have run out of paper, 15/18
  - Financial department
    - Web service, with an operation determining where to buy a given product, e.g. (15,8) -> (A4 paper, 4, 23)
  - Vendors
    - Heterogeneous web services which return prices, units and delivery date, e.g. 23 -> (12, 1, 2007-07-01)
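The chain of lookups in this scenario can be sketched in Python using the example values from the slide. The interpretation of the tuples is an assumption: `(15, 8)` is read as a category pair, `(A4 paper, 4, 23)` as (product, vendor id, product id) and `(12, 1, 2007-07-01)` as (price, unit, delivery date). The two functions are local stand-ins for the web services, and the "expensive" limit of 100 is invented.

```python
from datetime import date

# Wish-list rows: (description, category, subcategory); layout is assumed.
WISH_LIST = [("we have run out of paper", 15, 8)]

def preferred_vendor(category, subcategory):
    """Stand-in for the financial department's web service."""
    table = {(15, 8): ("A4 paper", 4, 23)}   # -> (product, vendor id, product id)
    return table[(category, subcategory)]

def vendor_offer(product_id):
    """Stand-in for a vendor web service."""
    table = {23: (12, 1, date(2007, 7, 1))}  # -> (price, unit, delivery date)
    return table[product_id]

def orders(wish_list):
    """Join the wish-list with the two services, one order per wish."""
    for description, category, subcategory in wish_list:
        product, vendor_id, product_id = preferred_vendor(category, subcategory)
        price, unit, delivery = vendor_offer(product_id)
        yield {"product": product, "vendor": vendor_id,
               "price": price, "unit": unit, "delivery": delivery}

all_orders = list(orders(WISH_LIST))
total = sum(order["price"] for order in all_orders)
expensive = [order for order in all_orders if order["price"] > 100]
print(total, expensive)
```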
22. Event Driven Process Chain
23. Solution in Sintagma
24. Use case 3: Web Service Integration
- Integrating Amazon and Barnes & Noble
- Integrating RSS sources (e.g. origo, nol, index, metro)
- Integrating World Championship results (2002 and 2006)
25. Integrating Amazon and Barnes & Noble
[Diagram labels. Conceptual Level: Price comparison; Availability under limit in HUF. Application Level: AmazonBN. Interface Level: Amazon, Barnes & Noble, currency. Source Level. Legend: model, mapping, query, input.]
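The price comparison on this slide combines two shop models with a currency model. A minimal Python sketch of that combination follows; the exchange rates, the ISBN values and the offer tuples are hypothetical, and no real Amazon or Barnes & Noble API is called.

```python
# Hypothetical exchange rates to HUF; real rates would come from a currency source.
RATES_TO_HUF = {"USD": 200.0, "EUR": 250.0}

def to_huf(amount, currency):
    return amount * RATES_TO_HUF[currency]

def unified_offers(amazon, barnes_noble):
    """Combine both interface level results into one model with HUF prices."""
    for shop, offers in (("Amazon", amazon), ("BarnesNoble", barnes_noble)):
        for isbn, amount, currency in offers:
            yield {"shop": shop, "isbn": isbn,
                   "price_huf": to_huf(amount, currency)}

def cheapest_under_limit(offers, limit_huf):
    """Cheapest offer per ISBN among those not exceeding the HUF limit."""
    best = {}
    for offer in offers:
        if offer["price_huf"] > limit_huf:
            continue
        current = best.get(offer["isbn"])
        if current is None or offer["price_huf"] < current["price_huf"]:
            best[offer["isbn"]] = offer
    return best

# Hypothetical offers: (isbn, amount, currency).
amazon = [("111", 20.0, "USD"), ("222", 60.0, "USD")]
barnes_noble = [("111", 15.0, "EUR"), ("222", 50.0, "USD")]
result = cheapest_under_limit(unified_offers(amazon, barnes_noble), limit_huf=5000)
print(result)
```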
26. Integrating results of World Championships
[Diagram labels. Conceptual Level: Positions; Team positions by year; Team matches; Team matches by year; First Four Teams; No. of matches played by teams (derivation, grouping). Application Level: Optimised WC matches; Teams in both WCs; Matches in both WCs; Unified WC matches; Matches of teams (transformation, combination). Interface Level: 2002 WC and 2006 WC, with the source formats "Score: n-m, Match Id: 0-63" and "Score1: n, Score2: m, Match Id: 1-64". Source Level. Legend: model, mapping, query, input.]
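The two source formats named on this slide differ in how the score and the match id are encoded. The Python sketch below shows one way such a unifying transformation could look. It assumes that the "n-m" score with ids 0-63 belongs to the 2002 source and the split scores with ids 1-64 to the 2006 source, which is inferred from label order only; the field names are hypothetical.

```python
def from_2002(row):
    """Assumed 2002 format: score as 'n-m', match ids 0-63."""
    score1, score2 = (int(part) for part in row["score"].split("-"))
    return {"year": 2002, "match_id": row["match_id"] + 1,  # shift to 1-64
            "team1": row["team1"], "team2": row["team2"],
            "score1": score1, "score2": score2}

def from_2006(row):
    """Assumed 2006 format: separate scores, match ids 1-64."""
    return {"year": 2006, "match_id": row["match_id"],
            "team1": row["team1"], "team2": row["team2"],
            "score1": row["score1"], "score2": row["score2"]}

def matches_played(matches):
    """Group the unified matches: number of matches per team."""
    counts = {}
    for match in matches:
        for team in (match["team1"], match["team2"]):
            counts[team] = counts.get(team, 0) + 1
    return counts

# One sample row per source (the two finals).
wc_2002 = [{"match_id": 63, "team1": "Germany", "team2": "Brazil", "score": "0-2"}]
wc_2006 = [{"match_id": 64, "team1": "Italy", "team2": "France",
            "score1": 1, "score2": 1}]
unified = [from_2002(r) for r in wc_2002] + [from_2006(r) for r in wc_2006]
print(unified)
print(matches_played(unified))
```

Once both sources share one shape, the higher level models (matches per team, teams in both championships) become simple groupings over the unified list.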
27. Integrating RSS feeds
[Diagram labels. Conceptual Level: Search for high level concepts (e.g. political conflicts), with Text Annotator, opposition, government, VIP, "members of"; Search for occurrences of a specific word (e.g. budapest). Application Level: Unified RSS feeds (combination). Interface Level: nol, metro, index, origo. Source Level. Legend: model, mapping, query, input.]
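The lower half of this slide, combining several feeds into one model and searching it for a word, can be sketched with Python's standard XML parser. The feed contents below are made up and embedded as strings; real feeds would be fetched over HTTP, and the concept level search with the Text Annotator is not covered here.

```python
import xml.etree.ElementTree as ET

def items(feed_name, rss_text):
    """Yield the items of one RSS 2.0 document as unified records."""
    channel = ET.fromstring(rss_text).find("channel")
    for item in channel.findall("item"):
        yield {"source": feed_name,
               "title": item.findtext("title", default=""),
               "description": item.findtext("description", default="")}

def search(records, word):
    """Records whose title or description contains the word, ignoring case."""
    word = word.lower()
    return [record for record in records
            if word in record["title"].lower()
            or word in record["description"].lower()]

# Hypothetical feed contents for two of the sources named on the slide.
FEEDS = {
    "index": "<rss version='2.0'><channel><title>index</title>"
             "<item><title>Budapest traffic</title>"
             "<description>Roadworks</description></item>"
             "</channel></rss>",
    "origo": "<rss version='2.0'><channel><title>origo</title>"
             "<item><title>Weather</title>"
             "<description>Rain in Szeged</description></item>"
             "</channel></rss>",
}
unified = [record for name, text in FEEDS.items() for record in items(name, text)]
hits = search(unified, "budapest")
print(hits)
```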
28. Summary
- The system
  - is a semantic information integration tool
  - handles various structured (relational) and semi-structured sources and web services
  - preprocesses various unstructured sources: texts, maps, tables
  - uses logic / constraint logic programming
- can be used in mashup creation
  - disciplined and flexible approach to data access in mashups
  - separates data integration from mashup presentation logic
  - resolves semantic and technical differences in sources
29. Real estate search - Trulia
- A real estate search engine that helps you find homes for sale and provides real estate information at the local level to help you make better decisions in the process. Trulia pulls in real estate data from partnerships with thousands of brokers and agents and displays it on a Google Maps interface.
- Trulia shows you how sales prices have been trending where it matters: in your county, city, ZIP code and neighborhood. They also offer heat maps and real estate guides.
- http://www.trulia.com/start
30. Hotel Guide - Trivop
- The self-proclaimed first videoguide for hotels doesn't disappoint. Locate hotels on this Google Maps hotel mashup and view user-created videos of the hotels. This gives a much better view of a prospective hotel before visiting.
- Currently it looks like they only have hotels in England and France, but with their recruiting efforts one can only assume Trivop will be coming to a region near you.
- http://www.trivop.com
31. Visual Music search - Music Map
- Visual music search application mashed with Amazon data. Choose an artist and album, see related artists in an abstract tree graph. Wicked.
- http://www.dimvision.com/musicmap/
32. Search for Popular Music - Hype Machine
- The Hype Machine follows music blog discussion. Every day, hundreds of people around the world write about music they love.
- The Hype Machine tracks a variety of MP3 blogs. If a post contains MP3 links, it adds those links to its database and displays them on the front page.
- Some of the frequently accessed tracks are cached by the Hype Machine server, much like Google Search caches web pages, to reduce load on the bloggers' servers and protect their bandwidth. Those tracks are NOT available for download, but you can preview them via the "listen" links that are next to each track or using your media player.
- The blog that posted a particular track is identified under every track by name and with a "read post" link that leads to the blog post itself. If you enjoyed a track someone posted, stop by and let them know!
- You can purchase CDs and individual tracks by using the "amazon" and "itunes" links that appear next to most tracks. Each purchase you make via the Amazon and iTunes links supports both the artists and the Hype Machine. Please buy and enjoy.
- http://hypem.com/