MatchIT 1.1: Data Integration with Semantic Mapping Technologies

Provided by: csN4
Learn more at: https://cs.nyu.edu

1
MatchIT 1.1: Data Integration with Semantic Mapping Technologies
  • Michael Schidlowsky
  • Sr. Software Architect

2
Data Integration
  • Motivated by
  • Organizational Changes
  • Mergers and Acquisitions
  • Internal reorganizations (e.g., DHS)
  • Data Mining
  • Standards Conformance
  • Migration Efforts
  • Legacy Systems
  • Decouple data sources from application code

3
Data Integration
  • Challenges for the integration specialist include:
  • Domain-specific terms
  • Unfamiliarity with source schemas
  • Large size of schema set
  • Semantics often not captured
  • Captured semantics
  • Stored in ad-hoc formats
  • Cannot be reused to facilitate future data
    integration efforts

4
Data Integration Example
  • Background
      • Acme Inc. merges with CompuGlobalHyperMeganet.
  • Technical Challenge
      • Need a virtual database of all sales for all
        stores in real time.
  • Which fields represent customers?
      • CUSTOMERID
      • CUST_ID
      • SSN
  • Which fields represent price?
      • Sale_Amt
      • Total_Sale
  • What if your database has 10,000 columns?

5
Data Integration Example
  • Background
      • HR needs to use employee information for a new
        company portal.
  • Technical Challenge
      • Data must be in XML and conform to a standard HR
        schema.
  • Which fields are related to Address?
      • RESIDENCE
      • PREV_RESIDENCE
  • What if your database has 10,000 columns?

6
Ideal Matching Solution
  • Finds lexical relationships
  • Captures semantic information
  • Finds semantic relationships
  • Provides programmatic access to results (API)
  • Fast
  • Scalable
  • Human Involvement

7
MatchIT Philosophy
  • Best Matching tool already exists!
  • What is meant by ID?

8
MatchIT Philosophy
  • Best Matching tool already exists!
  • What is meant by ID?
  • PLEASE PRESENT ID

9
MatchIT Philosophy
  • Best Matching tool already exists!
  • What is meant by ID?
  • PLEASE PRESENT ID
  • NY, NJ, ID

10
MatchIT Philosophy
  • Best Matching tool already exists!
  • What is meant by ID?
  • PLEASE PRESENT ID
  • NY, NJ, ID
  • SUPEREGO, EGO, ID

11
MatchIT 1.1
  • MatchIT is a semantic and lexical matching
    tool.
  • Session Outline
  • Import and process schemas
  • Perform lexical matching
  • Create and manage a semantic vocabulary
  • Perform semantic matching
  • Demonstrate 3rd-party integration with a data
    integration tool (MetaMatrix)

12
Import and Process Schemas
  • Revelytix Models are RDF/OWL
  • Flexible model architecture
  • Extensible
  • Interoperable
  • Current Importers
  • JDBC
  • XML Schema
  • MetaMatrix XMI Models
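As a rough illustration of what a relational schema importer does, the sketch below reads table and column names from a database catalog. It uses Python's built-in sqlite3 module as a stand-in for a JDBC connection. The `import_schema` helper and the sample tables are hypothetical and are not part of MatchIT; the slides do not show how the real importers work.

```python
import sqlite3


def import_schema(conn):
    """Return {table_name: [column_name, ...]} read from the database catalog."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [col[1] for col in columns]
    return schema


# Hypothetical sample tables, loosely based on the field names in the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acme_sales (CUSTOMERID INTEGER, Sale_Amt REAL)")
conn.execute("CREATE TABLE cghm_sales (CUST_ID INTEGER, Total_Sale REAL)")

schema = import_schema(conn)
for table, columns in schema.items():
    print(table, columns)
```

The resulting table-to-columns mapping is the kind of raw material the later matching steps would operate on.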

Importer Demo

13
Lexical Matching
  • Uses lexical distance measures to determine
    lexical similarity.
  • Fastest matching technique
  • Requires no work other than importing schemas
  • Often yields interesting results
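The slides do not say which lexical distance measure MatchIT uses. As one possible illustration, the sketch below uses Levenshtein edit distance, scaled to a similarity between 0 and 1, to rank candidate fields against a source field. The function names are assumptions made for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def lexical_similarity(a, b):
    """Scale edit distance to [0, 1]; 1.0 means identical ignoring case."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def rank_matches(source, candidates):
    """Return (score, candidate) pairs sorted from most to least similar."""
    scored = [(lexical_similarity(source, c), c) for c in candidates]
    return sorted(scored, reverse=True)


candidates = ["CUST_ID", "SSN", "Sale_Amt", "Total_Sale"]
for score, name in rank_matches("CUSTOMERID", candidates):
    print(f"{name:12s} {score:.2f}")
```

With these inputs CUST_ID ranks first for CUSTOMERID, while SSN scores low even though it may also identify a customer, which is the gap semantic matching is meant to close.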

Lexical Matching Demo

14
Create Vocabulary from Schemas
  • A Vocabulary is
  • A set of symbols
  • Occurrences of those symbols in your schemas
  • Binding of each symbol to one or more semantic
    concepts
  • Created by MatchIT from schemas using
    tokenization algorithms.
  • Reusable
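The three parts of a vocabulary listed above (symbols, their occurrences, and their concept bindings) could be modeled roughly as follows. This is a minimal sketch: the `Symbol` and `Vocabulary` classes, the element names, and the concept labels are all hypothetical, and MatchIT's actual RDF/OWL representation is not shown in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    text: str
    occurrences: set = field(default_factory=set)  # schema elements containing the symbol
    concepts: set = field(default_factory=set)     # semantic concepts bound to the symbol


class Vocabulary:
    def __init__(self):
        self.symbols = {}

    def add_occurrence(self, text, schema_element):
        """Record that a symbol was found in a given schema element."""
        key = text.lower()
        symbol = self.symbols.setdefault(key, Symbol(key))
        symbol.occurrences.add(schema_element)
        return symbol

    def bind(self, text, concept):
        """Bind an already-known symbol to one more semantic concept."""
        self.symbols[text.lower()].concepts.add(concept)


vocab = Vocabulary()
vocab.add_occurrence("ID", "acme_sales.CUSTOMERID")
vocab.add_occurrence("ID", "cghm_sales.CUST_ID")
# One symbol may carry several meanings, as in the "What is meant by ID?" slides.
vocab.bind("ID", "identifier")
vocab.bind("ID", "Idaho")

for symbol in vocab.symbols.values():
    print(symbol.text, sorted(symbol.occurrences), sorted(symbol.concepts))
```

Because bindings live in the vocabulary rather than in any one schema, the same object can be carried over to a later integration effort, which is what makes it reusable.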

15
Tokenization Algorithms
  • Different schemas require different tokenization
    techniques.
  • Tokenization algorithms determine how symbols are
    extracted from schemas
  • Capitalization
  • Delimiters
  • English Language
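The three tokenization strategies named above can be sketched as small composable functions. This is an illustrative guess at how such tokenizers might behave, not MatchIT's implementation; the function names and the tiny `WORDS` list are assumptions.

```python
import re

# Hypothetical word list; a real tokenizer would need a much larger dictionary.
WORDS = {"customer", "id", "prev", "residence", "sale", "total"}


def split_delimiters(name):
    """Split on underscores, hyphens, dots and whitespace."""
    return [part for part in re.split(r"[_\-.\s]+", name) if part]


def split_capitalization(part):
    """Split camelCase / PascalCase, keeping acronym runs together."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)


def split_english(part, words=WORDS):
    """Greedy longest-match segmentation against a word list."""
    text, tokens, start = part.lower(), [], 0
    while start < len(text):
        for end in range(len(text), start, -1):
            if text[start:end] in words:
                tokens.append(text[start:end])
                start = end
                break
        else:
            return [text]  # no full segmentation found: keep the part whole
    return tokens


def tokenize(name):
    """Apply delimiter, capitalization and English-word splitting in turn."""
    tokens = []
    for part in split_delimiters(name):
        for piece in split_capitalization(part):
            tokens.extend(split_english(piece))
    return tokens


for field_name in ["CUSTOMERID", "CUST_ID", "GenderCode", "PREV_RESIDENCE"]:
    print(field_name, tokenize(field_name))
```

An all-caps name such as CUSTOMERID has no delimiters or case changes, so only the word-list step can separate it, which is one reason different schemas call for different techniques.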

Vocabulary Demo

15
Matching Techniques
  • MatchIT currently uses two types of matching
    techniques:
  • Lexical Matching
      • Attempts to determine the similarity of two symbols
        based on the lexical distance between them.
  • Semantic Matching
      • Attempts to determine the similarity of two symbols
        based on the ontological distance between their
        concepts within a semantic knowledge base.

17
Parts Supplier Schema (as seen by a person)

18
Parts Supplier Schema (as seen by a computer)

19
Semantic Matching
  • How semantically similar are two concepts?

20
Semantic Matching
  • Uses knowledge base distance measures to
    determine semantic similarity.
  • Presents ranked candidate matches
  • Based on semantics captured in Vocabularies
  • The only way to effectively find relationships
    between lexically dissimilar symbols, e.g.:
      • GenderCode ↔ SexCode
      • Provider ↔ Supplier
      • Amount ↔ Quantity
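One simple knowledge base distance measure is the number of links on the shortest path between two concepts. The sketch below applies that idea to a tiny hand-built concept graph and turns the distance into a ranking score. The graph contents, the 1 / (1 + distance) formula, and all function names are assumptions chosen for illustration; the slides do not specify the knowledge base or the measure MatchIT uses.

```python
from collections import deque

# Hypothetical miniature knowledge base: undirected links between concepts.
EDGES = [
    ("gender", "sex"),
    ("provider", "supplier"),
    ("amount", "quantity"),
    ("gender", "attribute"),
    ("quantity", "attribute"),
    ("supplier", "organization"),
    ("organization", "entity"),
    ("attribute", "entity"),
]

GRAPH = {}
for a, b in EDGES:
    GRAPH.setdefault(a, set()).add(b)
    GRAPH.setdefault(b, set()).add(a)


def ontological_distance(start, goal, graph=GRAPH):
    """Number of links on the shortest path, or None if unconnected."""
    if start not in graph or goal not in graph:
        return None
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node] - seen:
            seen.add(neighbor)
            queue.append((neighbor, dist + 1))
    return None


def semantic_similarity(a, b):
    """Map distance to (0, 1]; unknown or unconnected concepts score 0.0."""
    dist = ontological_distance(a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def rank_candidates(concept, candidates):
    """Return (score, candidate) pairs, best match first."""
    scored = [(semantic_similarity(concept, c), c) for c in candidates]
    return sorted(scored, reverse=True)


for score, concept in rank_candidates("gender", ["supplier", "sex", "quantity"]):
    print(f"{concept:10s} {score:.2f}")
```

In this toy graph "gender" and "sex" are one link apart and so rank first, even though the two words share almost no characters.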

Semantic Matching Demo

21
3rd Party Integration
  • MatchIT Integration
  • MatchIT Java API
  • Stand-alone application
  • Embeddable application (as Eclipse plug-ins).
  • Hides unapproved matches
  • Useful for various 3rd Party applications
  • Data Integration
  • Data Discovery
  • Ontology Mediation
  • Search
  • Metadata Management
  • Data Cleansing

MetaMatrix Demo

22
Questions?
  • MatchIT 30-day trial available at
    http://www.revelytix.com
  • Michael Schidlowsky
  • michaels@revelytix.com