Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach



Provided by: LiXu8


Transcript and Presenter's Notes


1
Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach

AnHai Doan
Pedro Domingos
Alon Halevy

2
Data Integration
3
Problem and Solution

Problem
Large-scale Data Integration Systems
Bottleneck: Semantic Mappings
1-1 Mappings
Solution
Multi-strategy Learning
Integrity Constraints
XML Structure Learner

4
Learning Source Descriptions (LSD)

Components
Base learners
Meta-learner
Prediction converter
Constraint handler
Operations
Training phase
Matching phase
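The slide names four components without showing how their outputs flow together. The sketch below covers only two of them, the meta-learner combining base-learner scores and the prediction converter turning per-instance predictions into one prediction per source element. The learner names, labels, weights, and scores are hypothetical. The assumption that the converter averages instance scores belongs to this sketch and is not stated on the slide. The constraint handler is left out.

```python
from collections import defaultdict

Scores = dict[str, float]  # mediated-schema label -> confidence


def meta_combine(per_learner: dict[str, Scores],
                 weights: dict[tuple[str, str], float]) -> Scores:
    """Meta-learner step: weighted sum of base-learner scores for each label.

    weights[(label, learner)] is assumed to come from the training phase;
    missing (label, learner) pairs contribute nothing.
    """
    combined: Scores = defaultdict(float)
    for learner, scores in per_learner.items():
        for label, score in scores.items():
            combined[label] += weights.get((label, learner), 0.0) * score
    return dict(combined)


def convert(instance_predictions: list[Scores]) -> Scores:
    """Prediction converter (assumed): average instance-level scores into a
    single prediction for the source element the instances belong to.
    Expects at least one instance."""
    totals: Scores = defaultdict(float)
    for scores in instance_predictions:
        for label, score in scores.items():
            totals[label] += score
    n = len(instance_predictions)
    return {label: total / n for label, total in totals.items()}


# Hypothetical learner names ("name", "content"), labels, weights, and scores.
weights = {("ADDRESS", "name"): 0.4, ("ADDRESS", "content"): 0.6,
           ("PRICE", "name"): 0.5, ("PRICE", "content"): 0.5}
instances = [
    {"name": {"ADDRESS": 0.8, "PRICE": 0.2}, "content": {"ADDRESS": 0.9, "PRICE": 0.1}},
    {"name": {"ADDRESS": 0.8, "PRICE": 0.2}, "content": {"ADDRESS": 0.7, "PRICE": 0.3}},
]
column = convert([meta_combine(p, weights) for p in instances])
best_label = max(column, key=column.get)  # "ADDRESS" (about 0.80 vs 0.20)
```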

5
Learners

Base Learners
Name Matcher (WHIRL)
Content Matcher (WHIRL)
Naïve Bayes Learner
County-Name Recognizer
XML Learner
Meta-Learner (Stacking)
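One of the base learners listed above is a Naïve Bayes learner over data instances. Below is a minimal multinomial naive Bayes sketch. The lower-case alphanumeric tokenizer and the Laplace smoothing are choices made for this sketch and are not taken from the slides. The training values and labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesLearner:
    """Multinomial naive Bayes with Laplace smoothing over instance tokens."""

    def __init__(self) -> None:
        self.label_counts: Counter[str] = Counter()
        self.token_counts: dict[str, Counter[str]] = defaultdict(Counter)
        self.vocab: set[str] = set()

    def fit(self, examples: list[tuple[str, str]]) -> "NaiveBayesLearner":
        """examples: (data instance, mediated-schema label) pairs."""
        for text, label in examples:
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text: str) -> dict[str, float]:
        """Return normalized P(label | text) for every label seen in training."""
        total = sum(self.label_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for label, count in self.label_counts.items():
            denom = sum(self.token_counts[label].values()) + v
            score = math.log(count / total)  # log prior
            for tok in tokenize(text):
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long instances.
        top = max(log_scores.values())
        exp = {k: math.exp(s - top) for k, s in log_scores.items()}
        z = sum(exp.values())
        return {k: e / z for k, e in exp.items()}


nb = NaiveBayesLearner().fit([
    ("Miami, FL", "ADDRESS"), ("Boston, MA", "ADDRESS"),
    ("$250,000", "PRICE"), ("$110,000", "PRICE"),
])
pred = nb.predict("Orlando, FL")  # ADDRESS favored: "fl" was seen under ADDRESS
```

The unseen token "orlando" gets the same smoothed probability under both labels, so only "fl" decides the outcome in this example.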

6
XML Learner
7
XML Learner (Cont.)
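The XML Learner slides carry no text in this transcript. One plausible reading is that a structure-aware learner goes beyond the words in an XML instance and also sees which sub-elements occur and how they nest. The sketch below flattens an XML instance into text tokens, node tokens, and parent/child edge tokens, which a naive Bayes learner could then consume. The token scheme, the `node:`/`edge:` prefixes, and the decision to skip the root's own tag are assumptions of this sketch. The sample data is hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_tokens(xml_text: str) -> list[str]:
    """Flatten an XML instance into text, node, and edge tokens.

    text tokens:  whitespace-separated words of element text, lower-cased
    node tokens:  "node:<tag>" for each element below the root
    edge tokens:  "edge:<parent>-<child>" for each parent/child pair
    """
    root = ET.fromstring(xml_text)
    tokens: list[str] = []
    for elem in root.iter():  # document order, root first
        if elem is not root:
            tokens.append(f"node:{elem.tag}")
        if elem.text and elem.text.strip():
            tokens.extend(elem.text.lower().split())
        for child in elem:
            tokens.append(f"edge:{elem.tag}-{child.tag}")
    return tokens


tokens = xml_tokens(
    "<contact><name>Kate Richardson</name><phone>206-523-4719</phone></contact>"
)
```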
8
Constraint Handler

Domain Constraints
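Domain constraints can be treated as checks on a candidate mapping from source elements to mediated-schema labels, where each check reports how many times it is violated. The sketch below uses two illustrative constraint kinds chosen for this example ("at most one" and "exactly one"). The labels HOUSE-ID and PHONE are hypothetical. It weights soft constraints with finite costs and marks hard constraints with `math.inf`.

```python
import math
from typing import Callable

Mapping = dict[str, str]          # source element -> mediated-schema label
Check = Callable[[Mapping], int]  # number of violations (0 = satisfied)


def at_most_one(label: str) -> Check:
    """At most one source element may map to `label`."""
    def check(m: Mapping) -> int:
        return max(0, list(m.values()).count(label) - 1)
    return check


def exactly_one(label: str) -> Check:
    """Exactly one source element must map to `label`."""
    def check(m: Mapping) -> int:
        return abs(list(m.values()).count(label) - 1)
    return check


def violation_cost(m: Mapping, constraints: list[tuple[float, Check]]) -> float:
    """Weighted violation count; a weight of math.inf marks a hard constraint."""
    cost = 0.0
    for weight, check in constraints:
        n = check(m)
        if n:  # skip satisfied constraints so inf * 0 never produces nan
            cost += weight * n
    return cost


# Hypothetical constraints: HOUSE-ID is hard, PHONE is soft with weight 1.0.
constraints = [(math.inf, at_most_one("HOUSE-ID")), (1.0, exactly_one("PHONE"))]
```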

9
Constraint Handler (Cont.)

Search Heuristic
Mapping Cost
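The slide pairs a search heuristic with a mapping cost. In the sketch below, the cost is assumed to be the negative log-likelihood of the chosen labels (from the combined learner scores) plus a constraint penalty. The slide does not give this formula. The sketch also swaps the slide's search heuristic for brute-force enumeration, which is only practical for very small schemas. Element names, labels, and scores are hypothetical.

```python
import itertools
import math
from typing import Callable, Optional

Mapping = dict[str, str]              # source element -> mediated-schema label
Scores = dict[str, dict[str, float]]  # source element -> {label: confidence}


def mapping_cost(m: Mapping, scores: Scores,
                 penalty: Callable[[Mapping], float]) -> float:
    """Negative log-likelihood of the chosen labels plus a constraint penalty.
    Confidences must be strictly positive, or math.log raises ValueError."""
    nll = -sum(math.log(scores[elem][label]) for elem, label in m.items())
    return nll + penalty(m)


def best_mapping(scores: Scores, penalty: Callable[[Mapping], float]
                 ) -> tuple[Optional[Mapping], float]:
    """Try every label combination and keep the cheapest; exponential in size."""
    elems = list(scores)
    best, best_cost = None, math.inf
    for labels in itertools.product(*(scores[e] for e in elems)):
        m = dict(zip(elems, labels))
        cost = mapping_cost(m, scores, penalty)
        if cost < best_cost:
            best, best_cost = m, cost
    return best, best_cost


def one_address(m: Mapping) -> float:
    """Hypothetical hard constraint: at most one element maps to ADDRESS."""
    return math.inf if list(m.values()).count("ADDRESS") > 1 else 0.0


scores = {"location": {"ADDRESS": 0.7, "DESCRIPTION": 0.3},
          "comments": {"ADDRESS": 0.6, "DESCRIPTION": 0.4}}
result, cost = best_mapping(scores, one_address)
# result == {"location": "ADDRESS", "comments": "DESCRIPTION"}
```

Without the constraint, both elements would take their top-scoring label ADDRESS. With it, the cheapest legal mapping keeps ADDRESS for the element that is more confident about it.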

10
Training Phase
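During training, a stacking meta-learner (slide 5) has to learn how much to trust each base learner. The sketch below fits, for one label, least-squares weights that map base-learner confidence scores to a 0/1 target ("is this instance really that label?"). In stacking, the feature scores would normally come from cross-validated base-learner predictions. Here they are simply given. The data is hypothetical, and the no-intercept linear model is an assumption of this sketch.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting.
    Assumes the system is non-singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def stacking_weights(features: list[list[float]], targets: list[float]) -> list[float]:
    """Least-squares weights w minimizing sum_i (targets[i] - features[i] . w)^2,
    computed from the normal equations (X^T X) w = X^T y."""
    k = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical (name-matcher score, content-matcher score) for label ADDRESS
# on four training instances, and whether each instance really is an ADDRESS.
features = [[0.9, 0.8], [0.2, 0.9], [0.8, 0.1], [0.1, 0.2]]
targets = [1.0, 1.0, 0.0, 0.0]
w_name, w_content = stacking_weights(features, targets)  # about -0.04 and 1.16
```

On this toy data the content scores track the true label and the name scores do not, so nearly all of the weight goes to the content learner.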
11
Example 1 (Training Phase)
12
Example 1 (Cont.)
13
Example 1 (Cont.)
(location, ADDRESS)
("Miami, FL", ADDRESS)
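The two training examples shown for Example 1, (location, ADDRESS) and ("Miami, FL", ADDRESS), suggest how a manually mapped source becomes training data. Element names become examples for a name-based learner, and data values become examples for content-based learners. The sketch below derives both lists from a source and a user-supplied mapping. The row layout, the column "listed-price", and the label PRICE are hypothetical.

```python
def training_examples(source_rows: list[dict[str, str]],
                      mapping: dict[str, str]
                      ) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Turn a manually mapped source into learner-specific training data.

    source_rows: one dict per listing (source element -> value).
    mapping:     source element -> mediated-schema label, given by the user.
    Returns (name_examples, content_examples); empty values are skipped.
    """
    name_examples = [(col, label) for col, label in mapping.items()]
    content_examples = [
        (row[col], label)
        for row in source_rows
        for col, label in mapping.items()
        if row.get(col)
    ]
    return name_examples, content_examples


rows = [{"location": "Miami, FL", "listed-price": "$250,000"},
        {"location": "Boston, MA", "listed-price": "$110,000"}]
names, contents = training_examples(rows, {"location": "ADDRESS",
                                           "listed-price": "PRICE"})
```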
14
Matching Phase
15
Example 2 (Matching Phase)
16
Example 2 (Cont.)
17
Example 2 (Cont.)
18
Empirical Evaluation
19
Measures

Matching accuracy of a source
Average matching accuracy of a source
Average matching accuracy of a domain
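The three measures build on each other. A minimal sketch, assuming that the matching accuracy of a source is the fraction of its matchable elements (those with a correct label in the mediated schema) that receive the correct label, that the source average is taken over several runs, and that the domain average is taken over the sources in that domain. The slide gives only the measure names, so these definitions are assumptions. All data below is hypothetical.

```python
from statistics import mean


def matching_accuracy(predicted: dict[str, str], correct: dict[str, str]) -> float:
    """Fraction of matchable source elements given the correct label.
    `correct` lists only matchable elements; unpredicted ones count as wrong."""
    if not correct:
        raise ValueError("source has no matchable elements")
    hits = sum(1 for elem, label in correct.items() if predicted.get(elem) == label)
    return hits / len(correct)


def average_source_accuracy(runs: list[dict[str, str]], correct: dict[str, str]) -> float:
    """Average matching accuracy of one source over several runs."""
    return mean(matching_accuracy(p, correct) for p in runs)


def average_domain_accuracy(per_source_averages: list[float]) -> float:
    """Average of the per-source averages for all sources in a domain."""
    return mean(per_source_averages)


correct = {"location": "ADDRESS", "listed-price": "PRICE",
           "phone": "AGENT-PHONE", "comments": "DESCRIPTION"}
run1 = {"location": "ADDRESS", "listed-price": "PRICE",
        "phone": "AGENT-PHONE", "comments": "ADDRESS"}  # one mistake
run2 = dict(correct)                                    # all correct
src_avg = average_source_accuracy([run1, run2], correct)  # 0.875
domain_avg = average_domain_accuracy([src_avg, 0.625])    # 0.75
```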

20
Experimental Results
21
Experimental Results (Cont.)
Contributions of the base learners and the constraint handler
22
Experimental Results (Cont.)
Contributions of schema information and data instances
23
Experimental Results (Cont.)
Sensitivity of performance to the number of data instances
24
Limitations

Requires Enough Training Data
Domain-Dependent Learners
Ambiguities in Sources
Efficiency
Overlapping Schemas

25
Conclusion and Future Work

Improves over time
Extensible framework
Multiple types of knowledge
Non-1-1 mappings?
