Title: Chen Li
Data Integration: The Good, the Bad, and the Ugly
My story
Doing Data Integration research as a PhD student
New adventure as a junior faculty member (2001)
Problem: data needs to be cleaned!
Supporting approximate string matching
Movies starring Schwarrzenger
Find movies with a star similar to
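As a rough illustration of the query above, here is a minimal sketch that matches a misspelled star name against a list of names using Levenshtein edit distance. The talk does not say which similarity measure or algorithm was used, so edit distance is an assumption here, and `find_similar`, `STARS`, and the threshold of 3 are hypothetical names and values chosen only for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]


def find_similar(query: str, names: list[str], max_dist: int = 3) -> list[str]:
    """Return the names within max_dist edits of query (case-insensitive)."""
    q = query.lower()
    return [n for n in names if edit_distance(q, n.lower()) <= max_dist]


# Hypothetical data, for illustration only.
STARS = ["Schwarzenegger", "Stallone", "Schwartzman"]

if __name__ == "__main__":
    print(find_similar("Schwarrzenger", STARS))
```

This scans every name, which is fine for a small list but does not scale; that gap is what motivates indexing structures for approximate matching.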
More challenging problems in data cleaning!
- Indexing structures
- Top-k queries
- Approximate joins
- Selectivity estimation of fuzzy predicates
Microsoft SQL Server 2005 supports data cleaning
(Good) Lessons
- Talk to our customers
- Find their need and real problems
- Test our solutions in general domains
Data Integration: Good/Bad/Ugly
The Good
- Many many many many many applications!
The Bad
The Ugly
The Future
Data ? Data integration
in data integration
Demand driven
Customer demand
Data Integration toolkit
The next Google for data integration?
- Not just a small toolkit for data integration
- The technique should have enough demand!
One problem: Approximate String Searching