Mining Distributed Databases - PowerPoint PPT Presentation

Mining Distributed Databases Raj Bhatnagar University of Cincinnati – PowerPoint PPT presentation

Slides: 22

Provided by: rajb152

Learn more at: https://eecs.ceas.uc.edu

Transcript and Presenter's Notes

1
Mining Distributed Databases

Raj Bhatnagar
University of Cincinnati

2
Distributed Databases

D = D1 X D2 X ... X Dn
- D is implicitly specified
Goal: Discover patterns in the implicit D, using the explicit Di's

Geographically distributed nodes
Limitations
- Cannot move the Di's to a common site (size / communication cost / privacy)
- Cannot update local databases
- Cannot send actual data tuples
3
Explicit and Implicit Databases

Implicit Database
4
Decomposition of Computations

- Since D is implicit,
- For a computation F(D)
- Decompose F into G and the gi's
- Decomposition depends on
  - F
  - Di's
  - Set of shared attributes

5
Decomposition of Computations

Computational primitives
Arithmetic primitives
Count of tuples in implicit D
Mean Value of an attribute in D
Informational entropy for a subset of D
Covariance matrix for D
Non-numeric primitives
Median value of an attribute in D
Sorting subsets of tuples in D

6
Decomposition of Computations

Computational cost of decomposition
Communication cost
Number of messages exchanged
Number of database queries
Who does the decomposition?
Algorithm itself, at run time
Depending on the nature of overlap in the Di's

7
Count All Tuples in Implicit D
Can be decomposed as

N(D) = \sum_{J} \prod_{t=1}^{n} N(D_t)_{cond_J}

cond_J: the J-th tuple in Shareds
n: number of participating databases (Di's)
N(D_t)_{cond_J}: count of tuples in D_t satisfying cond_J
Local computation: g_t(D_t, cond_J) = N(D_t)_{cond_J}
G is a sum-of-products
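
The sum-of-products decomposition of the count can be illustrated with a small sketch. It assumes that the implicit D is the join of the explicit databases on their shared attributes and that every database carries all shared attributes. The toy tables, the attribute names and the function names (`local_counts`, `count_implicit`, `count_by_join`) are hypothetical and not taken from the slides.

```python
from collections import Counter
from itertools import product

# Hypothetical explicit databases; "A" is the only shared attribute.
SHARED = ("A",)
D1 = [{"A": 1, "B": 10}, {"A": 1, "B": 20}, {"A": 2, "B": 30}]
D2 = [{"A": 1, "C": 5}, {"A": 2, "C": 6}, {"A": 2, "C": 7}]
D3 = [{"A": 1, "E": 0}, {"A": 2, "E": 1}]


def local_counts(db, shared=SHARED):
    """Local step: count tuples per combination of shared-attribute values."""
    return Counter(tuple(row[a] for a in shared) for row in db)


def count_implicit(dbs, shared=SHARED):
    """Global step: sum over shared tuples of the product of local counts."""
    summaries = [local_counts(db, shared) for db in dbs]
    # Only shared tuples present at every node contribute a non-zero product.
    conds = set(summaries[0])
    for summary in summaries[1:]:
        conds &= set(summary)
    total = 0
    for cond in conds:
        term = 1
        for summary in summaries:
            term *= summary[cond]
        total += term
    return total


def count_by_join(dbs, shared=SHARED):
    """Reference check only: materialises the join, which the setting forbids."""
    n = 0
    for combo in product(*dbs):
        keys = {tuple(row[a] for a in shared) for row in combo}
        if len(keys) == 1:
            n += 1
    return n
```

Only the per-node summaries (one count per shared tuple) cross the network in this sketch; no data tuple leaves its node. `count_by_join` exists purely to check the result on toy data.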

8
Implementing Decomposed Computations

Stationary Agents
Mobile Agents
Aglet
Messages
9
Implementation of Count(D)

Stationary Agents
- Request / Send Summaries
- Simple SQL interface
  - 1 count / message
  - l attributes having k values each
  - Messages exchanged
- Query-code interface
  - counts/message
  - l attributes having k values each
  - Messages exchanged
Mobile Agents
  - Number of hops
10
Implementation of Count(D-test)

Stationary Agents
- Simple SQL interface
- Query-code interface
Mobile Agents

Shareds
L attributes k values each
tuples
11
Average Value of an attribute in D

Compute counts for each value of an attribute

Stationary Agents
- Simple SQL interface
  - Messages exchanged (1 integer/message)
- Query-code interface
  - Messages exchanged (integers/message)
Mobile Agents
  - Number of hops
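
A minimal sketch of computing the average from per-value counts, under the same assumption that D is the join of the explicit databases on the shared attributes. The attribute of interest is assumed to live at a single node (`owner`); the toy data and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical explicit databases; "A" is shared, "B" lives only in D1.
SHARED = ("A",)
D1 = [{"A": 1, "B": 10}, {"A": 1, "B": 20}, {"A": 2, "B": 30}]
D2 = [{"A": 1, "C": 5}, {"A": 2, "C": 6}, {"A": 2, "C": 7}]


def key(row):
    """Project a row onto the shared attributes."""
    return tuple(row[a] for a in SHARED)


def value_counts(owner, others, attr):
    """Count of implicit-D tuples for each value of attr."""
    # Summary at the owning node: count per (shared tuple, attribute value).
    own = Counter((key(row), row[attr]) for row in owner)
    # Summaries at the other nodes: count per shared tuple.
    rest = [Counter(key(row) for row in db) for db in others]
    counts = Counter()
    for (cond, value), n in own.items():
        for summary in rest:
            n *= summary[cond]  # a missing shared tuple counts as 0
        counts[value] += n
    return counts


def mean_value(owner, others, attr):
    """Mean of attr over the implicit D, computed from counts alone."""
    counts = value_counts(owner, others, attr)
    total = sum(counts.values())
    return sum(value * c for value, c in counts.items()) / total
```

On this toy data the tuple with B = 30 joins with two tuples of D2, so it is weighted twice in the mean. The sketch does not guard against an empty join, where `total` would be zero.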
12
Exception Tuples

Database of interest may exclude some tuples of D
Learning site keeps a relation E of exception
tuples
E may have explicit tuples
E may have rules to generate exception tuples

[Figure: example tables labeled SharedSet, Exceptions, and Explicit Databases; column labels C, A, E, B]
13
Computing Informational Entropy

Consists of various counts only
Stationary agent/Simple SQL interface
Stationary agent/Query-code interface
Mobile agent

Messages exchanged
Messages exchanged
Number of messages/hops is independent of
the size of D
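
Because entropy consists of counts only, the learner can evaluate it once the decomposed counts have arrived. The sketch below uses the standard Shannon entropy in bits; the function name and the input format (a list of per-class counts) are assumptions made for illustration.

```python
import math


def entropy_from_counts(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing, so they are skipped to avoid log2(0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
```

For example, `entropy_from_counts([2, 2])` evaluates to 1 bit, and a subset whose tuples all fall in one class has entropy 0.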
14
Decomposition of Algorithms

Arithmetic primitives are 1-step decompositions
Counts, averages, entropy
Algorithms involve
Arithmetic primitives
non-numeric primitives
Control structure
Decomposition studied for
Decision tree induction algorithm
Mining of association rules
Control structure is unaltered
Primitive computations are decomposed

[Figure: Learner Node, with control structure, decomposition, and composition]

15
Building a Decision Tree

To induce a decision tree having
- d levels
- m attributes in n databases
- l shared attributes
- k values/attribute
Stationary agent/Simple SQL interface
Stationary agent/Query-code interface
Mobile agent

Number of messages/hops is independent of
the size of D
16
Mining Association Rules

Main operations
- Enumerate item-sets
- Compute support/confidence
- Basic computation Count-of-tuples
Communication Complexity
- m (avg.) item sets at each level of
enumeration tree
- j levels of enumeration tree
- Query-code can count for all item sets at a
level simultaneously
- Therefore, we need

Number of Counts Needed
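
Since the basic computation is a count of tuples, support and confidence follow directly from counts over the implicit D. A minimal sketch using the standard definitions; the function name and argument names are hypothetical, and the counts are assumed to have been obtained already through the decomposed count computation.

```python
def rule_metrics(count_x, count_xy, total):
    """Support and confidence of the rule X -> Y from tuple counts.

    count_x:  tuples of D containing item set X
    count_xy: tuples of D containing both X and Y
    total:    all tuples of D
    """
    support = count_xy / total
    confidence = count_xy / count_x
    return support, confidence
```

Only three integers per rule are needed here, which is why the communication cost depends on the number of item sets rather than on the number of tuples.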
17
More Complex Computations

Covariance matrix for D
Useful for eigenvectors/principal components
Needs second-order moments
Graph/Network algorithms
Each node has part of a graph
Some nodes are shared
Determine MST
Paths of Min/Max flow
Flow patterns

18
Sum of Products

Sum of products for two attributes
There are six different ways in which x and y may
be distributed
Each requires a different decomposition
Case 1: x is the same as y, and x belongs to the SharedSet.
Case 2: x is the same as y, and x does not belong to the SharedSet.
Case 3: x and y both belong to the SharedSet.

19
Sum of Products

Case 4: x belongs to the SharedSet and y does not.
Case 5: x and y don't belong to the SharedSet and reside on different nodes.
For each tuple t in SharedSet, obtain
and then
Case 6: x and y don't belong to the SharedSet and reside on the same node.

where
Prod(t) is average of product of x and y for
cond-t of SharedSet
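
The formulas for these cases did not survive in the transcript. As an illustration of Case 5 only, the sketch below shows one decomposition that is consistent with treating D as the join on the shared attributes, restricted to two nodes: for each shared tuple t, the node holding x returns its local sum of x and the node holding y returns its local sum of y. The toy data and function names are hypothetical, and this is not necessarily the formula on the original slide.

```python
from collections import defaultdict
from itertools import product

# Hypothetical explicit databases; "A" is shared, x and y sit on different nodes.
SHARED = ("A",)
D1 = [{"A": 1, "x": 2.0}, {"A": 1, "x": 3.0}, {"A": 2, "x": 4.0}]
D2 = [{"A": 1, "y": 5.0}, {"A": 2, "y": 6.0}, {"A": 2, "y": 7.0}]


def key(row):
    """Project a row onto the shared attributes."""
    return tuple(row[a] for a in SHARED)


def local_sums(db, attr):
    """Local step: sum of attr for each shared tuple t."""
    sums = defaultdict(float)
    for row in db:
        sums[key(row)] += row[attr]
    return sums


def sum_of_products(db_x, db_y):
    """Global step: sum over t of (local sum of x) * (local sum of y)."""
    sx = local_sums(db_x, "x")
    sy = local_sums(db_y, "y")
    return sum(sx[t] * sy[t] for t in sx.keys() & sy.keys())


def sum_of_products_by_join(db_x, db_y):
    """Reference check only: materialises the join of the two nodes."""
    return sum(r["x"] * s["y"] for r, s in product(db_x, db_y) if key(r) == key(s))
```

With more than two participating nodes, each term would also need to be multiplied by the local counts of the remaining nodes for the same t; the two-node restriction keeps the sketch short.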
20
Self-decomposing Algorithms

Easy decomposability of arithmetic primitives
Average/Covariance matrix/Entropy
Control structure of algorithms is not altered
More gains possible, by altering control
structure
Decomposition is driven by the set of shared
attributes
Algorithm can determine shared attributes in n
messages/hops
Algorithms decompose in accordance with attribute
sharing
No human intervention needed
Message complexity is independent of sizes of
databases

21
Continuing Work

Determine patterns of flow in a network
Communication network traffic
Geographic/economic flows

[Figure: network nodes, each labeled "Local flow data"]
