Data Mining in Macroeconomic Data Sets

1
Data Mining in Macroeconomic Data Sets
  • Advised by Christos Faloutsos
  • 2006. 04. 27
  • Ping Chen

2
Outline
  • Research Background
  • Research Questions
  • Task 1: Exploration of Economy Network Property
  • Task 2: Temporal Evolution Patterns

3
Motivation
  • Economic Supply Chain Connections
  • Hidden sector connections
  • Economic Input Output (EIO) Account:
    supply-demand connections

4
Approach
  • network system analysis

(Figure: network linking the Power Supply Sector, Construction Sector, and Manufacturing Sector)
5
Research Questions
  • RQ1: Can we describe the properties of the
    economy network?
  • RQ2: Can we characterize the changes in the
    transactions over the years and explain why?
  • RQ3: Can we spot correlated and anti-correlated
    sectors effectively?
  • RQ4: Can we detect outlier sectors effectively?

6
Data Preparation
  • EIO Table Structure
  • Row: Supply Sector
  • Column: Demand Sector
  • Sector Pair
  • Pair Transaction ()
  • Yearly Transaction Set
  • Pair Transaction Sequence

(Figure: EIO tables for Year 1947, Year 1958, and Year 1982, with the Power Sector and Construction Sector pair highlighted)
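The structures on this slide can be sketched in a few lines. This is a minimal illustration, assuming each yearly EIO table is stored as a nested dictionary keyed by supply sector and then demand sector; the sector names, amounts, and function names are hypothetical and are not taken from the presentation's data.

```python
# Hypothetical toy EIO tables: tables[year][supply][demand] = transaction amount.
# All names and numbers below are made up for illustration.
tables = {
    1947: {"Power": {"Construction": 1.0, "Power": 0.2},
           "Construction": {"Power": 3.0}},
    1958: {"Power": {"Construction": 1.5, "Power": 0.3},
           "Construction": {"Power": 4.0}},
    1982: {"Power": {"Construction": 2.5, "Power": 0.5},
           "Construction": {"Power": 9.0}},
}


def yearly_transaction_set(tables, year):
    """Flatten one year's table into {(supply, demand): amount}."""
    return {(supply, demand): amount
            for supply, row in tables[year].items()
            for demand, amount in row.items()}


def pair_transaction_sequences(tables):
    """Return the sorted years and, per sector pair, its amounts in year order."""
    years = sorted(tables)
    yearly = {y: yearly_transaction_set(tables, y) for y in years}
    pairs = sorted({pair for y in years for pair in yearly[y]})
    # A pair missing from a given year is treated as a zero transaction.
    sequences = {pair: [yearly[y].get(pair, 0.0) for y in years]
                 for pair in pairs}
    return years, sequences


years, sequences = pair_transaction_sequences(tables)
print(years)
print(sequences[("Construction", "Power")])
```

Each row of `sequences` is one pair transaction sequence, which is the unit the later clustering and PCA slides operate on.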
7
Part I. Economy Network Property
8
Network Topology: Weight Distribution
  • What does the transaction distribution look like,
    and how should it be modeled (Gaussian? Uniform?)

9
Network Topology: Weight Distribution
(Figure: 1982 inter-transaction distribution)
10
Power Laws
  • Power Laws
  • Special case: Pareto's Law

(Figure: two log-log panels of web site traffic. Pareto's Law panel: negative cumulative distribution function, cumulative number of sites with > x visitors vs. number of visitors, slope -1.07. Power Law panel: probability density function, proportion of sites vs. number of visitors, slope -2.07.)
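The link between the two slopes can be checked numerically: if the density falls off as x^-(a+1), the negative cumulative distribution falls off as x^-a, so the two log-log slopes differ by one. The sketch below draws synthetic Pareto samples (not the web traffic data behind the figure) and estimates the cumulative slope by least squares; the function name and the sample size are illustrative choices.

```python
import math
import random


def ccdf_loglog_slope(samples):
    """Least-squares slope of log P(X >= x) against log x."""
    xs = sorted(samples)
    n = len(xs)
    # P(X >= xs[i]) is estimated by (n - i) / n, which is never zero.
    pts = [(math.log(x), math.log((n - i) / n)) for i, x in enumerate(xs)]
    mean_x = sum(px for px, _ in pts) / n
    mean_y = sum(py for _, py in pts) / n
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    return sxy / sxx


rng = random.Random(0)
alpha = 1.07  # tail exponent chosen to mirror the cumulative slope on the slide
samples = [rng.paretovariate(alpha) for _ in range(20000)]

slope = ccdf_loglog_slope(samples)
print("cumulative slope:", round(slope, 2))
print("implied density slope:", round(slope - 1.0, 2))
```

A plain least-squares fit on the log-log plot is the simplest estimator to read off a slide; maximum-likelihood estimators are generally preferred for real data.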
11
Examples of double Pareto logNormal (dPlN) Distribution
(Figure: log(density) vs. log(income) for household income data in the United States, Canada, and Sri Lanka; Reed, 2002, 2003)
12
Double Pareto LogNormal (dPlN) Distribution
  • Double Pareto LogNormal Distribution (Reed et al., 2003)

(Figure: CDF and NCDF of the dPlN distribution)
13
dPlN Parameter Interpretation
  • PDF, log-normal: mean, variance
  • CDF, log-log: slope
  • NCDF, log-log: slope
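One way to get a feel for these parameters is to simulate the distribution. The sketch below relies on the representation in Reed's work that a dPlN variable is a lognormal variable multiplied by the ratio of two Pareto variables, so that log X = nu + tau*Z + E1/alpha - E2/beta with Z standard normal and E1, E2 standard exponentials. The symbols alpha, beta, nu, tau and the numeric values are assumptions for illustration; they are not the parameters fitted to the EIO data.

```python
import math
import random


def sample_dpln(rng, alpha, beta, nu, tau):
    """Draw one dPlN variate as exp(nu + tau*Z + E1/alpha - E2/beta)."""
    z = rng.gauss(0.0, 1.0)      # lognormal body
    e1 = rng.expovariate(1.0)    # drives the upper (NCDF) tail, slope -alpha
    e2 = rng.expovariate(1.0)    # drives the lower (CDF) tail, slope +beta
    return math.exp(nu + tau * z + e1 / alpha - e2 / beta)


rng = random.Random(1)
alpha, beta, nu, tau = 2.0, 1.5, 0.0, 0.5  # illustrative values only
draws = [sample_dpln(rng, alpha, beta, nu, tau) for _ in range(50000)]

# Sanity check: the mean of log X should be close to nu + 1/alpha - 1/beta.
mean_log = sum(math.log(x) for x in draws) / len(draws)
expected_mean_log = nu + 1.0 / alpha - 1.0 / beta
print(round(mean_log, 3), round(expected_mean_log, 3))
```

On a log-log plot of such draws, the lower tail of the CDF and the upper tail of the NCDF each approach a straight line, which is what the two slope parameters describe.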
14
Weight (Transaction) Distribution
15
Weight (Transaction) Distribution
(Figure annotations: value ranges 4.3 to 5.5, 1.2 to 1.5, 7.1 to 7.9, and 0.5 to 1.1)
16
RQ1: How should we describe the web property of
the economy network?
  • Highly skewed transaction data sets (network
    weight)
  • The transaction distribution is well fitted by a
    double Pareto logNormal (dPlN) distribution

17
Part II. Economic Dependency Evolution Pattern
18
Research Questions
  • RQ1: How should we describe the web structure of
    the economy network?
  • RQ2: Can we characterize the changes in the
    transactions over the years and explain why?
  • RQ3: Can we spot correlated sectors effectively?
  • RQ4: Can we detect outlier sectors effectively?

19
Clustering Methods Survey
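  • K-means (MacQueen, J. B., 1967)
  • Singular Value Decomposition (Maltseva, E.,
    Pizzuti, C., Talia, D., 2001)

(Figure annotations: s_k is the kth singular value, whose square is the variance of the kth principal component up to a constant factor; v_k is the kth principal component, i.e. the kth eigenvector of X^T X)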
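The relationship between the singular values and principal components named on this slide can be illustrated with power iteration on X^T X. This is a dependency-free sketch on a made-up 4x2 matrix, not the method or data used in the presentation; in practice a library SVD routine would be the usual choice.

```python
import math


def center(rows):
    """Subtract each column mean so X^T X is a scaled covariance matrix."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]


def first_component(rows, iters=200):
    """Power iteration on X^T X; returns (s_1, v_1) for the centered data."""
    x = center(rows)
    m = len(x[0])
    v = [1.0 / math.sqrt(m)] * m  # arbitrary unit-length starting vector
    for _ in range(iters):
        xv = [sum(r[j] * v[j] for j in range(m)) for r in x]  # X v
        w = [sum(x[i][j] * xv[i] for i in range(len(x)))      # X^T (X v)
             for j in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    xv = [sum(r[j] * v[j] for j in range(m)) for r in x]
    s1 = math.sqrt(sum(c * c for c in xv))  # ||X v_1|| is the top singular value
    return s1, v


# Hypothetical data: rows are observations, columns are features.
data = [[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2]]
s1, v1 = first_component(data)
print(round(s1, 4), [round(c, 4) for c in v1])
```

Here `s1 ** 2` is the largest eigenvalue of X^T X and `v1` is the matching eigenvector, which is the sense in which the squared singular value measures the variance along the first principal component.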
20
(Figure: EIO tables for Year 1947, Year 1958, and Year 1982, tracing the pair (Construction Sector -> Power Sector) across years and marking the Yearly Transaction Set for Year 1947)
21
PCA Projection
22
PCA Projection
  • Advantage: dimension reduction, visualization
  • Disadvantage: suffers from data skewness

23
Data Normalization and Redo PCA
24
Sub-Questions
  • How to handle data skewness?
  • How to normalize data?
  • How to interpret the PCA outcomes using
    normalized data?
  • How to bring back information that is lost in
    the data normalization process, e.g., transaction
    scale?

Normalization
25
Solution: Multiple Steps of Pattern REcognition
in skewed DAta (M-SPREAD)
  • Step 1: Data normalization
  • Step 2: Principal Component Analysis
  • Step 3: Data Visualization and Pattern
    Identification (M-Plane)
  • Step 4: Data Bucket Generation
  • Step 5: Sub data set Pattern Identification
    (M-Slice)
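Steps 1 and 4 can be sketched as follows. The slides do not state which normalization or bucketing rule M-SPREAD uses, so this example assumes two simple choices: each pair transaction sequence is divided by its own mean, and pairs are split into four equal-sized buckets by their average amount before normalization. The data and function names are hypothetical.

```python
def normalize(seq):
    """Divide a sequence by its own mean (assumed positive) to remove scale."""
    avg = sum(seq) / len(seq)
    return [value / avg for value in seq]


def bucket_by_average(sequences, n_buckets=4):
    """Assign each pair a bucket (1 = smallest) by rank of its raw average."""
    averages = {pair: sum(seq) / len(seq) for pair, seq in sequences.items()}
    ranked = sorted(averages, key=averages.get)
    size = len(ranked)
    return {pair: 1 + (i * n_buckets) // size for i, pair in enumerate(ranked)}


# Hypothetical sequences whose magnitudes span several orders of magnitude.
sequences = {(f"S{i}", "D"): [10 ** (i / 2) * t for t in (1.0, 2.0, 3.0)]
             for i in range(8)}

normalized = {pair: normalize(seq) for pair, seq in sequences.items()}
buckets = bucket_by_average(sequences)

for pair in sorted(sequences):
    print(pair, buckets[pair], [round(v, 2) for v in normalized[pair]])
```

Normalizing first lets PCA compare the shape of each sequence over time, while the bucket label keeps the magnitude information that normalization discards, so the projection can later be examined one bucket at a time.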

26
Principal Components and Interpretation
  • Observations
  • Reversed PC1: continuously intensified
    inter-transactions
  • Reversed PC2: interrupted inter-transaction
    growth in the 1970s
  • Possible reason: oil crisis in the 1970s

27
(Oil price chronology)
28
(Figure: PC1 vs. PC2 plane with regions labeled Growing, Shrinking, Oil Benefiting, and Oil Suffering)
29
M-Plane
30
Observations from M-Plane
  • Four Regions
  • Growing (Majority)
  • Shrinking
  • Oil Suffering (Another Cluster)
  • Oil Benefiting
  • Two clusters (C1, C2)
  • C1: inter-transaction amounts grow
  • C2: inter-transaction amounts suffered from the oil
    crisis in the 1970s

31
Data Bucket Generation
(Figure: data buckets 1 to 4; axis label: Average (Before Normalization))
32
M-Slices
M-Plane
33
Observations from M-Slices
  • Major patterns change over data buckets having
    different data magnitude
  • Very large-size transaction pairs: most growing,
    very few oil sensitive
  • Large and small-size transaction pairs: both
    growing, a few oil sensitive
  • Small-size transaction pairs: mixed growth
    patterns

34
M-SubSettings
Demand Sector
Supply Sector

35
M-SubSetting Examples
(Figure: M-Planes for sub-settings by supply sector and demand sector: Motor Vehicle and Equipment Industry; Aircraft and Parts; Petroleum Refining and Related Industry; Transportation and Warehousing)
36
Observations from M-Sub Settings
  • Dependence evolution patterns of individual
    industries
  • Motor, aircraft parts, etc. industries: oil
    suffering
  • Domestic petroleum industry, warehousing industry
    related: oil benefiting
  • Correlated sectors: motor, aircraft parts
  • Observation of substitution phenomena:
    transportation approach vs. warehousing facility,
    etc.

37
Discussion of M-SPREAD Procedure
  • Effects of the choice of normalization method

38
Discussion of M-SPREAD Procedure
  • Effects of removing small transactions

39
Summary of Patterns
  • Correlated and anti-correlated sectors (auto vs.
    aircraft parts, auto vs. warehousing)
  • Time evolution patterns (growing, oil
    suffering)
  • Effect of magnitude (large: growing)
  • Outlier Sector, Correlations, Substitution effect
  • Outlier Time Stamp (1977)

40
Contributions
  • Discovery of dPlN of transaction distribution
  • M-SPREAD: handles normalization and varying
    magnitudes
  • Effective visualization method (M-Plane, M-Slice,
    M-Sub Setting)
  • Discovery of patterns
  • Time evolution pattern
  • Effect of magnitude
  • Correlated, anti-correlated sectors
  • Outliers

41
Complete Principal Components
42
Plot of Singular Values
47
Selection of Feature Values