Multisite Internet Data Analysis

1
Multisite Internet Data Analysis
  • Alfred O. Hero, Clyde Shih, David Barsic
  • University of Michigan - Ann Arbor
    hero_at_eecs.umich.edu
  • http://www.eecs.umich.edu/hero
  • Network Data Collection
  • Distributed Data Analysis
  • Dimension Reduction
  • Model-Based Data Analysis
  • Conclusions

Research supported in part by NSF CCR-0325571
2
1. Network Data Collection
  • Objectives:
    • Global monitoring centers aggregate statistics
      from sites distributed around the network to
      detect, classify, or estimate the global network
      state while satisfying information privacy
      constraints.
    • Local collection sites gather data relevant to
      the local network state and share information as
      necessary to enhance local analysis.
  • Types of data measured:
    • Active: queries and requests, packet probes
    • Passive: netflow, router fields, honeypots,
      backscatter

3
[Figure: network diagram with ISP 1, ISP 2, and ISP 3, showing local data collection and probing sites, data collectors, and a monitoring center]
4
Abilene Netflow Data
[Tables for Dataset 1 and Dataset 2, one row per protocol. Columns: Protocol, No. Flows, Avg. Duration, Std. Duration, Avg. Packets, Std. Packets, Avg. Bytes, Std. Bytes]
5
Abilene Netflow Data
[Tables for Dataset 1 and Dataset 2, one row per router. Columns: Router, No. Flows, Avg. Duration, Std. Duration, Avg. Packets, Std. Packets, Avg. Bytes, Std. Bytes]
6
Abilene Netflow Data
7
Challenges and Approaches
  • Challenges:
    • High-dimensional measurement space
    • Non-linear dependencies and non-stationarity
    • Privacy and proprietary concerns
    • Insufficient bandwidth for continuously sampled
      data
  • Approaches:
    • Dimension reduction
    • Model-based distributed inference
    • Controlled information sharing
    • Hierarchical and modular collection/analysis

8
Hierarchical Architecture
9
2. Distributed Data Analysis
[Figure: three data collection sites, Site A, Site B, and Site C]
  • Hypothesis: data collected at sites A, B, and C
    follow a statistical distribution defined over a
    lower-dimensional manifold.
  • Overall objective: find distributed strategies to
    perform reliable statistical inference with a
    minimum amount of data sharing.


10
2.1 Distributed Dimension Reduction
[Figure: an unknown distribution on an unknown manifold with an unknown embedding is sampled to give the observed sample]
11
Geodesic Entropic Graphs: A Planar Sample and its
Euclidean MST
12
GMST Dimension Estimation
GMST estimates: d = 13, H = 120 (bits)
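
A minimal sketch of MST-based dimension estimation, under stated assumptions. It relies on the growth rate of the MST length, roughly n^((d-1)/d) for unit edge exponent, so the slope of log L_n against log n gives an estimate d = 1 / (1 - slope). The sketch uses plain Euclidean distances on a uniform planar sample, not the geodesic distances of the GMST, and the function names, sample sizes, and trial counts are illustrative choices, not taken from the slides.

```python
import math
import random


def mst_length(points):
    """Total Euclidean edge length of the MST, via Prim's algorithm in O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from the tree to each point
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def estimate_dimension(sample, sizes, trials, rng):
    """Least-squares slope of log L_n against log n, mapped to d = 1 / (1 - slope)."""
    xs, ys = [], []
    for n in sizes:
        for _ in range(trials):
            sub = rng.sample(sample, n)   # random subsample of size n
            xs.append(math.log(n))
            ys.append(math.log(mst_length(sub)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    slope = num / den
    return 1.0 / (1.0 - slope)


rng = random.Random(0)
# Hypothetical test data: 400 points uniform on the unit square (true d = 2).
planar = [(rng.random(), rng.random()) for _ in range(400)]
d_hat = estimate_dimension(planar, sizes=[50, 100, 200, 400], trials=3, rng=rng)
print(round(d_hat, 2))
```

For data on a curved manifold, the Euclidean distance in `mst_length` would need to be replaced by an approximation of geodesic distance; that step is left out here.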
13
Distributed GMST Estimator
  • Principal MST convergence result
  • Distributed BHH (aggregation rule)
  • Tight upper and lower bounds on the limit if
    rooted dual graphs [Yukich97] are exchanged among
    sites
BHH Theorem
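
The equations on this slide did not survive in the transcript. For orientation, the standard generalized BHH (Beardwood-Halton-Hammersley) limit for the MST can be written as below. The notation is an assumption, not the slide's own, and the distributed aggregation rule itself is not reconstructed here.

```latex
% Assumed notation (not from the slide):
%   X_1, ..., X_n : i.i.d. points with density f on a compact subset of R^d, d >= 2
%   L_\gamma      : total MST length with edge weights |e|^\gamma, 0 < \gamma < d
\lim_{n \to \infty} \frac{L_\gamma(X_1, \dots, X_n)}{n^{(d-\gamma)/d}}
  = \beta_{d,\gamma} \int f(x)^{(d-\gamma)/d} \, dx \quad \text{(a.s.)}

% With \alpha = (d-\gamma)/d and the Renyi entropy
%   H_\alpha(f) = \frac{1}{1-\alpha} \log \int f(x)^{\alpha} \, dx ,
% the same limit reads
\log \frac{L_\gamma(X_1, \dots, X_n)}{n^{\alpha}}
  \;\longrightarrow\; \log \beta_{d,\gamma} + (1-\alpha) \, H_\alpha(f)
```

The constant \beta_{d,\gamma} depends only on d and \gamma, not on f, which is what lets the growth exponent carry the dimension and the intercept carry the entropy.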
14
2.2 Distributed Model-based Inference
  • Global likelihood model
  • Global M-estimator recursion
  • Global Fisher score function
  • Local Fisher score functions
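
The formulas behind these four items are missing from the transcript. One plausible way to write them, assuming data that are independent across sites, is sketched below. The symbols M, x_i, f_i, \mu_k and the gradient form of the recursion are assumptions made for illustration.

```latex
% Assumed setting: sites i = 1, ..., M hold data x_i, independent across sites,
% with a common parameter \theta \in R^p.

% Global likelihood model (log form)
\ell(\theta) = \sum_{i=1}^{M} \ell_i(\theta), \qquad \ell_i(\theta) = \log f_i(x_i; \theta)

% Local and global Fisher score functions
s_i(\theta) = \nabla_\theta \, \ell_i(\theta), \qquad
s(\theta) = \nabla_\theta \, \ell(\theta) = \sum_{i=1}^{M} s_i(\theta)

% Global M-estimator recursion (gradient form, step size \mu_k assumed)
\theta_{k+1} = \theta_k + \mu_k \sum_{i=1}^{M} s_i(\theta_k)
```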

15
Distributed M-estimator
[Figure: distributed M-estimator, with blocks labeled A and B]
16
Properties
  • Communication requirement is 2p bytes/update/site.
  • If data are independent, iterates attain
    stationary points of the global likelihood.
  • All local MLEs are available to each site.
  • For a multimodal likelihood, improvement on local
    MLEs can be achieved by aggregation under a
    mixture model.
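
A minimal sketch of a score-sharing recursion, assuming data that are independent across sites so that the global score is the sum of local scores. The unit-variance Gaussian mean model, the fixed step size, and the names `local_score` and `distributed_fit` are illustrative assumptions, not the presentation's algorithm. In each update a site receives the current p-dimensional parameter and returns a p-dimensional score, which is one possible reading of the 2p figure above.

```python
import random


def local_score(theta, data):
    """Score of a unit-variance Gaussian mean model at one site: sum_j (x_j - theta)."""
    p = len(theta)
    return [sum(x[k] - theta[k] for x in data) for k in range(p)]


def distributed_fit(sites, theta0, n_iter=100):
    """Gradient-ascent recursion driven by the sum of local scores."""
    total = sum(len(data) for data in sites)
    step = 0.5 / total              # hypothetical fixed step size
    theta = list(theta0)
    for _ in range(n_iter):
        # Each site receives theta (p numbers) and returns its score (p numbers).
        scores = [local_score(theta, data) for data in sites]
        global_score = [sum(s[k] for s in scores) for k in range(len(theta))]
        theta = [t + step * g for t, g in zip(theta, global_score)]
    return theta


rng = random.Random(0)
true_mean = (1.0, -2.0)             # hypothetical ground truth
sites = [
    [tuple(rng.gauss(m, 1.0) for m in true_mean) for _ in range(n)]
    for n in (30, 50, 80)           # three sites with different sample sizes
]
theta_hat = distributed_fit(sites, theta0=[0.0, 0.0])

# For this model the global MLE is the pooled sample mean.
pooled = [x for data in sites for x in data]
pooled_mean = [sum(x[k] for x in pooled) / len(pooled) for k in range(2)]
print(theta_hat, pooled_mean)
```

No site ever sends raw samples in this sketch; only parameters and scores cross the network.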

17
Global Likelihood Function
[Figure: global likelihood function with the local MLEs marked along the parameter axis]
18
Key Theoretical Result
  • The asymptotic distribution of local estimates is
    a Gaussian mixture that depends on the global
    likelihood
  • Parameters

Proof: asymptotic normal theory of local maxima
[Huber67]; see [BlattHero2003]
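
The mixture parameters are not preserved in the transcript. A hedged sketch of the form such a result usually takes, following standard M-estimator asymptotics, is given below. The symbols K, p_j, \theta_j, A_j, B_j, C_j are assumed notation, and the exact statement should be taken from [BlattHero2003].

```latex
% Assumed notation: \hat{\theta}_n is a local estimate from n samples,
% \theta_1, ..., \theta_K are the local maxima of the expected log-likelihood,
% p_j is the probability that the estimator converges near \theta_j.
\hat{\theta}_n \;\overset{\text{approx.}}{\sim}\;
  \sum_{j=1}^{K} p_j \, \mathcal{N}\!\left( \theta_j, \; \tfrac{1}{n} C_j \right),
\qquad C_j = A_j^{-1} B_j A_j^{-1}

A_j = \mathrm{E}\!\left[ \nabla_\theta^2 \log f(X; \theta) \right]_{\theta = \theta_j},
\qquad
B_j = \mathrm{E}\!\left[ \nabla_\theta \log f(X; \theta) \,
      \nabla_\theta \log f(X; \theta)^{T} \right]_{\theta = \theta_j}

% At the global maximum of a correctly specified model, -A_j = B_j = F
% (the Fisher information matrix), so C_j reduces to F^{-1}.
```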
19
Local Estimator Aggregation Algorithm
Sample Covariance Analysis
Estimation of Gaussian Mixture Parameters (FS, EM)
Aggregation To Final Estimate
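
The three steps above can be sketched for scalar estimates as follows. This is an illustration under stated assumptions, not the presentation's algorithm: a 1-D two-means clustering stands in for the Gaussian mixture fit, the variance predicted by the inverse Fisher information is passed in as a known number, and the cluster whose sample variance best matches that prediction is averaged to form the final estimate. All names and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def two_means(values, n_iter=50):
    """1-D k-means with k = 2, initialized at the smallest and largest value."""
    centers = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(n_iter):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        centers = [statistics.fmean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters


def aggregate(local_estimates, predicted_var):
    """Cluster the local estimates, pick the cluster whose sample variance is
    closest (in log ratio) to the predicted variance, and average it."""
    clusters = [c for c in two_means(local_estimates) if len(c) > 1]

    def mismatch(cluster):
        return abs(math.log(statistics.variance(cluster) / predicted_var))

    chosen = min(clusters, key=mismatch)
    return statistics.fmean(chosen)


rng = random.Random(1)
# Synthetic local estimates: 120 sites near the global maximum at 0.0,
# 80 sites stuck near a local maximum at 3.0 with a larger spread.
estimates = [rng.gauss(0.0, 0.1) for _ in range(120)]
estimates += [rng.gauss(3.0, 0.3) for _ in range(80)]
final = aggregate(estimates, predicted_var=0.1 ** 2)
print(round(final, 3))
```

Selecting by covariance match rather than by cluster size is the design choice here: the larger cluster is not guaranteed to be the one at the global maximum.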
20
Simple Example
IID Observation Model
[Figure: plot with the local maximum and the global maximum marked]
  • Each site observes a 2-component Gaussian mixture
  • Identical component variances
  • Unknown mixing parameters
  • Unknown component means
  • 200 data collection sites
  • 100 samples/site
  • CEM2 algorithm implemented for estimation and
    aggregation
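
A minimal sketch of the per-site estimation step in a setup like this one, using plain EM in place of CEM2. The true parameter values, the known common standard deviation, and the initialization at the sample minimum and maximum are assumptions made for illustration. With components this well separated, sites mostly converge near the global maximum, so the sketch does not reproduce the local maxima discussed in the talk.

```python
import math
import random
import statistics


def em_two_gaussians(xs, sigma=1.0, n_iter=50):
    """Plain EM for w*N(m1, sigma^2) + (1-w)*N(m2, sigma^2), with sigma known."""
    m1, m2, w = min(xs), max(xs), 0.5
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each sample.
        resp = []
        for x in xs:
            a = w * math.exp(-0.5 * ((x - m1) / sigma) ** 2)
            b = (1.0 - w) * math.exp(-0.5 * ((x - m2) / sigma) ** 2)
            resp.append(a / (a + b))
        # M-step: update the mixing weight and both component means.
        s = sum(resp)
        w = s / len(xs)
        m1 = sum(r * x for r, x in zip(resp, xs)) / s
        m2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / (len(xs) - s)
    return w, m1, m2


def draw_site(rng, n, w, m1, m2, sigma=1.0):
    """n i.i.d. samples from the two-component mixture."""
    return [rng.gauss(m1 if rng.random() < w else m2, sigma) for _ in range(n)]


rng = random.Random(0)
W, M1, M2 = 0.3, -2.0, 2.0          # hypothetical true parameters
# 200 sites with 100 samples each, as on the slide.
local = [em_two_gaussians(draw_site(rng, 100, W, M1, M2)) for _ in range(200)]

avg_w = statistics.fmean(est[0] for est in local)
avg_m1 = statistics.fmean(est[1] for est in local)
avg_m2 = statistics.fmean(est[2] for est in local)
print(round(avg_w, 2), round(avg_m1, 2), round(avg_m2, 2))
```

The list `local` plays the role of the local estimates that a monitoring center would then cluster and aggregate.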

Ambiguity function.
21
Clustering and Discrimination
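[Figure: local estimates in the (m1, m2) plane, with clusters at the local maximum and the global maximum; covariances from the inverse FIM are shown with covariances estimated empirically via CEM2]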
22
Validation of Key Result
[Figure: QQ plots for Cluster 1 and Cluster 2]
23
Conclusions
  • Lossless distributed dimension reduction and
    model-based inference require:
    • Reliable local inference methods
    • Aggregation rules for combining local statistics
  • Information sharing constraints?
  • Effects of bandwidth constraints: data
    compression?
  • Tracking in dynamical models?

24
References
  • A. O. Hero, B. Ma, O. Michel, and J. D. Gorman,
    "Application of entropic graphs," IEEE Signal
    Processing Magazine, Sept. 2002.
  • J. Costa and A. O. Hero, "Manifold learning with
    geodesic minimal spanning trees," accepted in
    IEEE T-SP (Special Issue on Machine Learning),
    2004.
  • D. Blatt and A. Hero, "Asymptotic distribution of
    log-likelihood maximization based algorithms and
    applications," in Energy Minimization Methods in
    Computer Vision and Pattern Recognition
    (EMM-CVPR), Eds. M. Figueiredo, A. Rangarajan, J.
    Zerubia, Springer-Verlag, 2003.
  • M. F. Shih and A. O. Hero, "Unicast-based
    inference of network link delay distributions
    using mixed finite mixture models," IEEE T-SP,
    vol. 51, no. 9, pp. 2219-2228, Aug. 2003.
  • N. Patwari, A. O. Hero, and B. Sadler,
    "Hierarchical censoring sensors for change
    detection," Proc. of SSP, St. Louis, Sept.
    2003.

25
Information Sharing Game
26
Addition of other Discriminants
Value-added due to transmission of likelihood
values