1
DYSWIS
  • KYUNG-HWA KIM
  • HENNING SCHULZRINNE
  • 12/09/2008
  • INTERNET REAL-TIME LAB,
  • COLUMBIA UNIVERSITY

2
Do You See What I See?
[Figure: end users connected through the Internet, asking "Do you see what I see?"]
3
Outline
  • Overview
  • Fault Detection
  • Peer Selection
  • Probing
  • Problem
  • Implementation
  • Demo

4
Overview
  • Overview
      • DYSWIS: Do You See What I See?
      • Distributed network fault detection and analysis
        system
  • Motivation
      • Different causes for a particular network fault
      • Need different views of the fault from other
        sources
      • End-to-end diagnosis
      • Need a user-friendly interface
  • Current Problem
      • Centralized management schemes
      • Complexity in the user network and devices
      • Failure to solve the service quality problem
  • Approach
      • Collaborate with other end users
      • P2P based
      • Remote probing

5
For Quick Understanding
6
Fault Detection
  • Automatic fault detection
      • Capture raw network packets
      • Analyze network packets and protocols
  • Raw packet capturing
      • Check for error responses
      • Check for timeouts
      • Check for TCP congestion
          • Monitor TCP sequence numbers
  • Define fault cases
      • Automatic vs. manual
      • FSM approach
          • Predefined
          • Learned
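
The checks listed on this slide can be sketched as a small predefined state machine. This is only an illustration, not the DYSWIS implementation: the state names, the event names, and the idea of counting repeated TCP sequence numbers as a congestion hint are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical predefined transitions for a simple request/response exchange.
# State and event names are invented for this sketch.
TRANSITIONS = {
    ("IDLE", "request_sent"): "WAITING",
    ("WAITING", "response_ok"): "DONE",
    ("WAITING", "response_error"): "FAULT_ERROR_RESPONSE",
    ("WAITING", "timeout"): "FAULT_TIMEOUT",
}


@dataclass
class ProtocolFsm:
    state: str = "IDLE"
    history: list = field(default_factory=list)

    def feed(self, event):
        # An event with no predefined transition is itself treated as a fault.
        self.state = TRANSITIONS.get((self.state, event), "FAULT_UNEXPECTED_EVENT")
        self.history.append(event)
        return self.state

    @property
    def faulty(self):
        return self.state.startswith("FAULT")


def detect(events):
    """Run captured protocol events through the FSM; stop at the first fault."""
    fsm = ProtocolFsm()
    for event in events:
        fsm.feed(event)
        if fsm.faulty:
            break
    return fsm.state


def count_retransmissions(seq_numbers):
    """Count segments whose sequence number was already seen (assumed congestion hint)."""
    seen = set()
    repeats = 0
    for seq in seq_numbers:
        if seq in seen:
            repeats += 1
        else:
            seen.add(seq)
    return repeats
```

A "learned" FSM would build the TRANSITIONS table from observed fault-free traces instead of writing it by hand; that part is not shown here.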

7
FSM - Approach
Automatic Protocol Failure Detection Using Finite State Machines.
Zhifeng Wang, Kai X. Miao, Tao Zuo, Henning Schulzrinne,
Kyung Hwa Kim, Vishal Kumar Singh
8
FSM - Approach
9
Peer Selection
  • Peer Selection
      • DHT or database
      • Register myself in the DHT network
          • AS number, subnet, first hop, AP
      • Search for probing nodes
          • Inner nodes and outer nodes

[Figure: node A queries the DHT and is referred to node B]
A: "I need some nodes that can help me. Who is in the
same subnet as me?"
DHT: "You can contact B. Its IP address is 218.59.21.16
and its port number is 9090."
10
Peer Selection - DHT (key, value)
<key>
  <type>node</type>
  <asn>14</asn>
  <subnet>128.59.0.0/16</subnet>
</key>
<value>
  <type>node</type>
  <ip>128.59.21.15</ip>
  <port>9090</port>
  <protocol>udp</protocol>
</value>

"I need some nodes that can help me. Who is in the
same subnet as me?"

<key>
  <type>node</type>
  <asn>9880</asn>
  <subnet>45.45.45.0/24</subnet>
  <firewall>no</firewall>
  <nat>no</nat>
</key>
<value>
  <type>node</type>
  <ip>128.59.21.15</ip>
  <hostname>kkh.cs.columbia.edu</hostname>
  <port>9090</port>
  <protocol>tcp</protocol>
</value>

[Figure: nodes A and B and the DHT]
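
The (key, value) idea on this slide can be sketched with an in-memory stand-in for the DHT. Everything here is an assumption for illustration: `ToyDht`, `node_key`, `register`, and `find_peers` are hypothetical names, a Python dict replaces the real DHT, and the key is a tuple rather than the XML shown above.

```python
import ipaddress
from collections import defaultdict


class ToyDht:
    """In-memory stand-in for a DHT: maps a key to a list of values."""

    def __init__(self):
        self._store = defaultdict(list)

    def put(self, key, value):
        self._store[key].append(value)

    def get(self, key):
        # dict.get does not create missing keys on lookup.
        return list(self._store.get(key, []))


def node_key(asn, ip, prefix_len):
    """Build the lookup key from the AS number and the subnet containing ip."""
    subnet = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return ("node", asn, str(subnet))


def register(dht, asn, ip, prefix_len, port, protocol):
    """Publish this node under its (type, asn, subnet) key."""
    value = {"type": "node", "ip": ip, "port": port, "protocol": protocol}
    dht.put(node_key(asn, ip, prefix_len), value)


def find_peers(dht, asn, ip, prefix_len):
    """Return nodes registered in the same AS and subnet, excluding the caller."""
    return [v for v in dht.get(node_key(asn, ip, prefix_len)) if v["ip"] != ip]
```

With this layout, "who is in the same subnet as me?" becomes a single exact-match lookup, which is the kind of query a DHT handles well. Finding outer nodes would need a lookup under a different subnet or AS key.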
11
Remote Probing
  • Distributing modules
      • Detecting and probing modules should be added and
        updated
          • Dynamic class loading
          • Dynamic module distribution
      • Modules can be created and updated separately
  • XML-RPC
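
A rough sketch of how separately updated probing modules could be loaded at run time and exposed over XML-RPC. Python's standard `xmlrpc.server` stands in for whatever DYSWIS actually uses; `ProbeRegistry`, `make_server`, and the convention that each module defines a `probe()` function are assumptions for this example. Executing code received from another node is unsafe without signing or sandboxing, which the sketch leaves out.

```python
import types
from xmlrpc.server import SimpleXMLRPCServer


class ProbeRegistry:
    """Holds probing modules that can be installed or replaced while running."""

    def __init__(self):
        self._modules = {}

    def install(self, name, source):
        # Compile the delivered source into a fresh module object.
        module = types.ModuleType(name)
        exec(compile(source, f"<probe:{name}>", "exec"), module.__dict__)
        if not callable(getattr(module, "probe", None)):
            raise ValueError(f"module {name!r} defines no probe() function")
        # Installing under an existing name replaces the old version.
        self._modules[name] = module
        return True

    def run(self, name, *args):
        return self._modules[name].probe(*args)


def make_server(registry, host="127.0.0.1", port=9090):
    """Expose install() and run() to remote peers over XML-RPC."""
    server = SimpleXMLRPCServer((host, port), allow_none=True, logRequests=False)
    server.register_function(registry.install, "install")
    server.register_function(registry.run, "run")
    return server
```

A node would call `make_server(ProbeRegistry()).serve_forever()`, and a peer could then use `xmlrpc.client.ServerProxy("http://127.0.0.1:9090")` to call `install(...)` and `run(...)`.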

12
Probing Scenarios
  • HTTP
      • Causes: dead web server, page moved, low bandwidth
      • Check DNS query
      • TCP connection
      • Ask another node to try the same query
      • Check TCP congestion
  • DNS
      • Causes: dead DNS server, resolution failed, UDP
        not working
      • Check another DNS server
      • Ask another node to try to connect to my DNS
        server
      • Ask another node to query the same host on
        another DNS server
  • SIP/RTP
      • Causes: NAT, DNS, proxy server, authentication
      • Proxy connectivity test
      • Ask another node to try the same action
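
The local part of the HTTP scenario (check the DNS query, then the TCP connection) can be sketched as an ordered list of probes that reports the first stage that fails. The function names are hypothetical, and the step of asking another node to repeat the query is not shown.

```python
import socket


def dns_ok(host):
    """Return True if the local resolver can resolve host."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def tcp_ok(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failing_stage(host, port=80, resolve=dns_ok, connect=tcp_ok):
    """Run the probes in order and name the first stage that fails, or None."""
    if not resolve(host):
        return "dns"
    if not connect(host, port):
        return "tcp"
    return None
```

The `resolve` and `connect` parameters are injectable so that the same ordering logic can be exercised with fake probes, or later with probes executed on a remote node.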

13
Probing Scenarios
  • Connection problem
      • Causes: dead server, firewall, wrong port number
      • Traceroute: check routers
      • Ask another node to try to connect to the server
      • Ask another node to check my port
  • TCP congestion
      • Causes: queuing delay, dead routers
      • Traceroute, ping
      • Try to find the bottleneck
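
One crude way to use traceroute results for these two scenarios, assuming the hops have already been parsed into `(ttl, address, rtt_ms)` tuples with `None` for hops that did not answer. The tuple layout and both function names are assumptions, and a large RTT jump is only a hint about where a bottleneck might be, not proof.

```python
def last_responding_hop(hops):
    """Return the last hop that answered, i.e. where the path appears to stop."""
    last = None
    for hop in hops:
        if hop[2] is not None:
            last = hop
    return last


def largest_rtt_jump(hops):
    """Return (ttl_before, ttl_after, delta_ms) for the biggest RTT increase."""
    responding = [h for h in hops if h[2] is not None]
    best = None
    for prev, cur in zip(responding, responding[1:]):
        delta = cur[2] - prev[2]
        if best is None or delta > best[2]:
            best = (prev[0], cur[0], delta)
    return best
```

For a dead server or a firewall, the last responding hop shows how far packets got; for congestion, the largest jump suggests which link to examine first.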

14
Probing Scenarios
[Figure: nodes A and B]
15
Data Gathering
  • Problem
      • We have resources: other machines
      • But how do we use them efficiently?
      • We need real data
  • Approach
      • Collecting data
      • Collecting scenarios
      • Implementing a prototype

16
Implementation
  • Architecture

http://wiki.cs.columbia.edu/display/res/DYSWIS
17
For details, visit http://wiki.cs.columbia.edu/display/res/DYSWIS
18
Demo
  • Demo

19
Future work
  • Implementation
      • http://www.cs.columbia.edu/khkim/project/dyswis
      • Coming soon: Mac, Linux
  • Testbed - PlanetLab
  • More mature research on analysis
  • Support for real-time protocols
  • How to find solutions for end users

20
backup
  • Check the local network.
  • Select two nodes: one from the same subnet and one
    from an outside subnet.
  • Let both nodes try to connect to the server.
  • If both nodes fail to connect to the server, log
    this fault as a server failure.
  • If only the internal node fails, run traceroute
    to check where the packet is blocked.
  • If the internal node succeeds, the problem may be
    caused by a local firewall or something else.
  • Check the incoming/outgoing port: let another node
    open the same port and try to connect to it. Check
    whether the remote node received the packet and
    whether the ACK from the remote node came back.
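
The decision steps above can be written down as a small function. This is a sketch under stated assumptions: `Diagnosis` and `diagnose` are hypothetical names, and the two booleans stand for the connection results reported by the node in the same subnet and the node in the outside subnet.

```python
from enum import Enum


class Diagnosis(Enum):
    SERVER_FAILURE = "log as server failure"
    RUN_TRACEROUTE = "run traceroute to find where packets are blocked"
    LOCAL_CAUSE = "suspect a local firewall or something else"


def diagnose(inner_ok, outer_ok):
    """Map the two peers' connection results to the next step from the list."""
    if inner_ok:
        # A peer in my own subnet reached the server, so the cause is near me.
        return Diagnosis.LOCAL_CAUSE
    if outer_ok:
        # Only the internal node failed.
        return Diagnosis.RUN_TRACEROUTE
    # Both nodes failed to connect.
    return Diagnosis.SERVER_FAILURE
```

The port check in the last bullet would follow a `LOCAL_CAUSE` result and is not modeled here.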