Title: KYUNG-HWA KIM
1DYSWIS
- KYUNG-HWA KIM
- HENNING SCHULZRINNE
- 12/09/2008
- INTERNET REAL-TIME LAB,
- COLUMBIA UNIVERSITY
2Do You See What I See?
[Figure: end users connected through the Internet, asking "Do you see what I see?"]
3Outline
- Overview
- Fault Detection
- Peer Selection
- Probing
- Problem
- Implementation
- Demo
4Overview
- Overview
- DYSWIS: Do You See What I See
- Distributed network fault detection and analysis system
- Motivation
- Different causes for a particular network fault
- Need different views from other sources for the fault
- End-to-end diagnosis
- Need user-friendly interface
- Current Problem
- Centralized management schemes
- Complexity in the user network and devices
- Failed to solve the service quality problem
- Approach
- Collaborate with other end users
- P2P based
- Remote probing
5For Quick Understanding
6Fault Detection
- Automatic fault detection
- Network raw packet capturing
- Analyze network packet and protocol
- Raw packet capturing
- Check error response
- Check timeout
- Check TCP congestion
- Monitoring TCP sequence numbers
- Define fault cases
- Automatic vs. Manual
- FSM approach
- pre-define
- learning
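The "Monitoring TCP sequence numbers" check can be illustrated with a small sketch. It assumes packets have already been captured and reduced to a hypothetical `Segment` record; the record layout, the function names, and the 5% threshold are illustrative assumptions, not part of DYSWIS. Sequence-number wraparound and reordering are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A captured TCP data segment (hypothetical, simplified record)."""
    timestamp: float  # capture time in seconds
    seq: int          # sequence number of the first payload byte
    length: int       # payload length in bytes


def count_retransmissions(segments):
    """Count data segments that start below the highest byte already seen."""
    highest_end = None
    retransmissions = 0
    for seg in segments:
        if seg.length == 0:
            continue  # pure ACKs carry no data to retransmit
        end = seg.seq + seg.length
        if highest_end is not None and seg.seq < highest_end:
            retransmissions += 1
        if highest_end is None or end > highest_end:
            highest_end = end
    return retransmissions


def looks_congested(segments, threshold=0.05):
    """Flag a flow whose retransmission ratio exceeds an assumed threshold."""
    data = [s for s in segments if s.length > 0]
    if not data:
        return False
    return count_retransmissions(data) / len(data) > threshold
```

A high retransmission ratio is only a hint of congestion; packet loss from other causes produces the same pattern, which is one reason to compare views with other nodes.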
7FSM - Approach
Reference: "Automatic Protocol Failure Detection Using Finite State Machines," Zhifeng Wang, Kai X. Miao, Tao Zuo, Henning Schulzrinne, Kyung Hwa Kim, Vishal Kumar Singh
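As a rough illustration of the pre-defined FSM idea, the sketch below walks a stream of observed events through a transition table and reports a failure state when the protocol leaves its expected path. The table models a client-side TCP handshake; the state names, event names, and functions are hypothetical and are not taken from the cited paper.

```python
# Hypothetical transition table for a client-side TCP handshake.
TRANSITIONS = {
    ("CLOSED", "syn_sent"): "SYN_SENT",
    ("SYN_SENT", "syn_ack_received"): "ESTABLISHED",
    ("SYN_SENT", "rst_received"): "FAILED_REFUSED",
    ("SYN_SENT", "timeout"): "FAILED_TIMEOUT",
}
FAILURE_STATES = {"FAILED_REFUSED", "FAILED_TIMEOUT", "FAILED_UNEXPECTED"}


def run_fsm(events, start="CLOSED"):
    """Feed events to the FSM and return the state it ends in."""
    state = start
    for event in events:
        # Any (state, event) pair missing from the table counts as a failure.
        state = TRANSITIONS.get((state, event), "FAILED_UNEXPECTED")
        if state in FAILURE_STATES:
            break
    return state


def detect_failure(events):
    """Return the failure state reached, or None if no failure was seen."""
    state = run_fsm(events)
    return state if state in FAILURE_STATES else None
```

For example, `detect_failure(["syn_sent", "timeout"])` returns `"FAILED_TIMEOUT"`. A learning variant would build `TRANSITIONS` from observed healthy traffic instead of writing it by hand.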
9Peer Selection
- Peer Selection
- DHT or Database
- Register myself with the DHT network
- AS number, subnet, first hop, AP.
- Search probing nodes
- Inner nodes and outer nodes
[Figure: A, B, and the DHT. Request: "I need some nodes that can help me. Who is in the same subnet as me?" Reply: "You can contact B. Its IP address is 218.59.21.16 and its port number is 9090."]
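The split into inner and outer nodes can be sketched with the standard `ipaddress` module. The function names and the rule "same subnet means inner" are assumptions made for illustration; the slides also list AS number, first hop, and AP as possible grouping criteria.

```python
import ipaddress


def split_peers(my_subnet, peer_ips):
    """Partition peer addresses into (inner, outer) relative to my subnet."""
    network = ipaddress.ip_network(my_subnet)
    inner, outer = [], []
    for ip in peer_ips:
        if ipaddress.ip_address(ip) in network:
            inner.append(ip)
        else:
            outer.append(ip)
    return inner, outer


def pick_probers(my_subnet, peer_ips):
    """Pick one inner and one outer node, or None where none is available."""
    inner, outer = split_peers(my_subnet, peer_ips)
    return (inner[0] if inner else None, outer[0] if outer else None)
```

With the addresses that appear in the slides, `pick_probers("128.59.0.0/16", ["128.59.21.15", "218.59.21.16"])` selects `128.59.21.15` as the inner node and `218.59.21.16` as the outer node.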
10Peer Selection - DHT (key, value)
<key>
  <type>node</type>
  <asn>14</asn>
  <subnet>128.59.0.0/16</subnet>
</key>
<value>
  <type>node</type>
  <ip>128.59.21.15</ip>
  <port>9090</port>
  <protocol>udp</protocol>
</value>

"I need some nodes that can help me. Who is in the same subnet as me?"

<key>
  <type>node</type>
  <asn>9880</asn>
  <subnet>45.45.45.0/24</subnet>
  <firewall>no</firewall>
  <nat>no</nat>
</key>
<value>
  <type>node</type>
  <ip>128.59.21.15</ip>
  <hostname>kkh.cs.columbia.edu</hostname>
  <port>9090</port>
  <protocol>tcp</protocol>
</value>

[Figure: A, B, and the DHT]
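A minimal sketch of registering and looking up such (key, value) records follows. An in-memory dictionary stands in for the DHT, and `make_record` and `ToyDHT` are hypothetical names; a real DHT would hash the key and route the request to the responsible node. Because lookup is by exact string match, the key fields must be written in the same order by every node.

```python
import xml.etree.ElementTree as ET


def make_record(tag, fields):
    """Serialize a flat dict as <tag><name>value</name>...</tag>."""
    root = ET.Element(tag)
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


class ToyDHT:
    """In-memory stand-in for a DHT: maps a key string to a list of values."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table.setdefault(key, []).append(value)

    def get(self, key):
        return list(self._table.get(key, []))
```

A node would call `put` once at startup with its own AS number and subnet as the key, and `get` with the same key when it needs helpers from its own subnet.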
11Remote Probing
- Distributing modules
- Detecting and probing modules should be added and updated
- Dynamic class loading
- Dynamic module distributing
- Modules can be created and updated separately.
- XMLRPC
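The combination of runtime module loading and XML-RPC can be sketched in Python as follows. This is an illustration under stated assumptions, not the DYSWIS implementation: every module is assumed to expose a `probe` function, the helper names are hypothetical, and `_marshaled_dispatch` is an internal standard-library method used here only to exercise the dispatcher without opening a socket (a `SimpleXMLRPCServer` would normally call it for each request). Executing source received from peers is unsafe without signing or sandboxing.

```python
import types
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher


def load_module_from_source(name, source):
    """Create a module object at runtime from source text."""
    module = types.ModuleType(name)
    exec(compile(source, f"<{name}>", "exec"), module.__dict__)
    return module


def build_dispatcher(modules):
    """Expose each module's probe() as the XML-RPC method '<module>.probe'."""
    dispatcher = SimpleXMLRPCDispatcher(allow_none=True)
    for module in modules:
        dispatcher.register_function(module.probe, f"{module.__name__}.probe")
    return dispatcher


def call_locally(dispatcher, method, *params):
    """Marshal a call to XML-RPC, dispatch it in-process, unmarshal the reply."""
    request = xmlrpc.client.dumps(params, methodname=method).encode("utf-8")
    response = dispatcher._marshaled_dispatch(request)
    (result,), _ = xmlrpc.client.loads(response)
    return result
```

Because modules are registered by name, a new probing module can be added or an old one replaced without restarting the other modules.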
12Probing Scenarios
- HTTP
- Causes: dead web server, page moved, low bandwidth
- Check DNS query
- TCP connection
- Ask other node to try same query
- Check TCP congestion
- DNS
- Causes: dead DNS server, resolution failed, UDP is not working
- Check other DNS server
- Ask other node to try to connect to my DNS server
- Ask other node to query same host on another DNS server
- SIP/RTP
- Causes: NAT, DNS, proxy server, authentication
- Proxy connectivity test
- Ask other node to try same action
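One way to combine the three DNS checks into a diagnosis is sketched below. The mapping from check results to causes is an assumption made for illustration; the slides list the checks and the candidate causes but do not spell out the decision rules. The function and result names are hypothetical.

```python
def diagnose_dns(other_server_ok, peer_reaches_my_server, peer_other_server_ok):
    """Guess why a query to my own DNS server failed.

    other_server_ok:        I can resolve the host through another DNS server.
    peer_reaches_my_server: a peer can resolve the host through my DNS server.
    peer_other_server_ok:   a peer can resolve the host through another server.
    """
    if peer_reaches_my_server:
        # My server answers the peer, so the fault is on my side or my path.
        if other_server_ok:
            return "path_to_my_dns_server"
        return "local_problem_e.g._udp_blocked"
    if other_server_ok or peer_other_server_ok:
        # The name resolves elsewhere, but nobody gets it from my server.
        return "dead_dns_server"
    return "resolution_failed"
```

The point of the sketch is that no single check is conclusive; the cause only becomes distinguishable once the local view and a peer's view are compared.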
13Probing Scenarios
- Connection problem
- Causes: dead server, firewall, wrong port number
- Traceroute: check routers
- Ask other node to try to connect to the server
- Ask other node to check my port
- TCP congestion
- Causes: queuing delay, dead routers
- Traceroute, ping
- Try to find bottleneck
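The "try to find bottleneck" step can be sketched as a search over per-hop round-trip times. The input is assumed to be already-parsed traceroute output, given as `(hop, rtt_ms)` pairs with `None` for hops that did not reply; the function names and the "largest RTT jump" heuristic are illustrative assumptions.

```python
def find_bottleneck(hops):
    """Return (hop, increase_ms) for the largest RTT jump, or None."""
    best = None
    prev_rtt = 0.0
    for hop, rtt in hops:
        if rtt is None:
            continue  # silent hop; see first_silent_hop
        increase = rtt - prev_rtt
        if best is None or increase > best[1]:
            best = (hop, increase)
        prev_rtt = rtt
    return best


def first_silent_hop(hops):
    """Return the 1-based index of the first hop with no reply, or None."""
    for index, (_hop, rtt) in enumerate(hops, start=1):
        if rtt is None:
            return index
    return None
```

Per-hop RTTs are noisy, and routers may rate-limit or deprioritize their replies, so a single trace is weak evidence; repeating the trace from an inner and an outer node helps separate a queuing delay on the shared path from a problem local to one host.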
14Probing Scenarios
[Figure: nodes A and B]
15Data Gathering
- Problem
- We have resources: other machines
- But how do we use them efficiently?
- We need real data
- Approach
- Collecting data
- Collecting Scenarios
- Implementing prototype
16Implementation
http://wiki.cs.columbia.edu/display/res/DYSWIS
17 For details, visit http://wiki.cs.columbia.edu/display/res/DYSWIS
18Demo
19Future work
- Implementation
- http://www.cs.columbia.edu/khkim/project/dyswis
- Coming soon: Mac, Linux
- Testbed - PlanetLab
- Mature research for analysis
- Support real time protocols
- How to find solutions for end users
20backup
- Check local network.
- Select two nodes, one from the same subnet and another from an outer subnet.
- Let the nodes try to connect to the server.
- If both nodes fail to connect to the server, log this fault as a server failure.
- If only the internal node fails, execute traceroute to check where the packet is blocked.
- If the internal node succeeds, it is possible that this problem is caused by a local firewall or something else.
- Check incoming/outgoing port: let other nodes open the same port and try to connect there. Check whether the remote node received the packet. Check whether the ACK from the remote node came back.
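The decision rules above can be written down directly. The sketch assumes the two probe results are already available as booleans; the function name and the returned labels are hypothetical.

```python
def classify_connection_fault(inner_ok, outer_ok):
    """Map the results of the inner and outer probes to a next step.

    inner_ok: the node in my subnet could connect to the server.
    outer_ok: the node in an outer subnet could connect to the server.
    """
    if not inner_ok and not outer_ok:
        return "server_failure"          # nobody reaches the server
    if not inner_ok:
        return "run_traceroute"          # only the internal node failed
    return "suspect_local_firewall"      # a neighbor succeeds, so look locally
```

For example, `classify_connection_fault(False, False)` yields `"server_failure"`, which is the case the procedure logs without further probing.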