1
Geolocation by IP address
  • Locating Internet hosts

Sándor Laki, lakis@inf.elte.hu, http://lakis.web.elte.hu
2
Outline
  • RADAR: a wireless solution
  • IP2Geo on the Internet
  • Constraint-Based Geolocation
  • (GeoLim)
  • OCTANT framework
  • Topology-Based Geolocation

3
RADAR
4
RADAR, a wireless approach
  • Focus on the indoor environment
  • GPS does not work indoors
  • Dedicated technologies
  • Goals
  • Leverage existing infrastructure
  • Use wireless LAN
  • Software solution
  • Better scalability and lower cost than dedicated
    technology

5
RADAR
  • Key idea: signal strength matching
  • Offline calibration
  • Construct a radio map of (location, SStr) tuples
  • Real-time location and tracking
  • Extract SStr from beacons
  • Find the table entry that best matches the measured
    SStr

6
RADAR: determining location
  • Find the nearest neighbor in signal space (NNSS)
  • 1st solution
  • The physical position of the NNSS gives the user location
  • 2nd solution
  • k-NNSS
  • Averaging the coordinates of the k nearest neighbors
    gives the estimated position
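The NNSS and k-NNSS lookups above can be sketched as follows. This is a minimal illustration, not RADAR's implementation: it assumes the radio map is a list of (location, signal-strength vector) pairs with one SStr reading in dBm per base station, and that "nearest" means Euclidean distance in signal space. The radio-map values are invented.

```python
import math
from typing import List, Sequence, Tuple

Location = Tuple[float, float]                        # (x, y) on the floor plan
RadioMap = List[Tuple[Location, Tuple[float, ...]]]   # (location, SStr vector)


def signal_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two signal-strength vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nnss(radio_map: RadioMap, measured: Sequence[float], k: int = 1) -> Location:
    """Average the locations of the k radio-map entries closest in signal space.

    With k = 1 this is plain NNSS: the user is placed at the best-matching entry.
    """
    ranked = sorted(radio_map, key=lambda entry: signal_distance(entry[1], measured))
    nearest = ranked[:k]
    x = sum(loc[0] for loc, _ in nearest) / len(nearest)
    y = sum(loc[1] for loc, _ in nearest) / len(nearest)
    return (x, y)


# Hypothetical radio map: SStr readings (dBm) from three base stations.
radio_map: RadioMap = [
    ((0.0, 0.0), (-40.0, -70.0, -80.0)),
    ((10.0, 0.0), (-60.0, -50.0, -75.0)),
    ((0.0, 10.0), (-65.0, -72.0, -45.0)),
    ((10.0, 10.0), (-70.0, -55.0, -50.0)),
]

print(k_nnss(radio_map, (-42.0, -68.0, -79.0)))       # NNSS   -> (0.0, 0.0)
print(k_nnss(radio_map, (-55.0, -55.0, -70.0), k=2))  # 2-NNSS -> (5.0, 0.0)
```

Averaging over k neighbors places the second estimate between two calibration points, which is how k-NNSS can do better than snapping to a single table entry.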

7
Correlation between physical location and signal strength
  • Base system
  • INFOCOM 2000 paper
  • Enhanced system
  • Microsoft Technical Report MSR-TR-2000-12

8
IP2Geo
  • Single-point localization

9
IP2Geo - Motivation
  • Much focus on location-aware services in wireless
    and mobile contexts
  • Such services are relevant in the Internet
    context too
  • targeted advertising
  • event notification
  • territorial rights management
  • network diagnostics
  • It is a challenging problem
  • IP address does not inherently contain an
    indication of location

10
IP2Geo
  • Multi-pronged approach that exploits various
    properties of the Internet
  • DNS names of router interfaces often indicate
    location
  • network delay tends to correlate with geographic
    distance
  • hosts that are aggregated for the purposes of
    Internet routing also tend to be clustered
    geographically
  • GeoTrack
  • determine location of closest router with a
    recognizable DNS name
  • GeoPing
  • use delay measurements to estimate location
  • GeoCluster
  • extrapolate partial (and possibly inaccurate)
    IP-to-location mapping information using BGP
    prefix clusters

11
GeoTrack: main idea
  • Extract geographical information from the DNS names
    of routers on the path
  • Localize the target to the last router whose
    position is known
  • Example
  • ngcore1-serial8-0-0-0.Seattle.cw.net → Seattle
  • 184.atm6-0.xr2.ewr1.alter.net → New York
  • dnvr-scrm.abilene.ucaid.edu → Denver

12
GeoTrack
  • GeoTrack operation
  • do a traceroute to the target IP address
  • determine location of last recognizable router
    along the path
  • Key ideas in GeoTrack
  • partitioned city code database to minimize the chance
    of a false match
  • ISP-specific parsing rules
  • delay-based correction
  • Limitations
  • routers may not respond to traceroute
  • DNS name may not contain location information or
    lookup may fail
  • target host may be behind a proxy or a firewall
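The GeoTrack steps can be sketched as a toy function. Everything here is an assumption for illustration: the traceroute result is given as a list of hop DNS names (None for hops that did not respond), the three-entry city-code table stands in for GeoTrack's partitioned city code database, and splitting names on dots and dashes stands in for its ISP-specific parsing rules. Delay-based correction is omitted.

```python
import re
from typing import Dict, List, Optional

# Hypothetical city-code table; a real one is far larger and partitioned per ISP.
CITY_CODES: Dict[str, str] = {
    "seattle": "Seattle",
    "ewr": "New York",   # follows the slide's alter.net example
    "dnvr": "Denver",
}


def city_from_name(name: str) -> Optional[str]:
    """Return the city of the first name token that matches a known city code."""
    for token in re.split(r"[.\-]", name.lower()):
        code = token.rstrip("0123456789")   # so that 'ewr1' matches 'ewr'
        if code in CITY_CODES:
            return CITY_CODES[code]
    return None


def geotrack(hops: List[Optional[str]]) -> Optional[str]:
    """Locate the target at the last hop whose DNS name yields a known city."""
    location = None
    for name in hops:
        if name is None:          # router did not answer the traceroute probe
            continue
        city = city_from_name(name)
        if city is not None:
            location = city
    return location


path = [
    "ngcore1-serial8-0-0-0.Seattle.cw.net",
    None,
    "dnvr-scrm.abilene.ucaid.edu",
    "gw.example.edu",   # hypothetical last hop with no location hint
]
print(geotrack(path))  # -> Denver
```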

13
GeoPing - Delay-based localization
  • Delay-based triangulation is conceptually simple
  • delay to distance
  • distances from 3 or more non-collinear points →
    target location
  • But there are practical difficulties
  • the network path may be circuitous
  • transmission and queuing delays may corrupt the delay
    estimate
  • the one-way delay (OWD) is hard to measure
  • OWD ≠ RTT/2 because of routing asymmetry

14
GeoPing - details
  • Measure the network delay to the target host from
    several geographically distributed probes
  • typically more than 3 probes are used
  • round-trip delay measured using ping utility
  • small-sized packets → transmission delay is
    negligible
  • pick minimum among several delay samples
  • Nearest Neighbor in Delay Space (NNDS)
  • akin to Nearest Neighbor in Signal Space (NNSS)
    in RADAR
  • construct a delay map containing (delay
    vector, location) tuples
  • given a vector of delay measurements, search
    through the delay map for the NNDS
  • location of the NNDS is our estimate for the
    location of the target host
  • More robust than directly trying to map from
    delay to distance
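A sketch of the NNDS lookup, assuming each probe has already collected several RTT samples to the target and that the delay map holds (delay vector, location) tuples for hosts at known locations. The four probes, RTT values and city labels are invented, and Euclidean distance in delay space is an assumed choice of metric.

```python
import math
from typing import List, Sequence, Tuple

DelayMap = List[Tuple[Tuple[float, ...], str]]  # (delay vector in ms, location)


def delay_vector(samples_per_probe: Sequence[Sequence[float]]) -> Tuple[float, ...]:
    """Keep the minimum RTT seen by each probe (the sample least hit by queuing)."""
    return tuple(min(samples) for samples in samples_per_probe)


def nnds(delay_map: DelayMap, target: Sequence[float]) -> str:
    """Return the location of the nearest neighbor in delay space."""
    def dist(vec: Sequence[float]) -> float:
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, target)))
    _, location = min(delay_map, key=lambda entry: dist(entry[0]))
    return location


# Hypothetical delay map measured from four probes.
delay_map: DelayMap = [
    ((5.0, 40.0, 70.0, 80.0), "Seattle"),
    ((45.0, 8.0, 35.0, 50.0), "Denver"),
    ((75.0, 38.0, 6.0, 30.0), "Chicago"),
    ((85.0, 55.0, 28.0, 4.0), "New York"),
]

samples = [[48.1, 44.0, 51.3], [9.5, 12.0, 10.2], [33.0, 36.7, 34.1], [52.2, 49.9, 60.0]]
target = delay_vector(samples)   # (44.0, 9.5, 33.0, 49.9)
print(nnds(delay_map, target))   # -> Denver
```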

15
GeoPing: delay tends to increase with geographic distance
16
GeoPing: estimation error
17
GeoCluster
  • A passive method

18
GeoCluster
  • A passive technique unlike GeoTrack and GeoPing
  • Basic idea
  • break the IP address space into clusters
  • assign a geographical location to each cluster
    based on third-party IP-to-location databases
  • given a target IP address, first find the
    matching cluster using longest-prefix match.
  • location of matching cluster is our estimate of
    host location

19
GeoCluster
  • Example
  • consider the cluster 128.95.0.0/16 (containing
    65536 IP addresses)
  • suppose we know that the location corresponding
    to a few IP addresses in this cluster is Seattle
  • then given a new address, say 128.95.4.5, we
    deduce that it is likely to be in Seattle too
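The longest-prefix match in this example can be sketched with Python's standard `ipaddress` module. The cluster table below is hypothetical apart from the slide's 128.95.0.0/16 → Seattle entry, and a linear scan is used for clarity; a production lookup would use a trie over the roughly 100,000 BGP prefixes.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical cluster table: BGP prefix -> location extrapolated for that cluster.
clusters: Dict[str, str] = {
    "128.95.0.0/16": "Seattle",
    "128.0.0.0/8": "Location A (hypothetical)",
    "192.0.2.0/24": "Location B (hypothetical)",
}


def geocluster(address: str, table: Dict[str, str]) -> Optional[str]:
    """Return the location of the most specific prefix containing `address`."""
    ip = ipaddress.ip_address(address)
    best_len = -1
    best_loc: Optional[str] = None
    for prefix, location in table.items():
        net = ipaddress.ip_network(prefix)
        # Longest-prefix match: prefer the containing network with the longest mask.
        if ip in net and net.prefixlen > best_len:
            best_len, best_loc = net.prefixlen, location
    return best_loc


print(geocluster("128.95.4.5", clusters))  # -> Seattle (the /16 beats the /8)
print(geocluster("128.1.2.3", clusters))   # -> Location A (hypothetical)
print(geocluster("10.0.0.1", clusters))    # -> None (no matching cluster)
```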

20
GeoCluster: clustering IP addresses
  • Exploit the hierarchical nature of Internet
    routing
  • inter-domain routing in the Internet uses the
    Border Gateway Protocol (BGP)
  • BGP operates on address aggregates
  • we treat these aggregates as clusters
  • in all we had about 100,000 clusters of different
    sizes

21
IP-to-location mapping
  • Data sources
  • e-mail service, business web-hosting companies,
    etc.
  • requires a large, fine-grain and fresh database!
  • Information
  • partial information (i.e., only for a small
    subset of addresses)
  • possibly inaccurate (e.g., manual input from
    user)

22
Extrapolating IP-to-location mapping
  • Determine location most likely to correspond to a
    cluster
  • majority polling
  • average location
  • dispersion is an indicator of our confidence in
    the location estimate
  • What if there is a large geographic spread in
    locations?
  • some clusters correspond to large ISPs and the
    internal subdivisions are not visible at the BGP
    level
  • sub-clustering algorithm: keep sub-dividing
    clusters until there is sufficient consensus in
    the individual sub-clusters
  • some clients connect via proxies or firewalls
    (e.g., AOL clients)
  • sub-clustering may help if there are local or
    regional proxies
  • otherwise large dispersion → no location
    estimate is made
  • many tools fail in this regard
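Majority polling with a dispersion check might look like the sketch below. The specifics are assumptions, not GeoCluster's exact rules: dispersion is taken as the mean great-circle distance of all samples from the centroid of the winning city's samples, the 100 km threshold is invented, and averaging latitude/longitude is only reasonable for small areas. A rejected cluster (None) would be handed to the sub-clustering step, which splits it into longer prefixes and retries.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

Sample = Tuple[str, float, float]  # (city, latitude, longitude) from a location database


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def cluster_location(samples: List[Sample], max_dispersion_km: float = 100.0) -> Optional[str]:
    """Majority-poll a cluster's city; return None if the samples are too spread out."""
    city, _ = Counter(c for c, _, _ in samples).most_common(1)[0]
    winners = [(lat, lon) for c, lat, lon in samples if c == city]
    lat0 = sum(lat for lat, _ in winners) / len(winners)
    lon0 = sum(lon for _, lon in winners) / len(winners)
    dispersion = sum(haversine_km(lat, lon, lat0, lon0) for _, lat, lon in samples) / len(samples)
    return city if dispersion <= max_dispersion_km else None


tight = [("Seattle", 47.61, -122.33), ("Seattle", 47.62, -122.35),
         ("Seattle", 47.60, -122.30), ("Tacoma", 47.25, -122.44)]
spread = [("Seattle", 47.6, -122.3), ("New York", 40.7, -74.0), ("Miami", 25.8, -80.2)]
print(cluster_location(tight))   # -> Seattle
print(cluster_location(spread))  # -> None (too dispersed, no estimate)
```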

23
Performance of GeoCluster
Median errors: GeoCluster 30 km, GeoPing 300 km, GeoTrack 100 km
24
Other database-oriented applications
  • NetGeo and IP2LL
  • based on WHOIS DB
  • WHOIS entries are not closely regulated
  • the address information often indicates the head
    office of the owner which may be far from the
    actual target
  • Quova
  • Commercial service with their own database
  • Gtrace
  • using DNS LOC entries

25
Octant framework
  • A very impressive solution

26
Octant overview
  • Combine very different techniques
  • Active and passive
  • Constraint-based
  • Weighted positive and negative constraints
  • Constraint → region
  • Using Bézier-regions
  • Efficient implementations of clipping and union
    operations are available

27
Octant - Notations
  • $\beta_i$: the region in which target node $i$ is
    located
  • $\gamma_j$: a constraint
  • a region where the node might reside, with an
    associated weight
  • Set of nodes
  • Landmarks ($L_j$): nodes whose physical locations are at least
    partially known
  • Every $L_j$ has an estimated location $\beta_{L_j}$

28
Octant Landmarks and constraints
  • Primary landmark
  • GPS, street address
  • Low error
  • Secondary landmark
  • Position computed by Octant itself
  • Positive constraints (set $\Omega$)
  • Node A is within d miles of $L_k$
  • $\gamma = \bigcup_{(x,y) \in \beta_{L_k}} c(x,y,d)$, where $c(x,y,d)$ is the
    disc of radius d centered at (x, y)
  • Negative constraints (set $\Phi$)
  • Node A is further than d miles from $L_k$
  • $\gamma = \bigcap_{(x,y) \in \beta_{L_k}} c(x,y,d)$

29
Estimated location
  • $\beta_i = \bigcap_{X_i \in \Omega} X_i \setminus \bigcup_{X_i \in \Phi} X_i$
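The region algebra above can be illustrated on a coarse grid. This is a deliberate substitution: Octant represents regions with Bézier curves and exact clipping/union operations, while this sketch approximates each region by the set of grid points it contains on a flat plane and ignores constraint weights. A landmark is given as a list of its possible positions (a discretised $\beta_{L_k}$); a primary landmark has a single position. All coordinates and distances are invented.

```python
import math
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float]          # planar (x, y) in miles: a flat-map simplification
Constraint = Tuple[List[Point], float]  # (possible landmark positions, distance d)


def disc(center: Point, d: float, grid: Iterable[Point]) -> Set[Point]:
    """Grid points within distance d of center: a discretised c(x, y, d)."""
    cx, cy = center
    return {p for p in grid if math.hypot(p[0] - cx, p[1] - cy) <= d}


def positive_region(landmark: List[Point], d: float, grid: List[Point]) -> Set[Point]:
    """'Within d of L_k': union of discs over the landmark's possible positions."""
    region: Set[Point] = set()
    for center in landmark:
        region |= disc(center, d, grid)
    return region


def negative_region(landmark: List[Point], d: float, grid: List[Point]) -> Set[Point]:
    """Area ruled out by 'further than d from L_k': intersection of the discs."""
    region: Set[Point] = set(grid)
    for center in landmark:
        region &= disc(center, d, grid)
    return region


def estimate(grid: List[Point], positives: List[Constraint],
             negatives: List[Constraint]) -> Set[Point]:
    """beta_i: intersection of positive regions minus union of negative regions."""
    beta = set(grid)
    for landmark, d in positives:
        beta &= positive_region(landmark, d, grid)
    for landmark, d in negatives:
        beta -= negative_region(landmark, d, grid)
    return beta


grid = [(float(x), float(y)) for x in range(-50, 51, 10) for y in range(-50, 51, 10)]
# Hypothetical primary landmarks with exactly known positions.
positives = [([(0.0, 0.0)], 30.0), ([(40.0, 0.0)], 30.0)]
negatives = [([(20.0, 0.0)], 5.0)]
beta = estimate(grid, positives, negatives)
print(len(beta), sorted(beta))  # 6 points; (20.0, 0.0) is removed by the negative constraint
```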

30
Mapping latencies to distances
  • Latency between a target and a landmark
  • bounds their maximum distance
  • Calculate with the speed of light
  • $\text{distance} \le \text{delay} \cdot \tfrac{2}{3}c$ (signals in fiber travel at about 2/3 of c)
  • Low precision
  • Octant's way
  • Dynamic calibration
  • For each landmark L, compute two bounds $R_L(d)$ and
    $r_L(d)$
  • where d is the ping time of node i
  • $r_L(d) \le \lVert loc(L) - loc(i) \rVert \le R_L(d)$
  • When queuing delays are dominant, $r_L(d) = 0$.
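The speed-of-light bound translates into a one-line calculation. This sketch assumes the "delay" on the slide is the one-way delay and estimates it as RTT/2, which routing asymmetry can violate; the 2/3 factor is the usual approximation for propagation in optical fiber.

```python
SPEED_OF_LIGHT_KM_PER_MS = 299_792.458 / 1000.0  # c in km per millisecond
FIBER_FACTOR = 2.0 / 3.0                         # signals in fiber travel at about 2/3 c


def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on landmark-target distance implied by a round-trip time.

    Assumes the one-way delay is half the RTT.
    """
    one_way_ms = rtt_ms / 2.0
    return one_way_ms * FIBER_FACTOR * SPEED_OF_LIGHT_KM_PER_MS


print(round(max_distance_km(10.0), 1))  # -> 999.3 (a 10 ms RTT allows roughly 1000 km)
```

Even a modest RTT yields a bound of hundreds of kilometers, which is why this bound alone gives low precision.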

31
Mapping latencies to distances
  • Each landmark periodically pings all other
    landmarks → creating a correlation table
  • Determine the convex hull around the points →
    $R_L(d)$ and $r_L(d)$
  • This is sufficient when the target has a direct and
    congestion-free path to the landmark
  • Octant introduces a cutoff at latency p
  • a tunable percentage of landmarks lie to the left
    of p
  • discard the others
  • (z is a fictitious datapoint,
    placed far away)
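The calibration step can be sketched as follows, under assumptions that go beyond the slides: each landmark's calibration data is a list of (RTT in ms, great-circle distance in km) pairs, the cutoff p keeps the fraction `keep` of points with the lowest latency, the upper and lower hull boundaries are computed with Andrew's monotone chain algorithm, and bounds are linearly interpolated between hull vertices. The fictitious point z is omitted, so latencies beyond the cutoff return None. The sample data is invented.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (RTT in ms, distance in km) between two landmarks


def cross(o: Sample, a: Sample, b: Sample) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_chains(points: Sequence[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Lower and upper convex-hull chains, both ordered by increasing latency."""
    pts = sorted(set(points))
    lower: List[Sample] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Sample] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower, upper[::-1]


def evaluate(chain: List[Sample], x: float,
             pick: Callable[[List[float]], float]) -> Optional[float]:
    """Piecewise-linear value of a hull chain at latency x (None outside its range)."""
    values: List[float] = []
    for (x0, y0), (x1, y1) in zip(chain, chain[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                values.extend([y0, y1])
            else:
                values.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return pick(values) if values else None


def calibrate(samples: Sequence[Sample], keep: float = 0.8):
    """Return (R, r): upper and lower distance bounds as functions of latency."""
    latencies = sorted(lat for lat, _ in samples)
    # Cutoff p: roughly the fraction `keep` of the points lie at or left of p.
    p = latencies[max(0, math.ceil(keep * len(latencies)) - 1)]
    lower, upper = hull_chains([s for s in samples if s[0] <= p])

    def R(d: float) -> Optional[float]:
        return evaluate(upper, d, max)

    def r(d: float) -> Optional[float]:
        value = evaluate(lower, d, min)
        return None if value is None else max(0.0, value)  # distance cannot be negative

    return R, r


samples = [(2.0, 100.0), (4.0, 500.0), (6.0, 400.0), (8.0, 1200.0),
           (10.0, 900.0), (50.0, 300.0)]   # the 50 ms point falls beyond the cutoff
R, r = calibrate(samples)
print(R(5.0), r(5.0))  # -> 675.0 325.0
```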

32
Mapping latencies to distances
33
Last hop delays
  • Mapping is further complicated by queuing and
    transmission delays associated with the last hop
  • Cable and DSL connections
  • Overloaded PlanetLab nodes
  • Goal: isolate the delay components that
    artificially inflate latencies
  • Detailed maps of the underlying physical network,
    as in network tomography (not in Octant)
  • Octant introduces a simple metric called height

34
Last hop delays in Octant
  • Based on pair-wise latency measurements between
    landmarks
  • Primary landmarks a, b, c
  • Measure their latencies (RTT): $[a,b]$, $[a,c]$, $[b,c]$
  • The positions of primary landmarks are known →
    we can estimate the transmission delays $(a,b)$,
    $(a,c)$, $(b,c)$
  • Last-hop delay: $\text{lasthop}(a,b) = [a,b] - (a,b)$
  • Landmark coordinates: $(a_{lon}, a_{lat})$, $(b_{lon}, b_{lat})$, $(c_{lon}, c_{lat})$

35
Last hop delays in Octant
  • How much of the delay can be attributed to each
    landmark?
  • Denoted by $h_a$, $h_b$ and $h_c$ (the heights), so that
    the last-hop delay between a and b is $h_a + h_b$
  • Similarly, for a target t, we can estimate its height $h_t$
  • We can solve for
  • $h_t$, $t_{lon}$, $t_{lat}$
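For exactly three primary landmarks, the heights follow in closed form from the three pairwise last-hop delays. The sketch assumes latencies are RTTs in ms, that the expected propagation RTT is twice the great-circle distance divided by 2/3 c, and that each last-hop delay splits additively into the two endpoint heights. With more landmarks one would use least squares instead, and solving for the target's height and coordinates (a non-linear problem) is not shown.

```python
import math
from typing import Tuple

C_KM_PER_MS = 299_792.458 / 1000.0  # speed of light in km per ms
FIBER_FACTOR = 2.0 / 3.0            # assumed signal speed in fiber, as a fraction of c


def great_circle_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Haversine distance in km between (lat, lon) points on a spherical Earth."""
    (lat1, lon1), (lat2, lon2) = p, q
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def excess_ms(rtt_ms: float, p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Last-hop delay: measured RTT minus the RTT expected from distance alone."""
    expected_rtt = 2 * great_circle_km(p, q) / (FIBER_FACTOR * C_KM_PER_MS)
    return rtt_ms - expected_rtt


def heights(x_ab: float, x_ac: float, x_bc: float) -> Tuple[float, float, float]:
    """Solve h_a + h_b = x_ab, h_a + h_c = x_ac, h_b + h_c = x_bc exactly."""
    h_a = (x_ab + x_ac - x_bc) / 2
    h_b = (x_ab + x_bc - x_ac) / 2
    h_c = (x_ac + x_bc - x_ab) / 2
    return h_a, h_b, h_c


print(heights(10.0, 6.0, 8.0))  # -> (4.0, 6.0, 2.0)
```

In use, each `x_..` argument would be an `excess_ms` value computed from a measured landmark-to-landmark RTT and the two landmarks' known coordinates.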

36
Last hop delays
  • $t_{lon}$ and $t_{lat}$ have relatively high error
  • not used in the later stages
  • Given the target and landmark heights
  • Each landmark can shift its
  • $R_L$ up if the target's height < the heights of the other landmarks
  • $r_L$ down if the target's height > the heights of the other landmarks

37
Indirect routes
  • The preceding assumption
  • Route lengths are proportional to great circle
    distances
  • not the case in practice, due to policy routing
  • Example: a subscriber in Ithaca, NY → Cornell Univ.
    (Ithaca)
  • Syracuse, NY → Brockport, IL → New York City →
    Cornell Univ.
  • 1 mile physical distance vs. an 800-mile path

38
Indirect routes discovery
  • Landmark heights can indicate them
  • Localizing routers on the network path
  • Secondary landmarks
  • Localization by latencies
  • Extract location from router names
  • Reverse DNS lookup: undns tool
  • Using ZIP codes to determine geographical location

39
Handling uncertainty
  • Filter out erroneous constraints
  • Latency-based constraints
  • Weights that decrease exponentially with
    increasing latency
  • Weight threshold
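A sketch of exponential weighting with a threshold. The functional form exp(-latency / scale) matches "decreases exponentially with increasing latency", but the 20 ms scale and the 0.1 threshold are invented values, not Octant's parameters.

```python
import math
from typing import List, Sequence, Tuple


def weigh_constraints(latencies_ms: Sequence[float], scale_ms: float = 20.0,
                      threshold: float = 0.1) -> List[Tuple[int, float]]:
    """Weight each latency constraint by exp(-latency / scale) and drop weak ones.

    Returns (constraint index, weight) pairs whose weight is at least `threshold`.
    """
    kept: List[Tuple[int, float]] = []
    for i, latency in enumerate(latencies_ms):
        weight = math.exp(-latency / scale_ms)
        if weight >= threshold:
            kept.append((i, weight))
    return kept


for index, weight in weigh_constraints([0.0, 20.0, 100.0]):
    print(index, round(weight, 3))  # 0 1.0, then 1 0.368; the 100 ms constraint is dropped
```

Long-latency measurements are the ones most likely to include detours and queuing, so they contribute little or are filtered out entirely.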

40
Iterative refinement
  • Two phases
  • First, use accurate and mostly conservative
    constraints
  • Second, use less accurate and more aggressive
    constraints to obtain a better estimate (inside
    the initial estimated region)

41
Results
42
Results
43
Topology-based Geolocation
44
Motivations
  • Problems with CBG
  • Uses constraints based on propagation speeds lower than the speed of light
  • Risk of underestimates
  • When an underestimate occurs, the final region
    does not contain the true location
  • Topology-based geolocation
  • uses the speed of light to generate constraints
  • inspired by sensor network localization

45
Summary of techniques
  • Traceroute from landmarks
  • Map topology
  • Estimate hop latency
  • Improve accuracy
  • Cluster network interfaces
  • Increase structuring
  • Validate location hints
  • Incorporate location hints
  • Constraint optimization
  • Geolocate targets

46
Estimate hop latencies
  • Using the traceroute tool to infer link latency
  • Estimate hop latency from the difference in RTT
    to adjacent routers
  • Accurate only if the link is traversed in both
    directions (symmetric routing)
  • How can we discover this property?
  • Three different techniques

47
Estimate hop latencies
  • First, observing the reverse TTL values
  • Most routers initialize the TTL values of their
    packets from a small set:
  • 30, 32, 64, 128, 150, 255
  • If the reverse TTL changes significantly from one node
    to the next → discard the link estimate
  • Second, measuring paths in both directions between
    pairs of landmarks
  • If both paths traverse a particular link →
    take the difference of the measurements to the two
    endpoints
  • This estimate has high confidence
  • Third, increasing the number of vantage points from which we
    probe a given link
  • For every link on a path from a landmark, we
    probe both endpoints from all other
    landmarks
  • If these probes pass over the link → estimate
    the link latency
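The first technique (RTT differences filtered by reverse TTLs) can be sketched as follows. Assumptions: each traceroute hop is summarized as (minimum RTT in ms, TTL of the reply), a reply's initial TTL is taken to be the smallest value in the slide's set that is at least the observed TTL, one-way link latency is estimated as half the RTT difference, and "changes significantly" is interpreted as the reverse hop count not growing by exactly one (the `tolerance` parameter is hypothetical).

```python
from typing import List, Optional, Tuple

INITIAL_TTLS = (30, 32, 64, 128, 150, 255)  # common initial TTL values (from the slide)


def return_hops(reply_ttl: int) -> int:
    """Guess the reverse path length: nearest common initial TTL minus the observed TTL."""
    initial = min(t for t in INITIAL_TTLS if t >= reply_ttl)
    return initial - reply_ttl


def hop_latencies(hops: List[Tuple[float, int]], tolerance: int = 0) -> List[Optional[float]]:
    """Estimate the one-way latency of each link between consecutive traceroute hops.

    hops: (minimum RTT in ms, TTL of the reply packet) for each hop, in path order.
    An estimate is None when the reverse hop count does not grow by one
    (suggesting an asymmetric reverse path) or when the RTT decreases.
    """
    estimates: List[Optional[float]] = []
    for (rtt0, ttl0), (rtt1, ttl1) in zip(hops, hops[1:]):
        growth = return_hops(ttl1) - return_hops(ttl0)
        if abs(growth - 1) > tolerance or rtt1 < rtt0:
            estimates.append(None)
        else:
            estimates.append((rtt1 - rtt0) / 2.0)
    return estimates


hops = [(1.0, 254), (3.0, 253), (9.0, 60), (11.0, 59)]
print(hop_latencies(hops))  # -> [1.0, None, 1.0]; the middle link's reverse path jumps
```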

48
Clustering interfaces
  • Clustering interfaces that belong to the same
    router (IP aliases)

49
Clustering interfaces
  • Two IP-aliases techniques
  • Mercator
  • UDP probes are sent to high-numbered ports on a
    set of interfaces
  • Routers send back an ICMP port-unreachable message
    with their source address
  • If two different interfaces reply with the same
    source address → aliases
  • Ally
  • Used on pairs of interfaces
  • Sends probes to the two interfaces
  • Examines the IP-ID
  • Most routers generate the IP-ID using a single
    counter that is incremented after each packet
    is created
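Both alias heuristics reduce to small checks once the probe replies are collected. In this sketch the Mercator input is a mapping from probed interface to the source address of its ICMP port-unreachable reply, and the Ally-style input is the IP-IDs of three replies from probes sent to x, then y, then x again. The gap limit of 200 is an assumed value, and differences are taken modulo 2^16 because the IP-ID field is 16 bits wide and wraps around.

```python
from collections import defaultdict
from typing import Dict, List


def mercator_aliases(replies: Dict[str, str]) -> List[List[str]]:
    """Group probed interfaces by the source address of their ICMP reply."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for probed, source in replies.items():
        groups[source].append(probed)
    return [sorted(g) for g in groups.values() if len(g) > 1]


def ally_aliases(id_x1: int, id_y: int, id_x2: int, max_gap: int = 200) -> bool:
    """Ally-style test on IP-IDs from probes sent to x, then y, then x again.

    Aliases share one counter, so the three IDs should increase in order
    with small gaps (modulo 2**16, since the 16-bit IP-ID wraps).
    """
    d1 = (id_y - id_x1) % 65536
    d2 = (id_x2 - id_y) % 65536
    return 0 < d1 <= max_gap and 0 < d2 <= max_gap


replies = {"10.0.0.1": "10.0.0.1", "10.0.1.1": "10.0.0.1", "192.0.2.5": "192.0.2.5"}
print(mercator_aliases(replies))        # -> [['10.0.0.1', '10.0.1.1']]
print(ally_aliases(1000, 1003, 1007))   # -> True (one shared counter)
print(ally_aliases(1000, 40000, 1005))  # -> False (independent counters)
```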

50
Validating location hints
  • DNS names → locations
  • Some names are incorrect
  • Misnamed routers, reconfiguration, reassignment of IP addresses
  • Topology constraints can be used to verify
    location hints
  • RTT measurements → upper bounds
  • Clustering → aliases
  • Hop latencies

51
Constraint optimization
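  • Targets: $X = \{x_i\}$; landmarks: $L = \{l_i\}$
  • Distance between i and j: $d(i,j)$
  • Hard delay constraints
  • $d(l_i, x_j) < c_{ij}$, where $c_{ij}$ is the speed-of-light
    bound derived from the measured delay
  • Set of hard delay constraints: $C_d$
  • Soft link latency constraints
  • Hop latency between i and j: $h_{ij}$
  • $d(x_i, x_j) = h_{ij} + e_{ij}$
  • where $e_{ij}$ is an error term
  • Set of soft link latency constraints: $C_l$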

52
Constraint optimization
  • Minimize $\sum_{(i,j) \in C_l} e_{ij}$
  • Subject to $C_d$, $C_l$
  • Not a convex optimization problem
  • But we can recast it as a semidefinite program
  • Using fast solvers
  • SeDuMi
  • Vivaldi

53
Results
54
Results
55
References
  • RADAR and IP2Geo:
    http://eris.prakinf.tu-ilmenau.de/res/papers/coopStreaming/padmanabhan01Locating.pdf
  • Bernard Wong, Ivan Stoyanov and Emin Gün Sirer.
    Octant: A Comprehensive Framework for the
    Geolocalization of Internet Hosts. In Proceedings
    of the Symposium on Networked Systems Design and
    Implementation, Cambridge, Massachusetts, April
    2007.
  • E. Katz-Bassett, J. P. John, A. Krishnamurthy,
    D. Wetherall, T. Anderson and Y. Chawathe.
    Towards IP geolocation using delay and topology
    measurements. In Proceedings of the 6th ACM
    SIGCOMM Conference on Internet Measurement (IMC '06),
    Rio de Janeiro, Brazil, October 25-27, 2006.
    ACM Press, New York, NY, pp. 71-84.