Title: Traceback
1 Traceback
- Pat Burke
- Yanos Saravanos
2 Agenda
- Introduction
- Problem Definition
- Benchmarks and Metrics
- Traceback Methods
- Packet Marking
- Hash-based
- Conclusion
- References
3 Why Use Traceback?
- General Network Monitoring
- Check users on FTP server
- Network Threats
- SPAM
- DoS
- Insider attacks
4 Why Use Traceback?
- Network Threats
- Worms / Viruses
- Code Red (2001) spreading at 8 hosts/sec
- Slammer Worm (2003) spreading at 125 hosts/sec
- Illegal file sharing
5 Why Use Traceback?
- Currently very difficult to find spammers, virus authors
- Easy to spoof IPs
- No inherent tracing mechanism in IP
- Blaster virus author left clues in code, was eventually caught
- What if we could trace packets back to point of origin?
6 Packet Tracing
7 Packet Tracing
- Monitoring applications currently exist
- Ethereal, tcpdump, ngrep, etc.
- Only work with untampered packets
- Worms, viruses, spam are sent with spoofed IPs from compromised computers
- Need solutions to trace all packets
8 Preliminary Solutions
- Routers add identifiers to the packet as it moves along the Internet
- Packet size increases with every hop
- Effective throughput decreases very quickly
- Routers keep a log of all the packets that have been routed
- Large overhead required of all routers
- Huge database containing packet information
- When should you clear packet information?
9 Benchmarks
- Effect on throughput
- Amount of overhead added to the packets
- False positive rate
- Percentage of paths traced back to the incorrect source
- Computational intensity
- Time required to trace an attack
- Amount of data required to trace an attack
- CPU/memory usage on router
10 Benchmarks
- Traceback's effect on network
- Does it flood?
- Susceptibility to spoofing
- Collisions
- For hash-based traceback methods
11 Some Assumptions
- Attackers can create/spoof any packet
- Packets from an attack may take different routes to victim
- Attacker-victim routes are stable
- Routers are not compromised
12 Packet Marking
13 Packet Marking
- Add information to the packets so that paths can be retraced to original source
- Methods for marking packets
- Probabilistic
- Node Marking
- Edge Marking
- Deterministic
14 Probabilistic Packet Marking (PPM)
- Each router marks a packet with some probability
- With router IP address (node marking)
- With edge of paths (edge marking)
- Node marking
- 95% accuracy requires 300,000 packets
- Edge marking
- More state information required, but converges much faster
15 PPM Nodes
- Each router writes its address in a 32-bit field only with probability p
- Address field can be overwritten by routers closer to the victim
- Probability of seeing the mark of a router d hops away is $p(1-p)^{d-1}$
- Need many packets before we see a mark from a distant router
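The formula above can be checked with a small simulation. This is only a sketch under simple assumptions: a single fixed path, every router using the same marking probability, and no attacker-supplied marks. The function names (`mark_probability`, `simulate_node_marking`) are made up for illustration.

```python
import random


def mark_probability(p: float, d: int) -> float:
    """Chance that a packet arrives carrying the mark of the router d hops away."""
    # The router d hops away must mark (p) and the d-1 closer routers must not.
    return p * (1 - p) ** (d - 1)


def simulate_node_marking(p: float, path_len: int, packets: int, seed: int = 0) -> list:
    """Send `packets` packets down one path and count whose mark survives.

    counts[d] is the number of packets that arrive marked by the router d hops
    from the victim; counts[0] counts packets that arrive unmarked.
    """
    rng = random.Random(seed)
    counts = [0] * (path_len + 1)
    for _ in range(packets):
        mark = 0  # 0 means "no router marked this packet"
        # Walk from the farthest router toward the victim; later marks overwrite.
        for dist in range(path_len, 0, -1):
            if rng.random() < p:
                mark = dist
        counts[mark] += 1
    return counts


if __name__ == "__main__":
    p, path_len, packets = 0.1, 5, 100_000
    counts = simulate_node_marking(p, path_len, packets)
    for d in range(1, path_len + 1):
        print(d, counts[d] / packets, mark_probability(p, d))
```

The observed fraction for each distance should sit close to the predicted value, and it shrinks as d grows, which is why distant routers need many packets.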
16 PPM Nodes - Pros
- Not every packet is marked
- Lower overhead on routers
- Higher throughput (packet size remains small)
- Fixed space is required for the packets
- Mark size: 32 bits (the address field)
17 PPM Nodes - Cons
- Large number of false positives
- DDoS with 25 hosts requires several days and has thousands of false positives
- Slow convergence rate
- For 95% success, we need 300,000 packets
- Attacker can still inject modified packets into PPM network (mark spoofing)
- This is only for a single attacker
18 PPM Edge Sampling
- Reserve a distance field and two 32-bit address fields (start and end)
- If a router decides to mark a packet, it writes its address in the start field and zeroes the distance field
- If a router decides not to mark a packet and sees a zero in the distance field, it writes its address in the end field
- A router that does not mark a packet increments the distance field
- Must use saturating addition (distance field has limit)
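The marking steps on this slide can be sketched per router as follows. This is an illustration, not a reference implementation: the `Packet` class, the string router names, and the 8-bit limit `MAX_DISTANCE` are assumptions, since the slide does not give the width of the distance field.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

MAX_DISTANCE = 255  # assumed 8-bit distance field; the slide gives no width


@dataclass
class Packet:
    start: Optional[str] = None  # router that began the sampled edge
    end: Optional[str] = None    # next router after `start`
    distance: int = 0            # hops travelled since `start` marked


def edge_sample(pkt: Packet, router: str, do_mark: bool) -> None:
    """Apply one router's edge-sampling step to a packet."""
    if do_mark:
        pkt.start = router
        pkt.distance = 0
    else:
        if pkt.distance == 0:
            # The previous router just marked, so this router completes the edge.
            pkt.end = router
        # Saturating addition: never wrap past the field limit.
        pkt.distance = min(pkt.distance + 1, MAX_DISTANCE)


def forward(pkt: Packet, path: List[str], p: float, rng: random.Random) -> Packet:
    """Send a packet through `path`; each router marks with probability p."""
    for router in path:
        edge_sample(pkt, router, rng.random() < p)
    return pkt


if __name__ == "__main__":
    print(forward(Packet(), ["R1", "R2", "R3", "R4"], 0.25, random.Random(7)))
```

A packet that arrives with distance 0 was marked by the last router before the victim, so its end field carries no fresh information; the victim can treat such an edge as ending at itself.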
19 PPM Edge Sampling
- Expected number of packets needed to reconstruct an attack path is at most $\ln(d) / (p(1-p)^{d-1})$
- Requires fewer packets than when marking nodes
- Edge sampling allows reconstruction of the whole attack tree
- Packets have additional overhead
- Encoding start, end, and distance eliminates compatibility with networks not using PPM
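To get a feel for the packet bound on this slide, it can be evaluated for a few marking probabilities. This is a sketch; the function name `edge_sampling_bound` and the sample values of p and d are chosen only for illustration.

```python
import math


def edge_sampling_bound(p: float, d: int) -> float:
    """Evaluate ln(d) / (p * (1 - p)**(d - 1)) for marking probability p and path length d."""
    return math.log(d) / (p * (1 - p) ** (d - 1))


if __name__ == "__main__":
    d = 10
    for p in (0.02, 0.05, 0.1, 0.2, 0.5):
        print(f"p={p:<4} d={d}  bound={edge_sampling_bound(p, d):.1f} packets")
```

The denominator $p(1-p)^{d-1}$ is largest at p = 1/d, so the bound is smallest when the marking probability is roughly the reciprocal of the path length. Marking too often lets nearby routers overwrite distant marks, and marking too rarely leaves most packets unmarked.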
20 Deterministic Packet Marking (DPM)
- Every packet is marked
- Spoofed marks are overwritten with correct marks
21 DPM
- Incoming packets are marked
- Outgoing packets are unaltered
- Requires more overhead than PPM
- Less computation required
- Probability of generating ingress IP address: $(1-p)^{d-1}$
22 DPM
- 32-bit address is split into two fields (bits 0-15 and bits 16-31) and a flag
- The mark is populated with one of the two fields, each with probability 0.5
- Set flag to 1 if using the higher end bits
- Only part of the address is available to the attacker
- Can be made more secure by using non-uniform probability distributions
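The split-and-flag scheme above might look like the sketch below. It assumes that bits 16-31 are the high-order half of the address and that the victim simply collects one mark per flag value. The function names and the example address 192.0.2.17 are illustrative only.

```python
import ipaddress
import random
from typing import Iterable, Optional, Tuple


def dpm_mark(ingress_ip: int, use_high: bool) -> Tuple[int, int]:
    """Return (flag, 16-bit field) for one packet entering the network."""
    if use_high:
        return 1, (ingress_ip >> 16) & 0xFFFF  # flag 1: higher end bits
    return 0, ingress_ip & 0xFFFF              # flag 0: lower end bits


def dpm_mark_random(ingress_ip: int, rng: random.Random) -> Tuple[int, int]:
    """Pick one of the two halves with probability 0.5, as on the slide."""
    return dpm_mark(ingress_ip, rng.random() < 0.5)


def dpm_reconstruct(marks: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Rebuild the 32-bit ingress address once both halves have been seen."""
    halves = {}
    for flag, field in marks:
        halves[flag] = field
    if 0 in halves and 1 in halves:
        return (halves[1] << 16) | halves[0]
    return None  # still missing one half


if __name__ == "__main__":
    ip = int(ipaddress.IPv4Address("192.0.2.17"))
    rng = random.Random(3)
    marks = [dpm_mark_random(ip, rng) for _ in range(10)]
    rebuilt = dpm_reconstruct(marks)
    print(ipaddress.IPv4Address(rebuilt) if rebuilt is not None else "need more packets")
```

With a uniform choice between the halves, a handful of packets from the same ingress point is usually enough to see both values of the flag.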
23 DPM
- Claimed to have 0 false positives
- Claimed to converge very quickly
- 99% probability of success with 7 packets
- 99.9% probability of success with only 10 packets
- Has not been tested on large networks
- Cannot deal with NAT
24 HASH-BASED TRACEBACK: Source Path Isolation Engine (SPIE)
25 SPIE - Overview
- Each router along a packet's transmission path computes a set of hash codes (digests) associated with each packet
- The time-tagged digests are stored in router memory for some time period
- Limited by available router resources
- Traceback is initiated only by authenticated agent requests to the SPIE Traceback Manager (STM)
- Executed by means of a broadcast message
- Results in the construction of a complete attack graph within the STM
26 SPIE - Assumptions
- Packets may be addressed to multiple destinations
- Attackers are aware they are being traced
- Routers may be subverted, but not often
- Routing within the network may be unstable
- Traceback must deal with divergent paths
- Packet size should not grow as a result of traceback
- 1-byte increase in size = 1% increase in resource use
- Very controversial self-enabling assumption
- End hosts may be resource constrained
- Traceback is an infrequent operation
- Broadcast messages can have a significant impact on Internet performance
- Traceback should return entire path, not just source
27 SPIE - Architecture
- DGA (Data Generation Agent): Resident in SPIE-enhanced routers to produce digests and store them in time-stamped digest tables. Implemented as software agents, interface cards, or dedicated auxiliary boxes.
- STM (SPIE Traceback Manager): Controls the SPIE system. Verifies authenticity of a traceback request, dispatches the request to the appropriate SCARs, gathers regional attack graphs, and assembles the complete attack graph.
- SCAR (SPIE Collection and Reduction Agent): Data concentration point for a regional area. When traceback is requested, SCARs initiate a broadcast request for traceback and produce regional attack graphs based upon data from constituent DGAs.
28 SPIE - Hashing
[Figure: packet header fields; masked (gray) areas are NOT used in hash-code calculation. Figure labels: LAN .139, WAN .00092]
- Multiple hash codes (digests, using different groupings of fields) are calculated for each packet based on 24 relatively invariant fields of the first 32 bytes of each packet
- A packet is considered received if all of its hashes are positive
- Hash functions can be simple (no cryptographic hardness required) and relatively fast
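The "all hashes are positive" test behaves like a Bloom filter lookup. The sketch below is one possible reading, not the SPIE implementation: the table size, the number of hashes, the masked byte offsets (assumed here to be the IPv4 ToS, TTL, and header checksum), and the use of salted BLAKE2 in place of a simpler hash are all assumptions made for illustration.

```python
import hashlib

PREFIX_LEN = 32                  # first 32 bytes, as stated on the slide
MASKED_OFFSETS = (1, 8, 10, 11)  # assumed: IPv4 ToS, TTL, header checksum


def invariant_prefix(packet: bytes) -> bytes:
    """Zero the assumed mutable header bytes and keep only the packet prefix."""
    prefix = bytearray(packet[:PREFIX_LEN])
    for offset in MASKED_OFFSETS:
        if offset < len(prefix):
            prefix[offset] = 0
    return bytes(prefix)


class DigestTable:
    """Bit array that remembers packet digests without storing any payload."""

    def __init__(self, bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.bits = bits
        self.num_hashes = num_hashes
        self.table = bytearray(bits // 8)

    def _indexes(self, data: bytes):
        # One salted hash per index; any fast, well-mixed hash would do here.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(data, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.bits

    def record(self, data: bytes) -> None:
        for idx in self._indexes(data):
            self.table[idx >> 3] |= 1 << (idx & 7)

    def seen(self, data: bytes) -> bool:
        # Received only if every hash position is set (false positives possible).
        return all(self.table[idx >> 3] & (1 << (idx & 7)) for idx in self._indexes(data))


if __name__ == "__main__":
    table = DigestTable()
    pkt = bytes(range(40))
    table.record(invariant_prefix(pkt))
    print(table.seen(invariant_prefix(pkt)))
```

Because the masked bytes are zeroed before hashing, the same packet produces the same digest at every hop even though its TTL and checksum change in transit. A real deployment would also rotate tables over time, which is what bounds the traceback window.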
29 SPIE Implementation Issues
- PRO
- Single packet tracing is feasible
- Automated processing by SPIE-enhanced routers makes spoofing difficult, at best
- Relatively low storage required
- Only digests and time are stored
- Does not aid in eavesdropping of payload data
- Payload is not stored
- CON
- Requires specially configured (SPIE-enhanced) routers
- Probability of detection is directly related to the number of available SPIE-enhanced routers in the network in question
- Storage in routers is a limiting factor in the window of time in which a packet may be successfully traced
- May consider some sort of filtering of packets to be digested
- May have the appearance of a loss of anonymity across the Internet
30 Conclusions
- DoS, worms, viruses continuously becoming more dangerous
- Attacks must be shut down quickly and be traceable
- Integrating traceback into next generation Internet is critical
31 Conclusions
- Probabilistic Packet Marking
- Keeps low packet overhead
- Not 100% accurate, traceback is slow
- Deterministic Packet Marking
- No false positives
- Much higher packet overhead, needs more testing
- Hash-based Traceback
- No packet overhead
- Requires new, more capable routers
32 Conclusions
- Cooperation is required
- Routers must be built to handle new tracing protocols
- ISPs must provide compliance with protocols
- Internet is no longer anonymous
- Some issues must still be solved
- NATs
- Collisions
33 References
- Belenky, A., Ansari, N. IP Traceback with Deterministic Packet Marking. IEEE Communications Letters, April 2003.
- Savage, S., et al. Practical Network Support for IP Traceback. Department of Computer Science, University of Washington.
- Snoeren, A., Partridge, C., et al. Single-Packet IP Traceback. IEEE/ACM Transactions on Networking, December 2002.