Title: Understanding the Network-Level Behavior of Spammers
By Anirudh Ramachandran and Nick Feamster
Defense Team
- Mike Delahunty
- Bryan Lutz
- Kimberly Peng
- Kevin Kazmierski
- John Thykattil
Agenda
- Introduction
- Background and Related Work
- Data Collection
- Network-level Characteristics of Spammers
- Spam from Botnets
- Spam from Transient BGP Announcements
- Lessons for Better Spam Mitigation
- Conclusion
Introduction
- Spam
  - Bulk email sent to many recipients
  - Unsolicited commercial messages
- Study is based on the network-level behavior of spammers
  - IP address ranges
  - Spamming modes (route hijacking, bots, etc.)
  - Temporal persistence of spamming hosts
  - Characteristics of spamming botnets
- Much attention has been paid to studying the content of spam, far less to its network-level behavior
Introduction Cont.
- Study posits that network-level properties need to be investigated in order to find creative ways to mitigate spam
- Paper analyzes network-level properties of the spam observed at a large spam sinkhole, together with:
  - BGP route advertisements
  - Traces of command-and-control messages of a Bobax botnet
  - Legitimate email
- Surprising conclusions
  - Most spam comes from a small portion of the IP address space (but so does legitimate email)
  - Most spam comes from bots running Microsoft Windows
  - A small set of spammers use short-lived route announcements to remain untraceable
Background
- Methods and Mitigation
  - Spamming methods
    - Direct spamming: via spam-friendly ISPs or dial-up IP addresses
    - Open relays and proxies: mail servers that allow unauthenticated users to relay email
    - Botnets: hijacked machines acting under the control of a centralized botmaster
    - BGP spectrum agility: short-lived route announcements for the IP addresses from which spammers send spam, which hampers traceability
  - Mitigation techniques
    - Filtering: content-based filters and IP blacklists
Related Work
- Previous studies
  - Packet traces used to determine bandwidth bottlenecks from spam sources
  - Project Honey Pot
    - Acts as a sink for email traffic and hands out trap email addresses to study harvesting behavior and the identity of spammers
    - Measures the time from address harvesting to receipt of the first spam message
    - Identifies the countries where harvesting infrastructure is located
    - Measures the persistence of spam harvesters
Related Work Cont.
- Mitigation
  - SpamAssassin project: reverse engineering via mail content analysis
  - DNS blacklists: 80% of IPs sending spam were listed in the blacklist
- Unusual route announcements
  - Announcements of bogus well-known addresses
  - Suggestions that spammers use short-lived route announcements
Data Collection
- Set up a sinkhole
  - Register a domain with no legitimate email addresses
  - Establish a DNS Mail Exchange (MX) record for it
  - All email received by the server is therefore spam
- Collect metrics on incoming email
  - IP address of the relay; also run a traceroute to it
  - TCP fingerprint to infer the source OS
  - Results of DNS blacklist lookups against 8 different blacklists
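A DNS blacklist lookup conventionally works like this. You reverse the octets of the relay's IPv4 address and query the result as a hostname under the blacklist's zone. Any answer means the address is listed. The following is a minimal Python sketch under that assumption. The zone names are placeholders, not the 8 blacklists used in the study. It also treats every resolution failure as "not listed", which is a simplification.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blacklist answers the query (any A record = listed)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN or any other resolution failure is treated as "not listed"
        return False


def blacklist_results(ip: str, zones: list[str]) -> dict[str, bool]:
    """Check one relay IP against several (hypothetical) blacklist zones."""
    return {zone: is_listed(ip, zone) for zone in zones}
```

For example, `dnsbl_query_name("192.0.2.10", "dnsbl.example.org")` builds `10.2.0.192.dnsbl.example.org`. Only `is_listed` touches the network.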
Data Collection Cont.
- Spam received per day at the sinkhole (Aug. 2004 to Dec. 2005)
Data Collection Cont.
- Hijack DNS for the domain running a botnet's command and control
  - Botnet commands go to a known machine instead
- Monitor BGP updates from the network where the spam is received
- Collect logs from a large email provider (40 million mailboxes)
  - Allows comparison of the network characteristics of spam and legitimate email
Data Analysis
- Study focuses on network-level characteristics
- Distribution of spam across the IP address space is similar to that of legitimate email (although not identical)
- Spam is not uniformly distributed across the IP address range
  - 12% of all received spam comes from just two Autonomous Systems (ASes)
  - 37% comes from the top 20 ASes
  - Offers insight into spam prevention
- Classifying spam by country: China, Korea, and the US dominate
- Defense suggestion
  - Correlate the originating country with the IP range to estimate the probability that a message is spam
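Figures such as "12% of spam from two ASes" come from a concentration measure: the share of all spam sent by the k heaviest ASes. The following is a minimal Python sketch. It assumes each received message has already been mapped to its origin AS, for example from BGP routing data. The AS labels in the usage example are hypothetical, drawn from the documentation range.

```python
from collections import Counter


def top_k_share(spam_by_as: Counter, k: int) -> float:
    """Fraction of all spam that comes from the k ASes sending the most spam."""
    total = sum(spam_by_as.values())
    if total == 0:
        return 0.0
    # most_common(k) returns the k (AS, count) pairs with the largest counts
    top = sum(count for _, count in spam_by_as.most_common(k))
    return top / total
```

With `Counter({"AS64500": 50, "AS64501": 30, "AS64502": 20})`, the top two ASes account for 0.8 of the spam.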
Cumulative Distribution Function (CDF) of Spam and Legitimate Email
- Figure: CDF of spam and legitimate email
  - Callout: greater probability of legitimate email
  - Callout: big increase in the probability of received spam
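A CDF like this one can be built by mapping each sending IPv4 address to an integer and computing the empirical CDF over the address space. The following is a minimal Python sketch. It assumes one input entry per received message, so heavy senders weigh more. You would build one CDF for spam and one for legitimate email and evaluate both at the same addresses.

```python
import ipaddress
from bisect import bisect_right


def ip_to_int(ip: str) -> int:
    """Position of an IPv4 address on the 0 .. 2**32 - 1 axis."""
    return int(ipaddress.IPv4Address(ip))


def empirical_cdf(relay_ips: list[str]):
    """Return F(x) = fraction of messages whose relay address is <= x."""
    values = sorted(ip_to_int(ip) for ip in relay_ips)
    n = len(values)

    def cdf(ip: str) -> float:
        # bisect_right counts how many sorted values are <= the queried address
        return bisect_right(values, ip_to_int(ip)) / n if n else 0.0

    return cdf
```

A region where the spam CDF climbs steeply while the legitimate-email CDF stays flat is a range that sends disproportionately much spam.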
Spam Persistence
85% of unique spammers send 10 or fewer emails
If this holds for all spammers, what's the value in filtering by a specific IP address?
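The persistence figure is a per-sender count. Tally messages per sending IP, then take the fraction of unique IPs at or below a cutoff. The following is a minimal Python sketch. It assumes the input is the sinkhole log reduced to one relay IP per message. The cutoff of 10 mirrors the slide.

```python
from collections import Counter


def fraction_low_volume(relay_ips: list[str], max_emails: int = 10) -> float:
    """Fraction of unique sending IPs that sent at most `max_emails` messages."""
    per_ip = Counter(relay_ips)  # messages received from each IP
    if not per_ip:
        return 0.0
    low = sum(1 for count in per_ip.values() if count <= max_emails)
    return low / len(per_ip)
```

When this fraction is high, a filter keyed on individual IPs sees most senders only a handful of times. That is the concern the question above raises.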
Effectiveness of Blacklists
- About 80% of spam was sent from IPs listed in at least one major blacklist
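This coverage figure counts a message as "caught" if its relay IP appears on any of the blacklists checked. The following is a minimal Python sketch. It assumes the blacklist lookups have already been collected into a dictionary from IP to the set of lists that name it. IPs missing from the dictionary count as unlisted. The list names are hypothetical.

```python
def listed_fraction(message_ips: list[str], listings: dict[str, set[str]]) -> float:
    """Fraction of spam messages whose relay IP is on at least one blacklist."""
    if not message_ips:
        return 0.0
    # listings.get(ip) is None or an empty set when no blacklist lists the IP
    listed = sum(1 for ip in message_ips if listings.get(ip))
    return listed / len(message_ips)
```

Counting per message rather than per IP matches the phrasing "80% of spam". Counting per IP would instead give the share of spamming hosts that are listed.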
Effectiveness of Blacklists Cont.
- Most spamming bots are detected by at least one DNSRBL
- Only 50% of spammers using transient BGP announcements are detected by at least one DNSRBL
Spam from Botnets
- Circumstantial evidence suggests that most spam originates from bots
  - Spamming hosts and Bobax drones have very similar distributions across the IP address space
  - Suggests that much of the spam received may be due to botnets such as Bobax
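"Very similar distributions" can be put into a single number. One standard choice, ours rather than necessarily the study's, is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the two empirical CDFs over the IPv4 address space. The following is a minimal Python sketch, assuming two lists of addresses: spamming hosts and Bobax drones.

```python
import ipaddress
from bisect import bisect_right


def _as_sorted_ints(ips: list[str]) -> list[int]:
    return sorted(int(ipaddress.IPv4Address(ip)) for ip in ips)


def ks_distance(ips_a: list[str], ips_b: list[str]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic between two address sets
    (0.0 = identical empirical CDFs, 1.0 = completely separated)."""
    a, b = _as_sorted_ints(ips_a), _as_sorted_ints(ips_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Both CDFs are step functions, so the largest gap occurs at a sample point
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap
```

A small value means the two populations occupy the address space in similar proportions. That similarity is circumstantial evidence of the kind cited above, not proof that the hosts are the same.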
More on Bots
- Most individual bots send a low volume of spam
Operating Systems Used by Spammers
- Used the OS fingerprinting tool p0f in the Mail Avenger SMTP server
- Able to identify the OS of 75% of hosts that sent spam
  - Of this identifiable 75%, 95% run Windows
  - Consistent with the percentage of hosts on the Internet that run Windows
  - Only about 4% run another OS, but they are responsible for 8% of received spam
- This challenges the common perception that essentially all spam originates from Windows botnet drones
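The 4%-of-hosts versus 8%-of-spam comparison is a ratio of two shares. Those hosts sent 8 / 4 = 2 times their proportional share of spam. The following is a minimal Python sketch that computes both shares per OS. It assumes per-host records of the form (ip, os_label, spam_count), where os_label would come from a passive fingerprinter such as p0f. Hosts it cannot identify can be grouped under a label such as "unknown".

```python
from collections import defaultdict


def os_shares(hosts: list[tuple[str, str, int]]) -> dict[str, tuple[float, float]]:
    """For each OS label, return (share of spamming hosts, share of spam)."""
    host_count: dict[str, int] = defaultdict(int)
    spam_count: dict[str, int] = defaultdict(int)
    for _ip, os_label, spam in hosts:
        host_count[os_label] += 1
        spam_count[os_label] += spam
    total_hosts = len(hosts)
    total_spam = sum(spam_count.values())
    shares = {}
    for os_label, n_hosts in host_count.items():
        spam_share = spam_count[os_label] / total_spam if total_spam else 0.0
        shares[os_label] = (n_hosts / total_hosts, spam_share)
    return shares
```

Dividing the spam share by the host share gives the over-representation factor for each OS. A value above 1 means hosts running that OS send more than their share.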
Spam from Transient BGP Announcements
- Some spammers briefly hijack large portions of IP address space that do not belong to them, send spam, and withdraw the routes immediately afterward
- Not much is known about this technique, and it is not well defended against
  - Very difficult to trace
  - Allows the spammer to evade DNSRBLs
- Used 10% or less of the time, as a complementary spamming tactic
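Correlating BGP updates with spam arrivals exposes this tactic. You look for announcements that are withdrawn quickly and check whether spam arrived from inside the announced prefix while the route was up. The following is a minimal Python sketch. The one-day cutoff for "short-lived" and the `Announcement` record type are hypothetical choices for illustration, not parameters taken from the study.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Announcement:
    """Hypothetical record pairing a BGP announcement with its withdrawal."""
    prefix: str       # e.g. "198.51.100.0/24"
    announced: float  # seconds since epoch
    withdrawn: float  # seconds since epoch


def short_lived_spam_routes(announcements: list[Announcement],
                            spam: list[tuple[float, str]],
                            max_duration: float = 24 * 3600) -> list[tuple[str, int]]:
    """Return (prefix, spam_count) for short-lived announcements that carried spam.

    `spam` holds (timestamp, relay_ip) pairs. An announcement is short-lived
    if it was withdrawn within `max_duration` seconds (assumed cutoff).
    """
    results = []
    for ann in announcements:
        if ann.withdrawn - ann.announced > max_duration:
            continue  # long-lived route: not the pattern we are looking for
        net = ipaddress.ip_network(ann.prefix)
        hits = sum(
            1
            for ts, ip in spam
            if ann.announced <= ts <= ann.withdrawn and ipaddress.ip_address(ip) in net
        )
        if hits:
            results.append((ann.prefix, hits))
    return results
```

A blacklist that learns an address only after it sends spam has little to act on here. By the time an address is listed, the route is already gone.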
Lessons on Spam Mitigation
- Why should we use network-level information?
  - Information is less malleable
    - More constant than spam email contents, which content-based filters monitor
  - Information is observable in the middle of the network
    - Closer to the source of the spam than other techniques
- Can result in more effective spam filters
  - When combined with other techniques
  - Has the potential to stop spam that other techniques miss
More Lessons
- Improves knowledge of host identity
- Bases detection techniques on aggregate behavior
- Protects against route hijacking (BGP spectrum agility), which other techniques do not
- Uses network-level properties to detect and filter spam
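Basing detection on aggregate behavior can mean scoring a whole network prefix instead of each IP. Many IPs that each send only a few messages may still add up to a clearly spamming prefix. The following is a minimal Python sketch. The /24 granularity and the message threshold are hypothetical parameters chosen for illustration.

```python
import ipaddress
from collections import Counter


def spam_by_prefix(relay_ips: list[str], prefix_len: int = 24) -> Counter:
    """Count spam per enclosing prefix instead of per individual IP."""
    return Counter(
        # strict=False masks off the host bits, e.g. 192.0.2.7 -> 192.0.2.0/24
        str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))
        for ip in relay_ips
    )


def flag_prefixes(relay_ips: list[str], threshold: int, prefix_len: int = 24) -> set[str]:
    """Prefixes whose aggregate spam volume reaches `threshold` messages."""
    counts = spam_by_prefix(relay_ips, prefix_len)
    return {prefix for prefix, n in counts.items() if n >= threshold}
```

Take three IPs in 192.0.2.0/24 that send one message each. No single IP stands out, but with a threshold of 3 the prefix is flagged.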
Conclusion
- Studying the network-level behavior of spammers
- Designing better spam filters using network-level properties
  - Network-level behavior filters vs. content-based filters
  - Should not replace content-based filters, but complement them
Questions?