Title: Slide 1
1 Swiss Internet Analysis 2002
<http://www.swiss-internet-analysis.org>
ZHW Diploma Thesis
Olivier Müller, Daniel Graf
SWINOG 6, 10 April 2003
2 A few numbers
- 150,000,000,000 bytes of data collected and parsed
- 56,712,582 HTTP requests analysed
- 13,971,288 e-mails analysed
- 5,000,000 database rows
- 331,820 domains
- 77,526 server IP addresses
- 1,985 AS numbers
- 20 providers and news websites asked
- 7 hours on the phone
- 5 meetings with supporters
and also
- 100 litres of Rivella, Sprite, Cola, Pepita and Valser
- 25 different restaurants and pubs visited in Zürich
3 Three Parts
Domain-Analysis
- Procedure
- Results
E-Mail-Analysis
- Procedure
- Results
Web-Analysis
- Procedure
- Results
TCP-Analysis
- Procedure
- Results
Conclusion
- Final Conclusion
- Next steps
- Questions?
4 Task description
5 Domain-Analysis
1 Import of the S…
2 Resolution of the…
3 Completion…
4 Verification…
5 Verification…
6 Acquisition…
7 Acquisition…
8 Storage…
9 Storage…
6 Domain-Analysis
- Mail servers per AS number (Top 50)
- Mail domains per AS number (Top 50)
- Mail servers by operating system
- Mail domains by operating system
- Mail servers by software
- Mail domains by software
- Domains per mail server
- Name servers per AS number (Top 50)
- Name domains per AS number (Top 50)
- Name servers by operating system
- Name domains by operating system
- Name servers by BIND version (Top 50)
- Name domains by BIND version (Top 50)
- Name domains per software (Top 50)
- Web servers per AS number (Top 50)
- Web domains per AS number (Top 50)
- Web servers by operating system
- Web domains by operating system
- Web servers by software (Top 30)
7 Task description
8 Unified format
Fields: date, time, from, size, to
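A record with these five fields could be represented as follows. This is only a minimal Python sketch: the whitespace separator, the comma-separated recipient list, the sample line, and the names `MailRecord`, `parse_record` and `domain_of` are all assumptions made for illustration, not details taken from the thesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MailRecord:
    date: str
    time: str
    sender: str
    size: int
    recipients: List[str]


def parse_record(line: str) -> MailRecord:
    """Parse one 'date time from size to' line (assumed whitespace-separated)."""
    date, time, sender, size, to = line.split(maxsplit=4)
    # Assumption: several recipients are separated by commas.
    recipients = [r.strip() for r in to.split(",") if r.strip()]
    return MailRecord(date, time, sender, int(size), recipients)


def domain_of(address: str) -> str:
    """Return the lower-cased domain part of an e-mail address."""
    return address.rsplit("@", 1)[-1].lower()


# Hypothetical sample line, not real log data.
record = parse_record(
    "2002-09-05 14:28:01 alice@example.org 2326 bob@example.com, carol@example.net"
)
print(len(record.recipients), [domain_of(r) for r in record.recipients])
```

With records in this shape, counts such as recipients per e-mail or the frequency of external domains reduce to simple tallies over `recipients`.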
9 Tables
- Number of recipients per e-mail
- Frequency of external domains
- Number of domains that occur up to 50/500/5000 times
- Top 10 external domains with Sendmail
- Top 10 external domains with Qmail
- Top 10 external domains with X1
- Top 10 external domains with Criticalpath
- Top 10 external domains with Postfix
- Top 10 external domains with UnixRS
- Top 10 external domains with Exim
- Top 10 external domains with Postoffice
- Top 10 external domains with Vopmail
- Top 10 external domains with MSExchange
- Top 10 external domains with Ms_esmtp
- Top 10 external domains with Lotusdomino
- Ranking of MTA types by number of e-mails (Top 30)
- Ranking of MTA types by distinct domains (Top 30)
12 Task description
13 Diagram labels: 1.8 GB, 4.4 GB, 7.7 GB, 0.7 GB, 4.2 GB, 0.1 GB
14 (1) Unify Logs
The log files are first unified into the Apache Combined Log Format.
Contents of log file entries:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
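To make the fields of such an entry concrete, here is a minimal Python sketch that splits a Combined Log Format line into named parts. The regular expression and the function name `parse_combined` are illustrative assumptions; the thesis scripts themselves are not shown in the slides.

```python
import re
from datetime import datetime

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields for one Combined Log Format line, or None."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means no body was sent; treat it as 0 bytes.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
entry = parse_combined(sample)
print(entry["host"], entry["status"], entry["size"], entry["agent"])
```

Lines that do not match return `None`, so a caller can count them as ignored instead of aborting.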
15 (2) Logfile Size Problem
Problem: the files are too big
om@dapc:/home/da2002/LOGS/da20min/logs> la -lah
-rw-r--r-- 1 da2002 users 266M Oct 15 14:28 09-01.access_log
-rw-r--r-- 1 da2002 users 775M Oct 15 14:28 09-02.access_log
-rw-r--r-- 1 da2002 users 706M Oct 15 14:28 09-03.access_log
...
-rw-r--r-- 1 da2002 users 229M Oct 15 14:29 09-07.access_log
-rw-r--r-- 1 da2002 users 231M Oct 15 14:29 09-08.access_log
Solution: parsing scripts that use temporary storage in an SQL database
./web_time_size.pl 05 thu < ../LOGS/____/day05_log
stats for 05/Sep/2002 (webtime_thu): 1919867 hits, 6070817465 bytes, 8 ignored
./web_time_size.pl 06 fri < ../LOGS/____/day06_log
stats for 06/Sep/2002 (webtime_fri): 2092395 hits, 6398278966 bytes, 8 ignored
./web_time_size.pl 07 sat < ../LOGS/____/day07_log
stats for 07/Sep/2002 (webtime_sat): 797078 hits, 3377811037 bytes, 7 ignored
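The per-day figures printed by `web_time_size.pl` above (hits, bytes, ignored lines) can be gathered in a single streaming pass, so memory use does not grow with file size. The following Python sketch illustrates that idea only. It is not the thesis script, it keeps its counters in memory instead of an SQL database, and the name `day_stats` and the sample lines are assumptions.

```python
import re
from collections import defaultdict

# Date from "[05/Sep/2002:10:00:00 +0200]", then the quoted request,
# the status code and the response size ("-" when no body was sent).
LINE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\] "[^"]*" \d{3} (\d+|-)')


def day_stats(lines):
    """Stream over log lines; return ({day: [hits, bytes]}, ignored_count)."""
    stats = defaultdict(lambda: [0, 0])
    ignored = 0
    for line in lines:
        m = LINE.search(line)
        if m is None:
            ignored += 1  # malformed line: count it and keep going
            continue
        day, size = m.groups()
        stats[day][0] += 1
        stats[day][1] += 0 if size == "-" else int(size)
    return dict(stats), ignored


# Hypothetical sample lines, not real log data.
sample = [
    '1.2.3.4 - - [05/Sep/2002:10:00:00 +0200] "GET / HTTP/1.0" 200 1000 "-" "Mozilla/4.0"',
    '1.2.3.5 - - [05/Sep/2002:10:00:01 +0200] "GET /a HTTP/1.0" 304 - "-" "Mozilla/4.0"',
    'broken line',
]
stats, ignored = day_stats(sample)
for day, (hits, nbytes) in stats.items():
    print(f"stats for {day}: {hits} hits, {nbytes} bytes, {ignored} ignored")
```

Because `day_stats` accepts any iterable of lines, an open file object can be passed directly and is read line by line.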
16 Tables
- Number of HTTP hits per day
- Excerpt of the table of HTTP hits and size
- Size distribution of HTTP hits in 1 KB steps (up to 50 KB)
- Size distribution of HTTP hits in 25-byte steps (up to 1 KB)
- Size distribution of HTTP HEAD hits in 25-byte steps (up to 500 bytes)
- Top 30 AS numbers (by traffic), office time
- Top 30 AS numbers (by traffic), leisure time
- Ranking of Internet browsers used
- Summarised ranking of Internet browsers
- Ranking of Internet browsers (office time)
- Summarised ranking of Internet browsers (office time)
- Ranking of Internet browsers (leisure time)
- Summarised ranking of Internet browsers (leisure time)
- Ranking of client operating systems used
- Summarised ranking of client operating systems
- Ranking of client operating systems (office time)
- Summarised ranking of client operating systems (office time)
- Ranking of client operating systems (leisure time)
- Summarised ranking of client operating systems (leisure time)
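Size distributions in fixed steps, and the split of hits by time of day, both come down to simple bucketing. The Python sketch below shows one way to do it. The slides do not define office hours, so Monday to Friday from 08:00 to 18:00 is an assumption here, as are 1 KB = 1024 bytes and the names `size_histogram` and `is_office_time`.

```python
from collections import Counter
from datetime import datetime


def size_histogram(sizes, step, limit):
    """Count sizes in buckets of `step` bytes; sizes >= limit are skipped."""
    hist = Counter()
    for size in sizes:
        if 0 <= size < limit:
            hist[(size // step) * step] += 1  # bucket keyed by its lower bound
    return hist


def is_office_time(ts, start_hour=8, end_hour=18):
    """Assumed definition: Monday to Friday, start_hour <= hour < end_hour."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour


# Hypothetical response sizes in bytes.
sizes = [10, 30, 49, 50, 1023, 1024, 2326, 60000]
print(sorted(size_histogram(sizes, 25, 1024).items()))         # 25-byte steps up to 1 KB
print(sorted(size_histogram(sizes, 1024, 50 * 1024).items()))  # 1 KB steps up to 50 KB
print(is_office_time(datetime(2002, 9, 5, 14, 0)))
```

Keying each bucket by its lower bound keeps the result sparse, so empty size ranges take no space.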
17 HTTP traffic per IP during work time
18 HTTP traffic per AS number during office time
HTTP traffic per AS number during leisure time
19 Internet browser market shares
Internet browser market shares: office time / leisure time
20 Task description
21 (1) Scanserver Concept
tcpdump -i fxp1 -n -s 128 -w - tcp | gzip > /home/dump/dump1.gz
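The command above writes a raw capture to standard output and compresses it with gzip. As a sanity check, the 24-byte global header of such a file can be read back; with `-s 128` its snapshot-length field should be 128. The Python sketch below assumes the classic libpcap file format with microsecond timestamps (not pcapng), and the function names are illustrative.

```python
import gzip
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap file format, microsecond timestamps


def read_pcap_header(fileobj):
    """Parse the 24-byte pcap global header; return (version, snaplen, linktype)."""
    raw = fileobj.read(24)
    if len(raw) < 24:
        raise ValueError("truncated pcap header")
    # The magic number reveals the byte order used by the capturing host.
    for endian in ("<", ">"):
        magic, major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
            endian + "IHHiIII", raw)
        if magic == PCAP_MAGIC:
            return (major, minor), snaplen, linktype
    raise ValueError("not a classic pcap file")


def read_gzipped_pcap_header(path):
    """Open a gzip-compressed capture and parse its global header."""
    with gzip.open(path, "rb") as f:
        return read_pcap_header(f)


# Synthetic header for demonstration: version 2.4, snaplen 128, linktype 1 (Ethernet).
demo = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 128, 1)
print(read_pcap_header(io.BytesIO(demo)))
```

For a real capture, `read_gzipped_pcap_header("/home/dump/dump1.gz")` would return the same kind of tuple.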
22 Data collection
- http://tagi.ch, facts.ch, sonntagszeitung.ch: 30 Sep. to 2 Oct., 17 GB of data
- http://20min.ch: 13 Oct. to 15 Oct., 10 GB of data
- ftp://sunsite.cnlab-switch.ch: 14 Oct. to 15 Oct., 100 GB of data
23 Analysis Part 1: SYN Packet Parameters
22:27:39.674763 212.249.x.x.14958 > 62.12.x.x.80: . [tcp sum ok] ack 1757957403 win 17520 (DF) (ttl 120, id 785, len 40)
22:27:39.675231 62.12.x.x.80 > 212.249.x.x.14958: . 1757958863:1757960323(1460) ack 846720167 win 6432 (DF) [tos 0x8] (ttl 64, id 55554, len 1500)
22:27:39.675354 62.12.x.x.80 > 212.249.x.x.14958: . 1757960323:1757961783(1460) ack 846720167 win 6432 (DF) [tos 0x8] (ttl 64, id 55555, len 1500)
22:27:39.675478 62.12.x.x.80 > 212.249.x.x.14958: . 1757961783:1757963243(1460) ack 846720167 win 6432 (DF) [tos 0x8] (ttl 64, id 55556, len 1500)
22:27:39.682815 62.46.x.x.57449 > 62.12.x.x.80: . [tcp sum ok] ack 1762078249 win 8760 (DF) (ttl 113, id 50172, len 40)
22:27:39.690369 212.249.x.x.14958 > 62.12.x.x.80: . [tcp sum ok] ack 1757960323 win 17520 (DF) (ttl 120, id 786, len 40)
22:27:39.690917 62.12.x.x.80 > 212.249.x.x.14958: P 1757963243:1757964703(1460) ack 846720167 win 6432 (DF) [tos 0x8] (ttl 64, id 55557, len 1500)

# Tally TCP/IP options per tcpdump line read from the DUMP filehandle.
# The sigils, operators and braces are reconstructed; counters are assumed to be incremented.
while ($line = <DUMP>) {
    if ($line =~ /\(DF\)/)         { $df_set++ }                    else { $df_unset++ }
    if ($line =~ /sackOK/)         { $sackok_set++ }                else { $sackok_unset++ }
    if ($line =~ /timestamp \d+/)  { $timestamp_set++ }             else { $timestamp_unset++ }
    if ($line =~ /wscale (\d+)/)   { $wscale_set++; $wscale{$1}++ } else { $wscale_unset++ }
    if ($line =~ /\[tos (....)\]/) { $ecn_set++; $ecn{$1}++ }       else { $ecn_unset++ }
    if ($line =~ /ttl (\d+)/)      { $ttl{$1}++ }
    if ($line =~ /win (\d+)/)      { $win{$1}++ }
    if ($line =~ /mss (\d+)/)      { $mss{$1}++ }
}
24 Analysis Part 2: asstat
25 Analysis Part 3: tcptrace
TCP connection 105596:
    host kzjs:      80.218.60.97:3074
    host kzjt:      62.12.131.77:80
    complete conn:  yes
    first packet:   Mon Oct 14 01:45:14.941596 2002
    last packet:    Mon Oct 14 01:45:19.904893 2002
    elapsed time:   0:00:04.963297
    total packets:  49
    filename:       dump-small

                        kzjs->kzjt     kzjt->kzjs
    total packets       18             31
    ack pkts sent       17             31
    pure acks sent      14             3
    sack pkts sent      0              0
    max sack blks/ack   0              0
    unique bytes sent   997            30796
    actual data pkts    2              27
    actual data bytes   997            34427
    mss requested       1460 bytes     1460 bytes
    max segm size       628 bytes      1460 bytes
    min segm size       369 bytes      6 bytes
    avg segm size       498 bytes      1275 bytes
    max win adv         8760 bytes     8164 bytes
    min win adv         7300 bytes     5840 bytes
    zero win adv        0 times        0 times
    avg win adv         8528 bytes     7195 bytes
    max owin            629 bytes      4381 bytes
    min non-zero owin   1 bytes        1 bytes
    ...
26 What comes next?
(1) Public Documentation
27 http://www.swiss-internet-analysis.org