ConceptDoppler: A Weather Tracker for Internet Censorship

Slides: 55
Provided by: Jed68
Learn more at: https://www.cs.unm.edu

Transcript and Presenter's Notes

Title: ConceptDoppler: A Weather Tracker for Internet Censorship


1
ConceptDoppler: A Weather Tracker for Internet Censorship
  • Daniel Zinn
  • Joint work with Jedidiah R. Crandall, Michael
    Byrd, Earl Barr, and Rich East

2
Censorship is Not New
Tagesschau (West Germany)
Aktuelle Kamera (East Germany)
3
China's Internet Usage Will Probably Surpass the US Soon
4
Internet Censorship in China
  • Called the Great Firewall of China, or Golden
    Shield
  • IP address blocking
  • DNS redirection
  • Legal restrictions
  • etc
  • Keyword filtering
  • Blog servers, chat, HTTP traffic

All probing was performed from outside of China
5
Why is Keyword Filtering Interesting?
  • Chinese government claims to be targeting
    pornography and sedition
  • The keywords provide insights into what material
    the government is targeting with censorship, e.g.
  • 专政机关 --- Dictatorship organs
  • 希特勒 (Hitler), and 我的奋斗 (Mein Kampf)
  • 多维尔 --- Deauville, a town in France

6
Outline
  • Firewall or Something Else?
  • Where are filtering routers?
  • Who is doing filtering?
  • How reliable is filtering?
  • Blocked Words
  • Which words to select?
  • Which words are blocked?
  • Imprecise Filtering
  • What implications does keyword filtering have?

8
Firewall?
???
?????
??
9
Where Are Filtering Routers
  • Different opinions about where censorship occurs
  • In three big centers in Beijing, Guangzhou, and
    Shanghai
  • At the border
  • Throughout the country's backbone
  • At a local level
  • An amalgam of the above

10
Filtering With Forged RSTs
  • Clayton et al., 2006.
  • Comcast also uses forged RSTs

Example
11
Dissident Nuns on the Net
<HTTP> </HTTP>
GET falun.html
12
Censorship of HTTP GET Requests
RST
RST
GET falun.html
13
Censorship of HTML Responses
<HTTP> falun
RST
RST
GET hello.html
14
Locating Filtering Routers
ICMP Error
TTL=1 falun
15
Locating Filtering Routers
ICMP Error
TTL=1 falun
RST
RST
TTL=2 falun
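
The TTL technique on the two slides above can be sketched as a small simulation. This is a toy model under stated assumptions, not the ConceptDoppler code: no packets are sent, and `send_probe`, `locate_filter`, the path length, and the filter position are hypothetical names and values chosen for illustration. A keyword probe sent with TTL t travels t hops; it draws forged RSTs only if it gets as far as the filtering router.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    icmp_time_exceeded: bool  # a router dropped the probe because its TTL ran out
    rst_received: bool        # a forged RST came back, so a filter saw the keyword


def send_probe(ttl, path_length, filter_hop, payload, keyword="falun"):
    """Simulate one keyword probe (assumption: purely in-memory, no network).

    Hops are numbered 1..path_length, with the destination at the last hop.
    filter_hop is the hop of the hypothetical filtering router, or None.
    """
    expired = ttl < path_length
    seen_by_filter = filter_hop is not None and ttl >= filter_hop
    return ProbeResult(
        icmp_time_exceeded=expired,
        rst_received=seen_by_filter and keyword in payload,
    )


def locate_filter(path_length, filter_hop, payload="GET /falun.html"):
    """Raise the TTL one hop at a time; the first TTL that draws RSTs marks the filter."""
    for ttl in range(1, path_length + 1):
        if send_probe(ttl, path_length, filter_hop, payload).rst_received:
            return ttl
    return None  # no RST at any TTL: this path was not filtered


print(locate_filter(path_length=12, filter_hop=7))     # filter on the path
print(locate_filter(path_length=12, filter_hop=None))  # unfiltered path
```

In a real measurement the hop number would then be matched against a traceroute of the same path to name the router and its ISP.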
16
ConceptDoppler Framework
  • Netfilter (iptables) to capture packets
  • Queue module to hand packets over to user space
  • Own TCP stack implementation
  • Scapy for constructing custom packets
  • Storing packets in PostgreSQL database
  • Scapy stored procedures in DB

17
Experimental Setup
  • Google site:.cn to find random destination
    sites in China
  • Performed TTL-Modulation Experiment
  • Traceroute immediately before blocking test
  • Whois to query ISPs
  • Probed over a two-week period
  • Result: Where are the GFC routers? Which ISP?

18
Hops into China Where Filtering Occurs
  • 28% of paths were never filtered over two
    weeks of probing

19
First Hops
  • ChinaNET performed 99.1% of all filtering at the
    first hop (and 83% of all filtering)

20
Outline
  • Firewall or Something Else?
  • Where are filtering routers?
  • Who is doing filtering?
  • How reliable is filtering?
  • Blocked Words
  • Which words to select?
  • Which words are blocked?
  • Imprecise Filtering
  • What implications does keyword filtering have?

21
Slipping Words Through - Diurnal Pattern
Repeat forever
  While Falun is not blocked (green)
  While Test is blocked (red): wait
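
The probe loop above can be replayed against a simulated filter. This is only a sketch under assumptions: `filter_active` (whether the filter inspects traffic at each time step) and `block_timeout` (how long a path stays blocked after a reset) are hypothetical inputs, and the slides do not give real values for either.

```python
def run_probe_loop(filter_active, block_timeout=3):
    """Replay the alternating probe loop against a simulated filter.

    filter_active[t] says whether the hypothetical filter inspects traffic
    at time step t. Returns the time steps at which "falun" slipped through.
    """
    slipped = []
    blocked_until = 0  # first time step at which the path is usable again
    t = 0
    while t < len(filter_active):
        if t < blocked_until:
            # Red phase: a harmless "test" probe would still be reset, so wait.
            t += 1
            continue
        # Green phase: the path is open, so send the keyword probe.
        if filter_active[t]:
            # The filter saw the keyword: RSTs, then a temporary block.
            blocked_until = t + 1 + block_timeout
        else:
            slipped.append(t)
        t += 1
    return slipped


# Filter busy (inactive) at steps 2 and 3, active otherwise.
print(run_probe_loop([True, True, False, False, True, True, True, True], block_timeout=2))
```

Counting the slipped probes per hour of the day is one way to arrive at a diurnal plot like the one on the next slide.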
22
Slipping Words Through - Diurnal Pattern
[Figure: probes plotted over time; time 0 is 3pm in Beijing]
23
Firewall?
?????
???
?????
???
??
??
24
Panopticon!
?????
???
  • Imperfect filtering
  • Not strictly at the border
  • Promotes self-censorship
  • Good enough
  • Defeating a Panopticon is different than
    defeating a firewall

??
25
Outline
  • Firewall or Something Else?
  • Where are filtering routers?
  • Who is doing filtering?
  • How reliable is filtering?
  • Blocked Words
  • Which words to select?
  • Which words are blocked?
  • Imprecise Filtering
  • What implications does keyword filtering have?

26
Latent Semantic Analysis (LSA)
  • Deerwester et al., 1988
  • Document summary technique to find relationships
    between documents and words
  • Based on co-occurrence of words in a collection
    of documents

What to use as corpus?
27
Chinese Version of Wikipedia!
28
LSA of Chinese Wikipedia
  • n = 94863 documents and m = 942033 terms
  • tf-idf weighting
  • Matrix probably has rank r where k < r < n < m
  • Implicit assumption that Wikipedia authors add
    additive Gaussian noise
  • SVD and rank reduction to rank k
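
The steps on this slide can be written out as follows. The tf-idf formula shown is one common variant and is an assumption; the slides do not say which variant was used. The symbols m, n, r, and k are the ones from the slide; U, Sigma, and V are the usual SVD factors.

```latex
% tf-idf weight of term i in document j (one common variant; an assumption)
w_{ij} = \mathrm{tf}_{ij} \cdot \log\frac{n}{\mathrm{df}_i}

% Term-document matrix A (m terms by n documents) and its SVD
A = U \Sigma V^{\top}, \qquad
\Sigma = \operatorname{diag}(\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0)

% Rank reduction: keep only the k largest singular values, with k < r
A_k = U_k \Sigma_k V_k^{\top} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}
```

Among all matrices of rank at most k, A_k is the closest to A in the Frobenius norm, which is why dropping the small singular values is a reasonable way to discard noise.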

29
10 + 2 Seed Concepts
30
Words correlated with ???? (June 4th Events)
  • 1 ???? June 4th Events
  • 2 ??????????? - Chongqing high family garden
    Jialing River bridge
  • 3 ???? - Yu Fulo (related to Chinese Eastern
    Han Dynasty)
  • 4 ??? - Li Jianliang
  • 5 ????? - Gaoxiong event (violent political
    event 1979)
  • 6 ??? - Zhao Ziyang (Name, related to China
    travel logistics)
  • 7 ??? - United front activities department
  • 8 ??? - Chen Bingde
  • 9 ????????????????? - Los Angeles Angels of
    Anaheim ..
  • 10 ??? - Li Tielin (Government official)
  • 11 ??? - Deng Liqun (Chinese politician)
  • 12 ???? - Chinese politics
  • 13 ????? - The Chinese Communist Party 14th
  • 14 ???? - Reform and open policy
  • 15 ?? - The newspaper endures
  • ... up to 2500
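
A ranked list like this one can be produced by scoring every term against the seed concept. The sketch below is a simplification and not the authors' pipeline: it uses tf-idf term vectors and cosine similarity only, skips the SVD rank reduction to stay within the standard library, and runs on a made-up English toy corpus. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_term_vectors(docs):
    """Return {term: [weight in doc 0, weight in doc 1, ...]} using tf * log(n / df)."""
    n = len(docs)
    counts = [Counter(doc.split()) for doc in docs]
    df = Counter(term for c in counts for term in c)  # documents containing each term
    return {
        term: [c[term] * math.log(n / df[term]) for c in counts]
        for term in df
    }


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_correlation(docs, seed):
    """Rank all other terms by similarity to the seed term, most similar first."""
    vectors = tfidf_term_vectors(docs)
    scores = {t: cosine(vectors[seed], v) for t, v in vectors.items() if t != seed}
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical toy corpus; each string stands in for one article.
docs = [
    "protest square students army",
    "protest students democracy",
    "river bridge garden",
    "bridge river city",
]
print(rank_by_correlation(docs, "protest"))
```

Probing the top of such a list first is the idea behind probing efficiently: terms that co-occur with a sensitive seed are more likely to be on the blacklist than random words.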

31
Efficient Probing
[Figure: blocked words found per 250-word bin, Epoch Times vs. random words; the slide contrasts the counts 4 and 37]
32
Blocked Words (122 discovered)
  • Pornography
  • ?? --- Pornography
  • ????? --- Virgin prostitution law case
  • Politics
  • ???? --- Crime against humanity
  • ?? --- Dictatorship (party), also ????, ??, ????,
    ??
  • ???? --- Red Terror
  • ???? --- June 4th events (1989 Tiananmen Square
    protests)
  • ?? --- Tibet Independence Movement
  • Others
  • ?? --- Block
  • ???? --- (Qinghai) Qiaotou power plant
  • ????????? --- Ludovico Ariosto

33
Outline
  • Firewall or Something Else?
  • Where are Filtering Routers?
  • Who is doing Filtering?
  • How Reliable is Filtering?
  • Blocked Words
  • Which words to select?
  • Which words are blocked?
  • Imprecise Filtering
  • What implications does keyword filtering have?

34
Imprecise Filtering
Because these substrings are blocked: 法伦 (sounds like Falun Gong),
学联 (student federation), 多维 (multidimensional)
  • Filtered are
  • 北莱茵-威斯特法伦 (Nordrhein-Westfalen, German state)
  • 国际地质科学联合会 (International geological scientific
    federation)
  • 卢多维科·阿里奥斯托 (Ludovico Ariosto, Italian poet)

35
Keyword-based Censorship
  • Censor the Wounded Knee Massacre in the Library
    of Congress
  • Remove "Bury My Heart at Wounded Knee" and a few
    other select books?
  • Remove every book containing the keyword
    "massacre" in its text?

36
Massacre
  • Dante's Inferno
  • The War of the Worlds by H. G. Wells
  • King Richard III, and King Henry VI,
    Shakespeare
  • Adventures of Tom Sawyer, Mark Twain
  • Jack London, Son of the Sun, The
    Acorn-planter, The House of Pride
  • Thousands more

37
More Imprecision
  • Crime against humanity: The Economic Consequences
    of the Peace, John Maynard Keynes
  • Dictatorship: The U.S. Constitution
  • Suppression: Origin of Species, by Charles Darwin
  • Block: Computer Organization and Design, Patterson
    and Hennessy
  • Hitler: Virtually every book about World War II
  • Strike: White Fang, The Sea Wolf, and The Call of
    the Wild, Jack London

Hypothetical?
38
Actually Blocked
39
Future Work
  • ConceptDoppler: a censorship weather report.
    What words are censored today?
  • Track the blacklist over a period of time, to
    correlate with current events
  • Named entity extraction, online learning
  • Scale up (bigger corpus, more words, advanced
    document summary techniques)

40
Future Work
  • What are the effects of keyword filtering?
  • What content is being targeted?
  • What content is collateral damage due to
    imprecise filtering?
  • Where exactly is filtering implemented?
  • More sources
  • Topological considerations
  • IP tunneling, IPv6, IXPs,

41
Conclusions
  • Firewall vs. Panopticon
  • GFC is implemented mostly at the border by
    ChinaNET, but interior routers also filter
  • Filtering is NOT reliable
  • Routes without GFC routers
  • Slip through during busy periods of the day
  • Blocked words
  • Blocked more than pornography and sedition
  • LSA can help to increase probing efficiency
  • Imprecise Filtering
  • You block a whole lot more than you probably want
    to

42
Questions?
Thank You.
http://www.conceptdoppler.org
43
Unsponsored ad: University of New Mexico CS
dept. is hiring for 2 junior level positions and
1 senior level position.
44
Thanks, Jed and Michael!
45
0 is 3pm in Beijing
Graph again? -- colored?
46
Crime against humanity
  • The Economic Consequences of the Peace, John
    Maynard Keynes
  • Thousands more?

47
Dictatorship
  • The U.S. Constitution
  • Thousands more?

48
Traitor
  • Fahrenheit 451, Ray Bradbury
  • Thousands more?

49
Suppression
  • Origin of Species, by Charles Darwin
  • Thousands more?

50
Block
  • Computer Organization and Design, Patterson and
    Hennessy
  • Artificial Intelligence 4th Edition, George F.
    Luger
  • Millions more?

51
Hitler
  • Virtually every book about World War II

52
Strike
  • White Fang, The Sea Wolf, and The Call of
    the Wild, Jack London
  • Millions more?

53
Outline
  • Implications of Imprecise Filtering
  • What are consequences of key-word-based
    filtering?
  • Panopticon vs. Firewall
  • How is filtering implemented?
  • Where is filtering implemented?
  • How reliable is filtering?
  • Blocking Words
  • How to efficiently discover blocked words?
  • What words are blocked?

54
Outline
  • Firewall or Something Else?
  • Where are Filtering Routers?
  • Who is doing Filtering?
  • How Reliable is Filtering?
  • Blocked Words
  • Which words to select?
  • Which words are blocked?
  • Imprecise Filtering
  • What implications does keyword filtering have?

57
Latent Semantic Analysis (LSA)
  • Deerwester et al., 1988
  • Uses a large corpus of documents to analyze
    relationships between documents and terms
  • In a Nutshell
  • Jack goes up a hill, Jill stays behind this time
  • B is 8 Furlongs away from C
  • C is 5 Furlongs away from A
  • B is 5 Furlongs away from A

58
LSA in a Nutshell
A
5 5
B
C
8
59
Latent Semantic Analysis (LSA)
  • A, B, and C are all three on a straight, flat,
    level road.

60
LSA in a Nutshell
9
B
C
A
4.5 4.5
61
This Research has Two Parts
  • Where is the keyword filtering implemented?
  • Internet measurement techniques to locate the
    filtering routers
  • What words are being censored?
  • Efficient probing via document summary techniques

62
Firewall?
?????
???
?????
???
??
??
63
Outline
  • Why is keyword filtering interesting?
  • How does keyword filtering work?
  • Where in the Chinese Internet is it implemented?
  • How can we reverse-engineer the blacklist of
    keywords?

67
Rumors
  • The undisclosed aim of the Bureau of Internet
    Monitoring was to use the excuse of information
    monitoring to lease our bandwidth with extremely
    low prices, and then sell the bandwidth to
    business users with high prices to reap lucrative
    profits.
  • ---a hacker named sinister

68
Rumors
  • At the recent World Economic Forum in Davos,
    Switzerland, Sergey Brin, Google's president of
    technology, told reporters that Internet policing
    may be the result of lobbying by local
    competitors.
  • ---Asia Times, 13 February 2007

70
Outline
  • Why is keyword filtering interesting?
  • How does keyword filtering work?
  • Where in the Chinese Internet is it implemented?
  • How can we reverse-engineer the blacklist of
    keywords?

72
More rumors
  • If someone is shouting bad things about me from
    outside my window, I have the right to close that
    window.
  • ---Li Wufeng

73
Conclusions
  • GFC ≠ Firewall
  • GFC = Panopticon
  • With lots of computation/analysis here and a
    little bit of probing of the Chinese Internet, we
    can determine
  • What content is being targeted with keyword-based
    censorship?
  • What are the unintended consequences of
    keyword-based censorship?

74
Outline
  • Implications of Imprecise Filtering
  • What are consequences of key-word-based
    filtering?
  • Panopticon vs. Firewall
  • How is filtering implemented?
  • Where is filtering implemented?
  • How reliable is filtering?
  • Blocking Words
  • How to efficiently discover blocked words?
  • What words are blocked?

75
Same Graph, Different Scale
Blocked Paths
Unique Paths
Depth into China
76
TTL Tomfoolery
ICMP Error
TTL=1
77
How traceroute Works
TTL=2
TTL=3
ICMP Error
TTL=1
TTL=4