Title: ConceptDoppler: A Weather Tracker for Internet Censorship
1ConceptDoppler: A Weather Tracker for Internet Censorship
- Daniel Zinn
- Joint work with Jedidiah R. Crandall, Michael
Byrd, Earl Barr, and Rich East
2Censorship is Not New
Tagesschau Western Germany
Aktuelle Kamera Eastern Germany
3China's Internet Usage will Probably Surpass the US Soon
4Internet Censorship in China
- Called the Great Firewall of China, or Golden Shield
- IP address blocking
- DNS redirection
- Legal restrictions
- etc.
- Keyword filtering
- Blog servers, chat, HTTP traffic
All probing was performed from outside of China
5Why is Keyword Filtering Interesting?
- Chinese government claims to be targeting pornography and sedition
- The keywords provide insights into what material the government is targeting with censorship, e.g.
- ???? --- Dictatorship organs
- ??? (Hitler), and ???? (Mein Kampf)
- ??? --- Deauville, a town in France
6Outline
- Firewall or Something Else?
- Where are filtering routers?
- Who is doing filtering?
- How reliable is filtering?
- Blocked Words
- Which words to select?
- Which words are blocked?
- Imprecise Filtering
- What implications does keyword filtering have?
8Firewall?
9Where Are Filtering Routers
- Different opinions about where censorship occurs
- In three big centers in Beijing, Guangzhou, and Shanghai
- At the border
- Throughout the country's backbone
- At a local level
- An amalgam of the above
10Filtering With Forged RSTs
- Clayton et al., 2006.
- Comcast also uses forged RSTs
Example
11Dissident Nuns on the Net
<HTTP> </HTTP>
GET falun.html
12Censorship of HTTP GET Requests
RST
RST
GET falun.html
13Censorship of HTML Responses
<HTTP> falun
RST
RST
GET hello.html
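The two preceding slides can be read as one rule: inspect each payload, and on a keyword match send forged RSTs to both endpoints. Below is a minimal Python simulation of that rule, not the actual GFC logic. The `Packet` class, the `inspect` function, and the one-word `BLACKLIST` are hypothetical names chosen for illustration, and the model assumes the offending packet itself is still forwarded.

```python
from dataclasses import dataclass

# Hypothetical one-word blacklist, for illustration only.
BLACKLIST = (b"falun",)


@dataclass
class Packet:
    src: str
    dst: str
    payload: bytes = b""
    rst: bool = False


def inspect(packet, blacklist=BLACKLIST):
    """Return the packets a keyword-filtering router emits for `packet`.

    Assumption of this model: the original packet is always forwarded.
    On a keyword match, two forged RSTs are added, one per endpoint.
    """
    out = [packet]
    if any(word in packet.payload.lower() for word in blacklist):
        # RST towards the sender, spoofed as coming from the receiver.
        out.append(Packet(src=packet.dst, dst=packet.src, rst=True))
        # RST towards the receiver, spoofed as coming from the sender.
        out.append(Packet(src=packet.src, dst=packet.dst, rst=True))
    return out


request = Packet("client", "server", b"GET /falun.html")
for p in inspect(request):
    print(p)
```

The same function covers both directions: a response whose HTML contains the keyword triggers the same pair of resets as a request whose URL contains it.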
14Locating Filtering Routers
ICMP Error
TTL=1 falun
15Locating Filtering Routers
ICMP Error
TTL=1 falun
RST
RST
TTL=2 falun
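The TTL trick on these two slides can be simulated without touching a network. The sketch below is a toy model, not the ConceptDoppler implementation: `probe` and `locate_filter` are hypothetical helpers, the path is a plain list of router names, and a probe is assumed to draw an RST as soon as any router it reaches filters on the keyword.

```python
def probe(path, filtering, ttl, payload, keyword=b"falun"):
    """Simulate one probe sent along `path` with initial TTL `ttl`.

    `path` lists the routers in order; `filtering` is the set of router
    names that filter on `keyword`. Returns "RST", "ICMP", or "DELIVERED".
    """
    reached = path[:ttl]  # routers the packet passes before its TTL runs out
    if keyword in payload and any(r in filtering for r in reached):
        return "RST"
    # The TTL expires at a router if it is not larger than the path length.
    return "ICMP" if ttl <= len(path) else "DELIVERED"


def locate_filter(path, filtering, keyword=b"falun"):
    """Return the 1-based hop of the first router that resets the probe."""
    payload = b"GET /" + keyword + b".html"
    for ttl in range(1, len(path) + 1):
        if probe(path, filtering, ttl, payload, keyword) == "RST":
            return ttl
    return None  # no filtering router on this path


path = ["r1", "r2", "r3", "r4"]
print(locate_filter(path, filtering={"r2"}))
```

With the filter at `r2`, the TTL=1 probe only yields an ICMP error and the TTL=2 probe yields the reset, which matches the picture on the slide.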
16ConceptDoppler Framework
- Netfilter (iptables) to capture packets
- Queue module to hand packets over to user space
- Own TCP stack implementation
- Scapy for constructing custom packets
- Storing packets in PostgreSQL database
- Scapy stored procedures in DB
17Experimental Setup
- Google site:.cn to find random destination sites in China
- Performed TTL-Modulation Experiment
- Traceroute immediately before blocking test
- Whois to query ISPs
- Probed over a two-week period
- Result: Where are the GFC routers? Which ISP?
18Hops into China Where Filtering Occurs
- 28% of paths were never filtered over two weeks of probing
19First Hops
- ChinaNET performed 99.1% of all filtering at the first hop (and 83% of all filtering)
21Slipping Words Through - Diurnal Pattern
Repeat forever:
While "falun" is not blocked: keep probing with "falun"
While "test" is blocked: wait
[Diagram labels: green, red]
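The probe loop on this slide could be sketched as follows. This is an illustration under assumptions, not the authors' code: `measure_cycles` is a hypothetical name, the caller supplies `send(word)` (returning True when the probe drew a reset) and `wait()`, and the loop runs a fixed number of cycles instead of forever.

```python
def measure_cycles(send, wait, cycles):
    """Run the probe loop `cycles` times.

    Returns, per cycle, how many "falun" probes slipped through before
    the first reset. `send(word)` returns True if the probe was blocked;
    `wait()` pauses while the benign word "test" is still being blocked.
    """
    slipped = []
    for _ in range(cycles):
        passed = 0
        while not send("falun"):   # keyword got through: count it, try again
            passed += 1
        slipped.append(passed)
        while send("test"):        # flow still penalized: back off
            wait()
    return slipped
```

Plotting the returned counts against the time of day is one way to look for a diurnal pattern. Note that the inner loops do not terminate on a path that never filters, so a real run would need a probe limit.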
22Slipping Words Through - Diurnal Pattern
[Figure: probes over time of day; hour 0 = 3pm in Beijing]
23Firewall?
24Panopticon!
- Imperfect filtering
- Not strictly at the border
- Promotes self-censorship
- Good enough
- Defeating a Panopticon is different than
defeating a firewall
26Latent Semantic Analysis (LSA)
- Deerwester et al., 1988
- Document summary technique to find relationships between documents and words
- Based on co-occurrence of words in a collection of documents
What to use as corpus?
27Chinese Version of Wikipedia!
28LSA of Chinese Wikipedia
- n = 94863 documents and m = 942033 terms
- tf-idf weighting
- Matrix probably has rank r where k < r < n < m
- Implicit assumption that Wikipedia authors add additive Gaussian noise
- SVD and rank reduction to rank k
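The steps on this slide can be written out as equations. This is a sketch under assumptions: the slide does not say which tf-idf variant or which value of k was used, so the weighting below is one common form. Here tf_ij is the count of term i in document j and df_i is the number of documents containing term i.

```latex
% Term-document matrix A (m terms x n documents), one common tf-idf weighting:
A_{ij} = \mathrm{tf}_{ij} \cdot \log\frac{n}{\mathrm{df}_i}

% Singular value decomposition, with r = rank(A):
A = U \Sigma V^{T}, \qquad
\Sigma = \mathrm{diag}(\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0)

% Rank reduction keeps the k largest singular values (k < r):
A_k = U_k \Sigma_k V_k^{T}, \qquad
A_k \in \arg\min_{\operatorname{rank}(B) \le k} \lVert A - B \rVert_F
```

The last line is the Eckart-Young result: truncating the SVD gives the best rank-k approximation in the least-squares sense, which is why the method fits an additive Gaussian noise assumption.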
2910 + 2 Seed Concepts
30Words correlated with ???? (June 4th Events)
- 1 ???? - June 4th Events
- 2 ??????????? - Chongqing high family garden Jialing River bridge
- 3 ???? - Yu Fulo (related to Chinese Eastern Han Dynasty)
- 4 ??? - Li Jianliang
- 5 ????? - Gaoxiong event (violent political event 1979)
- 6 ??? - Zhao Ziyang (Name, related to China travel logistics)
- 7 ??? - United front activities department
- 8 ??? - Chen Bingde
- 9 ????????????????? - Los Angeles Angels of Anaheim ..
- 10 ??? - Li Tielin (Government official)
- 11 ??? - Deng Liqun (Chinese politician)
- 12 ???? - Chinese politics
- 13 ????? - The Chinese Communist Party 14th
- 14 ???? - Reform and open policy
- 15 ?? - The newspaper endures
- ... to 2500
31Efficient Probing
[Figure: blocked words per 250-word bin for two word lists, Epoch Times and random words; the slide contrasts counts of 4 vs. 37]
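One way to read this comparison: rank candidate words by their correlation with a seed concept, probe them in that order, and count the blocked words in each 250-word bin. The sketch below assumes hypothetical inputs: `scores` maps each word to a correlation score, and `is_blocked` stands in for a live probe against the filter.

```python
def rank_by_score(scores):
    """Return words sorted by descending correlation with the seed concept."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked]


def blocked_per_bin(ranked_words, is_blocked, bin_size=250):
    """Count blocked words in consecutive bins of a ranked word list."""
    counts = []
    for start in range(0, len(ranked_words), bin_size):
        chunk = ranked_words[start:start + bin_size]
        counts.append(sum(1 for word in chunk if is_blocked(word)))
    return counts
```

If the ranking is useful, the early bins should hold more blocked words than bins drawn from a random word order, so probing can stop early.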
32Blocked Words (122 discovered)
- Pornography
- ?? --- Pornography
- ????? --- Virgin prostitution law case
- Politics
- ???? --- Crime against humanity
- ?? --- Dictatorship (party), also ????, ??, ????, ??
- ???? --- Red Terror
- ???? --- June 4th events (1989 Tiananmen Square protests)
- ?? --- Tibet Independence Movement
- Others
- ?? --- Block
- ???? --- (Qinghai) Qiaotou power plant
- ????????? --- Ludovico Ariosto
34Imprecise Filtering
- Filtered are
- ???-????? (Nordrhein-Westfalen, German state)
- ????????? (International geological scientific federation)
- ????????? (Ludovico Ariosto, Italian poet)
Because they contain ?? (sounds like Falun Gong), ?? (student federation), ?? (multidimensional)
35Keyword-based Censorship
- Censor the Wounded Knee Massacre in the Library of Congress
- Remove "Bury My Heart at Wounded Knee" and a few other select books?
- Remove every book containing the keyword "massacre" in its text?
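The difference between the two options can be made concrete with a toy library. Everything in this sketch is hypothetical: the titles and texts are placeholders, and `blocked_by_title` and `blocked_by_keyword` are illustrative helpers, not part of any real filtering system.

```python
def blocked_by_title(library, banned_titles):
    """Targeted censorship: remove only the listed titles."""
    return [title for title in library if title in banned_titles]


def blocked_by_keyword(library, keyword):
    """Keyword censorship: remove every book whose text contains `keyword`."""
    needle = keyword.lower()
    return [title for title, text in library.items() if needle in text.lower()]


# Placeholder titles and texts, invented for this example.
library = {
    "History X": "an account of the massacre",
    "Novel A": "the villagers feared a massacre",
    "Cookbook B": "whisk the eggs",
}
print(blocked_by_title(library, {"History X"}))
print(blocked_by_keyword(library, "massacre"))
```

The title list removes only the intended book, while the keyword rule also removes the unrelated novel. That overreach is the collateral damage the slide is pointing at.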
36Massacre
- Dante's Inferno
- The War of the Worlds, by H. G. Wells
- King Richard III and King Henry VI, Shakespeare
- Adventures of Tom Sawyer, Mark Twain
- Jack London: Son of the Sun, The Acorn-planter, The House of Pride
- Thousands more
37More Imprecision
- Crime against humanity: The Economic Consequences of the Peace, John Maynard Keynes
- Dictatorship: The U.S. Constitution
- Suppression: Origin of Species, by Charles Darwin
- Block: Computer Organization and Design, Patterson and Hennessy
- Hitler: Virtually every book about World War II
- Strike: White Fang, The Sea Wolf, and The Call of the Wild, Jack London
Hypothetical?
38Actually Blocked
39Future Work
- ConceptDoppler: A Censorship Weather Report. What words are censored today?
- Track the blacklist over a period of time, to correlate with current events
- Named entity extraction, online learning
- Scale up (bigger corpus, more words, advanced document summary techniques)
40Future Work
- What are the effects of keyword filtering?
- What content is being targeted?
- What content is collateral damage due to imprecise filtering?
- Where exactly is filtering implemented?
- More sources
- Topological considerations
- IP tunneling, IPv6, IXPs, ...
41Conclusions
- Firewall vs. Panopticon
- GFC implemented mostly at the borders by ChinaNET, but inner routers also filter
- Filtering is NOT reliable
- Routes without GFC routers
- Slip through during busy periods of the day
- Blocked words
- Blocked more than pornography and sedition
- LSA can help to increase probing efficiency
- Imprecise Filtering
- You block a whole lot more than you probably want to
42Questions?
Thank You.
http://www.conceptdoppler.org
43Unsponsored ad: University of New Mexico CS dept. is hiring for 2 junior level positions and 1 senior level position.
44Thanks, Jed and Michael!
450 is 3pm in Beijing
Graph again? -- colored?
46Crime against humanity
- The Economic Consequences of the Peace, John Maynard Keynes
- Thousands more?
47Dictatorship
- The U.S. Constitution
- Thousands more?
48Traitor
- Fahrenheit 451, Ray Bradbury
- Thousands more?
49Suppression
- Origin of Species, by Charles Darwin
- Thousands more?
50Block
- Computer Organization and Design, Patterson and Hennessy
- Artificial Intelligence 4th Edition, George F. Luger
- Millions more?
51Hitler
- Virtually every book about World War II
52Strike
- White Fang, The Sea Wolf, and The Call of the Wild, Jack London
- Millions more?
53Outline
- Implications of Imprecise Filtering
- What are consequences of keyword-based filtering?
- Panopticon vs. Firewall
- How is filtering implemented?
- Where is filtering implemented?
- How reliable is filtering?
- Blocking Words
- How to efficiently discover blocked words?
- What words are blocked?
57Latent Semantic Analysis (LSA)
- Deerwester et al., 1988
- Uses a large corpus of documents to analyze relationships between documents and terms
- In a Nutshell
- Jack goes up a hill, Jill stays behind this time
- B is 8 Furlongs away from C
- C is 5 Furlongs away from A
- B is 5 Furlongs away from A
58LSA in a Nutshell
[Figure: triangle with A above B and C; A-B = 5, A-C = 5, B-C = 8]
59Latent Semantic Analysis (LSA)
- A, B, and C are all three on a straight, flat,
level road.
60LSA in a Nutshell
[Figure: B, A, and C on a straight line; B-A = 4.5, A-C = 4.5, B-C = 9]
61This Research has Two Parts
- Where is the keyword filtering implemented?
- Internet measurement techniques to locate the filtering routers
- What words are being censored?
- Efficient probing via document summary techniques
63Outline
- Why is keyword filtering interesting?
- How does keyword filtering work?
- Where in the Chinese Internet is it implemented?
- How can we reverse-engineer the blacklist of
keywords?
67Rumors
- "The undisclosed aim of the Bureau of Internet Monitoring was to use the excuse of information monitoring to lease our bandwidth with extremely low prices, and then sell the bandwidth to business users with high prices to reap lucrative profits."
- ---a hacker named sinister
68Rumors
- "At the recent World Economic Forum in Davos, Switzerland, Sergey Brin, Google's president of technology, told reporters that Internet policing may be the result of lobbying by local competitors."
- ---Asia Times, 13 February 2007
72More rumors
- "If someone is shouting bad things about me from outside my window, I have the right to close that window."
- ---Li Wufeng
73Conclusions
- GFC ≠ Firewall
- GFC = Panopticon
- With lots of computation/analysis here and a little bit of probing of the Chinese Internet, we can determine
- What content is being targeted with keyword-based censorship?
- What are the unintended consequences of keyword-based censorship?
75Same Graph, Different Scale
[Figure: blocked paths and unique paths by depth into China]
76TTL Tomfoolery
ICMP Error
TTL=1
77How traceroute Works
TTL=2
TTL=3
ICMP Error
TTL=1
TTL=4