Late in 2011, a systems administrator noticed suspicious entries in his SSH log files. The payloads did not conform to the protocol—instead they were just long random-looking byte strings. Careful analysis of the log files revealed a pattern: IP addresses in China sent these strange payloads, and the triggering event was a genuine SSH login, by a real user, from a different Chinese IP address. The administrator concluded that the probes must be related to censorship by the Great Firewall of China (GFW) and moved on. His writeup of these events became the first public documentation of what we call active probing, a critical component in the real-time, versatile, and nation-scale traffic classification system commonly known as the “Great Firewall.”
Active probing is the most recent step in the ongoing arms race of Internet censorship. Users set up proxies to circumvent blocks; censors responded by identifying and blocking proxies by deep packet inspection (DPI); and circumventors in turn made proxy protocols more difficult to detect. Deprived of its capacity for easy, passive protocol identification, the censor now goes straight to the source and interrogates the server directly after it sees a potentially suspicious connection. The censor acts like a user by issuing its own connections to a suspected proxy server, as illustrated in the diagram to the right. If the server responds using a prohibited protocol, the censor then takes some blocking action, such as adding the server's IP address to a blacklist.
In this research project, we improve on existing knowledge and study the following aspects of the GFW:
Our results show that the system operates in real time, but suspends regularly for a short amount of time. It currently blocks at least five circumvention protocols and is upgraded regularly. We show that the system makes use of a vast number of IP addresses, provide evidence that all these IP addresses are controlled by a central system, and determine the location of the Great Firewall's sensors. We also publish our datasets and code to stimulate more research.
This material is based upon work supported in part by the National Science Foundation under grant nos. #1223717, #1518918, #1540066, and #1518882. This work was also supported in part by funding from the Open Technology Fund through the Freedom2Connect Foundation and from the US Department of State, Bureau of Democracy, Human Rights and Labor. The opinions in this work are those of the authors and do not necessarily reflect those of any funding agency or governmental organization.
Our research paper was presented at the Internet Measurement Conference 2015 in Tokyo, Japan. We also presented our work at the 32nd Chaos Communication Congress in Hamburg, Germany.
Examining How the Great Firewall Discovers Hidden Circumvention Servers [pdf, bib, IMC slides, 32C3 slides]
852ad06879d41b4614ad4e6f7658c371e16bcd27
git clone https://github.com/NullHypothesis/active-probing-tools.git
c245bb3c2f4b080a32878c192ca39a0c82adbc9d
git clone https://www.bamsoftware.com/git/active-probing.git
There are a few simple things you can do to check your own computer systems for evidence of active probing. Did you find something interesting? Let us know!
The IP address 202.108.181.70 is disproportionately involved in active probing (sending half of all probes in one study), for reasons we do not understand.
The pattern POST /vpnsvc/connect.cgi indicates a SoftEther probe. The pattern GET /twitter.com indicates an AppSpot probe.
Host header.
An unexpected Host header, especially one pointing to a subdomain of appspot.com, is possible evidence of an AppSpot probe. Your web server may not log the Host header by default. In Apache, you can enable mod_log_forensic to see request headers.
The obfs2 and obfs3 protocols look like random binary noise by design. They tend to stand out in application logs. For example, here is an obfs2 probe seen in an Apache log:
192.0.2.1 - - [13/Jul/2015:05:56:50 -0600] "\xba\xf4\xf1gy\x9e\xe7O9..." 400 0 "-" "-"
Try grepping your logs for escaped bytes. (Be aware that there may be many false positives; for example, \x16\x03 usually simply indicates a TLS connection to a non-TLS port.)
grep '\\x' application.log
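The checks above can be combined into one pass over a log file. The following is a minimal Python sketch, not part of our published tools: the function names `classify_line` and `scan` and the label strings are made up for illustration, and it assumes an Apache-style log in which the client address comes first and non-printable bytes are written as `\xNN`. The request-line patterns will also match ordinary requests for the same paths, so treat hits as leads rather than proof.

```python
import re

# Request-line patterns named in the text above.
SOFTETHER = re.compile(r"POST /vpnsvc/connect\.cgi")
APPSPOT = re.compile(r"GET /twitter\.com")
# Escaped bytes as they appear in the log text, e.g. a literal backslash + "xba".
ESCAPED = re.compile(r"\\x[0-9a-fA-F]{2}")
# A request starting with \x16\x03 is usually just TLS sent to a non-TLS port.
TLS_PREFIX = re.compile(r'"\\x16\\x03')
# The address that sent a disproportionate share of probes.
KNOWN_PROBER = "202.108.181.70"


def classify_line(line):
    """Return a list of labels describing why a log line looks suspicious."""
    labels = []
    if SOFTETHER.search(line):
        labels.append("softether")
    if APPSPOT.search(line):
        labels.append("appspot")
    if ESCAPED.search(line) and not TLS_PREFIX.search(line):
        labels.append("binary")
    if line.startswith(KNOWN_PROBER + " "):
        labels.append("known-prober")
    return labels


def scan(lines):
    """Yield (line number, labels, line) for every line with at least one label."""
    for number, line in enumerate(lines, start=1):
        labels = classify_line(line)
        if labels:
            yield number, labels, line.rstrip("\n")
```

To use it, pass any iterable of log lines, for example `scan(open("application.log", errors="replace"))`, and print what it yields.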
In the paper we describe a number of probe types that the GFW sends. Here are detailed probe payloads that we did not include in the paper for lack of space.
The Great Firewall probes for Tor servers using a TLS connection containing a single Tor VERSIONS cell (see Section 4.1 of the linked specification). The VERSIONS cell declares support for versions 1 and 2 of the Tor protocol. In hexadecimal, the payload is this:
00 00 07 00 04 00 01 00 02
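To make the byte layout concrete, here is a small Python sketch that decodes the payload above. It assumes the VERSIONS cell layout from the Tor specification: a 2-byte circuit ID, a 1-byte command (7 for VERSIONS), a 2-byte payload length, and then one 2-byte big-endian integer per supported version. The function name `parse_versions_cell` is ours, chosen for illustration.

```python
import struct

VERSIONS_COMMAND = 7  # command byte of a VERSIONS cell


def parse_versions_cell(data):
    """Decode a VERSIONS cell into (circuit ID, list of version numbers)."""
    if len(data) < 5:
        raise ValueError("cell is shorter than the 5-byte header")
    # Header: circuit ID (2 bytes), command (1 byte), payload length (2 bytes).
    circ_id, command, length = struct.unpack(">HBH", data[:5])
    if command != VERSIONS_COMMAND:
        raise ValueError("not a VERSIONS cell (command %d)" % command)
    payload = data[5:5 + length]
    if len(payload) != length or length % 2 != 0:
        raise ValueError("payload length does not match the header")
    # Each supported version is one big-endian 16-bit integer.
    versions = list(struct.unpack(">%dH" % (length // 2), payload))
    return circ_id, versions


probe = bytes.fromhex("00 00 07 00 04 00 01 00 02")
print(parse_versions_cell(probe))  # prints (0, [1, 2])
```

Read this way, the nine bytes are circuit ID 0, command 7, payload length 4, and the two versions 1 and 2.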
The p0f TLS fingerprint of Tor probes is:
3.1:39,38,35,16,13,a,33,32,2f,5,ff:23:compr
Apart from a few anomalies such as occasionally repeated payloads, the active probers' implementation of obfs2 and obfs3 complies with the protocol specification (obfs2 spec, obfs3 spec). Because the protocols appear random by design, no single probe sample characterizes them. For a better understanding of how they work, see a visual explanation of obfs2 and a visual explanation of obfs3.
SoftEther probes resemble the HTTPS-based client handshake of SoftEther VPN, a multi-protocol VPN client.
POST /vpnsvc/connect.cgi HTTP/1.1
Connection: Keep-Alive
Content-Length: 1972
Content-Type: image/jpeg

GIF89a...
The value of the Content-Length header may vary. In the official SoftEther protocol, the Content-Length reflects a random amount of padding following the fixed part of the body. The body of the SoftEther probe we saw also included random padding, but because we only recovered one example in full detail, we cannot say for sure whether the length varies.
Despite the Content-Type header, the POST body is a GIF image, not a JPEG, 1,411 bytes in size. In the SoftEther source code, the file is found in src/Cedar/Watermark.c.
As an image, it looks like this:
The HTTPS request differs from that of the official SoftEther client. In July 2014, the official client added a Host header that is not reflected in the active probes. The probe's p0f TLS fingerprint is:
3.1:39,38,35,16,13,a,33,32,2f,5,4,15,12,9,14,11,8,6,3::compr
This differs from that of the official client, which in version 4.15 had the fingerprint:
3.1:c014,c00a,39,38,88,87,c00f,c005,35,84,c012,c008,16,13,c00d,c003,a,c013,c009,33,32,9a,99,45,44,c00e,c004,2f,96,41,c011,c007,c00c,c002,5,4,15,12,9,ff:?0,b,a,f:compr
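The difference between the two fingerprints is easier to see when they are compared field by field. The Python sketch below assumes the colon-separated layout that the strings above appear to follow: TLS version, cipher suites, extensions, and flags, with hexadecimal values and an optional `?` prefix on some entries. It is not a parser for every p0f signature feature, and `parse_fingerprint` is a name made up for this example.

```python
def parse_fingerprint(signature):
    """Split a fingerprint string into version, ciphers, extensions, and flags."""
    version, ciphers, extensions, flags = signature.split(":")

    def hex_list(field):
        # Values are hexadecimal; a leading "?" is dropped before conversion.
        return [int(token.lstrip("?"), 16) for token in field.split(",") if token]

    return {
        "version": version,
        "ciphers": hex_list(ciphers),
        "extensions": hex_list(extensions),
        "flags": flags.split(",") if flags else [],
    }


probe = parse_fingerprint(
    "3.1:39,38,35,16,13,a,33,32,2f,5,4,15,12,9,14,11,8,6,3::compr"
)
client = parse_fingerprint(
    "3.1:c014,c00a,39,38,88,87,c00f,c005,35,84,c012,c008,16,13,c00d,c003,"
    "a,c013,c009,33,32,9a,99,45,44,c00e,c004,2f,96,41,c011,c007,c00c,c002,"
    "5,4,15,12,9,ff:?0,b,a,f:compr"
)

# Cipher suites offered by the probe but not by the official client.
probe_only = sorted(set(probe["ciphers"]) - set(client["ciphers"]))
print([format(c, "x") for c in probe_only])  # prints ['3', '6', '8', '11', '14']
print(probe["extensions"], client["extensions"])  # prints [] [0, 11, 10, 15]
```

Under these assumptions, the probe offers five cipher suites that the version 4.15 client does not, offers none of the client's `c0xx` suites, and sends no TLS extensions at all.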
The AppSpot probe type has taken on a few different forms. What they all have in common is a special Host: webncsproxyXX.appspot.com header, where XX is a two-digit number. We believe that this kind of request is intended to discover unknown Google servers that are capable of providing access to a proxy running on Google App Engine. The User-Agent string is fairly distinctive, reflecting a version of the Chromium web browser that was current for two weeks in April 2014. The User-Agent is faked, as the rest of the header does not match what that version of Chromium sends (for example, genuine Chromium would send Accept-Encoding: gzip).
Beginning on August 20, 2014, the AppSpot probe was a request for /:
GET / HTTP/1.1
Accept-Encoding: identity
Connection: close
Host: webncsproxyXX.appspot.com
Accept: */*
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36
Between September 4, 2014 and March 3, 2015, the probe changed to request /twitter.com instead. (Such a request would cause the webncsproxy app to display the twitter.com home page.)
GET /twitter.com HTTP/1.1
Accept-Encoding: identity
Connection: close
Host: webncsproxyXX.appspot.com
Accept: */*
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36
From March 3, 2015, onward, the probe changed back to requesting /:
GET / HTTP/1.1
Accept-Encoding: identity
Connection: close
Host: webncsproxyXX.appspot.com
Accept: */*
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36
Starting on July 6, 2015, the probes come in pairs, separated by a few seconds. The two probes in a pair do not come from the same IP address, and the numbers in their Host headers are different. The second probe has a shorter header.
GET / HTTP/1.1
Accept-Encoding: identity
Connection: close
Host: webncsproxyXX.appspot.com
Accept: */*
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36

GET / HTTP/1.1
Accept: */*
Content-Type: text/html
Proxy-Connection: Keep-Alive
Content-length: 0
Host: webncsproxyYY.appspot.com
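If you capture raw requests, the header features described above can be checked mechanically. The following Python sketch is an illustration only: the function names and label strings are hypothetical, the header parsing is deliberately simple (CRLF line endings, no folded headers), and the two-digit numbers in the examples are placeholders rather than values we observed.

```python
import re

# Host header shared by all AppSpot probe forms: webncsproxyXX.appspot.com.
APPSPOT_HOST = re.compile(r"^webncsproxy\d{2}\.appspot\.com$", re.IGNORECASE)
# Chromium version claimed by the probes' User-Agent string.
PROBE_UA_TOKEN = "Chromium/34.0.1847.116"


def parse_headers(raw_request):
    """Return (request line, dict of lower-cased header names to values)."""
    lines = raw_request.split("\r\n")
    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def appspot_indicators(raw_request):
    """Return a list of labels for the AppSpot probe features that are present."""
    _, headers = parse_headers(raw_request)
    indicators = []
    if APPSPOT_HOST.match(headers.get("host", "")):
        indicators.append("host")
    # The claimed Chromium version would normally offer gzip encoding.
    claims_chromium = PROBE_UA_TOKEN in headers.get("user-agent", "")
    if claims_chromium and "gzip" not in headers.get("accept-encoding", ""):
        indicators.append("fake-chromium")
    return indicators


first = (
    "GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nConnection: close\r\n"
    "Host: webncsproxy12.appspot.com\r\nAccept: */*\r\n"
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Ubuntu Chromium/34.0.1847.116 "
    "Chrome/34.0.1847.116 Safari/537.36\r\n\r\n"
)
second = (
    "GET / HTTP/1.1\r\nAccept: */*\r\nContent-Type: text/html\r\n"
    "Proxy-Connection: Keep-Alive\r\nContent-length: 0\r\n"
    "Host: webncsproxy34.appspot.com\r\n\r\n"
)
print(appspot_indicators(first))   # prints ['host', 'fake-chromium']
print(appspot_indicators(second))  # prints ['host']
```

Only the Host check applies to the shorter second probe of a pair, since it carries no User-Agent header.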
The p0f TLS fingerprint of the AppSpot probes is:
3.1:39,38,88,87,35,84,16,13,a,33,32,9a,99,45,44,2f,96,41,5,4,15,12,9,14,11,8,6,3,ff:23:compr
It differs markedly from the TLS fingerprint of the version of Chromium it purports to be:
3.2:c00a,c009,c013,c014,c007,c011,33,32,39,2f,35,a,5,4:?0,ff01,a,b,23,3374,10,7550,5,12:ver,rtime
If you have any questions or feedback, please get in touch with us!
Last updated: 2016-12-01