Undocumented wiring made a technician cut my apartment DSL while connecting a new neighbor. I didn't know yet which basement cabinet to peek into and only later read up on how a digital multimeter can help check line continuity. So, with a weekend between me and the technician appointment, I quickly burned through a smallish mobile data plan and discovered the bandwidth cap after depletion to be 2 Kbytes/s, putting me right back into the early 00s of my personal timeline. That was the era of
xmms, talking trash on IRC and listening to sweet but nowadays questionable electronic music. Until 2004 I used to surf at 8 Kbytes/s over ISDN, or double that with both channels bundled at night. My parents' household kept doing so until 4G LTE came around almost a decade later.
CLI HTTP browser compression difference
Adapting to the remote desert and feeling like a Voyager space probe uncovered a significant difference between popular text-mode CLI browsers:
w3m did, at the time of writing, not advertise HTTP/1.1 compatibility, so the default Debian nginx gzip configuration does not respond to it with compressed content. At first I was blind to the protocol difference in the access log and searched exhaustively for missing compile-time flags around
zlib in those two clients. Compression can make a difference of many seconds at low bandwidth, and CPU should be abundant at both ends.
links got the gzipped version with 1388 instead of 4743 bytes, which probably translates to having the content in 1 instead of 3 seconds:
"GET / HTTP/1.1" 200 1388 "-" "Links (2.12; GNU C 5.2.1; text)" "GET / HTTP/1.0" 200 4743 "-" "Lynx/2.8.9dev.8 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/3.4.9" "GET / HTTP/1.0" 200 4743 "-" "w3m/0.5.3+git20151119"
After changing the nginx config to
gzip_http_version 1.0;, all clients get a gzipped version, of course:
"GET / HTTP/1.1" 200 1388 "-" "Links (2.12; GNU C 5.2.1; text)" "GET / HTTP/1.0" 200 1376 "-" "Lynx/2.8.9dev.8 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/3.4.9" "GET / HTTP/1.0" 200 1376 "-" "w3m/0.5.3+git20151119"
This Serverfault comment gives an opinion on whether keeping Keep-Alive is worth not sending compressed answers to HTTP/1.0 clients.
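On a stock Debian install the directive can go into its own snippet; the file name below is my own choice and I assume conf.d/*.conf is included from the http block, as in the default nginx.conf:

# add the directive in an included snippet, test and reload nginx
echo 'gzip_http_version 1.0;' | sudo tee /etc/nginx/conf.d/gzip-http10.conf
sudo nginx -t && sudo systemctl reload nginx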
Simulating low bandwidth
If you wonder how surfing feels with
links on low-bandwidth connections,
tc can do traffic control. I'll quote the heart of a script that mostly worked for me:
down=16kbit; up=16kbit; if=eno1; ip="192.168.0.2";
tc qdisc add dev $if root handle 1: htb default 30
tc class add dev $if parent 1: classid 1:1 htb rate $down
tc class add dev $if parent 1: classid 1:2 htb rate $up
# filter on the intended address, see `man tc-u32` for more
uni32="tc filter add dev $if protocol ip parent 1:0 prio 1 u32"
$uni32 match ip dst $ip/32 flowid 1:1
$uni32 match ip src $ip/32 flowid 1:2
# reset
tc qdisc del dev $if root
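To check that the shaping is actually in place, the usual show commands list the qdisc, the classes and their byte counters for the interface from the script above:

tc -s qdisc show dev eno1
tc -s class show dev eno1
tc filter show dev eno1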
Cyberciti.biz has another introduction, explaining what that
htb bit ("Hierarchy Token Bucket") is about.
Bandwidth savings survey
How to estimate the potential bandwidth savings of using
links on the "top n" sites - just for the sake of it? httparchive.org offers its historical data via Google's BigQuery and in SQL dumps. The summary table
runs contains a column for compression savings, and detailed crawling data sits in the complete
har-type recordings, with Accept-Encoding and even HTTP-version data in JSON. The organisation crawls via a private webpagetest.org instance. For my personal quest the data does not hold much insight, as I can't compare HTTP/1.0 to HTTP/1.1 requests side by side. Still, here's an exploratory query and one that shows potential summary gzip savings. Next I should create my own httparchive.org-like run over a day of browser history, comparing HTTP/1.0 to 1.1 behaviour; a sketch of such a run follows after the queries. Apache Nutch looks like an alternative crawler to the webpagetest.org approach.
select reqHttpVersion, respHttpVersion, status, req_accept_encoding,
       resp_content_encoding, resp_content_length, resp_content_type,
       _gzip_save, url
from requests
where status = '200' and resp_content_type like 'text/html%'
limit 1000;
select count(1) as req_num,
       sum(_gzip_save) as savings,
       replace(substring_index(substring_index(url, '/', 3), '/', -1), 'www.', '') as domain
from requests
where status = '200' and resp_content_type like 'text/html%'
group by domain
order by savings desc, req_num desc;
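And the promised sketch of a personal run: it assumes a file urls.txt with one URL per line (exported from a day of browser history), fetches each page once over HTTP/1.0 and once over HTTP/1.1, both asking for gzip, and sums up the transferred bytes.

#!/bin/sh
# urls.txt: one URL per line, e.g. exported from a day of browser history
total10=0; total11=0
while read -r url; do
  s10=$(curl -s --http1.0 -H 'Accept-Encoding: gzip' -o /dev/null -w '%{size_download}' "$url")
  s11=$(curl -s --http1.1 -H 'Accept-Encoding: gzip' -o /dev/null -w '%{size_download}' "$url")
  total10=$((total10 + s10)); total11=$((total11 + s11))
  echo "$s10 $s11 $url"
done < urls.txt
echo "HTTP/1.0: $total10 bytes  HTTP/1.1: $total11 bytes"

The totals only hint at the compression difference, since many servers are configured like nginx's default while others compress for both protocol versions anyway.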
Still surfing sand dunes in the desert, I looked up which email protocols offer native compression:
IMAP even allows skipping the download of attachments. You can also use
ssh -C with tunneling to compress arbitrary protocols if you can log in on the remote end.
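A minimal sketch of that tunneling idea, with user@remote as a placeholder: a compressed dynamic SOCKS proxy over ssh that local clients can be pointed at.

# -C compresses the tunnel, -N opens no remote shell, -D 1080 serves a local SOCKS proxy
ssh -C -N -D 1080 user@remote &
# route a client through it, e.g. curl:
curl --socks5-hostname localhost:1080 https://jify.de >/dev/null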
- graphical remote browsing tool brow.sh squarely fits my use case (reduced bandwidth, having a remote server)
- blog post "Accept-Encoding vary important"
- Google Dev Documentation "Optimize Encoding and Transfer"
- the Accept-Encoding HTTP Header
- top 1M websites csv files
- on a tangent, the ecosystem of decentralized communication: Scuttlebutt seems interesting for times of being offline and syncing whenever connectivity returns
Useful commands for HTTP/1.0 and HTTP/1.1 debugging:
curl -v --http1.0 -H 'Accept-Encoding: gzip, deflate, compress, br, x-compress, x-gzip' --location --max-redirs 2 --user-agent 'http-header accept-encoding internet survey' 'https://jify.de' >/dev/null
w3m -dump_head https://jify.de
w3m -show-option
w3m -o accept_encoding=gzip -header "Accept-Encoding: gzip" https://jify.de
lynx -verbose https://jify.de
cat lynx.cfg