May 29, 2018

Musings while surfing the web on 16 Kbit/s

Undocumented wiring led a technician to cut my apartment's DSL while connecting a new neighbor. I didn't yet know which basement cabinet to peek into, and only later read up on how a digital multimeter can help check line connectivity. So, with a weekend to wait for the technician appointment, I quickly burned through a smallish mobile data plan and discovered that the bandwidth cap after depletion was 2 Kbytes/s, which put me right back into the early 00s on my personal timeline. That was right when I was exchanging mIRC for xchat and winamp for xmms, talking trash on IRC and listening to sweet but nowadays questionable electronic music. Until 2004 I surfed at 8 Kbytes/s over ISDN, and at double that in duplex mode at night. My parents' household kept doing so until 4G LTE came around almost a decade later.

CLI HTTP browser compression difference

Adapting to the remote desert and feeling like a Voyager space probe uncovered a significant difference between popular text-mode CLI browsers: at the time of writing, lynx and w3m did not advertise HTTP/1.1 compatibility, so the default Debian Nginx gzip configuration did not respond with compressed content. At first I was blind to the protocol difference in the access log and searched exhaustively for missing compile-time zlib flags in those two clients. Compression can make a difference of many seconds at low bandwidth, and CPU should be abundant at both ends. links got the gzipped version with 1388 instead of 4743 bytes, which at 2 Kbytes/s translates to roughly 1 instead of 3 seconds to fetch the content.

"GET / HTTP/1.1" 200 1388 "-" "Links (2.12; GNU C 5.2.1; text)"
"GET / HTTP/1.0" 200 4743 "-" "Lynx/2.8.9dev.8 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/3.4.9"
"GET / HTTP/1.0" 200 4743 "-" "w3m/0.5.3+git20151119"

After changing the Nginx config to gzip_http_version 1.0;, all clients get a gzipped version, of course:

"GET / HTTP/1.1" 200 1388 "-" "Links (2.12; GNU C 5.2.1; text)"
"GET / HTTP/1.0" 200 1376 "-" "Lynx/2.8.9dev.8 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/3.4.9"
"GET / HTTP/1.0" 200 1376 "-" "w3m/0.5.3+git20151119"

This Server Fault comment gives an opinion on whether preserving Keep-Alive is worth withholding compressed responses from HTTP/1.0 clients.
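To put numbers on the access log listings above, here is a small Python sketch that parses the response sizes and estimates the payload transfer time on the capped link. It assumes "2 Kbytes/s" means 2048 bytes per second and ignores headers, latency and TCP overhead, so treat the results as rough lower bounds rather than measurements. The helper names are my own.

```python
import re

# Access log excerpts copied from the first listing (before the config change).
LOG_LINES = [
    '"GET / HTTP/1.1" 200 1388 "-" "Links (2.12; GNU C 5.2.1; text)"',
    '"GET / HTTP/1.0" 200 4743 "-" "Lynx/2.8.9dev.8 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/3.4.9"',
    '"GET / HTTP/1.0" 200 4743 "-" "w3m/0.5.3+git20151119"',
]

# Assumption: the 2 Kbytes/s cap is read as 2048 bytes per second.
LINK_BYTES_PER_SECOND = 2 * 1024

LINE_RE = re.compile(r'"GET \S+ HTTP/(?P<version>[\d.]+)" \d{3} (?P<size>\d+) ')


def parse_line(line):
    """Return (http_version, response_size_in_bytes) for one log excerpt."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unexpected log format: {line!r}")
    return match.group("version"), int(match.group("size"))


def transfer_seconds(size_bytes, rate=LINK_BYTES_PER_SECOND):
    """Payload-only transfer time; ignores headers, latency and handshakes."""
    return size_bytes / rate


def savings_ratio(compressed, uncompressed):
    """Fraction of bytes saved by compression."""
    return 1 - compressed / uncompressed


if __name__ == "__main__":
    for line in LOG_LINES:
        version, size = parse_line(line)
        print(f"HTTP/{version}: {size} bytes, ~{transfer_seconds(size):.2f} s")
```

With these assumptions the gzipped page needs about 0.7 seconds and the uncompressed one about 2.3 seconds, which matches the "1 instead of 3 seconds" estimate once some overhead is added.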

Simulating low bandwidth

If you wonder how surfing with links feels on low-bandwidth connections, tc can do the traffic control. I'll quote the heart of a script that mostly worked for me:

# set ip to this host's IPv4 address before running
down=16kbit; up=16kbit; if=eno1; ip="";

# default 30 points at class 1:30, which is not defined here
tc qdisc add dev $if root handle 1: htb default 30
tc class add dev $if parent 1: classid 1:1 htb rate $down
tc class add dev $if parent 1: classid 1:2 htb rate $up

# classify packets by host IP, see `man tc-u32` for more

uni32="tc filter add dev $if protocol ip parent 1:0 prio 1 u32"
$uni32 match ip dst $ip/32 flowid 1:1
$uni32 match ip src $ip/32 flowid 1:2

# reset
tc qdisc del dev $if root

has another introduction, explaining what that htb bit ("Hierarchy Token Bucket") is about.
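For a feel of what a token bucket does, here is a toy simulation in Python. It is a sketch of a single bucket with a simulated clock, not the kernel's HTB, which adds a class hierarchy and borrowing between classes. The class name, the 1500-byte burst and the packet sizes are my own assumptions. The rate of 2000 bytes/s corresponds to tc's 16kbit.

```python
class TokenBucket:
    """Single token bucket with a simulated clock (no real sleeping)."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s    # refill speed
        self.burst = burst_bytes        # bucket capacity
        self.tokens = burst_bytes       # the bucket starts full
        self.now = 0.0                  # simulated time in seconds

    def advance(self, seconds):
        """Move the clock forward and refill, capped at the burst size."""
        self.now += seconds
        self.tokens = min(self.burst, self.tokens + seconds * self.rate)

    def send(self, size_bytes):
        """Wait until enough tokens exist, spend them, return the send time."""
        if size_bytes > self.burst:
            raise ValueError("packet larger than the bucket can ever hold")
        if self.tokens < size_bytes:
            self.advance((size_bytes - self.tokens) / self.rate)
        self.tokens -= size_bytes
        return self.now


def schedule(packet_sizes, rate_bytes_per_s=2000, burst_bytes=1500):
    """Return the simulated departure time of each packet, in order."""
    bucket = TokenBucket(rate_bytes_per_s, burst_bytes)
    return [bucket.send(size) for size in packet_sizes]


if __name__ == "__main__":
    # Four full-size packets: the first leaves at once, the rest are paced.
    print(schedule([1500, 1500, 1500, 1500]))
```

The first packet drains the full bucket immediately, and every further 1500-byte packet has to wait 0.75 seconds for the refill, which is what makes the link feel like 16 Kbit/s.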

Bandwidth savings survey

How to estimate potential bandwidth savings between using w3m/lynx and links on the "top n" sites - just for the sake of it? offers its historical data via Google's BigQuery and in SQL dumps. In the summary table runs, a column for compression savings can be found, and detailed crawling data is in the complete HAR-type recordings, with Accept-Encoding and even the HTTP version as JSON data. The organisation crawls via a private For my personal quest the data does not hold much insight, as I can't compare HTTP/1.0 to HTTP/1.1 requests side by side. Still, here's an exploratory query and one that shows potential summary gzip savings. Next I should create my own run on a day of browser history, comparing HTTP/1.0 to HTTP/1.1 compatibility. I found Apache Nutch to be an alternative crawler to using the approach.

select reqHttpVersion, respHttpVersion, status, req_accept_encoding,
       resp_content_encoding, resp_content_length, resp_content_type, _gzip_save, url
  from requests
 where status = '200'
   and resp_content_type like 'text/html%'
 limit 1000;

select count(1) as req_num,
       sum(_gzip_save) as savings,
       replace(substring_index(substring_index(url, '/', 3), '/', -1), 'www.', '') as domain 
  from requests
 where status = '200'
   and resp_content_type like 'text/html%'
group by domain
order by savings desc, req_num desc;
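The nested substring_index call in the second query is a bit dense, so here is what it does, mirrored step by step in Python. This is only an illustration of the string logic. The function name and the example URL are made up.

```python
def domain_of(url):
    """Mirror replace(substring_index(substring_index(url, '/', 3), '/', -1), 'www.', '')."""
    # substring_index(url, '/', 3): everything before the third slash,
    # e.g. "https://www.example.com"
    up_to_host = "/".join(url.split("/")[:3])
    # substring_index(..., '/', -1): everything after the last slash,
    # e.g. "www.example.com"
    host = up_to_host.split("/")[-1]
    # replace(..., 'www.', ''): drops every occurrence of "www.", like SQL replace
    return host.replace("www.", "")


if __name__ == "__main__":
    print(domain_of("https://www.example.com/some/page.html"))
```

Like the SQL, this removes "www." wherever it occurs in the host name, not only as a prefix, which is good enough for a rough grouping.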

Other protocols

Still surfing sand dunes in the desert, I looked up which email protocols offer native compression:

protocol   native compression
IMAP       yes¹
POP        no²

IMAP even allows you to skip downloading attachments. You can also use ssh -C with tunneling to compress arbitrary protocols if you can log in on the remote host.



Useful commands for HTTP/1.0 and HTTP/1.1 debugging:

curl -v --http1.0 -H 'Accept-Encoding: gzip, deflate, compress, br, x-compress, x-gzip' --location --max-redirs 2 --user-agent 'http-header accept-encoding internet survey' '' >/dev/null
w3m -dump_head
w3m -show-option
w3m -o accept_encoding=gzip -header "Accept-Encoding: gzip"
lynx -verbose
cat lynx.cfg
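If curl is not at hand, the same check can be approximated with a few lines of Python over a raw socket. This is a minimal sketch for plain HTTP on port 80 only (no TLS, no redirects). The function names are my own, and the host in the usage comment is a placeholder, not a site from this post.

```python
import socket


def build_request(host, path="/", http_version="1.0"):
    """Build a GET request that announces the given HTTP version and asks for gzip."""
    lines = [
        f"GET {path} HTTP/{http_version}",
        f"Host: {host}",
        "Accept-Encoding: gzip",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_head(raw):
    """Split a raw response into the status line and a lower-cased header dict."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers


def fetch_head(host, port=80, http_version="1.0", timeout=10):
    """Send the request and read until the end of the response headers."""
    data = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, http_version=http_version))
        while b"\r\n\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return parse_head(data)


# Usage (placeholder host, needs network access):
#   status, headers = fetch_head("example.org", http_version="1.0")
#   print(status, headers.get("content-encoding", "not compressed"))
```

Running it once with http_version="1.0" and once with "1.1" against the same server shows whether the Content-Encoding header depends on the announced protocol version.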