Apr 12, 2016

Simple Latency Graphs with the help of Curl

I was told the responses of an HTTP host were spotty at best when mass-downloading images. Believing this to be unlikely, I gathered some graphs for a crusade. I also wanted to tiptoe into simple Python plotting, to have it at hand for other purposes. The hosts didn't have monitoring of this type or in this detail. The mechanism is basic: a curl script requests a defined set of URL paths at intervals and measures the timings. I was mostly interested in the response_code and, if the request succeeded, its time_total. The unit is milliseconds. The test conditions and plots have shortcomings I need to attend to at some point:

  • as there's no time axis, the interval is hidden
  • http response codes should be categorized, as non-200 status can occur
  • missing axis units
  • missing total count of measurements in legend

Values were gathered over 40 hours at 15-minute intervals, with 38 URL paths tested per run, which made for north of 6k measurements. I checked whether failing HTTP requests existed before plotting (see the sketch after the filtering step below). The distribution mostly hovers around 90-120 ms, and in this case the problem ultimately had a different origin.

Get metrics and plot

Place this script in the location noted in the comment and reference it in your /etc/crontab. It expects one URL per line in urlpaths.csv.

#!/bin/bash
#
# crontab: */15 * * * * user /home/user/latency/measure.sh >> /home/user/latency/curl.log 2>&1

# one log line per URL: epoch, human-readable timestamp, then curl's timing fields
for url in $(cat /home/user/latency/urlpaths.csv); do
    echo -ne "$(date "+%s")\t"
    echo -ne "$(date "+%Y-%m-%d %H:%M:%S")\t"
    curl -L -w 'HTTP\t%{response_code}\tLookup:\t%{time_namelookup}\tConnect:\t%{time_connect}\tPretransfer:\t%{time_pretransfer}\tStarttransfer:\t%{time_starttransfer}\tTotal:\t%{time_total}\tUrl:\t%{url_effective}\n' -o /dev/null -s "$url"
done

After a few hours or days, filter the time_total column (field 14 of the tab-separated log) into a file that's used for plotting:

cut -f14 ~/latency/curl.log > ~/latency/results.csv
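
Before filtering, it's worth making sure no failing requests hide in the log, as mentioned above. A minimal sketch (hypothetical script name count_codes.py, invoked as python3 count_codes.py ~/latency/curl.log) that counts the response codes, assuming the tab layout written by measure.sh:

#!/usr/bin/env python3

import sys
import pandas as pd

# curl.log is tab-separated; column 3 (0-based) holds the response_code
log = pd.read_csv(sys.argv[1], sep='\t', header=None)
print(log[3].value_counts())

Anything other than 200 in this output should be investigated, or at least excluded, before plotting the totals.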

get the necessary python libs for plotting, either with pip or apt

apt install python3-seaborn python3-pandas python3-matplotlib
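
or the pip equivalent:

pip3 install seaborn pandas matplotlib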

then create two .py scripts opening the .csv file, invoked like: python3 plot_hist.py ~/latency/results.csv (don't name a script seaborn.py, or import seaborn will pick up the script itself instead of the library)

this plots a histogram showing the distribution

#!/usr/bin/env python3

import sys
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# results.csv has no header line, just one time_total value per row
data = pd.read_csv(sys.argv[1], index_col=False, header=None, names=['total'])

sns.set(color_codes=True)
grid = sns.distplot(data['total'], bins=2000, kde=False)
grid.set(xscale='log')

plt.savefig('results-hist-log.png', dpi=150)

[figure: distribution histogram, logarithmic x-axis]

the "Burj Khalifa" among histograms.

plotting just the data points, with or without logarithmic y-scale

#!/usr/bin/env python3

import sys
import pandas as pd
import matplotlib.pyplot as plt

# results.csv has no header line, just one time_total value per row
df = pd.read_csv(sys.argv[1], index_col=None, header=None, names=['total'])

# point plot, no legend
df.plot(style='o', legend=False)

# get the current axes and set a logarithmic y-axis
ax = plt.gca()
ax.set_yscale('log')

plt.savefig('results-log.png', dpi=150)

[figures: dot plot with linear y-axis, dot plot with logarithmic y-axis]
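
to tackle the missing time axis from the shortcomings list, a minimal sketch that reads the full curl.log instead of results.csv and plots time_total against the wall-clock timestamp; the column positions assume the tab layout written by measure.sh:

#!/usr/bin/env python3

import sys
import pandas as pd
import matplotlib.pyplot as plt

# curl.log is tab-separated; column 1 (0-based) holds the
# "%Y-%m-%d %H:%M:%S" timestamp, column 13 holds time_total
log = pd.read_csv(sys.argv[1], sep='\t', header=None)
log[1] = pd.to_datetime(log[1])
log.plot(x=1, y=13, style='o', legend=False)

plt.savefig('results-time.png', dpi=150)

This also makes the measuring interval visible, since pauses between cron runs show up as gaps on the x-axis.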

PS: if it's hard to pinpoint why a service is down, check for nearby RIPE Atlas probes, activity on the outages mailing list, and whether somebody is angry at downdetector/allestörungen. BGP hijacks and config errors have been popular lately, too.