Non-icky website statistics

TL;DR: If all you'd like to know is how to gain some insight into how your website is used without selling any of your own or your visitors' souls, feel free to skip the first part and scroll down to where I introduce my setup.

Please take a moment to appreciate the following situation:

As of October 2022, when you visit a website, there is a high chance you will be greeted by an overlay informing you of the various tracking measures performed while you browse it. These overlays are often designed along so-called "dark patterns", design guidelines for producing interfaces that are as confusing as possible and that nudge users into consenting to practices they would not consent to if they were well-informed and knew what they were doing.[1] The consent popups are often supplied by third parties, so-called "Consent Management Providers", which offer an automated analysis of the tracking mechanisms embedded in a given web page, because manually keeping track of all the different kinds of tracking that happen is a hard problem that requires technical expertise.

One way to deal with these overlays is to use a browser plugin that automatically fills out the overlaid forms according to your preferences.[2] These plugins require regular maintenance because the Consent Management Providers' overlays change regularly and break the mechanisms by which the plugins detect and fill out the displayed form.

And all you wanted to do was visit a web site.

I want to spare you an extensive analysis of the different market interests at play here. I also don't want to spend too much time thinking about whether finding yourself on the battlefield of the TV show Robot Wars would be an apt metaphor for this. I feel like there is already too much work going into all of this, and the time spent on keeping it running could be so much better spent on things that do not frustrate people but actually produce joy (this goes for both workers and users). Instead, I'd like to propose a way to sidestep all of this, at least for some of the people who find themselves running websites.

Data scarcity

The single question I find most useful when collecting personal data is: "Do I really need it?"

Not "may I need it later" or "could it increase the value of my product", but "do I need it for the specific problem I am trying to solve". You may not need website statistics. You probably do not need detailed insights into how people move their eyeballs while browsing the information you offer. My personal experience is that at least 90% of the analytics e-mails in companies I have worked for go unread.

I'd like to make the point that most tracking is not just unnecessary, it is harmful. Extensive data collection opens the door to abuse. It is impossible to know what kinds of insights can be deduced from the data gathered, and the effective safeguard against this is to collect data only for a strictly defined purpose. I am, in a nerdy way, drawn towards statistical measures and a quantification of myself and the things around me. I still try to stop and think about what exactly I would like to measure and for what purpose, and I invite you to do the same.

Can you please just tell me how I'd do that in a way that satisfies your ridiculously high moral standards

Glad you're asking!

Option A

There are quite a few tools which you can (but don't have to) self-host that offer website analytics without creeping on your visitors. It's a good sign if a tool offers an option to host it yourself, because it hints at an awareness that it may be desirable not to share everything with third parties. The hosted offerings are usually available in exchange for a small monthly fee, and you don't need any special knowledge to set them up.

Examples along those lines are GoatCounter, Umami or Plausible. There is also Matomo, formerly known as Piwik, which offers hosted and self-hosted versions and is the oldest Google Analytics alternative I know. Please know that I am not affiliated with any of those services and have not used all of them extensively. You will have to make up your own mind (note that most of them offer live previews).

I personally stopped using client-side tracking altogether, which brings me to my own setup.

Option B

Saving the best for last: goaccess! Goaccess analyzes your server logs and presents useful insights to you in the form of a dashboard. It does not require adding any JavaScript to your page whatsoever! It does require you to be able to access said logs, which may not be easy if you are on a shared server.

These logs are most likely stored on your server anyways and are usually deleted after a short period of around 7 days. Goaccess supports the log formats of different web servers out of the box and because these are just text files you can use common command line tools like grep to filter the logs before piping them into goaccess to analyze them. There is a command-line interface you can use to gain some insights from the logs, or goaccess can output an HTML file with different graphs and visualizations. This is what I use to look at this page's statistics:

# The glob is left unquoted so the shell expands it to every rotated log file.
cat logs/webserver/access_log* | \
grep "^arnes\.space" | \
goaccess --log-format=VCOMBINED

The logs are stored in different files, which are rotated daily. All of them are joined via cat. They contain logs for different services, so I use grep to keep only the lines for the host arnes.space. Finally, I pipe the result into goaccess and tell it to parse the input as Apache's Virtual-Host Based log format. The result looks like this:

Goaccess Terminal Interface
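The pipeline above opens the terminal interface. For the HTML file mentioned earlier, a variant might look roughly like the following sketch. The helper names keep_host and html_report are made up for illustration, and the sketch assumes that the first field of every log line has the form host:port, as in the virtual-host log format. Unlike the grep prefix match, keep_host compares the host name exactly.

```shell
#!/bin/sh
# Sketch only: keep_host and html_report are hypothetical helper names.

# Keep only lines whose virtual host (the part of the first field before
# an optional ":port") equals the given host exactly.
keep_host() {
    awk -v host="$1" '{ split($1, parts, ":"); if (parts[1] == host) print }'
}

# Write an HTML report for a single host.
# Usage: html_report HOST OUTPUT_FILE LOGFILE...
html_report() {
    host=$1
    out=$2
    shift 2
    # The trailing "-" asks goaccess to read the log from standard input;
    # -o selects the output file.
    cat "$@" | keep_host "$host" | goaccess --log-format=VCOMBINED -o "$out" -
}
```

Calling html_report arnes.space report.html logs/webserver/access_log* would then write the report to report.html, assuming goaccess is installed and the logs live at that path.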

The interface shows which browsers and operating systems are used, the estimated access location, 404 responses and a lot more. It helps me make decisions about which browsers to support and how much time to spend writing mobile styles. It lets me know whether my posts are read, or whether there is a link somewhere that I forgot to update or misspelled by accident. Maybe you even spotted that all of those IP addresses are anonymized, because I don't need to know every block of an address. You can configure your server to do the same! If you want to take a look, an example of the HTML version (containing extensive, but not all available, information) is linked on the goaccess page.
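Anonymizing is best done by the server itself, before anything is written to disk. If that is not an option, addresses can also be masked after the fact. The following is a rough sketch of that alternative, not my actual setup: mask_ipv4 is a made-up helper name, and it assumes the client address is the second space-separated field, as in the virtual-host log format used above. It only handles IPv4.

```shell
#!/bin/sh
# Sketch only: mask_ipv4 is a hypothetical helper name.
# Assumes the client address is the second space-separated field.
# IPv6 addresses pass through unchanged and would need separate handling.
mask_ipv4() {
    # Replace the last octet of the address with 0.
    sed -E 's/^([^ ]+ [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3} /\1.0 /'
}

# Example with a made-up log line:
printf '%s\n' 'example.org:443 203.0.113.77 - - [10/Oct/2022:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.85.0"' | mask_ipv4
# prints the same line with the address changed to 203.0.113.0
```

Placed between grep and goaccess in the pipeline, such a filter would keep full addresses out of the report, although they would still be present in the log files themselves.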

Fin

Thank you for reading this far! If you have feedback to share, feel free to write me an e-mail or contact me on the fediverse.


  1. You can take a look at opt-in rates on newer iOS versions to get an impression of what informed consent rates could look like: Ars Technica has an article on that, and Statista provides statistics that look higher but still put the rate at about one in four users. ↩︎

  2. Consent-O-Matic is the one I use. In addition to other addons like Privacy Badger and uBlock Origin, of course. ↩︎