Feature #14922

Integrate download metrics in the new download page

Added by sajolida 2017-11-04 16:21:58. Updated 2019-02-25 17:12:29.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Installation
Target version:
Start date:
2017-11-04
Due date:
% Done:

0%

Feature Branch:
web/14922-download-metrics
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description


Related issues

Related to Tails - Bug #12127: Make "notes" more structured in the description of our mirror pool Confirmed 2017-01-10
Related to Tails - Bug #15312: "Tor check" button is badly aligned and looks buggy Resolved 2018-02-15
Blocks Tails - Feature #16080: Core work 2018Q4 → 2019Q2: User experience Resolved 2018-10-29
Blocks Tails - Bug #16009: Metrics for USB Image Resolved 2018-09-28

History

#1 Updated by sajolida 2017-12-14 16:33:36

I won’t have time to do this on the budget for Bug #12328.

#2 Updated by Anonymous 2018-01-15 15:33:39

@sajolida: could you explain what you want to do on this ticket please? thanks!

#3 Updated by sajolida 2018-01-21 11:20:00

Sure. With the new structure of the download page, I thought we could maybe hit a counter using JavaScript when an ISO image is downloaded or verified. That would give us quite a good download metric.

#4 Updated by sajolida 2018-01-21 11:23:52

  • related to Bug #12127: Make "notes" more structured in the description of our mirror pool added

#5 Updated by sajolida 2018-02-09 09:51:47

  • Target version set to Tails_3.9

I want to have some data before 3.11, so let’s target 3.9.

#6 Updated by sajolida 2018-07-09 11:54:46

  • Target version changed from Tails_3.9 to Tails_3.10.1

#7 Updated by sajolida 2018-08-19 17:32:47

  • Assignee changed from sajolida to intrigeri
  • QA Check set to Ready for QA
  • Feature Branch set to web/14922-download-metrics

Here is a branch that does that. I tested it on a counter for which I could see the logs of the web server.

I managed to hit the counter on the result of the verification.

I didn’t manage to hit it when the download button is clicked (ISO or BitTorrent).

My understanding is that it’s because when the download button is clicked the browser opens the link that points to the download and JavaScript on the page is interrupted at that time. Someone with a better understanding of the internals of JavaScript in browsers might be able to fix that. Until then, people doing the OpenPGP verification only or skipping the verification won’t be counted.

I’m also passing along:

  • The scenario, so we know the base OS of people downloading Tails
  • The version, if that becomes interesting for some reason

intrigeri: Do you mind helping me identify who would be a suitable worker to review this (u doesn’t want to do that kind of stuff anymore)? They will be paid on my UX budget. See fundraising.git:drafts/ISC/1/unofficial.mdwn.

#8 Updated by intrigeri 2018-08-20 09:31:57

  • Assignee changed from intrigeri to sajolida

> I didn’t manage to hit it when the download button is clicked (ISO or BitTorrent). My understanding is that it’s because when the download button is clicked the browser opens the link that points to the download and JavaScript on the page is interrupted at that time.

Indeed, to fix that we would need to open the download URL via JS too, after hitting the counter. Should not be too hard except once you add real world constraints such as our use-mirror-pool handling and graceful fallback when JS is disabled.
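A minimal sketch of that ordering, assuming the counter request reports completion through a callback. The names `hitThenNavigate`, `sendHit` and `navigate` are hypothetical and not taken from download.js, and the mirror-pool handling mentioned above is left out:

```javascript
// Sketch only: `sendHit` and `navigate` are injected stand-ins, not functions
// from download.js. `sendHit(url, done)` must call `done` when the request ends.
function hitThenNavigate(counterUrl, downloadUrl, sendHit, navigate, timeoutMs) {
  var followed = false;
  var timer = null;

  function follow() {
    if (followed) { return; }   // follow the link at most once
    followed = true;
    clearTimeout(timer);
    navigate(downloadUrl);
  }

  // Safety net: start the download anyway if the counter is slow to answer.
  timer = setTimeout(follow, timeoutMs);
  sendHit(counterUrl, follow);
}

// Possible browser wiring (hypothetical names). Without JavaScript the handler
// never runs and the link's plain href still starts the download.
//   link.addEventListener("click", function (event) {
//     event.preventDefault();
//     hitThenNavigate(counterUrl, link.href, sendHit, function (url) {
//       window.location.href = url;
//     }, 1000);
//   });
```

Injecting the two callbacks keeps the ordering logic testable outside a browser; the 1000 ms timeout in the wiring comment is an arbitrary illustrative value.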

> intrigeri: Do you mind helping me identify who would be a suitable worker to review this

Happy to help with that!

  • Short-term: I see nobody else than me with all the info in mind to review this so I’ll do it as part of reviewing code contributions that are on nobody else’s plate
  • Long-term: as we’ve noticed when doing the budget forecasting last year, you need a team-mate for this sort of thing. I think Cody is the person that works closest to this area (website, IA) so I would say best is to somehow get him up-to-speed so he can do this kind of review.

So in this case, I propose you get a first review by Cody (so we make progress on solving the root cause of the problem) and then I’ll take a last quick look before merging (let’s call it a safety net). Works for you?

#9 Updated by sajolida 2018-09-09 13:04:57

ignifugo offered to help me with that. She doesn’t have an account so for the time being I’ll only add more info on how to test this but I can’t reassign it to her:

1. Build a local build of the website on my branch (web/14922-download-metrics). See https://tails.boum.org/contribute/build/website/.
2. Change counter_url in wiki/src/install/inc/js/download.js to a URL for which you can see the server’s log.
3. Uncomment showVerificationResult("successful"); at the bottom of wiki/src/install/inc/js/download.js.
4. Visit /install/download on your local build.

  • You should see a hit on the counter_url with status=successful.
  • But if you click on the “Download Tails” button, you don’t get a hit with status=download-iso.

See also the screenshots in attachment.

#10 Updated by sajolida 2018-10-20 23:55:00

  • Target version changed from Tails_3.10.1 to Tails_3.11

No news from ignifugo since the summit despite 2 pings, I’m asking Chris if he’s interested.

#11 Updated by lamby 2018-10-22 02:34:56

> I’m asking Chris if he’s interested.

I’m happy to review this as well as try and get the “normal” download to work as you write in note-7 above. Please assign back to me if you’d like me to go ahead, including the maximum number of hours I should exert on this (if I need more, I’ll get back to you). Thanks!

#12 Updated by sajolida 2018-10-22 21:35:39

  • Assignee changed from sajolida to lamby

Excellent, thanks for taking this one!

I spent 2.5 hours on the code myself (I’m slow and have very little training in JS). So I guess that 1 hour of review should be enough and maybe another hour to try to fix the normal download. Total 2 hours max.

#13 Updated by lamby 2018-10-22 21:37:03

ACK, on my TODO.

#14 Updated by lamby 2018-10-24 21:54:56

Please find the following attached patches:

0001-Ensure-that-we-call-our-hit-counter-before-following.patch fixes the issue where the hit counter was not being recorded as outlined in c9d24f225310073a95e28da773b4c051ef033ff5 or in #note-7.

0002-Ensure-that-failing-to-record-a-hit-does-not-cause-o.patch improves the robustness of the hit counter, ensuring that if we fail to record the hit (eg. if viewing the documentation locally and one’s content policy prevents the “ping” hitting an external server, or it fails to parse the url etc. etc.) then no “important” code fails to execute.
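The robustness idea in the second patch can be illustrated roughly as follows. This is a sketch under assumptions, not the contents of the patch: `safely` and `recordHit` are hypothetical names.

```javascript
// Hypothetical wrapper: `recordHit` stands in for whatever sends the request.
function safely(recordHit) {
  return function () {
    try {
      return recordHit.apply(this, arguments);
    } catch (error) {
      // Metrics are best-effort: swallow the failure so that the code
      // running after the hit (showing the result, starting the download)
      // still executes.
      return undefined;
    }
  };
}
```

A caller would wrap the counter function once, for example `var hit = safely(recordHit);`, and use `hit(...)` wherever the counter is called.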

#15 Updated by sajolida 2018-10-27 15:11:06

  • QA Check changed from Ready for QA to Pass

Excellent! I applied your patches and merged. Exception handling is still out of my skillset, so thanks for spotting that!

lamby: How much time did you spend on this?

I’m leaving this ticket open to remind me to compute some first stats in some weeks to make sure this works as intended in production.

#16 Updated by sajolida 2018-10-27 15:52:04

Oops, actually, this doesn’t work on the production website. See screencast in attachment.

When I click the download button I get a NetworkError instead of hitting the counter. I don’t get this error when I click the “I already downloaded Tails” link.

I reverted the whole branch because I initially thought that it also triggered Bug #16078 but it’s not the case. So if you need it for debugging, I could merge the branch again.

I also added 4563af94cf on top of your commits since the console is disabled elsewhere. But I think it’s unrelated to this new issue…

#17 Updated by lamby 2018-10-28 23:01:42

What exactly do you think is the issue here? :) We get a NetworkError 404 as the https://tails.boum.org/install/download/counter location does not actually exist; this was the expected behaviour as I understood it from your comment in note-7. Marking as Info Needed to match.

However, I can see that that is not ideal and a little misleading, therefore please find 0001-Add-a-dummy-counter-file-to-ensure-that-regular-oper.patch attached which adds such a file.

(In addition, we should probably cachebust these HTTP GET requests. For this, please see 0002-Cachebust-hits-to-the-counter.-re.-14922.patch.)
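Cache busting here means making every counter URL unique so that no cache answers the GET on the server's behalf. A minimal sketch, assuming a query parameter named `cachebust` (a hypothetical name, the patch may use another) and a URL without a fragment:

```javascript
// Append a changing value so each GET request has a distinct URL.
// `now` is optional and only exists to make the function deterministic in tests.
function cachebust(url, now) {
  var stamp = (now === undefined) ? Date.now() : now;
  var separator = (url.indexOf("?") === -1) ? "?" : "&";
  return url + separator + "cachebust=" + stamp;
}
```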

> lamby: How much time did you spend on this?

2 hours total.

#18 Updated by lamby 2018-10-29 01:56:16

Please also find the following 0001-Use-relative-URI-for-download-counter.-re.-14922.patch attached:

From 99012ca495e5ab2e74c79488baf6adf35b429915 Mon Sep 17 00:00:00 2001
From: Chris Lamb <chris@chris-lamb.co.uk>
Date: Sun, 28 Oct 2018 21:54:17 -0400
Subject: [PATCH] Use relative URI for download counter. (re. #14922)

This will let it work over Tor (etc. as well as to ease local development)
due to cross-site request prevention / CORS, etc. etc.
---
 wiki/src/install/inc/js/download.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
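To illustrate why a relative reference helps, the sketch below resolves a counter reference against the page it is loaded from and checks whether the request stays on the same origin. The helper `resolveCounter` and the local address `http://localhost:8000` are assumptions made for the example; the actual value used in download.js is not shown in this ticket.

```javascript
// Resolve `reference` against the page URL and report whether the resulting
// request would go to the same origin as the page itself.
function resolveCounter(reference, pageUrl) {
  var resolved = new URL(reference, pageUrl);
  return {
    href: resolved.href,
    sameOrigin: resolved.origin === new URL(pageUrl).origin
  };
}

// A local build served from a hypothetical address:
var localPage = "http://localhost:8000/install/download/";
var relative = resolveCounter("/install/download/counter", localPage);
var absolute = resolveCounter("https://tails.boum.org/install/download/counter", localPage);
```

With the relative reference the hit goes to whichever host served the page, so it is never a cross-origin request; with the absolute URL a local build would send its hits to the production site.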

#19 Updated by sajolida 2018-10-29 14:16:27

> What exactly do you think is the issue here? :) We get a NetworkError 404 as the https://tails.boum.org/install/download/counter location does not actually exist; this was the expected behaviour as I understood it from your comment in note-7. Marking as Info Needed to match.

I thought about that as well but I discarded this hypothesis because I
was not getting the same NetworkError when clicking on the “I already
downloaded” link. Still, Bug #16078 is unrelated and the NetworkError
doesn’t prevent download, so I applied your new patches.

On the production website, I still have a NetworkError (and no GET in
the network activity) when clicking the download button, while I have a
GET and no NetworkError when clicking the “I already downloaded” link.

See screencast in attachment.

Since I don’t have real-time access to the server log, I can’t check
right now if the GET is actually performed when clicking the download
button. I guess it’s not but I could download the logs tomorrow if
that’s helpful.

#20 Updated by sajolida 2018-10-29 14:17:28

#21 Updated by lamby 2018-10-29 14:50:06

Ahhh it’s a nice race condition that doesn’t occur locally for me due to network, etc. This should be fixed in 0001-Use-window.open-over-setting-window.location-to-avoi.patch

Please also find 0002-Avoid-a-traceback-if-mirror-dispatcher.js-is-not-ava.patch which (unrelated) prevents an ugly error when viewing the site locally due to the lack of mirror-dispatcher.js.

#22 Updated by sajolida 2018-10-29 15:15:27

  • Status changed from Confirmed to Resolved
  • QA Check changed from Ready for QA to Pass

Excellent! It works now :)

I’ll wait to apply 0002-Avoid-a-traceback-if-mirror-dispatcher.js-is-not-ava.patch until we have to modify page/template.tmpl for something else since it will trigger a full rebuild of the website on the server.

Marking this ticket as Pass but not closing it so I remember to compute some first stats in some weeks to make sure that the result is fine.

#23 Updated by lamby 2018-10-29 15:17:03

> Excellent! It works now :)

Silly race conditions… grr.

> Marking this ticket as Pass but not closing it so I […]

ACK.

#24 Updated by sajolida 2018-10-29 15:24:16

  • Status changed from Resolved to In Progress

#25 Updated by sajolida 2018-11-29 17:22:37

  • blocks Feature #16080: Core work 2018Q4 → 2019Q2: User experience added

#26 Updated by sajolida 2018-12-10 15:44:19

  • Target version changed from Tails_3.11 to Tails_3.12

#27 Updated by sajolida 2019-01-08 15:51:08

  • related to Bug #15312: "Tor check" button is badly aligned and looks buggy added

#28 Updated by sajolida 2019-01-08 19:30:04

  • Assignee changed from sajolida to intrigeri
  • QA Check changed from Pass to Info Needed

I pushed to internal.git:4c0b620 a first version of the script that I want to run on the logs to have download stats.

You can run it on one or several gzipped or uncompressed log files. Like this:

ruby download.rb ~/Persistent/logs/access.log-2018-11-*

You can also filter the dates:

ruby download.rb -d Nov ~/Persistent/logs/access.log-2018-11-01.gz

I found some of the results surprising so I investigated a bit more…

Only 42% of the hits of real browsers on the Torrent file are counted by the counter.

I used:

  • Real browsers: zegrep 'GET /torrents/files/tails-amd64-.+\.torrent' ~/Persistent/logs/access.log-2018-11-*.gz | egrep 'Mozilla'
  • Counter: zegrep 'status=download-torrent' ~/Persistent/logs/access.log-2018-11-*.gz | egrep 'Mozilla'

(All “real” browsers have ‘Mozilla/5.0’ at the beginning of their user agent.)

| All hits      | 27486 | 100% |
| Real browsers | 11004 |  40% |
| Counter       |  4690 |  42% / 17% |
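The two pipelines above both keep the lines that match a request pattern and also contain "Mozilla". A rough JavaScript equivalent of that filtering is sketched below for illustration; it is not the Ruby script mentioned in this comment, and the sample log lines are made up (including the counter path) rather than taken from the real access logs.

```javascript
// Patterns copied from the zegrep/egrep commands above.
var TORRENT_REQUEST = /GET \/torrents\/files\/tails-amd64-.+\.torrent/;
var COUNTER_HIT = /status=download-torrent/;
var BROWSER_LIKE = /Mozilla/;

// Count the lines matching `pattern` whose text also contains "Mozilla".
function countMatching(lines, pattern) {
  return lines.filter(function (line) {
    return pattern.test(line) && BROWSER_LIKE.test(line);
  }).length;
}

// Made-up sample lines, shortened to the request and the user agent.
var sampleLines = [
  '"GET /torrents/files/tails-amd64-3.10.1.torrent HTTP/1.1" "Mozilla/5.0 (X11; Linux x86_64)"',
  '"GET /torrents/files/tails-amd64-3.10.1.torrent HTTP/1.1" "curl/7.61.0"',
  '"GET /install/download/counter?status=download-torrent HTTP/1.1" "Mozilla/5.0 (X11; Linux x86_64)"'
];

var browserTorrentHits = countMatching(sampleLines, TORRENT_REQUEST);
var counterTorrentHits = countMatching(sampleLines, COUNTER_HIT);
```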

I tried to compare the logs in details and couldn’t understand why the counter was hit only in some occasions. It could be people without JavaScript but that would be a lot of them!

I also compared the counter on downloads with the stats we have from mirrors:

In November 2018 we had 16164 hits on /tails/stable/tails-amd64-3.10.1/tails-amd64-3.10.1.iso on hivane.net. Multiplied by 34 mirrors.weight, that’s 549236 hits/month.

I only had logs for the first 10 days, with no important release event, so let’s say 33% of the month → 183078 hits.

I have 24349 hits on the counter for downloads. That’s 13% of the hits on mirrors. A fraction similar to the one I have for the Torrents (~15%). So at least that’s consistent…
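The last two steps of that estimate can be written out as follows. The sketch starts from the monthly mirror estimate as quoted above and truncates percentages to whole numbers; the variable names are only illustrative.

```javascript
// All figures are the ones quoted in this comment.
var monthlyMirrorEstimate = 549236; // estimated hits per month across mirrors
var daysOfLogs = 10;                // the logs cover the first 10 days
var daysInMonth = 30;
var counterDownloads = 24349;       // download hits recorded by the counter

// Scale the monthly estimate down to the period covered by the logs.
var mirrorEstimateForLogs = Math.floor(monthlyMirrorEstimate * daysOfLogs / daysInMonth);

// Share of the estimated mirror hits that also reached the counter, in percent.
var counterShare = Math.floor(100 * counterDownloads / mirrorEstimateForLogs);
```

Here `mirrorEstimateForLogs` evaluates to 183078 and `counterShare` to 13, matching the figures in the text.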

I also have a good diversity of user-agents hitting the counter so I don’t think that the mechanism is broken for some browsers and not for others.

I’m all ears if you have other ideas on how to investigate what might be causing this huge difference…

I’d also be interested in seeing how the script runs on the logs we have for December.

Ah, and I think you’re the one who said that starting from Tails 3.9, the user agent would reflect the OS but mine in 3.10.1 is “Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0” which doesn’t make it clear that Tails is Linux. How is that?

Here is what I get when running my script on the first 10 days of November:

By scenario:

|              |    Dl |  Torr |  Skip |  Succ |  Fail |   Dl2 | Fail2 |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Download     |  9274 |  1747 |  3478 |   829 |   101 |     8 |     0 |
|              |   38% |   37% |   24% |    8% |    1% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Debian       |  2472 |   322 |  1629 |   485 |    56 |     8 |     0 |
|              |   10% |    6% |   11% |   19% |    2% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Win          |  8612 |  2014 |  6873 |   883 |   174 |    17 |     0 |
|              |   35% |   42% |   47% |   10% |    2% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Linux        |   493 |    89 |   303 |    84 |    14 |     3 |     0 |
|              |    2% |    1% |    2% |   17% |    2% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Vm-download  |  1538 |   278 |   724 |    74 |     6 |     0 |     0 |
|              |    6% |    5% |    5% |    4% |    0% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Mac          |   467 |    70 |   470 |    62 |    21 |     6 |     0 |
|              |    1% |    1% |    3% |   13% |    4% |    1% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Dvd-download |   948 |   128 |   513 |   102 |     8 |     1 |     0 |
|              |    3% |    2% |    3% |   10% |    0% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Upgrade      |   545 |    43 |   397 |    89 |    25 |     4 |     0 |
|              |    2% |    0% |    2% |   16% |    4% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Total        | 24349 |  4691 | 14387 |  2608 |   405 |    47 |     0 |
|              |   83% |   16% |       |   10% |    1% |    0% |    0% |

By version:

|              |    Dl |  Torr |  Skip |  Succ |  Fail |   Dl2 | Fail2 |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| 3.10.1       | 24349 |  4691 | 14374 |  2608 |   405 |    47 |     0 |
|              |  100% |  100% |   99% |   10% |    1% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Total        | 24349 |  4691 | 14387 |  2608 |   405 |    47 |     0 |
|              |   83% |   16% |       |   10% |    1% |    0% |    0% |

By browser:

|              |    Dl |  Torr |  Skip |  Succ |  Fail |   Dl2 | Fail2 |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Opera        |   880 |   370 |   442 |     4 |     1 |     0 |     0 |
|              |    3% |    7% |    3% |    0% |    0% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Chrome       | 10392 |  3950 |  7253 |    83 |    19 |    10 |     0 |
|              |   42% |   84% |   50% |    0% |    0% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Firefox      | 12996 |   297 |  6596 |  2519 |   385 |    37 |     0 |
|              |   53% |    6% |   45% |   19% |    2% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Edge         |     9 |    36 |     3 |     2 |     0 |     0 |     0 |
|              |    0% |    0% |    0% |   22% |    0% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Safari       |    69 |    35 |    91 |     0 |     0 |     0 |     0 |
|              |    0% |    0% |    0% |    0% |    0% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| IE           |     0 |     0 |     2 |     0 |     0 |     0 |     0 |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Total        | 24346 |  4688 | 14387 |  2608 |   405 |    47 |     0 |
|              |   83% |   16% |       |   10% |    1% |    0% |    0% |

By OS:

|              |    Dl |  Torr |  Skip |  Succ |  Fail |   Dl2 | Fail2 |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Linux        |  4395 |   636 |  2259 |   791 |    89 |    17 |     0 |
|              |   18% |   13% |   15% |   17% |    2% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Macos        |  1225 |   246 |   819 |   115 |    15 |     6 |     0 |
|              |    5% |    5% |    5% |    9% |    1% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Windows      | 18232 |  3643 | 10686 |  1699 |   300 |    24 |     0 |
|              |   74% |   77% |   74% |    9% |    1% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Android      |   494 |   163 |   623 |     3 |     1 |     0 |     0 |
|              |    2% |    3% |    4% |    0% |    0% |    0% |    0% |
+--------------+-------+-------+-------+-------+-------+-------+-------+
| Total        | 24346 |  4688 | 14387 |  2608 |   405 |    47 |     0 |
|              |   83% |   16% |       |   10% |    1% |    0% |    0% |

#29 Updated by sajolida 2019-01-09 13:09:18

  • Private changed from No to Yes

#30 Updated by intrigeri 2019-01-09 17:11:38

  • Assignee changed from intrigeri to sajolida

Two questions:

  • When do you need answers from me on these questions? It would be really nice if this could wait until after 3.12.
  • Did you really intend to make this entire ticket private, or did you only mean to make your last comment private?

#32 Updated by sajolida 2019-01-10 09:25:32

  • Private changed from Yes to No

#33 Updated by sajolida 2019-01-10 09:33:19

> * When do you need answers from me on these questions? It would be really nice if this could wait until after 3.12.

The success metrics for USB images are due on May 31. So working on this
again after 3.12 definitely works.

I wanted to work on the script some months before 3.12 to make sure that
we had a baseline to compare with once USB images are released.

In general, it would be great to better understand the difference
between the hits on the server and the hits on the counter to improve the
quality of the absolute numbers. But studying the variation of the
counter would be enough as success metrics for this project I think.

Furthermore, it would be too late now anyway to change anything in the
way the counter is triggered; otherwise our baseline wouldn’t be comparable.

#34 Updated by intrigeri 2019-01-10 15:18:27

  • Assignee changed from sajolida to intrigeri
  • Target version changed from Tails_3.12 to Tails_3.13

Postponing then (if I got it wrong, please correct).

#35 Updated by sajolida 2019-01-12 17:49:32

Ok!

#36 Updated by intrigeri 2019-01-27 10:57:24

#37 Updated by intrigeri 2019-02-06 17:54:51

  • Assignee changed from intrigeri to sajolida

> Only 42% of the hits of real browsers on the Torrent file are counted by the counter.

> I used:

> * Real browsers: zegrep 'GET /torrents/files/tails-amd64-.+\.torrent' ~/Persistent/logs/access.log-2018-11-*.gz | egrep 'Mozilla'
> * Counter: zegrep 'status=download-torrent' ~/Persistent/logs/access.log-2018-11-*.gz | egrep 'Mozilla'

> (All “real” browsers have ‘Mozilla/5.0’ at the beginning of their user agent.)

> […]

> I’m all ears if you have other ideas on how to investigate what might be causing this huge difference…

Lots of bots also have this prefix so you’re counting them in “real browsers”. I bet they’ll never hit the counter. E.g. on the other websites we host, 40% of the hits that have this prefix are bots. So I think we need more precise heuristics to detect what’s a real browser. I bet there are Ruby libraries to do that, e.g. ruby-voight-kampff or ruby-device-detector :)
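To show why the "Mozilla" prefix alone is not enough, here is a deliberately naive keyword heuristic. It is only an illustration written in JavaScript: the keyword list is an assumption, it will miss bots that do not announce themselves, and a maintained library such as the ones named above is the better choice for real statistics.

```javascript
// Naive, hypothetical heuristic: flag user agents that announce themselves
// as crawlers. Real-world detection needs a maintained database.
var BOT_KEYWORDS = /bot|crawl|spider|slurp/i;

function looksLikeBot(userAgent) {
  return BOT_KEYWORDS.test(userAgent);
}

// Both strings start with "Mozilla/5.0", but only the second is a browser.
var crawlerAgent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
var browserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0";
```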

> I’d also be interested in seeing how the script runs on the logs we have for December.

I suspect it’s not worth the effort as long as the script does not correctly sort out bots, is it?

Also, if we’ll have to run this script regularly, we should add it to puppet-tails.git and deploy it on the system where the logs live.

> Ah, and I think you’re the one who said that starting from Tails 3.9, the user agent would reflect the OS but mine in 3.10.1 is “Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0” which doesn’t make it clear that Tails is Linux.

Right, I wrote this on Bug #16010 (IIRC this came from https://trac.torproject.org/projects/tor/ticket/26146). I confirm with Tor Browser 8.0.5 on sid (not confined with AppArmor) the behavior you’ve observed.

> How is that?

The “Tor Browser 8.0.1 — September 24 2018” changelog entry reads “Bug 26146: Spoof HTTP User-Agent header for desktop platforms” so I guess they fixed the bug I thought they would leave alone. Too bad :/

#38 Updated by sajolida 2019-02-12 19:56:03

> Lots of bots also have this prefix so you’re counting them in “real browsers”. I bet they’ll never hit the counter. E.g. on the other websites we host, 40% of the hits that have this prefix are bots. So I think we need more precise heuristics to detect what’s a real browser.

Wow, I had no clue! That would explain such a big difference…

> I bet there are Ruby libraries to do that, e.g. ruby-voight-kampff or ruby-device-detector :)

And ruby-device-detector is in Debian so I’ll try to use that.

>> I’d also be interested in seeing how the script runs on the logs we have for December.
>
> I suspect it’s not worth the effort as long as the script does not correctly sort out bots, is it?

If the difference I observe is accounted for by bots, then my script might
be running fine actually and only counting real browsers (with
JavaScript enabled). So it would actually make sense to run it…

The only thing is that we wouldn’t be able to compare it with some of
the previous stats that we had. For example, the downloads of the
OpenPGP signatures, since they count every hit, including bots.

But actually since then I wrote another script that I think is much
better. So hold on :)

> Also, if we’ll have to run this script regularly, we should add it to puppet-tails.git and deploy it on the system where the logs live.

When I need some metrics to report on some deliverables, I’ll ask you
for the logs and compute them on my machine.

But indeed, if at some point we want to compute some of these metrics on
a regular basis, we should deploy it on our infra.

My new script is super generic and could allow us to compute different
metrics using the same code. This would be much more sustainable. So
again, hold on :)

> The “Tor Browser 8.0.1 — September 24 2018” changelog entry reads “Bug 26146: Spoof HTTP User-Agent header for desktop platforms” so I guess they fixed the bug I thought they would leave alone. Too bad :/

Thanks for investigating this!

#39 Updated by sajolida 2019-02-18 11:50:51

Now that I integrated device-detector in logparser, I should check how robots are handled by it.

#40 Updated by sajolida 2019-02-25 17:12:29

  • Status changed from In Progress to Resolved
  • Assignee deleted (sajolida)
  • QA Check deleted (Info Needed)

Done in internal.git:32e98f2.