Bug #16010

Have metrics on the hits on https://tails.boum.org/install/win/usb/#back

Added by Anonymous 2018-09-28 13:07:41 . Updated 2018-12-30 10:32:57 .

Status:
Resolved
Priority:
High
Assignee:
Category:
Target version:
Start date:
2018-09-28
Due date:
% Done:

80%

Feature Branch:
web/16010-metrics-on-back
Type of work:
Website
Blueprint:

Starter:
Affected tool:
Deliverable for:
316

Description

…as a rough way of knowing how many people successfully installed Tails (possibly first time users) and tried to reboot it

This implies replacing ‘#’ by ‘?’. Otherwise we won’t see this in the server logs.

Hint: starting with Tor Browser 8.0 — included in Tails 3.9 — the User-Agent correctly reflects the platform aka. OS being used.


Files


Subtasks


History

#1 Updated by sajolida 2018-10-25 05:57:44

I need to do that before we release the USB images in order to see the improvement.

#2 Updated by sajolida 2018-11-06 01:16:34

  • Description updated

#3 Updated by sajolida 2018-11-17 17:47:25

  • Assignee deleted (sajolida)
  • QA Check set to Info Needed
  • Feature Branch set to web/16010-metrics-on-back

I implemented the replacement of ‘#’ by ‘?’ only to realize that, with ‘?’, people won’t be redirected to the ‘#back’ anchor deep in the steps (which is the main purpose of this redirection). D’oh!

So instead I’m now pointing people to, for example, /win/usb/back and using .htaccess to redirect them to /win/usb/#back.

We should see the hit in the web server logs.

My work is split into 2 parts:

  • web/16010-metrics-on-back in tails.git and based on master, so it can be merged before the release of the USB images project in order to compute the baseline.
  • 0001-Allow-computing-analytics-on-people-switching-device.patch a patch to be applied by our sysadmins on puppet-tails.git.

The next steps should be:

  1. Someone reviews both parts.
  2. Our sysadmins apply 0001-Allow-computing-analytics-on-people-switching-device.patch.
  3. I test that the redirection works well and that it reflects as a hit in the logs.
  4. I merge web/16010-metrics-on-back into master.
  5. I merge web/16010-metrics-on-back into doc/16006-usb-images (this will produce conflicts).
  6. I update the patch that I submitted on Feature #16130 (this will produce conflicts as well).

Ulrike, as our project manager, could you help me identify someone to do the review in step 1?

This work was not planned initially but is needed to compute the metrics to measure the impact of the project and that we sold to the funder. I have some budget for computing the metrics but we don’t have anybody for doing this kind of review. This should be merged at least some weeks before the release of the USB images project.

Skills needed are:

  • Regular expressions (I had to patch qrcode-encode).
  • Nginx rewrite rules (Apache would do).
  • Ikiwiki basics (no need to understand the installation assistant!).

I don’t think that a careful review will take more than 30 min max.

#5 Updated by intrigeri 2018-11-17 19:41:55

> * Regular expressions (I had to patch qrcode-encode).
> * Nginx rewrite rules (Apache would do).

I think either groente or I would do for these (and either way, I’d like the other one to take a quick look as well: we often get this sort of things wrong so more eyes are useful).

#6 Updated by sajolida 2018-11-29 16:55:21

I’m also super fine with deciding to drop these success metrics and sending a bit of the work I already did to the trash (I would still cherry-pick the improvements I did to the QR-Code scripts). It would still be less work yet to be done.

But I don’t think I can take this decision alone either.

#7 Updated by sajolida 2018-12-01 18:11:37

… or merge it without review.

#8 Updated by intrigeri 2018-12-01 20:45:51

  • Assignee set to intrigeri
  • QA Check changed from Info Needed to Ready for QA

Happy to do the review.

#9 Updated by intrigeri 2018-12-02 08:05:17

  • Assignee changed from intrigeri to sajolida
  • QA Check changed from Ready for QA to Dev Needed

> I implemented the replacement of ‘#’ by ‘?’ only to realize that, with ‘?’, people won’t be redirected to the ‘#back’ anchor deep in the steps (which is the main purpose of this redirection). D’oh!

Indeed.

> So instead I’m now pointing people to, for example, /win/usb/back and using .htaccess to redirect them to /win/usb/#back.

Ouch, too bad we did not discuss our options before you implemented this one, because I have some concerns about it:

  • It makes the coupling between the website content and the webserver config tighter, which increases ongoing work and stress.
  • I’m a little bit worried about the discrepancy this causes between the ikiwiki srcdir & dstdir namespaces and the HTTP query string namespace: this technique permanently squats /install/clone/back in the HTTP query string namespace, which will prevent us from ever using this in the ikiwiki srcdir namespace. This concern goes away if the added redirections are temporary (i.e. not cached by browsers).
  • This technique makes it more complicated to reason about our nginx config, in particular wrt. our trick to add the trailing slash.

None of these are blockers IMO but I find it a bit sad to go this way given I think other options would probably work just fine, without introducing these problems, e.g.:

  • Pass both ?back (to get a hit in our logs) and #back (so that users land in the right place) → if we don’t mind the resulting destination URL to be a bit ugly, no rewrite rules needed at all, so the amount of coupling between website content & webserver config remains unchanged.
  • Only pass ?back, as planned initially, and add one single generic rewrite rule (using $arg_back, see https://tangollama.wordpress.com/2013/02/19/referencing-query-parameters-in-nginx-rewrite-rules/) to redirect such queries to the correct anchor. I think that’s the best option: only 1 generic rewrite rule that likely won’t need to be updated every time you change the IA in depth, and won’t need coordination between website content changes and corresponding webserver config adjustments. If you buy it, then I would be happy to implement & test the nginx part of it. I think your part would boil down to “revert most things and replace #back with ?back”.

> # Someone reviews both parts.

Note: Some of my code review bits below are moot if we switch to the other implementation I’m proposing.

Regarding the tails.git branch:

  • I’ve pushed a few fixes to the regexp. I’ve explained in the commit message what I understood from the intent of the regexp and why the previous one did not correctly achieve that intent. If I got the intent wrong, let me know.
  • git grep '\?back' still finds some occurrences. Is that on purpose or is it a mere oversight?

Regarding the puppet-tails.git patch:

  • ^/install/clone/back?$ will match any string whose full content is /install/clone/bac followed by any single char. I guess that’s not what you mean and the ? quantifier is not needed here.

#10 Updated by sajolida 2018-12-04 10:37:06

> Ouch, too bad we did not discuss our options before you implemented this one

No problem, it wasn’t a lot of work anyway.

I’m not enough of a sysadmin to understand your concerns but I hear them :)

> * Pass both ?back (to get a hit in our logs) and #back (so that users land in the right place) → if we don’t mind the resulting destination URL to be a bit ugly, no rewrite rules needed at all, so the amount of coupling between website content & webserver config remains unchanged.

I don’t want to ask users to type 5 extra characters and 2 weird bits of
URL when they want to open the link on the other device. I would rather
not have the metrics.

> * Only pass ?back, as planned initially, and add one single generic rewrite rule (using $arg_back, see https://tangollama.wordpress.com/2013/02/19/referencing-query-parameters-in-nginx-rewrite-rules/) to redirect such queries to the correct anchor. I think that’s the best option: only 1 generic rewrite rule that likely won’t need to be updated every time you change the IA in depth, and won’t need coordination between website content changes and corresponding webserver config adjustments. If you buy it, then I would be happy to implement & test the nginx part of it. I think your part would boil down to “revert most things and replace #back with ?back”.

Sounds good. Please try it out if it’s not too much work. Otherwise I’m
still fine dropping the whole metrics.

> * I’ve pushed a few fixes to the regexp. I’ve explained in the commit message what I understood from the intent of the regexp and why the previous one did not correctly achieve that intent. If I got the intent wrong, let me know.

Excellent!

> * git grep '\?back' still finds some occurrences. Is that on purpose or is it a mere oversight?

Fixed in 3f1fbef17c.

> Regarding the puppet-tails.git patch:
>
> * ^/install/clone/back?$ will match any string whose full content is /install/clone/bac followed by any single char. I guess that’s not what you mean and the ? quantifier is not needed here.
Indeed :( But then I’m not sure to understand the functioning of the ‘?’
that with have in all our other rewrite rules. See 368fb2b97a and
31414866fd. If it’s not obvious to you either, I’ll investigate this
further outside of the scope of this ticket.

I understand that these lines will go away if you implement the
$arg_back approach, right?

#11 Updated by sajolida 2018-12-04 10:37:28

  • Assignee changed from sajolida to intrigeri
  • QA Check changed from Dev Needed to Info Needed

#12 Updated by intrigeri 2018-12-05 10:30:29

Hi!

sajolida wrote:
> intrigeri wrote:
>> Ouch, too bad we did not discuss our options before you implemented this one

> No problem, it wasn’t a lot of work anyway.

OK, good to know, this helps me relax a bit :)

>> * Pass both ?back (to get a hit in our logs) and #back (so that users land in the right place) → if we don’t mind the resulting destination URL to be a bit ugly, no rewrite rules needed at all, so the amount of coupling between website content & webserver config remains unchanged.

> I don’t want to ask users to type 5 extra characters and 2 weird bits of URL when they want to open the link on the other device.

Ah, right, I forgot some users will need to type these URLs by hand. Glad you’re on top of these things!

>> * Only pass ?back, as planned initially, and add one single generic rewrite rule (using $arg_back, see https://tangollama.wordpress.com/2013/02/19/referencing-query-parameters-in-nginx-rewrite-rules/) to redirect such queries to the correct anchor. I think that’s the best option: only 1 generic rewrite rule that likely won’t need to be updated every time you change the IA in depth, and won’t need coordination between website content changes and corresponding webserver config adjustments. If you buy it, then I would be happy to implement & test the nginx part of it. I think your part would boil down to “revert most things and replace #back with ?back”.

> Sounds good. Please try it out if it’s not too much work. Otherwise I’m still fine dropping the whole metrics.

OK, I’ll give it a try. TBH I’m quite excited to learn more about this aspect of nginx ;) But it’s also good to know there’s no big pressure.

>> Regarding the puppet-tails.git patch:
>>
>> * ^/install/clone/back?$ will match any string whose full content is /install/clone/bac followed by any single char.

Oops, scratch that, it won’t match that at all! Sorry, I guess this did not help you understand what I meant :/

> Indeed :( But then I’m not sure to understand the functioning of the ‘?’ that with have in all our other rewrite rules. See 368fb2b97a and 31414866fd. If it’s not obvious to you either, I’ll investigate this further outside of the scope of this ticket.

OK, you get a small part of my “regexp 101” crash course for free!

“?” is a quantifier: it makes the preceding pattern element match zero or one time. The “preceding pattern element” can be a single char or any kind of grouping (e.g. using parenthesis or a bracket expression). So:

  • In our other rewrite rules, in /?$, “?” applies to the literal character “/”. We write this in order to match the full URL regardless of whether it has a trailing slash or not.
  • In ^/install/clone/back?$, “?” applies to the preceding pattern element, which is the literal character “k”. So that regexp will match exactly two URLs: /install/clone/back (correct) and /install/clone/bac (not what you mean).

Clearer?

> I understand that these lines will go away if you implement the $arg_back approach, right?

Yes. Regardless, I think it’s useful that you understand basics the regexps such as “?”, because you’re often writing rewrite rules, which is why I’m spending a bit of time to explain it here :)

#13 Updated by intrigeri 2018-12-05 10:31:06

  • QA Check deleted (Info Needed)

#14 Updated by intrigeri 2018-12-05 11:16:03

  • Assignee changed from intrigeri to sajolida

>>> * Only pass ?back, as planned initially, and add one single generic rewrite rule (using $arg_back, see https://tangollama.wordpress.com/2013/02/19/referencing-query-parameters-in-nginx-rewrite-rules/) to redirect such queries to the correct anchor. I think that’s the best option: only 1 generic rewrite rule that likely won’t need to be updated every time you change the IA in depth, and won’t need coordination between website content changes and corresponding webserver config adjustments. If you buy it, then I would be happy to implement & test the nginx part of it. I think your part would boil down to “revert most things and replace #back with ?back”.

>> Sounds good. Please try it out if it’s not too much work. Otherwise I’m still fine dropping the whole metrics.

> OK, I’ll give it a try. TBH I’m quite excited to learn more about this aspect of nginx ;) But it’s also good to know there’s no big pressure.

Deployed with one (minor?) caveat: the most straightforward way to do this in a robust manner with nginx is to give the query parameter a value, so the URLs you’ll write shall use ?back=1 instead of the promised ?back. If that’s a blocker I can adjust the config I’ve prepared to remove this limitation and support ?back; the resulting config will be a tad more complex but I can live with that; my main concern is that it will make it harder to add support for additional query parameters if we need any later on, so I’d rather stick to back=1 that is way more extensible.

You can try for example https://tails.boum.org/install/debian/usb?back=1 which seems to do the job :)

#15 Updated by sajolida 2018-12-05 12:12:18

> OK, you get a small part of my “regexp 101” crash course for free!

Thanks :)

> “?” is a quantifier: it makes the preceding pattern element match zero or one time. The “preceding pattern element” can be a single char or any kind of grouping (e.g. using parenthesis or a bracket expression). So:
>
> * In our other rewrite rules, in /?$, “?” applies to the literal character “/”. We write this in order to match the full URL regardless of whether it has a trailing slash or not.
> * In ^/install/clone/back?$, “?” applies to the preceding pattern element, which is the literal character “k”. So that regexp will match exactly two URLs: /install/clone/back (correct) and /install/clone/bac (not what you mean).
>
> Clearer?

Hmm… It’s weird because I kind of knew this already. I wonder what
happened to me yesterday as it seemed it was clear to me when I wrote
31414866fd. I think it got momentarily confused by what the “preceding
pattern element” applied too in the middle of natural language strings.

Hopefully you explanation will make it stick a bit better to my brain.

#16 Updated by sajolida 2018-12-13 15:01:58

  • Target version changed from Tails_3.11 to Tails_3.12

#17 Updated by sajolida 2018-12-18 13:12:25

  • Assignee changed from sajolida to intrigeri
  • QA Check set to Ready for QA
  • Type of work changed from Research to Website

Ok, let’s go for “?back=1”. I implemented this in 740af44c90.

Please have a look and merge into master if I got it right.

#18 Updated by intrigeri 2018-12-29 10:35:59

  • Assignee changed from intrigeri to sajolida
  • QA Check changed from Ready for QA to Dev Needed

> Ok, let’s go for “?back=1”. I implemented this in 740af44c90.

> Please have a look and merge into master if I got it right.

Sorry for the delay! This looks good, you go it right.

From my PoV, the only blocker for merging is that some English strings are modified without the corresponding translated string being marked as fuzzy (or mechanically updated). For example, see the one that introduces https://tails.boum.org/install/clone?back=1 in wiki/src/install/inc/steps/restart_first_time.inline.de.po. If I merged this as-is, in non-English the IA will be broken because the user will click links such as https://tails.boum.org/install/clone/back/index.de.html.

#19 Updated by sajolida 2018-12-30 00:05:46

  • Assignee changed from sajolida to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

Good catch! What about 9a2d190445 now?

#20 Updated by intrigeri 2018-12-30 10:10:13

  • Status changed from Confirmed to Resolved
  • % Done changed from 0 to 100

Applied in changeset commit:tails|83e33af3423bbaa6126a628bbb055e79afc617ac.

#21 Updated by intrigeri 2018-12-30 10:17:46

  • Status changed from Resolved to In Progress
  • Assignee changed from intrigeri to sajolida
  • Priority changed from Normal to High
  • % Done changed from 100 to 80
  • QA Check deleted (Ready for QA)

Ouch, I’ve merged this without building first and I realized only after pushing that this breaks stuff because the Markdown references non-existing images: https://tails.boum.org/install/clone/.

I could have fixed these [[!img]] directives easily but that might hide the problem: one should also check whether the qrcode in the images we do have points to the right place. So I’ll let you do that. Given this breaks the IA, bumping priority & feel free to push your fix straight to master, I’ll review it later.

#22 Updated by sajolida 2018-12-30 10:32:57

  • Status changed from In Progress to Resolved
  • Assignee deleted (sajolida)

I fixed this in 6a10f6e5e6, a frequent mistake of mine…

Sorry for all the mess on this branch :(