Bug #17307

nginx "504 Gateway Time-out" while refreshing the website at Tails release time

Added by intrigeri 2019-12-03 17:09:50. Updated 2020-04-15 07:59:11.

Status:
Resolved
Priority:
Elevated
Assignee:
intrigeri
Category:
Infrastructure
Target version:
Start date:
Due date:
% Done:

0%

Feature Branch:
Type of work:
Sysadmin
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

This happened to kibi today while pushing the web/release-4.1 branch before publishing the release. Note that in theory, there’s no reason why this push should trigger a website refresh.

This can also happen when pushing the master branch. In that case ikiwiki is left in a broken state that can only be recovered from by rebuilding the website, which requires sysadmin privileges. Can we increase this timeout?


Subtasks


Related issues

Related to Tails - Bug #17361: Streamline our release process Confirmed
Related to Tails - Bug #17363: Ensure only pushes to the master branch trigger a website refresh Resolved

History

#1 Updated by intrigeri 2019-12-15 11:24:52

  • Description updated

#2 Updated by intrigeri 2019-12-17 12:00:57

  • related to Bug #17361: Streamline our release process added

#3 Updated by intrigeri 2019-12-18 12:00:46

  • related to Bug #17363: Ensure only pushes to the master branch trigger a website refresh added

#4 Updated by intrigeri 2019-12-18 13:07:48

  • Status changed from Confirmed to Needs Validation
  • Assignee changed from intrigeri to zen

Done in https://git.tails.boum.org/puppet-tails/commit/?id=61a5763e8f1843b0ad4f9a18df5fb70bfa1bcbd8 and while I was at it, I also did https://git.tails.boum.org/puppet-tails/commit/?id=c63ec06ee4d17b75c8d0370a487cfc812cf755a3. I took the liberty of pushing this straight to production.

Please review :)

#5 Updated by CyrilBrulebois 2020-01-07 18:00:43

  • Target version changed from Tails_4.2 to Tails_4.3

#6 Updated by anonym 2020-02-11 15:26:28

  • Target version changed from Tails_4.3 to Tails_4.4

#7 Updated by CyrilBrulebois 2020-03-12 09:56:02

  • Target version changed from Tails_4.4 to Tails_4.5

#8 Updated by zen 2020-03-16 21:27:36

  • Assignee changed from zen to intrigeri

I’ve never done this workflow myself, but from the context I’m assuming that what happens is:

* Person pushes to Git repo.
* Git hook uses curl to make HTTP request to ikiwiki.cgi triggering rebuild.
* Ikiwiki is busy for some reason, user waits patiently.
* HTTP request times out after 5 minutes.
* Git hook gets a 504 from curl and shows it to the user.

Can you please confirm that this understanding is correct?

If that is correct, I agree that increasing the timeout would prevent an inconsistent state of the website, but I don’t understand why turning off buffering would improve UX, as curl will still wait for the content of the request before returning anything to the user. Is it the case that the progress information will be shown to the user? Maybe we want to use -I as an option for curl?

It is also still not clear why a push to a non-master branch would trigger the rebuild, as the Git hook explicitly avoids that.
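
To make the expected behavior concrete, here is a minimal sketch of the kind of check such a hook performs. It is written in Python purely for illustration and is not the actual hook; the names `WEBSITE_REF` and `should_ping` are hypothetical. It relies only on the fact that Git passes the names of the updated refs as arguments to a post-update hook.

```python
# Assumption: only pushes to this ref should trigger a website refresh.
WEBSITE_REF = "refs/heads/master"


def should_ping(updated_refs):
    """Return True if the website branch is among the refs that were updated.

    `updated_refs` is the list of ref names Git hands to the post-update
    hook as command-line arguments (e.g. sys.argv[1:]).
    """
    return WEBSITE_REF in updated_refs


# A push of the release branch alone should not request a refresh,
# while a push that includes master should.
print(should_ping(["refs/heads/web/release-4.1"]))
print(should_ping(["refs/heads/web/release-4.1", "refs/heads/master"]))
```

With a check like this in place, a push of web/release-4.1 alone would never reach the curl call, which is why the reported behavior is surprising.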

Maybe I need to better understand the details of how the problem expresses itself to be able to review the proposed solutions.

#9 Updated by intrigeri 2020-03-19 08:48:41

  • Status changed from Needs Validation to Resolved

Hi!

> I’ve never done this workflow myself, but from the context I’m assuming that what happens is:
>
> * Person pushes to Git repo.
> * Git hook uses curl to make HTTP request to ikiwiki.cgi triggering rebuild.
> * Ikiwiki is busy for some reason, user waits patiently.
> * HTTP request times out after 5 minutes.
> * Git hook gets a 504 from curl and shows it to the user.
>
> Can you please confirm that this understanding is correct?

Yep, I think that’s it, from the PoV of the person who does the Git push.

And on top of that, I think that when the request times out, ikiwiki may get killed, which leaves the website in a broken state that can be repaired only by our sysadmins.

> If that is correct, I agree that increasing the timeout would prevent an inconsistent state of the website,

OK, I’m glad this part is validated :)
It was actually the most important aspect of this ticket ⇒ closing as resolved.
The UX part was a bonus, “while I’m at it” attempt.

> but I don’t understand why turning off buffering would improve UX, as curl will still wait for the content of the request before returning anything to the user. Is it the case that the progress information will be shown to the user?

I’ve just tested it and it seems that you’re right: empirically, it seems that the output is stuck on “Requesting update of ”$“:https://tails.boum.org/…” until ikiwiki has finished refreshing the website, at which point I see the output. It’s not 100% clear to me at this point if that’s caused by:

  1. the way we use curl
  2. how we pipe the output to perl
  3. how ikiwiki.cgi behaves
  4. how nginx behaves

In order to dismiss one of the curl-related potential culprits, I’ve passed it the --no-buffer option. This did not change anything (empirically).
I’ve also verified that when piping something through perl -p, perl processes lines one after the other and outputs the result incrementally, so that’s not it either.
That’s not much progress but it narrows the scope of the investigation a little :)

> Maybe we want to use -I as an option for curl?

I was not sure:

  • Why -I aka. --head would change anything: the reply header includes the success/error HTTP code, which presumably is unknown until the ikiwiki operation completes, so the same problems (lack of progress output to the user) should occur.
  • Whether ikiwiki.cgi?do=ping would do anything when it receives a HEAD HTTP command (as opposed to a GET).

Also, I would find it sad to hide the non-header part of the HTTP response, which is sometimes useful.

But anyway, I tested it. The good news is that ikiwiki.cgi does its job; the bad news is that the output is still displayed in one batch at the end, so this does not help wrt. UX ⇒ reverted.

If you have other cheap ideas to try & improve the UX aspect, let’s try them. But IMO this is not important enough to warrant tracking this as an issue on Redmine.

> It is also still not clear why a push to a non-master branch would trigger the rebuild, as the Git hook explicitly avoids that.

I think you mean files/gitolite/hooks/www_website_ping-post-update.hook and I agree. Either something else that I don’t understand yet is going on, or the “while pushing the web/release-4.1 branch” part of the bug report was incorrect. Without more info, I think we should close this issue and ask kibi to report back next time this happens.

#10 Updated by intrigeri 2020-04-15 07:59:11

Hi again,

>> but I don’t understand why turning off buffering would improve UX, as curl will still wait for the content of the request before returning anything to the user. Is it the case that the progress information will be shown to the user?
>
> I’ve just tested it and it seems that you’re right: empirically, it seems that the output is stuck on “Requesting update of ”$“:https://tails.boum.org/…” until ikiwiki has finished refreshing the website, at which point I see the output.

I’m coming back to this buffering topic today. My goal is to ensure this issue records correct information, in case we have to come back to this later. I don’t think any action is warranted.

So, my test results quoted above only hold when the output is small enough. For a larger output, such as the case this issue is about, I see totally different results.

I’m currently pushing the branch for Bug #17005, which rebuilds essentially all our website, and I see the output arriving on my terminal in batches, a few dozen lines at a time. This looks like buffering to me.

The buffering seems to be character-based, not line-based, because right now I see the following (the last line is incomplete: ikiwiki had already output it, but apparently it had not been transmitted to my Git client yet):

remote: | building doc/first_steps/persistence/warnings.de.po
remote: | building doc/first_steps/persistence/warnings.fa.po
remote: | bu

I’ve measured how large 2 of these batches were. If I remove the “remote: ” prefix (which I believe is added by my Git client here), each of these batches was 8192 bytes large. That’s exactly 8 KiB, which is suspicious.

I suppose something in the pipeline, between ikiwiki and my Git client, sends the output in 8 KiB batches.
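
To illustrate why fixed-size batching produces a truncated last line, here is a minimal sketch in Python. It is only a model of the suspected behavior, not a reproduction of our setup: the page names are made up, `rechunk` and `payload_size` are hypothetical helpers, and the 8192-byte batch size is simply taken from the measurement above.

```python
CHUNK_SIZE = 8192           # assumed batch size, taken from the measurement above
REMOTE_PREFIX = "remote: "  # prefix believed to be added by the Git client


def rechunk(data, size=CHUNK_SIZE):
    """Split a byte stream into fixed-size batches, ignoring line boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def payload_size(terminal_text):
    """Count the bytes of a batch as seen on the terminal, minus the prefix."""
    total = 0
    for line in terminal_text.splitlines(keepends=True):
        if line.startswith(REMOTE_PREFIX):
            line = line[len(REMOTE_PREFIX):]
        total += len(line.encode("utf-8"))
    return total


# Made-up ikiwiki-like output: 1000 lines of 31 bytes each.
output = "".join(
    f"| building doc/page_{n:04d}.de.po\n" for n in range(1000)
).encode("utf-8")

batches = rechunk(output)

# What a Git client would display for the first batch: every line,
# including the incomplete last one, gets the "remote: " prefix.
first_on_terminal = "".join(
    REMOTE_PREFIX + line
    for line in batches[0].decode("utf-8").splitlines(keepends=True)
)

print(len(batches[0]))                  # size of the first batch in bytes
print(batches[0].endswith(b"\n"))       # whether the batch ends on a line boundary
print(payload_size(first_on_terminal))  # bytes once the prefix is removed
```

Since 8192 is not a multiple of the line length here, the first batch ends in the middle of a line, which matches the “remote: | bu” symptom shown above.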

In https://nginx.org/en/docs/http/ngx_http_fastcgi_module.html#fastcgi_buffering I see:

> When buffering is disabled, the response is passed to a client synchronously, immediately as it is received. nginx will not try to read the whole response from the FastCGI server. The maximum size of the data that nginx can receive from the server at a time is set by the fastcgi_buffer_size directive.

So https://nginx.org/en/docs/http/ngx_http_fastcgi_module.html#fastcgi_buffer_size matters when fastcgi_buffering is disabled. The default value is 4K or 8K, depending on the platform. I guess it’s 8K on 64-bit platforms, which would explain the behavior I’m seeing.
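
The nginx documentation for fastcgi_buffer_size says that the default is equal to one memory page, so one cheap way to check the guess above would be to look at the page size on the web server. A minimal sketch, assuming Python is available there; note that this only reports the page size of the host it runs on and does not by itself tell which component in the pipeline does the batching:

```python
import mmap

# Memory page size of the host this runs on, in bytes.
page_size = mmap.PAGESIZE

print(f"memory page size: {page_size} bytes ({page_size // 1024} KiB)")
```

If this reports 4 KiB rather than 8 KiB, then the default fastcgi_buffer_size alone would not explain the 8192-byte batches, and another buffer in the pipeline would have to be involved.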