Feature #14404

Remove Vagrant cache-mechanism from build configuration as it might break on some filesystems

Added by muri 2017-08-21 15:10:55 . Updated 2018-09-05 16:21:26 .

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Build system
Target version:
Start date:
2017-08-21
Due date:
% Done:

100%

Feature Branch:
bugfix/14404-remove-vagrant-disk-cache-setting
Type of work:
Code
Blueprint:

Starter:
Affected tool:
Deliverable for:

Description

on some filesystems (i.e. zfs) its only possible to build tails with the libvirt cache attribute set to ‘writethrough’ instead of ‘none’ in the Vagrantfile. i guess thats a corner case, but maybe that should be configurable via env.


Subtasks


Related issues

Blocks Tails - Feature #15334: Core work 2018Q3: Foundations Team Resolved 2018-02-20

History

#1 Updated by intrigeri 2017-08-25 16:47:50

  • Assignee set to muri
  • QA Check set to Info Needed

Is this solely about:

domain.storage(
:file, size: '15G', allow_existing: true, cache: 'none',
path: 'apt-cacher-ng-data.qcow2'
)

… or anything else?

#2 Updated by muri 2017-08-26 06:58:46

  • Assignee changed from muri to intrigeri
  • QA Check deleted (Info Needed)

intrigeri wrote:
> Is this solely about:
>
> […]
>
> … or anything else?

no, its only about that!

#3 Updated by intrigeri 2017-08-26 07:35:24

  • Assignee changed from intrigeri to muri
  • QA Check set to Info Needed

> no, its only about that!

Thanks.

Ideally libvirt/QEMU should figure out what is best for the current environment, and we would not have to make this configurable.

So, does it work for you to entirely remove the cache: 'none' thing? (instead of setting cache: 'writethrough')

#4 Updated by muri 2017-08-26 08:09:11

  • Assignee changed from muri to intrigeri
  • QA Check deleted (Info Needed)

intrigeri wrote:
[…]
>
> So, does it work for you to entirely remove the cache: 'none' thing? (instead of setting cache: 'writethrough')

oh, i didn’t even think of that! yes it does!

#5 Updated by intrigeri 2017-08-26 15:33:23

  • Assignee changed from intrigeri to anonym

Thanks! anonym, can you retitle this ticket accordingly and handle it as part of your Foundations Team work, e.g. in 2017Q4?

#6 Updated by Anonymous 2018-01-15 16:05:37

This looks like a trivial fix which should be handled by either intrigeri or anonym.

#7 Updated by Anonymous 2018-01-15 16:06:23

  • Subject changed from Let vagrant cache-mechanism be configurable to Remove vagrant cache-mechanism from build configuration as it might break on some filesystems

#8 Updated by intrigeri 2018-02-13 13:12:53

#9 Updated by intrigeri 2018-02-13 13:13:07

  • Subject changed from Remove vagrant cache-mechanism from build configuration as it might break on some filesystems to Remove Vagrant cache-mechanism from build configuration as it might break on some filesystems
  • Target version set to Tails_3.6

#10 Updated by anonym 2018-02-19 13:55:06

  • Target version changed from Tails_3.6 to Tails_3.7

#11 Updated by intrigeri 2018-02-20 08:43:59

  • blocked by deleted (Feature #13245: Core work 2018Q1: Foundations Team)

#12 Updated by intrigeri 2018-02-20 08:44:03

#13 Updated by intrigeri 2018-04-08 15:00:19

  • Assignee deleted (anonym)

I’m adding this to the list of candidates for new FT members because that’s a good way to try your ISO build environment once you’ve set it up, and to better understand how our Vagrant build system works.

I would:

  1. delete the problematic 2 words
  2. test that it still works on my system
  3. ask a couple other devs to confirm it does not break stuff for them
  4. ask anonym or intrigeri to test if it still works on Jenkins
  5. ask muri to confirm it fixes his problem

#14 Updated by intrigeri 2018-04-13 14:29:22

  • Assignee set to CyrilBrulebois
  • Estimated time set to 2 h

#15 Updated by bertagaz 2018-05-10 11:09:21

  • Target version changed from Tails_3.7 to Tails_3.8

#16 Updated by intrigeri 2018-06-26 16:27:57

  • Target version changed from Tails_3.8 to Tails_3.9

#17 Updated by CyrilBrulebois 2018-06-28 13:24:30

  • Assignee deleted (CyrilBrulebois)
  • QA Check set to Ready for QA
  • Feature Branch set to kibi/wip/bugfix/14404-adjust-vagrant-cache-setting

I’ve pushed a branch on my repository, with a single commit:
https://mraw.org/git/?p=tails.git;a=commit;h=efbf044324f190ef60e6545ad035fdd26c468ee0

(Should I get write access to the central git repository at some point?)

I wasn’t exactly sure whether it was preferred to remove the setting entirely, or to make it configurable; so I went for the latter since the former seemed a bit too easy.

As far as I can tell, not setting cache=none could lead to some performance penalty, because caching would happen at multiple levels. So for this first attempt, I’ve kept none the default, and made it configurable through the TAILS_PROXY_STORAGE_CACHE_TYPE environment variable.

Verified (deleting the VM between attempts):

  • not setting the variable leads to cache=none as seen in the VM settings (from virt-manager).
  • setting TAILS_PROXY_STORAGE_CACHE_TYPE to writethrough leads to cache=writethrough in the VM settings.
  • setting it to an unknown value (e.g. foo) leads to:
Bringing machine 'default' up with 'libvirt' provider...
An action 'up' was attempted on the machine 'default',
but another process is already executing an action on the machine.
Vagrant locks each machine for access by only one process at a time.
Please wait until the other Vagrant process finishes modifying this
machine, then try again.

If you believe this message is in error, please check the process
listing for any "ruby" or "vagrant" processes and kill them. Then
try again.
rake aborted!

I’m not sure it makes sense to try and catch unknown values as those can differ across libvirt version, as documented upstream in: https://libvirt.org/formatdomain.html#elementsDisks

Some questions from my commit message:

FIXME:
 - should we start with a commit to disable the cache setting entirely,
   which seemed to work based on intrigeri's input in the bug log?
 - should we try to autodetect zfs and adjust the default accordingly?
   (the latter seems harder, possibly not entirely reliable?)

The first one was already at the top of this bug log entry: drop entirely, or make it configurable?

The second seems much harder to get right.

What do you think?

Also, pardon my ruby. I’ll try and get better at it in the upcoming weeks…

Also, I guess you folks will have deeper insight as to how to name this particular environment variable.

Follow-up work will likely include documenting it somewhere under doc/, but I guess there are enough questions to resolve first…

#18 Updated by intrigeri 2018-06-28 13:59:56

  • Assignee set to intrigeri

#19 Updated by intrigeri 2018-06-28 15:36:47

  • Assignee changed from intrigeri to CyrilBrulebois
  • QA Check changed from Ready for QA to Dev Needed

> I wasn’t exactly sure whether it was preferred to remove the setting entirely, or to make it configurable; so I went for the latter since the former seemed a bit too easy.

I’d rather remove the setting entirely and trust libvirt’s defaults unless there’s a good reason to carry the complexity of making this configurable (see below).

Now, if I was 100% happy with the way you’ve implemented the configurable version, I would just merge it. But I’m not 100% happy with it for a few reasons:

  • We try to set all build settings via TAILS_BUILD_OPTIONS (https://tails.boum.org/contribute/build/#index5h1) that we translate in Rakefile. Introducing a new envvar is not consistent with the interface we currently provide to developers. I know http_proxy is an exception but it’s a pretty standard one.
  • As you’ve noticed, some doc would be needed.
  • The added Ruby code could be made slightly more idiomatic.

> As far as I can tell, not setting cache=none could lead to some performance penalty, because caching would happen at multiple levels.

I think we should try the original plan (not setting cache= at all) and measure if there’s any such performance penalty. I can measure this in 3 build environments. If there’s no problem to solve, let’s not spend more time on a solution :)

> Also, pardon my ruby. I’ll try and get better at it in the upcoming weeks…

FWIW I’ve found the “Introduction to Ruby” O’Reilly book pretty good :)

#20 Updated by CyrilBrulebois 2018-06-30 18:14:40

  • Assignee deleted (CyrilBrulebois)
  • QA Check changed from Dev Needed to Ready for QA
  • Feature Branch changed from kibi/wip/bugfix/14404-adjust-vagrant-cache-setting to kibi:bugfix/14404-adjust-vagrant-cache-setting

Alright, many thanks for your feedback.

I’ve pushed a new branch (and fixed the namespace:branch syntax in the Feature Branch field) with a single commit: https://mraw.org/git/?p=tails.git;a=commitdiff;h=43d5d0e5b65507a29dd131b88c51bb78325488f6

Deleting everything from libvirt through virt-manager, and restarting a build leads me to a VM that has both storage units with Performance options > Cache mode set to Hypervisor default (same as IO mode), which seems to be the desired result.

Only tested on a single setup, so I’m happy to hear more about measurements with the multiple setups intrigeri mentioned. :)

Switching back to Ready for QA as I think it’s ready to be merged into devel, unless those measurements show some heavy regression?

#21 Updated by intrigeri 2018-06-30 18:58:31

  • Assignee set to intrigeri

#22 Updated by intrigeri 2018-06-30 19:02:15

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 50

Code review passes, will now benchmark in these 3 build envs.

#23 Updated by Anonymous 2018-06-30 19:03:53

  • Status changed from In Progress to Resolved
  • % Done changed from 50 to 100

Applied in changeset commit:43d5d0e5b65507a29dd131b88c51bb78325488f6.

#24 Updated by intrigeri 2018-06-30 20:44:30

  • Status changed from Resolved to In Progress
  • % Done changed from 100 to 50

#25 Updated by intrigeri 2018-06-30 21:09:50

  • Feature Branch changed from kibi:bugfix/14404-adjust-vagrant-cache-setting to bugfix/14404-remove-vagrant-disk-cache-setting

#26 Updated by intrigeri 2018-06-30 21:11:10

Pushed 8790cc8edeb1575c55065eca60dc80f2f5bec114 on top, otherwise this will be hard to test.

#27 Updated by intrigeri 2018-07-01 07:55:21

I’m going to transplant the branch (rebase --onto stable devel) to get more reliable benchmarking results: for builds based on devel, there’s a risk of downloading a full set of updated APT indices in some builds but not all of them, while for builds based on stable, this can happen only for the Debian security archive. Besides, this is a bugfix and I’d rather see it take effect for stable-based builds as soon as it’s ready, without waiting for 3.9 to be released early September :)

#28 Updated by intrigeri 2018-07-01 18:31:03

  • Assignee changed from intrigeri to segfault
  • QA Check changed from Ready for QA to Info Needed

Ouch. Actually, the 3 systems where I wanted to benchmark this vs. stable don’t use the internal apt-cacher-ng, so I can’t actually provide the promised data. I know that segfault does use the internal proxy, so perhaps he could build the branch 3 times and stable 3 times, all this wrapped with time, and give us the data?

#29 Updated by intrigeri 2018-07-22 01:21:25

#30 Updated by intrigeri 2018-07-22 01:21:31

  • blocked by deleted (Feature #15139: Core work 2018Q2: Foundations Team)

#31 Updated by segfault 2018-07-30 19:35:54

  • Assignee changed from segfault to intrigeri
  • QA Check deleted (Info Needed)

bugfix/14404-remove-vagrant-disk-cache-setting:

1. rake build 13.27s user 4.82s system 0% cpu 38:33.80 total
2. rake build 21.08s user 7.37s system 1% cpu 39:44.36 total
3. rake build 10.89s user 5.71s system 0% cpu 37:49.54 total

stable:

1. rake build 22.04s user 7.85s system 1% cpu 37:44.52 total
2. rake build 21.18s user 7.75s system 1% cpu 38:08.97 total
3. rake build 10.50s user 3.03s system 0% cpu 38:29.38 total

#32 Updated by intrigeri 2018-08-01 08:27:53

  • Status changed from In Progress to Fix committed
  • Assignee deleted (intrigeri)
  • % Done changed from 50 to 100

Thanks! Merged into stable and devel.

muri, it would be nice if you could test and confirm that you can now build one of these branches without local hacks.

#33 Updated by muri 2018-08-04 09:56:43

intrigeri wrote:
> muri, it would be nice if you could test and confirm that you can now build one of these branches without local hacks.

i’m sorry, but i don’t have access to a zfs based system any more, so i’m not able to test this

#34 Updated by intrigeri 2018-08-04 13:02:20

> i’m sorry, but i don’t have access to a zfs based system any more, so i’m not able to test this

No problem! :)

#35 Updated by intrigeri 2018-09-05 16:21:26

  • Status changed from Fix committed to Resolved