Feature #6354

Migrate to vagrant-libvirt

Added by anonym 2013-10-10 04:49:41. Updated 2016-06-08 01:23:02.

Status: Resolved
Priority: High
Assignee:
Category: Build system
Target version:
Start date: 2014-10-13
Due date:
% Done: 100%
Feature Branch: feature/6354-vagrant-libvirt
Type of work: Code
Starter: 0
Affected tool:
Deliverable for:

Description

Since our test suite uses libvirt/KVM and our current Vagrant-based build system uses VirtualBox, the two cannot run at the same time (the kernel modules have to be switched). We should migrate to a libvirt/KVM provider via the vagrant-libvirt plugin.


Files


Subtasks

Feature #8086: Test our vagrant-libvirt-based build system with Jessie's Vagrant (Resolved, 100%)

Feature #11153: Have vagrant-libvirt in Debian (Resolved, 100%)

Feature #11154: Make our vagrant-libvirt-based build system compatible with testing/sid's ruby-net-ssh (Resolved, 100%)


Related issues

Related to Tails - Bug #6212: investigate vagrant-libvirt for our build system Resolved 2013-08-07
Related to Tails - Bug #11171: Split build task in Rakefile Confirmed 2016-02-26
Blocked by Tails - Bug #6356: Fix Vagrant basebox building using veewee and KVM Resolved 2013-10-10

History

#1 Updated by BitingBird 2014-06-21 21:33:19

vagrant-libvirt does not seem to be packaged for Debian. Should we open a ticket for it?

#3 Updated by BitingBird 2015-01-08 04:30:29

Seems complicated, still not packaged.

#4 Updated by intrigeri 2015-08-13 05:58:32

  • Feature Branch changed from feature/vagrant-libvirt to feature/6354-vagrant-libvirt

#5 Updated by anonym 2016-02-18 20:10:50

I used the following steps to convert our existing base box to a vagrant-libvirt one:

wget http://dl.amnesia.boum.org/tails/project/vagrant/tails-builder-20141201.box

tar xf tails-builder-20141201.box

qemu-img convert -f vmdk -O qcow2 box-disk1.vmdk box-disk1.qcow2

/path/to/vagrant-libvirt-sources/tools/create_box.sh box-disk1.qcow2 \
    tails-builder-20141201+libvirt.box
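
The steps above can be sketched as a small Ruby helper that derives the command list from the base box URL. This is only an illustration: the function name `libvirt_box_commands` and its `tools_dir` parameter are hypothetical, the disk image is assumed to be named box-disk1.vmdk as in the commands above, and nothing is executed; the helper only returns argument vectors.

```ruby
# Hypothetical helper: returns the commands from the steps above as argument
# vectors for a given VirtualBox base box URL. Nothing is run here.
def libvirt_box_commands(box_url, tools_dir)
  box = File.basename(box_url)                       # e.g. tails-builder-20141201.box
  out = "#{File.basename(box, '.box')}+libvirt.box"  # e.g. tails-builder-20141201+libvirt.box
  [
    ["wget", box_url],
    ["tar", "xf", box],
    # Assumes the archive contains box-disk1.vmdk, as in the steps above.
    ["qemu-img", "convert", "-f", "vmdk", "-O", "qcow2",
     "box-disk1.vmdk", "box-disk1.qcow2"],
    [File.join(tools_dir, "create_box.sh"), "box-disk1.qcow2", out],
  ]
end

# Print the commands for review instead of executing them.
libvirt_box_commands(
  "http://dl.amnesia.boum.org/tails/project/vagrant/tails-builder-20141201.box",
  "/path/to/vagrant-libvirt-sources/tools"
).each { |argv| puts argv.join(" ") }
```

Each entry could be passed to `system(*argv)` once the printed commands look right.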

#6 Updated by anonym 2016-02-18 22:20:14

  • Target version set to Tails_2.3
  • % Done changed from 0 to 40

I’ve force-pushed a complete rework of this branch (the old branch still lives in feature/6354-vagrant-libvirt_old if we get any reason to look at the worthless junk before this branch is merged :)). The approach taken in this new branch should significantly reduce the amount of Vagrant headaches to be expected in the future, which is a big plus.

#7 Updated by anonym 2016-02-18 22:38:57

So, I’ve managed to build an image using this libvirt/KVM Vagrant setup on a current Debian Unstable with:

  • vagrant 1.8.1+dfsg-1
  • ruby-fog-libvirt 0.0.3-1~2.gbpf4abdb (package built from here)
  • vagrant-libvirt 0.0.32-2~1.gbpbbeabb (package built from here, with my pending pull-request applied)
  • I applied a local fix to Debian bug #795603, namely running gem2deb on the ruby-libvirt 0.5.1 gem file to get the appropriate gemspec file (attached) to put into /usr/share/rubygems-integration/all/specifications/. This suggests that this package just has to be rebuilt with a newer version of gem2deb to fix this bug.

So what remains is to get these two packages into Debian (ITP) and to fix the ruby-libvirt rubygems integration.

#8 Updated by anonym 2016-02-19 02:00:16

If you add these APT sources (using Tails APT repo’s signing key) you can get the dependencies missing from Debian:

deb http://deb.tails.boum.org/ feature-6354-vagrant-libvirt main
deb-src http://deb.tails.boum.org/ feature-6354-vagrant-libvirt main

and then sudo apt install vagrant-libvirt/feature-6354-vagrant-libvirt ruby-fog-libvirt/feature-6354-vagrant-libvirt.

You’ll still have to work around Debian bug #795603 as I outlined above.

#9 Updated by intrigeri 2016-02-19 10:46:32

> * I applied a local fix to Debian bug #795603

I thought I had “fixed” this in commit 01a5453ae14d350852ba5c323bcd21db12aba603 in the ruby-fog-libvirt packaging. Seems to work for me without any need for manual kludges. anonym, can you please confirm?

#10 Updated by anonym 2016-02-19 10:55:26

intrigeri wrote:
> > * I applied a local fix to Debian bug #795603
>
> I thought I had “fixed” this in commit 01a5453ae14d350852ba5c323bcd21db12aba603 in the ruby-fog-libvirt packaging. Seems to work for me without any need for manual kludges. anonym, can you please confirm?

Ah, you are correct. So never mind about the “local fix” — to test this one only needs to install the above two packages. Yay!

#11 Updated by intrigeri 2016-02-21 21:44:44

#12 Updated by intrigeri 2016-02-21 22:05:20

#13 Updated by intrigeri 2016-02-21 22:06:12

  • Assignee changed from anonym to intrigeri
  • Target version changed from Tails_2.3 to Tails_2.2
  • QA Check set to Ready for QA

I’ll try it out and if I’m happy this will go into Tails 2.2.

#14 Updated by intrigeri 2016-02-22 00:09:14

  • Assignee changed from intrigeri to anonym
  • Target version changed from Tails_2.2 to Tails_2.3
  • QA Check changed from Ready for QA to Dev Needed

#15 Updated by anonym 2016-02-26 13:59:29

  • related to Bug #11171: Split build task in Rakefile added

#16 Updated by anonym 2016-03-01 10:12:23

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Dev Needed to Info Needed

OK, I actually cannot fully remember the issues you described having. Do you? Anyway, I’ve pushed a lot of small fixes (including upgrading the builder VM to Debian Jessie), and hopefully your issues are fixed.

Also, I’ve tried to run this inside a Debian Jessie VM so the builder was nested. I did not manage to complete a build due to what on the surface looks like a kernel scheduler bug. On the host I get several stack traces like this:

[ 3120.106682] Call Trace:
[ 3120.106693]  [<ffffffff8158667f>] ? schedule+0x2f/0x70
[ 3120.106699]  [<ffffffff8105a5fa>] ? kvm_async_pf_task_wait+0x1aa/0x230
[ 3120.106704]  [<ffffffff810b3210>] ? wait_woken+0x90/0x90
[ 3120.106708]  [<ffffffff8115d000>] ? user_return_notifier_unregister+0x30/0x60
[ 3120.106713]  [<ffffffff81003b2c>] ? prepare_exit_to_usermode+0xcc/0x100
[ 3120.106718]  [<ffffffff8158c5e8>] ? async_page_fault+0x28/0x30
[ 3120.106723] INFO: task qemu-system-x86:2849 blocked for more than 120 seconds.

and similar ones for some seemingly unrelated processes like systemd-machine and avahi-daemon.

On the guest I get:

Mar 01 00:52:58 vagrant-jessie kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000003
Mar 01 00:52:58 vagrant-jessie kernel: IP: [<ffffffff812af167>] radix_tree_node_ctor+0x27/0xa0

I tried upgrading to the jessie-backports kernel (linux-image-4.3.0-0.bpo.1-amd64) in both VMs so the kernel would match on every level (my level-0 host runs Debian Unstable), but it didn’t fix anything. This could be problematic for our plans to use this on Jenkins, possibly.

Still, the Tails build starts and manages to progress to the chroot_local-hooks, so I am confident that the build would succeed were it not for the kernel bug. However, currently the following packages have to be installed from Debian Stretch (the versions I got at the time of writing this are in parentheses):

  • vagrant (1.8.1+dfsg-1): Jessie’s version simply cannot find the vagrant-libvirt plugin. To fix this we need Support-system-installed-plugins.patch (introduced in version 1.7.2+dfsg-4) backported to Jessie in some way. But since no plugin is packaged in Jessie, it IMHO does not make sense to introduce the patch in Jessie’s version, so we need to backport vagrant and add an appropriately high versioned dependency on vagrant to vagrant-libvirt (the same should be done for vagrant-lxc if we want everything to be completely consistent in Debian, even if it never gets backported to Jessie).
  • ruby-excon (0.45.1-2): the vagrant-libvirt gemspec requires “excon (~> 0.45)”, but Jessie has only 0.33.0-2.
  • ruby-fog-core (1.32.1-1): the ruby-fog-libvirt gemspec requires “fog-core >= 1.27.4”, but Jessie has only 1.22.0-1.
  • ruby-fog-xml (0.1.1-5): not in Debian Jessie, but it’s just a new dep of ruby-fog-core introduced in a later version.

AFAICT, installing these packages straight from Stretch works fine, so backporting probably amounts to trivial rebuilds.
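
To make the two gemspec constraints above concrete, here is a minimal sketch that checks them with RubyGems' own `Gem::Requirement`. The version numbers are the upstream parts of the Debian versions quoted in the list (the Debian revision such as "-2" is dropped by hand); the constant and method names are made up for this illustration.

```ruby
require "rubygems"

# Upstream versions taken from the Debian package versions quoted above.
JESSIE_VERSIONS  = { "excon" => "0.33.0", "fog-core" => "1.22.0" }
STRETCH_VERSIONS = { "excon" => "0.45.1", "fog-core" => "1.32.1" }

# Gemspec requirements quoted above.
REQUIREMENTS = { "excon" => "~> 0.45", "fog-core" => ">= 1.27.4" }

# True if the given upstream version satisfies the requirement for gem_name.
def satisfied?(gem_name, version)
  requirement = Gem::Requirement.new(REQUIREMENTS.fetch(gem_name))
  requirement.satisfied_by?(Gem::Version.new(version))
end

REQUIREMENTS.each_key do |name|
  puts "#{name}: jessie=#{satisfied?(name, JESSIE_VERSIONS[name])} " \
       "stretch=#{satisfied?(name, STRETCH_VERSIONS[name])}"
end
```

"~> 0.45" accepts 0.45 and later but stays below 1.0, which is why Jessie's 0.33.0 is rejected while Stretch's 0.45.1 passes.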

So, yay, we’re not quite there yet as far as supporting building on Debian Jessie goes. I feel tempted to consider only supporting Debian Unstable, but let’s not make the barrier to contributing even higher. Right?

The next step, IMHO, is to test this on Debian Jessie running on bare metal to verify that the kernel issue I saw is due to VM nesting and not Debian Jessie itself. I’d also like to try to run it in a Debian Unstable VM (=> VM nesting) for the same reasons.

#17 Updated by anonym 2016-03-01 10:36:18

anonym wrote:
> I did not manage to complete a build due to what on the surface looks like a kernel scheduler bug.

This seems to be the same as Bug #9157.

#18 Updated by anonym 2016-03-02 21:50:50

I’m not hit by the kernel issue when the level-1 host is running Debian Unstable (both kernel and user land). The image built fine, and nesting added 8.5% to the build time, which isn’t too bad. Sounds quite acceptable for our isobuilders! :)

#19 Updated by anonym 2016-03-03 17:04:09

  • Assignee changed from intrigeri to bertagaz
  • QA Check changed from Info Needed to Ready for QA

Leaving the review aside, bert, can you just test that you can get this running on a Debian Jessie and/or Debian Sid system? Bare metal please! But feel free to reproduce my nested tests in addition to bare metal.

#20 Updated by bertagaz 2016-03-22 17:28:49

  • Assignee changed from bertagaz to anonym
  • QA Check changed from Ready for QA to Dev Needed

anonym wrote:
> Leaving the review aside, bert, can you just test that you can get this running on a Debian Jessie and/or Debian Sid system? Bare metal please! But feel free to reproduce my nested tests in addition to bare metal.

I’ve tested it on Jessie and it worked fine on the first try by just following the documentation.

I’ve tried on my own Sid system, but it has no IPv6, and this makes vagrant-libvirt fail probably because it lacks this commit which has not been released yet.

To test it, I had to merge devel back into it (commit:54100edc). There were a bunch of changes in this base branch that led to a conflict, which I resolved. Please check that I didn’t mess something up. It builds fine on Jenkins.

I also added a small change to the Rakefile to tell people why they have to wait for some time (commit:74784ac9). This download is a bit long on my test system btw; I wonder if configuring the Net::SCP call with :encryption => "none" would help?

It also seems that the documentation lacks a tiny update regarding the removal of the cache build option: wiki/src/contribute/build.mdwn:211 still talks about it.

Apart from the merge review mentioned above and the doc update, I think we’re good to merge it. Feel free to RfQA me again. Congrats, and goodbye aufs!

#21 Updated by anonym 2016-04-20 10:56:07

  • Target version changed from Tails_2.3 to Tails_2.4

#22 Updated by intrigeri 2016-04-25 02:02:39

#23 Updated by anonym 2016-05-03 15:35:58

There is a problem in Jessie: the new ruby-fog-libvirt package depends on the ruby-libvirt gem, but the current version of ruby-libvirt in Jessie lacks rubygems integration.

Workaround: install ruby-fog-libvirt from our feature-6354-vagrant-libvirt APT suite (this works because it doesn’t depend on the ruby-libvirt gem).

Real solution: Report a bug saying that we want Debian Bug 795603 fixed in Jessie too.
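
A quick way to see whether a system is affected might look like the sketch below. It only asks RubyGems whether it can find a gemspec for the gem; the method name `rubygems_integrated?` is made up for this illustration, and it assumes the check is run with the same Ruby that Vagrant uses.

```ruby
require "rubygems"

# Returns true if RubyGems can see at least one installed version of the
# named gem, i.e. a gemspec for it is present on the gem path.
def rubygems_integrated?(gem_name)
  !Gem::Specification.find_all_by_name(gem_name).empty?
end

# "ruby-libvirt" is the gem the new ruby-fog-libvirt package depends on.
puts "ruby-libvirt visible to RubyGems: #{rubygems_integrated?('ruby-libvirt')}"
```

If this prints false even though the ruby-libvirt Debian package is installed, the package is presumably missing its rubygems integration, as described above.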

#24 Updated by anonym 2016-05-04 03:27:26

I think this branch is in good shape now, but we are slightly blocked by the issue described in Feature #6354#note-23. In total, what remains to get the Debian side of things into a perfect state is:

  1. vagrant-libvirt: include tools/create_box.sh
  2. ruby-libvirt: backport rubygems integration fix to Jessie
  3. vagrant-libvirt: backport to jessie

The first one is not so important (it’s only needed when building a base box, which we do rarely, and the workaround is easy), but the latter two are important since the Ruby/Vagrant situation has historically changed rapidly and caused breakage in testing/sid. We need something stable (literally! :)).

However, I don’t think we need to block on the latter two. I propose we document the workaround from Feature #6354#note-23 + install the Stretch packages in Jessie for now, and get this merged ASAP. IMHO the situation is at least as broken in the VirtualBox-based vagrant build system we have, but for this we at least have a plan to get it fixed.

#25 Updated by intrigeri 2016-05-04 03:31:16

anonym wrote:
> However, I don’t think we need to block on the latter two. I propose we document the workaround from Feature #6354#note-23 + install the Stretch packages in Jessie for now, and get this merged ASAP. IMHO the situation is at least as broken in the VirtualBox-based vagrant build system we have, but for this we at least have a plan to get it fixed.

Agreed, as long as the follow-ups are tracked as subtasks of Feature #7526.

#26 Updated by anonym 2016-05-04 04:04:51

  • Assignee changed from anonym to bertagaz
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:
> Agreed, as long as the follow-ups are tracked as subtasks of Feature #7526.

Done in Bug #11396.

bert, the ball is in your court! Please keep an eye out for Bug #11155!

#27 Updated by intrigeri 2016-05-05 02:48:57

  • Assignee changed from bertagaz to anonym
  • QA Check changed from Ready for QA to Dev Needed

The setup instructions tell me to install qemu-tools, that does not exist AFAICT.

#28 Updated by intrigeri 2016-05-05 02:50:29

Is the full blown dnsmasq package (system daemon) really needed? For most VM setups I’ve seen recently, dnsmasq-base was enough, as the virtualization software runs dnsmasq itself.

#29 Updated by anonym 2016-05-07 12:32:26

  • Assignee changed from anonym to bertagaz
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:
> The setup instructions tell me to install qemu-tools, that does not exist AFAICT.

You must have checked out this branch in the short window where that error was in there.

> Is the full blown dnsmasq package (system daemon) really needed? For most VM setups I’ve seen recently, dnsmasq-base was enough, as the virtualization software runs dnsmasq itself.

Yup, correct. Fixed now, thanks!

Re-assigning to bert, but if you want to take this one, feel free to. If not, just a report from running it on your system would be awesome! :)

#30 Updated by anonym 2016-05-10 12:14:35

  • Status changed from In Progress to Fix committed
  • % Done changed from 88 to 100

Applied in changeset commit:2fc40a67f479172778199bd333cfdf9f1f71dbc6.

#31 Updated by intrigeri 2016-05-11 05:22:59

  • related to Bug #11410: vagrant-libvirt's chosen CPU cannot be emulated on Intel Core i7-4600U added

#32 Updated by intrigeri 2016-05-11 05:46:17

  • related to deleted (Bug #11410: vagrant-libvirt's chosen CPU cannot be emulated on Intel Core i7-4600U)

#33 Updated by intrigeri 2016-05-12 03:31:11

This works fine for me, and I did a code review already, so perhaps we don’t need another one and can mark this as QA check = pass?

#34 Updated by hybridwipe 2016-05-15 21:58:54

I know debian unstable isn’t technically supported, but it’s not working:

austin@debian-desktop:~/src/tails$ rake --trace vm:destroy
** Invoke vm:destroy (first_time)
** Execute vm:destroy
The provider 'libvirt' could not be found, but was requested to
back the machine 'default'. Please use a provider that exists.
rake aborted!
VagrantCommandError: 'vagrant ["destroy", "--force"]' command failed: 1
/home/austin/src/tails/Rakefile:50:in `run_vagrant'
/home/austin/src/tails/Rakefile:379:in `block (2 levels) in <top (required)>'
/usr/lib/ruby/vendor_ruby/rake/task.rb:240:in `block in execute'
/usr/lib/ruby/vendor_ruby/rake/task.rb:235:in `each'
/usr/lib/ruby/vendor_ruby/rake/task.rb:235:in `execute'
/usr/lib/ruby/vendor_ruby/rake/task.rb:179:in `block in invoke_with_call_chain'
/usr/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'
/usr/lib/ruby/vendor_ruby/rake/task.rb:172:in `invoke_with_call_chain'
/usr/lib/ruby/vendor_ruby/rake/task.rb:165:in `invoke'
/usr/lib/ruby/vendor_ruby/rake/application.rb:150:in `invoke_task'
/usr/lib/ruby/vendor_ruby/rake/application.rb:106:in `block (2 levels) in top_level'
/usr/lib/ruby/vendor_ruby/rake/application.rb:106:in `each'
/usr/lib/ruby/vendor_ruby/rake/application.rb:106:in `block in top_level'
/usr/lib/ruby/vendor_ruby/rake/application.rb:115:in `run_with_threads'
/usr/lib/ruby/vendor_ruby/rake/application.rb:100:in `top_level'
/usr/lib/ruby/vendor_ruby/rake/application.rb:78:in `block in run'
/usr/lib/ruby/vendor_ruby/rake/application.rb:176:in `standard_exception_handling'
/usr/lib/ruby/vendor_ruby/rake/application.rb:75:in `run'
/usr/bin/rake:27:in `


Tasks: TOP => vm:destroy

austin@debian-desktop:~/src/tails$ rake --trace build
** Invoke build (first_time)
** Invoke parse_build_options (first_time)
** Execute parse_build_options
rake aborted!
VagrantCommandError: 'vagrant ["status"]' command failed: 1
/home/austin/src/tails/Rakefile:61:in `capture_vagrant'
/home/austin/src/tails/Rakefile:84:in `vm_state'
/home/austin/src/tails/Rakefile:117:in `enough_free_memory_for_ram_build?'
/home/austin/src/tails/Rakefile:144:in `block in <top (required)>'
/usr/lib/ruby/vendor_ruby/rake/task.rb:240:in `block in execute'
/usr/lib/ruby/vendor_ruby/rake/task.rb:235:in `each'
/usr/lib/ruby/vendor_ruby/rake/task.rb:235:in `execute'
/usr/lib/ruby/vendor_ruby/rake/task.rb:179:in `block in invoke_with_call_chain'
/usr/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'
/usr/lib/ruby/vendor_ruby/rake/task.rb:172:in `invoke_with_call_chain'
/usr/lib/ruby/vendor_ruby/rake/task.rb:201:in `block in invoke_prerequisites'
/usr/lib/ruby/vendor_ruby/rake/task.rb:199:in `each'
/usr/lib/ruby/vendor_ruby/rake/task.rb:199:in `invoke_prerequisites'
/usr/lib/ruby/vendor_ruby/rake/task.rb:178:in `block in invoke_with_call_chain'
/usr/lib/ruby/2.3.0/monitor.rb:214:in `mon_synchronize'
/usr/lib/ruby/vendor_ruby/rake/task.rb:172:in `invoke_with_call_chain'
/usr/lib/ruby/vendor_ruby/rake/task.rb:165:in `invoke'
/usr/lib/ruby/vendor_ruby/rake/application.rb:150:in `invoke_task'
/usr/lib/ruby/vendor_ruby/rake/application.rb:106:in `block (2 levels) in top_level'
/usr/lib/ruby/vendor_ruby/rake/application.rb:106:in `each'
/usr/lib/ruby/vendor_ruby/rake/application.rb:106:in `block in top_level'
/usr/lib/ruby/vendor_ruby/rake/application.rb:115:in `run_with_threads'
/usr/lib/ruby/vendor_ruby/rake/application.rb:100:in `top_level'
/usr/lib/ruby/vendor_ruby/rake/application.rb:78:in `block in run'
/usr/lib/ruby/vendor_ruby/rake/application.rb:176:in `standard_exception_handling'
/usr/lib/ruby/vendor_ruby/rake/application.rb:75:in `run'
/usr/bin/rake:27:in `


Tasks: TOP => build => parse_build_options

Should this be a separate issue?

#35 Updated by intrigeri 2016-05-16 01:44:02

> Should this be a separate issue?

Yes, please: this one is “fix committed” :)

#36 Updated by hybridwipe 2016-05-17 07:07:26

Sure. I mostly wasn’t sure if I should file one for unstable ;)

https://labs.riseup.net/code/issues/11425

#37 Updated by bertagaz 2016-06-06 04:42:25

  • Assignee deleted (bertagaz)
  • QA Check changed from Ready for QA to Pass

intrigeri wrote:
> This works fine for me, and I did a code review already, so perhaps we don’t need another one and can mark this as QA check = pass?

I’ve finally managed to sit down long enough to test it, and it works fine here too. So marking this ticket as done, congrats!

#38 Updated by anonym 2016-06-08 01:23:02

  • Status changed from Fix committed to Resolved