Bug #8506

Fix virtual network problems on lizard

Added by bertagaz 2015-01-01 12:59:16. Updated 2016-03-12 10:09:23.

Status: Resolved
Priority: Elevated
Assignee:
Category: Infrastructure
Target version:
Start date: 2015-01-01
Due date:
% Done: 100%
Feature Branch:
Type of work: Sysadmin
Blueprint:
Starter: 0
Affected tool:
Deliverable for:

Description

When using the Wheezy-backports kernel, the firewalling configuration seems messy, to say the least, resulting in quite random virtual network problems. We hit DNAT bugs where VMs that are supposed to be reachable from the outside no longer are, and situations where the VMs can’t get a DHCP lease.

There are probably some conflicts between our shorewall iptables rules and libvirt’s, but it’s surprising to hit this bug only on this kernel.


Subtasks


History

#1 Updated by bertagaz 2015-01-01 12:59:36

  • Priority changed from Normal to Elevated

#2 Updated by bertagaz 2015-01-01 13:01:46

  • related to Feature #7631: Get a server able to run our automated test suite added

#3 Updated by bertagaz 2015-01-01 15:10:16

After some investigation, it seems that our shorewall rules are blocking the DHCP leases for our VMs. So if shorewall starts too soon, the VMs can’t get an IP address.

It seems this was a known problem with the old kernel too, although I’ve rebooted it several times without seeing any noticeable trouble like that.

If shorewall is started late enough, after the VMs have obtained their DHCP lease, there is a chance that they can’t renew it once it expires. That never happened with the Wheezy kernel.

In the next hours/days, we’ll have to monitor whether this last scenario happens and, if necessary, adapt our shorewall configuration to accept DHCP traffic for the VMs.

#4 Updated by intrigeri 2015-01-13 12:56:16

  • Subject changed from Investigate virtual network problems when booting the Wheezy-backports kernel to Fix virtual network problems when booting the Wheezy-backports kernel
  • % Done changed from 0 to 50

Status update: bertagaz has DHCP and mangle table firewall rules that fix the problem.

#5 Updated by intrigeri 2015-01-13 12:57:52

  • Target version changed from Tails_1.2.3 to Tails_1.3

#6 Updated by BitingBird 2015-01-13 16:33:15

  • Status changed from Confirmed to In Progress

#7 Updated by bertagaz 2015-01-14 15:05:32

  • Assignee changed from bertagaz to intrigeri

We discussed an implementation which was to patch the shorewall shared puppet module, and add an $accept_dhcp option to shorewall::rules::libvirt::host, that would by default be true.

But I wonder if we shouldn’t rather enable this if $shorewall::interface::vmz::dhcp is set to true. It would IMO make more sense than an option to the libvirt::host class.

#8 Updated by intrigeri 2015-01-14 16:12:28

bertagaz wrote:
> But I wonder if we shouldn’t rather enable this if $shorewall::interface::vmz::dhcp is set to true. It would IMO make more sense than an option to the libvirt::host class.

As discussed elsewhere, in shorewall::rules::libvirt::host we can’t access parameters that were passed to Shorewall::Interface['virbr0'], so this doesn’t work.

#9 Updated by bertagaz 2015-01-14 16:31:46

  • Assignee changed from intrigeri to bertagaz

#10 Updated by intrigeri 2015-01-15 03:54:47

  • related to deleted (Feature #7631: Get a server able to run our automated test suite)

#11 Updated by intrigeri 2015-01-15 03:54:58

#12 Updated by bertagaz 2015-01-15 14:56:33

  • Assignee changed from bertagaz to intrigeri

After some “commit and test” rounds, I’m stuck with some limitations of our shorewall version (4.5.5.3-3).

According to http://shorewall.net/manpages/shorewall-mangle.html, the `mangle` file responsible for managing the mangle table was only introduced in version 4.6.0.

Before that, the recommendation was to use the `tcrules` file, but according to http://shorewall.net/manpages/shorewall-tcrules.html, it has only supported the CHECKSUM target since version 4.5.9.

So I’m not sure we’ll be able to fix this issue using our current shorewall version. Debian Jessie will ship version 4.6.x, so it will be easier to cook some puppet recipes for that. In the meantime, I guess we’ll have to find another solution. Not sure how to implement that though.

#13 Updated by bertagaz 2015-01-15 15:41:20

Sounds like we could use the extension scripts feature of shorewall (http://shorewall.net/shorewall_extension_scripts.htm) and drop a file which would add the rules we need after it starts. That would be a temporary solution until we upgrade to Jessie. Thoughts?
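
As a rough illustration of that idea (not a script that was actually deployed; this ticket doesn't show one), a stopgap could write a `started` extension script that re-adds the checksum rule each time shorewall starts. The helper name `write_started_script`, the default interface `virbr0` and the exact rule are assumptions: the rule is modeled on the one libvirt itself adds for its default network, and it calls plain `iptables` rather than any shorewall helper.

```shell
# Hypothetical helper: write a `started` extension script into the given
# shorewall configuration directory. Nothing is applied to the firewall here.
write_started_script() {
    dir="$1"
    iface="${2:-virbr0}"   # bridge used by the VMs; assumed default
    cat > "$dir/started" <<EOF
# Stopgap until a shorewall version with the mangle file is available.
# Fill in the UDP checksum of DHCP replies (port 68) sent to the guests.
iptables -t mangle -A POSTROUTING -o $iface -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
EOF
}
```

On the host this would be called as `write_started_script /etc/shorewall`, followed by a shorewall restart. Whether shorewall flushes the mangle table on restart (and so whether duplicate rules could pile up) should be checked before relying on it.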

#14 Updated by intrigeri 2015-01-15 15:59:50

At least at the APT level, `apt-get install shorewall/jessie shorewall-core/jessie iptables/wheezy` works just fine in a Wheezy chroot, so I doubt that the version of shorewall in Wheezy is a real blocker. I haven’t tested whether the resulting shorewall installation works, though, and I’d rather see this tested (e.g. in a VM) before deploying on lizard.

#15 Updated by bertagaz 2015-01-15 16:11:17

  • Assignee changed from intrigeri to bertagaz

intrigeri wrote:
> Not tested if the resulting shorewall installation works, and I’d rather see this tested (e.g. in a VM) before deploying on lizard, though.

Awesome, sounds like this ticket will hardly be closed, as I’m unsure about when I’ll have time to install a Wheezy VM, install our shorewall config there, and test the upgrade. I admit I’m a bit afraid of all the issues such an upgrade would bring, given the changes between shorewall 4.5 and 4.6. We’ll see…

#16 Updated by intrigeri 2015-01-15 16:17:12

> Awesome, sounds like this ticket will hardly be closed,

I sure hope it will!

> as I’m unsure about when I’ll have time to install a Wheezy VM, install our shorewall config there, and test the upgrade.

I don’t think it’s necessary to test our actual shorewall config. Just testing that basic functionality works should be enough.
And oh well, if you can’t do that, then go ahead, cross your fingers, and hopefully we won’t need OOB access.

> I admit I’m a bit afraid of all the issues such an upgrade would bring, given the changes between shorewall 4.5 and 4.6. We’ll see…

Well:

  1. I’m using the same Puppet stuff to manage shorewall on sid, and it works fine.
  2. We’ll have to go through this upgrade soon anyway, when upgrading to Jessie, so it’s not as if it were wasted time or an unnecessary risk.

#17 Updated by bertagaz 2015-02-25 09:53:57

  • Target version changed from Tails_1.3 to Tails_1.3.2

Wasn’t able to complete that, deferring to the next release…

#18 Updated by bertagaz 2015-02-26 19:22:27

  • Assignee changed from bertagaz to intrigeri
  • QA Check set to Ready for QA

I’ve deployed a fix by upgrading shorewall and using it to configure the missing firewall rule to fill the checksum of DHCP requests on virbr0. Now the VMs don’t lose their network anymore.
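
One way to check that such a rule is in place is to filter the output of `iptables -t mangle -S POSTROUTING`. The sketch below assumes the rule has the same shape as the one libvirt adds for its default network (UDP destination port 68 on the bridge); the function name `has_dhcp_checksum_fill` is made up for illustration, and the exact rule deployed on lizard isn't shown in this ticket.

```shell
# Succeeds if the rules read on stdin contain a checksum-fill rule for
# DHCP replies (UDP destination port 68) leaving on the given interface.
has_dhcp_checksum_fill() {
    iface="${1:-virbr0}"   # assumed default bridge name
    grep -q -e "-o ${iface} .*--dport 68 .*-j CHECKSUM --checksum-fill"
}

# Usage on the host (as root):
#   iptables -t mangle -S POSTROUTING | has_dhcp_checksum_fill virbr0 && echo present
```

Keeping the check as a filter on stdin means it can also be run against a saved rule dump, without touching the live firewall.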

If you’re satisfied with the fix, I’d be in favor of closing this ticket and opening a new one to upstream the changes in the shorewall shared puppet module. Any directions on how to do so would help. Shall I open a pull request on this redmine instance or on gitlab?

#19 Updated by intrigeri 2015-02-26 21:16:21

  • Assignee changed from intrigeri to bertagaz
  • % Done changed from 50 to 80
  • QA Check changed from Ready for QA to Dev Needed

bertagaz wrote:
> I’ve deployed a fix by upgrading shorewall and using it to configure the missing firewall rule to fill the checksum of DHCP requests on virbr0. Now the VMs don’t lose their network anymore.

Woohoo!

I’ve reviewed it and it looks all right, except for two minor code formatting nitpicks:

Misaligned equal signs in:

   $vmz           = 'vmz',
   $masq_iface    = 'eth0',
   $debproxy_port = 8000,
+  $accept_dhcp = true,
+  $vmz_iface = 'virbr0',

And an issue in shorewall::mangle (missing space around the equal sign + wrong indentation). Granted, many other files in that module have the same problem, but that’s no reason to introduce more :)

> If you’re satisfied with the fix, I’d be in favor of closing this ticket and opening a new one to upstream the changes in the shorewall shared puppet module. Any directions on how to do so would help. Shall I open a pull request on this redmine instance or on gitlab?

GitLab is the way to go for the shared modules nowadays. See the “review process” thread from last December on https://lists.riseup.net/www/arc/shared-modules/2014-12/ for the details. Please also consider subscribing to that mailing-list.

#20 Updated by bertagaz 2015-02-26 22:18:57

  • Assignee changed from bertagaz to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:
> I’ve reviewed it and it looks all right, except for two minor code formatting nitpicks:

Force pushed and deployed fixes for these.

> GitLab is the way to go for the shared modules nowadays. See the “review process” thread from last December on https://lists.riseup.net/www/arc/shared-modules/2014-12/ for the details. Please also consider subscribing to that mailing-list.

Thanks! Seems I was naturally following that process. :)

#21 Updated by bertagaz 2015-02-27 14:47:48

Opened two merge requests to upstream our change at https://gitlab.com/shared-puppet-modules-group/shorewall

#22 Updated by intrigeri 2015-02-28 13:34:38

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 80 to 100
  • QA Check changed from Ready for QA to Pass

Wooo!

#23 Updated by BitingBird 2015-03-22 12:11:05

  • Target version changed from Tails_1.3.2 to Tails_1.3.1

#24 Updated by intrigeri 2016-03-12 10:09:23

  • Subject changed from Fix virtual network problems when booting the Wheezy-backports kernel to Fix virtual network problems on lizard