Feature #8656

Write code that generates a set of Jenkins jobs for all branches we want to automatically build

Added by intrigeri 2015-01-09 17:41:48 . Updated 2015-08-05 09:42:32 .

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Continuous Integration
Target version:
Start date:
2015-01-09
Due date:
2015-07-15
% Done:

100%

Feature Branch:
Type of work:
Sysadmin
Blueprint:

Starter:
Affected tool:
Deliverable for:
266

Description

(Using the results of the Feature #8655 discussion, thanks to the code from Feature #8657.)


Files


Subtasks


Related issues

Related to Tails - Feature #8482: Survey usual committers about dropping the collective pseudonym for signing commits Resolved 2014-12-23 2015-01-30
Blocked by Tails - Feature #8657: Write library code that implements the branch to build parameters mapping Resolved 2015-01-09 2015-07-15
Blocks Tails - Feature #8658: Deploy the "build all active branches" system Resolved 2015-01-09

History

#1 Updated by intrigeri 2015-01-09 17:41:58

  • blocked by Feature #8655: Lead the discussion to specify what kind of branch needs to be automatically built, and when added

#2 Updated by intrigeri 2015-01-09 17:42:36

  • Description updated

#3 Updated by intrigeri 2015-01-09 17:44:54

  • blocks deleted (Feature #8655: Lead the discussion to specify what kind of branch needs to be automatically built, and when)

#4 Updated by intrigeri 2015-01-09 17:45:47

  • blocked by Feature #8657: Write library code that implements the branch to build parameters mapping added

#5 Updated by intrigeri 2015-01-09 17:46:07

  • Description updated

#6 Updated by intrigeri 2015-01-09 17:48:55

  • blocks Feature #8658: Deploy the "build all active branches" system added

#7 Updated by intrigeri 2015-01-09 17:49:27

  • related to Feature #8482: Survey usual committers about dropping the collective pseudonym for signing commits added

#8 Updated by bertagaz 2015-01-10 06:28:58

  • Target version changed from Tails_1.4 to Tails_1.3

#9 Updated by intrigeri 2015-01-10 10:10:55

See e.g. http://anonscm.debian.org/cgit/qa/jenkins.debian.net.git/tree/job-cfg/lvc.yaml.py for an example of how to generate jobs programmatically.

#10 Updated by bertagaz 2015-01-10 17:09:06

  • Target version changed from Tails_1.3 to Tails_1.3.2

#11 Updated by bertagaz 2015-03-14 15:08:46

Here’s a firt rough draft of what I came up with. I tried to use a more pythonic way of generating the yaml, but I’d like to have reviews whether it makes sense, and if I should go on this way.

#12 Updated by intrigeri 2015-03-16 19:52:44

Please put that script into some appropriate Git repo, to ease the next rounds of review.

First of all, the interface between the library code DrWhax is writing (Feature #8657) and the program you’re writing here was designed and specified at some point. Where can I find the documented outcome of that work? Maybe DrWhax still has it locally and never published it anywhere? My memory isn’t that good, and having the big picture’s documentation handy would help me remember what we’re trying to do here, and how.

Anyway, here’s a bunch of initial comments :)

First, the general idea seems totally sane to me. IIRC that’s indeed what we want.

I’m not convinced by the way the “template” (which actually isn’t really a template, since it only contains the static parts, and not the bits we’ll customize on a per-branch basis) is loaded, and then incrementally fed with data. I’m more used to the good old:

  1. compute and stash the needed data;
  2. load a template written in some template language (whatever you prefer, as long as its special chars don’t conflict too much with the actual template content we want, that’s itself in YAML; it would be too painful if we have to escape too much stuff);
  3. pass the stashed data to the template engine, that will return something useful.

… which separates computing/stashing data from feeding it into the template engine.

I also have lots of comments on your OO design, if you’re interested. Here are a few ones about the most important bits.

E.g. it feels wrong that the class constructor doesn’t take the branch name as an argument, since that’s precisely the most important bit that parameterizes its behaviour. If we wanted to reuse the same object to generate job configuration for several branches in a row, then it would make sense to keep the current design, but 1. this object’s construction is cheap; and 2. the current code does not reuse the object => IMO it’s not worth having to manage internal state and moving pieces. KISS! :)

So, I think the constructor should take the following parameters:

  • the branch name (compulsory), stored as a class attribute;
  • optionally, the template data as a string (not turned by yaml.safe_load into a data structure), also stored as a class attribute; this attribute should have a default value (the one you currently store in a constant outside of the class), and merely allow overriding it by passing a different one to the class constructor.

Also, once instantiated, an object of this class is currently useless, in an incomplete state, unless one calls the configure method, that is currently used to compute everything we need and store it in the job class attribute. This feels wrong to me. The way I would love to use this class is:

job_config   = TailsIsoJobConfig(branch_build_data)
job_filename = "%s.yaml" % branch['job_name']
with open(job_filename, 'w') as job_file:
    job_file.write(job_config.dump())

And then, you can decide whether you do the actual work of computing data in the constructor, or in the dump method. The way I would do it in my preferred OO framework is with a set of lazy attributes (name, git_config and so on), so that the needed data is computed on demand (be it when the dump method accesses them, or when something else, e.g. unit tests, do). No idea if/how that can be done in Python. The fallback would be to simply do all that work in a dedicated method (much alike your current configure), that could be called either by hand (e.g. for unit testing) or from the dump method (conditionally, iff. the expected data wasn’t computed yet — memoization would make this nicer, BTW).

In any case, feeding the computed data into the template engine, to get a string as a result, should IMO be done in the dump or __str__ method, since that’s very much part of their job as I see it.

If you want to keep the fancy __str__ trick, then one would instead write:

    job_file.write(str(job_config))

#13 Updated by bertagaz 2015-03-20 12:14:28

intrigeri wrote:
> Please put that script into some appropriate Git repo, to ease the next rounds of review.

I agree a git repo would be nice, I just have hard time finding where it belongs. I see two possibilities:

# In the puppet lizard manifests repo
# In the Tails pythonlib repo, in a `bin/` subdir

2. sounds more meaningful, as it would help to track compatibility between the pythonlib and the script.

> First of all, the interface between the library code DrWhax is writing (Feature #8657) and the program you’re writing here was designed and specified at some point. Where can I find the documented outcome of that work? Maybe DrWhax still has it locally and never published it anywhere? My memory isn’t that good, and having the big picture’s documentation handy would help me remember what we’re trying to do here, and how.

Right, I don’t have a copy of it myself. Will ping DrWhax for that.

> Anyway, here’s a bunch of initial comments :)
>
> First, the general idea seems totally sane to me. IIRC that’s indeed what we want.
>
> I’m not convinced by the way the “template” (which actually isn’t really a template, since it only contains the static parts, and not the bits we’ll customize on a per-branch basis) is loaded, and then incrementally fed with data. I’m more used to the good old:

Renamed as `snippet` in a more recent version yet to push somewhere.

> # compute and stash the needed data;
> # load a template written in some template language (whatever you prefer, as long as its special chars don’t conflict too much with the actual template content we want, that’s itself in YAML; it would be too painful if we have to escape too much stuff);
> # pass the stashed data to the template engine, that will return something useful.
>
> … which separates computing/stashing data from feeding it into the template engine.

I agree the template (now snippet) string is not really a template in a sense where it’s the base of a job yaml on which we append the missing parts to get a full job yaml file. What is closer to a template here is the BuildTailsISOJob class by itself, which internal representation fits a yaml job one as per the yaml python module, as it has the ability to dump an object as a yaml file (see the python yaml module documentation for the object <-> yaml binding).

Appending in this context is far more easy than using a full templating system. Keep in mind that the BuildTailsISOJob object has mainly one attribute, `self.job`, which is a dict of the job yaml first level parameters. Its is not really an object with as many attributes as parameters in the yaml file, there’s not really a full recursive introspection.

Using a templating system will also bring more dependencies and as you say will probably add a lot of complexity because of the special chars. We don’t have that much parameters to change in the job yaml file, so I thought appending would help in having a KISS script.

> I also have lots of comments on your OO design, if you’re interested. Here are a few ones about the most important bits.
>
> E.g. it feels wrong that the class constructor doesn’t take the branch name as an argument, since that’s precisely the most important bit that parameterizes its behaviour. If we wanted to reuse the same object to generate job configuration for several branches in a row, then it would make sense to keep the current design, but 1. this object’s construction is cheap; and 2. the current code does not reuse the object => IMO it’s not worth having to manage internal state and moving pieces. KISS! :)
>
> So, I think the constructor should take the following parameters:
>
> * the branch name (compulsory), stored as a class attribute;
> * optionally, the template data as a string (not turned by yaml.safe_load into a data structure), also stored as a class attribute; this attribute should have a default value (the one you currently store in a constant outside of the class), and merely allow overriding it by passing a different one to the class constructor.

Ack, makes sense.

But there’s a limitation there. We can’t really add attributes to this class as we want, or they’ll end up in the yaml output. Also if we add some that aren’t expected in the yaml, we’d break the object <-> yaml binding. But we can use local variables in the class.

> Also, once instantiated, an object of this class is currently useless, in an incomplete state, unless one calls the configure method, that is currently used to compute everything we need and store it in the job class attribute. This feels wrong to me. The way I would love to use this class is:
>
> […]
>
> And then, you can decide whether you do the actual work of computing data in the constructor, or in the dump method. The way I would do it in my preferred OO framework is with a set of lazy attributes (name, git_config and so on), so that the needed data is computed on demand (be it when the dump method accesses them, or when something else, e.g. unit tests, do). No idea if/how that can be done in Python. The fallback would be to simply do all that work in a dedicated method (much alike your current configure), that could be called either by hand (e.g. for unit testing) or from the dump method (conditionally, iff. the expected data wasn’t computed yet — memoization would make this nicer, BTW).
>
> In any case, feeding the computed data into the template engine, to get a string as a result, should IMO be done in the dump or __str__ method, since that’s very much part of their job as I see it.

As said above, we can’t really write e.g `self.job.name` and use fancy setters for job attributes.
I guess it makes more sense to have the BuildTailsISOJob instances being a full representation of the job yaml right at the initialization, rather than in a class method. So I believe we should configure it from branch in the `init` rather than `str` method. After all, `str` is just the string representation of the object.

#14 Updated by intrigeri 2015-03-20 22:42:22

> I agree a git repo would be nice, I just have hard time finding where it belongs. I see two possibilities:

> # In the puppet lizard manifests repo
> # In the Tails pythonlib repo, in a `bin/` subdir

> 2. sounds more meaningful, as it would help to track compatibility between the pythonlib and the script.

I dislike starting to stuff scripts into a library repo for various reasons. And the compatibility argument feels like a fine recipe for encouraging bad practices to me: a library implements an API, and should avoid breaking it (possibly by providing a transition period with deprecation warnings) so that we’re encouraged to use it more, and in turn to put more useful stuff in it. If we start shipping consumers of that API in the very same repo, then it seems clear to me that we won’t be able to guarantee anything wrt. API stability, and in the end that “library” will simply be a (nicely factored out) part of the script you’re writing. As long as the only code in that lib is about Git branches stuff your script needs, this may be just fine. But I hope we’ll put more stuff in there as we go, and then API stability starts to be important. E.g. we’ll soon (hopefully) have another consumer for that library, namely the Git branches deletion script.

I also happen to dislike the Puppet lizard manifests repo idea: it’s currently private, won’t be published soon (and actually, if/when we start using Hiera, we may actually not need to publish it at all), and of course there’s no reason why the code you’re writing should be hosted in a private place.

Now, this topic raises the question of where we’re actually going to run this code. Apparently we can’t use multiple source repos for jenkins-job-builder, especially since we want it delete any existing job that it doesn’t know about. So, the generated jobs need to land in our existing jenkins-jobs repo somehow. Also, as I see it we should track in a VCS the generated jobs, as a way to easily extract stats later on, and to debug stuff as needed. So we should have a cronjob that:

  • runs git rm generated_*.yaml in our jenkins-jobs repo
  • generates the jobs we want to generated_*.yaml
  • runs git add generated_*.yaml in our jenkins-jobs repo
  • commits and pushes

This cronjob could run in basically any of our VMs that we trust enough to have read-write access to our jenkins-jobs repo. Possibly jenkins.lizard (probably the most logical option), possibly elsewhere, I’m not sure.

IMO this calls for creating a tails::jenkins::jobs_generator Puppet class in our tails module, that could host this cronjob and the script you’re writing. That class could be declared on jenkins.lizard or elsewhere more appropriate.

Makes sense? What do you think?

(I’ll look into your apparently in-depth answer to my code review later.)

#15 Updated by bertagaz 2015-03-21 10:22:05

intrigeri wrote:
> > I agree a git repo would be nice, I just have hard time finding where it belongs. I see two possibilities:
>
> > # In the puppet lizard manifests repo
> > # In the Tails pythonlib repo, in a `bin/` subdir
>
> > 2. sounds more meaningful, as it would help to track compatibility between the pythonlib and the script.
>
> I dislike starting to stuff scripts into a library repo for various reasons. And the compatibility argument feels like a fine recipe for encouraging bad practices to me: a library implements an API, and should avoid breaking it (possibly by providing a transition period with deprecation warnings) so that we’re encouraged to use it more, and in turn to put more useful stuff in it. If we start shipping consumers of that API in the very same repo, then it seems clear to me that we won’t be able to guarantee anything wrt. API stability, and in the end that “library” will simply be a (nicely factored out) part of the script you’re writing. As long as the only code in that lib is about Git branches stuff your script needs, this may be just fine. But I hope we’ll put more stuff in there as we go, and then API stability starts to be important. E.g. we’ll soon (hopefully) have another consumer for that library, namely the Git branches deletion script.

Makes sense. Sure the compatibility arg is not a good one. It’s just that it happens that library repos ship a bin/ folder with scripts, so I was wondering. But I get your point of view.

> I also happen to dislike the Puppet lizard manifests repo idea: it’s currently private, won’t be published soon (and actually, if/when we start using Hiera, we may actually not need to publish it at all), and of course there’s no reason why the code you’re writing should be hosted in a private place.

My bad, I’ve been fuzzy when talking about the specific lizard manifest, when I was thinking about “somewhere in a puppet manifest”.

> Now, this topic raises the question of where we’re actually going to run this code. Apparently we can’t use multiple source repos for jenkins-job-builder, especially since we want it delete any existing job that it doesn’t know about. So, the generated jobs need to land in our existing jenkins-jobs repo somehow. Also, as I see it we should track in a VCS the generated jobs, as a way to easily extract stats later on, and to debug stuff as needed.

Yes, and considering our jobs are automatically deployed when we push on the jenkins-jobs repo, we sort of have to use it to store our automatically generated jobs. Well, we could maybe use a git submodule and all, but that sounds over-engineering to me, and brings some fragility in the deployment probably.

> So we should have a cronjob that:
>
> * runs git rm generated_*.yaml in our jenkins-jobs repo
> * generates the jobs we want to generated_*.yaml
> * runs git add generated_*.yaml in our jenkins-jobs repo
> * commits and pushes
>
> This cronjob could run in basically any of our VMs that we trust enough to have read-write access to our jenkins-jobs repo. Possibly jenkins.lizard (probably the most logical option), possibly elsewhere, I’m not sure.
>
> IMO this calls for creating a tails::jenkins::jobs_generator Puppet class in our tails module, that could host this cronjob and the script you’re writing. That class could be declared on jenkins.lizard or elsewhere more appropriate.
>
> Makes sense? What do you think?

You’re fast, that’s the next question I wanted to raise. :)

We have IMO two options:

# run the script with a cronjob
# run the script in a git hook on the Tails repo on puppet-git

I’m not sure which one is the best option, which one integrates best with jenkins.

I agree to start with a cronjob. If we make the script generic enough (having two arguments: a path to a jenkins-job repo, a path to a Tails repo), we can later switch to the hook option if the cronjob one raises issues anyway.

> (I’ll look into your apparently in-depth answer to my code review later.)

No rush, take your time.

#16 Updated by bertagaz 2015-03-30 12:14:22

  • % Done changed from 10 to 20

The code has been published in the Tails puppet module (https://git-tails.immerda.ch/puppet-tails/), as `files/jenkins/master/generate_build_tails_iso_jobs`, and reviewed live.

It is now using the Tails python lib written in Feature #8657 to get the list of branches (in commit 3231996). This can help to test the library code while writing it.

To do so, one just have to

cp -a /path/to/tails_pythonlib/tailslib /usr/local/lib/python3/dist-packages/

To run the script, one has to pass two arguments:

./generate_build_tails_iso_jobs /path/to/tails.git/.git /path/to/jenkins-jobs/output

DrWhax, at line 72 of the script, you’ll find an example of what it expects as input data and format.

#17 Updated by bertagaz 2015-04-01 13:04:29

  • Target version changed from Tails_1.3.2 to Tails_1.4

#18 Updated by intrigeri 2015-05-09 02:06:52

  • Target version changed from Tails_1.4 to Tails_1.4.1

Postponing.

#19 Updated by intrigeri 2015-05-29 11:36:55

  • blocks #8668 added

#20 Updated by intrigeri 2015-06-18 07:32:41

What’s the status here? Is the code ready to be reviewed?

#21 Updated by bertagaz 2015-06-18 08:26:52

  • Assignee changed from bertagaz to intrigeri

intrigeri wrote:
> What’s the status here? Is the code ready to be reviewed?

As stated in note 16:

bertagaz wrote:
> The code has been published in the Tails puppet module (https://git-tails.immerda.ch/puppet-tails/), as `files/jenkins/master/generate_build_tails_iso_jobs`, and reviewed live.

AFAIK, this ticket is concerning the generate_build_tails_iso_jobs that you have reviewed already when we were in the same room. The rest of the code (cron wrapper and inclusion in a manifest) is being implemented in Feature #8658 and its branch in the puppet-tails git repo. It still needs a bit of work though.

So maybe this ticket can be considered as done I guess.

#22 Updated by intrigeri 2015-06-18 08:35:27

> So maybe this ticket can be considered as done I guess.

Seems to make sense. Note that it’s marked as blocked by Feature #8657, which is itself blocked by Feature #8655 => I’ll let you update all of those together.

#23 Updated by intrigeri 2015-06-18 08:35:35

  • Assignee changed from intrigeri to bertagaz

#24 Updated by bertagaz 2015-06-18 12:13:16

  • Status changed from In Progress to Resolved
  • Assignee deleted (bertagaz)
  • % Done changed from 20 to 100

Ok, I guess we’re at the stage where we can consider this tickets closed, and (hopefully not) reopen some if we hit some bugs.

#25 Updated by sajolida 2015-08-05 09:42:32

  • Due date set to 2015-07-15