Bug #12579

Feature #5630: Reproducible builds

reproducibly_build_Tails_ISO_* Jenkins jobs are broken

Added by intrigeri 2 months ago. Updated about 2 months ago.

Status: Resolved
Start date: 05/22/2017
Priority: Normal
Due date:
Assignee: -
% Done: 100%
Category: Continuous Integration
Target version: Tails_3.0
QA Check: Pass
Blueprint:
Feature Branch:
Easy:
Type of work: Sysadmin
Affected tool:

Description

It would be nice to have CI again for reproducible builds, given we would like 3.0 to be reproducible (BTW I'm going to create a similar job for feature/stretch).

See e.g. https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-5630-deterministic-builds/1/console: it seems that mv tails-* build-artifacts/ should be adjusted to the place where artifacts now land.
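The fix would presumably be a one-line change in the wrapper script. A minimal sketch, where SOURCE_DIR stands in for wherever the Vagrant-based build now leaves its artifacts (hypothetical name, not the job's actual variable; a throw-away directory is used here so the snippet is self-contained):

```shell
#!/bin/sh
set -eu

# Demo of the adjusted "mv" step, using a throw-away directory.
WORK=$(mktemp -d)
SOURCE_DIR="$WORK/vagrant/artifacts"       # hypothetical new artifact location
ARTIFACTS_DIR="$WORK/build-artifacts"

mkdir -p "$SOURCE_DIR" "$ARTIFACTS_DIR"
touch "$SOURCE_DIR/tails-demo.iso"         # stand-in for the built ISO

# Old form assumed artifacts in $PWD:  mv tails-* build-artifacts/
# Adjusted form picks them up from their new location:
mv "$SOURCE_DIR"/tails-* "$ARTIFACTS_DIR"/

ls "$ARTIFACTS_DIR"
rm -rf "$WORK"
```
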


Related issues

Related to Tails - Bug #12599: /var/lib/libvirt/images gets filled on isobuilders Resolved 05/25/2017

History

#1 Updated by intrigeri 2 months ago

Two notes:

  • See commit 1b319e879c50eda576d4971f4521b164e477ac5e in puppet-tails.
  • I raised the diffoscope options because of this result which didn't sound meaningful: https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-5630-deterministic-builds/3/artifact/build-artifacts/tails-diffoscope.html. I didn't see other results before though.

#2 Updated by intrigeri 2 months ago

  • Subject changed from reproducibly_build_Tails_ISO_feature-5630-deterministic-builds Jenkins job is broken to reproducibly_build_Tails_ISO_* Jenkins job is broken

#3 Updated by intrigeri 2 months ago

  • Subject changed from reproducibly_build_Tails_ISO_* Jenkins job is broken to reproducibly_build_Tails_ISO_* Jenkins job are broken

#4 Updated by intrigeri 2 months ago

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10

intrigeri wrote:

it seems that mv tails-* build-artifacts/ should be adjusted to the place where artifacts now land.

At least that part has been fixed :)

#5 Updated by bertagaz 2 months ago

intrigeri wrote:

Two notes:

  • See commit 1b319e879c50eda576d4971f4521b164e477ac5e in puppet-tails.

I raised the diffoscope options because of this result which didn't sound meaningful: https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-5630-deterministic-builds/3/artifact/build-artifacts/tails-diffoscope.html. I didn't see other results before though.

Yes, I've seen that. I'm a bit surprised, as the script is run with set -e; I'll work around that. It also kinda triggers a memory of you complaining during the sprint about the diffoscope version in Debian. I wonder if we should try the one in experimental, whose changelog contains an entry mentioning Tails (saying it's now faster for us).
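The usual way to tolerate diffoscope's nonzero exit under set -e is to capture the status explicitly: diffoscope exits 0 when the inputs match, 1 when they differ, and >1 on real errors. A sketch of the pattern, with `false` (exit 1) standing in for a diffoscope run on differing ISOs:

```shell
#!/bin/sh
set -eu

# "cmd || status=$?" records the exit status without triggering -e.
status=0
false || status=$?   # in the real job: diffoscope --html ... a.iso b.iso

if [ "$status" -gt 1 ]; then
    # >1 means diffoscope itself failed, not that the inputs differ
    echo "diffoscope itself failed (exit $status)" >&2
    exit "$status"
elif [ "$status" -eq 1 ]; then
    echo "images differ; see the HTML report"
else
    echo "images are identical"
fi
```
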

it seems that mv tails-* build-artifacts/ should be adjusted to the place where artifacts now land.

Yes, but I still need to do some polishing here.

#6 Updated by intrigeri 2 months ago

Yes, I've seen that. I'm a bit surprised, as the script is run with set -e; I'll work around that. It also kinda triggers a memory of you complaining during the sprint about the diffoscope version in Debian. I wonder if we should try the one in experimental, whose changelog contains an entry mentioning Tails (saying it's now faster for us).

Yes, please :)

#7 Updated by intrigeri 2 months ago

I raised the diffoscope options because of this result which didn't sound meaningful: https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-5630-deterministic-builds/3/artifact/build-artifacts/tails-diffoscope.html.

Note that in most cases, a more complete binary diff of the ISO file itself provides essentially no value: the useful info will likely be about the contents of the ISO and SquashFS. This output does feel incomplete, but I doubt raising the diff limits will fix it (I might be wrong, though).

#9 Updated by bertagaz 2 months ago

intrigeri wrote:

Here's a different and interesting failure mode: https://jenkins.tails.boum.org/job/reproducibly_build_Tails_ISO_feature-stretch/18/console

It seems to have happened a few times already, as indicated in the Jenkins build logs. I bet the Vagrant box is paused at some point, and the probable cause is lack of disk space in either /var/lib/jenkins or /var/lib/libvirt/images. The latter seems most likely, considering we host more baseboxes since the recent change in #12409#note-34. I'll have a look.

#10 Updated by bertagaz 2 months ago

intrigeri wrote:

Yes, I've seen that. I'm a bit surprised, as the script is run with set -e; I'll work around that. It also kinda triggers a memory of you complaining during the sprint about the diffoscope version in Debian. I wonder if we should try the one in experimental, whose changelog contains an entry mentioning Tails (saying it's now faster for us).

Yes, please :)

I've installed it by hand on isobuilder2 to test it. This leads to two conclusions: we'll need more space in the system partition, as this version pulls in a lot more packages, and we'll probably need to mount /tmp/ as a tmpfs, as this version fails to run for lack of disk space, as shown here.

#11 Updated by bertagaz 2 months ago

  • Related to Bug #12599: /var/lib/libvirt/images gets filled on isobuilders added

#12 Updated by intrigeri 2 months ago

we'll need more space in the system partition, as this version pulls a lot more packages,

ACK

and we'll probably need to mount /tmp/ as a tmpfs, as this version fails to run because it lacks disk space, […]

Can't we point its TMPDIR to some place that already has enough disk space, e.g. in the workspace of the current Jenkins job?

Rationale: I'd rather not invest RAM into this yet — we're short on RAM, and these jobs don't run that often, so most of the time the added memory would be wasted. If we ever need to optimize I/O for diffoscope, we can reconsider (as part of #11680), but let's make it work first, and think about making it faster later, if needed.
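Since diffoscope is written in Python, its scratch files go through the tempfile module, which does honor TMPDIR. Pointing it at the job workspace could look like this sketch ($WORKSPACE is Jenkins' standard variable; the python3 line just confirms where temp files would land, and would be the diffoscope invocation in the real job):

```shell
#!/bin/sh
set -eu

# Put temp files somewhere roomy instead of the small /tmp.
# $WORKSPACE is set by Jenkins; $PWD is a fallback for running by hand.
TMPDIR="${WORKSPACE:-$PWD}/tmp"
export TMPDIR
mkdir -p "$TMPDIR"

# Confirm that Python (and hence diffoscope) picks up TMPDIR;
# in the real job this line would be the diffoscope run.
python3 -c 'import tempfile; print(tempfile.gettempdir())'
```
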

#13 Updated by bertagaz 2 months ago

intrigeri wrote:

Can't we point its TMPDIR to some place that already has enough disk space, e.g. in the workspace of the current Jenkins job?

Rationale: I'd rather not invest RAM into this yet — we're short on RAM, and these jobs don't run that often, so most of the time the added memory would be wasted. If we ever need to optimize I/O for diffoscope, we can reconsider (as part of #11680), but let's make it work first, and think about making it faster later, if needed.

I'll investigate whether diffoscope respects TMPDIR, but note that this doesn't necessarily mean having to buy more RAM: we already have around 14G of it assigned to the isobuilders, which isn't used while diffoscope runs, so we may not have to add any. But I agree my proposal may be overkill anyway. Let's try yours.

#14 Updated by intrigeri 2 months ago

I'll investigate whether diffoscope respects TMPDIR

IIRC it does but I might be confused :)

but note that this doesn't necessarily mean having to buy more RAM: we already have around 14G of it assigned to the isobuilders, which isn't used while diffoscope runs, so we may not have to add any.

Good news :)

But I agree my proposal may be overkill anyway. Let's try yours.

Well, with this info in hand: whatever, pick the one that's easiest to implement :)

#15 Updated by bertagaz 2 months ago

  • Assignee changed from bertagaz to intrigeri
  • % Done changed from 10 to 50
  • QA Check set to Ready for QA

I've pushed fixes in puppet-tails' master branch (referencing this ticket) that install diffoscope from experimental on all isobuilders and make it so that the build fails if diffoscope doesn't report success. I think that was the last issue of this ticket; the others (disk space issues) are already tracked by #12574, #12595 or #12599, so let's put this ticket RfQA.

#16 Updated by bertagaz 2 months ago

I forgot to mention it has run there already

#17 Updated by intrigeri 2 months ago

I forgot to mention it has run there already

\o/

#18 Updated by intrigeri 2 months ago

  • QA Check changed from Ready for QA to Info Needed

I've pushed fixes in puppet-tails' master branch (referencing this ticket),

Great, thanks :)

that installs diffoscope from experimental on all isobuilders

  • I've pushed commit:ebb0b29 on top.
  • Why do we need to pin all packages from experimental to 100?
  • A few days ago I also did commit 26353652539a31902734a0ab19386c12e875a131 in the jenkins-jobs repo, but that's not enough for the --html-dir to be archived. So I did 42738a9 there again. If that doesn't work either, I'll simply do --html-dir "${ARTIFACTS_DIR}". I'll track & handle this; not a blocker for this ticket.

and make it so that the build fails if diffoscope doesn't report success.

Looks great.

I think that was the last issue of this ticket; the others (disk space issues) are already tracked by #12574, #12595 or #12599, so let's put this ticket RfQA.

OK! Do you think we can close this ticket once the single question above is addressed (we can still reopen it if we notice issues specific to these jobs, i.e. ones that don't happen on build_Tails_ISO_*)? Or should we mark it as blocked by the tickets that track the root causes of failures, so we don't close it until it's fully resolved in practice?

#19 Updated by intrigeri 2 months ago

  • Assignee changed from intrigeri to bertagaz
  • % Done changed from 50 to 70

#20 Updated by intrigeri 2 months ago

I've also pushed [master ee1d87d] "Reproducible ISO builds: clean old baseboxes before building (refs: #12579)" to jenkins-jobs.git; let's see if this helps. Let me know if there was a good reason not to do it, and sorry if that's the case!
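A sketch of what "clean old baseboxes before building" could amount to: keep the newest basebox image and delete the rest. The directory and the tails-builder-* naming are assumptions for illustration, not the job's actual values:

```shell
#!/bin/sh
set -eu

# Hypothetical basebox location; adjust to the real storage pool.
BASEBOX_DIR="${BASEBOX_DIR:-/var/lib/libvirt/images}"

# List baseboxes newest-first, skip the first (current) one,
# and remove the remainder.  "xargs -r" does nothing on empty input.
ls -1t "$BASEBOX_DIR"/tails-builder-*.img 2>/dev/null \
    | tail -n +2 \
    | xargs -r rm -f
```
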

#21 Updated by bertagaz about 2 months ago

  • Assignee changed from bertagaz to intrigeri
  • QA Check changed from Info Needed to Ready for QA

intrigeri wrote:

  • Why do we need to pin all packages from experimental to 100?

I didn't know there was a default pinning for experimental. I've pushed a commit that will remove it from the isobuilders; I'll remove these lines later.
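For reference, Debian's experimental suite is marked NotAutomatic, so its packages already get APT pin priority 1 by default and are never pulled in unless asked for; a blanket pin to 100 is therefore unnecessary. Raising only diffoscope could look like this sketch (the file path is illustrative):

```
# /etc/apt/preferences.d/diffoscope  (illustrative path)
# experimental already defaults to Pin-Priority 1 (NotAutomatic),
# so all other packages stay put; only diffoscope is raised
# enough to be installable from there.
Package: diffoscope
Pin: release a=experimental
Pin-Priority: 500
```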

I think that was the last issue of this ticket; the others (disk space issues) are already tracked by #12574, #12595 or #12599, so let's put this ticket RfQA.

OK! Do you think we can close this ticket once the single question above is addressed (we can still reopen it if we notice issues specific to these jobs, i.e. ones that don't happen on build_Tails_ISO_*)? Or should we mark it as blocked by the tickets that track the root causes of failures, so we don't close it until it's fully resolved in practice?

I think we can close it, and re-open when/if we stumble upon a new issue specific to the reproducible builds setup in Jenkins (or open a new one).

#22 Updated by intrigeri about 2 months ago

The last few builds failed, probably due to my tweaks wrt. diffoscope's --html-dir artifacts, so I've dropped them. Let's see how it goes now.

#23 Updated by intrigeri about 2 months ago

The last failure I've seen was caused by #12618. Trying to build again and see if I can eventually see this job fail for a good reason.

#24 Updated by intrigeri about 2 months ago

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 70 to 100
  • QA Check changed from Ready for QA to Pass

OK, I haven't seen any more failures specific to these jobs, although I find it suspicious that 2 of the 3 occurrences of #12618 happened with them. Closing for now anyway.
