Bug #12629

Feature #5630: Reproducible builds

Document reproducible release process

Added by u 6 months ago. Updated 9 days ago.

Status: In Progress
Priority: Elevated
Assignee:
Category: -
Target version:
Start date: 06/02/2017
Due date:
% Done: 50%
QA Check: Info Needed
Feature Branch:
Type of work: Contributors documentation
Blueprint:
Easy:
Affected tool:

Description

We need to update the release process documentation to take care of reproducible ISOs and IUKs.


Related issues

Duplicated by Tails - Feature #12628: Draft a "user" (aka. RM) story for the reproducible release process Duplicate 06/02/2017

Associated revisions

Revision 2ca22f37 (diff)
Added by intrigeri 6 months ago

Release process: fetch the ISO from Jenkins and ensure it matches the signature created by the release manager (refs: #12629).

Revision 69c1263b
Added by intrigeri 5 months ago

Merge branch 'doc/12629-reproducible-release-process' (refs: #12629).

Revision 412b64f4 (diff)
Added by anonym about 2 months ago

Encode our requirements on reproducibility in the release process.

Will-fix: #12629

Revision 7479d707 (diff)
Added by anonym about 2 months ago

Let's be real and don't try to protect against "rouge RMs".

For example: at the moment a rouge RM could upload a compromised .deb
to our custom APT repo without our process being able to identify it,
so let's not even pretend that we are working towards "rouge RM
resistance" just yet.

Refs: #12629

Revision 0ad32beb (diff)
Added by anonym about 1 month ago

Release process: make sure the reproduced ISO/IUKs are what we release.

Will-fix: #12629

History

#1 Updated by u 6 months ago

  • Related to Feature #12628: Draft a "user" (aka. RM) story for the reproducible release process added

#2 Updated by intrigeri 6 months ago

  • Status changed from New to Confirmed

#3 Updated by intrigeri 6 months ago

  • Target version set to Tails_3.2

I see two main aspects here, that I'll discuss first for ISOs and then for IUKs.

For the ISO image:

  • ensure at least N entities produced the same ISO: developers' laptops? CI infra? where do we set the bar?
  • avoid having to upload the ISO at release time, and while we're at it, fix the "Upload images" section of the release process doc (AFAIK no RM has actually followed it as-is for years); so, instead of pretending we seed the ISO and then copy it from bittorrent.lizard to rsync.lizard, we should probably instead:
    • scp the detached signature to rsync.lizard
    • ssh rsync.lizard and wget the ISO built by Jenkins
    • verify the detached signature
    • scp the Torrent to bittorrent.lizard
    • ssh bittorrent.lizard, wget the ISO built by Jenkins, add to Transmission
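
The "verify the detached signature" step above could be sketched as follows. This is only a minimal sketch, not the actual release process tooling: the function name, the file names, and the injectable `run` parameter are hypothetical, and it assumes `gpg` is installed with the signing key already in the keyring.

```python
import subprocess
from typing import Callable, Sequence


def verify_detached_signature(
    iso_path: str,
    sig_path: str,
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> bool:
    """Return True if gpg accepts sig_path as a valid signature of iso_path.

    `run` is injectable (a hypothetical design choice) so that the logic
    can be exercised without a real keyring.
    """
    # gpg --verify takes the detached signature first, then the signed file.
    cmd = ["gpg", "--verify", sig_path, iso_path]
    # gpg exits with status 0 only when the signature is good.
    return run(cmd) == 0
```

On rsync.lizard this would be called once the ISO built by Jenkins has been fetched, with the signature that the RM copied over beforehand.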

For IUKs:

  • "ensure at least N entities produced the same" still applies, modulo we don't build them on our CI so only developers can reproduce them;
  • regarding publication, it's a bit more subtle since we don't build them on our CI so one needs to upload them (which apparently is not documented yet BTW).

Out of personal interest I might give "avoid having to upload the ISO" a try during the 3.0 release process, in which case I'll probably draft the needed changes in a branch; then anonym can test & polish them.

#4 Updated by intrigeri 6 months ago

  • Status changed from Confirmed to In Progress
  • % Done changed from 0 to 10
  • Feature Branch set to doc/12629-reproducible-release-process

Draft written and tested for the ISO publication. I'll push once I've tested the bits about the IUKs and Torrent.

#5 Updated by intrigeri 6 months ago

Pushed! My branch addresses the "ensure at least N entities produced the same" part, but the other part is left as an exercise to the reader^W^Wanonym :)

#6 Updated by intrigeri 5 months ago

  • Related to deleted (Feature #12628: Draft a "user" (aka. RM) story for the reproducible release process)

#7 Updated by intrigeri 5 months ago

  • Duplicated by Feature #12628: Draft a "user" (aka. RM) story for the reproducible release process added

#8 Updated by intrigeri 5 months ago

  • Feature Branch deleted (doc/12629-reproducible-release-process)

The updated doc on the branch has worked fine during the 3.0.1 release process, so I'm merging it. This is not everything this ticket is about though.

#9 Updated by intrigeri 3 months ago

  • Priority changed from Normal to Elevated

It would be nice if this was drafted in time to be tested while releasing 3.2~rc1, so we can polish it as needed and test a final version during the 3.2 release process.

#10 Updated by intrigeri 3 months ago

anonym plans to do this post-3.2-freeze.

#11 Updated by anonym about 2 months ago

  • Assignee changed from anonym to intrigeri
  • % Done changed from 10 to 50
  • QA Check set to Ready for QA

I'm a bit confused. I followed the updated release process both for 3.0.1, 3.2~rc1 and 3.2, and it (beautifully!) does the ISO/IUK publication parts, but contrary to #12629#note-5 it does not address the "ensure at least N entities produced the same ISO/IUK" part (well, it verifies that jenkins built the same ISO, but I don't think that counts! :)). In fact, AFAICT, that is the only part that is left for this ticket, except for some sort of contingency plan in case reproduction fails. If I have missed anything else, please let me know!

For the first issue, I am going to be conservative and suggest that we simply say that N=2 (note: for ISOs we also implicitly require that Jenkins reproduces the ISO due to the publishing instructions, but this is orthogonal to our requirements on N, imho). As for the requirements of these N persons, we could require them to be members of tails-rm@, but that is a bit limiting (there's only three of us currently). Same with tails@ unless we make more of its members able to build Tails. If we go beyond that we don't really have a security policy to rely on. Hm. Well, for now I don't think I dare proposing us to require anything less than tails@ membership -- I'll lobby for more of them having Tails build setups ready for Tails 3.3! :) Note that in practice that means the "participants" are the RM + one more tails@ member (since all possible RMs themselves are members of tails@).
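
To make the N=2 rule concrete, here is a minimal sketch of the check, assuming each participant reports the hash of what they built. The function name, the `eligible` set (standing in for tails@ membership), and the builder names are all hypothetical. Builders outside the eligible set, such as Jenkins, do not count towards N.

```python
def is_reproduced(
    hashes_by_builder: dict[str, str],
    rm: str,
    eligible: set[str],
    n: int = 2,
) -> bool:
    """True if at least n eligible builders, the RM included, got the RM's hash."""
    rm_hash = hashes_by_builder.get(rm)
    if rm_hash is None or rm not in eligible:
        return False
    # Count only eligible builders whose hash equals the RM's hash.
    matching = [
        builder
        for builder, digest in hashes_by_builder.items()
        if builder in eligible and digest == rm_hash
    ]
    return len(matching) >= n
```

With `n=2` this is satisfied by the RM plus one more eligible member, which matches the "participants" described above.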

For the second issue, the case where reproduction fails, I say, essentially: make a quick investigation. If the issue can be fixed cheaply, then rebuild. Otherwise, let's not delay the release, just make sure there's no backdoor, random bit flip, or something else serious. To me this seems to strike the right balance between taking reproducibility seriously (and benefiting from all its nice security implications) vs the cost for our users by delaying releases.

I went ahead and pushed my ready proposal straight to testing (so it will end up in master soon) since I thought it won't make things worse, at least. :) See 412b64f418a0261b199fce266b4b3fddcced65e2. Note the four levels of bullet points, oh yeah!!!! >:)

What do you think? I'm also asking for your opinions, Ulrike and bertagaz!

[BTW, Ulrike, my secret plan is to make you set up a build environment for Tails 3.3 so you can help reproduce it! :)]

#12 Updated by intrigeri about 2 months ago

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Ready for QA to Dev Needed

Hi!

I'm a bit confused. I followed the updated release process both for 3.0.1, 3.2~rc1 and 3.2, and it (beautifully!) does the ISO/IUK publication parts, but contrary to #12629#note-5 it does not address the "ensure at least N entities produced the same ISO/IUK"

Right, copy'n'paste error, sorry. I think I meant to paste "avoid having to upload the ISO at release time" instead, but that was 4 months ago, so well…

For the first issue, I am going to be conservative and suggest that we simply say that N=2 (note: for ISOs we also implicitly require that Jenkins reproduces the ISO due to the publishing instructions, but this is orthogonal to our requirements on N, imho). As for the requirements of these N persons, we could require them to be members of tails-rm@, but that is a bit limiting (there's only three of us currently). Same with tails@ unless we make more of its members able to build Tails. If we go beyond that we don't really have a security policy to rely on. Hm. Well, for now I don't think I dare proposing us to require anything less than tails@ membership -- I'll lobby for more of them having Tails build setups ready for Tails 3.3! :) Note that in practice that means the "participants" are the RM + one more tails@ member (since all possible RMs themselves are members of tails@).

OK, let's try this and we can adjust later if needed.

For the second issue, the case where reproduction fails, I say, essentially: make a quick investigation. If the issue can be fixed cheaply, then rebuild. Otherwise, let's not delay the release, just make sure there's no backdoor, random bit flip, or something else serious. To me this seems to strike the right balance between taking reproducibility seriously (and benefiting from all its nice security implications) vs the cost for our users by delaying releases.

I don't know about the "right balance" as it greatly depends on what we're trying to achieve, which is unclear to me: I have no idea what high-level goal you're trying to achieve with this doc and the doc itself seems both confused and confusing to me in this respect. Here you write "make sure there's no backdoor" and in 412b64f418a0261b199fce266b4b3fddcced65e2 you write "try to rule out that the RM has gone rouge by including a backdoor". But the way I understand this updated doc, both comparing hashes, "immediately compare the ISOs" and deciding whether the non-determinism matters are the RM's job. Besides, nothing seems to prevent the RM from actually releasing a different ISO than the one that other people built identically (or manually tested, by the way).

So let's please start by clarifying the goals; at first glance, either the bar must be set quite lower (one should check whether it still matches the set of goals and expected benefits we told the sponsor about though), or we need a different verification and decision-making process.

#13 Updated by anonym about 2 months ago

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:

I'm a bit confused. I followed the updated release process both for 3.0.1, 3.2~rc1 and 3.2, and it (beautifully!) does the ISO/IUK publication parts, but contrary to #12629#note-5 it does not address the "ensure at least N entities produced the same ISO/IUK"

Right, copy'n'paste error, sorry. I think I meant to paste "avoid having to upload the ISO at release time" instead, but that was 4 months ago, so well…

As long as you think I didn't miss anything you already had thought about, I am happy.

For the second issue, the case where reproduction fails, I say, essentially: make a quick investigation. If the issue can be fixed cheaply, then rebuild. Otherwise, let's not delay the release, just make sure there's no backdoor, random bit flip, or something else serious. To me this seems to strike the right balance between taking reproducibility seriously (and benefiting from all its nice security implications) vs the cost for our users by delaying releases.

I don't know about the "right balance" as it greatly depends on what we're trying to achieve, which is unclear to me: I have no idea what high-level goal you're trying to achieve with this doc and the doc itself seems both confused and confusing to me in this respect.

Like I said, the high-level goal is to "[benefit] from [reproducibility's] nice security implications". Looking at the "why" section of our blueprint, I believe the only one worth mentioning is: "independent verification that a build product matches what the source intended to produce ⇒ better resist attacks against build machines and developers". (I could also mention e.g. "No more bit flip[s]" since they theoretically could silently degrade security, but I think the other goal is enough.)

By requiring two tails@ members we have at least eliminated the single point of failure that could lead to Tails being backdoored through build system compromise. IMHO the only thing that would substantially improve this would be wide-spread involvement of third parties, but let's take it easy with that for now. :) As for "the cost for our users by delaying releases", I think it's pretty clear that from their PoV it is more important to get security updates than waiting for a fix for a proven non-malicious reproducibility problem.

Here you write "make sure there's no backdoor" and in 412b64f418a0261b199fce266b4b3fddcced65e2 you write "try to rule out that the RM has gone rouge by including a backdoor".

Your quote excludes the ":)" -- this was a poor attempt at self-deprecating humor, which I then started to take semi-seriously (after I wrote it, I actually went over my text and made a few changes that I thought would make the process more robust against a rogue RM without adding extra cost), and this only muddied the waters (sorry!).

While I do think that just having this process involving other trusted (and semi-paranoid! :)) people makes it harder for the RM to go rogue (or more likely: backdoor the ISO under legal duress), the process certainly isn't robust against a dishonest RM, and there are still other vectors that our current reproducibility model doesn't take into account (e.g. the packages in our custom APT repo). Let's forget about the "going rogue"/"legal duress" part, and re-focus towards "resist attacks against build machines". That is 7479d7070ae871fa894ca5b3eed448948ee0591a.

But the way I understand this updated doc, both comparing hashes, "immediately compare the ISOs" and deciding whether the non-determinism matters are the RM's job.

My idea was actually that everyone involved compares hashes, and in the event of a mismatch stays involved in the process, awaiting a plausible explanation from the RM that they can verify (possibly by involving e.g. another tails-rm@ member, who should be up for the task). Anyway, let's forget about this, and indeed make all this the RM's job, whom everyone just trusts.

Besides, nothing seems to prevent the RM from actually releasing a different ISO than the one that other people built identically (or manually tested, by the way).

Agreed, the RM could replay someone else's hash, so some care would have to be taken about the order in which these things are communicated (i.e. the RM must send it first). Anyway, let's forget about this!

So let's please start by clarifying the goals; at first glance, either the bar must be set quite lower (one should check whether it still matches the set of goals and expected benefits we told the sponsor about though), or we need a different verification and decision-making process.

Is the situation clearer/saner now?

#14 Updated by intrigeri about 2 months ago

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Ready for QA to Info Needed

Hi!

anonym:

intrigeri wrote:

Like I said, the high-level goal is to "[benefit] from [reproducibility's] nice security implications". Looking at the "why" section of our blueprint, I believe the only one worth mentioning is: "independent verification that a build product matches what the source intended to produce ⇒ better resist attacks against build machines and developers". (I could also mention e.g. "No more bit flip[s]" since they theoretically could silently degrade security, but I think the other goal is enough.)

I see.

By requiring two tails@ members we have at least eliminated the single point of failure that could lead to Tails being backdoored through build system compromise.

Agreed.

IMHO the only thing that would substantially improve this would be wide-spread involvement of third parties, but let's take it easy with that for now. :)

Right. See below for the cheapest such improvement I have in mind.

While I do think that just having this process involving other trusted (and semi-paranoid! :)) people makes it harder for the RM to go rogue (or more likely: backdoor the ISO under legal duress), the process certainly isn't robust against a dishonest RM, and there are still other vectors that our current reproducibility model doesn't take into account (e.g. the packages in our custom APT repo). Let's forget about the "going rogue"/"legal duress" part, and re-focus towards "resist attacks against build machines". That is 7479d7070ae871fa894ca5b3eed448948ee0591a.

Fair enough.

Bonus nitpicking: this does not fully achieve "resist attacks against build machines", as in theory a compromised RM's machine can replace the reproduced + tested ISO/IUK with other ones. All it takes is to 1. steal the smartcard PIN code while the RM is busy typing that PIN numerous times for signing UDFs, and seize this opportunity to sign other data while the smartcard is plugged; 2. steal the RM's SSH credentials; 3. upload replacement ISO + IUKs after we've reproduced+tested the genuine ones; 4. push replacement IDF and UDFs to Git using the stolen credentials (I doubt we would reliably notice that unless someone carefully verifies what happens in Git in and around the "merge new release to master" commit every time). Granted, that's quite theoretical but I think we should take highly sophisticated adversaries into account in this context. And granted too, our custom APT repo gives such adversaries much easier (and harder to detect) attack vectors at the moment. I'm not arguing in favour of trying to fix this remaining problem, I just want us to be super clear about what we think we're achieving here, both in our own minds and in our external communication :)

So to be extra clear, ignoring my nitpicking above, these two documented reasons "Why we want reproducible builds" are not achieved yet:

  • "the incentive for an attacker […] to compromise developers themselves, is lowered"
  • "In turn, this avoids the need to trust people (or software) who build the ISO we release, which in turn allows more people to get involved in release management work."

Let's keep this in mind when communicating the benefits to our users and when writing the design doc. It's not obvious as it differs from our currently documented stated goals. So I'd like to see a note about this, pointing here, on the corresponding 2-3 tickets. But wait, see my proposal below.

But the way I understand this updated doc, both comparing hashes, "immediately compare the ISOs" and deciding whether the non-determinism matters are the RM's job.

My idea was actually that everyone involved compares hashes, and in the event of a mismatch stays involved in the process, awaiting a plausible explanation from the RM that they can verify (possibly by involving e.g. another tails-rm@ member, who should be up for the task). Anyway, let's forget about this, and indeed make all this the RM's job, whom everyone just trusts.

OK, that makes it much clearer (as IMO the proposed doc does not correctly implement the idea you previously had in mind).

So let's please start by clarifying the goals; at first glance, either the bar must be set quite lower (one should check whether it still matches the set of goals and expected benefits we told the sponsor about though), or we need a different verification and decision-making process.

Is the situation clearer/saner now?

Yes, it's much clearer.

Now, I'd like to propose having a third-party (e.g. another Foundations Team member, i.e. most often myself) check, shortly after the release goes live, that the published ISO image matches both the published tag and what manual testers have tested. The additional work this requires is:

  • File a ticket about this for every release in advance (so we don't rely on the RM to file it… or not, due to being under duress or tired/sloppy).
  • When sending the call for manual testing, the RM attaches the detached signature.
  • The reproducer rebuilds the ISO from the tag and verifies it matches:
    • the published detached signature + hash found in the IDF
    • the detached signature previously emailed by the RM for manual testing
  • The reproducer rebuilds the IUKs and checks that their hash matches:
    • the published UDFs
    • what the RM pushed to the test channel for manual testing
  • Document the above.
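
The hash comparisons in the reproducer's steps above could be sketched like this. It is a minimal sketch under stated assumptions: the function names are hypothetical, SHA-256 is assumed as the hash, and the reference hashes are assumed to have already been extracted by hand from the IDF/UDFs and from what was sent out for manual testing. Checking the detached signatures themselves is a separate step, not covered here.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that a large ISO need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatches(rebuilt_path: str, references: dict[str, str]) -> list[str]:
    """Return the names of the references whose hash differs from the rebuilt file.

    `references` maps a label (e.g. "IDF", "test channel") to the hex hash
    published or communicated there. An empty result means everything matches.
    """
    actual = sha256_of_file(rebuilt_path)
    return [
        name
        for name, expected in references.items()
        if expected.lower() != actual
    ]
```

A non-empty result tells the reproducer exactly which published or pre-release reference disagrees with what they rebuilt from the tag.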

Cost:

  • The additional recurring work for reproducing seems quite small, and would be mostly on my plate. Some of it can be automated as I get bored doing it manually every time.
  • The initial Redmine + documentation work seems pretty small too. I can do it the first time I play the post-release reproducer role.
  • The additional work for the RM is limited to "attach the detached signature to an email once during each release process", which seems totally negligible compared to our RM'ing time budget.

Benefit: this addresses most of the concerns I've raised, and gives us part of the two goals you're proposing we drop, i.e. we don't have to trust the RM and their machine to actually publish the ISO + IUK that have been reproduced and manually tested. I am aware this does not fully protect against a corrupt RM person nor machine due to other, unrelated attack vectors (i.e. various trusted input nobody can easily verify), but at least 1. it makes the "trusted inputs → published artifacts" relationship verifiable, which feels huge to me (that's what reproducible builds are primarily about :) and 2. it clarifies what benefits we would get from enabling independent verification of currently trusted inputs in the future, which un-muddies the water a lot (it's not very useful to enable such independent verification if we still rely purely on the RM to triage verification results).

At first glance, the cost/benefit seems totally favorable to me. But I've dived too much into this right now to have a good perspective, so I think I'll need to sleep on it, take a step back, and look at the big picture in a few days again :)

What do you think?

#15 Updated by anonym about 2 months ago

  • Target version changed from Tails_3.2 to Tails_3.3

#16 Updated by intrigeri about 2 months ago

  • Blocks Feature #12356: Communicate about reproducible builds to users via a blog post added

#17 Updated by anonym about 1 month ago

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Info Needed to Ready for QA

intrigeri wrote:

Bonus nitpicking: this does not fully achieve "resist attacks against build machines", as in theory a compromised RM's machine can replace the reproduced + tested ISO/IUK with other ones. All it takes is to 1. steal the smartcard PIN code while the RM is busy typing that PIN numerous times for signing UDFs, and seize this opportunity to sign other data while the smartcard is plugged; 2. steal the RM's SSH credentials; 3. upload replacement ISO + IUKs after we've reproduced+tested the genuine ones; 4. push replacement IDF and UDFs to Git using the stolen credentials (I doubt we would reliably notice that unless someone carefully verifies what happens in Git in and around the "merge new release to master" commit every time). Granted, that's quite theoretical but I think we should take highly sophisticated adversaries into account in this context. And granted too, our custom APT repo gives such adversaries much easier (and harder to detect) attack vectors at the moment. I'm not arguing in favour of trying to fix this remaining problem, I just want us to be super clear about what we think we're achieving here, both in our own minds and in our external communication :)

So to be extra clear, ignoring my nitpicking above, these two documented reasons "Why we want reproducible builds" are not achieved yet:

  • "the incentive for an attacker […] to compromise developers themselves, is lowered"
  • "In turn, this avoids the need to trust people (or software) who build the ISO we release, which in turn allows more people to get involved in release management work."

I more or less reworked the whole text (0ad32beb9ee7422bfde0a513f1cc8af0341ea726), so I now think the two points above are achieved as far as hardware is concerned. I.e. I think the release process now resists an attacker compromising the RM's hardware (modulo it changing any of the trusted inputs, but the diff review should catch things modified in Tails' Git, and the APT parts are out of scope for now).

But the way I understand this updated doc, both comparing hashes, "immediately compare the ISOs" and deciding whether the non-determinism matters are the RM's job.

My idea was actually that everyone involved compares hashes, and in the event of a mismatch stays involved in the process, awaiting a plausible explanation from the RM that they can verify (possibly by involving e.g. another tails-rm@ member, who should be up for the task). Anyway, let's forget about this, and indeed make all this the RM's job, whom everyone just trusts.

OK, that makes it much clearer (as IMO the proposed doc does not correctly implement the idea you previously had in mind).

With my rewrite, it is now very clear.

Now, I'd like to propose having a third-party (e.g. another Foundations Team member, i.e. most often myself) check, shortly after the release goes live, that the published ISO image matches both the published tag and what manual testers have tested. [...]

I like it, but don't see why it has to be done by another Foundations Team member, nor why it should be done post release. So I've adapted your idea so the tails@ member does it before the release. Did I screw it up?

If you like this approach, I'd like u to test it with me for Tails 3.3.

#18 Updated by intrigeri about 1 month ago

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Ready for QA to Dev Needed

I more or less reworked the whole text (0ad32beb9ee7422bfde0a513f1cc8af0341ea726), so I now think the two points above are achieved as far as hardware is concerned. I.e. I think the release process now resists an attacker compromising the RM's hardware (modulo it changing any of the trusted inputs, but the diff review should catch things modified in Tails' Git, and the APT parts are out of scope for now).

Now, I'd like to propose having a third-party (e.g. another Foundations Team member, i.e. most often myself) check, shortly after the release goes live, that the published ISO image matches both the published tag and what manual testers have tested. [...]

I like it,

Cool :)

but don't see why it has to be done by another Foundations Team member,

Because we need someone who commits to do boring work regularly under tight time constraints. I think the only way to have that is to include it in Core work (it's almost exactly our working definition of Core work actually), and the simplest way to do that on the short term is to piggy-back on some existing role instead of creating a new one; I happened to pick Foundations Team but feel free to pick another one that fits better if you want, or to propose creating a new dedicated Core work role. I don't care much, as long as we have good enough means to rely on that commitment.

nor why it should be done post release.

Well, it's logically impossible to check "that the published ISO image matches both the published tag and what manual testers have tested" before it is released, isn't it?

If you like this approach, I'd like u to test it with me for Tails 3.3.

Nice, but as said above I'd rather not rely on non-formalized commitments for this in the long term. So either make it so the commitment is formalized somewhere, or fall back to the Foundations Team idea.

I've had a look at 0ad32beb9ee7422bfde0a513f1cc8af0341ea726 and (surprise!) I have a few comments:

  • A compromised RM's system can still publish a different ISO than the one that has been successfully reproduced by the TR, no? It seems that even with the pre-release "Verify the meta data pointing to the uploaded ISO and IUKs" step, our only protection against this implicitly lies in the fact that some people will monitor every Git commit on the master branch all the time, which is unreliable (nobody really does that consistently, e.g. I often skip merge commits and you sometimes don't revert spam when you push new stuff there, which suggests you just did git pull without checking the changes closely; and anyway, it's in nobody's job definition to do that currently). Hence the need to do the verification after the release, unless I missed something.
  • I'm worried about adding "Verify the meta data pointing to the uploaded ISO and IUKs" as a blocker in the release process. Historically we RMs have been pretty bad at giving a reliable ETA for such things, so I'm concerned that this adds stress on the TR who is supposed to be available, on short notice, for an unspecified amount of time. I'd rather see this happen post-release, which will relax everyone involved… and also increases the value of the verification, as explained in the previous bullet point.
  • The process depends on the RM explicitly triggering the verification, which can be blocked by hardware/system compromise. I'd rather have something that we know will happen even if the RM does not ask anyone anything (be it because of hardware/system compromise… or more trivially because in the real world, every RM manages to skip/miss/forget at least N% of the release process doc). I believe my proposal (Redmine tickets created in advance) is not affected by this problem, so I don't understand why we would instead implement a process that is affected.
  • "involve another RM" ← there's no other RM with time budgeted to do this work (or even awareness they are on-call that day), so I'd rather s/another RM/a Foundations Team member who is not the RM/; and then we need to add this to the Foundations Team role definition because it's added work/availability.
  • The part about IUKs refers to "solution or explanation the RM presents" but I can't see where the RM presents any such thing to the TR.
  • go to the "If something seemingly malicious is found" case for the ISO above points to text that got removed
  • typo in "reproducibiliy-followup" and in "release_process#reproducibiliy"
  • typo in "the the"

#19 Updated by anonym about 1 month ago

  • QA Check changed from Dev Needed to Info Needed

intrigeri wrote:

I more or less reworked the whole text (0ad32beb9ee7422bfde0a513f1cc8af0341ea726), so I now think the two points above are achieved as far as hardware is concerned. I.e. I think the release process now resists an attacker compromising the RM's hardware (modulo it changing any of the trusted inputs, but the diff review should catch things modified in Tails' Git, and the APT parts are out of scope for now).

Now, I'd like to propose having a third-party (e.g. another Foundations Team member, i.e. most often myself) check, shortly after the release goes live, that the published ISO image matches both the published tag and what manual testers have tested. [...]

I like it,

Cool :)

but don't see why it has to be done by another Foundations Team member,

Because we need someone who commits to do boring work regularly under tight time constraints. I think the only way to have that is to include it in Core work (it's almost exactly our working definition of Core work actually), and the simplest way to do that on the short term is to piggy-back on some existing role instead of creating a new one; I happened to pick Foundations Team but feel free to pick another one that fits better if you want, or to propose creating a new dedicated Core work role. I don't care much, as long as we have good enough means to rely on that commitment.

Ok. I am actively working against this, i.e. you and me becoming more inter-dependent [especially around release time], which is what your proposal means in practice. I'm also not intrigued at becoming blocked by your slow internet connection. :)

I think what I haven't managed to articulate yet is that I see the TR's work as part of QA, and e.g. manual testing is just as affected by what you say, and it seems to work. The release QA has the RM as fallback, which won't work in this case, but the Foundations Team seems like the only sane fallback, so let's go with that at least.

nor why it should be done post release.

Well, it's logically impossible to check "that the published ISO image matches both the published tag and what manual testers have tested" before it is released, isn't it?

The ISO image is generally published (read: uploaded) ~24h before we release (read: announce the release on the website), and the Git tag even before that. So I was referring to the possibility of the check happening during this ~24h window right before the release.

If you like this approach, I'd like u to test it with me for Tails 3.3.

Nice, but as said above I'd rather not rely on non-formalized commitments for this in the long term. So either make it so the commitment is formalized somewhere, or fall back to the Foundations Team idea.

I've had a look at 0ad32beb9ee7422bfde0a513f1cc8af0341ea726 and (surprise!) I have a few comments:

  • A compromised RM's system can still publish a different ISO than the one that has been successfully reproduced by the TR, no?

I'm not really sure what you mean by "publish" here but I think what you say is trivially true: theoretically, whenever we do the check, a compromised RM can publish a different ISO image just after the check.

It seems that even with the pre-release "Verify the meta data pointing to the uploaded ISO and IUKs" step, our only protection against this implicitly lies in the fact that some people will monitor every Git commit on the master branch all the time, which is unreliable (nobody really does that consistently, e.g. I often skip merge commits and you sometimes don't revert spam when you push new stuff there, which suggests you just did git pull without checking the changes closely; and anyway, it's in nobody's job definition to do that currently). Hence the need to do the verification after the release, unless I missed something.

I would argue that only checking post-release simply is too late:

  • if there's some trivial reproducibility problem, we have now lost the chance to cheaply skip the bad release and bump to an "emergency release" with the reproducibility fix. Hopefully this will be a rare occurrence, so the value of this point's argument is hopefully low.
  • now there's a window (start: Tails release; stop: post-release check) where we would happily distribute compromised images, because our process detects them late. Our only defense at this point is that Jenkins is not compromised in the same way as the RM's system. This is very serious, imho.

To me, the second point means we must have a pre-release check (otherwise I really do not understand what we are trying to achieve here). Also doing a single post-release check might add some non-zero value, but I can't help but feel it is arbitrary: the compromised ISO/IUK uploads or change of their meta data on our website could then go unnoticed if it happened after that single check. To get anything with real guarantees in this direction we'd need a continuous post-release check, i.e. something that ensures that what we checked with the pre-release check stays true until the release's EOL (i.e. next release).

  • I'm worried about adding "Verify the meta data pointing to the uploaded ISO and IUKs" as a blocker in the release process. Historically we RMs have been pretty bad at giving a reliable ETA for such things, so I'm concerned that this adds stress on the TR who is supposed to be available, on short notice, for an unspecified amount of time. I'd rather see this happen post-release, which will relax everyone involved…

To me this would be similar to postponing the manual testing until after the release. IMHO this check is the same type of necessary evil as manual testing.

  • The process depends on the RM explicitly triggering the verification, which can be blocked by hardware/system compromise. I'd rather have something that we know will happen even if the RM does not ask anyone anything (be it because of hardware/system compromise… or more trivially because in the real world, every RM manages to skip/miss/forget at least N% of the release process doc). I believe my proposal (Redmine tickets created in advance) is not affected by this problem, so I don't understand why we would instead implement a process that is affected.

Can you please elaborate on how this is a problem, given that the RM and TR are assumed to work together without malice? And how Redmine tickets are relevant (I'm not against it, it just seems orthogonal).

In the end, I think we need a real time meeting to discuss this. I think we're working with some slight but important differences among our assumptions and end up talking in circles around each other, but that we actually could easily agree on something sane if we just could understand each other better. What do you think?


  • "involve another RM" ← there's no other RM with time budgeted to do this work (or even awareness they are on-call that day), so I'd rather s/another RM/a Foundations Team member who is not the RM/; and then we need to add this to the Foundations Team role definition because it's added work/availability.
  • The part about IUKs refers to "solution or explanation the RM presents" but I can't see where the RM presents any such thing to the TR.
  • go to the "If something seemingly malicious is found" case for the ISO above points to text that got removed
  • typo in "reproducibiliy-followup" and in "release_process#reproducibiliy"
  • typo in "the the"

I'll deal with these later.

#20 Updated by anonym about 1 month ago

  • Assignee changed from anonym to intrigeri

#21 Updated by intrigeri 26 days ago

  • Assignee changed from intrigeri to anonym

Hi!

In the end, I think we need a real time meeting to discuss this. I think we're working with some slight but important differences among our assumptions and end up talking in circles around each other, but that we actually could easily agree on something sane if we just could understand each other better. What do you think?

Fully agreed. I'm reassigning to you so you track and organize this.

I'll reply to some points below anyway but I have little hope it helps much, so let's discuss this when we meet (i.e. soon! :)

anonym wrote:

intrigeri wrote:

anonym wrote:

but don't see why it has to be done by another Foundations Team member,

Because we need someone who commits to do boring work regularly under tight time constraints. I think the only way to have that is to include it in Core work (it's almost exactly our working definition of Core work actually), and the simplest way to do that on the short term is to piggy-back on some existing role instead of creating a new one; I happened to pick Foundations Team but feel free to pick another one that fits better if you want, or to propose creating a new dedicated Core work role. I don't care much, as long as we have good enough means to rely on that commitment.

Ok. I am actively working against this, i.e. you and me becoming more inter-dependent [especially around release time],

I fully agree.

which is what your proposal means in practice.

I don't think so: my proposal implies that the other FT member can do this check at some point after the release. Contrary to what you are proposing, this doesn't necessarily have to be within a 24-hour window.

I'm also not intrigued at becoming blocked by your slow internet connection. :)

Sure, let's not do that.

I think what I haven't managed to articulate yet is that I see the TR's work as part of QA, and e.g. manual testing is just as affected by what you say, and it seems to work. The release QA has the RM as fallback, which won't work in this case, but the Foundations Team seems like the only sane fallback, so let's go with that at least.

I think we have a problem here. See below.

nor why it should be done post release.

Well, it's logically impossible to check "that the published ISO image matches both the published tag and what manual testers have tested" before it is released, isn't it?

The ISO image is generally published (read: uploaded) ~24h before we release (read: announce the release on the website), and the Git tag even before that. So I was referring to the possibility of the check happening during this ~24h window right before the release.

At first glance I don't want to be the one committed to do this in this timeframe for two reasons:

  • I'm not always available during this 24h window and pretty often changing this would require me to enter sacrifice mode.
  • Our track record of providing reliable info wrt. when this 24h window starts is pretty bad. Waiting for something like this is the kind of thing that kills me. I don't want to add more of it in my life.

I'm open to discussing this further though :)

I'm not really sure what you mean by "publish" here but I think what you say is trivially true: theoretically, whenever we do the check, a compromised RM can publish a different ISO image just after the check.

This is correct in theory (and actually correct for any single compromised RM machine, not only the current release's RM), but in practice it doesn't work like this: exploiting this weakness requires an RM to plug in their smartcard, which we only do at specific times. Hence my proposal to do the check after the last time the RM for a given release has plugged in their smartcard.

To me, the second point means we must have a pre-release check (otherwise I really do not understand what we are trying to achieve here).

I'd personally be comfortable enough with relying on Jenkins to do this check. But I'm also fine with having such a check done by someone else during the QA, as you're proposing; it's just that IMO it's not enough to achieve our stated goals, hence my proposal.

Also doing a single post-release check might add some non-zero value, but I can't help but feel it is arbitrary: the compromised ISO/IUK uploads or change of their meta data on our website could then go unnoticed if it happened after that single check.

See above for why I agree with this reasoning in a theoretical world that's slightly different from the one we live in, but disagree once applied to our actual situation.

To get anything with real guarantees in this direction we'd need a continuous post-release check, i.e. something that ensures that what we checked with the pre-release check stays true until the release's EOL (i.e. next release).

I think there's something worthy in this idea. It can probably be simplified a lot: we could monitor the detached ISO signature and the IDF and notify people when they change. Assuming we would be notified if the ISO didn't match either of those anymore, this should be enough to detect any "compromised ISO re-published after the reproducibility check" situation. Depending on some important details it could work either for a pre-release check, or for a post-release one, or for both.
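
To make the monitoring idea a bit more concrete, here is a rough sketch under stated assumptions: the detached ISO signature and the IDF have already been fetched to local files by some other means, and a baseline of their digests is recorded at the time of the reproducibility check. The function names, the JSON baseline file and the file layout are all hypothetical; fetching the files and notifying people are deliberately left out.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(paths):
    """Map each file path to the SHA-256 hex digest of its content."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in paths}


def changed_files(baseline, current):
    """Return the sorted paths whose digest differs between the two mappings."""
    return sorted(
        p for p in set(baseline) | set(current) if baseline.get(p) != current.get(p)
    )


def check(paths, baseline_file):
    """Compare the monitored files against the recorded baseline.

    On the first run (no baseline file yet) the current digests are
    recorded and nothing is reported. Later runs return the list of
    monitored files that changed since the baseline was recorded.
    """
    current = fingerprint(paths)
    baseline_path = Path(baseline_file)
    if not baseline_path.exists():
        baseline_path.write_text(json.dumps(current, indent=2))
        return []
    baseline = json.loads(baseline_path.read_text())
    return changed_files(baseline, current)
```

A periodic job could call `check()` on the two monitored files and alert whenever the returned list is non-empty. The baseline would have to be stored somewhere a compromised RM system cannot rewrite, otherwise the check proves little.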

  • The process depends on the RM explicitly triggering the verification, which can be blocked by hardware/system compromise. I'd rather have something that we know will happen even if the RM does not ask anyone anything (be it because of hardware/system compromise… or more trivially because in the real world, every RM manages to skip/miss/forget at least N% of the release process doc).

Can you please elaborate on how this is a problem, given that the RM and TR are assumed to work together without malice?

I was specifically reasoning about "hardware/system compromise", not about human malice: a compromised system can block arbitrary outgoing email… e.g. the one that asks the TR to do their job.

I believe my proposal (Redmine tickets created in advance) is not affected by this problem, so I don't understand why we would instead implement a process that is affected.

And how Redmine tickets are relevant (I'm not against it, it just seems orthogonal).

I think I was wrong: I was assuming that a pre-existing Redmine ticket would resist a compromised RM system. But it doesn't, as such a system can silently delete tickets.

So the only way to protect against such an attack against the currently active RM might be to rely on tasks the TRs add to their personal, local calendar in advance.

#22 Updated by intrigeri 26 days ago

(This won't be done by the end of the contract.)

#23 Updated by anonym 9 days ago

  • Target version changed from Tails_3.3 to Tails_3.4

#24 Updated by intrigeri 9 days ago

  • Blocks deleted (Feature #12356: Communicate about reproducible builds to users via a blog post)
