Project

General

Profile

Bug #10777

Bug #10288: Fix newly identified issues to make our test suite more robust and faster

The test suite machinery sometimes misses the boot splash

Added by intrigeri over 2 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Elevated
Assignee:
-
Category:
Test suite
Target version:
Start date:
12/18/2015
Due date:
% Done:

100%

QA Check:
Pass
Feature Branch:
test/10777-robust-boot-menu
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

https://jenkins.tails.boum.org/view/RM/job/test_Tails_ISO_stable/46/cucumberTestReport/installing-packages-through-apt/apt-sources-are-configured-correctly/ says "FindFailed: can not find TailsBootSplash.png on the screen", while the video shows said boot splash. I of course can't mark as fragile all potentially affected scenarios, so priority > normal :)

I'm pretty sure I've heard of this bug already, but failed to find the corresponding ticket. Sorry if that's a duplicate.


Related issues

Blocks Tails - Bug #10502: The test suite sometimes cannot connect to the remote shell Resolved 11/06/2015

Associated revisions

Revision db7e00e9 (diff)
Added by anonym over 2 years ago

Change the image used for the Totem sample video.

We are going to remove TailsBootSplash.png.

Refs: #10777

Revision 8129b692 (diff)
Added by anonym over 2 years ago

Be more robust when detecting Tails' boot menu.

We'll spam TAB to get to the kernel commandline prompt so that Sikuli
can take its sweet time detecting an appropriate image and not have to
race against a the boot menu timeout.

Will-fix: #10777

Revision 4601cbb0 (diff)
Added by intrigeri over 2 years ago

Fix inverted logic.

refs: #10777

Revision 65ddac3b (diff)
Added by anonym about 2 years ago

Change the image used for the Totem sample video.

We are going to remove TailsBootSplash.png.

Refs: #10777

Revision e1675f20 (diff)
Added by anonym about 2 years ago

Be more robust when detecting Tails' boot menu.

We'll spam TAB to get to the kernel commandline prompt so that Sikuli
can take its sweet time detecting an appropriate image and not have to
race against a the boot menu timeout.

Will-fix: #10777

Revision 0e6f9064 (diff)
Added by anonym almost 2 years ago

Reboot the VM if we missed the boot menu.

Will-fix: #10777

Revision 76cb3e2d (diff)
Added by bertagaz almost 2 years ago

Use a simpler image to detect UEFI boot loader setup.

Sikuli does not find the previous image on the screen, which makes the
UEFI scenario fail.

Refs: #10777

Revision 8bcb3b9e (diff)
Added by bertagaz almost 2 years ago

Use wait() rather than find() to detect some boot screenshots.

UEFI initialization and boot take a bit of time, so Sikuli miss this
steps images. It's works better when waiting for the UEFI boot loader
setup image to appear, as well as for the boot menu kernel command line
image. This does not break the legacy boot anyway.

Refs: #10777

Revision 7fad433d (diff)
Added by intrigeri almost 2 years ago

Cherry-pick commit 1bf50e3ccbf68a262d35b8472375ad9fdc29e7fc:

Unmark all scenarios that use Tails Installer as fragile.

bugfix/11590-installer-robustness was based on this branch
(bugfix/10720-installer-freezes-on-jenkins), and flagged the tests as
fragile again, so we have to revert that on this branch so these tests
run on Jenkins again.

This branch is only meant to run more tests in Jenkins involving the
boot menu for Refs: #10777 anyway, it won't be merged.

This reverts the following commits:

3c02ea74788cffe0b4db21181f46778eab0689b5
20d07239a8e2c3706827edcf3b17266899c37a64
2c4c6434c0c6bdcf2534f966434bf92230af3698

Revision 50da5a4c (diff)
Added by bertagaz almost 2 years ago

Revert "Test suite: mark UEFI boot test as fragile."

This reverts commit 4a61e393327d8b0fe49e836288ab446dd057cf68.

That's a special boot case we want to test more.

Refs: #10777

Revision 46eddfa5 (diff)
Added by bertagaz almost 2 years ago

Retry rather than re-raise the FindFailed exception.

Otherwise the begin block won't be retried.

Refs: #10777

Revision 5fc4f10c (diff)
Added by bertagaz almost 2 years ago

Rescue more broadly FindFailed of the Kernel command line image.

Sometimes the virtual machine doesn't even starts syslinux, or freeze at
this step, sometimes the virtual USB bus bugs (see #11588).

Refs: #10777

Revision bfa3193e (diff)
Added by bertagaz almost 2 years ago

Remove unecessary ESC+TAB dance in the boot menu.

The new step is supposed to stop at the kernel command line prompt, so
there's no real reason to hide it again for the next step. Let just type
directly the options we want. Also it may be a bit racy: I've seen
scenarios where the ESC key was typed and the kernel command line
hidden, but then shown again quickly after, so Sikuli couldn't find the
TailsBootSplashTabMsg image.

Refs: #10777

Revision a7dc1cf1 (diff)
Added by bertagaz almost 2 years ago

Fix bad UEFI firmware setup menu description.

Refs: #10777

Revision d229ebd9
Added by intrigeri over 1 year ago

Merge remote-tracking branch 'origin/test/10777-robust-boot-menu' into stable (fix-committed: #10777)

History

#1 Updated by anonym over 2 years ago

  • Target version changed from Tails_2.2 to Tails_2.0

#2 Updated by anonym over 2 years ago

  • % Done changed from 0 to 10

A fundamental problem with our Sikuli-based approach is that images that do not stay (e.g. due to a timeout, like in this case) may be missed, e.g. if the system is under load or whatever. I have two ideas to solve it in this case:

1. Spam this:
1.1. Press TAB
1.2. wait() for at most 1 second for some part of the kernel cmdline that always look the save, e.g. "boot=live". on failure, repeat from the previous step.
2. We wrap the current check for the bootsplash around a retry that restarts the computer.

The issue with 1 is that the wait() and TAB spamming really should happen in parallel for this to work. I'm actually not exactly sure what sikuli does with a too small timeout for wait(), e.g. perhaps it can come in a situation where it times our before it has scanned everything? Or will it actually wait longer?

The issue with 2 is that the restart is a much more serious side-effect. E.g. if we use this while we are testing that Tails restarts successfully, perhaps this workaround is what in the end makes the test succeed, possibly hiding some bug that e.g. made Tails not restart.

I think 1 is the better approach.

#3 Updated by anonym over 2 years ago

anonym wrote:

I'm actually not exactly sure what sikuli does with a too small timeout for wait(), e.g. perhaps it can come in a situation where it times our before it has scanned everything? Or will it actually wait longer?

I did some measurements like so:

    t = Time.now
    @screen.wait('TailsBootMenuKernelCmdline.png', 0.00000001)
    STDERR.puts "XXX: #{Time.now - t}" 

With my system's CPU (I assume Sikuli's scanning is CPU bound) I got reported that it took around 0.3 seconds with the image being found, so Sikuli will always do one full scan before checking the timeout. Hence we might as well use find(), which only does one pass, and do not need some arbitrary timeout given.

#4 Updated by anonym over 2 years ago

  • Status changed from Confirmed to In Progress

#5 Updated by anonym over 2 years ago

  • Assignee changed from anonym to kytv
  • % Done changed from 10 to 50
  • QA Check set to Ready for QA
  • Feature Branch set to test/10777-robust-boot-menu

#6 Updated by intrigeri over 2 years ago

  • Related to Bug #10502: The test suite sometimes cannot connect to the remote shell added

#7 Updated by intrigeri over 2 years ago

  • Target version changed from Tails_2.0 to Tails_2.2

#8 Updated by anonym over 2 years ago

  • Assignee changed from kytv to intrigeri

What do you think, intri?

#9 Updated by anonym over 2 years ago

  • Priority changed from Normal to Elevated

#10 Updated by intrigeri over 2 years ago

  • Assignee changed from intrigeri to anonym
  • % Done changed from 50 to 60
  • QA Check changed from Ready for QA to Dev Needed

I've run some tests and pushed a few fixes. But this branch seems to break the "Booting Tails from a USB drive in UEFI mode" scenario: we end up in what looks like the UEFI firmware configuration menu.

#12 Updated by intrigeri over 2 years ago

I think that I spotted a bug and fixed it in 4601cbb, please have a look :)

#13 Updated by intrigeri over 2 years ago

I see some failures to enter the syslinux cmdline editor on heavy load:

(8 test suite runs in parallel on 8 isotesters, and this is the experimental branch, where we give 2 vcpus to TailsToaster)

Not to say it doesn't improve things, though. I'll report back if I see more such failures during my benchmarks (#11011, #11113 and friends) with this branch in, or without.

#14 Updated by anonym over 2 years ago

  • Target version changed from Tails_2.2 to Tails_2.3

#15 Updated by anonym about 2 years ago

  • Target version changed from Tails_2.3 to Tails_2.4

#16 Updated by anonym about 2 years ago

  • Target version changed from Tails_2.4 to Tails_2.5

#17 Updated by anonym about 2 years ago

  • Assignee changed from anonym to intrigeri
  • % Done changed from 60 to 70
  • QA Check changed from Dev Needed to Ready for QA

Ok, I've (force) pushed a new attempt. They key here is that the TAB spamming and Sikuli waiting for the TAB prompt happen in parallel. I think the problem with the previous approach was that Sikuli could (with sufficient load) take >5 seconds to scan the screen for the image, and hence block the TAB spamming from happen before the timeout. With the current implementation I've never seen the timeout go lower than 4 seconds (remaining!) even when frying my CPU at the same time so Sikuli takes a long while to actually notice the TAB message.

I'm hopeful! However, given how rare this problem is, perhaps the best approach is to just merge this now for extra exposure, and then keep an eye out over the next couple of months?

#18 Updated by anonym about 2 years ago

My previous attempt had some outright errors (e.g. I forgot to remove a reference to a variable I just removed) and didn't work for UEFI, so I've pushed a few fixes. Let's see what jenkins thinks...

#19 Updated by anonym about 2 years ago

This doesn't seem to work well when rebooting -- I can see that the TAB spamming is working when shutting down (the asterisks printed when wiping the memory has lots of TAB in between). I'm pretty sure the TAB spammer is still running when the boot menu appears. One could imagine that it the focus of virt-viewer is lost, so the key presses are lost, but it should be the same window instance, so nothing should change.

BTW, I've tried

xdotool search "TailsToaster.* Virt Viewer" key Tab

but then the keys are always lost (eaten by the Window, and not sent through to the VM?).

Back to the drawing board...

#20 Updated by anonym almost 2 years ago

Ok, so I've pushed an abomination that seems to work. It seems that sometimes the spammer or virtualized keyboard simply isn't working, so we have to detect when the boot process has progressed past the boot menu and reboot the VM + restart the spammer in that case. Yay.

#21 Updated by intrigeri almost 2 years ago

  • Assignee changed from intrigeri to bertagaz

IIRC bertagaz took over the "reviewing anonym' test suite work" job.

#22 Updated by bertagaz almost 2 years ago

  • QA Check changed from Ready for QA to Dev Needed

intrigeri wrote:

IIRC bertagaz took over the "reviewing anonym' test suite work" job.

I've spent time running this branch, as well as reading it, and came up with:

  • for an "abomination", due to the need to workaround something nasty, it's still a nice one with a clever implementation. I like the way the tabspammer is handled. Thats sound more robust than what we had :)
  • On legacy boot it works pretty well. I've seen the tabspammer spawned and stopped, no errors so far. I've not yet seen the bug being really catched with the "We missed the boot menu before we could deal with it" error message though.
  • The UEFI boot was still broken. I've pushed commit 76cb3e2 and commit 8bcb3b9 that seems to deal with it better. I've also pushed test/10777-more-robust-boot-menu-with-more-fragile-tests to get more exposure in Jenkins on this, as well as started to run extensive tests $HOME. intrigeri might want to have a quick look to it, but that's pretty simple commits.

Let see how this runs in Jenkins now.

#23 Updated by intrigeri almost 2 years ago

bertagaz wrote:

I've pushed commit 76cb3e2 and commit 8bcb3b9 that seems to deal with it better. I've also pushed test/10777-more-robust-boot-menu-with-more-fragile-tests to get more exposure in Jenkins on this, as well as started to run extensive tests $HOME. intrigeri might want to have a quick look to it, but that's pretty simple commits.

Please reassign to me once you're happy with it :)

#24 Updated by intrigeri almost 2 years ago

... keeping in mind that the freeze for 2.5 is some time today.

#25 Updated by bertagaz almost 2 years ago

intrigeri wrote:

bertagaz wrote:

I've pushed commit 76cb3e2 and commit 8bcb3b9 that seems to deal with it better. I've also pushed test/10777-more-robust-boot-menu-with-more-fragile-tests to get more exposure in Jenkins on this, as well as started to run extensive tests $HOME. intrigeri might want to have a quick look to it, but that's pretty simple commits.

Please reassign to me once you're happy with it :)
... keeping in mind that the freeze for 2.5 is some time today.

Ok, I thought to wait until tomorrow and do a pass checking how it went in Jenkins (damn freeze!). It works pretty well at home, so I'm quite fine with it. I won't be able to work on it further today anyway, but can do that tomorrow morning. If you think it will be too late though, you can take it over if you wish.

#26 Updated by intrigeri almost 2 years ago

I've had a very quick look at the recent changes, and I've noticed comments and commit messages about the "setup menu" of "our UEFI bootloader", and an image called UEFIBootLoaderSetup.png. I'm confused since our UEFI bootloader is syslinux, and it has no setup menu AFAIK. What is this about? The UEFI firmware, or anything else?

Wrt. 76cb3e2d404ddf5b6377591aacda41969d777514: wow, that's a super generic image. I'm a bit afraid that sikuli might now find it when it should not (it's sometimes quite flexible), but I'll trust your tests. We'll easily notice whenever this starts breaking anyway, so no big deal.

I'm quite fine with it.

Can you please clarify why you're only "quite" fine with it? Does this branch fix this bug for you, or not?

If you think it will be too late though, you can take it over if you wish.

No need to rush this. Please reassign to me once you're happy with this branch + the issue I've raised above has been fixed. I'll adjust the target version depending on what stage of the 2.5 release process I'm in.

#27 Updated by bertagaz almost 2 years ago

intrigeri wrote:

I've had a very quick look at the recent changes, and I've noticed comments and commit messages about the "setup menu" of "our UEFI bootloader", and an image called UEFIBootLoaderSetup.png. I'm confused since our UEFI bootloader is syslinux, and it has no setup menu AFAIK. What is this about? The UEFI firmware, or anything else?

Right, that's the firmware setup menu, my bad.

Wrt. 76cb3e2d404ddf5b6377591aacda41969d777514: wow, that's a super generic image. I'm a bit afraid that sikuli might now find it when it should not (it's sometimes quite flexible), but I'll trust your tests. We'll easily notice whenever this starts breaking anyway, so no big deal.

Yeah, I didn't find another image Sikuli could find reliabily at this step. I don't know why but it does not seem to like words in the images at this step. It sure is very generic, but I don't think that at this step it is really a problem.

I'm quite fine with it.

Can you please clarify why you're only "quite" fine with it? Does this branch fix this bug for you, or not?

Sorry for the sloppy words. I was a bit unsure because although it was working well at home, I didn't see any run where the boot splash was missed and the new code was tiggerred. That's probably also because the tabspammer is now quite robust, but...

Meanwhile, I've had a look at the Jenkins runs yesterday, and found out some runs where it should have been, but wasn't (see the "Writing files first to a read/write-enabled persistent partition, and then [...]" scenario in https://jenkins.tails.boum.org/job/test_Tails_ISO_test-10777-robust-boot-menu-with-more-fragile-tests/13/console for example). There was a flaw in the code I didn't see (I didn't play with ruby exception before). I've pushed a fix in commit 46eddfa and commit 5fc4f10. There still are some errors though (see https://jenkins.tails.boum.org/job/test_Tails_ISO_test-10777-robust-boot-menu/86/, https://jenkins.tails.boum.org/job/test_Tails_ISO_test-10777-robust-boot-menu/87/) that I need to fix before we can consider this branch ready.

If you think it will be too late though, you can take it over if you wish.

No need to rush this. Please reassign to me once you're happy with this branch + the issue I've raised above has been fixed. I'll adjust the target version depending on what stage of the 2.5 release process I'm in.

Ok. I think this branch deserves more testing in Jenkins with the new commits anyway, so let's wait a bit more.

#28 Updated by intrigeri almost 2 years ago

Right, that's the firmware setup menu, my bad.

Then I expect that someone fixes the misleading comment(s?) in the code, before we merge this branch.

There still are some errors though (see https://jenkins.tails.boum.org/job/test_Tails_ISO_test-10777-robust-boot-menu/86/, https://jenkins.tails.boum.org/job/test_Tails_ISO_test-10777-robust-boot-menu/87/) that I need to fix before we can consider this branch ready.

Indeed, this branch in its current state, seems to actually make things worse: I've not seen #10777 recently, and now it's happening on the very branch that's supposed to fix it.

#29 Updated by bertagaz almost 2 years ago

intrigeri wrote:

Right, that's the firmware setup menu, my bad.

Then I expect that someone fixes the misleading comment(s?) in the code, before we merge this branch.

Pushed commit a7dc1cf

There still are some errors though (see https://jenkins.tails.boum.org/job/test_Tails_ISO_test-10777-robust-boot-menu/86/, https://jenkins.tails.boum.org/job/test_Tails_ISO_test-10777-robust-boot-menu/87/) that I need to fix before we can consider this branch ready.

Indeed, this branch in its current state, seems to actually make things worse: I've not seen #10777 recently, and now it's happening on the very branch that's supposed to fix it.

Yeah, noticed that too. It seems to happen on 3 scenarios mostly, I've pushed commit bfa3193 that should (and seems to) fix it. Let's wait some days to see how it behaves in Jenkins now.

#30 Updated by bertagaz almost 2 years ago

bertagaz wrote:

Yeah, noticed that too. It seems to happen on 3 scenarios mostly, I've pushed commit bfa3193 that should (and seems to) fix it. Let's wait some days to see how it behaves in Jenkins now.

Seems to behave much better now. Test runs 89 and 90 of the branch in Jenkins passed, and their debug.log shows occurrences where the boot menu was missed and the VM reseted. We'll see in a few days.

#31 Updated by bertagaz almost 2 years ago

  • Target version changed from Tails_2.5 to Tails_2.6

#32 Updated by bertagaz almost 2 years ago

  • Assignee changed from bertagaz to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

Setting this ticket RfQA, I think it's pretty robust now. This branch never failed anymore (at least for the bug it is supposed to fix) since my last commits (that's since run 89 in Jenkins). There's even times where the bug was triggered and the VM reset as expected. Assigning to intrigeri as asked, but 2.6 has a different RM, so up to him to decide who will review this branch.

#33 Updated by intrigeri almost 2 years ago

  • Assignee changed from intrigeri to bertagaz
  • QA Check changed from Ready for QA to Dev Needed

Setting this ticket RfQA, I think it's pretty robust now. This branch never failed anymore (at least for the bug it is supposed to fix) since my last commits (that's since run 89 in Jenkins).

Congrats!

There's even times where the bug was triggered and the VM reset as expected.

Confirmed, by looking for "resetting" in https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_test-10777-robust-boot-menu/90/artifact/build-artifacts/debug.log and https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_test-10777-robust-boot-menu/89/artifact/build-artifacts/debug.log.

Still, I suspect that in these two cases, the code from this branch actually workaround'ed #9707 or #10776 (that are fixed with my branch for #10733 according to my tests), and not the bug this ticket is about: the "resetting" message happens only after rebooting, which matches the problems we've seen with memory erasure on shutdown. So at this point, I have to state that I still have not seen one single case where it's clear that this code fixes #10777. Of course, this can probably be explained by the fact that I haven't seen #10777 itself for a long while :)

Assigning to intrigeri as asked, but 2.6 has a different RM, so up to him to decide who will review this branch.

I'll just deal with a first pass of code review myself, but you've done too deep changes for me to dare merging without anonym having had a look first. No big hurry anyway: I've not seen that bug happen on Jenkins recently.

Also, https://jenkins.tails.boum.org/view/Tails_ISO/job/test_Tails_ISO_test-10777-robust-boot-menu-with-more-fragile-tests/ has been broken for a long while. Please either merge in there the branches that can actually make these additional tests pass (#11582, #11588, #11590), so that it's useful to gather data for this ticket, or move that branch out of Jenkins' scope somehow (delete or rename to wip/).

So, here's a quick code review update as of 5007dd76070f29e24058e495b3fbbe24e0dfa6b1. My only concern with the new commits is bfa3193e236f2b60bad1ba5327928c07a9a4ad73, that:

  1. subtly modifies a step called "Tails is at the boot menu" into something that in effect is "Tails is at the boot menu's kernel command line editing interface"; that feels wrong, or at least surprising;
  2. makes another step silently rely on that surprising behaviour.

I totally trust you that this change fixes a problem you've seen, but it breaks the semantics in a way that will make the code hard to understand, debug and maintain.

I have another problem with that commit: I don't understand why there is a problem to fix (regardless of the way it's fixed) -- the begin block ends with screen.wait(boot_menu_cmdline_image, 15), so I have no clue how this can possibly happen: "I've seen scenarios where the ESC key was typed and the kernel command line hidden, but then shown again quickly after, so Sikuli couldn't find the TailsBootSplashTabMsg image". Can you explain this? I suspect that this commit working around another bug that has not been identified yet, and I am worried that a pile of workarounds will prevent us from actually identifying and fixing that bug. Could it be that the TAB spammer is still running, or that key strokes it has sent are still being processed asynchronously somewhere further in the libvirt/QEMU/Linux/seabios pipeline? This would explain the behaviour you've seen, no? This is one case when perhaps the best we can do is to revert that commit, and sleep a bit before "Ensure that we're back at the boot splash"..

Anyway, I'm not sure I understand this code and your changes well enough to comment further, so either you'll need to explain it very clearly to me, or we'll wait for anonym who should grasp it more easily.

#34 Updated by anonym over 1 year ago

Firs to all, this branch seems to have introduced a regression somehow where we run out of disk space on Jenkin's isotesters. As far as the history goes back I see errors like this:

Error creating disk "pristine" in "/tmp/TailsToaster/TailsToasterStorage". Need 4096 MiB but only 4187 MiB is available of which 500 MiB is reserved for other temporary files. 

and it's just getting worse and worse. WTF? Without being able to inspect the isotester's workspace I don't know how to debug this. Any ideas?

Before this is fixed I can't really get a picture of whether this branch improves stuff or not. As for the code (up to 5007dd76070f29e24058e495b3fbbe24e0dfa6b1):

commit bfa3193e236f2b60bad1ba5327928c07a9a4ad73

I have similar concerns as intrigeri here regarding the semantics and how there can be a problem at all. Indeed, I'd like to have more of an explanation before I'd merge this.

commit 46eddfa584ff805729209c7fd14faa28e9b30420

This commit seems unnecessary -- raising an exception inside a try_for will make us retry the whole block again, which will be equivalent in this case since the try_for's block starts with the same begin block that retry would retry. But I guess this makes the flow clearer -- thanks for teaching me about retry! :)

The rest looks good!

#35 Updated by anonym over 1 year ago

  • Target version changed from Tails_2.6 to Tails_2.7

#36 Updated by anonym over 1 year ago

  • Assignee changed from bertagaz to anonym

#37 Updated by anonym over 1 year ago

anonym wrote:

Firs to all, this branch seems to have introduced a regression somehow where we run out of disk space on Jenkin's isotesters.

Ok. This is fixed now, and the runs since then seem very promising (only clearly unrelated failures). I've started three more, for control, then I'll have a closer look. I actually think I'd like to have like twenty complete runs without this branch causing a single error before I merge this -- any regression caused by it can affect any scenario, just like the problem it tries to solve.

#38 Updated by anonym over 1 year ago

All three runs were successful (= only unrelated failures). Started another three...

#39 Updated by anonym over 1 year ago

  • Related to deleted (Bug #10502: The test suite sometimes cannot connect to the remote shell)

#40 Updated by anonym over 1 year ago

  • Blocks Bug #10502: The test suite sometimes cannot connect to the remote shell added

#41 Updated by anonym over 1 year ago

  • % Done changed from 70 to 80
  • QA Check deleted (Dev Needed)

This branch will fix #10502 too!

Also: all other Jenkins runs looks fine in the sense that this branch doesn't seem to make booting worse, only better. I'm getting close to merging this branch, I just want to trigger a few more Jenkins runs (but right now Jenkins is about to restart, so perhaps I can get to it tomorrow). No more dev seems to be needed right now, at least.

#42 Updated by bertagaz over 1 year ago

  • Assignee changed from anonym to intrigeri
  • QA Check set to Info Needed

anonym wrote:

No more dev seems to be needed right now, at least.

Ok wait. I agree this branch sounds good right now, but there's been comments on code that hasn't been replied yet (my bad):

intrigeri wrote:

My only concern with the new commits is bfa3193e236f2b60bad1ba5327928c07a9a4ad73, that:
subtly modifies a step called "Tails is at the boot menu" into something that in effect is "Tails is at the boot menu's kernel command line editing interface"; that feels wrong, or at least surprising;
makes another step silently rely on that surprising behaviour.
I totally trust you that this change fixes a problem you've seen, but it breaks the semantics in a way that will make the code hard to understand, debug and maintain.

I agree this changes a bit the semantic. If problematic, we can adapt the step to be: "Tails is at the boot prompt" (or anything better you could come up with).

Could it be that the TAB spammer is still running, or that key strokes it has sent are still being processed asynchronously somewhere further in the libvirt/QEMU/Linux/seabios pipeline?

It can totally be one of this case. However, I don't think there's an easy way to fix it for real, and adapting the step to end with the boot menu kernel command line opened sounds much more convenient to ensure we're at a known state, no matter what. So I'd be to only update the semantic of the step. As of now, it has proved to be quite solid.

bertagaz wrote:
commit 46eddfa584ff805729209c7fd14faa28e9b30420

anonym wrote:
This commit seems unnecessary -- raising an exception inside a try_for will make us retry the whole block again, which will be equivalent in this case since the try_for's block starts with the same begin block that retry would retry. But I guess this makes the flow clearer -- thanks for teaching me about retry! :)

Well I thought so too, but seen this step begin..end block not being retried when it should have been. The "retry" change did fix it. I'm not sure why I admit.

#43 Updated by intrigeri over 1 year ago

  • Assignee changed from intrigeri to bertagaz

I've no clue what info is required from me. Looks like the next steps are bertagaz fixing remaining code issues, no?

#44 Updated by anonym over 1 year ago

  • Assignee changed from bertagaz to anonym

intrigeri wrote:

I've no clue what info is required from me. Looks like the next steps are bertagaz fixing remaining code issues, no?

No, I take this one over.

#45 Updated by BitingBird over 1 year ago

  • QA Check deleted (Info Needed)

#46 Updated by anonym over 1 year ago

  • Assignee changed from anonym to intrigeri

So I sort of gave up on this one due to #11874 making it too annoying to work on any other robustness improvement. With it fixed, the first jenkins test run of this branch passed. Given all the good results prior to when #11874 struck, it's time to merge this. (Also, I've had a casual look at tests 30-41, and I say no boot menu related issues.)

I've added two commits:

589a4c0 Remove unnecessary sub-step invocation.
26e9ac2 Rename step for improved semantics.

and after those I'm happy with this branch. So, please review those commits, and merge this branch!

#47 Updated by intrigeri over 1 year ago

  • QA Check set to Ready for QA

#48 Updated by intrigeri over 1 year ago

hi!

First of all, I've looked at the test suite runs for stable and devel on Jenkins since we have data (early June), and I could find no identified occurrence of this bug in there. So it's hard to evaluate what this branch improves in practice. But whatever, we've seen this problem in the past, and the changes brought by this branch make sense to me.

Code review passes. I've run locally "Booting Tails from a USB drive in UEFI mode" (that we don't run on Jenkins) once, and it passed. I've analyzed the last runs on Jenkins and the only failures are unrelated to this branch ⇒ merging!

#49 Updated by intrigeri over 1 year ago

  • Status changed from In Progress to Fix committed
  • % Done changed from 80 to 100

#50 Updated by intrigeri over 1 year ago

  • Assignee deleted (intrigeri)
  • QA Check changed from Ready for QA to Pass

#51 Updated by bertagaz over 1 year ago

  • Status changed from Fix committed to Resolved

Also available in: Atom PDF