Feature #9521

Feature #9519: Make the test suite more deterministic through network simulation

Use the chutney Tor network simulator in our test suite

Added by anonym about 3 years ago. Updated about 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Test suite
Target version:
Start date:
04/15/2016
Due date:
% Done:

100%

QA Check:
Pass
Feature Branch:
test/9521-chutney
Type of work:
Code
Blueprint:
Starter:
Affected tool:

Description

See parent ticket (#9519) for the rationale.

We may want to use chutney to simulate the Tor network for increased determinism in our test suite.

The Tor network is a major source of the test suite's non-determinism, both through transient network issues when communicating with the Tor network (or internal issues, e.g. bad circuits), and through the chosen exit node being blocked. Simulating the Tor network on the testing host would eliminate all such issues.

There is, however, a potential for getting the testing host blacklisted/blocked if services identify the repeated connections as spam (for example, irc.oftc.net potentially blocking lizard because of the multiple IRC connections made per hour). If that is a problem, #9520 would solve it.


Related issues

Related to Tails - Bug #9478: How to deal with transient network errors in the test suite? Resolved 05/27/2015
Related to Tails - Feature #9520: Investigate the Shadow network simulator for use in our test suite Rejected 06/02/2015
Related to Tails - Feature #11356: Add Chutney to our isotesters Rejected 04/15/2016
Blocks Tails - Bug #10442: Watching a WebM video over HTTPS is fragile In Progress 10/28/2015
Blocks Tails - Bug #10381: The "I open the address" steps are fragile Resolved 10/15/2015
Blocks Tails - Feature #10379: Check that we do not see any error pages in the "I open the address" step. Rejected 10/15/2015
Blocks Tails - Bug #10376: The "the Tor Browser loads the (startup page|Tails roadmap)" step is fragile Resolved 10/15/2015
Blocks Tails - Bug #10497: wait_until_tor_is_working helper is fragile Resolved 11/06/2015
Blocks Tails - Bug #10495: The 'the time has synced' step is fragile In Progress 11/06/2015
Blocks Tails - Feature #11351: Upgrade to Tor 0.2.8 Resolved 03/26/2016
Blocks Tails - Bug #9654: "IPv4 TCP non-Tor Internet hosts were contacted" during the test suite Resolved 06/29/2015
Blocks Tails - Bug #10440: Time syncing scenarios are fragile Resolved 10/28/2015

Associated revisions

Revision 139c3ace (diff)
Added by anonym over 2 years ago

Make it possible to make Tails use a simulated Tor network.

... provided by Chutney (https://gitweb.torproject.org/chutney.git).
This is enabled iff the local configuration contains something like:

Chutney:
src_dir: "/path/to/chutney-src-tree"

otherwise we'll use the real Tor network as we did previously.

The main motivation here is improved robustness -- since the "Tor
network" we now use will exit from the host running the automated test
suite, we won't have to deal with Tor network blocking. Performance
should also be improved.

Will-fix: #9521

Revision 270cde68
Added by intrigeri about 2 years ago

Merge remote-tracking branch 'origin/test/9521-chutney' into devel

Fix-committed: #9521

History

#1 Updated by anonym about 3 years ago

  • Related to Bug #9478: How to deal with transient network errors in the test suite? added

#2 Updated by anonym about 3 years ago

  • Related to Feature #9520: Investigate the Shadow network simulator for use in our test suite added

#3 Updated by anonym about 3 years ago

Mostly copy-pasted from #9478:

chutney doesn't look as polished as Shadow (the first line of the README isn't very encouraging: "This is chutney. It doesn't do much so far. It isn't ready for prime-time."). It is, however, dirt simple to set up, which is nice:

git clone https://git.torproject.org/chutney.git
cd chutney
./chutney configure networks/basic
./chutney start networks/basic

And that's literally it for this simple setup (I guess we want a slightly bigger network, plus bridges, which seems doable even if we have to make some templates for the latter ourselves). :) I could use the two clients that networks/basic defines, and the traffic would exit from my computer as expected, which is what we want for the mid-term goal. I suspect chutney will be trivial to package for Debian, as it only depends on python (2.7+). Yay!

Barring any issues due to its supposed immaturity, it seems chutney would indeed work well for the mid-term goal. As for the long-term goal, chutney clearly isn't designed to "simulate the Internet" like Shadow is. I don't know how much that matters, though. It seems fairly easy to set up a virtual network where the services we need would run directly on the testing host, and the exits would reach them. However, I wonder how it'll work if that network is using the private IP space. Relays/exits will work fine once we set ExitPolicyRejectPrivate 0, so that's fine, but the Tails client will go ballistic if it resolves a domain to a private address, right (that's a feature of Tor, IIRC)? And perhaps the differences in how resolving works in SOCKS4 vs SOCKS5 will cause problems (just brainstorming). Perhaps we can pick some random non-private IP range and use it in the local network, and play with the testing host's routing table, to work around this? Network namespaces can probably be useful.

It should be noted that I have no idea if Shadow actually solves this better given our requirements. Also, this presumed advantage of Shadow should be weighed against the advantages of chutney. Hm. Not having to patch the Tor client + super easy setup sounds really compelling, even if we have to do some custom tricks for the long-term "simulate the Internet" goal. IMHO, chutney actually looks like the better option.

#4 Updated by anonym about 3 years ago

After having had a deeper look into how we could use chutney in Tails' automated test suite, I've hit a blocker. Obviously the chutney Tor network would run on the testing host, and the testing guest would access it over the virtual LAN. However, chutney hardcodes 127.0.0.1 as the destination in e.g. the authority certificates, and Tor will not like this discrepancy. In short, chutney is designed to run the whole simulation, including all clients, on the same host, using loopback. Fixing this should be pretty simple, e.g. by adding an env var CHUTNEY_LISTEN_ADDRESS that we could export as 10.2.2.1 or whatever the virtual host interface has. Hopefully upstream will accept such a patch.

#5 Updated by anonym about 3 years ago

Minimal patch that fixes the above:

--- a/lib/chutney/TorNet.py
+++ b/lib/chutney/TorNet.py
@@ -672,7 +672,7 @@ DEFAULTS = {
     'tor': os.environ.get('CHUTNEY_TOR', 'tor'),
     'tor-gencert': os.environ.get('CHUTNEY_TOR_GENCERT', None),
     'auth_cert_lifetime': 12,
-    'ip': '127.0.0.1',
+    'ip': os.environ.get('CHUTNEY_LISTEN_ADDRESS', '127.0.0.1'),
     'ipv6_addr': None,
     'dirserver_flags': 'no-v2',
     'chutney_dir': '.',

Of course, it may break other stuff where 127.0.0.1 is assumed; I really didn't look deeply, especially since this worked for my purposes: after adding the appropriate torrc lines (TestingTorNetwork 1, a suitable DirAuthority line for each simulated authority, etc.; looking at a simulated client's generated torrc should give a good idea of what to do) to a Tails session, it worked just fine. Yay!
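As a rough, hypothetical sketch of what such client-side torrc additions might look like (the nicknames, addresses, ports, and fingerprint placeholders below are invented for illustration, not taken from this setup; see the tor manual for the exact DirAuthority syntax):

```
TestingTorNetwork 1
DirAuthority test000a orport=5000 v3ident=<v3-ident-placeholder> 10.2.2.1:7000 <fingerprint-placeholder>
DirAuthority test001a orport=5001 v3ident=<v3-ident-placeholder> 10.2.2.1:7001 <fingerprint-placeholder>
```

One such DirAuthority line per simulated authority, pointing at the host-side address rather than 127.0.0.1.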

Unfortunately bootstrap (and re-bootstrap from restarting Tor, which we still do every time we restore from a snapshot) isn't much faster than when using the real network. It's still 10-15 seconds in general. That's a shame, as I was hoping that using chutney would more or less eliminate that waiting time, reducing a full test suite run by something like NUMBER_OF_SCENARIOS_USING_TOR * 10 seconds. Given that NUMBER_OF_SCENARIOS_USING_TOR currently is something like 100, that means 1000 seconds, or a bit more than 15 minutes. Oh well.
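The back-of-the-envelope estimate above can be checked quickly (using the figures quoted in the comment, which are rough guesses, not measurements):

```python
# Rough estimate: ~100 Tor-using scenarios, each hoped to save ~10 s
# of bootstrap time if chutney made bootstrapping near-instant.
scenarios_using_tor = 100
saving_per_scenario_s = 10

total_saving_s = scenarios_using_tor * saving_per_scenario_s
total_saving_m = total_saving_s / 60  # a bit under 17 minutes

print(total_saving_s)
```
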

#6 Updated by intrigeri about 3 years ago

Unfortunately bootstrap (and re-bootstrap from restarting Tor, which we still do every time we restore from a snapshot) isn't much faster than when using the real network. It's still 10-15 seconds in general.

I'm under the impression that it takes longer than 10-15 seconds on isotester1.lizard and in the settings where I run the test suite locally most often, but I didn't measure it. I would measure it if there was some trivial way for me to do so, in case it matters (your call :)

#7 Updated by anonym about 3 years ago

intrigeri wrote:

I'm under the impression that it takes longer than 10-15 seconds on isotester1.lizard and in the settings where I run the test suite locally most often, but I didn't measure it. I would measure it if there was some trivial way for me to do so, in case it matters (your call :)

You can try this:

--- a/features/support/helpers/misc_helpers.rb
+++ b/features/support/helpers/misc_helpers.rb
@@ -69,6 +69,11 @@ end
 def wait_until_tor_is_working
   try_for(270) { @vm.execute(
     '. /usr/local/lib/tails-shell-library/tor.sh; tor_is_working').success? }
+  tor_log_lines = @vm.file_content("/var/log/tor/log").split("\n")
+  tor_start = DateTime.parse(tor_log_lines.first)
+  tor_done = DateTime.parse(tor_log_lines.grep(/Bootstrapped 100%: Done/).first)
+  diff = tor_done.to_time - tor_start.to_time
+  STDERR.puts "XXX: Tor bootstrap time (seconds): #{diff}" 
 end

 def convert_bytes_mod(unit)

#10 Updated by intrigeri about 3 years ago

You can try this:

Done!

  • on isotester1.lizard (features/torified_browsing.feature run twice, so only two full bootstraps, and the rest is re-bootstraps):
    • bootstrap times: 13, 13, 27, 75, 203, 12, 141, 27, 12, 21, 12, 27, 13, 13, 139, 139, 13, 140, 139, 12
    • mean bootstrap time: 59.55
    • median bootstrap time: 24.0
  • in my usual local testing environment (all tests up to, and including, electrum.feature, so a few full bootstraps + a few re-bootstraps):
    • bootstrap times: 22, 13, 14, 12, 26, 22, 20, 19, 23, 21, 21, 21, 22, 104, 24, 95
    • mean bootstrap time: 29.94
    • median bootstrap time: 21.5
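The means and medians above can be reproduced from the raw bootstrap times with Python's standard statistics module:

```python
from statistics import mean, median

# Bootstrap times (seconds) as reported in this comment.
isotester = [13, 13, 27, 75, 203, 12, 141, 27, 12, 21,
             12, 27, 13, 13, 139, 139, 13, 140, 139, 12]
local = [22, 13, 14, 12, 26, 22, 20, 19, 23, 21, 21, 21, 22, 104, 24, 95]

print(mean(isotester), median(isotester))  # → 59.55 24.0
print(mean(local), median(local))          # → 29.9375 21.5
```
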

=> so it seems that going down to 10-15 seconds would be a performance improvement in these settings, especially since (I guess) it would remove outliers that make the mean higher than the median. I'm not saying that this, in itself, is worth going the chutney way, but we have other reasons to investigate it anyway :)

#11 Updated by anonym almost 3 years ago

#12 Updated by intrigeri almost 3 years ago

  • Target version set to 2016

(As added by anonym post-summit.)

#13 Updated by anonym almost 3 years ago

  • Description updated (diff)

#15 Updated by anonym over 2 years ago

  • Blocks Bug #10442: Watching a WebM video over HTTPS is fragile added

#16 Updated by anonym over 2 years ago

  • Blocks Bug #10381: The "I open the address" steps are fragile added

#17 Updated by anonym over 2 years ago

  • Blocks Feature #10379: Check that we do not see any error pages in the "I open the address" step. added

#18 Updated by anonym over 2 years ago

  • Blocks Bug #10376: The "the Tor Browser loads the (startup page|Tails roadmap)" step is fragile added

#19 Updated by anonym over 2 years ago

  • Subject changed from Investigate the chutney Tor network simulator for use in our test suite to Use the chutney Tor network simulator in our test suite
  • Status changed from Confirmed to In Progress
  • Priority changed from Normal to Elevated
  • Target version changed from 2016 to Tails_2.4
  • % Done changed from 0 to 20
  • Type of work changed from Research to Code

#20 Updated by intrigeri over 2 years ago

  • Blocks Bug #10497: wait_until_tor_is_working helper is fragile added

#21 Updated by anonym over 2 years ago

  • Blocks Bug #10495: The 'the time has synced' step is fragile added

#22 Updated by anonym over 2 years ago

  • % Done changed from 20 to 30
  • Feature Branch set to test/9521-chutney

The current branch allows us to seamlessly run the tests with either the real Tor network (i.e. like before) or a simulated one (by Chutney) based on the local configuration; to use Chutney's simulated Tor network one has to configure something like

Chutney:
  src_dir: "/path/to/chutney-src-tree" 

and the Chutney sources must have the patches I have submitted upstream to these two tickets:

What remains is (at least):

  • Consider adding chutney as a Git submodule instead of having the user provide a path to a correct Chutney distribution. Long-term we'd prefer to have Chutney packaged in Debian, of course, but let's not bother with even starting to think about that now.
  • Figure out how we want to use this:
    • Should the simulated network be used by default? How to control that (run_test_suite --real-tor-network)?
    • Or do we actually always want to use it, except in some feature that explicitly uses the real Tor network?
    • What to do for the run we do for tentative release images?
  • Probably drop everything using check.tp.o; see commits 197e795 and ac10a9e for why it is ugly and problematic with Chutney.
  • Currently the only scenarios that fail are the ones using bridges (\o/) because there's no support for that yet. I think I could pretty easily add support for "normal" bridges, since there is a Chutney torrc template for them, so we need to either: (1) patch Chutney so we can provide our own torrc templates for bridges with the transports we want to test, or (2) upstream templates for all transports we want to test. Perhaps we want to do both; (2) because we are nice to upstream, (1) because we want our own way to add torrc templates so we don't have to wait for (2) each time we want to add a test for a new transport.

Also, currently Jenkins will run these tests with the real Tor network, but I'd like to have it start testing with the simulated network soon. The easiest would be to put a Chutney source tree checkout with my patches applied and the required test suite configuration (which won't interfere when other branches are being tested) on all isotesters.

#23 Updated by anonym over 2 years ago

anonym wrote:

  • Consider adding chutney as a Git submodule instead of having the user provide a path to a correct Chutney distribution. Long-term we'd prefer to have Chutney packaged in Debian, of course, but let's not bother with even starting to think about that now.

Due to the pain of using Git submodules, we decided to go with the current approach where we point to a Chutney checkout somewhere on the filesystem. It will be dealt with on the isotesters with #11356.

  • Should the simulated network be used by default?

We all agree that: yes, we should do this ...

How to control that (run_test_suite --real-tor-network)?

... and we don't care about this. I'll keep all this "pluggable" though, so if we come up with a case where we want to support this, it'll be easy. It will not complicate things, and only adds minimal bloat (two ifs that will always be true), so why not?

  • Or do we actually always want to use it, except in some feature that explicitly uses the real Tor network?
  • What to do for the run we do for tentative release images?

We will use Chutney in all tests in the short-term and probably mid-term, so that is what we will focus on now.

Long-term we want to make Jenkins an integral part of the release process, and delegate the automated tests to it (or at least make it another player in that game), but it first has to be robust enough. In this case we'll probably want it to run at least some basic test(s) with the normal Tor network, as a sanity check. An idea: that feature is marked @release and normally Jenkins runs tests with --tags ~@release (just like we do with the @fragile tag), except when we build from a release tag. Note: we currently do not build release tags, but presumably we'd want to do that once we reach the state of "involving Jenkins in releases".

(<Dreaming>The next step would be that when we have reproducible builds, we just have to inform Jenkins that the RM's locally built image's hash matches the one Jenkins built, and then Jenkins (or something else, automated) will proceed with publishing the image over bittorrent and our HTTP mirrors</Dreaming>)

Also, currently Jenkins will run these tests with the real Tor network, but I'd like to have it start testing with the simulated network soon. The easiest would be to put a Chutney source tree checkout with my patches applied and the required test suite configuration (which won't interfere when other branches are being tested) on all isotesters.

Again, #11356.

#24 Updated by anonym about 2 years ago

So I spent most of the past four days improving and extensively testing this branch. It certainly looks like a big improvement -- while I still see occasional bootstrap failures, I think I have only seen one transient post-bootstrap error (in git.feature, cloning over HTTPS once failed randomly).

I'd really like us to get this running on Jenkins (#11356 => Elevated) for some stats gathering. Beyond this branch, I will also create these branches for testing purposes only:

  • devel with all Tor-related @fragile tags removed
  • test/9521-chutney with all Tor-related @fragile tags removed

I think I'd also like to have variants that kill the Tor bootstrap restarting stuff we have in the restart-tor script (which I suspect makes things worse sometimes nowadays), and maybe also try with tor 0.2.8.x (I guess both on the test suite host and in Tails) since it affects bootstrapping in some interesting ways (see #11285, which would have to be solved for the test suite first).

#25 Updated by anonym about 2 years ago

anonym wrote:

I will also create these branches for testing purposes only:

  • devel with all Tor-related @fragile tags removed

Done in the test/9521-with-fragile-scenarios branch.

  • test/9521-chutney with all Tor-related @fragile tags removed

Done in the test/9521-chutney-with-fragile-scenarios branch.

I think I'd also like to have variants that kill the Tor bootstrap restarting stuff we have in the restart-tor script (which I suspect makes things worse sometimes nowadays), and maybe also try with tor 0.2.8.x (I guess both on the test suite host and in Tails) since it affects bootstrapping in some interesting ways (see #11285, which would have to be solved for the test suite first).

We'll see how the above branches fare before testing any of this.

#26 Updated by bertagaz about 2 years ago

I think there's a bug somewhere in the new branch without fragile tags (and in the chutney branch itself). It's failing constantly on the @check_tor_leaks scenarios with a "no implicit conversion of nil into Array (TypeError)" error in features/support/hooks.rb:271:in `After'.

It seems to have appeared since acc3a1905db53a9e2707fa56d67d7828591b602f was merged into this branch. See e.g. the first Jenkins run that started to exhibit this failure.

#27 Updated by anonym about 2 years ago

bertagaz wrote:

I think there's a bug somewhere in the new branch without fragile tags (and in the chutney branch itself). It's failing constantly on the @check_tor_leaks scenarios with a "no implicit conversion of nil into Array (TypeError)" error in features/support/hooks.rb:271:in `After'.

It seems to have appeared since acc3a1905db53a9e2707fa56d67d7828591b602f was merged into this branch. See e.g. the first Jenkins run that started to exhibit this failure.

Whoops, I didn't see your comment. Yes, I noticed this independently and fixed it. It was force-pushed, but it is confined to the corresponding commit, namely 8474c25005395aa9866f29188856cf26c490bb62.

#28 Updated by anonym about 2 years ago

#29 Updated by anonym about 2 years ago

  • Assignee changed from anonym to intrigeri
  • % Done changed from 30 to 50
  • QA Check set to Ready for QA

As decided during the CI meeting, I'm assigning it to you, intrigeri, for a code review. Also, if you haven't reviewed it by Wednesday next week (2016-05-18), I'm supposed to just merge it anyway, and I guess we'll do a post-merge code review.

#30 Updated by intrigeri about 2 years ago

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Ready for QA to Dev Needed

Great job! I've pushed a couple typo fixes, but did not test the thing yet.

Regarding the doc to set up test suite with Chutney:

  • It feels like manually cloning + applying patches by hand is less practical than it could be (especially if we ever have to add more patches, etc.). How about we add a Git submodule pointing to the relevant branch in our own Chutney repo?
  • "For now you also have to copy (or, better, symlink)" ← how hard would it be to have the test suite do it itself?

Regarding things like assert_all_connections(@sniffer.pcap_file) do |host| → s/host/connection/ (reusing the terminology from pcap_connections_helper), for better clarity, and then make it clear that we're talking of the destination host+port? Or instead s/host/destination_host/, maybe. And then, I'm not convinced that the "Convenience aliases" help more (the test developer) than they add to confusion (for me): e.g. I found it weird to read [host.mac_saddr, host.mac_daddr].include?($vm.real_mac) == is_leaking (it looks like we're asking the saddr of the destination host, which feels awkward).

Now, a couple rephrasing suggestions:

  • "there was no traffic sent to the web server on the LAN": maybe "no traffic was sent to [...]" instead?
  • "Unexpected hosts were contacted": maybe "Unexpected packets were seen" instead? (we're not necessarily selecting on the destination host only)

Is it me, or this branch will fix #8961?

Why was require 'ipaddr' added to vm_helper.rb?

#31 Updated by anonym about 2 years ago

#32 Updated by anonym about 2 years ago

  • Assignee changed from anonym to intrigeri
  • % Done changed from 50 to 60

intrigeri wrote:

Great job! I've pushed a couple typo fixes, but did not test the thing yet.

Regarding the doc to set up test suite with Chutney:

  • It feels like manually cloning + applying patches by hand is less practical than it could be (especially if we ever have to add more patches, etc.). How about we add a Git submodule pointing to the relevant branch in our own Chutney repo?

This is actually what I originally suggested, but it was rejected ("Git submodules are awkward to work with" or something) when we talked about it during some CI meeting; see beginning of #9521#note-23.

Personally I would love this solution. Since chutney then becomes self-contained, no setup instructions are needed anymore and we can get rid of the Chutney: src_dir crap from the local test suite configuration; and new patches will, indeed, be much easier to add since there is no coordination with sysadmins required. If you create a repo forked from upstream, I can get this into shape (hence "Dev Needed").

  • "For now you also have to copy (or, better, symlink)" ← how hard would it be to have the test suite do it itself?

It would not be hard, but if we have chutney as a Git submodule I'll just push the templates in there instead (I guess the one we have could be upstreamed, eventually, so it makes sense that way too). Then I can forget about #11364 and other upstream chutney work for a while (until the autumn or something), which is welcome in these times of -- well -- not enough time. :)

Regarding things like assert_all_connections(@sniffer.pcap_file) do |host| → s/host/connection/ (reusing the terminology from pcap_connections_helper), for better clarity

Fixed in d098f3c.

and then make it clear that we're talking of the destination host+port? Or instead s/host/destination_host/, maybe. And then, I'm not convinced that the "Convenience aliases" help more (the test developer) than they add to confusion (for me):

Agreed. The assertion is made from the Tails VM's perspective, so what "source" and "destination" means should be clear (added comment in 800fd2f just to be sure), but, indeed, let's make it explicit which end of a connection we are looking at by dropping the "convenience aliases" and use daddr and dport instead. Fixed in d6b1752. Note that we essentially never will have to look at the source address/port, that's why I added the convenience aliases.

e.g. I found it weird to read [host.mac_saddr, host.mac_daddr].include?($vm.real_mac) == is_leaking (it looks like we're asking the saddr of the destination host, which feels awkward).

With these changes and explanations in place, we're good here, right?

Now, a couple rephrasing suggestions:

  • "there was no traffic sent to the web server on the LAN": maybe "no traffic was sent to [...]" instead?

Agreed, fixed in 2b6626a.

  • "Unexpected hosts were contacted": maybe "Unexpected packets were seen" instead? (we're not necessarily selecting on the destination host only)

Absolutely, but I think I stick with the "connections" term, so: "Unexpected connections were made". Fixed in 69b6da7.

Is it me, or this branch will fix #8961?

It should, yes. I'll have to go through these Tor-related test suite tickets and see which ones should be affected.

Why was require 'ipaddr' added to vm_helper.rb?

vm_helper.rb uses ipaddr's IPAddr in bridge_ip_addr(), so it should have been there in the first place. Ruby is happy as long as some file requires a module, so that's why it worked before.

#34 Updated by intrigeri about 2 years ago

  • Assignee changed from intrigeri to anonym
  • It feels like manually cloning + applying patches by hand is less practical than it could be (especially if we ever have to add more patches, etc.). How about we add a Git submodule pointing to the relevant branch in our own Chutney repo?

Personally I would love this solution. Since chutney then becomes self-contained, no setup instructions are needed anymore and we can get rid of the Chutney: src_dir crap from the local test suite configuration; and new patches will, indeed, be much easier to add since there is no coordination with sysadmins required. If you create a repo forked from upstream, I can get this into shape (hence "Dev Needed").

Excellent. As clarified over IM, the repo already exists => please go ahead :)

  • "For now you also have to copy (or, better, symlink)" ← how hard would it be to have the test suite do it itself?

It would not be hard, but if we have chutney as a Git submodule I'll just push the templates in there instead (I guess the one we have could be upstreamed, eventually, so it makes sense that way too). Then I can forget about #11364 and other upstream chutney work for a while (until the autumn or something), which is welcome in these times of -- well -- not enough time. :)

Cool. Please go ahead.

With these changes and explanations in place, we're good here, right?

Yes!

Code-reviewed the branch again up to 69b6da72353431acc5fa8959987d3ab2bf65085f, fine with me!

#35 Updated by anonym about 2 years ago

  • Assignee changed from anonym to intrigeri
  • % Done changed from 60 to 70
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:

  • It feels like manually cloning + applying patches by hand is less practical than it could be (especially if we ever have to add more patches, etc.). How about we add a Git submodule pointing to the relevant branch in our own Chutney repo?

Personally I would love this solution. Since chutney then becomes self-contained, no setup instructions are needed anymore and we can get rid of the Chutney: src_dir crap from the local test suite configuration; and new patches will, indeed, be much easier to add since there is no coordination with sysadmins required. If you create a repo forked from upstream, I can get this into shape (hence "Dev Needed").

Excellent. As clarified over IM, the repo already exists => please go ahead :)

  • "For now you also have to copy (or, better, symlink)" ← how hard would it be to have the test suite do it itself?

It would not be hard, but if we have chutney as a Git submodule I'll just push the templates in there instead (I guess the one we have could be upstreamed, eventually, so it makes sense that way too). Then I can forget about #11364 and other upstream chutney work for a while (until the autumn or something), which is welcome in these times of -- well -- not enough time. :)

Cool. Please go ahead.

Done in:

  • 97ada21 Add our temporary Chutney fork as a Git submodule.
  • a1f8a47 Move bridge-obfs4.tmpl into the Chutney submodule.
  • 77d16ec Use our Chutney Git submodule instead of an external checkout.

Code-reviewed the branch again up to 69b6da72353431acc5fa8959987d3ab2bf65085f, fine with me!

Please also look at e500661 (dogtail! :)) since it just felt so wrong to keep using Tor Check now that it will always tell us we are not using Tor, thanks to Chutney.

#36 Updated by intrigeri about 2 years ago

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Ready for QA to Info Needed

Code review passes. I've tried to run the test suite from that branch, and here's what I see:

Command failed (returned pid 28987 exit 255): ["/srv/git/submodules/chutney/chutney", "start", "/srv/git/features/chutney/test-network", {:err=>[:child, :out]}]:
Using Python 2.7.9

Starting nodes

Couldn't launch test000auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/000auth/torrc): 255

Couldn't launch test001auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/001auth/torrc): 255

Couldn't launch test002auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/002auth/torrc): 255

Couldn't launch test003auth (tor --quiet -f /tmp/TailsToaster/chutney-data/nodes/003auth/torrc): 255

.
<0> expected but was
<#<Process::Status: pid 28987 exit 255>>. (Test::Unit::AssertionFailedError)
/usr/lib/ruby/vendor_ruby/test/unit/assertions.rb:55:in `block in assert_block'
/usr/lib/ruby/vendor_ruby/test/unit/assertions.rb:1593:in `call'
/usr/lib/ruby/vendor_ruby/test/unit/assertions.rb:1593:in `_wrap_assertion'
/usr/lib/ruby/vendor_ruby/test/unit/assertions.rb:53:in `assert_block'
/usr/lib/ruby/vendor_ruby/test/unit/assertions.rb:240:in `assert_equal'
/srv/git/features/support/helpers/misc_helpers.rb:196:in `block in cmd_helper'
/srv/git/features/support/helpers/misc_helpers.rb:192:in `popen'
/srv/git/features/support/helpers/misc_helpers.rb:192:in `cmd_helper'
/srv/git/features/step_definitions/chutney.rb:27:in `block (2 levels) in ensure_chutney_is_running'
/srv/git/features/step_definitions/chutney.rb:26:in `chdir'
/srv/git/features/step_definitions/chutney.rb:26:in `block in ensure_chutney_is_running'
/srv/git/features/step_definitions/chutney.rb:49:in `call'
/srv/git/features/step_definitions/chutney.rb:49:in `ensure_chutney_is_running'
/srv/git/features/support/hooks.rb:177:in `block in <top (required)>'
/srv/git/features/support/extra_hooks.rb:35:in `call'
/srv/git/features/support/extra_hooks.rb:35:in `invoke'
/srv/git/features/support/extra_hooks.rb:114:in `block in before_feature'
/srv/git/features/support/extra_hooks.rb:113:in `each'
/srv/git/features/support/extra_hooks.rb:113:in `before_feature'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:181:in `block in send_to_all'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:179:in `each'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:179:in `send_to_all'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:169:in `broadcast'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:26:in `visit_feature'
/usr/lib/ruby/vendor_ruby/cucumber/ast/features.rb:28:in `block in accept'
/usr/lib/ruby/vendor_ruby/cucumber/ast/features.rb:17:in `each'
/usr/lib/ruby/vendor_ruby/cucumber/ast/features.rb:17:in `each'
/usr/lib/ruby/vendor_ruby/cucumber/ast/features.rb:27:in `accept'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:21:in `block in visit_features'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:170:in `broadcast'
/usr/lib/ruby/vendor_ruby/cucumber/ast/tree_walker.rb:20:in `visit_features'
/usr/lib/ruby/vendor_ruby/cucumber/runtime.rb:49:in `run!'
/usr/lib/ruby/vendor_ruby/cucumber/cli/main.rb:42:in `execute!'
/usr/bin/cucumber:13:in `<main>'

How can I debug this?

#37 Updated by bertagaz about 2 years ago

intrigeri wrote:

How can I debug this?

You can remove the --quiet option that chutney uses to start the Tor instances (it only appears in two places); you'll then get the reason why they failed to start. Are you using our dedicated branch with all the patches? This error sounds a lot like the one I experienced in Jenkins when chutney was sourcing the system torrc.

#38 Updated by anonym about 2 years ago

bertagaz wrote:

intrigeri wrote:

How can I debug this?

You can remove the --quiet option that chutney uses to start the Tor instances (it only appears in two places); you'll then get the reason why they failed to start.

If it helps, this is what I'd use:

sed -i '/"--quiet",/d' submodules/chutney/lib/chutney/TorNet.py

Are you using our dedicated branch with all the patches?

As you can see in the log, he uses the chutney Git submodule that I pushed yesterday, and it tracks the correct branch so all required commits should be there.

This error sound a lot like the one I experienced in Jenkins when chutney was sourcing the system torrc.

From the log I can see that it is not the same command line that fails (e.g. there is no "--list-fingerprint" this time); now it indeed must be the start() method that fails, so this is another issue.

intrigeri, was this the first time you tried running it, or did this work before? And what version of Tor are you running? I'm again wondering if there is some permissions issue involved here. :)

#39 Updated by anonym about 2 years ago

  • Assignee changed from anonym to intrigeri

#40 Updated by intrigeri about 2 years ago

  • Assignee changed from intrigeri to anonym

If it helps, this is what I'd use:

Thanks! So I re-ran the command line that failed, and:

# tor -f /tmp/TailsToaster/chutney-data/nodes/003auth/torrc
May 12 07:35:20.159 [notice] Tor v0.2.5.12 (git-3731dd5c3071dcba) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.0.1k and Zlib 1.2.8.
May 12 07:35:20.159 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
May 12 07:35:20.159 [notice] Read configuration file "/tmp/TailsToaster/chutney-data/nodes/003auth/torrc".
May 12 07:35:20.170 [notice] Based on detected system memory, MaxMemInQueues is set to 8192 MB. You can override this by setting MaxMemInQueues by hand.
May 12 07:35:20.170 [warn] You have used DirAuthority or AlternateDirAuthority to specify alternate directory authorities in your configuration. This is potentially dangerous: it can make you look different from all other Tor users, and hurt your anonymity. Even if you've specified the same authorities as Tor uses by default, the defaults could change in the future. Be sure you know what you're doing.
May 12 07:35:20.170 [warn] The DirAuthority options 'hs' and 'no-hs' are obsolete; you don't need them any more.
May 12 07:35:20.170 [warn] The DirAuthority options 'hs' and 'no-hs' are obsolete; you don't need them any more.
May 12 07:35:20.170 [warn] The DirAuthority options 'hs' and 'no-hs' are obsolete; you don't need them any more.
May 12 07:35:20.170 [warn] The DirAuthority options 'hs' and 'no-hs' are obsolete; you don't need them any more.
May 12 07:35:20.170 [warn] Failed to parse/validate config: V3AuthVotingInterval is insanely low.
May 12 07:35:20.170 [err] Reading config failed--see warnings above.

So "V3AuthVotingInterval is insanely low" is probably the problem, right?

intrigeri, was this the first time you tried running it, or did this work before?

First time.

And what version of Tor are you running?

0.2.5.12-1 from Jessie. Do we need anything newer?

#41 Updated by intrigeri about 2 years ago

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Info Needed to Dev Needed

Indeed, upgrading to 0.2.7.6-1~bpo8+1 fixes the problem => doc++ :)

#42 Updated by intrigeri about 2 years ago

  • Assignee changed from intrigeri to anonym

#43 Updated by anonym about 2 years ago

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:

Indeed, upgrading to 0.2.7.6-1~bpo8+1 fixes the problem => doc++ :)

Ah, yes. Specifically Tor 0.2.6.x is required, sorry for not having made this clear before. Docs fixed in 1dd8cb7.
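Beyond updating the docs, one could fail early with a clear message by checking the host's tor version before starting chutney. The helper below is a hypothetical sketch (it is not part of the actual branch; the function names and the exact 0.2.6 threshold are assumptions based on the discussion above):

```ruby
require 'rubygems'

# Minimum tor version that accepts chutney's generated torrc
# (older versions reject it with "V3AuthVotingInterval is insanely low").
MIN_TOR_VERSION = Gem::Version.new('0.2.6')

# Extract a dotted version number from `tor --version` output.
def parse_tor_version(output)
  output[/\d+\.\d+\.\d+(\.\d+)?/]
end

# True if the reported tor version is recent enough for chutney.
def tor_recent_enough?(output)
  version = parse_tor_version(output)
  !version.nil? && Gem::Version.new(version) >= MIN_TOR_VERSION
end
```

Something like `tor_recent_enough?(`tor --version`)` could then be asserted in `ensure_chutney_is_running` before launching the network, turning a cryptic startup failure into an actionable error.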

#44 Updated by intrigeri about 2 years ago

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Ready for QA to Dev Needed

Code review passes!

I'm doing a full test suite run (and will report back here later), but sending back to anonym's plate so that he can handle failures such as https://jenkins.tails.boum.org/job/test_Tails_ISO_test-9521-chutney-with-fragile-scenarios/40/cucumberTestReport/the-tor-enforcement-is-effective/anti-test_-detecting-udp-leaks-of-dns-lookups-with-the-firewall-leak-detector/.

#45 Updated by intrigeri about 2 years ago

So, here are my test results:

  • I see most time syncing scenarios with a modified clock fail (timeout in "Tor is ready") twice in a row. Interestingly, the two "Clock is one day in the future" scenarios pass each time.
  • (probably caused by dogtail rather than chutney) I see "When I disable the first persistence preset" fail: on the video, I see the apps->Tails submenu is open, but the persistent volume assistant entry is never clicked. Want a ticket about it?
  • (probably unrelated to chutney) I see the second "Then Pidgin automatically enables my XMPP account" fail in "Scenario: Using a persistent Pidgin configuration": I see the account manager, and the XMPP account we created before rebooting is not listed (what?!). OTOH, "Pidgin has the expected persistent accounts configured" has succeeded, so clearly something is wrong; and FWIW, I've verified that the random IRC nickname listed in the account manager is the same as during the previous boot. Shall I report this separately?

#46 Updated by anonym about 2 years ago

intrigeri wrote:

  • I see most time syncing scenarios with a modified clock fail (timeout in "Tor is ready") twice in a row. Interestingly, the two "Clock is one day in the future" scenarios pass each time.

Interesting. I'm not sure what to do with this. We know that "tordate" is a mess. I really do not expect us to try to fix it -- in fact, I hope we don't and instead spend that energy on #5774 (Robust time syncing). Hence, I would actually like to remove all "tordate"-related scenarios from time_syncing.feature (i.e. all scenarios but the last two), and only add more scenarios once #5771 is solved.

So, to me it doesn't really matter if Chutney made "tordate" less robust. "tordate" is dead to me :). Besides, given how much "tordate" depends on the exact behavior of the client vs. the rest of the network, Chutney's network and the real network could well differ enough that "tordate" works differently between them.

  • (probably caused by dogtail rather than chutney) I see "When I disable the first persistence preset" fail: on the video, I see the apps->Tails submenu is open, but the persistent volume assistant entry is never clicked. Want a ticket about it?

Please!

  • (probably unrelated to chutney) I see the second "Then Pidgin automatically enables my XMPP account" fail in "Scenario: Using a persistent Pidgin configuration": I see the account manager, and the XMPP account we created before rebooting is not listed (what?!). OTOH, "Pidgin has the expected persistent accounts configured" has succeeded, so clearly something is wrong; and FWIW, I've verified that the random IRC nickname listed in the account manager is the same as during the previous boot. Shall I report this separately?

Please!

#47 Updated by anonym about 2 years ago

intrigeri wrote:

I'm doing a full test suite run (and will report back here later), but sending back to anonym's plate so that he can handle failures such as https://jenkins.tails.boum.org/job/test_Tails_ISO_test-9521-chutney-with-fragile-scenarios/40/cucumberTestReport/the-tor-enforcement-is-effective/anti-test_-detecting-udp-leaks-of-dns-lookups-with-the-firewall-leak-detector/.

I am pretty sure that the static sleep I added in 7675d1e is enough on my system, but not on Jenkins. If I remove it, I can reproduce the same problem most of the time, and I'm pretty sure that if I bumped it, it'd work on Jenkins too.

I spent some time researching this and found a seemingly perfect solution in ca2b65a (see its commit message in particular). It works perfectly on my system, but let's see how it does on Jenkins.
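For reference, the general shape of such a fix is to poll a condition with a deadline instead of sleeping a fixed amount. The details of ca2b65a are not shown in this thread, so the helper below is illustrative only:

```ruby
require 'timeout'

# Illustrative sketch: poll a block until it returns true or the deadline
# passes, instead of relying on a static sleep that may be too short on
# slower hosts (like the Jenkins isotesters).
def try_for(timeout, delay: 0.1)
  Timeout.timeout(timeout) do
    loop do
      return true if yield
      sleep delay
    end
  end
rescue Timeout::Error
  false
end
```

A static sleep bakes in an assumption about the host's speed; polling with a timeout only costs extra wall-clock time when the condition is genuinely slow to become true, so the same test can pass on a fast laptop and a loaded Jenkins worker.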

#48 Updated by anonym about 2 years ago

  • Blocks Bug #9654: "IPv4 TCP non-Tor Internet hosts were contacted" during the test suite added

#49 Updated by intrigeri about 2 years ago

Interesting. I'm not sure what to do with this. We know that "tordate" is a mess. I really do not expect us to try to fix it -- in fact, I hope we don't and instead spend that energy on #5774 (Robust time syncing).

ACK.

Hence, I would actually like to remove all "tordate"-related scenarios from time_syncing.feature (i.e. all scenarios but the last two), and only add more scenarios once #5771 is solved.

Let's do that, but keep "Clock with host's time", "Clock with host's time in bridge mode", and the last three scenarios, OK?

So, to me it doesn't really matter if Chutney made "tordate" less robust. "tordate" is dead to me :). Besides, given how much "tordate" depends on the exact behavior of the client vs. the rest of the network, Chutney's network and the real network could well differ enough that "tordate" works differently between them.

Sounds totally reasonable to me; let's be pragmatic here.

#50 Updated by anonym about 2 years ago

Ack, most "tordate" scenarios were removed in ffba63d.

#51 Updated by anonym about 2 years ago

  • Blocks Bug #10440: Time syncing scenarios are fragile added

#52 Updated by intrigeri about 2 years ago

  • (probably caused by dogtail rather than chutney) I see "When I disable the first persistence preset" fail: on the video, I see the apps->Tails submenu is open, but the persistent volume assistant entry is never clicked. Want a ticket about it?

Please!

I've fixed it in 41952f6, works for me. Pushed directly to devel, please take a look.

  • (probably unrelated to chutney) I see the second "Then Pidgin automatically enables my XMPP account" fail in "Scenario: Using a persistent Pidgin configuration": I see the account manager, and the XMPP account we created before rebooting is not listed (what?!). OTOH, "Pidgin has the expected persistent accounts configured" has succeeded, so clearly something is wrong; and FWIW, I've verified that the random IRC nickname listed in the account manager is the same as during the previous boot. Shall I report this separately?

Now known as #11413.

#53 Updated by intrigeri about 2 years ago

  • Assignee changed from anonym to intrigeri
  • % Done changed from 70 to 90
  • QA Check changed from Dev Needed to Ready for QA

So, we compared test results on Jenkins (devel vs. this branch, and devel+fragile vs. this branch+fragile), and it looks like we're good and can merge this! I'll do a last full test suite run locally and then I expect to merge.

#54 Updated by intrigeri about 2 years ago

  • Status changed from In Progress to Fix committed
  • Assignee deleted (intrigeri)
  • % Done changed from 90 to 100
  • QA Check changed from Ready for QA to Pass

Merged! Now you can look at the bunch of tickets that were blocked by this one, and hopefully a few of those will be fixed for free :)

#55 Updated by sajolida about 2 years ago

Woohoo!

#56 Updated by anonym about 2 years ago

  • Status changed from Fix committed to Resolved

#57 Updated by BitingBird about 2 years ago

  • Priority changed from Elevated to Normal
