Feature #12411

Stop restarting tor if bootstrapping stalls

Added by anonym 3 months ago. Updated 2 months ago.

Status:ResolvedStart date:03/29/2017
Priority:NormalDue date:
Assignee:-% Done:

100%

Category:-
Target version:Tails_2.12
QA Check:Pass Blueprint:
Feature Branch:feature/12411-tor-bootstrap-fixes Easy:
Type of work:Code Affected tool:

Description

While investigating #10238 preliminary tests indicate that tor might have fixed the issues we used to experience with the bootstrap process stalling and requiring a restart to kickstart it (#9516). Let's see if we can get rid of this crap!


Related issues

Related to Tails - Bug #10238: Investigate why Tor sometimes is slow at or even fails bootstrapping Rejected 09/23/2015

Associated revisions

Revision 729baa2b
Added by anonym 3 months ago

Stop restarting tor if bootstrapping stalls.

Preliminary tests indicate that tor might have fixed the issues we
used to experience with the bootstrap process stalling and requiring a
restart to kickstart it.

Refs: #10238, #9516
Will-fix: #12411

Revision bd53a93b
Added by anonym 3 months ago

Test suite: no need to spawn `restart-tor` when restoring snapshots any more.

Refs: #12411

Revision 1684decc
Added by anonym 3 months ago

Merge remote-tracking branch 'origin/feature/12411-tor-bootstrap-fixes' into devel

Refs: #12411

Revision d13b6a5d
Added by intrigeri 2 months ago

Merge remote-tracking branch 'origin/feature/12411-tor-bootstrap-fixes' into testing (refs: #12411)

History

#1 Updated by anonym 3 months ago

  • % Done changed from 0 to 20
  • QA Check set to Ready for QA
  • Feature Branch set to feature/12411-tor-bootstrap-fixes

Let's see what Jenkins thinks. I intend to test this branch many times before considering it further for inclusion in Tails 2.12.

#2 Updated by anonym 3 months ago

  • Status changed from Confirmed to In Progress

#3 Updated by intrigeri 3 months ago

anonym wrote:

Let's see what Jenkins thinks. I intend to test this branch many times before considering it further for inclusion in Tails 2.12.

Let me know if you want me to run it many times on a poorer Internet connection.

#4 Updated by anonym 3 months ago

intrigeri wrote:

anonym wrote:

Let's see what Jenkins thinks. I intend to test this branch many times before considering it further for inclusion in Tails 2.12.

Let me know if you want me to run it many times on a poorer Internet connection.

My branch seem to make /usr/local/sbin/tor-has-bootstrapped always fail, so I have to try to understand how all that systemd state works again.

Until then I'd appreciate if you could build and run the test suite for these branches:

  • test/tor-bootstrap-robustness_tor-0.2.9.x
  • test/tor-bootstrap-robustness_tor-0.3.1.x

#5 Updated by intrigeri 3 months ago

My branch seem to make /usr/local/sbin/tor-has-bootstrapped always fail, so I have to try to understand how all that systemd state works again.

I can give a hand if needed (over the week-end, or on Wednesday).

Until then I'd appreciate if you could build and run the test suite for these branches:

ACK. On which ticket shall I report back?

#6 Updated by anonym 3 months ago

intrigeri wrote:

My branch seem to make /usr/local/sbin/tor-has-bootstrapped always fail, so I have to try to understand how all that systemd state works again.

I can give a hand if needed (over the week-end, or on Wednesday).

I've fixed it, so please test the feature branch as well!

Until then I'd appreciate if you could build and run the test suite for these branches:

ACK. On which ticket shall I report back?

Here is good, but I'll treat those results with a grain of salt since they also kill tordate. IMHO they still give good data about tor-specific bootstrapping issues.

#7 Updated by intrigeri 3 months ago

I've fixed it, so please test the feature branch as well!

OK, I'm on it.

Until then I'd appreciate if you could build and run the test suite for these branches:

ACK. On which ticket shall I report back?

Here is good, but I'll treat those results with a grain of salt since they also kill tordate. IMHO they still give good data about tor-specific bootstrapping issues.

I've run the test suite from both branches and they completed successfully, with a (download) saturated and relatively slow Internet connection.

#8 Updated by intrigeri 3 months ago

I've fixed it, so please test the feature branch as well!

Passes!

#9 Updated by intrigeri 3 months ago

Please reassign to me as soon as the branch builds again and is ready for code review (even if you want to run more tests on it).

#10 Updated by anonym 3 months ago

  • Assignee changed from anonym to intrigeri
  • % Done changed from 20 to 50

So not a single bootstrap issue has been observed by you, me, sid-Jenkins and lizard-Jenkins -- please review'n'merge!

#11 Updated by intrigeri 3 months ago

  • Related to Bug #10238: Investigate why Tor sometimes is slow at or even fails bootstrapping added

#12 Updated by intrigeri 3 months ago

  • Assignee changed from intrigeri to anonym

Code review at 4d3ba48037be1f42576c75b7092feda3079c8614:

  • In get_tor_control_socket_path(), I think we should also look in /usr/share/tor/tor-service-defaults-torrc (this is the place where the default socket is configured on Debian), otherwise looking in /etc/tor/torrc is kinda futile.
  • It's a bit sad to add a dependency on socat, while we already include nc that (iirc) can talk to Unix sockets.
  • I'm curious why tor_bootstrap_progress() doesn't need to pass -n to echo anymore.
  • I'd rather see extended regexp:s, because I would prefer not to have to learn another regexp language.

IMO none of these are real blockers for merging, but I would appreciate if the 3 first items on this list were fixed during the freeze.

So please make sure that Scenario: Clock with host's time in bridge mode passes (it's disabled on Jenkins), merge, and reassign to you for these fixes.

What's "sid-Jenkins"?

#13 Updated by anonym 3 months ago

  • Assignee changed from anonym to intrigeri
  • QA Check changed from Ready for QA to Info Needed

intrigeri wrote:

Code review at 4d3ba48037be1f42576c75b7092feda3079c8614:

  • In get_tor_control_socket_path(), I think we should also look in /usr/share/tor/tor-service-defaults-torrc (this is the place where the default socket is configured on Debian), otherwise looking in /etc/tor/torrc is kinda futile.

Good point! Will fix.

  • It's a bit sad to add a dependency on socat, while we already include nc that (iirc) can talk to Unix sockets.

We install netcat-traditional, which can't talk to UNIX sockets. netcat-openbsd can, though. I didn't want to potentially break other stuff we use nc/netcat by switching implementation, but I realize that was perhaps a bit overly cautious since our only other uses are:

features/step_definitions/tor.rb:    cmd = "echo | netcat #{host} #{port}" 
features/step_definitions/tor.rb:    cmd = "echo | netcat -u #{host} #{port}" 

So, I could try switching to netcat-openbsd and make sure that the tests affected still pass. Ok?

  • I'm curious why tor_bootstrap_progress() doesn't need to pass -n to echo anymore.

I'm curious why it was ever needed. Do you have an explanation?

IMHO, in general, if you need to worry about newlines in shell function "return values" (now I'm talking about what they write, not the return status) you are doing something wrong. At least with my current understanding of shell expansion rules. Am I missing something?

  • I'd rather see extended regexp:s, because I would prefer not to have to learn another regexp language.

As someone who learned regexes through via grep and sed without --extended-regexp I could return your argument right back to you, but I get that probably more people than not just get confused by all those escapes. :)

IMO none of these are real blockers for merging, but I would appreciate if the 3 first items on this list were fixed during the freeze.

I think I'll try to do all this before merging any way. The amount of work needed to track this will be in the same order as actually fixing them. :)

So please make sure that Scenario: Clock with host's time in bridge mode passes (it's disabled on Jenkins), merge, and reassign to you for these fixes.

Yes!

What's "sid-Jenkins"?

I meant si*b*-Jenkins. :)

#14 Updated by intrigeri 3 months ago

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Info Needed to Dev Needed

intrigeri wrote:

  • It's a bit sad to add a dependency on socat, while we already include nc that (iirc) can talk to Unix sockets.

We install netcat-traditional, which can't talk to UNIX sockets. netcat-openbsd can, though.

FWIW we have switched to netcat-openbsd on feature/stretch :)

I didn't want to potentially break other stuff we use nc/netcat by switching implementation,

Makes sense.

but I realize that was perhaps a bit overly cautious since our only other uses are:

> features/step_definitions/tor.rb:    cmd = "echo | netcat #{host} #{port}" 
> features/step_definitions/tor.rb:    cmd = "echo | netcat -u #{host} #{port}" 
> 

… and I bet we already have adapted the code to use netcat-openbsd on feature/stretch :)

So, I could try switching to netcat-openbsd and make sure that the tests affected still pass. Ok?

Yes :)

  • I'm curious why tor_bootstrap_progress() doesn't need to pass -n to echo anymore.

I'm curious why it was ever needed. Do you have an explanation?

I guess someone (possibly me) wanted to ensure that we return an integer, not an integer followed by a newline.

IMHO, in general, if you need to worry about newlines in shell function "return values" (now I'm talking about what they write, not the return status) you are doing something wrong. At least with my current understanding of shell expansion rules. Am I missing something?

I'm probably missing something wrt. how a function's stdout is captured by $(). Forget it.

  • I'd rather see extended regexp:s, because I would prefer not to have to learn another regexp language.

As someone who learned regexes through via grep and sed without --extended-regexp I could return your argument right back to you, but I get that probably more people than not just get confused by all those escapes. :)

Forget it then (even though all the programming languages we use in Tails default to something like PCRE, so it's not as if we were going to standardize on old-school regexps any time soon).

I think I'll try to do all this before merging any way. The amount of work needed to track this will be in the same order as actually fixing them. :)

OK.

#15 Updated by anonym 3 months ago

  • Assignee changed from anonym to intrigeri
  • % Done changed from 50 to 60
  • QA Check changed from Dev Needed to Ready for QA

intrigeri wrote:

intrigeri wrote:
but I realize that was perhaps a bit overly cautious since our only other uses are:
[...]

… and I bet we already have adapted the code to use netcat-openbsd on feature/stretch :)

Good!

So, I could try switching to netcat-openbsd and make sure that the tests affected still pass. Ok?

Yes :)

Done, but then I found a bug in tor, see 8b950d5902ad1a157a99cca234b5ddd32e661c89. So we're back to TCP communication again, and I had to disable PrivateNetwork=yes in a systemd unit file that were using the tor.sh library.

  • I'd rather see extended regexp:s, because I would prefer not to have to learn another regexp language.

As someone who learned regexes through via grep and sed without --extended-regexp I could return your argument right back to you, but I get that probably more people than not just get confused by all those escapes. :)

Forget it then (even though all the programming languages we use in Tails default to something like PCRE, so it's not as if we were going to standardize on old-school regexps any time soon).

Note that I was mostly teasing you! :) I have switched to extended regexes in this file, and will try to remember doing so for any future shell scripting I do.

I think I'll try to do all this before merging any way. The amount of work needed to track this will be in the same order as actually fixing them. :)

OK.

In the end I didn't, and I wish I had at least tested "bridge mode" and Tor Launcher, cause it's broken, and will be so in 2.12~rc1. The current state of the feature branch should fix this though => please review'n're-merge into testing and devel!

#16 Updated by intrigeri 2 months ago

  • Assignee changed from intrigeri to anonym
  • QA Check changed from Ready for QA to Dev Needed

Hi!

Note that I was mostly teasing you! :) I have switched to extended regexes in this file, and will try to remember doing so for any future shell scripting I do.

Thanks! :)

In the end I didn't, and I wish I had at least tested "bridge mode" and Tor Launcher, cause it's broken, and will be so in 2.12~rc1.

Ouch :/

The current state of the feature branch should fix this though => please review'n're-merge into testing and devel!

Merged, as I understand this does improve the current state of things.

Some bonus nitpicking (surprise!):

  • control_port="$(tor_rc_lookup ControlPort)" assumes that no flag is set. Perhaps reintroduce | awk '{ print $1 }') that you had added in tor_control_socket_path?
  • In theory, ControlPort accepts unix:path and auto values: Maybe make sure it contains only digits?

Feel free to tell me that you have more important Foundations Team stuff to work on than addressing these, if that's how you feel :) If that's the case, mark this ticket as Fix Committed.

#17 Updated by anonym 2 months ago

  • Status changed from In Progress to Fix committed
  • Assignee deleted (anonym)
  • % Done changed from 60 to 100
  • QA Check changed from Dev Needed to Pass

intrigeri wrote:

The current state of the feature branch should fix this though => please review'n're-merge into testing and devel!

Merged, as I understand this does improve the current state of things.

Thanks!

Some bonus nitpicking (surprise!):

  • control_port="$(tor_rc_lookup ControlPort)" assumes that no flag is set. Perhaps reintroduce | awk '{ print $1 }') that you had added in tor_control_socket_path?
  • In theory, ControlPort accepts unix:path and auto values: Maybe make sure it contains only digits?

Feel free to tell me that you have more important Foundations Team stuff to work on than addressing these, if that's how you feel :) If that's the case, mark this ticket as Fix Committed.

Yeah, I guess I should focus on other things: my inner compulsively refactoring perfectionist wanted me to do things like these, but I thought "we are not really trying to make a user-friendly general purpose library for others' consumption, so let's cheat by making some assumptions that likely will remain true for Tails in order to avoid over-engineering this". If we are gonna do all this correctly, we should also be able to handle multiple ControlPort directives, ControlPort unix:PATH vs the equiavalent ControlSocket PATH, ControlListenAddress etc. Let's not bother!

#18 Updated by intrigeri 2 months ago

Let's not bother!

ACK

#19 Updated by anonym 2 months ago

  • Status changed from Fix committed to Resolved

Also available in: Atom PDF