Project

General

Profile

Bug #13426

Tor does not start on Tails 3.0.1 automatically upgraded from 3.0

Added by conorsch 5 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
High
Assignee:
-
Category:
Tor configuration
Target version:
Start date:
07/05/2017
Due date:
% Done:

100%

QA Check:
Feature Branch:
Type of work:
Research
Blueprint:
Easy:
Affected tool:

Description

Upgrading Tails to 3.0.1 via the Tails upgrade frontend causes the "tor" service to break on subsequent boots. Likely this is related to the tor upgrade to version 0.3.0.9-1~d90.stretch+1, described in #13253.

I've confirmed breakage on multiple machines, both hardware and libvirt VMs, when upgrading from Tails 3.0 -> 3.0.1. I have not confirmed that the breakage exists in a fresh install of 3.0.1.

Due to the Tails update flow happening over tor, I expect the resolution here to involve manual changes. Happy to test any solutions suggested, and will work on resolving the problem locally in the meantime.

Associated revisions

Revision a4d24aab (diff)
Added by intrigeri 5 months ago

Drop the 3.0 → 3.0.1 automatic upgrade path (refs: #13426).

I was not able to find a workaround we could document for #13426,
so let's not point users to something we know is broken.

Revision 6217c2f4 (diff)
Added by anonym 4 months ago

Publish instructions for how to repair a Tails 3.0.1 broken by the automatic upgrade.

Refs: #13426

Revision eca3d100 (diff)
Added by intrigeri 4 months ago

Clarify a bit what needs to be tested on the incrementally upgraded system (refs: #13426).

Revision ae05e98a (diff)
Added by bertagaz 4 months ago

Fill the hole about available IUKs for 3.1.

Refs: #13570, #13426

Revision 91e2e074 (diff)
Added by bertagaz 4 months ago

Disable 3.0.1 to 3.1 incremental upgrade.

It's broken by refs: #13426.

History

#1 Updated by conorsch 5 months ago

Relevant log lines:

$ sudo journalctl -af
[sudo] password for amnesia: 
-- Logs begin at Wed 2017-07-05 22:52:18 UTC. --
Jul 05 23:04:33 amnesia onion-grater[5333]: /usr/bin/onioncircuits (pid: 18913, user: amnesia, port: 55098, filter: onioncircuits) connected: loaded filter: onioncircuits
Jul 05 23:04:33 amnesia onion-grater[5333]: /usr/bin/onioncircuits (pid: 18913, user: amnesia, port: 55098, filter: onioncircuits) disconnected: client quit
Jul 05 23:04:33 amnesia onion-grater[5333]: ----------------------------------------
Jul 05 23:04:33 amnesia onion-grater[5333]: Exception happened during processing of request from ('127.0.0.1', 55098)
Jul 05 23:04:33 amnesia onion-grater[5333]: Traceback (most recent call last):
Jul 05 23:04:33 amnesia onion-grater[5333]:   File "/usr/lib/python3.5/socketserver.py", line 625, in process_request_thread
Jul 05 23:04:33 amnesia onion-grater[5333]:     self.finish_request(request, client_address)
Jul 05 23:04:33 amnesia onion-grater[5333]:   File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
Jul 05 23:04:33 amnesia onion-grater[5333]:     self.RequestHandlerClass(request, client_address, self)
Jul 05 23:04:33 amnesia onion-grater[5333]:   File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
Jul 05 23:04:33 amnesia onion-grater[5333]:     self.handle()
Jul 05 23:04:33 amnesia onion-grater[5333]:   File "/usr/local/lib/onion-grater", line 630, in handle
Jul 05 23:04:33 amnesia onion-grater[5333]:     self.controller = self.connect_to_real_control_port()
Jul 05 23:04:33 amnesia onion-grater[5333]:   File "/usr/local/lib/onion-grater", line 566, in connect_to_real_control_port
Jul 05 23:04:33 amnesia onion-grater[5333]:     with open(global_args.control_cookie_path, "rb") as f:
Jul 05 23:04:33 amnesia onion-grater[5333]: FileNotFoundError: [Errno 2] No such file or directory: '/run/tor/control.authcookie'

Tailing journalctl while bouncing the failing tor service:

Jul 05 23:21:25 amnesia systemd[1]: tor@default.service: Failed with result 'exit-code'.
Jul 05 23:21:25 amnesia systemd[1]: tor@default.service: Service hold-off time over, scheduling restart.
Jul 05 23:21:25 amnesia systemd[1]: systemd-timesyncd.service: Cannot add dependency job, ignoring: Unit systemd-timesyncd.service is masked.
Jul 05 23:21:25 amnesia systemd[1]: Stopped Anonymizing overlay network for TCP.
Jul 05 23:21:25 amnesia systemd[1]: Starting Anonymizing overlay network for TCP...
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.932 [notice] Tor 0.3.0.9 (git-100816d92ab5664d) running on Linux with Libevent 2.0.21-stable, OpenSSL 1.1.0f and Zlib 1.2.8.
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.938 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.943 [notice] Read configuration file "/usr/share/tor/tor-service-defaults-torrc".
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.943 [notice] Read configuration file "/etc/tor/torrc".
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.945 [warn] The ControlListenAddress option is deprecated, and will most likely be removed in a future version of Tor. Use ControlPort instead. (If you think this is a mistake, please let us know!)
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.946 [warn] The TransListenAddress option is deprecated, and will most likely be removed in a future version of Tor. Use TransPort instead. (If you think this is a mistake, please let us know!)
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.946 [warn] The WarnUnsafeSocks option is deprecated, and will most likely be removed in a future version of Tor. Changing this option makes it easier for you to accidentally lose your anonymity by leaking DNS information (If you think this is a mistake, please let us know!)
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.951 [warn] Directory /var/lib/tor cannot be read: Permission denied
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.951 [warn] Failed to parse/validate config: Couldn't access/create private data directory "/var/lib/tor" 
Jul 05 23:21:25 amnesia tor[31125]: Jul 05 23:21:25.951 [err] Reading config failed--see warnings above.
Jul 05 23:21:25 amnesia systemd[1]: tor@default.service: Control process exited, code=exited status=1
Jul 05 23:21:25 amnesia systemd[1]: Failed to start Anonymizing overlay network for TCP.
Jul 05 23:21:25 amnesia systemd[1]: tor@default.service: Unit entered failed state.
Jul 05 23:21:25 amnesia systemd[1]: tor@default.service: Failed with result 'exit-code'.

$ sudo systemctl status tor@default.service
[sudo] password for amnesia: 
● tor@default.service - Anonymizing overlay network for TCP
   Loaded: loaded (/lib/systemd/system/tor@default.service; static; vendor preset: enabled)
  Drop-In: /lib/systemd/system/tor@default.service.d
           └─fix-obfs4proxy.conf, writable-etc-tor.conf
   Active: failed (Result: exit-code) since Wed 2017-07-05 22:45:12 UTC; 15min ago
  Process: 10399 ExecStartPre=/usr/bin/tor --defaults-torrc /usr/share/tor/tor-service-defaults-torrc -f /etc/tor/torrc --RunAsDaemon 0 --verify-config (code=exited, status=1/FAILURE)
  Process: 10397 ExecStartPre=/usr/bin/install -Z -m 02755 -o debian-tor -g debian-tor -d /var/run/tor (code=exited, status=0/SUCCESS)

Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Control process exited, code=exited status=1
Jul 05 22:45:12 amnesia systemd[1]: Failed to start Anonymizing overlay network for TCP.
Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Unit entered failed state.
Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Failed with result 'exit-code'.
Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Service hold-off time over, scheduling restart.
Jul 05 22:45:12 amnesia systemd[1]: Stopped Anonymizing overlay network for TCP.
Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Start request repeated too quickly.
Jul 05 22:45:12 amnesia systemd[1]: Failed to start Anonymizing overlay network for TCP.
Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Unit entered failed state.
Jul 05 22:45:12 amnesia systemd[1]: tor@default.service: Failed with result 'exit-code'.

#5 Updated by goupille 5 months ago

as it seems to happen only with the automatic upgrade, affected users should try to upgrade manually :

https://tails.boum.org/upgrade

#6 Updated by goupille 5 months ago

  • Assignee set to intrigeri
  • Type of work changed from Sysadmin to Research
  • Affected tool set to Tor Launcher

#7 Updated by goupille 5 months ago

  • Status changed from New to Confirmed

#8 Updated by intrigeri 5 months ago

  • Subject changed from Tails 3.0.1 upgrade breaks tor to Tor does not start on Tails 3.0.1 automatically upgraded from 3.0
  • Target version set to Tails_3.1

#9 Updated by intrigeri 5 months ago

  • Affected tool deleted (Tor Launcher)

(This has nothing to do about Tor Launcher, has it?)

#10 Updated by intrigeri 5 months ago

@help desk: I was hoping I could document a simple workaround, but we're not there yet.

On 3.0 started from USB:

drwx--S--- 2 108 115 3 Jun 10 14:39 /lib/live/mount/rootfs/filesystem.squashfs/var/lib/tor/

On 3.0.1 started from DVD:

drwx--S--- 2 109 116 3 Jul  4 13:59 /lib/live/mount/rootfs/filesystem.squashfs/var/lib/tor/

So the UID/GID for the debian-tor user/group have changed (most likely due to #13427).

And inside 3.0.1.squashfs that's in the IUK, as expected:

drwx--S--- 2 109 116 3 Jul  4 13:59 tmp/var/lib/tor/

On a USB drive automatically upgraded from 3.0 to 3.0.1:

drwx--S--- 2 109 116 3 Jul  4 13:59 /var/lib/tor/
drwx--S--- 2 109 116 3 Jul  4 13:59 /lib/live/mount/rootfs/3.0.1.squashfs/var/lib/tor/
drwx--S--- 2 108 115 3 Jun 10 14:39 /lib/live/mount/rootfs/filesystem.squashfs/var/lib/tor/

# grep debian-tor /etc/passwd
debian-tor:x:109:116::/var/lib/tor:/bin/false

… so far, so good.

But:

# sudo -u debian-tor ls -l /var/lib/tor
ls: cannot open directory '/var/lib/tor': Permission denied

# sudo -u '#109' ls -l /var/lib/tor
ls: cannot open directory '/var/lib/tor': Permission denied

# sudo -u '#108' ls -l /var/lib/tor
ls: cannot open directory '/var/lib/tor': Permission denied

… while interestingly:

# sudo -u debian-tor ls -l /lib/live/mount/rootfs/3.0.1.squashfs/var/lib/tor/
total 0

… and as expected:

# sudo -u debian-tor ls -l /lib/live/mount/rootfs/filesystem.squashfs/var/lib/tor/
ls: cannot open directory '/lib/live/mount/rootfs/filesystem.squashfs/var/lib/tor/': Permission denied

FWIW, manually running chown debian-tor:debian-tor /var/lib/tor doesn't change that.

Initially I suspected the changes we did in the SquashFS delta generation for #11974, but 1. these changes should affect only hardlinked files; 2. they went in 3.0~rc1 and nobody complained about this while we provided an incremental upgrade path from 3.0~rc1 to 3.0. OTOH perhaps that IUK simply did not do the kind of things that trigger this bug.

So it very much looks like we've hit a bug in aufs here, where UID/GID overriden by a top-most branch are taken into account when looking at files' metadata, but not when actually accessing these files. Wow!

#11 Updated by conorsch 5 months ago

Thanks for sharing the analysis, intrigeri! Confirmed that upgrading the tor package under Tails 3.0 does not cause similar breakage. I had to manually bounce the tor service after the apt upgrade, but thereafter tor continued to work well. So the breakage does seem to have come in through the Tails 3.0.1 upgrade specifically, as your comment seems to indicate.

#12 Updated by intrigeri 5 months ago

FWIW sudo -u debian-tor strace ls /var/lib/tor says: open("/var/lib/tor", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 EACCES (Permission denied).

#13 Updated by intrigeri 5 months ago

And sudo -u debian-tor ls -l /lib/live/mount/overlay/var/lib/tor/ works just fine. So the debian-tor user has access to all relevant components of our aufs stack (read-write tmpfs > read-only 3.0.1.squashfs > read-only filesystem.squashfs), just not to the lowest component (filesystem.squashfs), but it doesn't have access to the aufs "filesystem" resulting from that stack. "Interesting."

#14 Updated by intrigeri 5 months ago

At this point I'm lost, and it seems the next steps for investigating this problem are low-level, way outside of my comfort/skill-zone :(

So I'm open to all suggestions wrt. what I should try next. If anyone here knows people who are knowledgeable in low-level stuff, kernel debugging and this kind of stuff, please ask them to take a look. In 3.0.1 we ship linux-image-4.9.0-3-amd64 (4.9.30-2+deb9u2) and aufs-dkms (4.9+20161219-1), both from Stretch (+ security).

#15 Updated by intrigeri 5 months ago

Disabled the incremental upgrade path from 3.0 to 3.0.1, so that nobody who's not been affected yet will be (this time / until we fix the root cause or do #8415).

#16 Updated by intrigeri 5 months ago

  • Status changed from Confirmed to In Progress

#17 Updated by conorsch 5 months ago

Blocking the upgrade path is a solid plan, thanks for doing that. For folks who already updated, is there a manual workaround for resolution? Seems like no, and the surest path forward for users who did upgrade to 3.0.1 is to recreate a 3.0 Tails stick and migrate.

#18 Updated by msheiny 5 months ago

conorsch wrote:

Blocking the upgrade path is a solid plan, thanks for doing that. For folks who already updated, is there a manual workaround for resolution? Seems like no, and the surest path forward for users who did upgrade to 3.0.1 is to recreate a 3.0 Tails stick and migrate.

Looks like running the tails installer from a 3.0.1 stick to upgrade a 3.0 stick works. I just did this on a stick that broke tor during the online upgrade path (3.0->3.0.1) and things look like they are working a-okay.

#19 Updated by msheiny 5 months ago

msheiny wrote:

conorsch wrote:

Blocking the upgrade path is a solid plan, thanks for doing that. For folks who already updated, is there a manual workaround for resolution? Seems like no, and the surest path forward for users who did upgrade to 3.0.1 is to recreate a 3.0 Tails stick and migrate.

Looks like running the tails installer from a 3.0.1 stick to upgrade a 3.0 stick works. I just did this on a stick that broke tor during the online upgrade path (3.0->3.0.1) and things look like they are working a-okay.

To clarify, I was able to fix my broken 3.0.1 install (that was the product of doing an online upgrade from 3.0 --> 3.0.1) by doing the install from clone on a new 3.0.1 ISO.

  • Boot from new 3.0.1 ISO
  • Insert broken 3.0.1 USB installation, and run tails installer through cloning
  • Remove ISO, boot back to "broken" 3.0.1 -- which now has tor repaired.

#20 Updated by franps 4 months ago

I have upgraded tails in a manual way and the problem I had related to automatically upgrading (tor didn't boot) has dissapeared.
Thanks

#21 Updated by intrigeri 4 months ago

  • Status changed from In Progress to Resolved
  • Assignee deleted (intrigeri)
  • % Done changed from 0 to 100

Two weeks later, I really don't see what I could do about this problem. The short-term problem was addressed by disabling this incremental upgrade and documenting how to repair a broken system. I took two actions for the longer term:

  • increase the chances that we detect such issues before releasing: eca3d1001236570cc6a26fd2a961710a0e151ca2
  • add #8415 to the list of candidates for our next roadmap update; the root cause of this bug is probably in aufs, and hopefully overlayfs would not have exposed it.

And I'm now closing ths ticket as it's not actionable.

I'm still open to suggestions if anyone wants to help me debug this problem.

One option would be to report a bug somewhere, but our current reproducer is pretty hard to set up so I guess one would have to share a disk image (USB stick with 3.0 + the broken 3.0.1 upgrade) and point developers to it; I suspect they won't like it much. So I think I'd rather spend time on #8415 instead.

#22 Updated by bertagaz 3 months ago

As a note somewhere, during 3.1 IUK tests, spriver noticed that the 3.0.1->3.1 upgrade path did not work. It was exposing the same uid mess preventing Tor to start correctly. So I've disabled this upgrade path.

Hopefully we won't get hit again, as long as the debian-tor uid/gid isn't changed between releases because of something like exim to be installed and deinstalled at some point of the build. That's maybe something to check with a regression test, depending on the advancement of #8415?

#23 Updated by intrigeri 3 months ago

Hopefully we won't get hit again, as long as the debian-tor uid/gid isn't changed between releases because of something like exim to be installed and deinstalled at some point of the build. That's maybe something to check with a regression test, depending on the advancement of #8415?

Please file a dedicated ticket if you think eca3d1001236570cc6a26fd2a961710a0e151ca2 is not enough.

Also available in: Atom PDF