Project

General

Profile

Feature #3543

Support bup as a backup method

Added by LeLutin almost 6 years ago. Updated over 2 years ago.

Status:
In Progress
Priority:
Low
Assignee:
-
Category:
new handler
Target version:
-
Start date:
10/17/2011
Due date:
% Done:

50%

QA Check:
Ready for QA

Description

Hello,

I've been working on a project that promises a lot: bup

I've created a very simplistic handler for making it possible to backup with using bup, either to a local or a remote (via ssh) bup repository:

https://github.com/lelutin/puppet-backupninja/tree/bup_helper

bup is still not at 1.0 yet, and it's still missing some important features (e.g. saving file metadata) and a couple of performance optimizations, but I thought creating an adapter to backupninja could help people to get to use and test bup (and so, possibly to report issues to the mailing list), hence pushing the project forward.

History

#1 Updated by intrigeri almost 6 years ago

  • Status changed from New to In Progress
  • Assignee set to LeLutin
  • % Done changed from 0 to 10

Looks good, most welcome!

Lacks documentation update, example config file, dealing with the bup output, and a dup.helper.

Could you please ping us again / reassign to me when the missing bits are implemented?

Besides, are you're using it for real? We've had problems recently with features we've pushed into stable releases, that had not been seriously tested by their authors, and happened to break unrelated stuff or just... not work. So I'm getting (even) more cautious.

#2 Updated by LeLutin almost 6 years ago

intrigeri wrote:

Lacks documentation update, example config file, dealing with the bup output, and a dup.helper.

Could you please ping us again / reassign to me when the missing bits are implemented?

heh, I thought that would be the case :) I wanted to see, first, if the work would be accepted and if I was going in the right direction.

I'll work on those bits and push them on top of the same branch, and I'll let you know when I think I have everything you mentioned.

If I understand it correctly, the bup.helper file is the script that manages the bup part of the interface for ninjahelper?

Besides, are you're using it for real? We've had problems recently with features we've pushed into stable releases, that had not been seriously tested by their authors, and happened to break unrelated stuff or just... not work. So I'm getting (even) more cautious.

that's a perfectly sane concern. to be honest, up to now I've tested the thing in a VM on my laptop and it seemed to do the job. but to avoid the "it works on my laptop" syndrome, I'll run more tests to make sure that everything is going according to plan. Is there an automated testing framework in use for the project?
Also, I'm thinking of starting to use it on my "home" infrastructure really soon, so that'll provide some real-life use, maybe not in a very wide range of situations, but that'll be a start.

I'll try and push the missing bits soon so that others can test it in parallel (and I'll also ask on the bup ML if there are some people interested in helping with the testing).

#3 Updated by intrigeri almost 6 years ago

LeLutin wrote:

I wanted to see, first, if the work would be accepted and if I was going in the right direction.

As far as I am concerned, it would be accepted, and you are heading in the right direction.

If I understand it correctly, the bup.helper file is the script that manages the bup part of the interface for ninjahelper?

Exactly. I suggest you start from rdiff.helper or dup.helper that should have similar options.

Is there an automated testing framework in use for the project?

There is none. Would be most welcome.

Also, I'm thinking of starting to use it on my "home" infrastructure really soon, so that'll provide some real-life use, maybe not in a very wide range of situations, but that'll be a start.

Having it run every day on systems that do something you care about is a pretty good start, much nicer to my ear than the "works twice in a VM on my laptop" effect you are very well aware of.

Please go on with the great work!

#4 Updated by LeLutin almost 6 years ago

WOW! I just saw the stupid mistake I had made: I confused "puppet-backupninja" with "backupninja" ... the repository posted above is wrong. I deleted the above branch from puppet-backupninja (which made no sense ... how was I able to push this from my backupninja repository? ..) and pushed the right thing this time in the right repository. sorry for the bad mixup.

the new branch to check out would be (could you please correct the issue's description to cross out the link and mention that the good one would be down here?):

https://github.com/lelutin/backupninja/tree/bup_helper

there's currently nothing new there since my last message (except that it's placed in the appropriate repository). but I have the .helper file under work and also some modifications for documentation in my worktree.

#5 Updated by LeLutin almost 6 years ago

I was thinking about this: since bup is not stable yet (e.g. not 1.0 yet, and in need of lots of more work), should I put an "(EXPERIMENTAL)" notice on bup's description in the .helper ?

#6 Updated by intrigeri almost 6 years ago

LeLutin wrote:

I was thinking about this: since bup is not stable yet (e.g. not 1.0 yet, and in need of lots of more work), should I put an "(EXPERIMENTAL)" notice on bup's description in the .helper ?

I don't like adding stuff that depends on the current state of bup inside backupninja source code that much: in the future, one may get an upgrade for bup, but not backupninja, and the warning would not make sense.

On the other hand, if future changes in bup are likely to break the backupninja handler and helper you've been working on, yes, this warning looks like a good idea to me. Unfortunately.

#7 Updated by micah almost 6 years ago

WOW! I just saw the stupid mistake I had made: I confused "puppet-backupninja" with "backupninja" ... the repository posted above is wrong. I deleted the above branch from puppet-backupninja (which made no sense ... how was I able to push this from my backupninja repository? ..) and pushed the right thing this time in the right repository. sorry for the bad mixup.

Weird, this sounds like a gitosis bug that you uncovered.

We have a 'backupninja' repository that (according to gitosis), you
don't have access to push to, same with a repository called
"module_backupninja" which is riseup's internal backupninja module that
we publish as part of the puppet modules effort, which you also aren't
configured to have access to. Then there is the 'shared-backupninja'
repository that you are configured to have access to.

I wonder if the - in the names confuses gitosis and is allowing you
access to things that you aren't supposed to have.

Well, another reason to switch away from gitosis to gitolite, which is
maintained...

micah

#8 Updated by LeLutin almost 6 years ago

micah wrote:

WOW! I just saw the stupid mistake I had made: I confused "puppet-backupninja" with "backupninja" ... the repository posted above is wrong. I deleted the above branch from puppet-backupninja (which made no sense ... how was I able to push this from my backupninja repository? ..) and pushed the right thing this time in the right repository. sorry for the bad mixup.

Weird, this sounds like a gitosis bug that you uncovered.

We have a 'backupninja' repository that (according to gitosis), you
don't have access to push to, same with a repository called
"module_backupninja" which is riseup's internal backupninja module that
we publish as part of the puppet modules effort, which you also aren't
configured to have access to. Then there is the 'shared-backupninja'
repository that you are configured to have access to.

oh.. actually the mixup happened between my two repos on github, not on riseup servers. so the bug would be either on git (using 1.7.7.1), or on github's git interface.

#9 Updated by intrigeri over 5 years ago

  • Priority changed from Normal to Low

#10 Updated by LeLutin over 5 years ago

Hello,

Just a little word to say that I've found a little time to work on this. It's not yet complete, though. I pushed my work so far to github, but it's still based on an earlier point in time (the same from which I branched off initially). I'll rebase on top of master when it's complete.

For now, I've added a bup.helper file, some documentation and integration with the Makefiles. I've also started using the helper and found out nnd fixed some bad bugs in the initial bup handler.

I still need to do more testing for the bup.helper file, which I should be able to do in the following days.

I also need to add version verification (although I don't really know how to do that, yet, since bup is still pre-1.0 and doesn't provide clear lists of version compatibility)

Finally I need to add verification of bup exit codes.

oh, and maybe exposing some progress info as debug messages would be cool, but I don't know if it'll be possible with the current code in bup. maybe I should check out how it's done with rdiff-backup.

I can't garantee when I'll have it all, but I just wanted to mention that the project was not dead and that it was progressing little by little.

#11 Updated by LeLutin about 5 years ago

Hi there,

Just pushed a new set of commits (I also rebased my changes on top of current master -- version 1.0.1):

LeLutin wrote:

I also need to add version verification (although I don't really know how to do that, yet, since bup is still pre-1.0 and doesn't provide clear lists of version compatibility)

I mostly copied rdiff-backup's version verification (and I hope that the functions won't be colliding in the function space cause I used the same name "get_version")

Finally I need to add verification of bup exit codes.

done

oh, and maybe exposing some progress info as debug messages would be cool, but I don't know if it'll be possible with the current code in bup. maybe I should check out how it's done with rdiff-backup.

seems like the rdiff-backup handler just doesn't do that. bash is quite limited in what it can do with commands' stderr/stdin :\

I've also added a feature for pull-style backups with the "bup on ..." command. I'll start using that for backing up my own web server this week, so I should be able to iron out the bumps.
Also, another new feature: a list of file/directory exclusions (which I should also be soon using).

I fixed some bugs in the bup handler that I caught while using it. I can't garantee that there aren't any bugs, still, but it's getting along well. I'd give it a little more time for testing, and the changeset should be ready for review then.

I also refactored some code out of the rdiff handler/helper pair and into an 'ssh' library, so this part will need some watchful eyes to make sure that I didn't break anything.

#12 Updated by LeLutin about 5 years ago

intrigeri wrote:

LeLutin wrote:

Is there an automated testing framework in use for the project?

There is none. Would be most welcome.

there's this one that's pretty simple to use :

https://github.com/apenwarr/wvtest

#13 Updated by LeLutin almost 4 years ago

wow this issue has been hanging for way too long.. but I finally managed to get the time to finish this!

I've squashed my work in a short patch series, based on top of current master:

https://github.com/lelutin/backupninja/tree/bup_support

the detailed iterations over the work are available in the "bup_helper" branch, if that can be of interest..

please review the "bup_support" branch for merging with master.

I've been running this handler for at least one year with both "remote" and "on" types (push- and pull-style backups that the handler supports) and it's working well.

#14 Updated by intrigeri almost 4 years ago

Hi,

wow this issue has been hanging for way too long.. but I finally managed to get the
time to finish this!

Congrats!

I've squashed my work in a short patch series, based on top of current master:

Thanks for taking the time to prepare a nice pull request, this is much appreciated :)

please review the "bup_support" branch for merging with master.

Here you go.

commit 859520

First, thanks a lot for doing this refactoring.

Instead of a if/then block that only calls return, I suggest using
simpler constructions like:

ssh_create_root_key ARGS || return

(Bonus: this would prevent me from nitpicking about using string
comparison for $? ;)

Please preserve lexical sorting in my_execbin_SCRIPTS and friends
(autotools files), and in the loading of shared functions
(backupninja.in, ninjahelper.in).

commit b1e3d190c

Wouldn't type = {push,pull} be clearer than {remote,on}? Both use
a remote server, right? I understand we might prefer to use bup's own
concepts here, but in general, backupninja is about abstracting
various tools' terminology behind a common language, when possible.
Your call :)

+function get_version() {

I would find the code clearer if this was called bup_version.

+ echo `$BUP --version`

Why not simply run the command?

+ debug "ssh $host -l $user 'bup --version'"

So the path in $BUP is only used locally, not on the remote host.
If one needs to be configurable, why doesn't the other?

+ if [ $? = 127 ]; then

Please use the integer comparison operator. (Über-nitpicking, sorry.)

+ if [ $? = 127 ]; then
+ fatal "Unable to execute bup on remote server. It probably isn't installed"
+ else
+ echo "$version"
+ fi

I'm a bit worried about bup having other possible failure error codes
than 127. Why not compare with 0 (optionally whitelisting some
non-fatal error codes if needed)?

The include/exclude handling does not support paths with spaces,
right? This is not a blocker for an initial merge, I guess, but
perhaps example.bup could mention this limitation.

+ if [ -n "$user" ]; then
+ HOST_SPEC="${user}$host"
+ fi
[...]
+ ssh $user
$host "mkdir -p $remote_bupdir && bup -d '$remote_bupdir' init"

If $user can be the empty string, then the ssh command can fail,
right? If it can't be the empty string, then perhaps the first test is
overkill. Or did I miss something?

+ fatal "An error occurred while initializing the remote repository."

As a user, I would not find this informative enough to be able to act
and solve the problem. It would be great to capture the stdout/stderr
and pass it through to the user.

if [ -n "$bupdir" ]; then
BUP_DIR="$bupdir"
+ export BUP_DIR
+ debug "BUP_DIR set to: $bupdir"
+fi

Any reason not to do this after the "CHECK CONFIG" sanity checks?
Code uselessly put before sanity checks calls for more code added
before sanity checks, which itself calls for more bugs introduced
later by some other contributor who might not be as comfortable with
the codebase as the initial author. I'm quoting this one as an
example, but it might be that more code could be moved after
the checks.

+ fatal "bup does not have the same version at the source and at the destination."

It would be useful to tell the user what version is at the source, and
what version is at the destination (and we've just computed this info,
so it's pretty cheap).

+ debug $output

Please quote $output, here and in various other places.

Ah, before I forget: generally, we use full caps for variables that
are global to a backupninja run, e.g. programs paths, so
$EXCLUDE_LIST and friends look weird here.

Whenever you rework this a bit, please don't rewrite history, but pile
fixes on top of your current nice branch, so that I don't have to
review it all again. (I know you like nice Git history, so just in
case, I'm mentioning it in passing ;) Thanks in advance!

commit 2443134b

+ formBegin "$bup_title - host system: includes"

As a potential user, it's unclear to me what "host system" means here.

+ while [ -z "$REPLY" -o -z "$bup_type" ]
+ do

Please use consistent code layout throughout the helper,
e.g. the same as in the for loops found above. (I know the codebase
isn't exactly perfect on this front, but that's no reason to introduce
even more discrepancies in new handlers :)

+ bup_type=$REPLY

Please quote $REPLY.

+ bup_directory=${tmp_array0}
+ bup_branch=${tmp_array1}
+ if [ "$bup_type" != "local" ]; then
+ bup_remote_bupdir=${tmp_array2}
+ fi

More quoting needed.

+ formBegin "$bup_title - Specify on which host and with which user bup should login to access the remote repository."

Maybe s/with/as/ ?

+ booleanBox "$bup_title" "This step will create an SSH key for
the local root user with no passphrase (if one does not
already exist), and attempt to copy root's public ssh key to
authorized_keys file of $bup_user@$bup_host. This will allow
the local root to make unattended backups to
$bup_user@$bup_host.\n\n\nAre you sure you want to continue?"

Any reason why we are displaying this at all for local backups?

# Backups with bup are saved on a branch (think git branch).
# # Usually each server/backup origin should have its own branch, but some people
+# # use multiple branches for a same server for inventive purposes.

I find this layout surprising, and uncommon among similar files in our
codebase, but well, why not.

I can't wait the next pull request :)

#15 Updated by intrigeri almost 4 years ago

  • Category changed from backupninja to new handler
  • % Done changed from 10 to 50
  • QA Check set to Dev Needed

#16 Updated by LeLutin almost 4 years ago

thanks for the thorough review :)

I'll fix the trivial parts and then prepare a new pull request once we've agreed on the below comments.

here are some replies to your comments

intrigeri wrote:

Wouldn't type = {push,pull} be clearer than {remote,on}? Both use
a remote server, right? I understand we might prefer to use bup's own
concepts here, but in general, backupninja is about abstracting
various tools' terminology behind a common language, when possible.
Your call :)

hmm here I'm really not sure yet.

while I perfectly understand your point, I'm sure bup will have other methods in the forseeable future that could conflict with those names.

for example, one that's brewing in a contributor's branch is "bup push" that would be used to send a copy of the already backed-up files to another computer while preparing pack files for only what's absent on the remote end. (like git push)

Also, out of lazyness -- but mostly because I don't use it so I wouldn't be testing the feature -- I didn't include support for the already implemented "dumb remote" mode that was put in place so that busybox wouldn't die because it's either leaking memory or handling things very badly (a user on the list was backing up to a home router with storage attached to it and we could identify the issue to busybox itself)

.. so while {push,pull} do sound a lot simpler and more intuitive than the weird bup subcommand names (or argument names..) I'm wary of getting all mixed up in the terminology later.

+function get_version() {

I would find the code clearer if this was called bup_version.

oops, and actually there's already another function with that name: in the rdiff handler. that function was named like that because I started out by copying the rdiff handler and modifying it to make the bup-specific one.

maybe we should name that function "rdiff_version" in the rdiff handler?

+ echo `$BUP --version`

Why not simply run the command?

humm... very good question indeed.. that too is another artifact left from the rdiff handler.

I'll change that to simply calling the command since it outputs to stdout.

+ debug "ssh $host -l $user 'bup --version'"

So the path in $BUP is only used locally, not on the remote host.
If one needs to be configurable, why doesn't the other?

ah yes, I thought about this case. actually, I had started to implement configuration for the remote bup path but then hit a wall in the "bup on" command: it calls bup from $PATH and there's currently no way of specifying an absolute path to use :\

so for now instead of implementing half of the configuration, I backtracked and thought that the remote would be expected to have the executable in $PATH.

it's not ideal :\ I'll drop a word about this on the bup mailing list and see what we can do about it. until then, the best that the handler can achieve is a "best effort" that doesn't do the right thing for the pull (on) method.

+ if [ $? = 127 ]; then
+ fatal "Unable to execute bup on remote server. It probably isn't installed"
+ else
+ echo "$version"
+ fi

I'm a bit worried about bup having other possible failure error codes
than 127. Why not compare with 0 (optionally whitelisting some
non-fatal error codes if needed)?

oh, right that's bad. that 127 code comes from ssh not being able to exec the command, but I should catch errors from bup itself too..

The include/exclude handling does not support paths with spaces,
right? This is not a blocker for an initial merge, I guess, but
perhaps example.bup could mention this limitation.

ah, but this can be added easily. I'll rip off the code already in the rdiff handler ;)

Since there's no comment about this in the rdiff handler, I'm just wondering why it's replacing "__star__" by a real asterisk. Is this an rdiff-backup weirdness or is it something that has to do with backupninja?

+ if [ -n "$user" ]; then
+ HOST_SPEC="${user}$host"
+ fi
[...]
+ ssh $user
$host "mkdir -p $remote_bupdir && bup -d '$remote_bupdir' init"

If $user can be the empty string, then the ssh command can fail,
right? If it can't be the empty string, then perhaps the first test is
overkill. Or did I miss something?

gah, that's a good catch. I added that test so that I'd use $HOST_SPEC later on but I apparently completely forgot to use it.

actually now that I think about it, leaving out the user name means that our handler is influenced by external elements (e.g. the user's ~/.ssh/config file) and that might be very confusing. so I'll make both user and host configurations mandatory and remove that $HOST_SPEC business.

if [ -n "$bupdir" ]; then
BUP_DIR="$bupdir"
+ export BUP_DIR
+ debug "BUP_DIR set to: $bupdir"
+fi

Any reason not to do this after the "CHECK CONFIG" sanity checks?
Code uselessly put before sanity checks calls for more code added
before sanity checks, which itself calls for more bugs introduced
later by some other contributor who might not be as comfortable with
the codebase as the initial author. I'm quoting this one as an
example, but it might be that more code could be moved after
the checks.

hmm yes. this part specifically is needed for the version checking since we're invoking bup to check its version and it can uselessly bug the user about the default repository location. I'll add a comment to explain why that's placed there.

but in a more general fashion, I agree with you. I've moved most of the things that were done before the CHECK CONFIG section down to the EXECUTE section, and it's actually easier to read like that.

+ debug $output

Please quote $output, here and in various other places.

ah yes this.. iirc that's the other issue that I opened about losing new line separations in debug/warning/fatal messages. I'll quote and we'll see about fixing the newlines in the other issue.

+ while [ -z "$REPLY" -o -z "$bup_type" ]
+ do

Please use consistent code layout throughout the helper,
e.g. the same as in the for loops found above. (I know the codebase
isn't exactly perfect on this front, but that's no reason to introduce
even more discrepancies in new handlers :)

hum here I'll assume you mean the disposition of the "do" at the end of the line. am I missing something? (perhaps using [ test ] || [ test ] instead of -o ? )

+ booleanBox "$bup_title" "This step will create an SSH key for
the local root user with no passphrase (if one does not
already exist), and attempt to copy root's public ssh key to
authorized_keys file of $bup_user@$bup_host. This will allow
the local root to make unattended backups to
$bup_user@$bup_host.\n\n\nAre you sure you want to continue?"

Any reason why we are displaying this at all for local backups?

well, you don't need to connect to any server for local backups so it would be weird to force users to create an ssh key for root when configuring a local backup.

# Backups with bup are saved on a branch (think git branch).
# # Usually each server/backup origin should have its own branch, but some people
+# # use multiple branches for a same server for inventive purposes.

I find this layout surprising, and uncommon among similar files in our
codebase, but well, why not.

d'oh! this is an artifact from vim that I didn't see was there.. when using "gq" on a line-selection to wrap text to 80 columns, vim does some weird things when it tries to respect automatic context completion. It works well if you type, press enter then continue typing but barfs out nonsensical double comments when automatically creating new lines :(

anyway, TLDR; I'll remove that ;)

ok, I went through every one of your comments. now I need to test the new code to ensure that I didn't break anything obvious. once I feel I've tested enough I'll send the new commits on top of the same branch.

#17 Updated by intrigeri almost 4 years ago

Hi,

LeLutin wrote:

thanks for the thorough review :)

Well, I kinda regret we didn't do it this way in the good old times.
Surely the codebase would less of a mess if we had. We have to learn
from our mistakes, I guess :)

Below, I'm skipping the parts that are just like "oh, right"
acknowledgment from you, and the parts I full agree with what
you're proposing.

intrigeri wrote:

Wouldn't type = {push,pull} be clearer than {remote,on}? Both use
a remote server, right? I understand we might prefer to use bup's own
concepts here, but in general, backupninja is about abstracting
various tools' terminology behind a common language, when possible.
Your call :)

hmm here I'm really not sure yet.

while I perfectly understand your point, I'm sure bup will have other methods in the
forseeable future that could conflict with those names.
[...]
.. so while {push,pull} do sound a lot simpler and more intuitive than the weird bup
subcommand names (or argument names..) I'm wary of getting all mixed up in the
terminology later.

Fair enough. Let's keep {push,pull}, then.

+function get_version() {

I would find the code clearer if this was called bup_version.

oops, and actually there's already another function with that name: in the rdiff
handler. that function was named like that because I started out by copying the rdiff
handler and modifying it to make the bup-specific one.

maybe we should name that function "rdiff_version" in the rdiff handler?

Sure, please. While not directly related to the bup handler, I think
it makes sense to include this change on your branch.

The include/exclude handling does not support paths with spaces,
right? This is not a blocker for an initial merge, I guess, but
perhaps example.bup could mention this limitation.

ah, but this can be added easily. I'll rip off the code already in the rdiff handler ;)

Since there's no comment about this in the rdiff handler, I'm just wondering why it's
replacing "__star__" by a real asterisk. Is this an rdiff-backup weirdness or is it
something that has to do with backupninja?

IIRC, the __star__ trick is a backupninja-specific way to avoid
worrying too early about asterisk expansion.

+ while [ -z "$REPLY" -o -z "$bup_type" ]
+ do

Please use consistent code layout throughout the helper,
e.g. the same as in the for loops found above. (I know the codebase
isn't exactly perfect on this front, but that's no reason to introduce
even more discrepancies in new handlers :)

hum here I'll assume you mean the disposition of the "do" at the end of the line.

Yes.

am I missing something? (perhaps using [ test ] || [ test ] instead of -o ? )

While I personally prefer the [ test ] || [ test ] way, I don't
think it's used yet in the codebase, so let's keep -o.

ok, I went through every one of your comments. now I need to test the new code to
ensure that I didn't break anything obvious. once I feel I've tested enough I'll send
the new commits on top of the same branch.

Woo! :)

#18 Updated by LeLutin over 2 years ago

Hi there, I've forced pushed an updated "bup_support" branch to my github repos:

https://github.com/lelutin/backupninja/tree/bup_support

I've rebased on current master, then squashed in some fixes that you requested and also some bug fixes and feature additions that I made over my usage of the handler. I think I have all the features I would like to have in a first version of that thing. Also, I've been regularly using the handler with push and pull type backups for quite a while now.

If that squashing makes things too difficult for you to compare with the previous work, I still have access to the pre-squashed work (but rebased on current master). I could push this to my repos so that you could see what has changed more easily.

Also, my detailed work without any squashing is available on the branch "bup_helper" if this can be of any use.

Cheers

#19 Updated by intrigeri over 2 years ago

  • Assignee deleted (LeLutin)
  • QA Check changed from Dev Needed to Ready for QA

Also available in: Atom PDF