Discussion:
[Duplicity-talk] Verification of filelists, multiprocessing
Andreas Doll
2016-01-27 19:53:41 UTC
Hello

If I run

$ duplicity --encrypt-sign-key 123ABC12 /path/to/dir file://backup1

and then

$ duplicity verify file://backup1 /path/to/dir

everything works fine.
However, my setup is as follows: I run

$ duplicity --encrypt-sign-key 123ABC12 --include-globbing-filelist list.txt / file://backup2

with list.txt consisting of directories as well as files:

/path/to/dir
+ /usr/another/dir
+ /etc/foo/file
+ /etc/bar/another/file
- **

Sorry if I don't grok the manpage, but how can I verify file://backup2 using
the filelist list.txt?
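What I would naively try, assuming the file selection options are also
honoured by the verify action (which is exactly what I cannot tell from the
manpage), is

$ duplicity verify --include-globbing-filelist list.txt file://backup2 /

but I don't know whether that is correct.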

Moreover: judging from the manpage, I assume it is not possible to use multiple
CPU cores to speed up de/encryption?

Thanks,
Andreas
Cláudio Gil
2016-01-27 21:27:21 UTC
Hi,

About parallelism: from what I could see, duplicity processes your files as
a stream, one long sequence of files. Encryption is also sequential by design
(which is why GnuPG, which duplicity uses, does not use threads).

Maybe parallelization could be introduced at a few select points in
duplicity. But the part that is mostly CPU-bound is the encryption, and
encrypting a file will use one core.

Since the entire volume is encrypted, and not the individual files in it, you
could test the difference between backing up 250MB (10 volumes, by default)
with encryption and backing up the same files without encryption and then
encrypting all the volumes with multiple GnuPG processes (using "parallel" or
something).

It's an interesting comparison. It's not obvious whether it would be faster,
or how changing the volume size affects the result.
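
Something like the following might be a starting point for that test
(untested sketch; the key id and the file:// targets are just placeholders
reused from this thread, GNU parallel with 4 jobs is assumed, and signing is
left out of the standalone gpg step for simplicity). It only compares
throughput; the separately encrypted volumes are not something duplicity
could restore from.

# 1) baseline: encrypted backup, timed
$ time duplicity --encrypt-sign-key 123ABC12 /path/to/dir file://enc-test

# 2) same data without encryption, then encrypt the finished volumes with
#    several gpg processes at once
$ time duplicity --no-encryption /path/to/dir file://plain-test
$ time parallel -j4 gpg --encrypt --recipient 123ABC12 {} ::: plain-test/*.difftar.gz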

Cheers,
Cláudio
Post by Andreas Doll
Hello
If I run
$ duplicity --encrypt-sign-key 123ABC12 /path/to/dir file://backup1
and then
$ duplicity verify file://backup1 /path/to/dir
everything works fine.
However, my setup is as follows: I run
$ duplicity --encrypt-sign-key 123ABC12 --include-globbing-filelist
list.txt / file://backup2
/path/to/dir
+ /usr/another/dir
+ /etc/foo/file
+ /etc/bar/another/file
- **
Sorry if I don't grok the manpage, but how can I verify file://backup2 using
the filelist list.txt?
Moreover: judging from the manpage, I assume it is not possible to use multiple
CPU cores to speed up de/encryption?
Thanks,
Andreas
_______________________________________________
Duplicity-talk mailing list
https://lists.nongnu.org/mailman/listinfo/duplicity-talk
Nate Eldredge
2016-01-27 21:39:53 UTC
Post by Cláudio Gil
Hi,
About parallelism: from what I could see, duplicity processes your files as
a stream, one long sequence of files. Encryption is also sequential by design
(which is why GnuPG, which duplicity uses, does not use threads).
Maybe parallelization could be introduced at a few select points in
duplicity. But the part that is mostly CPU-bound is the encryption, and
encrypting a file will use one core.
Since the entire volume is encrypted, and not the individual files in it, you
could test the difference between backing up 250MB (10 volumes, by default)
with encryption and backing up the same files without encryption and then
encrypting all the volumes with multiple GnuPG processes (using "parallel" or
something).
It's an interesting comparison. It's not obvious whether it would be faster,
or how changing the volume size affects the result.
I guess one could write a wrapper around gpg that slurps up the input data
and exits, keeping gpg itself running on that data in the background, with
some sort of simple IPC to coordinate a maximum number of gpg processes.
This could be a useful thing to have in general. And then one wouldn't
need to modify duplicity at all.

Maybe a fun project in someone's "copious free time"...
--
Nate Eldredge
***@thatsmathematics.com
Cláudio Gil
2016-01-28 00:17:10 UTC
Post by Nate Eldredge
Post by Cláudio Gil
Hi,
About parallelism: from what I could see, duplicity processes your files as
a stream, one long sequence of files. Encryption is also sequential by design
(which is why GnuPG, which duplicity uses, does not use threads).
Maybe parallelization could be introduced at a few select points in
duplicity. But the part that is mostly CPU-bound is the encryption, and
encrypting a file will use one core.
Since the entire volume is encrypted, and not the individual files in it, you
could test the difference between backing up 250MB (10 volumes, by default)
with encryption and backing up the same files without encryption and then
encrypting all the volumes with multiple GnuPG processes (using "parallel" or
something).
It's an interesting comparison. It's not obvious whether it would be faster,
or how changing the volume size affects the result.
I guess one could write a wrapper around gpg that slurps up the input
data and exits, keeping gpg itself running on that data in the background,
with some sort of simple IPC to coordinate a maximum number of gpg
processes. This could be a useful thing to have in general. And then one
wouldn't need to modify duplicity at all.
Maybe a fun project in someone's "copious free time"...
I don't think that would work. Duplicity does not do things in sequential,
independent steps. All the steps to create a volume are chained together
(like a pipe). This means the encryption starts with the first byte to be
backed up and ends when the volume is complete. If you created such a
script, I believe it would either fail, or it would work but duplicity would
not start a second GnuPG anyway.

To run multiple GnuPG processes at the same time, and since volumes must be
generated in sequence, you need to either 1) skip encryption until you have N
volumes and then start multiple GnuPG processes, or 2) buffer what you want
to send to GnuPG so that you can advance to the next volume. Both options
assume GnuPG is the bottleneck; option 1 requires more local disk space, and
option 2 requires more memory.

Cheers,
Cláudio
Andreas Doll
2016-01-27 22:27:21 UTC
Post by Cláudio Gil
Maybe parallelization could be introduced at a few select points in duplicity.
But the part that is mostly CPU-bound is the encryption, and encrypting a file
will use one core.
Yes, my naive idea was that it might be possible to de/encrypt files in
parallel. However, I'm not familiar with duplicity's codebase, and I'm not
sure if this is feasible.
Cláudio Gil
2016-01-28 00:35:48 UTC
Post by Andreas Doll
Post by Cláudio Gil
Maybe parallelization could be introduced at a few select points in duplicity.
But the part that is mostly CPU-bound is the encryption, and encrypting a file
will use one core.
Post by Andreas Doll
Yes, my naive idea was that it might be possible to de/encrypt files in
parallel. However, I'm not familiar with duplicity's codebase, and I'm not
sure if this is feasible.
Multiple volumes can be encrypted or decrypted in parallel, but regardless of
whether changing duplicity to do that would be easy or hard, I'm not even sure
it's worth it. There is a lot of I/O involved, for example, and duplicity
still needs to process the volumes in sequence.

Cheers,
Cláudio
e***@web.de
2016-01-28 10:45:42 UTC
Post by Cláudio Gil
Post by Andreas Doll
Post by Cláudio Gil
Maybe parallelization could be introduced at a few select points in
duplicity. But the part that is mostly CPU-bound is the encryption, and
encrypting a file will use one core.
Post by Andreas Doll
Yes, my naive idea was that it might be possible to de/encrypt files in
parallel. However, I'm not familiar with duplicity's codebase, and I'm not
sure if this is feasible.
Post by Cláudio Gil
Multiple volumes can be encrypted or decrypted in parallel, but regardless of
whether changing duplicity to do that would be easy or hard, I'm not even sure
it's worth it. There is a lot of I/O involved, for example, and duplicity
still needs to process the volumes in sequence.
The biggest bottleneck (in terms of speed) is currently probably the
sequential upload to the backend. If someone completely decoupled that from
backup creation, so that duplicity could keep generating volumes while
finished ones get uploaded independently in another thread, maybe even in
parallel when enough volumes are ready, that would make a dent, I guess.
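
As far as I know there is already a partial step in that direction: the
--asynchronous-upload option lets duplicity start preparing the next volume
while the previous one is still being uploaded (still only one upload at a
time, so not the full decoupling meant above). For example, reusing the
placeholder paths and key from earlier in the thread:

$ duplicity --encrypt-sign-key 123ABC12 --asynchronous-upload \
    --include-globbing-filelist list.txt / file://backup2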

ede/duply.net
