[Duplicity-talk] Moving old sets to cold storage (listing files to be deleted)
Ladislav Laska
2016-04-14 21:52:18 UTC
Hi!

I've been using duplicity for some time, but I'm running out of space in my primary
storage for off-site backup and would like to move old sets to cold storage.

I plan on keeping the last two full backups + incrementals (that is 1-2 months of
backups) in a few places, and moving the rest to a local drive just in case. I won't
miss them if the drive dies, and will probably never use them, but I don't want to
delete them.

For the remove-* commands, the man page states this:

"Note also that --force will be needed to delete the files instead of just
listing them."

Listing the files is what I want! Unfortunately, duplicity does not list the
files (I have tried 6.x and 7.x with the same result).

Is there any way to list all files that would be removed, so I can move them to
another place? And are those files enough so I can use all the duplicity
commands in the new location?

Thanks!
--
Best regards, Ladislav "Krakonoš" Láska http://www.krakonos.org/
e***@web.de
2016-04-15 09:40:31 UTC
Post by Ladislav Laska
Hi!
I'm using duplicity for some time, but I'm running out of space in my primary
storage for off-site backup and would like to move old sets to cold storage.
I plan on keeping last two full backups + incrementals (that is 1-2 months of
backups) on a few places, move the rest on a local drive just in case. I won't
miss them if the drive dies, and probably never use them, but don't want to
delete them.
"Note also that --force will be needed to delete the files instead of just
listing them."
Listing the files is what I want! Unfortunately, duplicity does not list the
files (I have tried 6.x and 7.x with the same result).
Is there any way to list all files that would be removed, so I can move them to
another place? And are those files enough so I can use all the duplicity
commands in the new location?
hmm, you are actually right. they are not listed, not sure if they ever were or this is a doc error. can you please file a ticket on launchpad for this?

anyway - sorting out backup chains on the backend is easy, simply have a look! the files have timespans in their filename.
a chain consists of a full followed by incrementals.
a backup consists of a sig, manifest and volumes if there was data to backup.

moving a complete chain somewhere else does the same as deleting it from the backend as far as duplicity is concerned.

..ede/duply.net
Ladislav Laska
2016-04-15 10:00:09 UTC
Post by e***@web.de
hmm, you are actually right. they are not listed, not sure if they ever were or this is a doc error. can you please file a ticket on launchpad for this?
Sure.
Post by e***@web.de
anyway - sorting out backup chains on the backend is easy, simply have a look! the files have timespans in their filename.
a chain consists of a full followed by incrementals.
a backup consists of a sig, manifest and volumes if there was data to backup.
Yeah, my set has 11k volumes and I was kind of hoping to find a reliable way to
do it.

I can easily move the full backup, but the incrementals always carry the date of the
backup they are based on. I want to avoid having to parse the dates in the file
names (I'm just too lazy).

For now, I'll try to use the filesystem date and hope it's accurate. Is there any
option to verify if the set is complete and no volume is missing, besides
restoring the whole set?
e***@web.de
2016-04-15 10:11:04 UTC
Post by Ladislav Laska
Post by e***@web.de
hmm, you are actually right. they are not listed, not sure if they ever were or this is a doc error. can you please file a ticket on launchpad for this?
Sure.
thanks
Post by Ladislav Laska
Post by e***@web.de
anyway - sorting out backup chains on the backend is easy, simply have a look! the files have timespans in their filename.
a chain consists of a full followed by incrementals.
a backup consists of a sig, manifest and volumes if there was data to backup.
Yeah, my set has 11k volumes and I was kind of hoping to find a reliable way to
do it.
I can easily move the full backup, but the incrementals always have dates of the
incrementals they are based upon. I want to avoid having to parse dates in the
files (I'm just too lazy).
how about just sorting the file list alphabetically, then selecting the first file with "full" in its name (memorizing its timestamp) and continuing to select until the next file with a different full timestamp, excluding that one.
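A minimal Python sketch of the same idea follows. It assumes the usual duplicity file names (`duplicity-full.<T>...`, `duplicity-full-signatures.<T>...`, `duplicity-inc.<A>.to.<B>...`, `duplicity-new-signatures.<A>.to.<B>...`) with timestamps like `20160414T215218Z`; the helpers `group_into_chains` and `files_to_archive` are hypothetical, not part of duplicity. Because a plain alphabetical sort lists every `duplicity-full...` name before any `duplicity-inc...` name, the sketch instead assigns each file to the newest full backup whose timestamp is not later than the file's first timestamp (for an incremental, the time it is based on).

```python
import bisect
import re
from collections import defaultdict

# Assumed duplicity naming scheme (T, A, B are timestamps like 20160414T215218Z):
#   duplicity-full.<T>.vol<N>.difftar.gpg, duplicity-full.<T>.manifest.gpg
#   duplicity-full-signatures.<T>.sigtar.gpg
#   duplicity-inc.<A>.to.<B>.vol<N>.difftar.gpg, ...manifest.gpg
#   duplicity-new-signatures.<A>.to.<B>.sigtar.gpg
# Timestamps in this format sort chronologically as plain strings.
TS = re.compile(r"\d{8}T\d{6}Z")
FULL_PREFIXES = ("duplicity-full.", "duplicity-full-signatures.")


def group_into_chains(filenames):
    """Map each full-backup timestamp to the sorted files of the chain it starts."""
    names = [n for n in filenames if n.startswith("duplicity-") and TS.search(n)]
    fulls = sorted({TS.search(n).group() for n in names if n.startswith(FULL_PREFIXES)})
    chains = defaultdict(list)
    for name in names:
        # First timestamp: the full's own time, or the base time of an incremental.
        first = TS.search(name).group()
        i = bisect.bisect_right(fulls, first) - 1
        if i >= 0:  # files older than every known full are ignored here
            chains[fulls[i]].append(name)
    return {full: sorted(files) for full, files in chains.items()}


def files_to_archive(filenames, keep_fulls=2):
    """Files belonging to every chain except the newest `keep_fulls` chains."""
    chains = group_into_chains(filenames)
    starts = sorted(chains)
    old = starts[:-keep_fulls] if keep_fulls > 0 else starts
    return [name for start in old for name in chains[start]]
```

Fed with the names from a listing of the backend directory (for example `os.listdir()` on a `file://` target), `files_to_archive(names, keep_fulls=2)` returns the files of all but the two newest chains, which is the list the original question was after.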
Post by Ladislav Laska
For now, I'll try to use filesystem date and hope it's accurate. Is there any
option to verify if the set is complete and no volume is missing, besides
restoring the whole set?
you could verify the chain, locally or wherever. that restores file by file and compares against a checksum. it obviously only checks files existing at that point in time, meaning files flagged as deleted before then are not checked.
but as a basic check that should suffice, as it would fail on corrupt or missing volumes/files.

e***@web.de
2016-04-15 10:13:52 UTC
Post by e***@web.de
anyway - sorting out backup chains on the backend is easy, simply have a look! the files have timespans in their filename.
Post by e***@web.de
a chain consists of a full followed by incrementals.
a backup consists of a sig, manifest and volumes if there was data to backup.
Yeah, my set has 11k volumes and I was kind of hoping to find a reliable way to
do it.
that's a lot.. did you consider bigger volume sizes at some point? if not, why not?

Ladislav Laska
2016-04-15 10:23:24 UTC
Post by e***@web.de
that's a lot.. did you consider bigger volume sizes at some point? if not, why not?
Yes, I have; new backups have larger volumes. But I didn't think of this when I
first migrated to duplicity.
Post by e***@web.de
..ede/duply.net
_______________________________________________
Duplicity-talk mailing list
https://lists.nongnu.org/mailman/listinfo/duplicity-talk
Ladislav Laska
2016-04-15 10:18:30 UTC
Post by e***@web.de
how about just sorting the file list alphabetically and then selecting the first file with full in the name (memorizing the timestamp of it) and keep selecting until the next file with a different full timestamp, excluding it.
Well, yeah, but 11k files. Although I might put something heavy on my insert key
in mc :-)
Post by e***@web.de
Post by Ladislav Laska
For now, I'll try to use filesystem date and hope it's accurate. Is there any
option to verify if the set is complete and no volume is missing, besides
restoring the whole set?
you could locally, or wherever verify the chain. that's a restore file by file and compare with a checksum. it obviously only checks files existing at that point of time of course, meaning files flagged as deleted before are not checked.
but as a basic check that should suffice, as it would throw up on corrupt or missing volumes/files.
So the 'verify' command? Seems a bit sad. It'll do, but is it too much to ask
for a way to just check if the backend storage is intact, possibly comparing
checksums of the individual volumes?
e***@web.de
2016-04-15 10:23:45 UTC
Post by Ladislav Laska
Post by e***@web.de
how about just sorting the file list alphabetically and then selecting the first file with full in the name (memorizing the timestamp of it) and keep selecting until the next file with a different full timestamp, excluding it.
Well, yeah, but 11k files. Although I might put something heavy on my insert key
in mc :-)
consider using shell magic, like piping ls into awk and having it exec mv, or so.
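If the backend is reachable as a local directory (a `file://` target, or a mounted remote), the selection can also be scripted without parsing dates in detail. This Python sketch assumes duplicity's `YYYYMMDDTHHMMSSZ` timestamps in the file names, backups taken one after another, and a `cutoff` equal to the timestamp of the oldest full backup you want to keep. Under those assumptions every file of an older chain ends before that full starts, so comparing the last timestamp in each name is enough. `select_older_than` and `move_to_cold_storage` are hypothetical helpers; the default `dry_run=True` only prints what would be moved.

```python
import re
import shutil
from pathlib import Path

# Assumption: duplicity timestamps look like 20160414T215218Z, so plain
# string comparison orders them chronologically.
TS = re.compile(r"\d{8}T\d{6}Z")


def select_older_than(names, cutoff):
    """Names of duplicity files whose newest embedded timestamp is before `cutoff`."""
    selected = []
    for name in names:
        stamps = TS.findall(name)
        # A full has one timestamp; an incremental "A.to.B" ends at its last one (B).
        if name.startswith("duplicity-") and stamps and stamps[-1] < cutoff:
            selected.append(name)
    return sorted(selected)


def move_to_cold_storage(src, dst, cutoff, dry_run=True):
    """List (dry run) or move the selected files from directory src to dst."""
    src, dst = Path(src), Path(dst)
    names = [p.name for p in src.iterdir() if p.is_file()]
    chosen = select_older_than(names, cutoff)
    if not dry_run:
        dst.mkdir(parents=True, exist_ok=True)
    for name in chosen:
        if dry_run:
            print(f"would move {name}")
        else:
            shutil.move(str(src / name), str(dst / name))
    return chosen
```

Running it once with `dry_run=True` and checking the printed list before repeating with `dry_run=False` gives the "list first, then act" behaviour that `--force` is documented to provide for the remove-* commands.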
Post by Ladislav Laska
Post by e***@web.de
Post by Ladislav Laska
For now, I'll try to use filesystem date and hope it's accurate. Is there any
option to verify if the set is complete and no volume is missing, besides
restoring the whole set?
you could locally, or wherever verify the chain. that's a restore file by file and compare with a checksum. it obviously only checks files existing at that point of time of course, meaning files flagged as deleted before are not checked.
but as a basic check that should suffice, as it would throw up on corrupt or missing volumes/files.
So the 'verify' command? Seems a bit sad. It'll do, but is it too much to ask
for a way to just check if the backend storage is intact, possibly comparing
checksums of the individual volumes?
there are checksums that are used during verify/restore. simple listing is done via signature files, volumes are not needed for that.

actually to verify you have to read the volumes one by one anyway which is pretty much what a restore/verify does ;)

Ladislav Laska
2016-04-15 10:35:31 UTC
Post by e***@web.de
Post by Ladislav Laska
Post by e***@web.de
how about just sorting the file list alphabetically and then selecting the first file with full in the name (memorizing the timestamp of it) and keep selecting until the next file with a different full timestamp, excluding it.
Well, yeah, but 11k files. Although I might put something heavy on my insert key
in mc :-)
consider using shell magic. like piping ls into awk doing exec mv or so.
Sure, I'll have to, but that's what I wanted to avoid; I expected it to be easy,
as it seems like a pretty common task.
Post by e***@web.de
Post by Ladislav Laska
Post by e***@web.de
Post by Ladislav Laska
For now, I'll try to use filesystem date and hope it's accurate. Is there any
option to verify if the set is complete and no volume is missing, besides
restoring the whole set?
you could locally, or wherever verify the chain. that's a restore file by file and compare with a checksum. it obviously only checks files existing at that point of time of course, meaning files flagged as deleted before are not checked.
but as a basic check that should suffice, as it would throw up on corrupt or missing volumes/files.
So the 'verify' command? Seems a bit sad. It'll do, but is it too much to ask
for a way to just check if the backend storage is intact, possibly comparing
checksums of the individual volumes?
there are checksums that are used during verify/restore. simple listing is done via signature files, volumes are not needed for that.
actually to verify you have to read the volumes one by one anyway which is pretty much what a restore/verify does ;)
Well, listing uses only signatures, as you said, and I think it won't even
notice if volumes are missing.

To check for missing or corrupt volumes, I really want to avoid restoring all the
files, and there are some obscurities:

Suppose I run duplicity verify with a time after the last incremental backup I
just moved. It would extract volumes and compare them with the local files. For example,
I'm not sure what happens if:

Volume xyz is missing, but all the files in the volume (which could easily be a
single big file) have since been deleted. There is no need to extract it, as those
files should not exist at the specified time.

The same applies if xyz is corrupted (incomplete after upload) but the file has
since changed completely and a newer copy exists in another archive...

I'm really not sure what happens in this case, and haven't found an answer, but
unless someone tells me otherwise, I'd have to run verify/restore once for every
incremental backup, and I'm not going to do that.

It would actually be great if I could do backend check and ensure that all the
files that may be needed in the future are on my backend and intact.
e***@web.de
2016-04-15 10:47:19 UTC
Post by Ladislav Laska
Post by e***@web.de
Post by Ladislav Laska
Post by e***@web.de
how about just sorting the file list alphabetically and then selecting the first file with full in the name (memorizing the timestamp of it) and keep selecting until the next file with a different full timestamp, excluding it.
Well, yeah, but 11k files. Although I might put something heavy on my insert key
in mc :-)
consider using shell magic. like piping ls into awk doing exec mv or so.
Sure, I'll have to, but that's what I wanted to avoid and expected it'll be
easily done, as it seems like a pretty common task.
feel free to offer a patch to duplicity to (re)enable what you need ;)
Post by Ladislav Laska
Post by e***@web.de
Post by Ladislav Laska
Post by e***@web.de
Post by Ladislav Laska
For now, I'll try to use filesystem date and hope it's accurate. Is there any
option to verify if the set is complete and no volume is missing, besides
restoring the whole set?
you could locally, or wherever verify the chain. that's a restore file by file and compare with a checksum. it obviously only checks files existing at that point of time of course, meaning files flagged as deleted before are not checked.
but as a basic check that should suffice, as it would throw up on corrupt or missing volumes/files.
So the 'verify' command? Seems a bit sad. It'll do, but is it too much to ask
for a way to just check if the backend storage is intact, possibly comparing
checksums of the individual volumes?
there are checksums that are used during verify/restore. simple listing is done via signature files, volumes are not needed for that.
actually to verify you have to read the volumes one by one anyway which is pretty much what a restore/verify does ;)
Well, listing uses only signatures, as you said, and I think it won't even
notice if volumes are missing.
it should.. on every run duplicity builds a list of existing chains. run in max. verbosity "-v9" and you'll see.
Post by Ladislav Laska
To verify missing or corrupt volumes, I really want to avoid restoring all the
Suppose I use duplicity verify with time after the last incremental backup I
just moved. It would extract volumes and check with other files. For example,
Volume xyz is missing, but all the files in the volume (can be easily a
single big file) have been deleted. There is no need to extract it, as it should
not exist at the specific time.
duplicity holds a checksum for every volume file (in the manifest i think), so if a volume is opened, it is checked against the checksum first and an error (or at least a warning) is printed if it doesn't match.
Post by Ladislav Laska
The same applies if xyz is corrupted (incomplete after upload), the file has
changed completely and has newer copy somewhere in other archive...
how.. file names are unique because of the time stamps.
Post by Ladislav Laska
I'm really not sure what happens in this case, and haven't found an answer, but
unless someone tells me otherwise, I'd have to run verify/restore once for every
incremental backup, and I'm not going to do that.
It would actually be great if I could do backend check and ensure that all the
files that may be needed in the future are on my backend and intact.
that's not really duplicity's purview. duplicity treats backends as dumb, to make integrating backends as easy as possible.

disadvantage of that approach of course is that there is no backend logic as such.

what is currently called verify is actually a comparison against the data source. a mere verify (checking only the backup files themselves) would test exactly what you want, but unfortunately the famous somebody hasn't contributed it so far.
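A backend-only check of that kind can be sketched outside duplicity. This Python sketch assumes you have a decrypted copy of a backup set's manifest (for example via `gpg --decrypt` on its `.manifest.gpg` file), that the manifest lists each volume as a `Volume N:` line followed by an indented `Hash SHA1 <hex>` line, and that this hash covers the volume file exactly as stored on the backend. Those layout details are assumptions, not documented guarantees, and `expected_hashes`, `sha1_of` and `check_set` are hypothetical names. It only reads the stored volumes; nothing is decrypted or restored.

```python
import hashlib
import re
from pathlib import Path

# Assumed manifest layout (plain text after decryption), one entry per volume:
#   Volume 1:
#       StartingPath   ...
#       EndingPath     ...
#       Hash SHA1 <40 hex digits>
VOLUME = re.compile(r"^\s*Volume (\d+):\s*$")
HASH = re.compile(r"^\s*Hash SHA1 ([0-9a-fA-F]{40})\s*$")


def expected_hashes(manifest_text):
    """Parse {volume number: lowercase SHA-1 hex} from a decrypted manifest."""
    hashes, current = {}, None
    for line in manifest_text.splitlines():
        m = VOLUME.match(line)
        if m:
            current = int(m.group(1))
            continue
        m = HASH.match(line)
        if m and current is not None:
            hashes[current] = m.group(1).lower()
    return hashes


def sha1_of(path, chunk_size=1 << 20):
    """SHA-1 of a file, read in chunks so large volumes don't fill memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def check_set(manifest_text, volume_path):
    """Return problems found; `volume_path(n)` must give the stored file of volume n."""
    problems = []
    for n, want in sorted(expected_hashes(manifest_text).items()):
        path = Path(volume_path(n))
        if not path.is_file():
            problems.append(f"volume {n}: missing ({path.name})")
        elif sha1_of(path) != want:
            problems.append(f"volume {n}: SHA1 mismatch ({path.name})")
    return problems
```

For example, `check_set(Path("manifest.txt").read_text(), lambda n: Path("/mnt/backup") / f"duplicity-full.20160101T000000Z.vol{n}.difftar.gpg")` returns an empty list when every listed volume is present and matches; the paths and timestamp there are placeholders.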

regards.. ede/duply.net
Ladislav Laska
2016-04-15 11:18:44 UTC
Post by e***@web.de
feel free to offer a patch to duplicity to (re)enable what you need ;)
It's already on my todo-list; I'll take a look during the weekend to see how
deep I would need to dig. I might post here for some pointers if I decide to do
something about it soon.
Post by e***@web.de
Post by Ladislav Laska
Well, listing uses only signatures, as you said, and I think it won't even
notice if volumes are missing.
it should.. on every run duplicity builds a list of existing chains. run in max. verbosity "-v9" and you'll see.
I just did a crude test: backed up a 100M directory, removed one difftar from
the backup.

It kind of noticed that the volume is gone in collection-status. Not in a way I'd expect though, it
just wrote:

Total number of contained volumes: 3

Instead of 4. I removed volume 2. One might think it could have noticed that
volume 2 is missing! The -v9 output does not say anything about volume 2, so I
guess it just looks for what's there, and doesn't check for what's not.

Is this a feature? I would consider it a bug.
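Since volume numbers are part of each file name, a gap like the one in this test can at least be spotted from a directory listing alone. The Python sketch below assumes volume files are named `<set>.vol<N>.difftar...` (for example `duplicity-full.<T>.vol2.difftar.gpg`); `missing_volumes` is a hypothetical helper. It cannot notice a missing last volume, because nothing in the names says how many volumes a set should have; for that you would still need the manifest.

```python
import re
from collections import defaultdict

# Assumed volume naming: <set>.vol<N>.difftar[.gpg|.gz], where <set> is
# "duplicity-full.<T>" or "duplicity-inc.<A>.to.<B>".
VOL = re.compile(r"^(duplicity-(?:full|inc)\..+?)\.vol(\d+)\.difftar")


def missing_volumes(filenames):
    """Return {set prefix: [missing volume numbers]} for gaps inside 1..highest."""
    seen = defaultdict(set)
    for name in filenames:
        m = VOL.match(name)
        if m:
            seen[m.group(1)].add(int(m.group(2)))
    gaps = {}
    for prefix, vols in sorted(seen.items()):
        # Only gaps below the highest volume seen are detectable from names.
        missing = sorted(set(range(1, max(vols) + 1)) - vols)
        if missing:
            gaps[prefix] = missing
    return gaps
```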

Moreover, list-current-files happily lists all the files, even though the
volume 2 is missing and the files are not accessible. This is what I expected,
as it's reasonable to have this cached...

So to run verify/restore seems to be the best option for checking.
Post by e***@web.de
Post by Ladislav Laska
To verify missing or corrupt volumes, I really want to avoid restoring all the
Suppose I use duplicity verify with time after the last incremental backup I
just moved. It would extract volumes and check with other files. For example,
Volume xyz is missing, but all the files in the volume (can be easily a
single big file) have been deleted. There is no need to extract it, as it should
not exist at the specific time.
duplicity holds a checksum for every volume file (in the manifest i think), so if a volume is opened, it is checked against the checksum first and an error (or at least a warning) is printed if it doesn't match.
Post by Ladislav Laska
The same applies if xyz is corrupted (incomplete after upload), the file has
changed completely and has newer copy somewhere in other archive...
how.. file names are unique because of the time stamps.
I don't have an answer, as I need to familiarize myself with how the files are
stored in duplicity archives:-)
Post by e***@web.de
Post by Ladislav Laska
I'm really not sure what happens in this case, and haven't found an answer, but
unless someone tells me otherwise, I'd have to run verify/restore once for every
incremental backup, and I'm not going to do that.
It would actually be great if I could do backend check and ensure that all the
files that may be needed in the future are on my backend and intact.
that's not really duplicity's purview. duplicity treats backends as dumb, to make integrating backends as easy as possible.
Sure, that's great, but it's also a reason why duplicity should be able to check
if the backend contains all expected data, isn't it? After all, the backend is
dumb and we can't expect it to do it for us...
Post by e***@web.de
disadvantage of that approach of course is that there is no backend logic as such.
what is currently called verify is actually a comparison against the data source. a mere verify (checking only the backup files themselves) would test exactly what you want, but unfortunately the famous somebody hasn't contributed it so far.
Yeah, and someone really should. I'll try to take a look, but unfortunately, I
have more pressing work to do for the next few months.
e***@web.de
2016-04-15 12:20:39 UTC
Post by Ladislav Laska
Post by e***@web.de
feel free to offer a patch to duplicity to (re)enable what you need ;)
It's already on my todo-list, I'll take a look during the weekend to see how
deep would I need to dig. I might post here for some pointers if I decide to do
something about it soon.
Post by e***@web.de
Post by Ladislav Laska
Well, listing uses only signatures, as you said, and I think it won't even
notice if volumes are missing.
it should.. on every run duplicity builds a list of existing chains. run in max. verbosity "-v9" and you'll see.
I just did a crude test: backed up a 100M directory, removed one difftar from
the backup.
It kind of noticed that the volume is gone in collection-status. Not in a way I'd expect though, it
Total number of contained volumes: 3
Instead of 4. I removed volume 2. One might think it could have noticed that
volume 2 is missing! The -v9 output does not say anything about volume 2, so I
guess it just looks for what's there, and doesn't check for what's not.
Is this a feature?
guess not.. volumes are numbered in ascending order
Post by Ladislav Laska
I would consider it a bug.
me too
Post by Ladislav Laska
Moreover, list-current-files happily lists all the files, even though the
volume 2 is missing and the files are not accessible. This is what I expected,
as it's reasonable to have this cached...
yeah.. that comes from using the signatures
Post by Ladislav Laska
Yeah, and someone really should. I'll try to take a look, but unfortunately, I
have more pressing work to do for the next few months.
haven't we all? ;).. have a nice weekend, ede/duply.net
Ladislav Laska
2016-04-15 12:25:08 UTC
Alright! I'll post some issues on launchpad, in case someone would be willing
to pick them up before I do :-)

Thanks for your inputs and have a nice weekend as well!