Sometimes the triggered backup is missing table directories #4

hakimkartik · 2019-02-20T04:44:31Z

We back up our Scylla cluster on a regular basis and push the backups to an S3 bucket.
Recently I noticed that one of the backups contains only 2 snapshot table directories instead of 3.
Tables in the keyspace directory on the cluster:

[root@scylla-lst-visual-classification-4-1a] ls /data/scylla/lst_image_classification_details/
imageclassificationdetails-68947a00287011e98990000000000005  messageoriginaltext-6eeced60287011e98990000000000005  parenttochildmessagemap-73655620287011e98990000000000005
[root@scylla-lst-visual-classification-4-1a]

Directories uploaded to S3 by the tool on two different days (the backup from 20-02-2019 is missing the table named parenttochildmessagemap):

[root@build-master-2-1b] aws s3 ls prod-scylla-backup/lst-visual-classification-20190220/lst_image_classification_details/
                           PRE imageclassificationdetails-68947a00287011e98990000000000005/
                           PRE messageoriginaltext-6eeced60287011e98990000000000005/
[root@build-master-2-1b] aws s3 ls prod-scylla-backup/lst-visual-classification-20190219/lst_image_classification_details/
                           PRE imageclassificationdetails-68947a00287011e98990000000000005/
                           PRE messageoriginaltext-6eeced60287011e98990000000000005/
                           PRE parenttochildmessagemap-73655620287011e98990000000000005/
[root@build-master-2-1b]
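
A comparison like the one above can be scripted so that a missing table directory is caught right after each backup. This is a minimal sketch, not part of scyllabackup: it assumes the `aws s3 ls` output has been captured as text, and the function names `parse_s3_prefixes` and `missing_tables` are hypothetical.

```python
def parse_s3_prefixes(listing):
    """Return directory names from `aws s3 ls` lines of the form 'PRE name/'."""
    prefixes = set()
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "PRE":
            prefixes.add(parts[1].rstrip("/"))
    return prefixes


def missing_tables(local_dirs, listing):
    """Return local table directories that do not appear in the S3 listing."""
    return sorted(set(local_dirs) - parse_s3_prefixes(listing))


# Directory names taken from the listings in this report.
LOCAL_DIRS = [
    "imageclassificationdetails-68947a00287011e98990000000000005",
    "messageoriginaltext-6eeced60287011e98990000000000005",
    "parenttochildmessagemap-73655620287011e98990000000000005",
]
LISTING_20190220 = """\
                           PRE imageclassificationdetails-68947a00287011e98990000000000005/
                           PRE messageoriginaltext-6eeced60287011e98990000000000005/
"""

print(missing_tables(LOCAL_DIRS, LISTING_20190220))
```

For the 20-02-2019 listing this reports the parenttochildmessagemap directory as missing.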

This is the command we use to trigger the backup:

scyllabackup take -c /etc/scyllabackup.yml --prefix=lst-visual-classification-20190220

I tried running the take command with -l DEBUG but couldn't find any issues.
Can someone please suggest what might be causing this anomalous behaviour?


perfectayush · 2019-02-21T18:33:08Z

@hakimkartik It looks like you are changing the prefix every day when taking a backup. Scyllabackup is not designed for that. It is designed to take backups under the same prefix every day (or every time). With the same prefix, it can detect which files do not need to be re-uploaded. The metadata in the sqlite db might be affected in such a case. I will have to check the impact of this use case; there might be a corner case because of it.
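
To illustrate why the prefix matters, here is a rough sketch of the general idea of skipping files that already exist under a prefix. It is an assumption-laden illustration, not scyllabackup's actual code: the function `files_to_upload` and the file names are hypothetical.

```python
def files_to_upload(local_files, remote_files):
    """Return the local files that are not yet present under the remote prefix."""
    return sorted(set(local_files) - set(remote_files))


# Hypothetical SSTable file names, for illustration only.
local = ["ks-table-ka-1-Data.db", "ks-table-ka-2-Data.db"]

# Same prefix as the previous run: the file uploaded earlier is skipped.
print(files_to_upload(local, ["ks-table-ka-1-Data.db"]))

# Fresh prefix: nothing exists remotely, so every file is uploaded again.
print(files_to_upload(local, []))
```

With a new prefix every day, the remote side always starts empty, so nothing can be skipped.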

I would also suggest using the verify command on a snapshot to find out whether any files are missing from it.

rampreethethiraj · 2019-02-25T11:37:02Z

Hi @perfectayush, I tried taking a backup without changing the prefix.
But I still see the same error.
It fails in the verify step.

perfectayush · 2019-02-25T12:55:04Z

@rampreethethiraj Can you post the error from scyllabackup in the verify step?

I ran the command on one of our servers and it printed this:
ayush@:~$ scyllabackup verify -c /etc/scyllabackup.yml --max-workers 10
[2019-02-25 12:52:40,552] INFO scyllabackup: Verifying snapshot
[2019-02-25 12:53:16,044] INFO scyllabackup: All files exist remotely

rampreethethiraj · 2019-02-26T08:11:32Z

ERROR scyllabackup.snapshot: Remote file global-metadata-scylla-20190226/spr_global_metadata_db/sprmessagemetadata-2ad184309f8911e8843b000000000005/spr_global_metadata_db-sprmessagemetadata-ka-209792-TOC.txt doesn't exist\n[2019-02-26 07:23:44,826
This is just a sample, but I am seeing this for a lot of files.
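
When there are many such errors, grouping them by table directory can show whether whole tables or only individual files are missing. The sketch below is not part of scyllabackup. It assumes the verify output has been saved as text, that every error follows the "Remote file ... doesn't exist" wording shown above, and that paths are laid out as prefix/keyspace/table-dir/file. The name `missing_by_table` is hypothetical.

```python
import re
from collections import Counter

# Matches the error wording from the verify output quoted above.
MISSING_RE = re.compile(r"Remote file (\S+) doesn't exist")


def missing_by_table(log_text):
    """Count missing remote files per 'keyspace/table-dir'."""
    counts = Counter()
    for match in MISSING_RE.finditer(log_text):
        path = match.group(1)
        parts = path.split("/")
        # Assumed layout: <prefix>/<keyspace>/<table-dir>/<file>
        key = "/".join(parts[1:3]) if len(parts) >= 4 else path
        counts[key] += 1
    return counts


SAMPLE = (
    "ERROR scyllabackup.snapshot: Remote file "
    "global-metadata-scylla-20190226/spr_global_metadata_db/"
    "sprmessagemetadata-2ad184309f8911e8843b000000000005/"
    "spr_global_metadata_db-sprmessagemetadata-ka-209792-TOC.txt doesn't exist"
)

for table, count in missing_by_table(SAMPLE).most_common():
    print(count, table)
```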

perfectayush · 2019-02-26T08:36:51Z

@rampreethethiraj Please take the backup with --log-level DEBUG and check if there are any errors. I can't help without any errors to work with.

rampreethethiraj · 2019-02-26T09:28:12Z

@perfectayush We are already running with -l debug, and there are no errors while taking the backup. The only place it fails is the verification step, which says it cannot find some files.

perfectayush · 2019-02-27T18:18:07Z

@rampreethethiraj OK. Can you report the exit code of the process when taking the backup? If there is an error, the process should exit with a non-zero exit code. Also check your system/kernel logs to make sure the process is not being killed due to OOM or something similar.
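
One way to capture the exit code is a small wrapper around the backup command. This is only a sketch with hypothetical helper names (`describe_exit`, `run_and_report`). It relies on the POSIX behaviour of Python's `subprocess`, where a negative return code means the child was terminated by a signal. A SIGKILL would be consistent with the OOM killer, but does not prove it.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "exited cleanly (0)"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal {}".format(-returncode)
        return "killed by {}".format(name)
    return "exited with error code {}".format(returncode)


def run_and_report(cmd):
    """Run cmd (a list of arguments) and return (returncode, description)."""
    completed = subprocess.run(cmd)
    return completed.returncode, describe_exit(completed.returncode)
```

For example, `run_and_report(["scyllabackup", "take", "-c", "/etc/scyllabackup.yml", "--prefix=lst-visual-classification-20190220"])` would run the take command from the original report and describe how it ended.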

For more debugging, I would suggest using some tracing: strace or perf trace, or maybe https://docs.python.org/2/library/trace.html.

Since I can't replicate the issue at my end, I can't help or fix the behaviour without knowing where the code is breaking.
