Sometimes the triggered backup is missing table directories #4

Open
hakimkartik opened this issue Feb 20, 2019 · 7 comments

Comments

@hakimkartik

We back up our Scylla cluster on a regular basis and push the backups to an S3 bucket.
Recently I have started to notice that one of the backups has only 2 snapshot table directories rather than 3.
Tables in the keyspace directory of the cluster:

[root@scylla-lst-visual-classification-4-1a] ls /data/scylla/lst_image_classification_details/
imageclassificationdetails-68947a00287011e98990000000000005  messageoriginaltext-6eeced60287011e98990000000000005  parenttochildmessagemap-73655620287011e98990000000000005
[root@scylla-lst-visual-classification-4-1a]

Directories uploaded to S3 by the tool on two different days (the backup from 20-02-2019 is missing the table named parenttochildmessagemap):

[root@build-master-2-1b] aws s3 ls prod-scylla-backup/lst-visual-classification-20190220/lst_image_classification_details/
                           PRE imageclassificationdetails-68947a00287011e98990000000000005/
                           PRE messageoriginaltext-6eeced60287011e98990000000000005/
[root@build-master-2-1b] aws s3 ls prod-scylla-backup/lst-visual-classification-20190219/lst_image_classification_details/
                           PRE imageclassificationdetails-68947a00287011e98990000000000005/
                           PRE messageoriginaltext-6eeced60287011e98990000000000005/
                           PRE parenttochildmessagemap-73655620287011e98990000000000005/
[root@build-master-2-1b]
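
The manual comparison of the two listings above could be automated so that a missing table directory is flagged right after each backup. The following is only a minimal sketch: it assumes the `aws s3 ls` output has already been captured as text, and the helper names `table_dirs` and `missing_tables` are hypothetical, not part of scyllabackup.

```python
def table_dirs(listing):
    """Collect directory names from `aws s3 ls` lines of the form 'PRE name/'."""
    dirs = set()
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "PRE":
            dirs.add(parts[1].rstrip("/"))
    return dirs


def missing_tables(expected_listing, actual_listing):
    """Return directories present in the expected listing but absent from the actual one."""
    return sorted(table_dirs(expected_listing) - table_dirs(actual_listing))


# Listings copied from the report above.
listing_20190219 = """\
                           PRE imageclassificationdetails-68947a00287011e98990000000000005/
                           PRE messageoriginaltext-6eeced60287011e98990000000000005/
                           PRE parenttochildmessagemap-73655620287011e98990000000000005/
"""
listing_20190220 = """\
                           PRE imageclassificationdetails-68947a00287011e98990000000000005/
                           PRE messageoriginaltext-6eeced60287011e98990000000000005/
"""

print(missing_tables(listing_20190219, listing_20190220))
# ['parenttochildmessagemap-73655620287011e98990000000000005']
```

The same check would work against the local keyspace directory listing instead of the previous day's backup, as long as the directory names are passed in the same 'PRE name/' form or `table_dirs` is adapted.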

This is the command we use to trigger the backup:

scyllabackup take -c /etc/scyllabackup.yml --prefix=lst-visual-classification-20190220

I tried running the take command with -l DEBUG but couldn't find any issues.
Can someone please suggest what might be causing this anomalous behaviour?

@perfectayush

perfectayush commented Feb 21, 2019

@hakimkartik It looks like you are changing the prefix every day when taking the backup. Scyllabackup is not designed for that. It is designed to take backups under the same prefix every day (or every time). With the same prefix, it can detect which files do not need to be re-uploaded. The metadata in the sqlite db might be affected in this case. I will have to check the impact of such a use case; there might be a corner case because of it.

I would also suggest using the verify command on a snapshot to figure out whether any files are missing from it.
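
To illustrate the point about prefixes, here is a rough sketch of the general idea behind incremental uploads and verification. It is not scyllabackup's actual code: the helper `not_in_remote` and the file names are made up for illustration, and the real tool keeps its bookkeeping in a sqlite db whose layout is not shown in this thread.

```python
def not_in_remote(files, remote_files):
    """Return the files that are absent from the remote prefix, sorted by name."""
    return sorted(set(files) - set(remote_files))


# Hypothetical sstable file names in today's snapshot.
snapshot = ["ka-1-Data.db", "ka-1-TOC.txt", "ka-2-Data.db"]

# Same prefix as yesterday: earlier files are already there.
same_prefix_remote = ["ka-1-Data.db", "ka-1-TOC.txt"]
# Fresh prefix: nothing has been uploaded under it yet.
new_prefix_remote = []

# Upload planning: only files missing under the prefix need to be sent.
print(not_in_remote(snapshot, same_prefix_remote))      # ['ka-2-Data.db']
print(len(not_in_remote(snapshot, new_prefix_remote)))  # 3

# Verification: every file recorded for the snapshot must exist remotely.
print(not_in_remote(snapshot, snapshot))                # []
```

Under this model, a new prefix every day means every file looks new, and any bookkeeping that assumes a file was uploaded earlier under a different prefix would show up as "doesn't exist" during verification.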

@rampreethethiraj

Hi @perfectayush, I tried taking the backup without changing the prefix.
But I still see the same error.
It fails in the verify step.

@perfectayush

@rampreethethiraj Can you post the error from scyllabackup in the verify step?

I ran the command on one of our servers and it printed this:
ayush@:~$ scyllabackup verify -c /etc/scyllabackup.yml --max-workers 10
[2019-02-25 12:52:40,552] INFO scyllabackup: Verifying snapshot
[2019-02-25 12:53:16,044] INFO scyllabackup: All files exist remotely

@rampreethethiraj

ERROR scyllabackup.snapshot: Remote file global-metadata-scylla-20190226/spr_global_metadata_db/sprmessagemetadata-2ad184309f8911e8843b000000000005/spr_global_metadata_db-sprmessagemetadata-ka-209792-TOC.txt doesn't exist\n[2019-02-26 07:23:44,826
This is just a sample, but I am seeing this for a lot of files.

@perfectayush

@rampreethethiraj Please take the backup with --log-level DEBUG and check if there are any errors. I can't help without any errors to work with.

@rampreethethiraj

@perfectayush We are already triggering it with -l debug, and there are no errors while taking the backup. The only place where it fails is the verification step, which says it cannot find some files.

@perfectayush

perfectayush commented Feb 27, 2019

@rampreethethiraj Ok. Can you tell me the exit code of the process when taking the backup? If there is an error, the process should exit with a non-zero exit code. Also check your system/kernel logs to make sure the process is not being killed due to OOM or something similar.
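
One possible way to capture the exit code is a small wrapper around the backup command. This is only a sketch: the helper `run_and_report` is hypothetical, and the demo runs a stand-in child process rather than scyllabackup itself. On POSIX systems `subprocess` reports a process killed by a signal as a negative return code, so -9 would mean SIGKILL, which is the signal the kernel OOM killer uses.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit_code, human-readable description)."""
    code = subprocess.run(cmd).returncode
    if code == 0:
        desc = "exited cleanly"
    elif code < 0:
        # Negative values mean the process was terminated by a signal (POSIX).
        desc = "killed by signal %d" % -code
    else:
        desc = "failed with exit code %d" % code
    return code, desc


# Stand-in child process that exits with status 3.
demo_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(run_and_report(demo_cmd))  # (3, 'failed with exit code 3')

# For the real backup, the command from the original report would be passed as a list:
# ["scyllabackup", "take", "-c", "/etc/scyllabackup.yml",
#  "--prefix=lst-visual-classification-20190220"]
```

If the description says the process was killed by signal 9, the kernel log is the next place to look for an OOM kill message.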

For more debugging I would suggest using some tracing: strace or perf trace, or maybe https://docs.python.org/2/library/trace.html.
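
As a rough illustration of the Python `trace` module mentioned above, the sketch below traces a single function call line by line. The function `upload_files` is a hypothetical stand-in, not scyllabackup code; the point is only to show how `trace.Trace` prints each executed line, which would help locate where files are being skipped.

```python
import sys
import trace


def upload_files(files):
    """Hypothetical stand-in for an upload loop; skips empty names."""
    uploaded = []
    for name in files:
        if not name:
            continue
        uploaded.append(name)
    return uploaded


# trace=True prints each line as it runs; count=False disables line counting.
# ignoredirs keeps standard-library frames out of the output.
tracer = trace.Trace(count=False, trace=True,
                     ignoredirs=[sys.prefix, sys.exec_prefix])
result = tracer.runfunc(upload_files, ["a-Data.db", "", "a-TOC.txt"])
print(result)  # ['a-Data.db', 'a-TOC.txt']
```

The module also has a command-line form, `python -m trace --trace script.py`. Whether that can be pointed directly at the installed scyllabackup entry point depends on how it is installed, so treat that as an assumption to check.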

Since I can't replicate the issue at my end, I can't help or fix the behaviour without knowing where the code is breaking.
