
Error during git put tasks leads to data loss (eg tfstate changes) #232

Closed
gberche-orange opened this issue Jan 16, 2019 · 1 comment

Comments

@gberche-orange
Member

Expected behavior

When a job mutates data (such as the tfstate resulting from a terraform apply), the resulting git commit should not be lost because of transient git unavailability.

  • A non-zero retry count should be configured through Concourse's attempts step modifier

  • Possibly retry the job forever, so as to enable operators to:

    • fix the root cause (the job would then resume)
    • fly in to retrieve the git commits as a workaround
    • explicitly cancel the hung jobs (e.g. in case of pipeline deadlocks)
  • Consider displaying a git patch in the Concourse console output, so that operators can manually reapply it later as a workaround. This may leak credentials to operators, though.
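In Concourse itself, the retry part maps to the attempts step modifier on the put step. The same policy can also be expressed inside a task script; the sketch below is hypothetical (push_with_retry and git_push are illustrative names, not part of the git resource). It assumes attempts <= 0 stands for "retry forever", as suggested above, and uses exponential backoff capped at max_delay so a long outage does not hammer the git server.

```python
import subprocess
import time
from typing import Callable


def push_with_retry(push: Callable[[], bool], attempts: int = 5,
                    base_delay: float = 1.0, max_delay: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call push() until it returns True and return the 1-based attempt that succeeded.

    attempts <= 0 means "retry forever" (operators fix the root cause or
    cancel the job). Raises RuntimeError once a positive attempts budget is spent.
    """
    attempt = 0
    delay = base_delay
    while True:
        attempt += 1
        if push():
            return attempt
        if attempts > 0 and attempt >= attempts:
            raise RuntimeError(f"git push failed after {attempt} attempts")
        sleep(delay)
        # Exponential backoff, capped so retries keep happening at a steady pace.
        delay = min(delay * 2, max_delay)


def git_push(branch: str = "master") -> bool:
    """Hypothetical push step: rebase onto the remote branch, then push HEAD."""
    pull = subprocess.run(["git", "pull", "--rebase", "origin", branch])
    if pull.returncode != 0:
        # Undo a half-finished rebase; this just errors harmlessly if none is in progress.
        subprocess.run(["git", "rebase", "--abort"])
        return False
    push = subprocess.run(["git", "push", "origin", f"HEAD:{branch}"])
    return push.returncode == 0
```

Usage inside a task would be push_with_retry(git_push, attempts=0). Injecting push and sleep keeps the retry policy testable without a real git server.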

Observed behavior

The put task fails immediately:

remote: error: cannot lock ref 'refs/heads/master': is at 741eed27503195c717bd8925140684050f5202d2 but expected a9157d66de44c3ae0d4fa0dbc91abc18cfebd8d8
To https://elpaaso-gitlab.my.domain.com/fe-group/secrets.git
 ! [remote rejected] HEAD -> master (failed to update ref)
error: failed to push some refs to 'https://redacted_user:redacted_password@elpaaso-gitlab.my.domain.com/fe-group/secrets.git'
failed with non-rebase error

See related #231

A manual workaround through fly hijack into the build was not possible (around 30 minutes later).
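In the log, the remote rejects the push with "cannot lock ref ... is at X but expected Y", which git reports when the remote branch moved between the ref advertisement and the update, and the resource then appears to give up because this is a "non-rebase error". A hypothetical classifier that treats such ref-lock failures as retryable (after a rebase) rather than fatal could look like this. The function name and the pattern lists are assumptions drawn from common git messages, not from the git resource's code.

```python
# Hypothetical patterns; substrings are matched case-insensitively against git's stderr.
REBASE_PATTERNS = ("non-fast-forward", "fetch first")
TRANSIENT_PATTERNS = (
    "cannot lock ref",          # remote ref changed concurrently (as in this issue)
    "failed to update ref",
    "could not resolve host",
    "connection timed out",
    "the requested url returned error: 5",  # HTTP 5xx from the git server
)


def classify_push_error(stderr: str) -> str:
    """Return "rebase", "transient" or "fatal" for a failed git push."""
    text = stderr.lower()
    if any(p in text for p in REBASE_PATTERNS):
        return "rebase"
    if any(p in text for p in TRANSIENT_PATTERNS):
        return "transient"
    return "fatal"
```

Under this sketch, both "rebase" and "transient" outcomes would feed a retry loop, and only "fatal" (e.g. authentication failures) would fail the build immediately.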

Affected release

Reproduced on version 3.2.2

@gberche-orange gberche-orange changed the title Error during git put tasks leads to data losss (tfstate change) Error during git put tasks leads to data losss (eg tfstate changes) Jan 18, 2019
@o-orand o-orand added discuss and removed discuss labels Jan 18, 2019
@gberche-orange
Member Author

Notes during backlog review:

Evaluated impact:

  • Only the terraform tfstate is a source of truth in git for data that can't be regenerated by retrying

Additional considered alternatives:

  • accept the limitation, and try to manually recover lost data (e.g. using terraform import)
  • move terraform data outside of git: S3, CredHub
  • on git put failure, store the data to alternate storage: S3, CredHub

=> displaying a git patch seems the best approach for now
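A minimal sketch of the retained approach, assuming the task has a local clone whose unpushed commits sit on top of an upstream ref (origin/master is an assumption): git format-patch --stdout emits those commits as a mailbox that an operator can later reapply with git am. The helper names are hypothetical. Redaction here only masks user:password pairs embedded in https URLs, such as the one in the log above; the tfstate content itself may still contain secrets, so the credential-leak concern raised in the issue remains.

```python
import re
import subprocess

# Matches the "user:password@" part of an http(s) URL.
_CREDENTIALS_IN_URL = re.compile(r"(https?://)[^/\s:@]+:[^/\s@]+@")


def redact_credentials(text: str) -> str:
    """Mask credentials embedded in URLs before printing to the build log."""
    return _CREDENTIALS_IN_URL.sub(r"\1***:***@", text)


def unpushed_patch(upstream: str = "origin/master") -> str:
    """Return the commits in upstream..HEAD as a patch series (mbox format)."""
    result = subprocess.run(
        ["git", "format-patch", "--stdout", f"{upstream}..HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def print_recovery_patch(upstream: str = "origin/master") -> None:
    """Hypothetical on-failure hook: dump unpushed commits to the console."""
    print("=== git push failed; unpushed commits follow (reapply with: git am) ===")
    print(redact_credentials(unpushed_patch(upstream)))
```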

@gberche-orange gberche-orange changed the title Error during git put tasks leads to data losss (eg tfstate changes) Error during git put tasks leads to data loss (eg tfstate changes) Jan 21, 2019
@o-orand o-orand moved this to Done in CF OPS Automation Aug 10, 2022
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

2 participants