Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve async cancellation safety of future::Cache (v0.12.0-beta.X) #309

Merged
merged 8 commits into from
Aug 27, 2023

Conversation

tatsuya6502
Copy link
Member

@tatsuya6502 tatsuya6502 commented Aug 23, 2023

This PR improves async cancellation safety of cache write methods insert, get_with, invalidate and remove. It is for future::Cache in v0.12.0-beta.1.

The Problem

#294 (comment)

I was reviewing the codes and found that cache write operations (such as insert and get_with) may not record a write operation log if the caller (enclosing future) has been cancelled. This will leave a newly inserted cached entry not managed by the cache policies (LFU filter, LRU queue, etc.). When this happens, the entry will never be evicted by the policy, while it can be read by get, replaced by other insert, or removed by invalidate.

This issue already existed in very early version of moka. But v0.12.0 will increase the chance of this issue to happen as it now supports for calling user supplied async eviction listener.

A common solution for this would be write-ahead-log, which to record the write op log ahead of the actual insert/update operation. However, we will not be able to do this because it is impossible to know the exact operation we are going to perform (insert or update to the internal concurrent hash table (cht)). Our lock-free cht allows multiple threads to try to insert and remove the same key at the same time, without taking locks. We will know our insert attempt turned out to be an insert or update only after we succeeded.

You will be affected by the above if you use async_eviction_listener and do async cancellation. For example, some sever frameworks such as Axum do async cancellation for you; request handling will be cancelled when the client connection is dropped.

The Solution

This PR addresses the above problem by saving the interrupted tasks to a mpmc channel when async cancellation happens on the following methods:

  • Some of the cache write methods: insert, get_with, invalidate and remove

These interrupted tasks will be resumed/retried when one of the following methods is called:

  • The cache write methods listed above.
  • get method
  • run_pending_tasks method

The following interrupted tasks will be saved and resumed/retried:

  • If an eviction listener is configured, and an async cancellation happens while one of those cache write methods listed above is notifying the eviction listener for an entry replacement/removal, the incomplete notification Future should be saved.
  • An async cancellation happens while one of those cache write methods listed above tries to send a write op log (WriteOp enum) to an internal channel but the channel is full, the WriteOp should be saved.

A Limitation

future::Cache v0.12.0 or later does Immediate notification delivery mode, which guarantees to preserve the order of events on a key. However, if the eviction listener is interrupted by async cancellation and then resumed, the order may not be preserved.

Testing the Solution

Added the following unit tests to verify those interrupted tasks are resumed/retried:

  • cancel_future_while_calling_eviction_listener
  • cancel_future_while_scheduling_write_op

@tatsuya6502 tatsuya6502 self-assigned this Aug 23, 2023
@tatsuya6502 tatsuya6502 added the enhancement New feature or request label Aug 23, 2023
@tatsuya6502 tatsuya6502 added this to the v0.12.0 milestone Aug 23, 2023
@tatsuya6502 tatsuya6502 changed the title Improve async cancellation safety of future::Cache (v0.12.0-beta.X) Improve async cancellation safety of future::Cache (v0.12.0-beta.X) Aug 23, 2023
Update TODO comments.
Make it possible to resume an eviction notification in write operation after
cancelling.
Reschedule write op after async cancel.
Refactoring on the `do_insert_with_hash` of `BaseCache`.
Refactoring on the `PendingOp` and `PendingOpGuard`.
Change back to use the same `ReadOp` between `future` and `sync` caches.
- Add unit tests for async cancellations.
- Some refactoring.
@tatsuya6502 tatsuya6502 marked this pull request as ready for review August 27, 2023 02:03
- Update a unit tests for async cancellations.
Copy link
Member Author

@tatsuya6502 tatsuya6502 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging.

@tatsuya6502 tatsuya6502 added this pull request to the merge queue Aug 27, 2023
Merged via the queue into main with commit 3818083 Aug 27, 2023
38 checks passed
@tatsuya6502 tatsuya6502 deleted the async-cancellation-safety branch August 27, 2023 04:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant