Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context based cancellation of acquiring a session #164

Merged
merged 1 commit into from
Apr 13, 2021
Merged

Context based cancellation of acquiring a session #164

merged 1 commit into from
Apr 13, 2021

Conversation

jkawamoto
Copy link
Contributor

This PR fixes the issue #106 (comment).

@lukechampine
Copy link
Owner

Thanks -- implemented in 04317ef

@jkawamoto
Copy link
Contributor Author

Does 04317ef fix the problem? It looks like it still blocks if the context is cancelled while reconnecting to the host: 04317ef#r49426524.

@lukechampine
Copy link
Owner

Ah, you're right, my bad. I don't think there's a very clean way of adding ctx to reconnect, so I might have to spawn a goroutine after all. :(

@lukechampine
Copy link
Owner

lukechampine commented Apr 13, 2021

Okay, think I fixed it in 4c410b4. The behavior is not ideal; the host can remain locked even after ctx is canceled. But it's the best we can do without plumbing ctx through a bunch of other code.

@jkawamoto
Copy link
Contributor Author

I'm afraid it won't work because if reconnect succeeds after ctx gets cancelled, no one unlocks ls.mu. The goroutine needs to check if the context is cancelled and then release the lock like this:

https://github.com/lukechampine/us/pull/164/files#diff-a2da222a378a564bf5db4896d7a681980e1838f5a16b173687df2028d9e3eb68R34-R36

@jkawamoto
Copy link
Contributor Author

tryAcquire can also block if the context is cancelled while reconnecting to the host in that function, and we need tryAcquireCtx that runs the same goroutine acquireCtx has.

@lukechampine
Copy link
Owner

oof. Yeah. Clearly I didn't think this through deeply enough. Your implementation was fine after all, so I'm just going to revert to that. Sorry about that.

@lukechampine lukechampine reopened this Apr 13, 2021
@lukechampine lukechampine merged commit 574e553 into lukechampine:kv Apr 13, 2021
@lukechampine
Copy link
Owner

I think there's still a race:

  • ctx.Err() returns nil, so session is not released
  • <-ctx.Done() fires, so acquireCtx returns nil, ctx.Err()
  • session is never released

Will have to think a bit more about this.

@lukechampine
Copy link
Owner

d23ebc4 should fix the race. 🤞🏻
Both goroutines now select on the same two channels, and the channel is unbuffered. So either:

  • the inner goroutine signals the outer goroutine via done, and the outer goroutine returns sess, err, OR
  • ctx.Done() is closed, the outer goroutine returns nil, ctx.Err(), and the inner goroutine eventually releases the session (if applicable)

@jkawamoto jkawamoto deleted the acquireCtx branch April 13, 2021 16:13
@jkawamoto
Copy link
Contributor Author

jkawamoto commented Apr 13, 2021

Thanks for fixing it! But, we have to avoid result parameters because it causes data races:

==================
WARNING: DATA RACE
Write at 0x00c00042e010 by goroutine 16:
  lukechampine.com/us/renter/renterutil.acquireCtx()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:44 +0x3ae
  lukechampine.com/us/renter/renterutil.ParallelChunkDownloader.DownloadChunk.func1()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:683 +0x34e

Previous write at 0x00c00042e010 by goroutine 20:
  lukechampine.com/us/renter/renterutil.acquireCtx.func1()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:30 +0xa4

Goroutine 16 (running) created at:
  lukechampine.com/us/renter/renterutil.ParallelChunkDownloader.DownloadChunk()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:674 +0x671
  lukechampine.com/us/renter/renterutil.ParallelBlobDownloader.DownloadBlob.func1()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:1160 +0x261

Goroutine 20 (running) created at:
  lukechampine.com/us/renter/renterutil.acquireCtx()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:29 +0x2ab
  lukechampine.com/us/renter/renterutil.ParallelChunkDownloader.DownloadChunk.func1()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:683 +0x34e
==================
==================
WARNING: DATA RACE
Write at 0x00c000128050 by goroutine 16:
  lukechampine.com/us/renter/renterutil.acquireCtx()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:44 +0x3db
  lukechampine.com/us/renter/renterutil.ParallelChunkDownloader.DownloadChunk.func1()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:683 +0x34e

Previous write at 0x00c000128050 by goroutine 20:
  lukechampine.com/us/renter/renterutil.acquireCtx.func1()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:30 +0xd5

Goroutine 16 (running) created at:
  lukechampine.com/us/renter/renterutil.ParallelChunkDownloader.DownloadChunk()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:674 +0x671
  lukechampine.com/us/renter/renterutil.ParallelBlobDownloader.DownloadBlob.func1()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:1160 +0x261

Goroutine 20 (running) created at:
  lukechampine.com/us/renter/renterutil.acquireCtx()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:29 +0x2ab
  lukechampine.com/us/renter/renterutil.ParallelChunkDownloader.DownloadChunk.func1()
      /Users/junpei/src/github.com/lukechampine/us/renter/renterutil/strategies.go:683 +0x34e
==================
    testing.go:1092: race detected during execution of test

@jkawamoto
Copy link
Contributor Author

More precisely, both L30 and L44 can update sess at the same time.

@lukechampine
Copy link
Owner

oh yeah, I forgot that technically a return will assign to the result params. Jeez. This is what I get for not running tests.
Well, 6th time's the charm! a0f7724

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants