Fix pin tip detect getting stuck in loop #897
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #897   +/-   ##
=======================================
  Coverage   95.63%   95.63%
=======================================
  Files         128      128
  Lines        5273     5274    +1
=======================================
+ Hits         5043     5044    +1
  Misses        230      230
```

☔ View full report in Codecov by Sentry.
```diff
@@ -153,6 +153,7 @@ async def _set_triggered_tip():
         """
         async for value in observe_value(self.array_data):
             try:
+                await asyncio.sleep(0)
                 location = await self._get_tip_and_edge_data(value)
                 self._set_triggered_values(location)
             except Exception as e:
```
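For context on why a bare `await asyncio.sleep(0)` makes a difference here: if nothing inside the loop ever actually suspends (all the awaited calls complete immediately without doing real IO), the coroutine never hands control back to the event loop, so the timer installed by an enclosing `asyncio.wait_for` can never fire. A minimal, self-contained sketch of that behaviour (illustrative only, not the dodal code):

```python
import asyncio


async def busy_loop(yield_each_iteration: bool) -> None:
    # Stand-in for the tip-detection loop: each iteration "processes" a value
    # without ever doing real IO, so it never suspends on its own.
    while True:
        if yield_each_iteration:
            # Hand control back to the event loop so other tasks and timers
            # (including the one installed by asyncio.wait_for) can run.
            await asyncio.sleep(0)


async def main() -> None:
    # With yield_each_iteration=True the timeout fires as expected.
    # With False, the loop never yields and wait_for would hang forever.
    try:
        await asyncio.wait_for(busy_loop(yield_each_iteration=True), timeout=0.2)
    except asyncio.TimeoutError:
        print("timed out as expected")


asyncio.run(main())
```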
It would be good if the `InvalidPinException` were raised with a message, as currently the log message doesn't print any information about the exception.
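A rough sketch of what that could look like (the names and call site here are assumptions for illustration, not the actual dodal implementation):

```python
import numpy as np


class InvalidPinException(Exception):
    """Stand-in for the real exception class."""


def detect_tip(array: np.ndarray) -> tuple[int, int]:
    # Hypothetical detection step; the real logic lives in _get_tip_and_edge_data.
    tip = None  # pretend nothing was found in this frame
    if tip is None:
        # Raising with a message means the caller's
        # LOGGER.warning(f"Failed to detect pin-tip location ...: {e}")
        # actually explains what went wrong and for which frame.
        raise InvalidPinException(f"no pin tip found in array of shape {array.shape}")
    return tip
```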
In the logs, don't we see the `LOGGER.warning(f"Failed to detect pin-tip location, will retry with next image: {e}")` message each time this exception is raised?
Approved.

I'm not sure how much sense the explanation makes, since there are plenty of awaits. However, I suppose none of them actually does any IO, so perhaps control never yields back to the event loop.

This does seem like quite a significant thing to omit from the documentation for `wait_for()`, though, unless that's implied by the caveat about having to wait for the cancellation of the future.

I guess this means it could be something to watch out for in future wherever we are waiting on something that depends only on the values of soft signals and doesn't involve real IO.

I think this is worth discussing a bit further afield, maybe with @callumforrester or @coretl. It feels like a gotcha that may affect
I've tried to reproduce this in ophyd-async but can't so far. This passes without needing the `asyncio.sleep(0)`:

```python
async def test_observe_value_times_out_with_busy_sleep():
    sig, setter = soft_signal_r_and_setter(float)

    async def tick():
        for i in range(5):
            await asyncio.sleep(0.1)
            setter(i + 1)

    recv = []

    async def watch():
        async for val in observe_value(sig, timeout=0.5):
            time.sleep(0.15)
            recv.append(val)

    t = asyncio.create_task(tick())
    start = time.time()
    try:
        with pytest.raises(asyncio.TimeoutError):
            await asyncio.wait_for(watch(), timeout=0.2)
        assert recv == [0, 1]
        assert time.time() - start == pytest.approx(0.3, abs=0.05)
    finally:
        t.cancel()
```
Ok, managed to reproduce it in the same manner as your test:

```python
async def test_observe_value_times_out_with_no_external_task():
    sig, setter = soft_signal_r_and_setter(float)
    recv = []

    async def watch():
        async for val in observe_value(sig):
            # If you put "await asyncio.sleep(0.00001)" here it times out
            recv.append(val)
            setter(val + 1)

    start = time.time()
    with pytest.raises(asyncio.TimeoutError):
        await asyncio.wait_for(watch(), timeout=0.2)
    assert recv
    assert time.time() - start == pytest.approx(0.3, abs=0.05)
```

However, I question whether this is your actual failure mode on I03, as this depends on the busy task prompting the callback in the same asyncio coroutine, whereas on I03 you already yield to an already busy queue. I'll see if I can reproduce this with an actual EPICS signal...
It's worth noting that we only saw this issue once, a couple of weeks ago, and have not seen it again since, so it's likely a timing race condition that's quite rare.
Closing in favour of bluesky/ophyd-async#648 (comment). Let's reopen this if we see the issue again (after the ophyd-async change is live on I03).
Potential fix for DiamondLightSource/mx-bluesky#634. See DiamondLightSource/mx-bluesky#634 (comment) for an explanation.
Instructions to reviewer on how to test:

To test:
- `pip install pytest-timeouts`
- `asyncio.sleep(0)`

Checks for reviewer:
- `dodal connect ${BEAMLINE}`
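As a rough illustration of how a reviewer might exercise the fix (the `pin_tip_detection` fixture and the `trigger()` call below are assumptions for illustration, not the actual dodal test suite): bounding the triggered-tip loop with `asyncio.wait_for` fails fast instead of hanging if the loop stops yielding, and a per-test timeout from the installed plugin provides a further safety net against a hang.

```python
import asyncio

import pytest


@pytest.mark.asyncio
async def test_pin_tip_loop_does_not_starve_the_event_loop(pin_tip_detection):
    # `pin_tip_detection` is a hypothetical fixture standing in for the real
    # device, with a simulated array_data signal that never contains a pin.
    # If the detection loop never yields (i.e. without the asyncio.sleep(0)),
    # this wait_for can never cancel it and the test hangs, which is where a
    # per-test timeout plugin would catch the regression.
    with pytest.raises(asyncio.TimeoutError):
        await asyncio.wait_for(pin_tip_detection.trigger(), timeout=0.2)
```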