Fix pin tip detect getting stuck in loop #897

olliesilvester · 2024-11-11T13:47:43Z

Potential fix for DiamondLightSource/mx-bluesky#634. See DiamondLightSource/mx-bluesky#634 (comment) for explanation

Instructions to reviewer on how to test:

To test:

Run pip install pytest-timeouts
Confirm new test fails without the added asyncio.sleep(0)
Confirm test passes with it

Checks for reviewer

Would the PR title make sense to a scientist on a set of release notes
If a new device has been added does it follow the standards
If changing the API for a pre-existing device, ensure that any beamlines using this device have updated their Bluesky plans accordingly
Have the connection tests for the relevant beamline(s) been run via dodal connect ${BEAMLINE}

codecov · 2024-11-11T13:52:16Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.63%. Comparing base (fc296b5) to head (94e62c3).
Report is 4 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #897   +/-   ##
=======================================
  Coverage   95.63%   95.63%           
=======================================
  Files         128      128           
  Lines        5273     5274    +1     
=======================================
+ Hits         5043     5044    +1     
  Misses        230      230

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

rtuck99 · 2024-11-12T17:06:53Z

src/dodal/devices/oav/pin_image_recognition/__init__.py

@@ -153,6 +153,7 @@ async def _set_triggered_tip():
            """
            async for value in observe_value(self.array_data):
                try:
+                    await asyncio.sleep(0)
                    location = await self._get_tip_and_edge_data(value)
                    self._set_triggered_values(location)
                except Exception as e:


Would be good if the InvalidPinException was raised with a message as currently the log message doesn't print any information about the exception.

In the logs, don't we see the
LOGGER.warning(f"Failed to detect pin-tip location, will retry with next image: {e}")
message each time this exception is raised?

rtuck99

Approved.

I'm not sure how much sense the explanation makes - since there are plenty of awaits. However I suppose none of them actually do any IO so perhaps the thread never yields.

This does seem like quite a significant thing to omit from the documentation for wait_for() though. Unless that's implied by the caveat regarding having to wait for the cancellation of the future.

I guess this means this could be something to watch out for in future where we are waiting for something which is only dependent on the values of soft signals and doesn't involve real IO.

DominicOram · 2024-11-13T09:31:37Z

I think this is worth discussing a bit further afield, maybe with @callumforrester or @coretl. It feels like a gotcha that may affect dodal and ophyd-async a lot in the future.

coretl · 2024-11-13T16:23:05Z

I've tried to reproduce this in ophyd-async but can't so far. This passes without needing the asyncio.sleep:

async def test_observe_value_times_out_with_busy_sleep():
    sig, setter = soft_signal_r_and_setter(float)

    async def tick():
        for i in range(5):
            await asyncio.sleep(0.1)
            setter(i + 1)

    recv = []

    async def watch():
        async for val in observe_value(sig, timeout=0.5):
            time.sleep(0.15)
            recv.append(val)

    t = asyncio.create_task(tick())
    start = time.time()
    try:
        with pytest.raises(asyncio.TimeoutError):
            await asyncio.wait_for(watch(), timeout=0.2)
        assert recv == [0, 1]
        assert time.time() - start == pytest.approx(0.3, abs=0.05)
    finally:
        t.cancel()

coretl · 2024-11-13T16:53:38Z

Ok, manages to reproduce it in the same manner as your test:

async def test_observe_value_times_out_with_no_external_task():
    sig, setter = soft_signal_r_and_setter(float)

    recv = []

    async def watch():
        async for val in observe_value(sig):
            # If you put "await asyncio.sleep(0.00001)" here it times out
            recv.append(val)
            setter(val + 1)

    start = time.time()
    with pytest.raises(asyncio.TimeoutError):
        await asyncio.wait_for(watch(), timeout=0.2)
    assert recv
    assert time.time() - start == pytest.approx(0.3, abs=0.05)

However I question if this is your actual failure mode on I03, as this depends on the busy task prompting the callback in the same asyncio coro, while on I03 you have to yield already to an already busy queue. I'll see if I can reproduce this with an actual EPICS signal...

DominicOram · 2024-11-14T10:11:42Z

However I question if this is your actual failure mode on I03, as this depends on the busy task prompting the callback in the same asyncio coro, while on I03 you have to yield already to an already busy queue. I'll see if I can reproduce this with an actual EPICS signal...

It's worth noting that we only saw this issue once a couple of weeks ago and have not seen it again since so it's likely some timing race condition that's quite rare.

olliesilvester · 2024-11-14T16:03:18Z

Closing in favour of bluesky/ophyd-async#648 (comment). Let's reopen this if we see the issue again (after the ophyd-async change is live on i03)

olliesilvester added 3 commits November 11, 2024 13:23

Add asyncio sleep to async for loop

69cea7b

Add test in dodal rather than mx-bluesky

7dfae18

Add pytest-timeouts as dependancy

3553cac

olliesilvester requested review from DominicOram, rtuck99 and dperl-dls November 11, 2024 13:48

Merge branch 'main' into mx_bluesky_634_fix_pin_tip_detect

94e62c3

rtuck99 reviewed Nov 12, 2024

View reviewed changes

rtuck99 approved these changes Nov 12, 2024

View reviewed changes

coretl mentioned this pull request Nov 13, 2024

Yield in each loop of observe_value bluesky/ophyd-async#648

Merged

coretl mentioned this pull request Nov 14, 2024

Add epics test for busy signals bluesky/ophyd-async#650

Draft

olliesilvester closed this Nov 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix pin tip detect getting stuck in loop #897

Fix pin tip detect getting stuck in loop #897

olliesilvester commented Nov 11, 2024

codecov bot commented Nov 11, 2024 •

edited

Loading

rtuck99 Nov 12, 2024

olliesilvester Nov 13, 2024 •

edited

Loading

rtuck99 left a comment

DominicOram commented Nov 13, 2024

coretl commented Nov 13, 2024

coretl commented Nov 13, 2024

DominicOram commented Nov 14, 2024

olliesilvester commented Nov 14, 2024

Fix pin tip detect getting stuck in loop #897

Fix pin tip detect getting stuck in loop #897

Conversation

olliesilvester commented Nov 11, 2024

Instructions to reviewer on how to test:

Checks for reviewer

codecov bot commented Nov 11, 2024 • edited Loading

Codecov Report

rtuck99 Nov 12, 2024

Choose a reason for hiding this comment

olliesilvester Nov 13, 2024 • edited Loading

Choose a reason for hiding this comment

rtuck99 left a comment

Choose a reason for hiding this comment

DominicOram commented Nov 13, 2024

coretl commented Nov 13, 2024

coretl commented Nov 13, 2024

DominicOram commented Nov 14, 2024

olliesilvester commented Nov 14, 2024

codecov bot commented Nov 11, 2024 •

edited

Loading

olliesilvester Nov 13, 2024 •

edited

Loading