Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pin tip detect getting stuck in loop #897

Closed
wants to merge 4 commits into from

Conversation

olliesilvester
Copy link
Collaborator

Potential fix for DiamondLightSource/mx-bluesky#634. See DiamondLightSource/mx-bluesky#634 (comment) for explanation

Instructions to reviewer on how to test:

To test:

  1. Run pip install pytest-timeouts
  2. Confirm new test fails without the added asyncio.sleep(0)
  3. Confirm test passes with it

Checks for reviewer

  • Would the PR title make sense to a scientist on a set of release notes
  • If a new device has been added does it follow the standards
  • If changing the API for a pre-existing device, ensure that any beamlines using this device have updated their Bluesky plans accordingly
  • Have the connection tests for the relevant beamline(s) been run via dodal connect ${BEAMLINE}

Copy link

codecov bot commented Nov 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.63%. Comparing base (fc296b5) to head (94e62c3).
Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #897   +/-   ##
=======================================
  Coverage   95.63%   95.63%           
=======================================
  Files         128      128           
  Lines        5273     5274    +1     
=======================================
+ Hits         5043     5044    +1     
  Misses        230      230           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -153,6 +153,7 @@ async def _set_triggered_tip():
"""
async for value in observe_value(self.array_data):
try:
await asyncio.sleep(0)
location = await self._get_tip_and_edge_data(value)
self._set_triggered_values(location)
except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good if the InvalidPinException was raised with a message as currently the log message doesn't print any information about the exception.

Copy link
Collaborator Author

@olliesilvester olliesilvester Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the logs, don't we see the
LOGGER.warning(f"Failed to detect pin-tip location, will retry with next image: {e}")
message each time this exception is raised?

Copy link
Contributor

@rtuck99 rtuck99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.

I'm not sure how much sense the explanation makes - since there are plenty of awaits. However I suppose none of them actually do any IO so perhaps the thread never yields.

This does seem like quite a significant thing to omit from the documentation for wait_for() though. Unless that's implied by the caveat regarding having to wait for the cancellation of the future.

I guess this means this could be something to watch out for in future where we are waiting for something which is only dependent on the values of soft signals and doesn't involve real IO.

@DominicOram
Copy link
Contributor

I think this is worth discussing a bit further afield, maybe with @callumforrester or @coretl. It feels like a gotcha that may affect dodal and ophyd-async a lot in the future.

@coretl
Copy link
Collaborator

coretl commented Nov 13, 2024

I've tried to reproduce this in ophyd-async but can't so far. This passes without needing the asyncio.sleep:

async def test_observe_value_times_out_with_busy_sleep():
    sig, setter = soft_signal_r_and_setter(float)

    async def tick():
        for i in range(5):
            await asyncio.sleep(0.1)
            setter(i + 1)

    recv = []

    async def watch():
        async for val in observe_value(sig, timeout=0.5):
            time.sleep(0.15)
            recv.append(val)

    t = asyncio.create_task(tick())
    start = time.time()
    try:
        with pytest.raises(asyncio.TimeoutError):
            await asyncio.wait_for(watch(), timeout=0.2)
        assert recv == [0, 1]
        assert time.time() - start == pytest.approx(0.3, abs=0.05)
    finally:
        t.cancel()

@coretl
Copy link
Collaborator

coretl commented Nov 13, 2024

Ok, manages to reproduce it in the same manner as your test:

async def test_observe_value_times_out_with_no_external_task():
    sig, setter = soft_signal_r_and_setter(float)

    recv = []

    async def watch():
        async for val in observe_value(sig):
            # If you put "await asyncio.sleep(0.00001)" here it times out
            recv.append(val)
            setter(val + 1)

    start = time.time()
    with pytest.raises(asyncio.TimeoutError):
        await asyncio.wait_for(watch(), timeout=0.2)
    assert recv
    assert time.time() - start == pytest.approx(0.3, abs=0.05)

However I question if this is your actual failure mode on I03, as this depends on the busy task prompting the callback in the same asyncio coro, while on I03 you have to yield already to an already busy queue. I'll see if I can reproduce this with an actual EPICS signal...

@DominicOram
Copy link
Contributor

However I question if this is your actual failure mode on I03, as this depends on the busy task prompting the callback in the same asyncio coro, while on I03 you have to yield already to an already busy queue. I'll see if I can reproduce this with an actual EPICS signal...

It's worth noting that we only saw this issue once a couple of weeks ago and have not seen it again since so it's likely some timing race condition that's quite rare.

@olliesilvester
Copy link
Collaborator Author

Closing in favour of bluesky/ophyd-async#648 (comment). Let's reopen this if we see the issue again (after the ophyd-async change is live on i03)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants