Handle connection resets in the status server more gracefully #5881

Open
wants to merge 1 commit into
base: master

Commits on Mar 14, 2024

  1. Handle connection resets in the status server more gracefully

    Connection resets due to network instability can lead to the status
    server missing a test status, to an asyncio error like
    
    full.log│2024-03-08 03:08:44,053 asyncio base_events      L1744 ERROR| Task exception was never retrieved
    full.log│future: <Task finished name='Task-2038' coro=<StatusServer.cb() done, defined at /usr/lib/python3.10/site-packages/avocado/core/status/server.py:51> exception=ConnectionResetError(104, 'Connection reset by peer')>
    full.log│Traceback (most recent call last):
    full.log│  File "/usr/lib/python3.10/site-packages/avocado/core/status/server.py", line 53, in cb
    full.log│    raw_message = await reader.readline()
    full.log│  File "/usr/lib64/python3.10/asyncio/streams.py", line 525, in readline
    full.log│    line = await self.readuntil(sep)
    full.log│  File "/usr/lib64/python3.10/asyncio/streams.py", line 617, in readuntil
    full.log│    await self._wait_for_data('readuntil')
    full.log│  File "/usr/lib64/python3.10/asyncio/streams.py", line 502, in _wait_for_data
    full.log│    await self._waiter
    full.log│  File "/usr/lib64/python3.10/asyncio/selector_events.py", line 854, in _read_ready__data_received
    full.log│    data = self._sock.recv(self.max_size)
    full.log│ConnectionResetError: [Errno 104] Connection reset by peer
    
    and, worse yet, to test tasks hanging indefinitely without the job
    ever completing properly. This was mostly observed in cases of
    LXC and remote spawner isolation where the isolated task process
    completes but the task on the side of the task machine remains
    unfinished.
    
    Signed-off-by: Plamen Dimitrov <plamen.dimitrov@intra2net.com>
    pevogam committed Mar 14, 2024
    673324f
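
For readers unfamiliar with the failure mode in the traceback, the following is a minimal sketch of one way an asyncio stream callback can catch a `ConnectionResetError` raised by `reader.readline()` and report it, rather than letting it surface as "Task exception was never retrieved". It is not the change made in this commit, whose diff is not shown here. The names `read_status_messages`, `on_message`, `on_disconnect`, `ScriptedReader`, `NullWriter` and `demo` are all hypothetical and exist only for illustration.

```python
import asyncio


async def read_status_messages(reader, writer, on_message, on_disconnect):
    """Hypothetical callback: read newline-terminated messages until the peer goes away.

    Assumption: the caller wants to be told why the stream ended, so that a
    task whose connection was reset can be marked as interrupted instead of
    being waited on forever.
    """
    try:
        while True:
            try:
                raw_message = await reader.readline()
            except ConnectionResetError:
                # The peer reset the connection; report it explicitly instead
                # of letting the exception escape the callback task.
                on_disconnect("reset")
                return
            if not raw_message:
                # readline() returns empty bytes on a clean end of stream.
                on_disconnect("eof")
                return
            on_message(raw_message.strip())
    finally:
        writer.close()


class ScriptedReader:
    """Illustrative stand-in for a stream reader: yields lines, then resets."""

    def __init__(self, lines):
        self._lines = list(lines)

    async def readline(self):
        if self._lines:
            return self._lines.pop(0)
        raise ConnectionResetError(104, "Connection reset by peer")


class NullWriter:
    """Illustrative stand-in for a stream writer that only records close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


async def demo():
    messages, reasons = [], []
    writer = NullWriter()
    await read_status_messages(
        ScriptedReader([b"status-1\n"]), writer, messages.append, reasons.append
    )
    return messages, reasons, writer.closed
```

With a real server, a callback like this could be bound to its two extra arguments (for example with `functools.partial`) and passed to `asyncio.start_server`, which calls it with `(reader, writer)` for each client. What to do on a reset, such as marking the task as interrupted or retrying, is a design decision this sketch leaves to the `on_disconnect` handler.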