How to get a (async) sentinel connection pool working correctly #3433

mzealey opened this issue Nov 14, 2024 · 1 comment
mzealey commented Nov 14, 2024

Version: 5.0.1

Platform: Python 3.12 on Linux

Description: The end goal is to get a reliable writable or read-only connection from a set of Redis servers running Sentinel, and to get it efficiently from a pool. If we had just a single server we could simply do:

import redis.asyncio as aioredis
pool = aioredis.ConnectionPool.from_url(...)

while True:
    conn = aioredis.Redis.from_pool(pool)
    # ... do stuff with the connection ...
    await conn.aclose()

But we want reliability in case the server fails, so we connect to Sentinel:

sentinel = aioredis.sentinel.Sentinel(...)

and the docs then advise using .slave_for() or .master_for(). However, watching tcpdump, these establish a new connection to the target server each time rather than reusing any sort of long-lived pooled connection.
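
For reference, the documented pattern is roughly the following (the sentinel addresses and the "mymaster" service name are just placeholders for illustration, not my actual setup):

sentinel = aioredis.sentinel.Sentinel(
    [("sentinel-1", 26379), ("sentinel-2", 26379)],  # placeholder addresses
    socket_timeout=1,
)

master = sentinel.master_for("mymaster", socket_timeout=1)
await master.set("key", "value")

replica = sentinel.slave_for("mymaster", socket_timeout=1)
await replica.get("key")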

Looking at the code, for example https://github.com/redis/redis-py/blob/master/redis/asyncio/sentinel.py#L349-L351 (the non-async and slave versions are the same), it seems to generate a new connection pool on each call to master_for()/slave_for(). A naive fix would be a function like .master_for_pool() which returns a pool for the master, from which we could pull new connections, but this wouldn't handle sentinel failovers correctly.

So I think there's a need for some sort of SentinelPool abstraction on top of the Sentinel layer which maintains connection pools to all of the servers in the cluster and dishes those connections out to the caller according to the availability of each server and the state of Sentinel on the cluster.

I may well be missing something obvious here, but from looking at the docs/code I cannot see a simple way to achieve what I'd like to do, which I'd have thought is exactly the purpose of the Sentinel support.

mzealey commented Nov 15, 2024

OK so after a fair bit of experimentation, I think this is roughly the way to go:

from contextlib import asynccontextmanager

import redis.asyncio as aioredis
from redis.asyncio.connection import SSLConnection
from redis.asyncio.sentinel import (
    SentinelConnectionPool,
    SentinelManagedConnection,
    SlaveNotFoundError,
)
from redis.exceptions import ConnectionError, TimeoutError


class MoyaSentinelManagedConnection(SentinelManagedConnection):
    """
    Patched version of SentinelManagedConnection that treats timeout errors
    as retryable too.
    """

    async def _connect_retry(self):  # type: ignore
        if self._reader:
            return  # already connected
        if self.connection_pool.is_master:
            await self.connect_to(await self.connection_pool.get_master_address())
        else:
            async for slave in self.connection_pool.rotate_slaves():
                try:
                    return await self.connect_to(slave)
                except (TimeoutError, ConnectionError):
                    continue

            raise SlaveNotFoundError  # should never get here


class MoyaSentinelManagedSSLConnection(MoyaSentinelManagedConnection, SSLConnection):
    # inherits the patched _connect_retry so SSL connections get it too
    pass


sentinel = aioredis.sentinel.Sentinel(...)
pool = SentinelConnectionPool(
    ...,
    is_master=not readonly,
    check_connection=True,
    connection_class=(
        MoyaSentinelManagedSSLConnection
        if kwargs.get("ssl", False)
        else MoyaSentinelManagedConnection
    ),
)


@asynccontextmanager
async def get_client():
    client = aioredis.Redis(connection_pool=pool)
    try:
        yield client
    finally:
        await client.aclose()  # releases the connection to the pool but doesn't close it

This seems to handle failovers and restores reasonably gracefully, and doesn't close the pooled connections when the client is released.
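
Usage then looks something like this (the key names are just placeholders):

async def handle_request():
    async with get_client() as client:
        await client.set("some-key", "some-value")
        value = await client.get("some-key")
    # the connection is back in the pool at this point, not closed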

There are an awful lot of subtle gotchas here though:

  1. Don't use Redis.from_pool(), as then aclose() closes the underlying connections rather than just releasing them back to the pool.
  2. You have to set socket_connect_timeout and socket_timeout sensibly (e.g. 1 second); retry=Retry(ExponentialBackoff(), 3) and retry_on_error=[ConnectionError, TimeoutError] are also useful (see the sketch after this list). We set these as standard on the standalone client anyway.
  3. I had to patch SentinelManagedConnection to treat TimeoutError as retryable too, otherwise (at least in my test Docker environment) it just blew up, because TimeoutError is not a subclass of ConnectionError.
  4. The pool is created similarly to how .master_for()/.slave_for() do it, but is kept around as a persistent pool; hacky, but I don't see a clean API for this at present.
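
To make point 2 concrete, the construction ends up looking roughly like this (the sentinel addresses and the "mymaster" service name are placeholders):

from redis.asyncio.retry import Retry
from redis.backoff import ExponentialBackoff
from redis.exceptions import ConnectionError, TimeoutError

connection_kwargs = dict(
    socket_connect_timeout=1,
    socket_timeout=1,
    retry=Retry(ExponentialBackoff(), 3),
    retry_on_error=[ConnectionError, TimeoutError],
)

sentinel = aioredis.sentinel.Sentinel(
    [("sentinel-1", 26379), ("sentinel-2", 26379)],  # placeholder addresses
    **connection_kwargs,
)
pool = SentinelConnectionPool(
    "mymaster",  # placeholder service name
    sentinel,
    is_master=True,  # False for a read-only/replica pool
    check_connection=True,
    connection_class=MoyaSentinelManagedConnection,  # or the SSL variant above
    **connection_kwargs,
)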

I'm hoping the maintainers can take this feedback on board at some point and provide the fixes/hooks/documentation needed to do this in a cleaner manner client-side.
