How to get a (async) sentinel connection pool working correctly #3433

mzealey opened this issue Nov 14, 2024 · 1 comment
mzealey commented Nov 14, 2024

Version: 5.0.1

Platform: Python 3.12 on Linux

Description: The end goal is to get a reliable writable or read-only connection from a set of Redis servers running Sentinel, and to get it efficiently from a pool. If we had just a single server we could simply do:

import redis.asyncio as aioredis
pool = aioredis.ConnectionPool.from_url(...)

while True:
    conn = aioredis.Redis.from_pool(pool)
    # ... do stuff with the connection ...
    await conn.aclose()

But we want reliability in case the server fails, so we connect to Sentinel:

sentinel = aioredis.sentinel.Sentinel(...)

and the docs then advise using .slave_for() or .master_for(). However, watching tcpdump, these establish a new connection to the target server each time rather than reusing any sort of long-lived pooled connection.
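
For reference, the documented pattern is roughly the following (the sentinel addresses and the "mymaster" service name are just placeholders for illustration, not my actual setup):

sentinel = aioredis.sentinel.Sentinel(
    [("sentinel-1", 26379), ("sentinel-2", 26379)],  # placeholder addresses
    socket_timeout=1,
)

master = sentinel.master_for("mymaster", socket_timeout=1)
await master.set("key", "value")

replica = sentinel.slave_for("mymaster", socket_timeout=1)
await replica.get("key")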

Looking at the code, for example https://github.com/redis/redis-py/blob/master/redis/asyncio/sentinel.py#L349-L351 (the non-async and slave versions are the same), it seems to generate a new connection pool on each call to master_for()/slave_for(). A naive fix would be a function like .master_for_pool() which returns a pool for the master, from which we could pull new connections, but this wouldn't handle sentinel failovers correctly.

So I think there's a need for some sort of SentinelPool abstraction on top of the Sentinel layer which maintains connection pools to all of the servers in the cluster and dishes those connections out to the caller according to the availability of each server and the state of Sentinel on the cluster.

I may well be missing something obvious here, but from looking at the docs/code I cannot see a simple way to achieve what I'd like to do, which I'd have thought is exactly the purpose of the Sentinel support.

mzealey commented Nov 15, 2024

OK so after a fair bit of experimentation, I think this is roughly the way to go:

from contextlib import asynccontextmanager

import redis.asyncio as aioredis
from redis.asyncio.connection import SSLConnection
from redis.asyncio.sentinel import (
    SentinelConnectionPool,
    SentinelManagedConnection,
    SlaveNotFoundError,
)
from redis.exceptions import ConnectionError, TimeoutError


class MoyaSentinelManagedConnection(SentinelManagedConnection):
    """
    Patched version of SentinelManagedConnection that treats timeout errors
    as retryable too.
    """

    async def _connect_retry(self):  # type: ignore
        if self._reader:
            return  # already connected
        if self.connection_pool.is_master:
            await self.connect_to(await self.connection_pool.get_master_address())
        else:
            async for slave in self.connection_pool.rotate_slaves():
                try:
                    return await self.connect_to(slave)
                except (TimeoutError, ConnectionError):
                    continue

            raise SlaveNotFoundError  # should never get here


class MoyaSentinelManagedSSLConnection(MoyaSentinelManagedConnection, SSLConnection):
    # inherits the patched _connect_retry so SSL connections get it too
    pass


sentinel = aioredis.sentinel.Sentinel(...)
pool = SentinelConnectionPool(
    ...,
    is_master=not readonly,
    check_connection=True,
    connection_class=(
        MoyaSentinelManagedSSLConnection
        if kwargs.get("ssl", False)
        else MoyaSentinelManagedConnection
    ),
)


@asynccontextmanager
async def get_client():
    client = aioredis.Redis(connection_pool=pool)
    try:
        yield client
    finally:
        await client.aclose()  # releases the connection to the pool but doesn't close it

This seems to handle failovers and restores reasonably gracefully, and doesn't close the pooled connections when the client is released.
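
Usage then looks something like this (the key names are just placeholders):

async def handle_request():
    async with get_client() as client:
        await client.set("some-key", "some-value")
        value = await client.get("some-key")
    # the connection is back in the pool at this point, not closed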

There are an awful lot of subtle gotchas here though:

  1. Don't use Redis.from_pool(), as then aclose() closes the underlying connections rather than just releasing them back to the pool.
  2. You have to set socket_connect_timeout and socket_timeout sensibly (e.g. 1 second); retry=Retry(ExponentialBackoff(), 3) and retry_on_error=[ConnectionError, TimeoutError] are also useful (see the sketch after this list). We set these as standard on the standalone client anyway.
  3. I had to patch SentinelManagedConnection to treat TimeoutError as retryable too, otherwise (at least in my test Docker environment) it just blew up, because TimeoutError is not a subclass of ConnectionError.
  4. The pool is created similarly to how .master_for()/.slave_for() do it, but is kept around as a persistent pool; hacky, but I don't see a clean API for this at present.
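
To make point 2 concrete, the construction ends up looking roughly like this (the sentinel addresses and the "mymaster" service name are placeholders):

from redis.asyncio.retry import Retry
from redis.backoff import ExponentialBackoff
from redis.exceptions import ConnectionError, TimeoutError

connection_kwargs = dict(
    socket_connect_timeout=1,
    socket_timeout=1,
    retry=Retry(ExponentialBackoff(), 3),
    retry_on_error=[ConnectionError, TimeoutError],
)

sentinel = aioredis.sentinel.Sentinel(
    [("sentinel-1", 26379), ("sentinel-2", 26379)],  # placeholder addresses
    **connection_kwargs,
)
pool = SentinelConnectionPool(
    "mymaster",  # placeholder service name
    sentinel,
    is_master=True,  # False for a read-only/replica pool
    check_connection=True,
    connection_class=MoyaSentinelManagedConnection,  # or the SSL variant above
    **connection_kwargs,
)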

I'm hoping the maintainers can take this feedback on board at some point and provide the fixes/hooks/documentation needed to do this in a cleaner manner client-side.
