Split long messages to avoid being kicked for excess flood #191
Hi, You can set |
Do servers communicate what maximum message length they accept? In that case, I would like |
So, the server I'm connecting to is GameSurge. Here is the configuration the server sends to the bot:
The bot subsequently prints these values as:
The When the bot sends a message that is >512 characters long (I can see that message in the debug log), the server replies:
That could mean that either:
HTH. |
As I remember, you can track what the bot receives/sends by using |
After debugging the issue, I found two problems, and one suspicious thing:
Lines 272 to 274 in 2d7d65c
Lines 300 to 304 in 2d7d65c
Here is a debugging patch that I used to diagnose the above issues:

diff --git a/irc3/__init__.py b/irc3/__init__.py
index 2cd5fe7..12ee2c9 100644
--- a/irc3/__init__.py
+++ b/irc3/__init__.py
@@ -179,6 +179,7 @@ class IrcBot(base.IrcObject):
 
     def send_line(self, data, nowait=False):
         """send a line to the server. replace CR by spaces"""
+        self.log.debug("DBG: (%d) %s", len(data), data)
         data = data.replace('\n', ' ').replace('\r', ' ')
         f = asyncio.Future(loop=self.loop)
         if self.queue is not None and nowait is False:
@@ -225,15 +226,23 @@ class IrcBot(base.IrcObject):
     def privmsg(self, target, message, nowait=False):
         """send a privmsg to target"""
         if message:
-            messages = utils.split_message(message, self.config.max_length)
             if isinstance(target, DCCChat):
+                messages = utils.split_message(message, self.config.max_length)
                 for message in messages:
                     target.send_line(message)
             elif target:
+                command_prefix = 'PRIVMSG %s :' % target
+                self.log.debug("MAX_LENGTH: %d" % self.config.max_length)
+                assert self.config.max_length > len(command_prefix) + 2
+                self.log.debug("SPLIT: %s" % message)
+                messages = utils.split_message(message, self.config.max_length - len(command_prefix) - 2, self.log)
+                self.log.debug("MESSAGES: %s", messages)
                 f = None
                 for message in messages:
-                    f = self.send_line('PRIVMSG %s :%s' % (target, message),
+                    self.log.debug("DBG: MSG (%d) %s", len(message), message)
+                    f = self.send_line('%s%s' % (command_prefix, message),
                                        nowait=nowait)
+                    break
                 return f
 
     def action(self, target, message, nowait=False):
diff --git a/irc3/utils.py b/irc3/utils.py
index 4f158f8..fb91620 100644
--- a/irc3/utils.py
+++ b/irc3/utils.py
@@ -164,8 +164,38 @@ class IrcString(BaseString):
 STRIPPED_CHARS = '\t '
 
 
-def split_message(message, max_length):
+def split_message(message, max_length, log=None):
     """Split long messages"""
+    def utf8_lead_byte(b):
+        '''A UTF-8 intermediate byte starts with the bits 10xxxxxx.'''
+        return (b & 0xC0) != 0x80
+
+    def utf8_byte_truncate(text, max_bytes):
+        '''If text[max_bytes] is not a lead byte, back up until a lead byte is
+        found and truncate before that character.'''
+        log.debug("IN")
+        utf8 = text.encode('utf8')
+        log.debug("utf8: %s", utf8)
+        if len(utf8) <= max_bytes:
+            log.debug("SPLIT: max bytes")
+            return utf8
+        log.debug("max_bytes: %d", max_bytes)
+        i = max_bytes
+        while i > 0 and not utf8_lead_byte(utf8[i]):
+            log.debug("I: %d", i)
+            i -= 1
+        log.debug("RET")
+        return utf8[:i]
+
+    if log is not None:
+        log.debug("SPLIT MSG: %d - %s", max_length, message)
+        s = utf8_byte_truncate(message, max_length)
+        log.debug("S: %s", s)
+        s = s.decode('utf8')
+        log.debug("DECODED")
+        yield s
+        return
+
     if len(message) > max_length:
         for message in textwrap.wrap(message, max_length):
             yield message

I used this implementation to split strings according to byte length: https://stackoverflow.com/a/43848928

HTH. |
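For readers who want the byte-based idea without the debug logging, here is a minimal standalone sketch. It is not irc3 code: the name split_utf8 is hypothetical, and unlike the debugging patch (which yields only the first piece) it yields every piece. It uses the same continuation-byte test (10xxxxxx) to avoid cutting inside a multi-byte character.

```python
def split_utf8(text, max_bytes):
    """Yield pieces of text whose UTF-8 encoding is at most max_bytes long.

    Hypothetical helper for illustration; not part of irc3.
    """
    if max_bytes < 4:
        # A single code point can need up to 4 bytes in UTF-8.
        raise ValueError("max_bytes must be at least 4")
    data = text.encode("utf-8")
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # Back up while data[end] is a continuation byte (10xxxxxx),
        # so the cut never lands inside a multi-byte character.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        yield data[start:end].decode("utf-8")
        start = end
```

This sketch still decodes each piece, so it does not address the concern about encode/decode round trips; it only shows where the cuts have to fall.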
Thanks for digging into that. I don't like the idea of encoding/decoding whole messages only to... re-encode them before sending them to the server. Encoding and decoding are very CPU-expensive, AFAIK. |
Any chance we'll get a fix soon? My bot trips on half of the URLs (apparently the internet has decided that having sentences of 1k+ characters in the metadata is a good idea). I understand the concern about the encoding round trips; however, I can't see any alternative. |
I'll be ok with that if there is no need to re-decode the data. I'm pretty sure a few I'll try to give it a try soon... |
For now I've added a prefix argument to split_message (inspired by your snippet). This should reduce the number of errors you see. It may be enough to "fix" the problem if you set max_length to a value that leaves room for some extra Unicode characters, maybe 500 instead of 512. It looks like supporting both unicode and bytes throughout the whole process will not be easy at all. |
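As a rough illustration of prefix-aware splitting on word boundaries, here is a sketch that measures each word in UTF-8 bytes instead of characters. The names split_for_privmsg and _chunks are hypothetical and this is not the split_message signature that irc3 ships; it also collapses runs of whitespace, which may or may not be acceptable for a given bot.

```python
def _chunks(word, budget):
    """Cut one overlong word into pieces of at most budget UTF-8 bytes."""
    piece, used = "", 0
    for ch in word:
        size = len(ch.encode("utf-8"))
        if used + size > budget:
            yield piece
            piece, used = "", 0
        piece += ch
        used += size
    if piece:
        yield piece


def split_for_privmsg(target, message, max_length=512):
    """Return PRIVMSG lines that fit in max_length bytes, CRLF included.

    Hypothetical helper for illustration; not part of irc3.
    """
    prefix = "PRIVMSG %s :" % target
    # Reserve room for the command prefix and the trailing CRLF.
    budget = max_length - len(prefix.encode("utf-8")) - 2
    if budget < 4:
        raise ValueError("max_length leaves no room for the message")
    lines, current, used = [], [], 0
    for word in message.split():
        for part in _chunks(word, budget):
            size = len(part.encode("utf-8"))
            # One extra byte for the space that joins this part to the line.
            extra = size + (1 if current else 0)
            if current and used + extra > budget:
                lines.append(prefix + " ".join(current))
                current, used = [], 0
                extra = size
            current.append(part)
            used += extra
    if current:
        lines.append(prefix + " ".join(current))
    return lines
```

Each word is encoded once to measure it and nothing is decoded again, which goes some way toward the round-trip concern raised above, at the cost of a per-word encode.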
Hi,

My bot sends messages to a channel, and those messages can sometimes be too long for the server to accept.

Would it be possible for irc3 (maybe via an opt-in mechanism) to split long message queries into several ones, to avoid the bot being kicked for excess flood? The message itself could be split on word boundaries.

This could theoretically be handled on the bot's side, but I don't think it would be very robust, as the bot doesn't know exactly what the resulting IRC query will look like (I assume something like PRIVMSG recipient :message), and thus would not be able to know how many characters should be subtracted from the theoretical maximum message size (512 I believe).

HTH.
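To make the subtraction concrete, here is a small sketch of the arithmetic. It assumes the 512-byte limit counts the trailing CRLF, and it also reserves room for the ":nick!user@host " source prefix that a server adds when relaying the line to other clients. The function name payload_budget and its parameters are hypothetical, and the bot may not know its exact host as seen by the server, so treat the result as an estimate.

```python
def payload_budget(target, nick, user, host, max_length=512):
    """Estimate how many bytes of message text fit in one PRIVMSG.

    Hypothetical helper for illustration; not part of irc3.
    """
    # Line as sent by the client: "PRIVMSG <target> :<text>\r\n"
    sent = len(("PRIVMSG %s :" % target).encode("utf-8")) + 2
    # Source prefix the server prepends when relaying: ":nick!user@host "
    source = len((":%s!%s@%s " % (nick, user, host)).encode("utf-8"))
    return max_length - sent - source
```

The budget is in bytes, not characters, so text containing non-ASCII characters fills it faster than its character count suggests.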