Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WordFilter] Better validate and handle regex #646

Open
Injabie3 opened this issue May 30, 2023 · 4 comments · May be fixed by #665
Open

[WordFilter] Better validate and handle regex #646

Injabie3 opened this issue May 30, 2023 · 4 comments · May be fixed by #665
Assignees
Labels
bug Umbrella label for things that shouldn't happen and should be addressed relatively soon.

Comments

@Injabie3
Copy link
Member

In other cogs that hook into WordFilter, we hit a multiple repeat error on the following regex string: bl[o0]*{0,3}w j[o0]*{0,3}bs. Of particular interest is the * operator, which was what it was complaining about, because {0,3} is trying to operate on * which doesn't make sense.

We should find a way to better handle this on insertion and within the _filterWord function.

Logs below:

ren  | Traceback (most recent call last):
ren  |   File "/data/venv/lib/python3.11/site-packages/discord/client.py", line 441, in _run_event
ren  |     await coro(*args, **kwargs)
ren  |   File "/data/cogs/CogManager/cogs/highlight/highlight.py", line 825, in onMessage
ren  |     await self.checkHighlights(msg)
ren  |   File "/data/cogs/CogManager/cogs/highlight/highlight.py", line 665, in checkHighlights
ren  |     elif await self.wordFilter.containsFilterableWords(msg):
ren  |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/data/cogs/CogManager/cogs/wordfilter/wordfilter.py", line 452, in containsFilterableWords
ren  |     filteredMsg = _filterWord(filters, filteredMsg)
ren  |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/data/cogs/CogManager/cogs/wordfilter/wordfilter.py", line 640, in _filterWord
ren        return re.sub(regex, _censorMatch, string, flags=re.IGNORECASE)
ren  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/__init__.py", line 185, in sub
ren  |     return _compile(pattern, flags).sub(repl, string, count)
ren  |            ^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/__init__.py", line 294, in _compile
ren  |     p = _compiler.compile(pattern, flags)
ren  |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/_compiler.py", line 743, in compile
ren  |     p = _parser.parse(p, flags)
ren  |         ^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/_parser.py", line 980, in parse
ren  |     p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
ren  |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/_parser.py", line 455, in _parse_sub
ren  |     itemsappend(_parse(source, state, verbose, nested + 1,
ren  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/_parser.py", line 863, in _parse
ren  |     p = _parse_sub(source, state, sub_verbose, nested + 1)
ren  |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/_parser.py", line 455, in _parse_sub
ren  |     itemsappend(_parse(source, state, verbose, nested + 1,
ren  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ren  |   File "/usr/local/lib/python3.11/re/_parser.py", line 685, in _parse
ren  |     raise source.error("multiple repeat",
ren  | re.error: multiple repeat at position 2848
@quachtridat
Copy link
Member

Regarding the particular RegEx pattern concerned, is it a pattern that we inserted into the cog? If the problem doesn't take place programmatically, then I'd see it as a human error and we need to fix the problematic patterns.
Am I misunderstanding the issue post here?

@Injabie3
Copy link
Member Author

This pattern was restored from a backup, so it was entered in by someone, but we should validate the pattern before adding it in.

@Injabie3
Copy link
Member Author

Might be better to handle this without raising an exception; could log it in the console instead so as to keep it working

@quachtridat
Copy link
Member

This pattern was restored from a backup, so it was entered in by someone, but we should validate the pattern before adding it in.

Would constructing a re.Pattern object work? I'm not sure how we go about pattern validation. If anything, we should only put in good, valid patterns.

Might be better to handle this without raising an exception; could log it in the console instead so as to keep it working

Right. Some error log lines should definitely help.

quachtridat added a commit to quachtridat/Ren that referenced this issue Oct 20, 2024
This commit resolves SFUAnime#646 by compiling RegEx before adding a RegEx
string into WordFilter's books. With this commit, at word filter
insertion time (`[p]wordfilter regex add <word>`), `re.compile` will be
invoked, and if there is a `re.error`, a message will be sent to where
the command has been invoked to notify the user executing the command
about the invalid RegEx pattern that was input. Without this commit, the
cog will crash when it tries to substitute/match/search invalid RegEx
patterns during handling of an `on_message` event. See SFUAnime#646 for an
example traceback.
quachtridat added a commit to quachtridat/Ren that referenced this issue Oct 20, 2024
This commit resolves SFUAnime#646 by compiling RegEx before adding a RegEx
string into WordFilter's books. With this commit, at word filter
insertion time (`[p]wordfilter regex add <word>`), `re.compile` will be
invoked, and if there is a `re.error`, a message will be sent to where
the command has been invoked to notify the user executing the command
about the invalid RegEx pattern that was input. Without this commit, the
cog will crash when it tries to substitute/match/search invalid RegEx
patterns during handling of an `on_message` event. See SFUAnime#646 for an
example traceback.
@quachtridat quachtridat linked a pull request Oct 20, 2024 that will close this issue
@quachtridat quachtridat added the bug Umbrella label for things that shouldn't happen and should be addressed relatively soon. label Oct 20, 2024
@quachtridat quachtridat self-assigned this Oct 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Umbrella label for things that shouldn't happen and should be addressed relatively soon.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants