-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix search issue with special characters, and escape them in SQL. #844 #846
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Garvin Hicking <38074677+fe-hicking@users.noreply.github.com>
Co-authored-by: Garvin Hicking <38074677+fe-hicking@users.noreply.github.com>
@@ -883,7 +883,8 @@ function &serendipity_searchEntries($term, $limit = '', $searchresults = '') { | |||
$term = str_replace('"', '"', $term); | |||
$relevance_enabled = true; | |||
if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) { | |||
$cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)"; | |||
$term = serendipity_db_escape_string($term); | |||
$cond['find_part'] = "MATCH(title,body,extended) AGAINST('\"$term\"' IN BOOLEAN MODE)"; | |||
} else { | |||
$cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term')"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder, we should use db_escape_string here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe that mean you can remove if cond if you apply it anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder, we should use db_escape_string here too?
I will also fix db_escape_string for mysqli because it use deprecated mysqli alias.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @@[\+\-\*~<>\(\)"].[\S]*@
is currently wrong there, maybe a copy+paste mistake?
Anyhow... I'm afraid we regular-expression based evaluation of this might not bring us further. Because the regexp would match true on a string like: "+valid +valid +".
So I think we have two options here:
A: Introduce a tokenizer that analyzes the whole string and ALL parts in it, and performs checks on it
B: Execute the query and check if it works or not
Approach A will be quite complex and way too much effort to do this reliably. Not sure there is a parser/tokenizer for this out there, maybe part of Doctrine DBAL.
Approach B is a bit nasty but might be little effort. We could execute the query, use the "expect error" flag, and check if there is a result, something like:
$searchMode = 'non-boolean';
if (preg_match('@["\+\-\*~<>\(\)]+@', $term)) {
$termRows = serendipity_db_query("SELECT * FROM serendipity_entries WHERE $term LIMIT 1", true, 'both', false, false, false, true); // Last "true" is important here
if ($termRows !== false) {
$searchMode = 'boolean'; // Terms works as boolean expression
}
}
if ($searchMode === 'bolean') {
$cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term' IN BOOLEAN MODE)";
} else {
$cond['find_part'] = "MATCH(title,body,extended) AGAINST('$term')";
}
(untested, just a thought)
…arch anyway + change alias to mysqli_real_escape_string
I'm not sure the approach will work, but I might have misunderstood the issue. The example string failing was with a Also, the old code had a switch there between boolean and regular mode, won't that still be needed? Wasn't the regular I absolutely might miss something here :) But I'd suggest to check whether the search still fully works with this patch applied. |
Oohh. Right. I see it now - the patch uses "" quotes around the term. So it actually removes the "BOOLEAN MODE" functionality for exact matching. So we need something like: Search input A: This is a long search query Input A should match "This is a long search query" without boolean mode. (Disclaimer: I haven't checked the initial bugreport thoroughly) |
So just bring back the IF like the commit before was ? |
…not + change regex to only catch BOOLEAN Operator that prefix a word
Thanks for the further refinement - that looks promising, I couldn't test various scenarious myself though. Can you tell which examples you used where it works and where it broke before? I still feel a bit like this can still lead to cases where "BOOLEAN MODE" would be invalid, but I doubt we can hack together a more solid parser on our own... |
Ok I misunderstood the boolean operator systems. So I bring back the If statement and boolean mode or not. I think problem was the regex who match if you put a single operator like "test -". |
So now you enter boolean mode only with {operator}whatever. no more "test -" goes to boolean mode anymore it goes on "normal" mode that cause no problems. Feel free to test or comment if its not enough. |
Oh, that's a good idea! |
Thank @GuillaumeValadas. In my test this works. In the pathological case with the - at the end the logic mode was disabled, and with proper search terms it worked, also other logic mode searches worked fine. I think this is definitely an improvement. @garvinhicking Let's merge this! :) Or is there still a blocker from your side? |
@onli I trust your decission, go ahead :) |
I write a fix to escape specials char and escape them in SQL to allow research without 500 server error.
Its related to the issue #844
Hope it will help.