False positives: "can use", "via ssh" #28

quackduck · 2022-04-17T21:29:39Z

Of course, I could add these to the false positives list, but maybe there's a better, more general way to tackle these.

TwiN · 2022-04-17T23:32:59Z

Yeah, adding canuse and viassh to the default list of false positives is probably going to be the easiest way to tackle this.

quackduck · 2022-04-21T04:46:07Z

True. My issue was more about whether there could be a way to detect these innocent, legitimate two-word messages.

TwiN · 2022-04-22T23:47:50Z

Yeah, there isn't really one besides using the false positives list.

You could create a PR to add them to the default false positives if you'd like:

go-away/falsepositives.go (line 4 in b5570db):

    var DefaultFalsePositives = []string{

finnbear · 2022-05-10T06:59:45Z

> Yeah there isn't really one besides using the false positives list.

It would take some work on your end, but you could process my comprehensive false positives list in a code generator, as follows:

1. Read file line by line
2. Feed each line into goaway
3. If it detects something, add it as a false positive (or tell me if something bad ended up in the list 😉)

If you're wondering, I generated it using a dictionary search of words and pairs of words, combined with my own additions.

The downside is that my filter operates a bit differently (has some interesting heuristics), and doesn't require certain false positives to be explicitly included in its list. In these cases, you would still need to maintain your own false positive list and/or replicate the dictionary search.

quackduck · 2022-05-10T07:10:17Z

Thanks for commenting! @TwiN this could also be a good place to use go:embed (then decode on init() possibly)

(I’m curious: how did you find this thread @finnbear?)

finnbear · 2022-05-10T07:21:48Z

> this could also be a good place to use go:embed (then decode on init() possibly)

True! The downside here is that you would be including the entire list, when only a subset is relevant to goaway. A build step/code generator is more work, but could avoid wasting space in the compiled binary by filtering in advance.

> (I’m curious: how did you find this thread @finnbear?)

I check in on this repository every once in a while, as it was and is a great source of inspiration for my profanity filters 😃

quackduck · 2022-05-10T13:32:35Z

> True! The downside here is that you would be including the entire list, when only a subset is relevant to goaway.

We could trim the file once as needed

TwiN added the bug label on Apr 17, 2022
