Fuzzy searching? #155

Open
thomthom opened this issue Jun 8, 2020 · 3 comments

Comments

@thomthom

thomthom commented Jun 8, 2020

I was wondering if there was any interest in allowing fuzzy search?

Some of the background for this is that I maintain a C API where there are a lot of prefixes to keep things unique (product initials + "class" name + function name).

Right now it appears that the search only matches from the start of each object being searched.

For example:
image

Compare to if I just type "editing":
image

I'm very fond of how Sublime Text (and many other editors, such as VS Code) let you search. In the example above, if I typed "tee", it would give extra weight to the uppercase letters in the symbols, so that "TextEditingEvents" would rank high.
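To make the "tee" example concrete, here is a minimal sketch of that style of scoring. It is not the fts_fuzzy_match algorithm and not anything m.css does; the function names and the bonus values (10 for a word start, 5 for consecutive matches) are assumptions chosen only for illustration.

```python
from functools import lru_cache


def is_word_start(text, i):
    """True if text[i] starts a word: first char, after a separator, or a camelCase hump."""
    if i == 0:
        return True
    prev, cur = text[i - 1], text[i]
    return prev in "_:. " or (cur.isupper() and not prev.isupper())


def fuzzy_score(pattern, candidate):
    """Best score for matching pattern as a case-insensitive subsequence, or None."""
    p, c = pattern.lower(), candidate.lower()

    @lru_cache(maxsize=None)
    def best(pi, ci, prev_matched):
        if pi == len(p):
            return 0  # whole pattern consumed
        if ci == len(c):
            return None  # ran out of candidate characters
        # Option 1: skip this candidate character.
        result = best(pi, ci + 1, False)
        # Option 2: match it, if the characters agree.
        if c[ci] == p[pi]:
            rest = best(pi + 1, ci + 1, True)
            if rest is not None:
                gain = 1
                gain += 10 if is_word_start(candidate, ci) else 0  # assumed bonus
                gain += 5 if prev_matched else 0  # assumed bonus
                if result is None or gain + rest > result:
                    result = gain + rest
        return result

    return best(0, 0, False)


names = ["steel", "TextEditingEvents", "Template"]
ranked = sorted(names, key=lambda n: fuzzy_score("tee", n) or 0, reverse=True)
print(ranked[0])
```

With these weights, "tee" prefers the T, E, E word starts of "TextEditingEvents" over the consecutive "tee" buried inside "steel".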

In the past I have experimented with fts_fuzzy_match (https://github.com/forrestthewoods/lib_fts/) for this kind of functionality in some projects I'm working on. It's been working rather well.

More details on the logic here:
https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/

Any interest in this?

@mosra
Owner

mosra commented Jun 8, 2020

See here for a WIP implementation of a similar thing by @sizmailov: #149

I wanted to implement something like this (in particular the {tee, texee, texede} -> TextEditingEvent variant) when doing the original search implementation but I put it aside because it wasn't strictly needed for the MVP. The search is implemented as a trie, so I'm not sure if the libs you linked would be of any use here, but I think I could still dig up the original implementation somewhere and finish it -- if my time allows, I guess you get the idea based on the frequency I reply on the issues here 😅
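As a rough illustration of why a trie naturally matches only from the start of a name, here is a minimal sketch. It is not the m.css search data format (which is a serialized binary structure); the `Trie` class and its methods are hypothetical and exist only to show the lookup behavior.

```python
class Trie:
    """Toy in-memory trie; keys are lowercased, results are stored at the key's end node."""

    def __init__(self):
        self.children = {}
        self.results = []

    def insert(self, key, result):
        node = self
        for ch in key.lower():
            node = node.children.setdefault(ch, Trie())
        node.results.append(result)

    def search(self, prefix):
        """Return all results whose key starts with prefix."""
        node = self
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect everything stored in the subtree below the prefix.
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            found.extend(current.results)
            stack.extend(current.children.values())
        return sorted(set(found))


trie = Trie()
for name in ["TextEditingEvents", "TextElement"]:
    trie.insert(name, name)

print(trie.search("text"))     # both names share this prefix
print(trie.search("editing"))  # nothing: "editing" is not at the start of any key

# Registering an extra key that starts at a later word makes it reachable.
trie.insert("EditingEvents", "TextEditingEvents")
print(trie.search("editing"))
```

The last step hints at one way to get mid-name matches out of a prefix-only structure: add more keys that point at the same result, at the cost of a larger index.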

I maintain a C API where there are a lot of prefixes to keep things unique (product initials + "class" name + function name)

With the change I did for #127, I finally have my hands free to add some config option allowing this (a similar case is wanting to search without get_ / set_ prefixes).
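One way such an option could behave is sketched below, assuming a user-supplied list of prefixes to strip. No such m.css option is confirmed by this thread; `search_keys`, the prefix list, and the `XY` product prefix are all hypothetical.

```python
def search_keys(name, prefixes):
    """Return the full name plus every variant with leading prefixes stripped."""
    keys = [name]
    rest = name
    stripped = True
    while stripped:
        stripped = False
        for prefix in prefixes:
            # Never strip the whole name away.
            if rest.startswith(prefix) and len(rest) > len(prefix):
                rest = rest[len(prefix):]
                keys.append(rest)
                stripped = True
                break
    return keys


# Hypothetical prefixes: a product prefix plus getter/setter prefixes.
PREFIXES = ["XY", "get_", "set_"]

print(search_keys("XYWidgetGetName", PREFIXES))
print(search_keys("get_name", PREFIXES))
```

Each returned key would then be registered in the search index and point at the same symbol, so both the full name and the stripped name find it.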

@thomthom
Author

thomthom commented Jun 9, 2020

See here for a WIP implementation of a similar thing by @sizmailov: #149

Oh, that's interesting. I'm subscribing to that thread. Even though that's not complete fuzzy search, extracting words by camelCase or under_score separation still helps.
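For reference, splitting an identifier into words at camelCase and under_score boundaries can be sketched as follows. This is not the code from #149; the regular expression and both function names are assumptions for illustration.

```python
import re

# An uppercase run not followed by lowercase (acronym), a capitalized or
# lowercase word, or a run of digits. Underscores are simply never matched.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_words(identifier):
    """Split an identifier into its camelCase / under_score words."""
    return _WORD.findall(identifier)


def suffix_keys(identifier):
    """Keys starting at each word boundary (separators are dropped in the joined keys)."""
    words = split_words(identifier)
    return ["".join(words[i:]) for i in range(len(words))]


print(split_words("TextEditingEvents"))
print(split_words("under_score_name"))
print(suffix_keys("TextEditingEvents"))
```

Feeding the suffix keys into a prefix-only search would let a query such as "editing" reach "TextEditingEvents" without any true fuzzy matching.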

The search is implemented as a trie, so I'm not sure if the libs you linked would be of any use here

I did give it a quick try a few weeks ago. But I got stumped trying to understand the structure of the search data. The article you mentioned explains a lot!

Having good debug visualization is key to understanding the data.

Indeed! Does m.css come with the tools to debug the search data?

if my time allows, I guess you get the idea based on the frequency I reply on the issues here

No worries, I fully understand the challenges of maintaining a project like this. I'm not expecting you to do anything. I started this thread because I had tried to put together a PR for this functionality myself, but unfortunately I got rather lost in the data structure and in how the data was obtained.

With the change I did for #127, I finally have my hands free to add some config option allowing this (a similar case is wanting to search without get_ / set_ prefixes).

Yes, we have a number of Get/Set prefixed functions (well, prefixed before the function name, but after the product and class prefixes). One thing I wanted to look into was sorting that still kept Get/Set functions next to each other without having to resort to documentation markup. I might revisit that later on.

@mosra
Owner

mosra commented Jun 9, 2020

Does m.css come with the tools to debug the search data?

Yes, documentation/test_doxygen/test_search.py can be used to visualize the search data the same way as in the article, including colors. I hope it still works properly; I haven't touched the visualization code in over two years :)

One thing I wanted to look into was sorting that still kept Get/Set functions next to each other

Unless I misremember how the lookup and result population behaves, that'll happen automagically when the get/set prefixes get stripped. Or .. you mean in the doxygen-generated output? Disable the SORT_MEMBER_DOCS option, that one is enabled by default and absolutely useless in my opinion -- with it disabled, you'll get the functions ordered the same way as in the file (and there I assume you keep getter and setter pairs together).
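For the Doxygen-output case, the setting mentioned above goes in the project's Doxyfile. This fragment shows only that one option:

```
# Keep detailed member documentation in declaration order instead of sorting alphabetically
SORT_MEMBER_DOCS = NO
```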
