Add option to generate summaries of each note and index them. #43

Merged: 6 commits, merged on Jul 14, 2024


@justyns (Owner) commented Jul 13, 2024

Extension to #34

I'm not sure about this yet.

The general idea is to generate a short one-paragraph summary of each note, generate embeddings for that summary, and index it. This is in addition to the recently added per-paragraph embeddings index.
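A rough sketch of that per-note pipeline (summarize the note, embed the summary, index it), plus a cosine-similarity lookup over the resulting index. `Summarize`, `Embed`, `indexSummaries` and `searchSummaries` are hypothetical names for illustration; the plug's actual implementation may store and query the index differently.

```typescript
// Hypothetical function types: stand-ins for whatever LLM and embedding
// providers are configured. These are not silverbullet-ai's actual APIs.
type Summarize = (text: string) => Promise<string>;
type Embed = (text: string) => Promise<number[]>;

interface Note {
  page: string;
  text: string;
}

interface SummaryEntry {
  page: string;
  summary: string;
  embedding: number[];
}

// One summary, and one embedding of that summary, per note. This index is
// kept separate from the per-paragraph embeddings index.
async function indexSummaries(
  notes: Note[],
  summarize: Summarize,
  embed: Embed,
): Promise<SummaryEntry[]> {
  const entries: SummaryEntry[] = [];
  for (const note of notes) {
    const summary = await summarize(note.text);
    const embedding = await embed(summary);
    entries.push({ page: note.page, summary, embedding });
  }
  return entries;
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK notes whose summary embeddings are closest to the query embedding.
function searchSummaries(
  index: SummaryEntry[],
  queryEmbedding: number[],
  topK = 3,
): SummaryEntry[] {
  return index
    .map((entry) => ({ entry, score: cosineSimilarity(entry.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.entry);
}
```

Embedding the summary rather than the raw note yields one vector per page, which complements the per-paragraph vectors instead of replacing them.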

So far I'm testing the phi-3 model with Ollama. Some summaries are okay, but many include random hallucinations.

I'll probably add this to the AI: Search page and merge it as an experimental feature for now, but the prompt will definitely need some tweaking, and I'll need to test other local models.

Edit:

I'm merging this in with the setting off by default (like the regular embeddings). I had much better luck using gemma2 to generate summaries, though it makes indexing take quite a bit longer with a locally hosted model.
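For anyone trying this with a locally hosted model, here is a minimal sketch of requesting a summary from Ollama's `/api/generate` endpoint, assuming Ollama is running on its default port (11434) with the `gemma2` model pulled. The instruction text and the helper names `buildSummaryRequest` and `summarizeWithOllama` are hypothetical, not the plug's actual prompt or code.

```typescript
// Hypothetical instruction text, not the plug's actual prompt.
const SUMMARY_INSTRUCTIONS =
  "Summarize the following note in one short paragraph. " +
  "Only use facts stated in the note and do not add new details.\n\n";

interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

// Build the JSON body for Ollama's /api/generate endpoint.
function buildSummaryRequest(noteText: string, model = "gemma2"): OllamaGenerateRequest {
  // stream: false asks Ollama for a single JSON object instead of streamed chunks.
  return { model, prompt: SUMMARY_INSTRUCTIONS + noteText, stream: false };
}

// Send the request to a local Ollama server and return the generated summary text.
async function summarizeWithOllama(
  noteText: string,
  model = "gemma2",
  baseUrl = "http://localhost:11434",
): Promise<string> {
  const res = await fetch(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildSummaryRequest(noteText, model)),
  });
  if (!res.ok) throw new Error(`Ollama request failed with status ${res.status}`);
  // With stream: false, the generated text is in the "response" field.
  const data = (await res.json()) as { response: string };
  return data.response.trim();
}
```

Swapping models is then just a different `model` argument (for example `"phi3"`), which makes side-by-side comparisons cheap.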

I also added some in-memory caching so that repeatedly editing a page doesn't regenerate the same summary over and over.
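One way such a cache could work, as a sketch: memoize summaries by the exact note text, so re-indexing content that hasn't changed reuses the stored result instead of calling the model again. `SummaryCache` is a hypothetical name, and keying on the full text is an assumption; the actual cache may key on the page name or a hash.

```typescript
// Hypothetical in-memory cache: memoizes summaries by exact note text.
class SummaryCache {
  private readonly cache = new Map<string, string>();
  private readonly summarize: (text: string) => Promise<string>;

  constructor(summarize: (text: string) => Promise<string>) {
    this.summarize = summarize;
  }

  // Return the cached summary for this exact text, or generate and store one.
  async getSummary(text: string): Promise<string> {
    const cached = this.cache.get(text);
    if (cached !== undefined) return cached;
    const summary = await this.summarize(text);
    this.cache.set(text, summary);
    return summary;
  }
}
```

Nothing is evicted in this sketch, so memory grows with every distinct version of a note; a longer-lived cache would likely cap its size or drop stale entries per page.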

@justyns justyns marked this pull request as ready for review July 14, 2024 09:07
@justyns justyns merged commit 336632e into main Jul 14, 2024
3 checks passed
@justyns justyns deleted the note-excerpts branch July 14, 2024 09:09
@zefhemel (Contributor)

I'd be interested in your experience with various local models. I haven't had time to try this myself yet, but I had some hopes for phi-3 because it seemed quite small yet well performing. Perhaps at some point you could document this on the plug's website as well? It would be good to give people some guidance if you have it.

@justyns (Owner, Author) commented Jul 15, 2024

For sure! I don't have a ton of experience with local models either, but I'm hoping others will eventually chime in too.

I was hoping to use phi-3 too, but it sometimes generated weird output.

As an example, here's a note I was testing with:

Lunar is a cat. He’s only 2 years old, and enjoys walking with his leash.
Lunar is a fluffy kitty, and we love him very much.
He recently got a new collar that he is really proud of. It has fish on it.

gemma2 generates:

Lunar is a two-year-old, fluffy cat who enjoys walking on a leash and has a new collar with fish on it.

whereas phi-3 returns this:

Lunar is a well-loved, fluffy two-year-donkey kitty who enjoys leash walking adventures with his human companions. Known for their bond and shared activities like strolling outside safely restrained by a harness rather than traditional pet collars, these feline friends bring joy to those they touch in the small community where Lunar resides.

Note the "two-year-donkey" and the extra details that aren't in the original note. Some of my other test notes got random additions like that too. The summaries are mostly passable, and I suspect they could be better with improved prompting that includes some examples, but gemma2 was nicer out of the box.
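A sketch of what prompting with examples (few-shot prompting) could look like: a fixed instruction, one or more worked note/summary pairs, then the note to summarize, ending with an open "Summary:" for the model to complete. `buildFewShotPrompt` and the example pair are hypothetical, and whether this actually curbs phi-3's additions is untested.

```typescript
interface SummaryExample {
  note: string;
  summary: string;
}

// Build a few-shot prompt: instruction, worked examples, then the target note.
function buildFewShotPrompt(examples: SummaryExample[], note: string): string {
  const parts: string[] = [
    "Summarize the note in one short paragraph. Use only facts stated in the note.",
  ];
  for (const ex of examples) {
    parts.push(`Note:\n${ex.note}\nSummary:\n${ex.summary}`);
  }
  // End with an open "Summary:" so the model continues with just the summary.
  parts.push(`Note:\n${note}\nSummary:\n`);
  return parts.join("\n\n");
}

// Hypothetical example pair; a real prompt would likely use several varied examples.
const examples: SummaryExample[] = [
  {
    note: "Buy milk and eggs on Saturday. Check if the bakery is open.",
    summary: "A reminder to buy milk and eggs on Saturday and to check whether the bakery is open.",
  },
];
```

Examples whose summaries stick strictly to the note's facts show the model, by demonstration, that adding details is not wanted.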

@justyns (Owner, Author) commented Jul 15, 2024

I haven't figured out the best way to do it yet, but I want to set up some sort of benchmark for local models, specific to silverbullet-ai or at least to notes in general.

Nothing too complex, just something easy enough that I can plug in a bunch of models and compare their results for different commands against each other.
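A minimal sketch of such a harness, assuming a pluggable `generate(model, prompt)` function (for example, a wrapper around a local Ollama server): run every case through every model, record the output and latency, then group results by command so outputs from different models sit side by side. `runBenchmark`, `groupByCommand` and the result shape are all hypothetical.

```typescript
// Hypothetical generator signature; any LLM backend can sit behind it.
type Generate = (model: string, prompt: string) => Promise<string>;

interface BenchmarkCase {
  command: string; // e.g. "summarize"
  prompt: string;
}

interface BenchmarkResult {
  model: string;
  command: string;
  output: string;
  ms: number; // wall-clock latency of the call
}

// Run every case against every model, sequentially, recording output and timing.
async function runBenchmark(
  models: string[],
  cases: BenchmarkCase[],
  generate: Generate,
): Promise<BenchmarkResult[]> {
  const results: BenchmarkResult[] = [];
  for (const model of models) {
    for (const c of cases) {
      const start = Date.now();
      const output = await generate(model, c.prompt);
      results.push({ model, command: c.command, output, ms: Date.now() - start });
    }
  }
  return results;
}

// Group results by command for side-by-side comparison across models.
function groupByCommand(results: BenchmarkResult[]): Record<string, BenchmarkResult[]> {
  const grouped: Record<string, BenchmarkResult[]> = {};
  for (const r of results) {
    if (!grouped[r.command]) grouped[r.command] = [];
    grouped[r.command].push(r);
  }
  return grouped;
}
```

Calls run sequentially on purpose, since a single local machine usually can't serve several models in parallel without skewing the latency numbers.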
