Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch/stored fields #333

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Patch/stored fields #333

wants to merge 5 commits into from

Conversation

BeckySharp
Copy link
Contributor

By mixing the metadata stored fields (e.g., filename) and the sentence stored fields (e.g., raw, etc) there was an issue when the Mention tried to populate itself to the VerboseLevels.all level. Basically, the mention would try to populate the metadata stored fields, but the lucene doc for the sentence didn't have that info (bc it was in a diff parent-level document since it's metadata). This resulted in an exception.

This patch fixes that by separating the two and using them in the right places.

@BeckySharp
Copy link
Contributor Author

let me fix the failing tests and reopen

@BeckySharp BeckySharp closed this Aug 23, 2021
added helper util to make it easier and to try to centralize where this happens
@BeckySharp BeckySharp reopened this Aug 23, 2021
@BeckySharp
Copy link
Contributor Author

fixed the tests, I had failed to update usage of sentenceStoredFields in the tests

@codecov
Copy link

codecov bot commented Aug 23, 2021

Codecov Report

Merging #333 (e75786a) into master (f399036) will increase coverage by 0.02%.
The diff coverage is 94.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #333      +/-   ##
==========================================
+ Coverage   75.81%   75.84%   +0.02%     
==========================================
  Files          94       94              
  Lines        4337     4342       +5     
  Branches      320      324       +4     
==========================================
+ Hits         3288     3293       +5     
  Misses       1049     1049              
Impacted Files Coverage Δ
...ain/scala/ai/lum/odinson/utils/IndexSettings.scala 92.30% <87.50%> (+2.30%) ⬆️
...e/src/main/scala/ai/lum/odinson/DataGatherer.scala 68.42% <100.00%> (ø)
...main/scala/ai/lum/odinson/OdinsonIndexWriter.scala 91.89% <100.00%> (+0.05%) ⬆️
...a/ai/lum/odinson/utils/TestUtils/OdinsonTest.scala 99.37% <100.00%> (+<0.01%) ⬆️
...in/scala/ai/lum/odinson/extra/IndexDocuments.scala 82.35% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f399036...e75786a. Read the comment docs.

@myedibleenso
Copy link
Member

It seems like this change will further complicate the config and quietly break old configs by breaking stored fields into two categories. Can we not simply ignore stored fields that are missing from the sentence when making mentions?

@BeckySharp
Copy link
Contributor Author

It seems like this change will further complicate the config and quietly break old configs by breaking stored fields into two categories.

I def agree with this point

Can we not simply ignore stored fields that are missing from the sentence when making mentions?

It's crashing where we're getting the fields from the index -- which we did that way as an optimization:
https://github.com/lum-ai/odinson/blob/master/core/src/main/scala/ai/lum/odinson/lucene/analysis/TokenStreamUtils.scala#L20

basically, instead of asking each time what fields a doc has, or else getting all the fields, it gets a field that "should" be there. We can do this diff, but we'd change something here, not in the Mention

@marcovzla
Copy link
Contributor

I think the changes to the config provide extra flexibility and may be a good idea to keep them.
But it is true that these changes may caught some people off guard.
Maybe just incrementing the version and being explicit about this in the docs would be enough?

@myedibleenso myedibleenso force-pushed the master branch 6 times, most recently from 611d79b to 1b8a2ca Compare July 5, 2023 20:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants