Provide starts_with endpoint/database method #348

Open
jsstevenson opened this issue Jun 5, 2024 · 2 comments
Labels
enhancement New feature or request priority:medium Medium priority

Comments

@jsstevenson
Member

@katiestahl has been working on setting up autocomplete for VarCat for all of the normalizers. Broadly, we'd like something where the input is whatever the user has typed so far, and the output is a list of objects with

  1. the completed term
  2. what kind of entity the term is (e.g. symbol or alias)
  3. the concept ID for the normalized concept that the completed term maps to
  4. (optionally) some sort of human-readable name for the normalized concept, e.g. the gene symbol
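
As a rough sketch of the four fields listed above, one completion object could look like the following. The class name `Completion`, its field names, and the example values are hypothetical placeholders for illustration, not an existing normalizer schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Completion:
    """One autocomplete suggestion (hypothetical shape, not an existing class)."""

    term: str  # 1. the completed term
    item_type: str  # 2. kind of entity, e.g. "symbol" or "alias"
    concept_id: str  # 3. ID of the normalized concept the term maps to
    label: Optional[str] = None  # 4. optional human-readable name


# Placeholder values for illustration only; not a real concept ID.
example = Completion(
    term="braf", item_type="symbol", concept_id="example:0001", label="BRAF"
)
print(asdict(example))
```

An endpoint would then return a list of these, serialized however the API layer prefers.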

We could set this up pretty easily for the PostgreSQL backend with something like an ILIKE 'TERM%' condition (trailing wildcard, so it matches terms that start with the input), but we aren't running any PostgreSQL in production, so that doesn't help our immediate problem.
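
A minimal sketch of what the PostgreSQL version might look like, assuming a hypothetical `completion_terms` table with `label`, `item_type`, and `concept_id` columns (none of these names come from the existing schema). The helper only builds the parameterized query; it escapes `%` and `_` in the user's input so they are matched literally, relying on backslash being the default LIKE escape character.

```python
from typing import Tuple


def build_prefix_query(term: str, limit: int = 20) -> Tuple[str, Tuple[str, int]]:
    """Return (sql, params) for a case-insensitive starts-with lookup."""
    # Escape the escape character first, then the LIKE wildcards.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    sql = (
        "SELECT label, item_type, concept_id "
        "FROM completion_terms "  # hypothetical table name
        "WHERE label ILIKE %s "
        "ORDER BY label "
        "LIMIT %s"
    )
    # Trailing wildcard only: matches labels that start with the input.
    return sql, (escaped + "%", limit)


sql, params = build_prefix_query("braf")
print(sql)
print(params)
```

With a psycopg-style cursor this would be run as `cursor.execute(sql, params)`.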

For DynamoDB, it's a little more complicated, and involves some combination of indexes, superfluous columns, or a reworked schema. The best that I've come up with so far is a Global Secondary Index where the hash key is the "item_type" column and the sort key is the "label_and_type" column. This lets you run queries like diseases.query(IndexName="CompletionIndex", KeyConditionExpression=Key("item_type").eq("alias") & Key("label_and_type").begins_with("braf")).
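
To make the index idea concrete, here is a sketch that builds the keyword arguments for the low-level DynamoDB client calls (`update_table` to add the index, `query` to use it). The helper names are made up, and the sketch assumes both key columns are strings, that stored labels are lowercased, and that the table uses on-demand billing (a provisioned table would also need `ProvisionedThroughput` in the `Create` block). Projecting `ALL` attributes is the simplest choice here; a narrower projection would cost less storage.

```python
from typing import Any, Dict


def completion_index_update(table_name: str) -> Dict[str, Any]:
    """Kwargs for client.update_table(...) that add the completion index."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "item_type", "AttributeType": "S"},
            {"AttributeName": "label_and_type", "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": "CompletionIndex",
                    "KeySchema": [
                        {"AttributeName": "item_type", "KeyType": "HASH"},
                        {"AttributeName": "label_and_type", "KeyType": "RANGE"},
                    ],
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }


def completion_query(
    table_name: str, item_type: str, prefix: str, limit: int = 20
) -> Dict[str, Any]:
    """Kwargs for client.query(...) doing a starts-with lookup on the index."""
    return {
        "TableName": table_name,
        "IndexName": "CompletionIndex",
        "KeyConditionExpression": (
            "item_type = :t AND begins_with(label_and_type, :p)"
        ),
        "ExpressionAttributeValues": {
            ":t": {"S": item_type},
            # Assumption: labels are stored lowercased.
            ":p": {"S": prefix.lower()},
        },
        "Limit": limit,
    }


# Usage (requires boto3 and AWS credentials, so left as comments):
#   client = boto3.client("dynamodb")
#   client.update_table(**completion_index_update("diseases"))
#   response = client.query(**completion_query("diseases", "alias", "braf"))
print(completion_query("diseases", "alias", "BRAF"))
```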

^^ Note that this forces you to commit to a specific item type in each query. If you wanted to get completions for ALL item types in one query, you'd need to create another index where the hash key is some sort of dummy column with the same value every time, e.g. diseases.query(IndexName="ItemNeutralCompletionIndex", KeyConditionExpression=Key("dummy").eq("dummy") & Key("label_and_type").begins_with("braf")).
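
One possible alternative to the dummy-key index, sketched here under assumptions rather than proposed as the design: run one CompletionIndex query per item type and merge the results client-side. That costs one request per item type, but avoids a second index whose constant hash key may concentrate all traffic on a single partition. `complete_all_types` and the `query_fn` callable are hypothetical; `query_fn(item_type, prefix)` is assumed to wrap the per-type query and return a list of item dicts.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical signature: (item_type, prefix) -> list of matching items
QueryFn = Callable[[str, str], List[Dict[str, Any]]]


def complete_all_types(
    prefix: str,
    item_types: Iterable[str],
    query_fn: QueryFn,
    limit: int = 20,
) -> List[Dict[str, Any]]:
    """Run one prefix query per item type and merge the results."""
    merged: List[Dict[str, Any]] = []
    for item_type in item_types:
        merged.extend(query_fn(item_type, prefix))
    # Sort by the sort-key column so output order is stable across types.
    merged.sort(key=lambda item: item["label_and_type"])
    return merged[:limit]


def fake_query(item_type: str, prefix: str) -> List[Dict[str, Any]]:
    """In-memory stand-in for the DynamoDB call, for illustration only."""
    data = {
        "symbol": [{"label_and_type": "braf", "item_type": "symbol"}],
        "alias": [{"label_and_type": "braf1", "item_type": "alias"}],
    }
    return [
        item
        for item in data.get(item_type, [])
        if item["label_and_type"].startswith(prefix)
    ]


results = complete_all_types("braf", ["alias", "symbol"], fake_query)
print(results)
```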

@jsstevenson jsstevenson added enhancement New feature or request priority:medium Medium priority labels Jun 5, 2024
@katiestahl

If there's any way to make this more like a "contains" match, that would be ideal, since in most use cases the text the user is searching for could be in the middle of the term.

@jsstevenson
Member Author

jsstevenson commented Jun 7, 2024

Yeah, if you need contains or fuzzy (fzf-style) matching, you need an indexer service like Elasticsearch (or a different DB backend). In that event, it might not be necessary or possible to implement this within the normalizer code bases themselves. Maybe it could live in the API infrastructure repos or in a standalone repo.

In light of that, maybe this should be closed.
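
For anyone wondering why starts-with is cheap on a sorted key but contains is not, here is a small in-memory illustration (plain Python, not DynamoDB; the function names and term list are made up). A prefix lookup can binary-search to the first candidate and stop at the first non-match, while a contains lookup has to inspect every term, which is the work a dedicated indexer would take on.

```python
from bisect import bisect_left
from typing import List


def prefix_matches(sorted_terms: List[str], prefix: str) -> List[str]:
    """Jump to the first candidate, then read until the prefix stops matching."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order guarantees no later term can match
        matches.append(term)
    return matches


def contains_matches(terms: List[str], fragment: str) -> List[str]:
    """No ordering helps here: every term has to be checked."""
    return [term for term in terms if fragment in term]


terms = sorted(["braf", "braf1", "b-raf", "craf", "raf1"])
print(prefix_matches(terms, "bra"))
print(contains_matches(terms, "raf"))
```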

2 participants