Add support for vector store features #5

Open
amotl opened this issue Dec 11, 2023 · 3 comments


amotl commented Dec 11, 2023

@pnadolny13 was kind enough to suggest, at MeltanoLabs/target-pinecone#20, how vector store support could reasonably be added to this target. Thank you.

Maybe you add a config like a type_override where the user can configure columns matching a particular name i.e. embeddings to use the vector data type instead of whatever the target would have normally used as a data type (e.g. jsonb when using PostgreSQL).
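
To make the suggestion concrete, here is a minimal sketch of how a name-based override lookup could work. Everything in it is an assumption for illustration: `type_override` is not an existing setting of this target, and `resolve_column_type`, the default type mapping, and the `VECTOR(3)` placeholder are hypothetical.

```python
# Hypothetical sketch: `type_override` and `resolve_column_type` are
# illustrative names, not an existing setting or API of this target.
from fnmatch import fnmatchcase

# Assumed default mapping from JSON Schema types to SQL types.
DEFAULT_TYPES = {
    "string": "TEXT",
    "integer": "BIGINT",
    "number": "DOUBLE PRECISION",
    "array": "JSONB",
    "object": "JSONB",
}


def resolve_column_type(column_name, json_schema_type, type_override):
    """Return the overridden SQL type if a name pattern matches, else the default."""
    for pattern, sql_type in type_override.items():
        # Case-sensitive glob match, so "embeddings" and "*_vector" both work.
        if fnmatchcase(column_name, pattern):
            return sql_type
    return DEFAULT_TYPES[json_schema_type]


# User-supplied configuration: columns named "embeddings" become vectors.
config = {"type_override": {"embeddings": "VECTOR(3)"}}

overridden = resolve_column_type("embeddings", "array", config["type_override"])
fallback = resolve_column_type("tags", "array", config["type_override"])
```

Columns that match no pattern keep whatever type the target would have picked anyway, so the override stays opt-in.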


amotl commented Dec 12, 2023

Maybe you add a config like a type_override where the user can configure columns matching a particular name [...]

I like that idea very much. Has it already been discussed within the Singer/Meltano communities? It feels like a generic feature that the Meltano layer could provide, in a similar spirit to how it adds the SelectService on top of the baseline Singer implementation through the select: attribute in the Meltano project definition (see footnote 1).

Thinking about it that way, could a generic type_override service provide type hints to the element it is attached to (in this case, a database target), which the target would then take into account when building the corresponding DDL statement?
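
As a rough illustration of that question, the sketch below shows type hints being consulted while a CREATE TABLE statement is assembled. It is only a sketch under stated assumptions: `build_create_table`, the `type_hints` argument, and the type names are hypothetical, and a real target would more likely build its DDL through SQLAlchemy than through string formatting.

```python
# Hypothetical sketch: `build_create_table` and `type_hints` are illustrative
# names; the SQL type names are placeholders, not a specific database dialect.

JSON_TO_SQL = {
    "string": "TEXT",
    "integer": "BIGINT",
    "number": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
    "array": "JSONB",
    "object": "JSONB",
}


def build_create_table(table, properties, type_hints=None):
    """Render a CREATE TABLE statement, letting type hints win over defaults."""
    type_hints = type_hints or {}
    columns = []
    for name, prop in properties.items():
        # A hint for this column replaces the type derived from the JSON Schema.
        sql_type = type_hints.get(name) or JSON_TO_SQL[prop["type"]]
        columns.append(f'"{name}" {sql_type}')
    return f'CREATE TABLE "{table}" ({", ".join(columns)})'


ddl = build_create_table(
    "documents",
    {"id": {"type": "integer"}, "embeddings": {"type": "array"}},
    type_hints={"embeddings": "VECTOR(3)"},
)
print(ddl)
```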

Most probably I am overthinking this, or alternatively, there are already other mechanisms within Singer/Meltano for corresponding "dynamic type manipulations through declarative rules"? If you feel you could support us on this matter, please educate us by sharing corresponding documentation links we may have missed. On the other hand, if our proposal resonates with you, and would be worth exploring, please also let us know.

Footnotes

  1. We struggled a bit to discover the corresponding mechanisms around "stream/table selection" as a newcomer/developer, so we summarized our experience at https://github.com/crate-workbench/meltano-tap-cratedb/issues/2.


amotl commented Dec 12, 2023

Oh, I am only now discovering that you already picked up that topic at MeltanoLabs/target-pinecone#20 (comment). Apologies for that, I am currently not receiving GitHub notifications via email.

@pnadolny13 said:

I'm curious if you have other ideas of how to let the target know that a property in the stream should be written as a vector column. I wonder if sending additional metadata with the json schema would be a possible implementation too.

@edgarrmondragon said (paraphrasing):

We probably need to enrich the SQL type metadata support within the SDK, so I've created meltano/sdk#2102.

@edgarrmondragon also said:

I would like to declare not only the type but also arbitrary parameters for it, like length, dimension, etc.:

# catalog metadata in Meltano syntax
schema:
  my_stream:
    my_vector:
      sqlalchemy_type:
        (): "pgvector.sqlalchemy.Vector"
        dim: 3

I can't express how much that resonates with me. Offering my support would be audacious, as I would first need to level up my knowledge of Singer schema/metadata details. Still, please let me know if you need help somewhere, even if just for testing and such.
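
The `()` key in the quoted snippet resembles the factory convention used by Python's `logging.config.dictConfig`, where `()` names a callable and the remaining keys become its keyword arguments. A minimal sketch of how such an entry could be resolved follows. The `instantiate` helper is hypothetical and not part of the Meltano SDK, and the standard-library `datetime.timedelta` only stands in for a SQLAlchemy type so that the example runs without extra dependencies.

```python
# Hypothetical sketch: `instantiate` is an illustrative helper, not an SDK API.
import importlib


def instantiate(spec):
    """Build an object from a mapping whose "()" key holds a dotted factory path."""
    params = dict(spec)  # copy, so the caller's mapping is not mutated
    dotted_path = params.pop("()")
    module_name, _, attr = dotted_path.rpartition(".")
    factory = getattr(importlib.import_module(module_name), attr)
    # All remaining keys are passed to the factory as keyword arguments.
    return factory(**params)


# Standard-library stand-in for a type declaration with one parameter.
obj = instantiate({"()": "datetime.timedelta", "days": 3})

# Assuming pgvector is installed, the quoted example would correspond to:
# instantiate({"()": "pgvector.sqlalchemy.Vector", "dim": 3})
```

Resolving the path at load time keeps the configuration declarative, at the cost of importing whatever module the user names, so a real implementation would likely want to restrict or validate the allowed paths.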


amotl commented Dec 21, 2023

GH-14 adds an improved variant of the FloatVector SQLAlchemy type implementation for CrateDB.
