
A beginner's question: I just want the vectors of some sentences to be closer to certain other sentences and farther from others. Is it enough to organize the training data with query, pos, and neg fields and fine-tune? #1224

Open
czhcc opened this issue Nov 14, 2024 · 2 comments

Comments

czhcc commented Nov 14, 2024

A beginner's question: I just want the vectors of some sentences to be closer to certain other sentences and farther from others. Is it enough to organize the training data with query, pos, and neg fields and fine-tune?

The training data doesn't also need pos_scores, neg_scores, prompt, and type, right?

For the fine-tuning command, should I follow
https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune/embedder#2-bge-m3
?

xushan116 commented

Same question here.

hanhainebula (Collaborator) commented Nov 16, 2024

Hi @czhcc. pos_scores and neg_scores are only used when knowledge_distillation is enabled for knowledge distillation. prompt is the instruction added on the query side during training. type is used to distinguish task types when the training data mixes data from different tasks.

For the question asked here, all of these fields are optional and can be omitted. If you just want to fine-tune a model like bge-large-zh-v1.5, you can follow the fine-tuning commands at https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune/embedder#1-standard-model; if you want to fine-tune bge-m3, refer to the commands here: https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune/embedder#2-bge-m3
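To make the data format discussed above concrete, here is a minimal sketch of building such a training file. It assumes the JSON Lines layout shown in the FlagEmbedding fine-tuning examples, where each line carries a `query` string plus `pos` and `neg` lists of passages, and the `pos_scores`/`neg_scores`/`prompt`/`type` fields are simply left out; the example sentences are invented for illustration, and the exact schema should be checked against the linked examples.

```python
import json

# Each training sample pairs one query with passages it should be pulled
# toward ("pos") and passages it should be pushed away from ("neg").
# Optional fields (pos_scores, neg_scores, prompt, type) are omitted,
# since they are only needed for distillation or multi-task setups.
samples = [
    {
        "query": "How do I reset my password?",
        "pos": ["Open Settings and click 'Reset password' to get a reset email."],
        "neg": ["Our office is open Monday through Friday, 9am to 5pm."],
    },
]

# Write one JSON object per line (JSON Lines format).
with open("train.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")

# Sanity-check: every line parses and has the required keys and types.
with open("train.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        assert {"query", "pos", "neg"} <= record.keys()
        assert isinstance(record["pos"], list) and isinstance(record["neg"], list)
```

A file like this would then be passed to the fine-tuning command via its training-data argument, as shown in the linked examples.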
