Fine-tuning GeneralRecognitionV2_PPLCNetV2_base.yaml with custom data #3297

Open
chen1234520 opened this issue Nov 14, 2024 · 1 comment

chen1234520 commented Nov 14, 2024

I am fine-tuning the feature-extraction model from GeneralRecognitionV2_PPLCNetV2_base.yaml on my own data. The config contains two parameters:
relabel: True
sampler:
id_list: [50030, 80700, 92019, 96015] # be careful when set relabel=True
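My custom dataset has 17,000 classes. What do these two parameters mean, and do I need to change them to match the number of classes in my dataset? Also, where can I find reference documentation? https://paddleclas.readthedocs.io/zh-cn/latest/ does not list all of the parameters.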

@TingquanGao TingquanGao self-assigned this Nov 14, 2024
@TingquanGao
Collaborator

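I recommend using PaddleX for model development first; it greatly simplifies the training parameter configuration (see the related documentation). If you stay with PaddleClas, see its related documentation. For your use case, I suggest a larger model: GeneralRecognitionV2_CLIP_vit_base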

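About the two parameters:

  1. relabel: renumbers the class ids so that they start from 0. It is meant for cases where the ids do not start at 0 or are not contiguous. Setting it to False is usually fine; True is recommended only when the first id is much larger than 0 or the ids are severely non-contiguous.
  2. id_list: used only by PKSampler. It adjusts the data to address class imbalance: samples of the classes between id_list[0] and id_list[1] are resampled ratio[0] times.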
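To make the two descriptions above concrete, here is a minimal standalone sketch in plain Python. It is not PaddleClas code: the helper names `relabel` and `oversample_indices` are hypothetical, and it assumes that `id_list` holds inclusive (start, end) pairs with one integer `ratio` entry per pair, which generalizes the "id_list[0] to id_list[1], ratio[0] times" description. It also assumes the ranges are matched against the ids after relabeling, which may be why the config comment asks for care when `relabel=True`.

```python
from collections import Counter


def relabel(labels):
    """Map arbitrary class ids onto 0..N-1, keeping their sorted order."""
    mapping = {old: new for new, old in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def oversample_indices(labels, id_list, ratio):
    """Repeat the indices of samples whose class id falls in a listed range.

    Assumption: id_list = [start0, end0, start1, end1, ...] (inclusive),
    and ratio[k] is the integer repeat count for the k-th range.
    """
    if len(id_list) != 2 * len(ratio):
        raise ValueError("id_list needs one (start, end) pair per ratio entry")
    indices = []
    for idx, label in enumerate(labels):
        repeat = 1  # classes outside every range are kept once
        for k, r in enumerate(ratio):
            if id_list[2 * k] <= label <= id_list[2 * k + 1]:
                repeat = r
                break
        indices.extend([idx] * repeat)
    return indices


# Toy labels that start far above 0 and are non-contiguous.
raw_labels = [1000, 1000, 1007, 2500, 2500, 2500, 9000]
new_labels, mapping = relabel(raw_labels)

# Oversample the rare classes, using the ids produced by relabel():
# class 1 is repeated 3 times, class 3 is repeated 2 times.
indices = oversample_indices(new_labels, id_list=[1, 1, 3, 3], ratio=[3, 2])
counts = Counter(new_labels[i] for i in indices)
print(new_labels, indices, dict(counts))
```

With 17,000 classes whose ids already run from 0 to 16,999, the renumbering in this sketch would be a no-op, which matches the advice that False is usually fine.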
