
Data priority, incremental training? #22

Open · JanCizmar opened this issue Oct 29, 2022 · 2 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@JanCizmar

Hi there!

  1. I would like to use the data currently provided in data-index.json, but at the same time I would like to use my own custom data. Can I tell the script to treat my custom data as more relevant / higher priority when generating a model?

  2. Let's say I have one large dataset I use all the time, plus multiple smaller datasets, and I want to train a different model for each of the smaller ones. Is something like an incremental build possible, so I could reuse some previous output and just "append" my custom data to save training time and resources?

Thanks!

@PJ-Finlay Collaborator

There's no direct support for this, but you can accomplish it by modifying `argostrain/train.py`.

I would add `input("Downloaded Argos Data")` after the data has been downloaded here (the `input()` call pauses the script so you can edit the files before training continues), and then append your custom data to `run/source` and `run/target`.
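A minimal sketch of that append step, assuming `run/source` and `run/target` are plain line-aligned text files as described above; the helper name and the `copies` oversampling parameter (one simple way to give custom data higher priority, addressing question 1) are my own additions, not part of argos-train:

```python
from pathlib import Path

def append_custom_data(custom_source, custom_target, run_dir="run", copies=1):
    """Append line-aligned sentence pairs to run/source and run/target.

    copies > 1 oversamples the custom data so it carries more weight
    relative to the downloaded corpora.
    """
    src_lines = Path(custom_source).read_text(encoding="utf-8").splitlines()
    tgt_lines = Path(custom_target).read_text(encoding="utf-8").splitlines()
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target files must be line-aligned")
    run = Path(run_dir)
    # Open in append mode so the downloaded data is kept and ours is added.
    with open(run / "source", "a", encoding="utf-8") as src_out, \
         open(run / "target", "a", encoding="utf-8") as tgt_out:
        for _ in range(copies):
            src_out.write("\n".join(src_lines) + "\n")
            tgt_out.write("\n".join(tgt_lines) + "\n")
```

You would run this (or do the same by hand) while the script is paused at the `input()` call, then press Enter to continue training.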

You could also train one base model and then fine-tune it on your custom data. However, this also requires custom code.
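As a hypothetical sketch of what that custom fine-tuning code might look like: Argos Translate models are trained with OpenNMT-py, whose `-train_from` option resumes training from a saved checkpoint. The config file name, checkpoint path, and step counts below are assumptions for illustration:

```python
def finetune_command(config_path, checkpoint_path, current_step, extra_steps):
    """Build an onmt_train invocation that continues from a checkpoint.

    OpenNMT-py's -train_from option loads the saved model weights, so
    training resumes from them instead of starting from scratch; pointing
    the config's data at the custom corpus fine-tunes on it.
    """
    return [
        "onmt_train",
        "-config", config_path,
        "-train_from", checkpoint_path,
        # train_steps is a total, so extend it past the checkpoint's step.
        "-train_steps", str(current_step + extra_steps),
    ]

# e.g. subprocess.run(finetune_command("config.yml",
#                                      "run/model_step_50000.pt", 50000, 5000))
```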

I want to improve support for using custom data and fine-tuning, so suggestions or pull requests are appreciated.

@PJ-Finlay PJ-Finlay added enhancement New feature or request help wanted Extra attention is needed good first issue Good for newcomers labels Nov 4, 2022
@martin-leoorg

Would incremental training also be possible using the suggestions from LibreTranslate? I think the available base models are already quite good, but incorporating the feedback from LibreTranslate might make corner cases even better. This might depend on the actual use case (e.g. a medical use case might need different fine-tuning than a scuba-diving one, to pick random examples).

It would be great to be able to quickly improve the base model without having to retrain the complete model on a high-powered machine with 99.9% of the same input data!
