Hi,

I have been trying to evaluate several models with the GEM-benchmark metrics. I followed the tutorials on both GitHub and the official website and was able to generate a submission file containing the generations and their GEM-ID keys. However, when I generate the output scores, several of the scores I expected are missing.
For example, the rouge-score package is listed in requirements.txt, but the output scores do not contain any ROUGE metric. Furthermore, I have tried several times to generate output scores with the --heavy-metric flag, but the heavy metrics are always skipped: whether I include the flag or leave it out, the same set of metrics is returned.
I attached an example of my output scores below:
An example of the generations is shown below:
More information:
- I cloned the repo into my Google Drive, cd'd into the repo directory, and pip-installed both the normal requirements file and the heavy requirements file. I did this several times to make sure everything was installed.
- I also tried computing the metrics by selecting them manually with --metric-list, but that did not work either.
- I tried to pip install and import gem_metrics, but this did not resolve my issues.
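One thing worth ruling out when installing from a notebook on Google Drive is that pip put the packages into a different environment than the interpreter that runs the metrics. The snippet below is a minimal sketch of such a check, using only the standard library. The module names in `MODULES` are assumptions: `rouge_score` is the import name of the rouge-score package and `gem_metrics` is the package mentioned above; the list should be adjusted to match the requirements files.

```python
import importlib.util
import sys

# Assumed import names to check; edit this list to match the
# packages listed in requirements.txt and the heavy requirements file.
MODULES = ["rouge_score", "gem_metrics"]


def check_modules(names):
    """Map each top-level module name to True if the current
    interpreter can find it, False otherwise (nothing is imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print which Python is running, so it can be compared with the
    # interpreter that pip installed into (e.g. `python -m pip --version`).
    print("Interpreter:", sys.executable)
    for name, found in check_modules(MODULES).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If a module shows as MISSING here even though pip reported a successful install, the install most likely went to a different Python environment.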
Could someone help uncover what I am doing wrong?
Kind regards