
Dataset-specific metrics #21

Open
evanmiltenburg opened this issue Feb 1, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@evanmiltenburg
Contributor

I have a couple of metrics in mind that are dataset-specific. For example:

  • Global recall: how much of the vocabulary from the intersection of the training and test sets is actually being produced? How is this recall influenced by training-set frequency? Does the model only produce words that occur frequently in the training data, or also less frequent terms?
  • For some of the special test sets, we want to run some evaluations that don't make sense for other datasets. (Can't give any details right now.)

How should I go about this?

For the global recall metric, for example, I could preprocess the training data and, if the references have an identifier, use that to load the relevant data. Is that the best solution?
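
To make the first point concrete, this is roughly what I have in mind for global recall (a rough sketch only; the function signature and the frequency threshold are illustrations, not a proposed interface):

from collections import Counter

def global_recall(train_tokens, test_tokens, output_tokens, min_train_freq=1):
    """Share of the train/test vocabulary overlap that the system outputs contain.

    All three arguments are flat lists of tokens. Varying min_train_freq makes it
    possible to compare recall for frequent vs. less frequent training vocabulary.
    """
    train_freq = Counter(train_tokens)
    # Vocabulary shared by training and test data, filtered by training frequency.
    target_vocab = {w for w in set(test_tokens) & set(train_tokens)
                    if train_freq[w] >= min_train_freq}
    if not target_vocab:
        return 0.0
    return len(target_vocab & set(output_tokens)) / len(target_vocab)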

@tuetschek
Collaborator

The newly added Questeval now uses task-specific models (#40), where the task is specified in the system outputs file (to be set by default using a global config file, see #43). Potentially the same approach could be used here?
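
Roughly, the lookup would be something like this (purely illustrative; the field and model names here are made up, the actual format is discussed in #40 and #43):

# Purely illustrative: pick a task-specific model from a "task" field in the
# system outputs, falling back to defaults taken from a global config.
TASK_MODELS = {"summarization": "model-for-summarization", "data_to_text": "model-for-data2text"}

def select_model(outputs, global_config):
    task = outputs.get("task", global_config.get("default_task"))
    return TASK_MODELS.get(task, global_config.get("default_model"))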

sebastianGehrmann added the enhancement label on Dec 16, 2021
@sebastianGehrmann
Contributor

As just discussed in the larger group, we need the following:

  1. A way to assign metrics to task_types, specifying which metrics should (or should not) be run for each
  2. A way to assign multiple configurations of a metric to a task type, for example a neural metric run with two different underlying models (BERTScore with mBERT and with RoBERTa, or BLEURT and BLEURT-20); see the sketch below
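
A rough sketch of what 1 and 2 could look like as data (the names and structure are placeholders, not an agreed format):

# Placeholder structure: each task type maps to the metric configurations that
# should run for it, and the same metric can appear twice with different models.
TASK_TYPE_METRICS = {
    "summarization": [
        {"name": "bertscore", "model": "mbert"},
        {"name": "bertscore", "model": "roberta"},
        {"name": "bleurt", "model": "bleurt-20"},
    ],
    "data_to_text": [
        {"name": "bleu"},
    ],
}

# Metrics that should explicitly NOT be run for a task type.
TASK_TYPE_EXCLUDED = {
    "simplification": ["bleurt"],
}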

@danieldeutsch
Contributor

Perhaps we could define a schema for an AllenNLP-style jsonnet file that specifies the task or metrics to be run:

{
  "input_file": "/path/to/input.txt",
  "output_file": "/path/to/output.json",
  "metrics": [
    {
      "name": "bertscore",
      "model": "mbert",
      "output_key": "bertscore_mbert"
    },
    {
       "name": "bertscore",
       "model": "roberta",
       "output_key": "bertscore_roberta"
    }
  ]
}
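
Consuming such a file could then look roughly like this (just a sketch; the registry and the stub scorer are hypothetical, not existing GEM-metrics code):

import json

def bertscore_stub(model):
    # Stand-in for a real BERTScore implementation parameterised by the model name.
    return lambda input_file: {"metric": "bertscore", "model": model, "score": 0.0}

# Hypothetical registry; the real metric implementations would be registered here.
METRIC_REGISTRY = {"bertscore": bertscore_stub}

def run_from_config(config_path):
    with open(config_path) as f:
        config = json.load(f)
    results = {}
    for spec in config["metrics"]:
        scorer = METRIC_REGISTRY[spec["name"]](spec.get("model"))
        # Each configuration writes its scores under its own output_key, so the
        # two BERTScore variants above don't overwrite each other.
        results[spec["output_key"]] = scorer(config["input_file"])
    with open(config["output_file"], "w") as out:
        json.dump(results, out, indent=2)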

We could also define task-specific suites, which would run a pre-defined set of metrics:

{
  "input_file": "/path/to/input.txt",
  "output_file": "/path/to/output.json",
  "task_suite": "summarization"
}
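
The suite would then simply expand to a pre-defined metric list before hitting the same code path (again a sketch; the suite contents are made up):

# Hypothetical pre-defined suites: a task_suite name expands to an explicit metric list.
TASK_SUITES = {
    "summarization": [
        {"name": "rouge", "output_key": "rouge"},
        {"name": "bertscore", "model": "mbert", "output_key": "bertscore_mbert"},
    ],
}

def expand_task_suite(config):
    # If the config names a task_suite, replace it with the corresponding metrics.
    if "task_suite" in config:
        config = dict(config)
        config["metrics"] = TASK_SUITES[config.pop("task_suite")]
    return config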

@tuetschek
Collaborator

@sebastianGehrmann you're assuming a global config for GEM tasks, right?

@danieldeutsch (if it's a global config, then) maybe it would make sense to integrate this into gem_metrics.config.py, which already holds a lot of configuration?
