When batching data, Saber truncates / right-pads each sequence to a length of `saber.constants.MAX_SENT_LEN`.

Truncating sequences should only happen on the train set, ensuring that we don't drop tokens from examples in the evaluation partitions (`dataset_folder/valid.*` and `dataset_folder/test.*`).
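A minimal sketch of what this could look like, assuming sequences are lists of token IDs and using `keras.preprocessing.sequence.pad_sequences` (the helper name and call site are hypothetical, not Saber's actual implementation):

```python
from keras.preprocessing.sequence import pad_sequences

def pad_and_truncate(train_seqs, eval_seqs, max_len):
    # Hypothetical helper: truncate *and* right-pad the training partition
    # to a fixed length (e.g. saber.constants.MAX_SENT_LEN).
    train = pad_sequences(train_seqs, maxlen=max_len,
                          padding='post', truncating='post')
    # Evaluation partitions are only padded, never truncated: pad to the
    # longest sequence present so no evaluation example loses tokens.
    eval_len = max(max_len, max(len(seq) for seq in eval_seqs))
    evals = pad_sequences(eval_seqs, maxlen=eval_len,
                          padding='post', truncating='post')
    return train, evals
```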
Furthermore, a user should be able to specify a percentile (e.g. `0.99`), which would set the max sequence length to the 99th-percentile sentence length, so that only ~1% of all training examples are truncated. This would be a principled way to choose the value, and could lead to big reductions in training time when a handful of very long sentences would otherwise dictate the padded length.
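For example, computing the cutoff from the training lengths could be as simple as the following sketch (`percentile` here is the hypothetical user-facing option):

```python
import numpy as np

def max_sent_len_from_percentile(train_seqs, percentile=0.99):
    """Return the length at the given percentile of training sequence lengths.

    With percentile=0.99, roughly 1% of training sequences exceed the
    returned length and would be truncated.
    """
    lengths = [len(seq) for seq in train_seqs]
    # np.percentile expects q in [0, 100].
    return int(np.percentile(lengths, percentile * 100))
```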