Hello, I'm trying to compute the bleurt metric on a sample submission for the GEM benchmark (attached). However, running the following command throws a Blas GEMM launch failed error:
[W 220316 15:40:50 texts:191] Model parameter count not present in the submission file.
[I 220316 15:40:50 texts:32] Loading predictions for SeqPlan/mlsum_de_validation
[I 220316 15:40:50 texts:32] Loading predictions for SeqPlan/mlsum_de_test
[I 220316 15:40:50 texts:32] Loading predictions for SeqPlan/mlsum_de_challenge_test_covid
[W 220316 15:40:50 data:54] /home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/data/references/mlsum_de_validation.json not found -- downloading https://huggingface.co/datasets/GEM/references/resolve/main/mlsum_de_validation.json. This may take a few minutes.
[W 220316 15:40:50 __init__:258] Could not format references for mlsum_de_validation: HTTP Error 404: Not Found
File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/gem_metrics/__init__.py", line 251, in load_references
dataset_file = ensure_download(
File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/gem_metrics/data.py", line 76, in ensure_download
urllib.request.urlretrieve(
File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
[W 220316 15:40:50 data:54] /home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/data/references/mlsum_de_validation.json not found -- downloading https://huggingface.co/datasets/GEM/references/resolve/main/mlsum_de_validation.json. This may take a few minutes.
[I 220316 15:40:50 __init__:275] mlsum_de_validation does not have source associated.
[I 220316 15:40:50 texts:32] Loading references for /home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/data/references/mlsum_de_test.json
[I 220316 15:40:50 texts:32] Loading sources for /home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/data/references/mlsum_de_test.json
[I 220316 15:40:50 __init__:275] mlsum_de_test does not have source associated.
[I 220316 15:40:50 texts:32] Loading references for /home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/data/references/mlsum_de_challenge_test_covid.json
[I 220316 15:40:51 texts:32] Loading sources for /home/lewis/miniconda3/envs/gem-metrics/lib/python3.8/site-packages/data/references/mlsum_de_challenge_test_covid.json
[I 220316 15:40:51 __init__:275] mlsum_de_challenge_test_covid does not have source associated.
[I 220316 15:40:51 __init__:385] Found parent ID in mlsum_de_challenge_test_covid but no corresponding parent dataset
[I 220316 15:40:51 __init__:219] Computing metrics for mlsum_de_validation...
[I 220316 15:40:51 __init__:219] Computing metrics for mlsum_de_test...
[I 220316 15:40:51 __init__:219] Computing metrics for mlsum_de_challenge_test_covid...
[I 220316 15:40:51 __init__:152] Computing BLEURT for SeqPlan/mlsum_de_test...
[I 220316 15:40:51 __init__:152] Computing BLEURT for SeqPlan/mlsum_de_challenge_test_covid...
INFO:tensorflow:Reading checkpoint ../bleurt-base-128.
I0316 15:40:58.413195 140619271960384 score.py:161] Reading checkpoint ../bleurt-base-128.
INFO:tensorflow:Config file found, reading.
I0316 15:40:58.413323 140619271960384 checkpoint.py:92] Config file found, reading.
INFO:tensorflow:Will load checkpoint bert_custom
I0316 15:40:58.413443 140619271960384 checkpoint.py:96] Will load checkpoint bert_custom
INFO:tensorflow:Loads full paths and checks that files exists.
I0316 15:40:58.413485 140619271960384 checkpoint.py:98] Loads full paths and checks that files exists.
INFO:tensorflow:... name:bert_custom
I0316 15:40:58.413520 140619271960384 checkpoint.py:102] ... name:bert_custom
INFO:tensorflow:... vocab_file:vocab.txt
I0316 15:40:58.413564 140619271960384 checkpoint.py:102] ... vocab_file:vocab.txt
INFO:tensorflow:... bert_config_file:bert_config.json
I0316 15:40:58.413612 140619271960384 checkpoint.py:102] ... bert_config_file:bert_config.json
INFO:tensorflow:... do_lower_case:True
I0316 15:40:58.413659 140619271960384 checkpoint.py:102] ... do_lower_case:True
INFO:tensorflow:... max_seq_length:128
I0316 15:40:58.413696 140619271960384 checkpoint.py:102] ... max_seq_length:128
INFO:tensorflow:Creating BLEURT scorer.
I0316 15:40:58.413734 140619271960384 score.py:168] Creating BLEURT scorer.
INFO:tensorflow:Creating WordPiece tokenizer.
I0316 15:40:58.413768 140619271960384 tokenizers.py:40] Creating WordPiece tokenizer.
INFO:tensorflow:WordPiece tokenizer instantiated.
I0316 15:40:58.478093 140619271960384 tokenizers.py:45] WordPiece tokenizer instantiated.
INFO:tensorflow:Creating Eager Mode predictor.
I0316 15:40:58.478170 140619271960384 score.py:57] Creating Eager Mode predictor.
INFO:tensorflow:Loading model.
I0316 15:40:58.478209 140619271960384 score.py:62] Loading model.
2022-03-16 15:40:58.843356: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-03-16 15:40:58.882447: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:58.882741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: NVIDIA TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2022-03-16 15:40:58.882956: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-03-16 15:40:58.884625: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-03-16 15:40:58.886231: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-03-16 15:40:58.886503: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-03-16 15:40:58.887874: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-03-16 15:40:58.888352: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-03-16 15:40:58.890600: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-03-16 15:40:58.890693: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:58.890939: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:58.891113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2022-03-16 15:40:58.891322: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-03-16 15:40:58.896174: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3792765000 Hz
2022-03-16 15:40:58.897041: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fe2d0000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-03-16 15:40:58.897061: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2022-03-16 15:40:59.000449: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:59.000935: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6f9fd30 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-03-16 15:40:59.000954: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA TITAN RTX, Compute Capability 7.5
2022-03-16 15:40:59.001150: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:59.001417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: NVIDIA TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2022-03-16 15:40:59.001450: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-03-16 15:40:59.001463: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-03-16 15:40:59.001478: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-03-16 15:40:59.001490: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-03-16 15:40:59.001500: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-03-16 15:40:59.001513: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-03-16 15:40:59.001524: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-03-16 15:40:59.001609: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:59.001901: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:59.002133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2022-03-16 15:40:59.002165: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-03-16 15:40:59.002925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-03-16 15:40:59.002938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2022-03-16 15:40:59.002944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2022-03-16 15:40:59.003061: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:59.003356: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:40:59.003612: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22611 MB memory) -> physical GPU (device: 0, name: NVIDIA TITAN RTX, pci bus id: 0000:08:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0316 15:40:59.450889 140619271960384 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:BLEURT initialized.
I0316 15:41:00.563959 140619271960384 score.py:174] BLEURT initialized.
INFO:tensorflow:Computing BLEURT scores...
I0316 15:41:00.564104 140619271960384 score_files.py:133] Computing BLEURT scores...
INFO:tensorflow:Reading checkpoint ../bleurt-base-128.
I0316 15:41:00.649482 139858446710592 score.py:161] Reading checkpoint ../bleurt-base-128.
INFO:tensorflow:Config file found, reading.
I0316 15:41:00.649625 139858446710592 checkpoint.py:92] Config file found, reading.
INFO:tensorflow:Will load checkpoint bert_custom
I0316 15:41:00.649743 139858446710592 checkpoint.py:96] Will load checkpoint bert_custom
INFO:tensorflow:Loads full paths and checks that files exists.
I0316 15:41:00.649785 139858446710592 checkpoint.py:98] Loads full paths and checks that files exists.
INFO:tensorflow:... name:bert_custom
I0316 15:41:00.649821 139858446710592 checkpoint.py:102] ... name:bert_custom
INFO:tensorflow:... vocab_file:vocab.txt
I0316 15:41:00.649855 139858446710592 checkpoint.py:102] ... vocab_file:vocab.txt
INFO:tensorflow:... bert_config_file:bert_config.json
I0316 15:41:00.649900 139858446710592 checkpoint.py:102] ... bert_config_file:bert_config.json
INFO:tensorflow:... do_lower_case:True
I0316 15:41:00.649946 139858446710592 checkpoint.py:102] ... do_lower_case:True
INFO:tensorflow:... max_seq_length:128
I0316 15:41:00.649982 139858446710592 checkpoint.py:102] ... max_seq_length:128
INFO:tensorflow:Creating BLEURT scorer.
I0316 15:41:00.650019 139858446710592 score.py:168] Creating BLEURT scorer.
INFO:tensorflow:Creating WordPiece tokenizer.
I0316 15:41:00.650053 139858446710592 tokenizers.py:40] Creating WordPiece tokenizer.
INFO:tensorflow:WordPiece tokenizer instantiated.
I0316 15:41:00.714641 139858446710592 tokenizers.py:45] WordPiece tokenizer instantiated.
INFO:tensorflow:Creating Eager Mode predictor.
I0316 15:41:00.714712 139858446710592 score.py:57] Creating Eager Mode predictor.
INFO:tensorflow:Loading model.
I0316 15:41:00.714751 139858446710592 score.py:62] Loading model.
2022-03-16 15:41:01.075217: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-03-16 15:41:01.099614: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.099885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: NVIDIA TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2022-03-16 15:41:01.100065: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-03-16 15:41:01.101612: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-03-16 15:41:01.103159: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-03-16 15:41:01.103421: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-03-16 15:41:01.104980: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-03-16 15:41:01.105824: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-03-16 15:41:01.107973: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-03-16 15:41:01.108065: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.108303: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.108475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2022-03-16 15:41:01.108682: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-03-16 15:41:01.113914: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3792765000 Hz
2022-03-16 15:41:01.114642: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f31a4000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-03-16 15:41:01.114660: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2022-03-16 15:41:01.170013: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.170263: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x73c46a0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-03-16 15:41:01.170284: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA TITAN RTX, Compute Capability 7.5
2022-03-16 15:41:01.170470: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.170703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: NVIDIA TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2022-03-16 15:41:01.170738: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-03-16 15:41:01.170756: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-03-16 15:41:01.170774: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2022-03-16 15:41:01.170790: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2022-03-16 15:41:01.170805: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2022-03-16 15:41:01.170816: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2022-03-16 15:41:01.170828: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-03-16 15:41:01.170897: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.171147: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.171339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2022-03-16 15:41:01.171368: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2022-03-16 15:41:01.172099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-03-16 15:41:01.172110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2022-03-16 15:41:01.172116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2022-03-16 15:41:01.172219: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.172492: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-03-16 15:41:01.172725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 923 MB memory) -> physical GPU (device: 0, name: NVIDIA TITAN RTX, pci bus id: 0000:08:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0316 15:41:01.458086 139858446710592 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py:1817: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:BLEURT initialized.
I0316 15:41:02.578015 139858446710592 score.py:174] BLEURT initialized.
INFO:tensorflow:Computing BLEURT scores...
I0316 15:41:02.578154 139858446710592 score_files.py:133] Computing BLEURT scores...
2022-03-16 15:41:10.040301: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-03-16 15:41:13.784464: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2022-03-16 15:41:13.999250: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2022-03-16 15:41:14.004738: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2022-03-16 15:41:14.006644: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2022-03-16 15:41:14.011551: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2022-03-16 15:41:14.011578: W tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/app/bleurt/bleurt/score_files.py", line 168, in <module>
tf.compat.v1.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/app/bleurt/bleurt/score_files.py", line 164, in main
score_files(sentence_pairs_generator, FLAGS.bleurt_checkpoint)
File "/app/bleurt/bleurt/score_files.py", line 138, in score_files
_consume_buffer()
File "/app/bleurt/bleurt/score_files.py", line 128, in _consume_buffer
batch_size=FLAGS.bleurt_batch_size)
File "/app/bleurt/bleurt/score.py", line 215, in score
predict_out = self._predictor.predict(tf_input)
File "/app/bleurt/bleurt/score.py", line 71, in predict
input_dict["segment_ids"]))["predictions"].numpy()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1605, in __call__
return self._call_impl(args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1645, in _call_impl
return self._call_flat(args, self.captured_inputs, cancellation_manager)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 598, in call
ctx=ctx)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(8192, 2), b.shape=(2, 768), m=8192, n=768, k=2
[[node bert/embeddings/MatMul (defined at app/bleurt/bleurt/score.py:63) ]] [Op:__inference_pruned_6660]
Function call stack:
pruned
As far as I can tell, this error stems from a CUDA out-of-memory (OOM) condition. I'm running on an NVIDIA TITAN RTX with 23.65 GiB of memory, so this is quite surprising. One possibility is that the submission file has very long inputs, but these outputs come from one of the baseline models and would presumably be similar for other GEM participants.
For context, I installed the library following the README instructions for "heavy" metrics, plus some additional Docker configuration (logging in and installing the NVIDIA Container Toolkit).
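One detail in the log that may support the OOM reading: the first BLEURT scorer process created its GPU device with 22611 MB, while the second was left with only 923 MB before cuBLAS failed to initialize its handle. A workaround worth trying, sketched under the assumption that the failure comes from TensorFlow's default behavior of preallocating nearly all GPU memory per process, is to enable on-demand GPU memory growth before the scorers start:

```python
import os

# TensorFlow preallocates most GPU memory per process by default. When two
# BLEURT scorers run concurrently, the second process can be starved of
# memory, and cuBLAS then fails with CUBLAS_STATUS_NOT_INITIALIZED /
# "Blas GEMM launch failed". Setting this variable before TensorFlow is
# imported makes each process allocate GPU memory incrementally instead.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# Equivalent in-code approach (after importing TensorFlow):
# import tensorflow as tf
# for gpu in tf.config.experimental.list_physical_devices("GPU"):
#     tf.config.experimental.set_memory_growth(gpu, True)
```

This is only a sketch; whether it helps here depends on how gem-metrics spawns the BLEURT processes inside the Docker container (the environment variable would need to be set in, or passed into, the container).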
cc @sebastianGehrmann @danieldeutsch
sample-submission.json.zip