error while testing #11

SeekPoint · 2019-06-08T17:05:18Z

ub16c9@ub16c9-gpu:/media/ub16c9/fcd84300-9270-4bbd-896a-5e04e79203b7/ub16_prj/ID-CNN-CWS$ ll result/all/001/
total 142260
drwxrwxr-x 2 ub16c9 ub16c9 4096 Jun 9 00:09 ./
drwxrwxr-x 3 ub16c9 ub16c9 4096 Dec 7 2018 ../
-rw-rw-r-- 1 ub16c9 ub16c9 73 Jun 8 23:55 checkpoint
-rw-rw-r-- 1 ub16c9 ub16c9 5600172 Jun 8 23:55 dev-out-1.txt
-rw-rw-r-- 1 ub16c9 ub16c9 5595476 Jun 9 00:01 dev-out-2.txt
-rw-rw-r-- 1 ub16c9 ub16c9 17367151 Dec 2 2018 events.out.tfevents.1543651522.gpuws32g
-rw-rw-r-- 1 ub16c9 ub16c9 17376875 Jun 9 00:01 events.out.tfevents.1560008765.ub16c9-gpu
-rw-rw-r-- 1 ub16c9 ub16c9 16359526 Jun 9 00:06 events.out.tfevents.1560009996.ub16c9-gpu
-rw-rw-r-- 1 ub16c9 ub16c9 16359526 Jun 9 00:09 events.out.tfevents.1560010197.ub16c9-gpu
-rw-rw-r-- 1 ub16c9 ub16c9 29718437 Jun 9 00:09 graph.pbtxt
-rw-rw-r-- 1 ub16c9 ub16c9 55084 Jun 9 00:09 info.log
-rw-rw-r-- 1 ub16c9 ub16c9 28294852 Jun 8 23:55 model.tf.data-00000-of-00001
-rw-rw-r-- 1 ub16c9 ub16c9 1644 Jun 8 23:55 model.tf.index
-rw-rw-r-- 1 ub16c9 ub16c9 8698157 Jun 8 23:55 model.tf.meta
ub16c9@ub16c9-gpu:/media/ub16c9/fcd84300-9270-4bbd-896a-5e04e79203b7/ub16_prj/ID-CNN-CWS$ python3.6 train.py --train_dir dataset/pku/train --dev_dir dataset/pku/dev --maps_dir dataset/pku/train --model_dir result/all/001 --embed_dim 100 --embeddings data/embeddings/character.vec --lstm_dim 400 --input_dropout 0.85 --hidden_dropout 0.55 --middle_dropout 0.85 --word_dropout 0.85 --lr 0.001 --l2 0.00001 --batch_size 128 --nonlinearity relu --char_dim 0 --char_tok_dim 0 --shape_dim 50 --layers "{'conv1': {'dilation': 1, 'width': 3, 'filters': 400, 'initialization': 'identity', 'take': false}, 'conv2': {'dilation': 2, 'width': 3, 'filters': 400, 'initialization': 'identity', 'take': false}, 'conv3': {'dilation': 1, 'width': 3, 'filters': 400, 'initialization': 'identity', 'take': true}}" --model cnn --clip_norm 5 --regularize_drop_penalty 0.00001 --projection False --margin 0.0 --loss mean --epsilon 1e-6 --beta2 0.9 --char_model lstm --block_repeats 1 --share_repeats True --max_epochs 100 --viterbi
CUDA_VISIBLE_DEVICES= 0
train.py --train_dir dataset/pku/train --dev_dir dataset/pku/dev --maps_dir dataset/pku/train --model_dir result/all/001 --embed_dim 100 --embeddings data/embeddings/character.vec --lstm_dim 400 --input_dropout 0.85 --hidden_dropout 0.55 --middle_dropout 0.85 --word_dropout 0.85 --lr 0.001 --l2 0.00001 --batch_size 128 --nonlinearity relu --char_dim 0 --char_tok_dim 0 --shape_dim 50 --layers {'conv1': {'dilation': 1, 'width': 3, 'filters': 400, 'initialization': 'identity', 'take': false}, 'conv2': {'dilation': 2, 'width': 3, 'filters': 400, 'initialization': 'identity', 'take': false}, 'conv3': {'dilation': 1, 'width': 3, 'filters': 400, 'initialization': 'identity', 'take': true}} --model cnn --clip_norm 5 --regularize_drop_penalty 0.00001 --projection False --margin 0.0 --loss mean --epsilon 1e-6 --beta2 0.9 --char_model lstm --block_repeats 1 --share_repeats True --max_epochs 100 --viterbi

num classes: 4
dataset/pku/train/sizes.txt
num train examples: 129916
num train tokens: 1609674
dataset/pku/dev
num dev examples: 0
num dev tokens: 0
{'': 0}
Loaded 4654/4679 embeddings (99.47% coverage)
[<tf.Tensor 'forward/embedding_lookup:0' shape=(?, ?, 100) dtype=float32>, <tf.Tensor 'forward/embedding_lookup_1:0' shape=(?, ?, 50) dtype=float32>]
Adding initial layer conv0: width: 3; filters: 400
input feats expanded drop (?, 1, ?, 150)
last out shape (?, 1, ?, 400)
last dims 400
Adding layer conv1: dilation: 1; width: 3; filters: 400; take: False
Adding layer conv2: dilation: 2; width: 3; filters: 400; take: False
Adding layer conv3: dilation: 1; width: 3; filters: 400; take: True
input feats expanded drop (?, 1, ?, 150)
last out shape (?, 1, ?, 400)
last dims 400
model vars: 16
<map object at 0x7f8dfcc56b70>
Total trainable parameters: 2183870
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:tensorflow:From train.py:241: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
From train.py:241: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-06-09 00:48:29.377807: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-09 00:48:29.493681: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-09 00:48:29.494180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.27GiB
2019-06-09 00:48:29.494196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-06-09 00:48:29.707509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-09 00:48:29.707542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-06-09 00:48:29.707548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-06-09 00:48:29.707769: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9939 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Running local_init_op.
Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Done running local_init_op.
INFO:tensorflow:Starting standard services.
Starting standard services.
INFO:tensorflow:Starting queue runners.
Starting queue runners.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, assertion failed: [string_input_producer requires a non-null input tensor]
[[Node: input_producer_1/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_producer_1/Greater/_133, input_producer_1/Assert/Assert/data_0)]]
[[Node: input_producer_1/RandomShuffle/_143 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12_input_producer_1/RandomShuffle", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, assertion failed: [string_input_producer requires a non-null input tensor]
[[Node: input_producer_1/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_producer_1/Greater/_133, input_producer_1/Assert/Assert/data_0)]]
[[Node: input_producer_1/RandomShuffle/_143 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12_input_producer_1/RandomShuffle", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Training on 129916 sentences (129916 examples)
Deserializing model: result/all/001/model.tf
INFO:tensorflow:Restoring parameters from result/all/001/model.tf
Restoring parameters from result/all/001/model.tf
2019-06-09 00:48:32.857055: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key transitions not found in checkpoint
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key transitions not found in checkpoint
[[Node: save_1/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2/tensor_names, save_1/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/supervisor.py", line 990, in managed_session
yield sess
File "train.py", line 518, in main
saver.restore(sess, FLAGS.model_dir + "/model.tf")
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1802, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key transitions not found in checkpoint
[[Node: save_1/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2/tensor_names, save_1/RestoreV2/shape_and_slices)]]

Caused by op 'save_1/RestoreV2', defined at:
File "train.py", line 620, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "train.py", line 235, in main
saver = tf.train.Saver(var_list=model_vars)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1338, in __init__
self.build()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1347, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1384, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 472, in _AddRestoreOps
restore_sequentially)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 886, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): Key transitions not found in checkpoint
[[Node: save_1/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2/tensor_names, save_1/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 620, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "train.py", line 531, in main
logger.info("Best dev F1: %2.2f" % (best_score))
File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/supervisor.py", line 1000, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/supervisor.py", line 828, in stop
ignore_live_threads=ignore_live_threads)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
enqueue_callable()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1244, in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [string_input_producer requires a non-null input tensor]
[[Node: input_producer_1/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_producer_1/Greater/_133, input_producer_1/Assert/Assert/data_0)]]
[[Node: input_producer_1/RandomShuffle/_143 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12_input_producer_1/RandomShuffle", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
ub16c9@ub16c9-gpu:/media/ub16c9/fcd84300-9270-4bbd-896a-5e04e79203b7/ub16_prj/ID-CNN-CWS$
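
The model summary in the log (an initial width-3 `conv0`, then `conv1` to `conv3` with dilations 1, 2, 1 and width 3) determines how many characters of context each predicted tag can see. The sketch below only does that arithmetic and is not taken from `train.py`; it assumes `conv0` is undilated (the log prints only its width) and that `--block_repeats 1` means the dilated block is applied exactly once.

```python
# Receptive field of a stack of stride-1 1-D dilated convolutions.
# Assumptions: conv0 has dilation 1 (the log only prints its width), and
# --block_repeats 1 means conv1..conv3 are applied a single time.


def receptive_field(layers):
    """layers: list of (width, dilation) pairs, applied in order."""
    rf = 1
    for width, dilation in layers:
        # each layer widens the visible window by (width - 1) * dilation tokens
        rf += (width - 1) * dilation
    return rf


layers = [
    (3, 1),  # conv0 (initial layer)
    (3, 1),  # conv1
    (3, 2),  # conv2
    (3, 1),  # conv3 (take: true)
]
print(receptive_field(layers))  # 11
```

Under those assumptions each tag is predicted from a window of 11 characters, 5 on either side of the current one.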
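
The first failure is the queue-runner assertion `string_input_producer requires a non-null input tensor`. TensorFlow raises it at run time when the filename tensor fed to `tf.train.string_input_producer` is empty, and the earlier lines `num dev examples: 0` and `num dev tokens: 0` suggest that the dev split under `dataset/pku/dev` was never populated, so `input_producer_1` is probably the dev pipeline (the log does not say so outright). A minimal pre-flight check, run before the input pipeline is built, turns that into a readable error. The helper name and the glob pattern are hypothetical; substitute whatever file names the preprocessing step actually writes.

```python
import glob
import os


def require_input_files(data_dir, pattern):
    """Hypothetical pre-flight check: return the sorted files in data_dir that
    match pattern, or raise a clear error instead of letting an empty filename
    list reach string_input_producer and trip its run-time assertion."""
    if not os.path.isdir(data_dir):
        raise FileNotFoundError("data directory does not exist: %s" % data_dir)
    files = sorted(glob.glob(os.path.join(data_dir, pattern)))
    if not files:
        raise FileNotFoundError(
            "no files matching %r in %s; was preprocessing run for this split?"
            % (pattern, data_dir))
    return files
```

For example, `require_input_files("dataset/pku/dev", "*.proto")` would stop immediately with the directory name in the message; the `"*.proto"` pattern here is only an assumed placeholder.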
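
The second failure, `Key transitions not found in checkpoint`, means the `tf.train.Saver` built at `train.py` line 235 asks for a variable named `transitions` that `result/all/001/model.tf` does not contain. A plausible cause, not confirmed by the log, is that the checkpoint was written by a run whose graph had no `transitions` variable (for example one launched without `--viterbi`), so graph and checkpoint no longer match. In TF 1.x, `tf.train.list_variables(path)` returns the checkpoint's `(name, shape)` pairs, and a `Saver` built from a list of variables keys each one by `v.op.name` by default. The sketch below only compares two such name lists; apart from `transitions`, the variable names in it are made up.

```python
def diff_checkpoint_keys(graph_var_names, checkpoint_var_names):
    """Return (missing, unused): names the Saver will try to restore that the
    checkpoint lacks, and names stored in the checkpoint that the graph never
    requests. Both lists are sorted for stable output."""
    graph = set(graph_var_names)
    ckpt = set(checkpoint_var_names)
    missing = sorted(graph - ckpt)
    unused = sorted(ckpt - graph)
    return missing, unused


# Illustrative names only; "transitions" is the key reported in the traceback.
graph_names = ["forward/conv0/w", "forward/conv0/b", "transitions"]
ckpt_names = ["forward/conv0/w", "forward/conv0/b"]
missing, unused = diff_checkpoint_keys(graph_names, ckpt_names)
print("missing from checkpoint:", missing)  # missing from checkpoint: ['transitions']
print("in checkpoint but unused:", unused)  # in checkpoint but unused: []
```

With real data, assuming `model_vars` is the list of variables passed to the `Saver`, `graph_names = [v.op.name for v in model_vars]` and `ckpt_names = [name for name, _ in tf.train.list_variables("result/all/001/model.tf")]` would feed the same function.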