error while testing #11

SeekPoint opened this issue Jun 8, 2019 · 0 comments

@SeekPoint

ub16c9@ub16c9-gpu:/media/ub16c9/fcd84300-9270-4bbd-896a-5e04e79203b7/ub16_prj/ID-CNN-CWS$ ll result/all/001/
total 142260
drwxrwxr-x 2 ub16c9 ub16c9 4096 Jun 9 00:09 ./
drwxrwxr-x 3 ub16c9 ub16c9 4096 Dec 7 2018 ../
-rw-rw-r-- 1 ub16c9 ub16c9 73 Jun 8 23:55 checkpoint
-rw-rw-r-- 1 ub16c9 ub16c9 5600172 Jun 8 23:55 dev-out-1.txt
-rw-rw-r-- 1 ub16c9 ub16c9 5595476 Jun 9 00:01 dev-out-2.txt
-rw-rw-r-- 1 ub16c9 ub16c9 17367151 Dec 2 2018 events.out.tfevents.1543651522.gpuws32g
-rw-rw-r-- 1 ub16c9 ub16c9 17376875 Jun 9 00:01 events.out.tfevents.1560008765.ub16c9-gpu
-rw-rw-r-- 1 ub16c9 ub16c9 16359526 Jun 9 00:06 events.out.tfevents.1560009996.ub16c9-gpu
-rw-rw-r-- 1 ub16c9 ub16c9 16359526 Jun 9 00:09 events.out.tfevents.1560010197.ub16c9-gpu
-rw-rw-r-- 1 ub16c9 ub16c9 29718437 Jun 9 00:09 graph.pbtxt
-rw-rw-r-- 1 ub16c9 ub16c9 55084 Jun 9 00:09 info.log
-rw-rw-r-- 1 ub16c9 ub16c9 28294852 Jun 8 23:55 model.tf.data-00000-of-00001
-rw-rw-r-- 1 ub16c9 ub16c9 1644 Jun 8 23:55 model.tf.index
-rw-rw-r-- 1 ub16c9 ub16c9 8698157 Jun 8 23:55 model.tf.meta
ub16c9@ub16c9-gpu:/media/ub16c9/fcd84300-9270-4bbd-896a-5e04e79203b7/ub16_prj/ID-CNN-CWS$ python3.6 train.py --train_dir dataset/pku/train --dev_dir dataset/pku/dev --maps_dir dataset/pku/train --model_dir result/all/001 --embed_dim 100 --embeddings data/embeddings/character.vec --lstm_dim 400 --input_dropout 0.85 --hidden_dropout 0.55 --middle_dropout 0.85 --word_dropout 0.85 --lr 0.001 --l2 0.00001 --batch_size 128 --nonlinearity relu --char_dim 0 --char_tok_dim 0 --shape_dim 50 --layers "{'conv1': {'dilation': 1, 'width': 3, 'filters': 400, 'initialization': 'identity', 'take': false}, 'conv2': {'dilation': 2, 'width': 3, 'filters': 400, 'initialization': 'identity', 'take': false}, 'conv3': {'dilation': 1, 'width': 3, 'filters': 400, 'initialization': 'identity', 'take': true}}" --model cnn --clip_norm 5 --regularize_drop_penalty 0.00001 --projection False --margin 0.0 --loss mean --epsilon 1e-6 --beta2 0.9 --char_model lstm --block_repeats 1 --share_repeats True --max_epochs 100 --viterbi
CUDA_VISIBLE_DEVICES= 0
train.py --train_dir dataset/pku/train --dev_dir dataset/pku/dev --maps_dir dataset/pku/train --model_dir result/all/001 --embed_dim 100 --embeddings data/embeddings/character.vec --lstm_dim 400 --input_dropout 0.85 --hidden_dropout 0.55 --middle_dropout 0.85 --word_dropout 0.85 --lr 0.001 --l2 0.00001 --batch_size 128 --nonlinearity relu --char_dim 0 --char_tok_dim 0 --shape_dim 50 --layers {'conv1': {'dilation': 1, 'width': 3, 'filters': 400, 'initialization': 'identity', 'take': false}, 'conv2': {'dilation': 2, 'width': 3, 'filters': 400, 'initialization': 'identity', 'take': false}, 'conv3': {'dilation': 1, 'width': 3, 'filters': 400, 'initialization': 'identity', 'take': true}} --model cnn --clip_norm 5 --regularize_drop_penalty 0.00001 --projection False --margin 0.0 --loss mean --epsilon 1e-6 --beta2 0.9 --char_model lstm --block_repeats 1 --share_repeats True --max_epochs 100 --viterbi

num classes: 4
dataset/pku/train/sizes.txt
num train examples: 129916
num train tokens: 1609674
dataset/pku/dev
num dev examples: 0
num dev tokens: 0
{'': 0}
Loaded 4654/4679 embeddings (99.47% coverage)
[<tf.Tensor 'forward/embedding_lookup:0' shape=(?, ?, 100) dtype=float32>, <tf.Tensor 'forward/embedding_lookup_1:0' shape=(?, ?, 50) dtype=float32>]
Adding initial layer conv0: width: 3; filters: 400
input feats expanded drop (?, 1, ?, 150)
last out shape (?, 1, ?, 400)
last dims 400
Adding layer conv1: dilation: 1; width: 3; filters: 400; take: False
Adding layer conv2: dilation: 2; width: 3; filters: 400; take: False
Adding layer conv3: dilation: 1; width: 3; filters: 400; take: True
input feats expanded drop (?, 1, ?, 150)
last out shape (?, 1, ?, 400)
last dims 400
model vars: 16
<map object at 0x7f8dfcc56b70>
Total trainable parameters: 2183870
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:tensorflow:From train.py:241: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
From train.py:241: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-06-09 00:48:29.377807: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-09 00:48:29.493681: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-09 00:48:29.494180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.27GiB
2019-06-09 00:48:29.494196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-06-09 00:48:29.707509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-09 00:48:29.707542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-06-09 00:48:29.707548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-06-09 00:48:29.707769: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9939 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Running local_init_op.
Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Done running local_init_op.
INFO:tensorflow:Starting standard services.
Starting standard services.
INFO:tensorflow:Starting queue runners.
Starting queue runners.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, assertion failed: [string_input_producer requires a non-null input tensor]
[[Node: input_producer_1/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_producer_1/Greater/_133, input_producer_1/Assert/Assert/data_0)]]
[[Node: input_producer_1/RandomShuffle/_143 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12_input_producer_1/RandomShuffle", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, assertion failed: [string_input_producer requires a non-null input tensor]
[[Node: input_producer_1/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_producer_1/Greater/_133, input_producer_1/Assert/Assert/data_0)]]
[[Node: input_producer_1/RandomShuffle/_143 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12_input_producer_1/RandomShuffle", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Training on 129916 sentences (129916 examples)
Deserializing model: result/all/001/model.tf
INFO:tensorflow:Restoring parameters from result/all/001/model.tf
Restoring parameters from result/all/001/model.tf
2019-06-09 00:48:32.857055: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key transitions not found in checkpoint
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key transitions not found in checkpoint
[[Node: save_1/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2/tensor_names, save_1/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/supervisor.py", line 990, in managed_session
yield sess
File "train.py", line 518, in main
saver.restore(sess, FLAGS.model_dir + "/model.tf")
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1802, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key transitions not found in checkpoint
[[Node: save_1/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2/tensor_names, save_1/RestoreV2/shape_and_slices)]]

Caused by op 'save_1/RestoreV2', defined at:
File "train.py", line 620, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "train.py", line 235, in main
saver = tf.train.Saver(var_list=model_vars)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1338, in __init__
self.build()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1347, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 1384, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 472, in _AddRestoreOps
restore_sequentially)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 886, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1463, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

NotFoundError (see above for traceback): Key transitions not found in checkpoint
[[Node: save_1/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save_1/Const_0_0, save_1/RestoreV2/tensor_names, save_1/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 620, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "train.py", line 531, in main
logger.info("Best dev F1: %2.2f" % (best_score))
File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/supervisor.py", line 1000, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/supervisor.py", line 828, in stop
ignore_live_threads=ignore_live_threads)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 252, in _run
enqueue_callable()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1244, in _single_operation_run
self._call_tf_sessionrun(None, {}, [], target_list, None)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [string_input_producer requires a non-null input tensor]
[[Node: input_producer_1/Assert/Assert = Assert[T=[DT_STRING], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](input_producer_1/Greater/_133, input_producer_1/Assert/Assert/data_0)]]
[[Node: input_producer_1/RandomShuffle/_143 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_12_input_producer_1/RandomShuffle", tensor_type=DT_STRING, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
ub16c9@ub16c9-gpu:/media/ub16c9/fcd84300-9270-4bbd-896a-5e04e79203b7/ub16_prj/ID-CNN-CWS$
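The log appears to contain two separate failures. First, the run reports "num dev examples: 0" for dataset/pku/dev, and the queue runner later fails with "string_input_producer requires a non-null input tensor", which suggests the dev input list was empty. Second, saver.restore at train.py line 518 fails with "Key transitions not found in checkpoint" for result/all/001/model.tf: the current graph defines a variable named transitions that the saved checkpoint does not contain (possibly because the checkpoint was written by a run with different flags). The sketch below is a minimal pre-flight check for both conditions. The helpers data_files and missing_checkpoint_keys are hypothetical and not part of ID-CNN-CWS, and data_files assumes the dev data is stored as non-empty files directly inside the --dev_dir directory, which may not match the repository's real layout.

```python
import os
from typing import Iterable, List, Sequence, Tuple


def data_files(directory: str) -> List[str]:
    """List the non-empty regular files directly inside `directory`.

    Hypothetical check: assumes the dev data lives as plain files in the
    directory passed via --dev_dir; the repository's real layout may differ.
    An empty result means an input queue built from this list would be empty.
    """
    if not os.path.isdir(directory):
        return []
    paths = [os.path.join(directory, name) for name in sorted(os.listdir(directory))]
    return [p for p in paths if os.path.isfile(p) and os.path.getsize(p) > 0]


def missing_checkpoint_keys(
    expected: Iterable[str],
    checkpoint_vars: Iterable[Tuple[str, Sequence[int]]],
) -> List[str]:
    """Return the expected variable names that a checkpoint does not contain.

    `checkpoint_vars` uses the (name, shape) pair format returned by
    tf.train.list_variables(path); `expected` would be the graph's variable
    names, e.g. [v.op.name for v in model_vars] in TF 1.x.
    """
    available = {name for name, _shape in checkpoint_vars}
    return sorted(set(expected) - available)


# Example wiring (requires TensorFlow 1.x, so left as comments):
#   import tensorflow as tf
#   print(data_files("dataset/pku/dev"))
#   ckpt = tf.train.list_variables("result/all/001/model.tf")
#   print(missing_checkpoint_keys([v.op.name for v in model_vars], ckpt))
```

If missing_checkpoint_keys reports ["transitions"], the checkpoint in result/all/001 was likely saved from a graph without that variable; training into a fresh --model_dir would avoid restoring it.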
