This repository has been archived by the owner on Sep 16, 2024. It is now read-only.

result about run inference.py #163

Open
sunbin1205 opened this issue Jan 23, 2018 · 8 comments

@sunbin1205

Running inference.py on a test image, the segmentation result shows no complete outline of the object, only a lot of colored spots. How can I solve this? Looking forward to your reply!
@DrSleep

@sunbin1205
Author

According to the link provided in the README, there is only the deeplab_resnet_init.ckpt file, not the pre-trained model file. Is this the reason the inference is inaccurate? Could you please provide the pre-trained model? Thank you very much!
@DrSleep

@DrSleep
Owner

DrSleep commented Jan 24, 2018

there is deeplab_resnet.ckpt provided; you need to download it and run inference with it

@sunbin1205
Author

I am very glad to receive your reply. The previous problem has been solved!
But there is another problem: I downloaded the SegmentationClassAug dataset into the dataset directory, and I get this error when I run train.py:
step 62 loss = 1.486, (62.292 sec/step)
step 63 loss = 1.632, (61.552 sec/step)
2018-01-25 02:10:06.684534: W tensorflow/core/framework/op_kernel.cc:1152] Not found: ./dataset/VOCdevkit/JPEGImages/2007_000032.jpg
step 64 loss = 1.602, (62.308 sec/step)
step 65 loss = 1.555, (61.499 sec/step)
step 66 loss = 1.723, (62.328 sec/step)
Traceback (most recent call last):
File "train.py", line 258, in
main()
File "train.py", line 251, in main
loss_value, _ = sess.run([reduced_loss, train_op], feed_dict=feed_dict)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 5)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

Caused by op u'create_inputs/batch', defined at:
File "train.py", line 258, in
main()
File "train.py", line 146, in main
image_batch, label_batch = reader.dequeue(args.batch_size)
File "/media/Linux/sun/Segmentation/tensorflow-deeplab-resnet-master/deeplab_resnet/image_reader.py", line 179, in dequeue
num_elements)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 917, in batch
name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 712, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1328, in _queue_dequeue_many_v2
timeout_ms=timeout_ms, name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in init
self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 5)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]
This seems to be the result of an incorrect dataset. I tried deleting the offending paths from train.txt, but there seem to be too many of them, and I don't think deleting paths is the correct approach. What is the reason for this?
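
The "Not found" warning above suggests that some paths listed in the data list do not exist under the data directory. Here is a minimal sketch to find such entries; the paths and the "image_path label_path" list format are assumptions based on the error message and the train.txt excerpt later in this thread, not code taken from the repository:

import os

# Hypothetical locations; adjust to your own setup.
data_dir = './dataset/VOCdevkit'
data_list = './dataset/train.txt'

missing = []
with open(data_list) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        # Each line is assumed to hold "image_path label_path", both relative
        # to data_dir, e.g. "/JPEGImages/2007_000032.jpg /SegmentationClassAug/2007_000032.png".
        for rel_path in line.split():
            full_path = data_dir + rel_path
            if not os.path.isfile(full_path):
                missing.append(full_path)

print('%d missing files' % len(missing))
for path in missing[:20]:
    print(path)

If this reports missing files, that would explain the reader threads dying and the input queue closing with fewer elements than a full batch, as in the OutOfRangeError above.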

@DrSleep
Owner

DrSleep commented Jan 24, 2018 via email

@sunbin1205
Author

sunbin1205 commented Jan 25, 2018

@DrSleep I'm sorry to disturb you again. Running python fine_tune.py --not-restore-last gives an error. It seems that the jpg and png files can be loaded, but what is the problem? I guess the reason may be CUDA?

2018-01-26 03:50:20.103520: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-01-26 03:50:20.127612: E tensorflow/stream_executor/cuda/cuda_driver.cc:405] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2018-01-26 03:50:20.127687: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: zbw-System-Product-Name
2018-01-26 03:50:20.127700: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: zbw-System-Product-Name
2018-01-26 03:50:20.127740: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 387.26.0
2018-01-26 03:50:20.127773: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.90 Tue Sep 19 19:17:35 PDT 2017
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
"""
2018-01-26 03:50:20.127796: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 384.90.0
2018-01-26 03:50:20.127807: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:303] kernel version 384.90.0 does not match DSO version 387.26.0 -- cannot find working devices in this configuration
Restored model parameters from ./deeplab_resnet.ckpt
Traceback (most recent call last):
File "fine_tune.py", line 207, in
main()
File "fine_tune.py", line 196, in main
loss_value, images, labels, preds, summary, _ = sess.run([reduced_loss, image_batch, label_batch, pred, total_summary, optim])
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 2, current size 0)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

Caused by op u'create_inputs/batch', defined at:
File "fine_tune.py", line 207, in
main()
File "fine_tune.py", line 125, in main
image_batch, label_batch = reader.dequeue(args.batch_size)
File "/media/Linux/sun/Segmentation/building_segmentation/deeplab_resnet/image_reader.py", line 179, in dequeue
num_elements)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 917, in batch
name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 712, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1328, in _queue_dequeue_many_v2
timeout_ms=timeout_ms, name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in init
self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 2, current size 0)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

@Jingyao12

Hi @DrSleep,

I also meet this problem when I run inference.py. I retrained the model using the following setup; because the original batch size ran out of memory, I changed it to 4:
BATCH_SIZE = 4
DATA_DIRECTORY = '/PASCAL/SemanticImg'
DATA_LIST_PATH = './dataset/train.txt'
IGNORE_LABEL = 255
INPUT_SIZE = '321,321'
LEARNING_RATE = 2.5e-4
MOMENTUM = 0.9
NUM_CLASSES = 21
NUM_STEPS = 20001
POWER = 0.9
RANDOM_SEED = 1234
RESTORE_FROM = './deeplab_resnet.ckpt'
SAVE_NUM_IMAGES = 2
SAVE_PRED_EVERY = 1000
SNAPSHOT_DIR = './snapshots/'
WEIGHT_DECAY = 0.0005

the train.txt is as follows:

/JPEGImages/2007_000032.jpg /SegmentationClass/2007_000032.png
/JPEGImages/2007_000039.jpg /SegmentationClass/2007_000039.png
/JPEGImages/2007_000063.jpg /SegmentationClass/2007_000063.png
/JPEGImages/2007_000068.jpg /SegmentationClass/2007_000068.png
/JPEGImages/2007_000121.jpg /SegmentationClass/2007_000121.png
/JPEGImages/2007_000170.jpg /SegmentationClass/2007_000170.png
/JPEGImages/2007_000241.jpg /SegmentationClass/2007_000241.png
/JPEGImages/2007_000243.jpg /SegmentationClass/2007_000243.png
/JPEGImages/2007_000250.jpg /SegmentationClass/2007_000250.png
/JPEGImages/2007_000256.jpg /SegmentationClass/2007_000256.png
/JPEGImages/2007_000333.jpg /SegmentationClass/2007_000333.png
/JPEGImages/2007_000363.jpg /SegmentationClass/2007_000363.png
/JPEGImages/2007_000364.jpg /SegmentationClass/2007_000364.png
/JPEGImages/2007_000392.jpg /SegmentationClass/2007_000392.png
/JPEGImages/2007_000480.jpg /SegmentationClass/2007_000480.png
/JPEGImages/2007_000504.jpg /SegmentationClass/2007_000504.png
/JPEGImages/2007_000515.jpg /SegmentationClass/2007_000515.png
/JPEGImages/2007_000528.jpg /SegmentationClass/2007_000528.png
/JPEGImages/2007_000549.jpg /SegmentationClass/2007_000549.png
/JPEGImages/2007_000584.jpg /SegmentationClass/2007_000584.png
/JPEGImages/2007_000645.jpg /SegmentationClass/2007_000645.png
/JPEGImages/2007_000648.jpg /SegmentationClass/2007_000648.png
/JPEGImages/2007_000713.jpg /SegmentationClass/2007_000713.png
/JPEGImages/2007_000720.jpg /SegmentationClass/2007_000720.png
/JPEGImages/2007_000733.jpg /SegmentationClass/2007_000733.png
/JPEGImages/2007_000738.jpg /SegmentationClass/2007_000738.png
/JPEGImages/2007_000768.jpg /SegmentationClass/2007_000768.png
.
.
.
The total number of training images is 1464; the image names are taken from the VOC2012 train list.

The training log is as follows:
Restored model parameters from ./deeplab_resnet.ckpt
The checkpoint has been created.
step 0 loss = 1.268, (16.858 sec/step)
step 1 loss = 3.971, (1.734 sec/step)
step 2 loss = 1.308, (1.339 sec/step)
step 3 loss = 2.991, (1.329 sec/step)
step 4 loss = 1.252, (1.346 sec/step)
step 5 loss = 1.344, (1.335 sec/step)
step 6 loss = 8.126, (1.331 sec/step)
step 7 loss = 4.652, (1.339 sec/step)
step 8 loss = 5.097, (1.339 sec/step)
step 9 loss = 1.318, (1.334 sec/step)
step 10 loss = 1.769, (1.353 sec/step)
.
.
.
step 19990 loss = 1.191, (1.419 sec/step)
step 19991 loss = 1.183, (1.425 sec/step)
step 19992 loss = 1.197, (1.424 sec/step)
step 19993 loss = 1.183, (1.422 sec/step)
step 19994 loss = 1.184, (1.408 sec/step)
step 19995 loss = 1.192, (1.416 sec/step)
step 19996 loss = 1.183, (1.419 sec/step)
step 19997 loss = 1.183, (1.414 sec/step)
step 19998 loss = 1.183, (1.420 sec/step)
step 19999 loss = 1.183, (1.437 sec/step)
The checkpoint has been created.
step 20000 loss = 1.183, (12.276 sec/step)

It looks normal, right?

But when I run inference.py, the results are all background, even when I use a training image.
By contrast, if I use the provided deeplab_resnet.ckpt for inference, the output contains both background and segmented regions.

Do you have any suggestion about this error?

Thank you!

@DrSleep
Owner

DrSleep commented Apr 8, 2018

@sunbin1205

E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:303] kernel version 384.90.0 does not match DSO version 387.26.0 -- cannot find working devices in this configuration

Something with the drivers I assume. Can't help with that one.
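
To confirm whether TensorFlow can see a working GPU once the drivers are sorted out, one quick check (plain TensorFlow 1.x API, nothing specific to this repository) is:

from tensorflow.python.client import device_lib

# Prints the devices TensorFlow can use; with the kernel/DSO version
# mismatch above, only the CPU device will be listed.
print(device_lib.list_local_devices())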

@minnieyao
You are restoring from the already pre-trained model, hence the learning rate might be too high. Try to restore from the init model and check the progress in TensorBoard; it should be alright.
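
For reference, here is a small sketch of what learning rate a poly decay schedule applies at each step with the settings posted above (lr = base_lr * (1 - step / num_steps) ** power). The formula is an assumption based on the LEARNING_RATE, POWER and NUM_STEPS values in that config; check train.py for the exact expression the repository uses.

base_lr, power, num_steps = 2.5e-4, 0.9, 20001

def poly_lr(step):
    # Polynomial decay from base_lr down toward 0 over num_steps.
    return base_lr * (1.0 - float(step) / num_steps) ** power

for step in (0, 5000, 10000, 19999):
    print('step %d: lr = %.6g' % (step, poly_lr(step)))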

@RaphaelDuan

RaphaelDuan commented Aug 26, 2019

Hi, @minnieyao.
I got the same problem. Did you solve it?
I restored from deeplab_resnet_init.ckpt.
