This repository has been archived by the owner on Sep 16, 2024. It is now read-only.

result about run inference.py #163

Open
sunbin1205 opened this issue Jan 23, 2018 · 8 comments

@sunbin1205

Running inference.py on a test image, the segmentation result shows no complete outline of the object, only a lot of colored spots. How can I solve this? Looking forward to your reply!
@DrSleep

@sunbin1205
Author

According to the link provided in the README, there is only the deeplab_resnet_init.ckpt file, not the pre-trained model file. Is this the reason the inference is inaccurate? Could you please provide the pre-trained model? Thank you very much!
@DrSleep

@DrSleep
Owner

DrSleep commented Jan 24, 2018

there is deeplab_resnet.ckpt provided; you need to download it and run inference with it

@sunbin1205
Author

I am very glad to receive your reply. The previous problem has been solved!
But there is another problem: I downloaded the SegmentationClassAug dataset into the dataset directory, and I get this error when I run train.py:
step 62 loss = 1.486, (62.292 sec/step)
step 63 loss = 1.632, (61.552 sec/step)
2018-01-25 02:10:06.684534: W tensorflow/core/framework/op_kernel.cc:1152] Not found: ./dataset/VOCdevkit/JPEGImages/2007_000032.jpg
step 64 loss = 1.602, (62.308 sec/step)
step 65 loss = 1.555, (61.499 sec/step)
step 66 loss = 1.723, (62.328 sec/step)
Traceback (most recent call last):
File "train.py", line 258, in
main()
File "train.py", line 251, in main
loss_value, _ = sess.run([reduced_loss, train_op], feed_dict=feed_dict)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 5)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

Caused by op u'create_inputs/batch', defined at:
File "train.py", line 258, in
main()
File "train.py", line 146, in main
image_batch, label_batch = reader.dequeue(args.batch_size)
File "/media/Linux/sun/Segmentation/tensorflow-deeplab-resnet-master/deeplab_resnet/image_reader.py", line 179, in dequeue
num_elements)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 917, in batch
name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 712, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1328, in _queue_dequeue_many_v2
timeout_ms=timeout_ms, name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in init
self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 5)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]
This seems to be the result of an incorrect dataset. I tried deleting the offending paths from train.txt, but there seem to be too many of them, and I don't think deleting paths is the correct approach. What is the reason for this?
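
The "Not found" warning above suggests that some paths listed in the data list do not exist under the data directory. Here is a minimal sketch to find such entries; the paths and the "image_path label_path" list format are assumptions based on the error message and the train.txt excerpt later in this thread, not code taken from the repository:

import os

# Hypothetical locations; adjust to your own setup.
data_dir = './dataset/VOCdevkit'
data_list = './dataset/train.txt'

missing = []
with open(data_list) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        # Each line is assumed to hold "image_path label_path", both relative
        # to data_dir, e.g. "/JPEGImages/2007_000032.jpg /SegmentationClassAug/2007_000032.png".
        for rel_path in line.split():
            full_path = data_dir + rel_path
            if not os.path.isfile(full_path):
                missing.append(full_path)

print('%d missing files' % len(missing))
for path in missing[:20]:
    print(path)

If this reports missing files, that would explain the reader threads dying and the input queue closing with fewer elements than a full batch, as in the OutOfRangeError above.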

@DrSleep
Owner

DrSleep commented Jan 24, 2018 via email

@sunbin1205
Author

sunbin1205 commented Jan 25, 2018

@DrSleep I'm sorry to disturb you again. Running python fine_tune.py --not-restore-last gives an error. It seems that the jpg and png files can be loaded, but what is the problem? I guess the reason may be CUDA?

2018-01-26 03:50:20.103520: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-01-26 03:50:20.127612: E tensorflow/stream_executor/cuda/cuda_driver.cc:405] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2018-01-26 03:50:20.127687: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: zbw-System-Product-Name
2018-01-26 03:50:20.127700: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: zbw-System-Product-Name
2018-01-26 03:50:20.127740: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 387.26.0
2018-01-26 03:50:20.127773: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.90 Tue Sep 19 19:17:35 PDT 2017
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
"""
2018-01-26 03:50:20.127796: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 384.90.0
2018-01-26 03:50:20.127807: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:303] kernel version 384.90.0 does not match DSO version 387.26.0 -- cannot find working devices in this configuration
Restored model parameters from ./deeplab_resnet.ckpt
Traceback (most recent call last):
File "fine_tune.py", line 207, in
main()
File "fine_tune.py", line 196, in main
loss_value, images, labels, preds, summary, _ = sess.run([reduced_loss, image_batch, label_batch, pred, total_summary, optim])
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 2, current size 0)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

Caused by op u'create_inputs/batch', defined at:
File "fine_tune.py", line 207, in
main()
File "fine_tune.py", line 125, in main
image_batch, label_batch = reader.dequeue(args.batch_size)
File "/media/Linux/sun/Segmentation/building_segmentation/deeplab_resnet/image_reader.py", line 179, in dequeue
num_elements)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 917, in batch
name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 712, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1328, in _queue_dequeue_many_v2
timeout_ms=timeout_ms, name=name)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in init
self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 2, current size 0)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

@Jingyao12

Hi @DrSleep,

I also meet this problem when I run inference.py. I retrained the model using the following setup; because the original batch size ran out of memory, I changed it to 4:
BATCH_SIZE = 4
DATA_DIRECTORY = '/PASCAL/SemanticImg'
DATA_LIST_PATH = './dataset/train.txt'
IGNORE_LABEL = 255
INPUT_SIZE = '321,321'
LEARNING_RATE = 2.5e-4
MOMENTUM = 0.9
NUM_CLASSES = 21
NUM_STEPS = 20001
POWER = 0.9
RANDOM_SEED = 1234
RESTORE_FROM = './deeplab_resnet.ckpt'
SAVE_NUM_IMAGES = 2
SAVE_PRED_EVERY = 1000
SNAPSHOT_DIR = './snapshots/'
WEIGHT_DECAY = 0.0005

the train.txt is as follows:

/JPEGImages/2007_000032.jpg /SegmentationClass/2007_000032.png
/JPEGImages/2007_000039.jpg /SegmentationClass/2007_000039.png
/JPEGImages/2007_000063.jpg /SegmentationClass/2007_000063.png
/JPEGImages/2007_000068.jpg /SegmentationClass/2007_000068.png
/JPEGImages/2007_000121.jpg /SegmentationClass/2007_000121.png
/JPEGImages/2007_000170.jpg /SegmentationClass/2007_000170.png
/JPEGImages/2007_000241.jpg /SegmentationClass/2007_000241.png
/JPEGImages/2007_000243.jpg /SegmentationClass/2007_000243.png
/JPEGImages/2007_000250.jpg /SegmentationClass/2007_000250.png
/JPEGImages/2007_000256.jpg /SegmentationClass/2007_000256.png
/JPEGImages/2007_000333.jpg /SegmentationClass/2007_000333.png
/JPEGImages/2007_000363.jpg /SegmentationClass/2007_000363.png
/JPEGImages/2007_000364.jpg /SegmentationClass/2007_000364.png
/JPEGImages/2007_000392.jpg /SegmentationClass/2007_000392.png
/JPEGImages/2007_000480.jpg /SegmentationClass/2007_000480.png
/JPEGImages/2007_000504.jpg /SegmentationClass/2007_000504.png
/JPEGImages/2007_000515.jpg /SegmentationClass/2007_000515.png
/JPEGImages/2007_000528.jpg /SegmentationClass/2007_000528.png
/JPEGImages/2007_000549.jpg /SegmentationClass/2007_000549.png
/JPEGImages/2007_000584.jpg /SegmentationClass/2007_000584.png
/JPEGImages/2007_000645.jpg /SegmentationClass/2007_000645.png
/JPEGImages/2007_000648.jpg /SegmentationClass/2007_000648.png
/JPEGImages/2007_000713.jpg /SegmentationClass/2007_000713.png
/JPEGImages/2007_000720.jpg /SegmentationClass/2007_000720.png
/JPEGImages/2007_000733.jpg /SegmentationClass/2007_000733.png
/JPEGImages/2007_000738.jpg /SegmentationClass/2007_000738.png
/JPEGImages/2007_000768.jpg /SegmentationClass/2007_000768.png
.
.
.
The total number of training images is 1464; the image names are taken from the VOC2012 train list.

The training log is as follows:
Restored model parameters from ./deeplab_resnet.ckpt
The checkpoint has been created.
step 0 loss = 1.268, (16.858 sec/step)
step 1 loss = 3.971, (1.734 sec/step)
step 2 loss = 1.308, (1.339 sec/step)
step 3 loss = 2.991, (1.329 sec/step)
step 4 loss = 1.252, (1.346 sec/step)
step 5 loss = 1.344, (1.335 sec/step)
step 6 loss = 8.126, (1.331 sec/step)
step 7 loss = 4.652, (1.339 sec/step)
step 8 loss = 5.097, (1.339 sec/step)
step 9 loss = 1.318, (1.334 sec/step)
step 10 loss = 1.769, (1.353 sec/step)
.
.
.
step 19990 loss = 1.191, (1.419 sec/step)
step 19991 loss = 1.183, (1.425 sec/step)
step 19992 loss = 1.197, (1.424 sec/step)
step 19993 loss = 1.183, (1.422 sec/step)
step 19994 loss = 1.184, (1.408 sec/step)
step 19995 loss = 1.192, (1.416 sec/step)
step 19996 loss = 1.183, (1.419 sec/step)
step 19997 loss = 1.183, (1.414 sec/step)
step 19998 loss = 1.183, (1.420 sec/step)
step 19999 loss = 1.183, (1.437 sec/step)
The checkpoint has been created.
step 20000 loss = 1.183, (12.276 sec/step)

It looks normal, right?

But when I run inference.py, the results are all background, even when I use a training image.
By contrast, if I use the provided deeplab_resnet.ckpt for inference, the output contains both background and segmented regions.

Do you have any suggestion about this error?

Thank you!

@DrSleep
Owner

DrSleep commented Apr 8, 2018

@sunbin1205

E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:303] kernel version 384.90.0 does not match DSO version 387.26.0 -- cannot find working devices in this configuration

Something with the drivers I assume. Can't help with that one.
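
To confirm whether TensorFlow can see a working GPU once the drivers are sorted out, one quick check (plain TensorFlow 1.x API, nothing specific to this repository) is:

from tensorflow.python.client import device_lib

# Prints the devices TensorFlow can use; with the kernel/DSO version
# mismatch above, only the CPU device will be listed.
print(device_lib.list_local_devices())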

@minnieyao
You are restoring from the already pre-trained model, hence the learning rate might be too high. Try to restore from the init model and check the progress in TensorBoard; it should be alright.
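
For reference, here is a small sketch of what learning rate a poly decay schedule applies at each step with the settings posted above (lr = base_lr * (1 - step / num_steps) ** power). The formula is an assumption based on the LEARNING_RATE, POWER and NUM_STEPS values in that config; check train.py for the exact expression the repository uses.

base_lr, power, num_steps = 2.5e-4, 0.9, 20001

def poly_lr(step):
    # Polynomial decay from base_lr down toward 0 over num_steps.
    return base_lr * (1.0 - float(step) / num_steps) ** power

for step in (0, 5000, 10000, 19999):
    print('step %d: lr = %.6g' % (step, poly_lr(step)))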

@RaphaelDuan

RaphaelDuan commented Aug 26, 2019

Hi, @minnieyao.
I got the same problem. Did you solve it?
I restored from deeplab_resnet_init.ckpt.
