
Question about occlusion estimation #52

Open
JamesYang110043 opened this issue Jul 26, 2024 · 5 comments

Comments

@JamesYang110043

Hi @hurjunhwa

First, I'd like to thank you for your excellent work on this project. It has been incredibly valuable for my research.

I have a question regarding the ablation study in Table 1 of your paper. Specifically, I'm curious about the "Occ" module. Is the module in Table 1 the same as the one highlighted in red in Figure 5? If so, could you explain why incorporating the "Occ" module improves the performance of optical flow estimation?

Thank you for your time and assistance.

[Attached: two screenshots]

@hurjunhwa
Collaborator

Hi James,
Thanks for your interest in our work. Yes, the Occ module in Table 1 corresponds to the occlusion decoder in Fig. 5.

Optical flow and occlusion estimation are complementary tasks. The intermediate estimation of occlusion at the previous pyramid level is input to the flow decoder at the next level and provides a useful cue for flow estimation.

Also, gradients that are backpropagated from the occlusion decoder affect the feature encoder and can make features more discriminative for flow estimation. Hope this helped!

@JamesYang110043
Author

Thank you for your explanation. However, I'm having trouble understanding the part where you mentioned, "The intermediate estimation of occlusion at the previous pyramid level is input to the flow decoder at the next level and provides a useful cue for flow estimation."

From my understanding of the code, the optical flow and occlusion are calculated separately at each pyramid level, and I didn't see the occlusion from the previous level being used as input for the optical flow in the next level. Could you please clarify this part?

Or do you mean that although the occlusion estimation is not explicitly passed directly to the flow decoder, the predictions at each level (including optical flow and occlusion) are part of the input features for the next level? This provides an indirect cue that helps improve the accuracy of the estimation.

https://github.com/visinf/irr/blob/dacd07b1dc963fb8d3db7c75b562691af33f47b2/models/flownet1s_irr_occ.py#L80C9-L124C62

    # Flow Decoder
    predict_flow6        = self._predict_flow6(conv6_1)


    upsampled_flow6_to_5 = self._upsample_flow6_to_5(predict_flow6)
    deconv5              = self._deconv5(conv6_1)
    concat5              = concatenate_as((conv5_1, deconv5, upsampled_flow6_to_5), conv5_1, dim=1)
    predict_flow5        = self._predict_flow5(concat5)


    upsampled_flow5_to_4 = self._upsample_flow5_to_4(predict_flow5)
    deconv4              = self._deconv4(concat5)
    concat4              = concatenate_as((conv4_1, deconv4, upsampled_flow5_to_4), conv4_1, dim=1)
    predict_flow4        = self._predict_flow4(concat4)


    upsampled_flow4_to_3 = self._upsample_flow4_to_3(predict_flow4)
    deconv3              = self._deconv3(concat4)
    concat3              = concatenate_as((conv3_1, deconv3, upsampled_flow4_to_3), conv3_1, dim=1)
    predict_flow3        = self._predict_flow3(concat3)


    upsampled_flow3_to_2 = self._upsample_flow3_to_2(predict_flow3)
    deconv2              = self._deconv2(concat3)
    concat2              = concatenate_as((conv2_im1, deconv2, upsampled_flow3_to_2), conv2_im1, dim=1)
    predict_flow2        = self._predict_flow2(concat2)


    # Occ Decoder
    predict_occ6 = self._predict_occ6(conv6_1)


    upsampled_occ6_to_5 = self._upsample_occ6_to_5(predict_occ6)
    deconv_occ5         = self._deconv_occ5(conv6_1)
    concat_occ5         = concatenate_as((conv5_1, deconv_occ5, upsampled_occ6_to_5), conv5_1, dim=1)
    predict_occ5        = self._predict_occ5(concat_occ5)


    upsampled_occ5_to_4 = self._upsample_occ5_to_4(predict_occ5)
    deconv_occ4         = self._deconv_occ4(concat_occ5)
    concat_occ4         = concatenate_as((conv4_1, deconv_occ4, upsampled_occ5_to_4), conv4_1, dim=1)
    predict_occ4        = self._predict_occ4(concat_occ4)


    upsampled_occ4_to_3 = self._upsample_occ4_to_3(predict_occ4)
    deconv_occ3         = self._deconv_occ3(concat_occ4)
    concat_occ3         = concatenate_as((conv3_1, deconv_occ3, upsampled_occ4_to_3), conv3_1, dim=1)
    predict_occ3        = self._predict_occ3(concat_occ3)


    upsampled_occ3_to_2 = self._upsample_occ3_to_2(predict_occ3)
    deconv_occ2         = self._deconv_occ2(concat_occ3)
    concat_occ2         = concatenate_as((conv2_im1, deconv_occ2, upsampled_occ3_to_2), conv2_im1, dim=1)
    predict_occ2        = self._predict_occ2(concat_occ2)
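
One way to check the reading above is to transcribe the data dependencies from this snippet into a small graph and walk it. The sketch below is only an illustration in plain Python, not code from the repository: the `DEPS` table and the `ancestors` helper are hypothetical names, and the `deconv*` and `upsampled_*` intermediates are folded into their inputs for brevity.

```python
# Data dependencies transcribed by hand from the snippet above (output -> inputs).
# Encoder features (conv*_1, conv2_im1) are leaves, so they have no entry.
# Assumption: deconv* and upsampled_* nodes are collapsed into their inputs.
DEPS = {
    "predict_flow6": ["conv6_1"],
    "concat5": ["conv5_1", "conv6_1", "predict_flow6"],
    "predict_flow5": ["concat5"],
    "concat4": ["conv4_1", "concat5", "predict_flow5"],
    "predict_flow4": ["concat4"],
    "concat3": ["conv3_1", "concat4", "predict_flow4"],
    "predict_flow3": ["concat3"],
    "concat2": ["conv2_im1", "concat3", "predict_flow3"],
    "predict_flow2": ["concat2"],
    "predict_occ6": ["conv6_1"],
    "concat_occ5": ["conv5_1", "conv6_1", "predict_occ6"],
    "predict_occ5": ["concat_occ5"],
    "concat_occ4": ["conv4_1", "concat_occ5", "predict_occ5"],
    "predict_occ4": ["concat_occ4"],
    "concat_occ3": ["conv3_1", "concat_occ4", "predict_occ4"],
    "predict_occ3": ["concat_occ3"],
    "concat_occ2": ["conv2_im1", "concat_occ3", "predict_occ3"],
    "predict_occ2": ["concat_occ2"],
}


def ancestors(node, deps=DEPS):
    """Return every tensor name that `node` transitively depends on."""
    seen = set()
    stack = list(deps.get(node, []))
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(deps.get(name, []))
    return seen


flow_inputs = ancestors("predict_flow2")
occ_inputs = ancestors("predict_occ2")

# No occlusion tensor feeds the final flow prediction in this snippet.
flow_uses_occ = any("occ" in name for name in flow_inputs)

# The only tensors both decoders read are the encoder features.
shared = sorted(flow_inputs & occ_inputs)
print(flow_uses_occ, shared)
```

Under this transcription, the two decoders only meet at the shared encoder features, which matches the observation that no occlusion prediction is passed to the flow decoder within this part of the forward pass.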

@JamesYang110043
Author

Hi @hurjunhwa,
Could you please explain this part? It would be very helpful to me, thank you.

@hurjunhwa
Collaborator

Hi @JamesYang110043, sorry for the late reply!
Yes, you are totally right: the two decoders are completely separate. At the upsampling layer at the end of the network in Fig. 5, the estimated flow is part of the input, and that is where flow explicitly helps refine the occlusion.

In general, gradients that are backpropagated from the occlusion decoder update the feature encoder as well, so that's where the occlusion decoder indirectly helps the flow estimation. I think feature visualization might give a better explanation.
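
The indirect effect described here can be seen in a toy setting. The following is a minimal sketch in plain Python, assuming a scalar "encoder" weight `w` shared by a flow head `a` and an occlusion head `b`, with squared-error losses; the function `encoder_grad` and all of its parameters are hypothetical and are not taken from the IRR code, which uses its own networks and loss functions.

```python
def encoder_grad(x, w, a, b, flow_gt, occ_gt, occ_weight):
    """Gradient of a joint loss w.r.t. the shared encoder weight `w`.

    Toy scalar model (an assumption for illustration only):
        feature = w * x
        flow    = a * feature
        occ     = b * feature
        loss    = (flow - flow_gt)**2 + occ_weight * (occ - occ_gt)**2
    Returns the two chain-rule contributions separately.
    """
    feature = w * x
    flow = a * feature
    occ = b * feature
    # Contribution that reaches the encoder through the flow head.
    grad_from_flow = 2.0 * (flow - flow_gt) * a * x
    # Contribution that reaches the encoder through the occlusion head.
    grad_from_occ = occ_weight * 2.0 * (occ - occ_gt) * b * x
    return grad_from_flow, grad_from_occ


# The flow prediction is already exact here, so the flow loss gives no signal,
# yet the occlusion loss still pushes on the shared encoder weight.
g_flow, g_occ = encoder_grad(
    x=1.0, w=0.5, a=2.0, b=1.0, flow_gt=1.0, occ_gt=1.0, occ_weight=1.0
)
print(g_flow, g_occ)  # 0.0 -1.0
```

In this toy case the encoder weight keeps changing because of the occlusion term alone, so the features seen by the flow head change too, even though no occlusion output is ever fed into the flow head.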

Thanks!
Junhwa

@JamesYang110043
Author

Thank you for your reply, I understand it now. This has been very helpful to me.
