Why Use a Third-Party Implementation of `CameraPredictor`? #692

PairZhu · 2024-10-24T06:48:08Z

Search before asking

I have searched the X-AnyLabeling Docs and issues and found no similar questions.

Question

I have been trying to modify the SAM2Video-related code to enable adding annotations on arbitrary frames. However, I was unable to find documentation on the SAM2CameraPredictor, leaving me to interpret its functionality based solely on the function names. During this process, I encountered several bugs.

After further investigation, I found that this code doesn’t seem to originate from the official SAM2 repository but instead comes from a third-party repository. The added functionality from this source appears to be immature and unstable. Despite multiple modifications to the code, I was still unable to achieve accurate predictions.

One crucial issue is that this code seems to be designed for real-time camera applications. However, for annotation tasks on non-real-time images, there’s no need to use SAM2CameraPredictor. While it may have some benefit in reducing startup time for longer sequences, it should at the very least be an optional feature, not a default one.
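
To make the "optional, not default" point concrete, here is a minimal sketch of what an opt-in switch could look like. Every name in it (`Sam2VideoConfig`, `use_camera_predictor`, `select_backend`) is hypothetical and is not taken from X-AnyLabeling; it only illustrates defaulting to the offline video path and enabling the camera-style predictor explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    """Hypothetical labels for the two predictor styles discussed in this issue."""
    VIDEO = "video"    # offline, whole-sequence predictor
    CAMERA = "camera"  # streaming, real-time style predictor


@dataclass(frozen=True)
class Sam2VideoConfig:
    # Assumption: the camera-style predictor is opt-in, so the default is False.
    use_camera_predictor: bool = False


def select_backend(config: Sam2VideoConfig) -> Backend:
    """Pick the backend from the config; the offline path is the default."""
    return Backend.CAMERA if config.use_camera_predictor else Backend.VIDEO
```

With a switch like this, users with long sequences who care about startup time could still opt in, while ordinary annotation sessions would use the offline path.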

Additional

No response

CVHub520 · 2024-10-24T07:40:48Z

Hey there! @PairZhu,

Thanks for bringing this up - I completely understand your frustration with the current CameraPredictor implementation.

You're spot on about this being from a third-party source rather than the official SAM2 repo.

Actually, the official SAM2 implementation uses propagate_in_video to process entire video chunks at once (if my memory serves me right 👀), which is great for batch processing but less suited to what we're trying to do with interactive annotations.
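
For anyone following along, here is a rough sketch of that flow as I understand the official video predictor: call `init_state`, add a prompt with `add_new_points_or_box` on whichever frame carries it, then iterate over `propagate_in_video`. The `FakePredictor` class and the `track` helper are hypothetical stand-ins so the snippet runs without the model or a checkpoint. The method names mirror the official ones, but signatures can differ between SAM2 releases, so treat them as assumptions and check the installed version.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class FakePredictor:
    """Hypothetical stand-in that mimics the call shape of the video predictor."""

    def init_state(self, video_path: str, num_frames: int = 5) -> dict:
        # The real predictor loads frames from video_path; num_frames is a
        # stand-in parameter that exists only in this fake.
        return {"video_path": video_path, "num_frames": num_frames, "prompts": {}}

    def add_new_points_or_box(self, inference_state, frame_idx, obj_id, points, labels):
        # Record the prompt on the chosen frame; the real call also returns masks.
        inference_state["prompts"].setdefault(frame_idx, {})[obj_id] = (points, labels)
        return frame_idx, [obj_id], None

    def propagate_in_video(self, inference_state, reverse=False):
        prompts = inference_state["prompts"]
        start = min(prompts)  # begin at the earliest prompted frame
        last = inference_state["num_frames"] - 1
        order = range(start, -1, -1) if reverse else range(start, last + 1)
        obj_ids = sorted({oid for per_frame in prompts.values() for oid in per_frame})
        for frame_idx in order:
            yield frame_idx, obj_ids, None  # None stands in for mask logits


def track(predictor, video_path: str, frame_idx: int, obj_id: int,
          points: List[Point], labels: List[int]) -> Dict[int, tuple]:
    """Prompt one object on an arbitrary frame, then propagate in both directions."""
    state = predictor.init_state(video_path=video_path)
    predictor.add_new_points_or_box(
        state, frame_idx=frame_idx, obj_id=obj_id, points=points, labels=labels
    )
    results: Dict[int, tuple] = {}
    for reverse in (False, True):
        for idx, obj_ids, masks in predictor.propagate_in_video(state, reverse=reverse):
            results[idx] = (obj_ids, masks)
    return results
```

Running the propagation once forward and once in reverse is what lets a prompt placed in the middle of a sequence reach the frames before it.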

Here's what I'm thinking would work better: instead of using this third-party CameraPredictor (which is really intended for real-time camera applications), we should let users:

- Track any objects they're interested in on any frame
- Make adjustments whenever needed
- Have more control over the whole annotation process

I'm actively working on improving this software when I have time, but honestly, this is something that would really benefit from community input. The current version is just a basic implementation to get things working - kind of like a proof of concept.

Would love to hear your thoughts on this approach! Have you tried any other methods that might work better? I'm definitely open to suggestions and would be happy to discuss different solutions. Let's make this work better for everyone's annotation needs! 😊

PairZhu · 2024-10-24T08:09:01Z

My idea is to allow users to manually annotate multiple frames and then use the "Run (I)" button to run the model on a specific frame, or use the "Auto run all images at once" option for batch inference. Additionally, new annotations can be added at any time. This approach aligns with the intuitive nature of annotation tasks and is more convenient.
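
A minimal sketch of how such a workflow could keep its state, assuming prompts are stored per frame and handed to whatever predictor is in use. `PromptStore`, `run_frame`, and `run_all` are hypothetical names, not part of X-AnyLabeling or SAM2, and `segment` is a placeholder callback standing in for the real model call. Picking the nearest annotated frame is a deliberate simplification; a real video predictor would condition on all prompted frames.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Point = Tuple[float, float, int]  # x, y, label (1 = foreground, 0 = background)
Prompts = Dict[int, List[Point]]  # obj_id -> points
Segment = Callable[[int, int, Prompts], Any]  # (target frame, source frame, prompts)


class PromptStore:
    """Holds user prompts keyed by frame index, then by object id."""

    def __init__(self) -> None:
        self._prompts: Dict[int, Prompts] = defaultdict(dict)

    def add(self, frame_idx: int, obj_id: int, point: Point) -> None:
        # New annotations can be added at any time, on any frame.
        self._prompts[frame_idx].setdefault(obj_id, []).append(point)

    def annotated_frames(self) -> List[int]:
        return sorted(self._prompts)

    def nearest_annotated(self, frame_idx: int) -> int:
        frames = self.annotated_frames()
        if not frames:
            raise ValueError("no frame has been annotated yet")
        # Ties are resolved in favour of the earlier frame.
        return min(frames, key=lambda f: (abs(f - frame_idx), f))

    def prompts_for(self, frame_idx: int) -> Prompts:
        return dict(self._prompts.get(frame_idx, {}))


def run_frame(store: PromptStore, frame_idx: int, segment: Segment) -> Any:
    """'Run (I)' style: infer one frame from its nearest annotated frame."""
    source = store.nearest_annotated(frame_idx)
    return segment(frame_idx, source, store.prompts_for(source))


def run_all(store: PromptStore, num_frames: int, segment: Segment) -> Dict[int, Any]:
    """'Auto run all images at once' style: infer every frame in order."""
    return {i: run_frame(store, i, segment) for i in range(num_frames)}
```

Keeping the prompts separate from the predictor means the same store could feed either mode, and a later switch from CameraPredictor to VideoPredictor would only touch the `segment` callback.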

I have already made some adaptations in my local repository to support this, but I haven’t yet switched from CameraPredictor to VideoPredictor. My previous attempts to modify CameraPredictor were unsuccessful, and I don't believe modifying the third-party library to adapt the code is the right approach.

@CVHub520 If possible, I would like to create a new branch to submit part of the code first. I may not have the bandwidth to quickly complete this feature in the near future, and assistance from other developers would be appreciated to finalize this change.

CVHub520 · 2024-10-24T08:30:40Z

Thanks for the detailed explanation! Your approach makes a lot of sense - separating manual annotation and inference modes would indeed make the workflow more intuitive and flexible.

I really like your suggestion about the dual-mode operation:

- Manual frame-by-frame annotation with "Run (I)"
- Batch processing with "Auto run all images at once"

This design would give users more control while maintaining efficiency for batch operations. Since you've already started the implementation, creating a new branch would be great! Even partial progress would be valuable for the community to build upon.

Please feel free to submit your current work - incomplete features are welcome, and we can use the PR discussion to plan out the remaining tasks. We would greatly appreciate the support and collaboration of community members to help move this initiative forward.

You can either create a new branch in your fork, or let me know if you'd prefer I set up a feature branch in the main repository. Whatever works best for your workflow!

PairZhu · 2024-10-24T09:10:32Z

Thank you for the encouraging feedback! I’m glad to hear that the dual-mode operation idea resonates with you. I agree that separating manual annotation and batch inference will offer more flexibility and control for users.

Regarding the implementation, I plan to make several incremental submissions, and the code may not be fully functional after each commit. To avoid the risk of merging incomplete code into the main branch and to make it easier for other developers to collaborate, I believe it would be better to create a dedicated branch in the main repository.

This way, we can work on the feature collaboratively without affecting the stability of the main branch. Let me know if you’re able to set up the branch, or if you’d prefer me to proceed in another way.

CVHub520 · 2024-10-24T10:04:07Z

I've created a new branch dev-sam2-video in the main repository for this feature development. This dedicated branch will indeed be perfect for incremental submissions while keeping the main branch stable.

Feel free to start pushing your changes to this branch whenever you're ready. As you mentioned, we can work on improving the feature collaboratively, and having a dedicated branch will make it easier for other developers to join in and contribute.

Let me know if you need any help getting started with the branch or have any questions about the next steps!

PairZhu added the "question" label (Further information is requested) on Oct 24, 2024
