Why Use a Third-Party Implementation of CameraPredictor? #692

Open
PairZhu opened this issue Oct 24, 2024 · 5 comments
Labels
question Further information is requested

@PairZhu
Contributor

PairZhu commented Oct 24, 2024

Search before asking

  • I have searched the X-AnyLabeling Docs and issues and found no similar questions.

Question

I have been trying to modify the SAM2Video-related code so that annotations can be added on arbitrary frames. However, I could not find any documentation for SAM2CameraPredictor, so I had to infer its behavior from the function names alone. During this process, I encountered several bugs.

After further investigation, I found that this code doesn’t seem to originate from the official SAM2 repository but instead comes from a third-party repository. The added functionality from this source appears to be immature and unstable. Despite multiple modifications to the code, I was still unable to achieve accurate predictions.

One crucial issue is that this code seems to be designed for real-time camera applications. Annotation tasks, however, work on pre-recorded image sequences rather than a live stream, so there's no need to use SAM2CameraPredictor. While it may help reduce startup time on longer sequences, it should at the very least be an optional feature, not the default.

Additional

No response

@PairZhu PairZhu added the question Further information is requested label Oct 24, 2024
@CVHub520
Owner

Hey there! @PairZhu,

Thanks for bringing this up - I completely understand your frustration with the current CameraPredictor implementation.

You're spot on about this being from a third-party source rather than the official SAM2 repo.

Actually, the official SAM2 implementation uses propagate_in_video to process entire video chunks at once (if my memory serves me right 👀), which is great for batch processing but not so great for what we're trying to do with interactive annotations.
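To make the shape of that batch flow concrete, here is a toy sketch: build an inference state once over all frames, attach prompts to any frame, then make a single propagate_in_video pass that yields per-frame results. `ToyVideoPredictor` and `ToyInferenceState` are hypothetical. The method names borrow from the official SAM2VideoPredictor for readability, but the signatures are simplified, masks are plain strings, and the "propagation" just carries each object's latest prompt forward. This is not how SAM2 actually tracks objects.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy predictor. Method names echo the official
# SAM2VideoPredictor, but signatures are simplified and a "mask" is just
# a string placeholder; there is no model here.


@dataclass
class ToyInferenceState:
    num_frames: int
    # prompts[obj_id][frame_idx] -> mask placeholder supplied by the user
    prompts: Dict[int, Dict[int, str]] = field(default_factory=dict)


class ToyVideoPredictor:
    def init_state(self, num_frames: int) -> ToyInferenceState:
        # The real predictor loads and preprocesses every frame at this point.
        return ToyInferenceState(num_frames=num_frames)

    def add_new_points_or_box(
        self, state: ToyInferenceState, frame_idx: int, obj_id: int, mask: str
    ) -> None:
        # Prompts can be attached to any frame, in any order.
        state.prompts.setdefault(obj_id, {})[frame_idx] = mask

    def propagate_in_video(
        self, state: ToyInferenceState
    ) -> Iterator[Tuple[int, List[int], List[str]]]:
        # One forward pass from the earliest prompted frame to the last frame.
        # Toy "propagation": each object keeps its most recent prompt.
        if not state.prompts:
            return
        start = min(min(frames) for frames in state.prompts.values())
        latest: Dict[int, str] = {}
        for frame_idx in range(start, state.num_frames):
            for obj_id, frames in state.prompts.items():
                if frame_idx in frames:
                    latest[obj_id] = frames[frame_idx]
            obj_ids = sorted(latest)
            yield frame_idx, obj_ids, [latest[o] for o in obj_ids]


predictor = ToyVideoPredictor()
state = predictor.init_state(num_frames=5)
# Prompts are added out of order, on frames 3 and 1.
predictor.add_new_points_or_box(state, frame_idx=3, obj_id=2, mask="car@3")
predictor.add_new_points_or_box(state, frame_idx=1, obj_id=1, mask="person@1")
results = list(predictor.propagate_in_video(state))
for frame_idx, obj_ids, masks in results:
    print(frame_idx, obj_ids, masks)
```

Because this toy only runs forward from the earliest prompt, frame 0 produces no output; everything happens in one pass after all prompts are collected, which is the batch-oriented behavior described above.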

Here's what I think would work better: instead of using this third-party CameraPredictor (which is really intended for real-time camera applications), we should let users:

  • Track any objects they're interested in on any frame
  • Make adjustments whenever needed
  • Have more control over the whole annotation process

I'm actively working on improving this software when I have time, but honestly, this is something that would really benefit from community input. The current version is just a basic implementation to get things working - kind of like a proof of concept.

Would love to hear your thoughts on this approach! Have you tried any other methods that might work better? I'm definitely open to suggestions and would be happy to discuss different solutions. Let's make this work better for everyone's annotation needs! 😊

@PairZhu
Contributor Author

PairZhu commented Oct 24, 2024

My idea is to allow users to manually annotate multiple frames and then use the "Run (I)" button to run the model on a specific frame, or use the "Auto run all images at once" option for batch inference. Additionally, new annotations can be added at any time. This approach aligns with the intuitive nature of annotation tasks and is more convenient.
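A rough sketch of how this dual-mode workflow could be structured, assuming a hypothetical `AnnotationSession` class and a pluggable predictor callable (neither exists in X-AnyLabeling). Shapes are plain strings, and the toy `carry_forward` predictor simply reuses earlier manual annotations, standing in for a real SAM2 video predictor.

```python
from typing import Callable, Dict

# Hypothetical model hook: (frame_idx, manual annotations) -> {obj_id: shape}.
# Manual annotations are stored as {frame_idx: {obj_id: shape}}.
Predictor = Callable[[int, Dict[int, Dict[int, str]]], Dict[int, str]]


class AnnotationSession:
    """Hypothetical sketch of the proposed manual + batch workflow."""

    def __init__(self, num_frames: int, predictor: Predictor) -> None:
        self.num_frames = num_frames
        self.predictor = predictor
        self.manual: Dict[int, Dict[int, str]] = {}   # user-drawn prompts
        self.results: Dict[int, Dict[int, str]] = {}  # cached model output

    def annotate(self, frame_idx: int, obj_id: int, shape: str) -> None:
        # New annotations are allowed on any frame at any time; cached
        # results are discarded because they may now be stale.
        self.manual.setdefault(frame_idx, {})[obj_id] = shape
        self.results.clear()

    def run_frame(self, frame_idx: int) -> Dict[int, str]:
        # "Run (I)": infer one frame; manual shapes on that frame take priority.
        predicted = self.predictor(frame_idx, self.manual)
        merged = {**predicted, **self.manual.get(frame_idx, {})}
        self.results[frame_idx] = merged
        return merged

    def run_all(self) -> Dict[int, Dict[int, str]]:
        # "Auto run all images at once": batch inference over every frame.
        for frame_idx in range(self.num_frames):
            self.run_frame(frame_idx)
        return self.results


def carry_forward(frame_idx: int, manual: Dict[int, Dict[int, str]]) -> Dict[int, str]:
    # Toy stand-in for a model: reuse the latest manual shape of each object
    # drawn on this frame or any earlier one.
    out: Dict[int, str] = {}
    for f in sorted(manual):
        if f > frame_idx:
            break
        out.update(manual[f])
    return out


session = AnnotationSession(num_frames=4, predictor=carry_forward)
session.annotate(0, obj_id=1, shape="box-a")
session.annotate(2, obj_id=2, shape="box-b")
single = session.run_frame(1)  # one frame on demand
batch = session.run_all()      # every frame at once
print(single)
print(batch)
```

Clearing the whole cache on every edit is the simplest possible invalidation rule; a real integration would probably invalidate more selectively, depending on how the underlying predictor propagates masks between frames.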

I have already made some adaptations in my local repository to support this, but I haven’t yet switched from CameraPredictor to VideoPredictor. My previous attempts to modify CameraPredictor were unsuccessful, and I don't believe modifying the third-party library to adapt the code is the right approach.

@CVHub520 If possible, I would like to create a new branch to submit part of the code first. I may not have the bandwidth to quickly complete this feature in the near future, and assistance from other developers would be appreciated to finalize this change.

@CVHub520
Owner

Thanks for the detailed explanation! Your approach makes a lot of sense - separating manual annotation and inference modes would indeed make the workflow more intuitive and flexible.

I really like your suggestion about the dual-mode operation:

  • Manual frame-by-frame annotation with "Run (I)"
  • Batch processing with "Auto run all images at once"

This design would give users more control while maintaining efficiency for batch operations. Since you've already started the implementation, creating a new branch would be great! Even partial progress would be valuable for the community to build upon.

Please feel free to submit your current work - incomplete features are welcome, and we can use the PR discussion to plan out the remaining tasks. We would greatly appreciate the support and collaboration of community members to help move this initiative forward.

You can either create a new branch in your fork, or let me know if you'd prefer I set up a feature branch in the main repository. Whatever works best for your workflow!

@PairZhu
Contributor Author

PairZhu commented Oct 24, 2024

Thank you for the encouraging feedback! I’m glad to hear that the dual-mode operation idea resonates with you. I agree that separating manual annotation and batch inference will offer more flexibility and control for users.

Regarding the implementation, I plan to make several incremental submissions, and the code may not be fully functional after each commit. To avoid the risk of merging incomplete code into the main branch and to make it easier for other developers to collaborate, I believe it would be better to create a dedicated branch in the main repository.

This way, we can work on the feature collaboratively without affecting the stability of the main branch. Let me know if you’re able to set up the branch, or if you’d prefer me to proceed in another way.

@CVHub520
Owner

I've created a new branch dev-sam2-video in the main repository for this feature development. This dedicated branch will indeed be perfect for incremental submissions while keeping the main branch stable.

Feel free to start pushing your changes to this branch whenever you're ready. As you mentioned, we can work on improving the feature collaboratively, and having a dedicated branch will make it easier for other developers to join in and contribute.

Let me know if you need any help getting started with the branch or have any questions about the next steps!

2 participants