Implement model cursor for visual feedback #760

Open
abrichr opened this issue Jun 16, 2024 · 16 comments · May be fixed by #823
Labels
$ bounty $ (Please suggest a price range 🙏) · 💎 Bounty · enhancement (New feature or request) · good first issue (Good for newcomers) · help wanted (Extra attention is needed)

Comments

@abrichr
Member

abrichr commented Jun 16, 2024

Feature request

Update: see #760 (comment) for the latest requirements.

We want to be able to give the model the ability to:

  1. paint a red dot on its suggested target location,
  2. look at the screenshot with the dot on it, and
  3. optionally self-correct.

Thank you @LunjunZhang for the suggestion 🙏

This involves creating a CursorReplayStrategy (based on the VanillaReplayStrategy) that implements the required prompting.
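
The propose, paint, review loop described above could be sketched roughly as follows. This is only an illustration of the control flow, not OpenAdapt's actual API: `refine_target`, `Verdict`, and the three callables (`propose`, `paint_dot`, `review`) are hypothetical names, and the model calls are injected so that no particular prompting interface is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class Verdict:
    """Hypothetical shape for the model's judgement of a marked-up screenshot."""

    accept: bool                       # True if the dot sits on the intended target
    corrected: Optional[Point] = None  # replacement location, if the model offers one


def refine_target(
    screenshot: object,
    propose: Callable[[object], Point],
    paint_dot: Callable[[object, Point], object],
    review: Callable[[object, Point], Verdict],
    max_rounds: int = 3,
) -> Point:
    """Propose a target, show it to the model as a dot, and let it self-correct."""
    target = propose(screenshot)
    for _ in range(max_rounds):
        # Always paint on the clean screenshot so old dots never accumulate.
        marked = paint_dot(screenshot, target)
        verdict = review(marked, target)
        if verdict.accept or verdict.corrected is None:
            break
        target = verdict.corrected
    # Note: if max_rounds is exhausted, the last correction is returned unreviewed.
    return target
```

A strategy class could call something like `refine_target` once per action, passing its own prompting functions as `propose` and `review`. Capping the loop with `max_rounds` is a design choice made here to bound the number of model calls.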

Motivation

Let the model catch and correct its own errors, e.g. missed segmentations.

Possibly related: https://arxiv.org/abs/2406.09403:

Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry problems; we mark and circle when reasoning on maps; we use sketches to amplify our ideas and relieve our limited-capacity working memory. However, such actions are missing in current multimodal language models (LMs). Current chain-of-thought and tool-use paradigms only use text as intermediate reasoning steps. In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on the sketchpad. The LM conducts planning and reasoning according to the visual artifacts it has drawn.
...
Sketchpad substantially improves performance on all tasks over strong base models with no sketching, yielding an average gain of 12.7% on math tasks, and 8.6% on vision tasks. GPT-4o with Sketchpad sets a new state of the art on all tasks, including V*Bench (80.3%), BLINK spatial reasoning (83.9%), and visual correspondence (80.8%). All codes and data are in this https URL.

abrichr added the enhancement, good first issue, help wanted, $ bounty $, and 💎 Bounty labels and removed the 💎 Bounty label on Jun 16, 2024
@abrichr
Member Author

abrichr commented Jun 17, 2024

/bounty $1000

algora-pbc bot commented Jun 17, 2024

💎 $1,000 bounty • OpenAdaptAI

Steps to solve:

  1. Start working: Comment /attempt #760 with your implementation plan
  2. Submit work: Create a pull request including /claim #760 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to OpenAdaptAI/OpenAdapt!


| Attempt | Started (GMT+0) | Solution |
| --- | --- | --- |
| 🟢 @Amanullah1002 | Jun 17, 2024, 3:18:43 AM | WIP |
| 🔴 @Subh231004 | Jun 17, 2024, 6:29:42 AM | WIP |
| 🔴 @Anshgrover23 | Jun 17, 2024, 6:31:46 AM | WIP |
| 🟢 @onyedikachi-david | Jun 25, 2024, 3:47:44 PM | #823 |
| 🔴 @Ahmadkhan02 | Jul 2, 2024, 8:09:09 PM | WIP |
| 🟢 @varshith257 | Jul 4, 2024, 8:27:40 PM | WIP |
| 🟢 @stdthoth | Sep 12, 2024, 8:37:31 PM | WIP |
| 🟢 @hoklims | | #923 |

@Subh231004

Subh231004 commented Jun 17, 2024

/attempt #760

@Anshgrover23

Anshgrover23 commented Jun 17, 2024

/attempt #760

Implementation Plan for Model Cursor Feedback (Issue #760)

  1. Create CursorReplayStrategy: I'll develop a new CursorReplayStrategy class extending VanillaReplayStrategy.
  2. Paint Red Dot: I'll implement a method to paint a red dot on the target location within a given image.
  3. Screenshot Capture: I'll implement a method to capture a screenshot and overlay the red dot on it.
  4. Self-Correction: I'll add an optional self-correction mechanism based on the screenshot with the dot.
  5. Testing: I'll write and execute unit tests to ensure the functionality works as intended.
  6. Documentation: I'll update the project documentation to include usage instructions for the new strategy.
  7. Pull Request: I'll submit a PR for review, incorporating any feedback provided.

This plan will systematically address the issue by creating a targeted strategy, ensuring it functions correctly, and updating the documentation for users.
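
For the "paint red dot" step of this plan, a minimal sketch using Pillow might look like the following. The function name `paint_red_dot` and the default radius are assumptions made for illustration, and nothing here is taken from an actual pull request on this issue.

```python
from PIL import Image, ImageDraw


def paint_red_dot(screenshot: Image.Image, x: int, y: int, radius: int = 6) -> Image.Image:
    """Return a copy of `screenshot` with a filled red circle centred on (x, y)."""
    # convert() returns a new image, so the caller's screenshot is left untouched.
    marked = screenshot.convert("RGB")
    draw = ImageDraw.Draw(marked)
    # The bounding box of the circle is (left, top, right, bottom).
    draw.ellipse((x - radius, y - radius, x + radius, y + radius), fill=(255, 0, 0))
    return marked
```

On a live desktop, `PIL.ImageGrab.grab()` is one possible way to obtain the screenshot to pass in, although its platform support varies. Returning a copy rather than drawing in place keeps the clean screenshot available for later correction rounds.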

abrichr changed the title from "Implement model cursor feedback" to "Implement model cursor for visual feedback" on Jun 17, 2024
@abrichr
Member Author

abrichr commented Jun 20, 2024

@Subh231004 please keep the discussion related to your pull request on your pull request and not here. I have replied to your comment there.

@onyedikachi-david

onyedikachi-david commented Jun 25, 2024

/attempt #760

| Algora profile | Completed bounties | Tech | Active attempts |
| --- | --- | --- | --- |
| @onyedikachi-david | 2 bounties from 1 project | JavaScript, Shell | ﹟764 |

@Ahmadkhan02

Ahmadkhan02 commented Jul 2, 2024

/attempt #760

| Algora profile | Completed bounties | Tech | Active attempts |
| --- | --- | --- | --- |
| @Ahmadkhan02 | 1 bounty from 1 project | TypeScript, Jupyter Notebook | |

algora-pbc bot commented Jul 4, 2024

💡 @onyedikachi-david submitted a pull request that claims the bounty. You can visit your bounty board to reward.

@varshith257

varshith257 commented Jul 4, 2024

/attempt #760

| Algora profile | Completed bounties | Tech | Active attempts |
| --- | --- | --- | --- |
| @varshith257 | 4 bounties from 2 projects | Python, Rust, TypeScript, Go | |

@stdthoth

Hi @abrichr, is this still available?

@abrichr
Member Author

abrichr commented Sep 12, 2024

Hi @stdthoth, thanks for your interest.

We attempted a few different approaches in #867. The bounty is still available if you can implement a different approach that improves on the performance of any of these!

@stdthoth

stdthoth commented Sep 12, 2024

/attempt #760

@stdthoth

@abrichr I am working on it now. Could you possibly assign this to me for a week?

@abrichr
Member Author

abrichr commented Sep 12, 2024

Hi @stdthoth, thank you! Can you please clarify your request?

I just updated the description to include more details about the current approaches, recreated here:

I believe the next step here is to systematically evaluate the performance of these in a repeatable way (e.g. programmatically). Only then will we be able to implement the requirement to:

implement a different approach that improves on the performance of any of these
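
One way such a repeatable evaluation might be set up is sketched below. Everything in it is an assumption: the thread does not define a protocol, so the case format (an identifier plus a ground-truth bounding box), the two metrics (hit rate and mean pixel distance to the box centre), and the names `evaluate` and `rank` are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # left, top, right, bottom
Case = Tuple[str, Box]                   # (case id, ground-truth target box)


def evaluate(predict: Callable[[str], Point], cases: Sequence[Case]) -> Dict[str, float]:
    """Score one strategy: fraction of predictions inside the box, and mean centre error."""
    if not cases:
        raise ValueError("at least one evaluation case is required")
    hits = 0
    total_error = 0.0
    for case_id, (left, top, right, bottom) in cases:
        x, y = predict(case_id)
        if left <= x <= right and top <= y <= bottom:
            hits += 1
        centre_x, centre_y = (left + right) / 2, (top + bottom) / 2
        total_error += math.hypot(x - centre_x, y - centre_y)
    return {"hit_rate": hits / len(cases), "mean_error_px": total_error / len(cases)}


def rank(
    strategies: Dict[str, Callable[[str], Point]], cases: Sequence[Case]
) -> List[Tuple[str, Dict[str, float]]]:
    """Order strategies by hit rate (highest first), breaking ties on lower error."""
    scored = [(name, evaluate(fn, cases)) for name, fn in strategies.items()]
    return sorted(scored, key=lambda item: (-item[1]["hit_rate"], item[1]["mean_error_px"]))
```

Running every candidate strategy over the same fixed set of cases would make "outperforms" a checkable statement, provided the model calls behind `predict` are deterministic or averaged over several runs.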

Please let me know if you have any questions!

Edit: if you prefer, you can also implement a novel approach, without evaluating these ones. But we will be unable to award the bounty until we can confirm that your approach outperforms all of these.

Edit: https://visualsketchpad.github.io/ may perform very well.

@hoklims

hoklims commented Nov 19, 2024

/attempt #760

algora-pbc bot commented Nov 19, 2024

💡 @hoklims submitted a pull request that claims the bounty. You can visit your bounty board to reward.

8 participants