Implement model cursor for visual feedback #760

abrichr · 2024-06-16T12:58:58Z

Feature request

Update: see #760 (comment) for the latest requirements.

We want to give the model the ability to:

paint a red dot on its suggested target location,
look at the screenshot with the dot on it, and
optionally self-correct.
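Below is a minimal sketch of the first step, painting the dot. To stay dependency-free, it treats a screenshot as a list of rows of RGB tuples. In a real strategy the screenshot would more likely be a Pillow image, where `ImageDraw.Draw(image).ellipse(...)` does the same job. The names `blank_image` and `paint_dot` are hypothetical and are not part of OpenAdapt.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]  # image[y][x], i.e. a list of pixel rows

RED: RGB = (255, 0, 0)


def blank_image(width: int, height: int, color: RGB = (255, 255, 255)) -> Image:
    """Create a solid-color image (hypothetical helper for illustration)."""
    return [[color for _ in range(width)] for _ in range(height)]


def paint_dot(image: Image, x: int, y: int, radius: int = 5, color: RGB = RED) -> Image:
    """Return a copy of `image` with a filled circle centred on (x, y).

    Pixels outside the image are skipped, so a dot near an edge is clipped
    instead of raising an IndexError.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    marked = [row[:] for row in image]  # copy rows so the original screenshot is untouched
    for py in range(max(0, y - radius), min(height, y + radius + 1)):
        for px in range(max(0, x - radius), min(width, x + radius + 1)):
            # Keep only pixels inside the circle, not the whole bounding square.
            if (px - x) ** 2 + (py - y) ** 2 <= radius ** 2:
                marked[py][px] = color
    return marked
```

Returning a copy rather than drawing in place keeps the unmarked screenshot available, which is useful if the model should compare the original and marked images.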

Thank you @LunjunZhang for the suggestion 🙏

This involves creating a CursorReplayStrategy (based on the VanillaReplayStrategy) that implements the required prompting.
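The propose, mark, and review cycle described above could be structured roughly as follows. This is an illustrative sketch, not the API of `VanillaReplayStrategy` or of any proposed `CursorReplayStrategy`. The function `locate_with_self_correction`, the `CursorResult` dataclass, and the three injected callables are all hypothetical. In practice `propose` and `review` would be prompts to a multimodal model, and `mark` would draw the red dot on the screenshot.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class CursorResult:
    point: Point  # final (x, y) the strategy would act on
    rounds: int  # number of review rounds actually performed
    history: List[Point] = field(default_factory=list)  # every point proposed, in order


def locate_with_self_correction(
    screenshot: Any,
    propose: Callable[[Any], Point],
    mark: Callable[[Any, Point], Any],
    review: Callable[[Any, Point], Optional[Point]],
    max_rounds: int = 3,
) -> CursorResult:
    """Propose a target, show the model its own dot, and let it revise.

    `review` returns None when the model accepts the marked location, or a
    corrected point when it does not. At most `max_rounds` reviews are run.
    """
    point = propose(screenshot)
    history = [point]
    for round_idx in range(1, max_rounds + 1):
        marked = mark(screenshot, point)  # e.g. screenshot with a red dot at `point`
        correction = review(marked, point)
        if correction is None:  # model is satisfied with the dot's position
            return CursorResult(point, round_idx, history)
        point = correction
        history.append(point)
    return CursorResult(point, max_rounds, history)
```

When `max_rounds` is exhausted, the last correction is returned without being reviewed. A real strategy might instead fall back to the original proposal or ask the model for a final confirmation.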

Motivation

Correct errors, e.g. missed segmentations.

Possibly related: https://arxiv.org/abs/2406.09403:

Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry problems; we mark and circle when reasoning on maps; we use sketches to amplify our ideas and relieve our limited-capacity working memory. However, such actions are missing in current multimodal language models (LMs). Current chain-of-thought and tool-use paradigms only use text as intermediate reasoning steps. In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on the sketchpad. The LM conducts planning and reasoning according to the visual artifacts it has drawn.
...
Sketchpad substantially improves performance on all tasks over strong base models with no sketching, yielding an average gain of 12.7% on math tasks, and 8.6% on vision tasks. GPT-4o with Sketchpad sets a new state of the art on all tasks, including V*Bench (80.3%), BLINK spatial reasoning (83.9%), and visual correspondence (80.8%). All codes and data are in this https URL.

abrichr · 2024-06-17T00:31:01Z

/bounty $1000

algora-pbc · 2024-06-17T00:31:05Z

💎 $1,000 bounty • OpenAdaptAI

Steps to solve:

Start working: Comment /attempt #760 with your implementation plan
Submit work: Create a pull request including /claim #760 in the PR body to claim the bounty
Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts.

Thank you for contributing to OpenAdaptAI/OpenAdapt!

Attempt	Started (GMT+0)	Solution
🟢 @Amanullah1002	Jun 17, 2024, 3:18:43 AM	WIP
🔴 @Subh231004	Jun 17, 2024, 6:29:42 AM	WIP
🔴 @Anshgrover23	Jun 17, 2024, 6:31:46 AM	WIP
🟢 @onyedikachi-david	Jun 25, 2024, 3:47:44 PM	#823
🔴 @Ahmadkhan02	Jul 2, 2024, 8:09:09 PM	WIP
🟢 @varshith257	Jul 4, 2024, 8:27:40 PM	WIP
🟢 @stdthoth	Sep 12, 2024, 8:37:31 PM	WIP
🟢 @hoklims		#923

Subh231004 · 2024-06-17T06:29:40Z

/attempt #760

Anshgrover23 · 2024-06-17T06:31:44Z

/attempt #760

Implementation Plan for Model Cursor Feedback (Issue #760)
Create CursorReplayStrategy: I'll develop a new CursorReplayStrategy class extending VanillaReplayStrategy.
Paint Red Dot: I'll implement a method to paint a red dot on the target location within a given image.
Screenshot Capture: I'll implement a method to capture a screenshot and overlay the red dot on it.
Self-Correction: I'll add an optional self-correction mechanism based on the screenshot with the dot.
Testing: I'll write and execute unit tests to ensure the functionality works as intended.
Documentation: I'll update the project documentation to include usage instructions for the new strategy.
Pull Request: I'll submit a PR for review, incorporating any feedback provided.
This plan will systematically address the issue by creating a targeted strategy, ensuring it functions correctly, and updating the documentation for users.

abrichr · 2024-06-20T13:52:20Z

@Subh231004 please keep the discussion related to your pull request on your pull request and not here. I have replied to your comment there.

onyedikachi-david · 2024-06-25T15:47:42Z

/attempt #760

Algora profile	Completed bounties	Tech	Active attempts
@onyedikachi-david	2 bounties from 1 project	JavaScript, Shell	#764

Ahmadkhan02 · 2024-07-02T20:09:07Z

/attempt #760

Algora profile	Completed bounties	Tech	Active attempts
@Ahmadkhan02	1 bounty from 1 project	TypeScript, Jupyter Notebook	

algora-pbc · 2024-07-04T10:10:21Z

💡 @onyedikachi-david submitted a pull request that claims the bounty. You can visit your bounty board to reward.

varshith257 · 2024-07-04T20:27:38Z

/attempt #760

Algora profile	Completed bounties	Tech	Active attempts
@varshith257	4 bounties from 2 projects	Python, Rust, TypeScript, Go	

stdthoth · 2024-09-12T18:50:25Z

Hi @abrichr, is this still available?

abrichr · 2024-09-12T19:39:08Z

Hi @stdthoth, thanks for your interest.

We attempted a few different approaches in #867. It is available if you can implement a different approach that improves on the performance of any of these!

stdthoth · 2024-09-12T20:37:28Z

/attempt #760

stdthoth · 2024-09-12T20:40:21Z

@abrichr I am working on it now. Could you possibly assign this to me for a week?

abrichr · 2024-09-12T22:16:19Z

Hi @stdthoth, thank you! Can you please clarify your request?

I just updated the description to include more details about the current approaches, recreated here:

experiments/cursor/coords.py: Uses AI prompts to iteratively locate a target in an image by drawing concentric circles.
experiments/cursor/direction.py: Moves a cursor towards a target using AI-driven direction and magnitude adjustments.
experiments/cursor/grid.py: Identifies target cells in an image by overlaying a grid and using AI feedback.
experiments/cursor/joystick.py: Adjusts a cursor's position toward a target with joystick-like AI-guided movements.
experiments/cursor/joystick_history.py: Similar to joystick.py but tracks a longer history of movements.
experiments/cursor/quadrant.py: Locates a target by iteratively narrowing down search areas in image quadrants.
experiments/cursor/sample.py: Uses AI voting to find the closest cursor to a target in an image.
experiments/cursor/search.py: Refines cursor coordinates toward a target using binary search-like AI feedback.
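Some of these approaches (e.g. quadrant.py, and in a single-pass form grid.py) share one idea: overlay cells on a region, ask the model which cell contains the target, and zoom into that cell. A generic sketch of that idea follows. It is not the code in `experiments/cursor/`, and `split`, `narrow_to_target`, and `oracle_for` are hypothetical names. The oracle stands in for the model so the sketch runs without API calls. With `rows=cols=2` it resembles the quadrant approach; a larger grid resembles the grid approach.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, width, height) in pixels


def split(box: Box, rows: int, cols: int) -> List[Box]:
    """Split `box` into rows * cols equal cells, numbered row by row from 0."""
    left, top, width, height = box
    cell_w, cell_h = width / cols, height / rows
    return [
        (left + c * cell_w, top + r * cell_h, cell_w, cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


def narrow_to_target(
    box: Box,
    choose_cell: Callable[[Box, List[Box]], int],
    rows: int = 2,
    cols: int = 2,
    min_size: float = 4.0,
) -> Tuple[float, float]:
    """Repeatedly subdivide `box`, descending into the cell chosen by `choose_cell`,
    until the region is at most `min_size` px on each side; return its centre."""
    if rows < 2 or cols < 2:
        raise ValueError("rows and cols must both be >= 2 or the region never shrinks")
    while box[2] > min_size or box[3] > min_size:
        cells = split(box, rows, cols)
        box = cells[choose_cell(box, cells)]  # in practice: the model names a cell
    left, top, width, height = box
    return (left + width / 2, top + height / 2)


def oracle_for(target: Tuple[float, float]) -> Callable[[Box, List[Box]], int]:
    """Test stand-in for the model: always picks the cell containing `target`."""
    tx, ty = target

    def choose(box: Box, cells: List[Box]) -> int:
        for i, (left, top, width, height) in enumerate(cells):
            if left <= tx < left + width and top <= ty < top + height:
                return i
        return 0  # target outside the region; a real model might answer "none"

    return choose
```

For a 1024x768 region with the defaults, this makes 8 `choose_cell` calls and ends in a 4x3 cell. If every answer is correct, the returned point is within 2 px of the target horizontally and 1.5 px vertically. One wrong answer early on, however, cannot be undone, which is where the red-dot self-check could help.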

I believe the next step here is to systematically evaluate the performance of these in a repeatable way (e.g. programmatically). Only then will we be able to implement the requirement to:
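One way to make the evaluation repeatable is to use synthetic screenshots with a target drawn at a known, randomly chosen location. Each approach's output can then be scored against ground truth under a fixed random seed. The harness below is a sketch under those assumptions. `evaluate`, `compare`, the `render` callable, and the metrics (mean and max pixel error, hit rate within `hit_radius`) are hypothetical choices, not anything defined in this issue. Each script in `experiments/cursor/` would need a wrapper exposing it as a `locate(screenshot) -> (x, y)` callable.

```python
import math
import random
from statistics import mean
from typing import Any, Callable, Dict, List, Tuple

Point = Tuple[float, float]
Locate = Callable[[Any], Point]  # screenshot -> predicted (x, y)
Render = Callable[[int, int, Point], Any]  # (width, height, target) -> screenshot


def evaluate(
    locate: Locate,
    render: Render,
    n_trials: int = 100,
    size: Tuple[int, int] = (1280, 800),
    hit_radius: float = 10.0,
    seed: int = 0,
) -> Dict[str, float]:
    """Score one locator against randomly placed targets with known positions."""
    rng = random.Random(seed)  # fixed seed: identical targets on every run
    width, height = size
    errors: List[float] = []
    for _ in range(n_trials):
        target = (rng.uniform(0, width), rng.uniform(0, height))
        screenshot = render(width, height, target)
        guess = locate(screenshot)
        errors.append(math.dist(guess, target))  # Euclidean error in pixels
    return {
        "mean_error_px": mean(errors),
        "max_error_px": max(errors),
        "hit_rate": sum(e <= hit_radius for e in errors) / n_trials,
    }


def compare(
    locators: Dict[str, Locate], render: Render, **kwargs: Any
) -> List[Tuple[str, Dict[str, float]]]:
    """Evaluate every locator on the same targets and rank by mean pixel error."""
    results = {name: evaluate(fn, render, **kwargs) for name, fn in locators.items()}
    return sorted(results.items(), key=lambda item: item[1]["mean_error_px"])


if __name__ == "__main__":
    # Test double: the "screenshot" is simply the ground-truth point.
    render_point: Render = lambda w, h, target: target
    ranking = compare(
        {"perfect": lambda shot: shot, "off_by_20px": lambda shot: (shot[0] + 20, shot[1])},
        render_point,
        n_trials=20,
    )
    for name, metrics in ranking:
        print(name, metrics)
```

Because every locator sees the same targets for a given seed, a new approach can be checked against all existing ones in a single `compare` call, which matches the requirement that it outperform each of them.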

implement a different approach that improves on the performance of any of these

Please let me know if you have any questions!

Edit: if you prefer, you can also implement a novel approach, without evaluating these ones. But we will be unable to award the bounty until we can confirm that your approach outperforms all of these.

Edit: https://visualsketchpad.github.io/ may perform very well.

hoklims · 2024-11-19T14:39:19Z

/attempt #760

algora-pbc · 2024-11-19T16:01:58Z

💡 @hoklims submitted a pull request that claims the bounty. You can visit your bounty board to reward.

abrichr added the enhancement (New feature or request), good first issue (Good for newcomers), help wanted (Extra attention is needed), $ bounty $ (Please suggest a price range 🙏), and 💎 Bounty labels and removed the 💎 Bounty label Jun 16, 2024

algora-pbc bot added the 💎 Bounty label Jun 17, 2024

abrichr changed the title from "Implement model cursor feedback" to "Implement model cursor for visual feedback" Jun 17, 2024

R-ohit-B-isht mentioned this issue Jun 18, 2024: Implement CursorReplayStrategy for Visual Feedback R-ohit-B-isht/OpenAdapt#2 (Open)

Subh231004 mentioned this issue Jun 19, 2024: Implemented model cursor for visual feedback #760 #781 (Closed)

Ahmadkhan02 mentioned this issue Jul 2, 2024: cursorReplay #820 (Closed)

onyedikachi-david linked a pull request Jul 4, 2024 that will close this issue: feat(cursor): Add CursorReplayStrategy with red dot painting and self-correction #823 (Open)

hoklims mentioned this issue Nov 19, 2024: feat(cursor): Implement self-correcting cursor strategy with visual f… #923 (Open)
