
Are the cross-modal feature and the cross-model representation vector the same? #21

Open
CHENG-danyang opened this issue Nov 13, 2024 · 2 comments

@CHENG-danyang

CHENG-danyang commented Nov 13, 2024

In your paper you write: "we concatenate the visual and textual representations to form the cross-modal features $$r\in \mathbb{R}^{1\times D}$$", but the formula below it reads $$o_u=Concate(o_u^{i(f)},o_u^t)$$. Are they the same vector? Also, in the formula $$PM(k,i)=\frac{1}{N_{k,i}^s}\sum_{j=0}^N r_j^{k,i}$$, what does $$N_{k,i}^s$$ mean? I couldn't find these details in the source code.
My understanding is that you first extract the visual and textual representations and concatenate them to form the cross-modal feature $$r_u=Concat(o_u^{i(f)},o^t_u)$$, then group these features into $$N_l$$ sets { $$R_k;0 \le k \le N_l$$ } according to the sample label, and then apply K-Means to each $$R_k$$, splitting it into $$N^p$$ clusters. Finally, you take the average of the vectors within each cluster as the prototype vector $$PM(k,i)$$. Is this understanding correct?

@Markin-Wang
Owner

Markin-Wang commented Nov 15, 2024

> In your paper you write: "we concatenate the visual and textual representations to form the cross-modal features $$r\in \mathbb{R}^{1\times D}$$", but the formula below it reads $$o_u=Concate(o_u^{i(f)},o_u^t)$$. Are they the same vector? Also, in the formula $$PM(k,i)=\frac{1}{N_{k,i}^s}\sum_{j=0}^N r_j^{k,i}$$, what does $$N_{k,i}^s$$ mean? I couldn't find these details in the source code.
> My understanding is that you first extract the visual and textual representations and concatenate them to form the cross-modal feature $$r_u=Concat(o_u^{i(f)},o^t_u)$$, then group these features into $$N_l$$ sets { $$R_k;0 \le k \le N_l$$ } according to the sample label, and then apply K-Means to each $$R_k$$, splitting it into $$N^p$$ clusters. Finally, you take the average of the vectors within each cluster as the prototype vector $$PM(k,i)$$. Is this understanding correct?

Hi, thank you for your interest in our work. $o$ and $r$ both denote the cross-modal features. We use two symbols because $o_u$ is associated with a specific sample $u$, while $r$ is used to index the cross-modal features after clustering.

Sorry, $N^s_{k,i}$ is a typo; it should be $N^d_{k,i}$.
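
For reference, here is the prototype formula quoted in the question with only that superscript changed and everything else left as quoted. Reading $N^d_{k,i}$ as the number of features assigned to cluster $i$ of label $k$ is an inference from the averaging step described in the question; it is not stated explicitly in this thread.

$$PM(k,i)=\frac{1}{N_{k,i}^d}\sum_{j=0}^{N} r_j^{k,i}$$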

You are right: the prototype initialization procedure is as you summarized.
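
The summarized procedure (concatenate, group by label, run K-Means per group, average each cluster) can be sketched as follows. This is a minimal illustration and not the repository's actual code: the function names (`concat`, `kmeans`, `init_prototypes`), the single label per sample, and the hand-written Lloyd's K-Means are all assumptions made to keep the example self-contained. In practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
import random
from collections import defaultdict


def concat(visual, textual):
    """Cross-modal feature: visual and textual vectors joined end to end."""
    return list(visual) + list(textual)


def mean(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, n_clusters, n_iters=20, seed=0):
    """Plain Lloyd's K-Means; returns one cluster index per vector.

    Requires len(vectors) >= n_clusters (random.sample raises otherwise).
    """
    rng = random.Random(seed)
    centers = rng.sample(vectors, n_clusters)
    assign = [0] * len(vectors)
    for _ in range(n_iters):
        # Assignment step: nearest center for every vector.
        assign = [min(range(n_clusters), key=lambda c: sq_dist(v, centers[c]))
                  for v in vectors]
        # Update step: move each center to the mean of its members.
        for c in range(n_clusters):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = mean(members)
    return assign


def init_prototypes(visual_feats, text_feats, labels, n_clusters):
    """Return {(k, i): prototype vector} for label k and cluster i."""
    groups = defaultdict(list)
    for v, t, k in zip(visual_feats, text_feats, labels):
        groups[k].append(concat(v, t))  # r_u, collected into R_k
    prototypes = {}
    for k, feats in groups.items():
        assign = kmeans(feats, n_clusters)
        for i in range(n_clusters):
            members = [f for f, a in zip(feats, assign) if a == i]
            if members:
                prototypes[(k, i)] = mean(members)  # PM(k, i)
    return prototypes


if __name__ == "__main__":
    # Toy data: 1-D visual and 1-D textual features, two labels.
    visual = [[0.0], [0.1], [10.0], [10.1], [5.0], [5.2]]
    text = [[1.0], [1.0], [1.0], [1.0], [2.0], [2.0]]
    labels = [0, 0, 0, 0, 1, 1]
    for key, proto in sorted(init_prototypes(visual, text, labels, 2).items()):
        print(key, proto)
```

Each prototype here is the mean of the cross-modal features assigned to one cluster of one label, so the divisor in the averaging step is the size of that cluster.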

I hope this helps you resolve the problem.

Best Regards,
Jun

@CHENG-danyang
Author

Your reply helped me a lot, and your work is great.
