Are the cross-modal feature and the cross-modal representation vector the same? #21

CHENG-danyang · 2024-11-13T12:41:37Z

In your paper you write: "we concatenate the visual and textual representations to form the cross-modal features $$r\in \mathbb{R} ^{1\times D}$$", but the formula below it reads: $$o_u=Concate(o_u^{i(f)},o_u^t)$$. Are they the same vector? Also, in the formula $$PM(k,i)=\frac{1}{N_{k,i}^s}\sum_{j=0}^N r_j^{k,i}$$, what does $$N_{k,i}^s$$ mean? I couldn't find these details in the source code.
As I understand it, you first extract the visual and textual representations and concatenate them to form the cross-modal feature $$r_u=Concat(o_u^{i(f)},o^t_u)$$, then group the features into $$N_l$$ sets { $$R_k;0 \le k \le N_l$$ } according to the sample labels, and then apply K-Means to each $$R_k$$, splitting it into $$N^p$$ clusters. Finally, the average of the vectors within each cluster is taken as the prototype vector $$PM(k,i)$$. Is this understanding correct?

Markin-Wang · 2024-11-15T14:19:49Z

Hi, thank you for your interest in our work. Both o and r are cross-modal features. We use two characters to refer to them because o_u is associated with a specific sample u, while r is used to index the cross-modal features after clustering.

$N^s_{k,i}$: sorry, this is a typo. It should be $N^d_{k,i}$.

You are right: the prototype initialization procedure is the same as you summarized.
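
For readers following along, the procedure summarized in the question can be sketched roughly as below. This is only a minimal illustration, not the code in the repository. It assumes each sample has a single integer label (the real labels may be structured differently), that every class has at least as many samples as prototypes, and it uses a small hand-written K-Means. The names `kmeans`, `init_prototypes`, `n_labels` and `n_protos` are hypothetical and chosen just for this sketch.

```python
import numpy as np


def kmeans(x, n_clusters, n_iters=50, seed=0):
    """Plain Lloyd's K-Means; returns a cluster index for every row of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centres from distinct data points (assumption: simple random init).
    centres = x[rng.choice(len(x), size=n_clusters, replace=False)]
    assign = np.zeros(len(x), dtype=int)
    for _ in range(n_iters):
        # Assign each feature to its nearest centre (squared Euclidean distance).
        dists = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for i in range(n_clusters):
            members = x[assign == i]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                centres[i] = members.mean(axis=0)
    return assign


def init_prototypes(visual, textual, labels, n_labels, n_protos):
    """Return an array pm with pm[k, i] playing the role of PM(k, i)."""
    # r_u = Concat(o_u^i, o_u^t): one cross-modal feature per sample.
    r = np.concatenate([visual, textual], axis=1)
    labels = np.asarray(labels)
    pm = np.zeros((n_labels, n_protos, r.shape[1]))
    for k in range(n_labels):
        r_k = r[labels == k]  # R_k: features of the samples with label k
        assign = kmeans(r_k, n_protos)
        for i in range(n_protos):
            members = r_k[assign == i]
            if len(members) > 0:
                # Average of the features in cluster i of class k.
                pm[k, i] = members.mean(axis=0)
    return pm
```

Here the divisor of each mean is simply the number of features that fall into cluster i of class k, and an empty cluster leaves its prototype at zero, which is a choice made only for this sketch.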

I hope this information helps you figure out the problem.

Best Regards,
Jun

CHENG-danyang · 2024-11-17T09:20:15Z

Your reply helped me a lot, and your work is great.
