About LSD metric #8

Open
QA-MDT opened this issue Aug 10, 2024 · 9 comments

Comments

@QA-MDT

QA-MDT commented Aug 10, 2024

Hi, I am currently working on evaluating audio super-resolution metrics, and I am wondering whether the LSD metric can lead to sub-optimal or misleading results.
For example, the STFT image below shows three systems (the first is the ground truth, and the other two are the outputs of two super-resolution systems being compared).
It seems obvious that the second one is better than the third, yet it gets a worse LSD.
As mentioned in AudioSR, the LSD score does not always agree with subjective MOS scores, so I am wondering whether there is a replacement for this metric, or an analysis of it. Thanks a lot again for your excellent work.
[Image: STFT spectrograms of the ground truth (first) and the two compared super-resolution systems]
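For reference, here is a minimal sketch of one common form of LSD. It computes the per-frame RMS of the difference between log10 power spectra and then averages over frames. This is an assumption about the definition: the STFT size, hop, window, log base and `eps` vary between papers and are not taken from this repository.

```python
import numpy as np

def stft_power(x, n_fft=2048, hop=512):
    """Power spectrogram |STFT|^2 (frames x frequency bins) with a periodic Hann window."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    if len(x) < n_fft:
        x = np.pad(x, (0, n_fft - len(x)))
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1)) ** 2

def lsd(reference, estimate, n_fft=2048, hop=512, eps=1e-8):
    """Log-spectral distance: RMS over frequency of the log10 power difference, averaged over frames."""
    p_ref = stft_power(reference, n_fft, hop)
    p_est = stft_power(estimate, n_fft, hop)
    diff = np.log10(p_ref + eps) - np.log10(p_est + eps)
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=-1))))
```

Every time-frequency bin is weighted equally. As a result, low-energy bins such as a nearly silent high band can dominate the score once a small noise floor appears there.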

@yoyolicoris
Member

yoyolicoris commented Aug 11, 2024

Unless the methods perform very similarly to each other, LSD is sufficient to indicate their relative performance differences.
Personally, I don't think the second one looks better than the third, so I'm not surprised it has a bad LSD.

@QA-MDT
Author

QA-MDT commented Aug 12, 2024

Thanks for your response. I have a few other small questions.
In my view, "MCG" is used for unconditional models, such that score(x|y) = score(x) + score(y|x), with MCG used to estimate score(y|x). So I am wondering why it can be applied to conditional models such as "nu-wave2"; this may be my misunderstanding of this constraint. Thanks again, and I look forward to your reply!

@QA-MDT
Author

QA-MDT commented Aug 12, 2024

> Unless the methods perform very similarly to each other, LSD is sufficient to indicate their relative performance differences. Personally, I don't think the second one looks better than the third, so I'm not surprised it has a bad LSD.

Yes. However, in the subjective experiments and on other objective metrics such as SI-SNR, PSNR, and SSIM, system 2 outperforms system 3. So I concede that LSD is a useful and accurate metric and that system 2 has not achieved good enough performance, but I don't think such point-to-point metrics are a good measure for super-resolution tasks. (For example, noise points often occur in the ultra-high-frequency part of the spectrogram, and they significantly affect the judgment of model performance.)
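SI-SNR is another point-to-point comparison, computed on waveform samples rather than spectrogram bins. Below is a minimal sketch of its usual definition: project the mean-centred estimate onto the reference, then take the energy ratio of that projection to the residual, in dB. The `eps` value is an assumption, and this is not necessarily the implementation used in the experiments above.

```python
import numpy as np

def si_snr(reference, estimate, eps=1e-8):
    """Scale-invariant SNR in dB; both signals are mean-centred first."""
    reference = reference - np.mean(reference)
    estimate = estimate - np.mean(estimate)
    # Component of the estimate that lies along the reference ("target").
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything else counts as noise.
    noise = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)))
```

Because of the projection, a rescaled copy of the reference scores very high. An estimate with independent noise of equal power scores around 0 dB.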

@yoyolicoris
Member

> Thanks for your response. I have a few other small questions. In my view, "MCG" is used for unconditional models, such that score(x|y) = score(x) + score(y|x), with MCG used to estimate score(y|x). So I am wondering why it can be applied to conditional models such as "nu-wave2"; this may be my misunderstanding of this constraint. Thanks again, and I look forward to your reply!

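Assuming the approximation of $p(x|y)$ is accurate enough, the conditional model already supplies $\nabla_x \log p(x|y) = \nabla_x \log p(x) + \nabla_x \log p(y|x)$, and MCG adds another estimate of $\nabla_x \log p(y|x)$ on top of it. The score after applying MCG therefore becomes $\nabla_x \log p(x) + 2 \nabla_x \log p(y|x)$, which puts more emphasis on fitting the likelihood function $p(y|x)$.
Prior work empirically shows that this gives better results: https://arxiv.org/abs/2207.12598 (see the classifier guidance section).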
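The score algebra can be checked on a toy 1-D Gaussian problem. This only illustrates the arithmetic of adding a second likelihood-gradient term; it does not show how MCG actually estimates that gradient. The prior N(0, 1), the observation model y = x + n, the noise variance `sigma2` and the value of `y` are all assumptions chosen for the example.

```python
import numpy as np

# Assumed toy model: prior x ~ N(0, 1); observation y = x + n with n ~ N(0, sigma2).
sigma2 = 0.5
y = 1.2

def prior_score(x):
    return -x  # d/dx log N(x; 0, 1)

def likelihood_score(x):
    return (y - x) / sigma2  # d/dx log N(y; x, sigma2)

def posterior_score(x, s2=sigma2):
    # Exact Gaussian posterior p(x | y) when the observation noise variance is s2.
    mean = y / (1.0 + s2)
    var = s2 / (1.0 + s2)
    return -(x - mean) / var

x = np.linspace(-3.0, 3.0, 7)

# Bayes' rule in score form: grad log p(x|y) = grad log p(x) + grad log p(y|x).
assert np.allclose(posterior_score(x), prior_score(x) + likelihood_score(x))

# Conditional score plus one extra likelihood-gradient term:
# grad log p(x) + 2 grad log p(y|x), i.e. the likelihood counts twice.
guided = posterior_score(x) + likelihood_score(x)

# This equals the posterior under a sharper likelihood (noise variance halved).
assert np.allclose(guided, posterior_score(x, sigma2 / 2))
```

In this toy case, doubling the likelihood term is the same as squaring the likelihood. That is one way to read "more emphasis on fitting $p(y|x)$".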

@QA-MDT
Author

QA-MDT commented Aug 12, 2024

> > Thanks for your response. I have a few other small questions. In my view, "MCG" is used for unconditional models, such that score(x|y) = score(x) + score(y|x), with MCG used to estimate score(y|x). So I am wondering why it can be applied to conditional models such as "nu-wave2"; this may be my misunderstanding of this constraint. Thanks again, and I look forward to your reply!
>
> Assuming the approximation of $p(x|y)$ is accurate enough, the conditional score after applying MCG becomes $\nabla_x \log p(x) + 2 \nabla_x \log p(y|x)$, with more emphasis on fitting the likelihood function $p(y|x)$. Prior work empirically shows that this gives better results: https://arxiv.org/abs/2207.12598 (see the classifier guidance section).

Thank you for your thorough explanation!

@QA-MDT
Author

QA-MDT commented Aug 12, 2024

Sorry to bother you again, but I am also confused about this section of code in "reverse_manifold".
Specifically, based on your paper, I would write: mu -= lr * (g - F_h/2(g)) (1)
However, in your code I found: mu -= lr * g (2)
while "mu -= F_h/2(mu); mu += F_h/2(z_t) * var_st[s] / alpha_st[s] + alpha[s] * c[s] * y_hat" is doing the inpainting, right?
So I am confused about the mismatch between (1) and (2). Thanks!

@QA-MDT
Author

QA-MDT commented Aug 12, 2024

I have run into another question: why do you split the original audio waveform into overlapping segments when calculating the gradient?
I found that your segment size is 144000 // 2 = 72000, while your training window size is 32768. Are they related?

@yoyolicoris
Copy link
Member

yoyolicoris commented Aug 12, 2024

> Sorry to bother you again, but I am also confused about this section of code in "reverse_manifold". Specifically, based on your paper, I would write: mu -= lr * (g - F_h/2(g)) (1). However, in your code I found: mu -= lr * g (2), while "mu -= F_h/2(mu); mu += F_h/2(z_t) * var_st[s] / alpha_st[s] + alpha[s] * c[s] * y_hat" is doing the inpainting, right? So I am confused about the mismatch between (1) and (2). Thanks!

The code merges the inpainting and MCG steps for efficiency.
Basically, the steps mu -= lr * g and mu -= F_h/2(mu) together are equivalent to (1) followed by the first step of the inpainting.
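A minimal numerical check of this equivalence follows. It assumes F_h/2 is a linear, idempotent band projection. Here it is modelled by a hypothetical ideal FFT low-pass, `lowpass_project`, and the actual filter in the code may differ. The band size, `lr` and the random `mu` and `g` are arbitrary. The final `mu += ...` step is identical in both routes, so it is left out.

```python
import numpy as np

def lowpass_project(v, keep_bins):
    """Hypothetical stand-in for F_h/2: ideal brick-wall low-pass via the real FFT.

    It is linear and idempotent (F(F(v)) == F(v)); the equivalence only needs these two properties.
    """
    spec = np.fft.rfft(v)
    spec[keep_bins:] = 0.0
    return np.fft.irfft(spec, n=len(v))

rng = np.random.default_rng(0)
n, keep_bins, lr = 256, 40, 0.1
mu = rng.standard_normal(n)  # current mean estimate
g = rng.standard_normal(n)   # guidance gradient

def F(v):
    return lowpass_project(v, keep_bins)

# Route (1): gradient step with the F band removed, then the first inpainting step.
mu1 = mu - lr * (g - F(g))
mu1 = mu1 - F(mu1)

# Route (2): plain gradient step, then the same inpainting step.
mu2 = mu - lr * g
mu2 = mu2 - F(mu2)

# The inpainting step discards the F band either way, so both routes agree.
assert np.allclose(mu1, mu2)
```

Subtracting F(g) in (1) only changes the band that `mu -= F_h/2(mu)` removes right afterwards. Under these assumptions, skipping that subtraction saves one filtering call without changing the result.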

@yoyolicoris
Member

> I have run into another question: why do you split the original audio waveform into overlapping segments when calculating the gradient? I found that your segment size is 144000 // 2 = 72000, while your training window size is 32768. Are they related?

The numbers are set empirically, and the main concern is the available GPU memory.
If you have more VRAM, I think you can safely increase the segment size.
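The following is a generic sketch of segment-wise processing with overlap-add. It is not the repository's code: `process_in_overlapping_segments`, the window choice and the hop are hypothetical. In the setting above, `fn` would compute the per-segment gradient. Peak memory then scales with `seg_len` rather than with the full signal length, which is why the segment size is limited by VRAM.

```python
import numpy as np

def process_in_overlapping_segments(x, fn, seg_len, hop):
    """Hypothetical helper: apply fn to overlapping segments of x and overlap-add the results.

    Each output segment is weighted by a strictly positive Hann-shaped window, and the sum is
    divided by the accumulated weight, so overlapping regions are cross-faded smoothly.
    """
    n = len(x)
    window = np.hanning(seg_len + 2)[1:-1]  # drop the zero endpoints so every weight is > 0
    out = np.zeros(n)
    weight = np.zeros(n)
    starts = list(range(0, max(n - seg_len, 0) + 1, hop))
    if starts[-1] + seg_len < n:
        starts.append(n - seg_len)  # make sure the tail of the signal is covered
    for s in starts:
        seg = x[s : s + seg_len]
        w = window[: len(seg)]
        out[s : s + len(seg)] += fn(seg) * w
        weight[s : s + len(seg)] += w
    return out / weight
```

With an identity `fn`, the input is reconstructed exactly. Hann weighting keeps segment boundaries from producing audible seams when `fn` behaves differently near segment edges.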
