About LSD metric #8

QA-MDT · 2024-08-10T16:19:39Z

Hi, I am working on evaluation metrics for audio super-resolution, and I am wondering whether the LSD metric can lead to sub-optimal results.
For example, the following STFT image shows three systems: the first is the ground truth, and the other two are the outputs of two super-resolution systems being compared.
It seems obvious that the second one is better than the third, yet it gets a worse LSD.
As mentioned in AudioSR, LSD scores also do not match subjective MOS scores, so I am wondering whether there is a replacement for this metric, or some analysis of it. Thanks again for your excellent work.
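For reference, here is a minimal sketch of how LSD is commonly computed: the root-mean-square over frequency of the log-power difference, averaged over frames. The STFT settings (`n_fft=2048`, `hop=512`), the Hann window, and `eps` are illustrative assumptions, not values taken from this repository or from AudioSR.

```python
import numpy as np


def stft_mag(x, n_fft=2048, hop=512):
    """Magnitude STFT with a Hann window; returns shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=-1))


def lsd(reference, estimate, eps=1e-8):
    """Log-spectral distance between two equal-length waveforms."""
    ref = stft_mag(reference)
    est = stft_mag(estimate)
    # Per-bin difference of log10 power; eps avoids log(0) in silent bins.
    diff = np.log10(ref ** 2 + eps) - np.log10(est ** 2 + eps)
    # RMS over frequency, then mean over frames.
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=-1))))
```

Because every bin is compared in the log domain, each one contributes according to the ratio of the two spectra, regardless of how much energy it carries.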

yoyolicoris · 2024-08-11T10:16:40Z

Unless the methods perform very similarly to each other, LSD is sufficient to indicate their relative performance differences.
Personally, I don't think the second one looks better than the third, so I'm not surprised it has a worse LSD.

QA-MDT · 2024-08-12T03:52:46Z

Thanks for your response. I have a few other small questions.
As I understand it, "MCG" is used with unconditional models, where score(x|y) = score(x) + score(y|x) and MCG estimates score(y|x). So I am wondering why it can also be applied to conditional models such as "nu-wave2". I may be misunderstanding this constraint. Thanks again, and I look forward to your reply!

QA-MDT · 2024-08-12T03:57:45Z

> Unless the methods are all comparable to each other, LSD is quite enough to indicate the relative performance differences. For me, I don't think the second one looks better than the third, so I'm not surprised it has bad LSD.

Yes. However, in subjective experiments and on other objective metrics such as SI-SNR, PSNR, and SSIM, system 2 outperforms system 3. So I admit that LSD is a useful and accurate metric and that system 2 has not achieved good enough performance, but I don't think such point-to-point metrics are a good measure for super-resolution tasks. (For example, noise points often occur in the ultra-high-frequency part of the spectrogram, which significantly affects the judgment of model performance.)
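A small numeric illustration of the point about high-frequency noise, assuming the metric compares log10 power per time-frequency bin (the usual LSD definition). The magnitudes are made up for the example; `log_power_error` is a hypothetical helper, not part of any library.

```python
import numpy as np


def log_power_error(ref_mag, est_mag):
    """Squared log10-power difference for a single time-frequency bin."""
    return (np.log10(ref_mag ** 2) - np.log10(est_mag ** 2)) ** 2


# A strong low-frequency bin whose magnitude is off by a factor of 10.
loud = log_power_error(1.0, 10.0)
# A near-silent high-frequency bin, also off by a factor of 10.
quiet = log_power_error(1e-4, 1e-3)

# Squared magnitude errors for the same two bins, for contrast.
linear_loud = (1.0 - 10.0) ** 2
linear_quiet = (1e-4 - 1e-3) ** 2
```

Both bins contribute the same log-domain error, although the quiet bin's error is negligible on a linear scale. This is one reason faint noise in the upper band can dominate a log-spectral score.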

yoyolicoris · 2024-08-12T07:08:24Z

> Thanks for your response. I have a few other small questions. As I understand it, "MCG" is used with unconditional models, where score(x|y) = score(x) + score(y|x) and MCG estimates score(y|x). So I am wondering why it can also be applied to conditional models such as "nu-wave2". I may be misunderstanding this constraint. Thanks again, and I look forward to your reply!

Assuming the approximation of $p(x|y)$ is accurate enough, the conditional score after applying MCG becomes $\nabla_x \log p(x) + 2 \nabla_x \log p(y|x)$, which puts more emphasis on fitting the likelihood function $p(y|x)$.
Prior work shows empirically that this gives better results: https://arxiv.org/abs/2207.12598 (see the classifier guidance section).
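Written out, the argument looks roughly like this. It is a sketch that assumes the conditional network approximates $\nabla_x \log p(x|y)$ and that the MCG term approximates $\nabla_x \log p(y|x)$:

$$
\begin{aligned}
\nabla_x \log p(x \mid y) &= \nabla_x \log p(x) + \nabla_x \log p(y \mid x) \\
\nabla_x \log p(x \mid y) + \nabla_x \log p(y \mid x) &= \nabla_x \log p(x) + 2\,\nabla_x \log p(y \mid x)
\end{aligned}
$$

The first line is Bayes' rule, where the $\log p(y)$ term vanishes under $\nabla_x$. The second line adds the MCG term to the conditional score, which corresponds to sampling from a density proportional to $p(x)\,p(y \mid x)^2$, i.e. the likelihood is weighted twice.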

QA-MDT · 2024-08-12T07:41:36Z

> > Thanks for your response. I have a few other small questions. As I understand it, "MCG" is used with unconditional models, where score(x|y) = score(x) + score(y|x) and MCG estimates score(y|x). So I am wondering why it can also be applied to conditional models such as "nu-wave2". I may be misunderstanding this constraint. Thanks again, and I look forward to your reply!
>
> Assuming the approximation of $p(x|y)$ is accurate enough, the conditional score after applying MCG becomes $\nabla_x \log p(x) + 2 \nabla_x \log p(y|x)$, which puts more emphasis on fitting the likelihood function $p(y|x)$. Prior work shows empirically that this gives better results: https://arxiv.org/abs/2207.12598 (see the classifier guidance section).

Thank you for your thorough explanation!

QA-MDT · 2024-08-12T07:54:52Z

Sorry to bother you again. I am also confused about this section of code in "reverse_manifold".
Specifically, based on your paper, I would write: mu -= lr * (g - F_h/2(g)). (1)
However, your code has: mu -= lr * g. (2)
Meanwhile, "mu -= F_h/2(mu); mu += F_h/2(z_t) * var_st[s] / alpha_st[s] + alpha[s] * c[s] * y_hat" is doing the inpainting, right?
So I am confused about the mismatch between (1) and (2). Thanks.

QA-MDT · 2024-08-12T07:59:51Z

I have another question: why do you split the original audio waveform into overlapping segments when calculating the gradient?
I found that your segment size is 144000 // 2 = 72000, while your training window size is 32768. Are these two values related?

yoyolicoris · 2024-08-12T13:08:54Z

> Sorry to bother you again. I am also confused about this section of code in "reverse_manifold". Specifically, based on your paper, I would write: mu -= lr * (g - F_h/2(g)). (1) However, your code has: mu -= lr * g. (2) Meanwhile, "mu -= F_h/2(mu); mu += F_h/2(z_t) * var_st[s] / alpha_st[s] + alpha[s] * c[s] * y_hat" is doing the inpainting, right? So I am confused about the mismatch between (1) and (2). Thanks.

The code merges the inpainting and MCG steps for efficiency.
Basically, the steps mu -= lr * g and mu -= F_h/2(mu) combined are equivalent to (1) followed by the first inpainting step.
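One way to check this numerically, assuming F_h/2 is a linear low-pass filter: by linearity, (mu - lr * g) - F(mu - lr * g) equals (mu - F(mu)) - lr * (g - F(g)). The brick-wall FFT filter below is only a stand-in for F_h/2 (the filter in the repository may differ), and `mu`, `g`, and `lr` are random placeholders.

```python
import numpy as np


def lowpass(x, keep_ratio=0.5):
    """Brick-wall low-pass via FFT: keep the lowest `keep_ratio` of the bins.

    Stand-in for F_h/2; any linear filter gives the same identity.
    """
    spec = np.fft.rfft(x)
    cutoff = int(len(spec) * keep_ratio)
    spec[cutoff:] = 0
    return np.fft.irfft(spec, n=len(x))


rng = np.random.default_rng(0)
mu = rng.standard_normal(1024)
g = rng.standard_normal(1024)  # placeholder for the MCG gradient
lr = 0.1

# Code path: plain gradient step (2), then remove the low band of the result.
stepped = mu - lr * g
code_path = stepped - lowpass(stepped)

# Paper path: high-pass-projected gradient step (1), with the low band of mu removed.
paper_path = (mu - lowpass(mu)) - lr * (g - lowpass(g))

print(np.allclose(code_path, paper_path))
```

The code path needs one filter call per step instead of two, which is consistent with the efficiency remark above.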

yoyolicoris · 2024-08-12T13:10:37Z

> I have another question: why do you split the original audio waveform into overlapping segments when calculating the gradient? I found that your segment size is 144000 // 2 = 72000, while your training window size is 32768. Are these two values related?

The numbers are set empirically and the main concern is the available GPU memory.
If you have more VRAM I think you can safely increase the segment size.
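To illustrate the memory trade-off, here is a generic sketch of processing a long waveform in overlapping segments and cross-fading the results. It is not the repository's code: `process_in_segments` and `fn` are hypothetical names, and the 50% hop and Hann taper are assumptions. Only the segment size of 72000 comes from this thread.

```python
import numpy as np


def process_in_segments(x, fn, segment_size=72000, hop=36000):
    """Apply `fn` to overlapping segments of `x` and cross-fade the outputs.

    Peak memory scales with `segment_size` rather than with len(x).
    `fn` must return an array of the same length as its input.
    """
    # Drop the zero endpoints so every sample gets a positive weight.
    window = np.hanning(segment_size + 2)[1:-1]
    out = np.zeros(len(x))
    weight = np.zeros(len(x))
    starts = list(range(0, max(len(x) - segment_size, 0) + 1, hop))
    # Add a final segment aligned to the end if the tail is not covered.
    if starts[-1] + segment_size < len(x):
        starts.append(len(x) - segment_size)
    for start in starts:
        seg = x[start:start + segment_size]
        w = window[:len(seg)]
        out[start:start + len(seg)] += w * fn(seg)
        weight[start:start + len(seg)] += w
    # Normalize by the summed window so overlaps average smoothly.
    return out / weight
```

A larger `segment_size` means fewer segment boundaries but a higher peak memory footprint for each call to `fn`.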
