low Pearson correlation for COAD, READ, HCC #13

Open
NBitBuilder opened this issue Jun 28, 2024 · 8 comments

Comments

@NBitBuilder

Thanks for sharing this valuable dataset with the community.

The Pearson correlation for COAD, READ, and HCC is nearly zero for all models. What's the point of including these data in the benchmark?

@guillaumejaume
Collaborator

guillaumejaume commented Jun 28, 2024

Thanks for your interest in HEST-Benchmark. Whether to include some cohorts, HCC in particular, has been discussed internally. Let me raise a few points:

(1) Performance will always vary due to H&E staining variations, and the use of different technologies (Visium vs Xenium).
(2) COAD and READ performance is currently around 0.15, similar to previous work, e.g., He et al., Nat BME, 2022.
(3) Within these datasets, performance between weak encoders and strong ones changes drastically, e.g., in READ: 0.038 with KimiaNet to 0.162 with UNI, showing that despite a low correlation, some information can be extracted. This also suggests that performance can be further improved with better features and/or classification models.
(4) HCC is a key cancer type and deserves to be included. It remains unclear to what extent performance can be improved. Time will tell.

I hope these points are useful. Feel free to share your thoughts below.
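For readers who want to check numbers like the ones in point (3), here is a minimal sketch of a per-gene Pearson correlation averaged over a gene panel. It assumes the score is the mean of per-gene correlations across spots, which may not match the benchmark's exact aggregation. The function names and the toy values are hypothetical and are not taken from the HEST code or data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / math.sqrt(vx * vy)


def mean_pearson_over_genes(measured, predicted):
    """Average per-gene Pearson; inputs are {gene: [value per spot]} dicts.

    Assumption: genes with an undefined correlation are skipped.
    """
    scores = [pearson(measured[g], predicted[g]) for g in measured]
    valid = [s for s in scores if not math.isnan(s)]
    if not valid:
        return float("nan")
    return sum(valid) / len(valid)


# Toy data with hypothetical values (not from HEST).
measured = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [2.0, 1.0, 4.0, 3.0]}
predicted = {"GENE_A": [1.1, 1.9, 3.2, 3.8], "GENE_B": [3.0, 3.0, 2.0, 4.0]}
mean_score = mean_pearson_over_genes(measured, predicted)
print(f"mean per-gene Pearson: {mean_score:.3f}")
```

Because the average runs over genes, a few well-predicted genes can be offset by many poorly predicted ones, so it can be informative to look at the per-gene scores as well as the mean.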

@NBitBuilder
Author

Thank you very much! I learned a lot from your insightful comments.

@jinxixiang

The PAAD task of hest-bench fails while loading.

I tried to reproduce your results, but data loading for 'PAAD' failed.

Specifically, the following line raises a KeyError:

adata = load_adata(expr_path, genes=genes, barcodes=barcodes, normalize=args.normalize)

raise KeyError(

KeyError: "Values ['AMY2A', 'GATM', 'CFTR', 'CFB', 'FSTL3', 'PPY', 'MDM2', 'SFRP2', 'FBN1', 'TCIM', 'NTN4', 'GCG', 'DST', 'AQP8', 'COL5A2', 'PECAM1', 'CAVIN1', 'MS4A6A', 'GPRC5A', 'CTSK', 'SFRP4', 'THBS2', 'MYLK', 'FBLN1', 'PDGFRB', 'C1orf162', 'PMP22', 'BASP1', 'CD93', 'THY1', 'ASPN', 'LTBP2', 'ACTG2', 'MEST', 'EHF', 'INS', 'PROX1', 'GPX2', 'TFPI', 'MALL', 'FHL2'], from ['AMY2A', 'GATM', 'CFTR', 'CFB', 'VCAN', 'ANPEP', 'FSTL3', 'PPY', 'EPCAM', 'CXCL6', 'MDM2', 'SFRP2', 'CXCL2', 'FBN1', 'TCIM', 'NTN4', 'GCG', 'DST', 'AQP8', 'COL5A2', 'PECAM1', 'CAVIN1', 'MS4A6A', 'CXCR4', 'ACTA2', 'GPRC5A', 'CTSK', 'SFRP4', 'PTPRC', 'THBS2', 'MYLK', 'FBLN1', 'PDGFRB', 'AIF1', 'C1orf162', 'PMP22', 'BASP1', 'CD93', 'THY1', 'ASPN', 'LTBP2', 'ACTG2', 'MEST', 'EHF', 'INS', 'PROX1', 'GPX2', 'TFPI', 'MALL', 'FHL2'], are not valid obs/ var names or indices."

Projects IDC and PRAD worked smoothly.

I compared my local data with the files in your HF repo; they are not corrupted.

Please give me some advice. Thanks!
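A quick way to narrow down a KeyError like this is to compare the requested gene panel with the gene names actually present in the expression file before subsetting. The sketch below uses plain Python lists; `split_panel` is a hypothetical helper, not part of hest-bench, and in practice the available names would come from something like the AnnData object's `var_names`.

```python
def split_panel(panel_genes, available_genes):
    """Split a gene panel into genes that are present and genes that are missing.

    Order of the panel is preserved in both returned lists.
    """
    available = set(available_genes)
    present = [g for g in panel_genes if g in available]
    missing = [g for g in panel_genes if g not in available]
    return present, missing


# Small illustrative subset of gene names from the error message above.
panel = ["AMY2A", "VCAN", "EPCAM", "GATM"]
available = ["VCAN", "EPCAM", "ACTA2"]

present, missing = split_panel(panel, available)
print("present:", present)
print("missing:", missing)
```

If most of the panel is missing, as in the error above, the panel file and the expression data probably do not belong together.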

@pauldoucet
Collaborator

pauldoucet commented Jul 1, 2024

Hi, thanks for your interest in HEST!

It seems the benchmark is picking up the wrong gene panel (mean_50genes.json instead of var_50genes.json). Can you try removing the mean_50genes.json panel from your PAAD directory, please?

@jinxixiang

Sure, it works now!

@pauldoucet
Collaborator

Great!
Your OS is also probably picking mean_50genes.json by default for the other tasks. I'll make a quick pull request to change that.
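For context, directory listings in Python (for example `os.listdir`) come back in arbitrary order, so code that takes the first matching file can behave differently across machines. Below is a minimal sketch of selecting the panel by its exact name instead. It is not the actual change in the pull request; `find_gene_panel` is a hypothetical helper, and it assumes the panel files sit directly in the task directory.

```python
import os
import tempfile


def find_gene_panel(task_dir, preferred="var_50genes.json"):
    """Return the path of the preferred gene panel, failing loudly if absent."""
    path = os.path.join(task_dir, preferred)
    if os.path.isfile(path):
        return path
    # List what is there (sorted for a stable message) to help debugging.
    candidates = sorted(f for f in os.listdir(task_dir) if f.endswith("genes.json"))
    raise FileNotFoundError(
        f"{preferred} not found in {task_dir}; panels present: {candidates}"
    )


# Demo in a temporary directory that contains both panel files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("mean_50genes.json", "var_50genes.json"):
        with open(os.path.join(demo_dir, name), "w") as fh:
            fh.write("{}")
    chosen = os.path.basename(find_gene_panel(demo_dir))

print("selected panel:", chosen)
```

Requesting the file by name makes the choice independent of listing order, and a missing panel produces an explicit error rather than a silent fallback to another file.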

@pauldoucet
Collaborator

I just fixed the bug in pull request #14. Can you do a quick git pull before re-running the benchmark?
Thanks!

@jinxixiang

Sure, thank you for the update.
