low Pearson correlation for COAD, READ, HCC #13

NBitBuilder · 2024-06-28T19:16:03Z

Thanks for sharing this valuable dataset with the community.

The Pearson correlation for COAD, READ, and HCC is nearly zero for all models. What's the point of including these data in the benchmark?

guillaumejaume · 2024-06-28T20:03:26Z

Thanks for your interest in HEST-Benchmark. Whether to include some cohorts, HCC in particular, has been discussed internally. Let me raise a few points:

(1) Performance will always vary because of H&E staining variations and the use of different technologies (Visium vs Xenium).
(2) COAD and READ performance is currently around 0.15, similar to previous work, e.g., He et al., Nat BME, 2022.
(3) Within these datasets, performance differs drastically between weak and strong encoders, e.g., in READ: 0.038 with KimiaNet vs 0.162 with UNI. This shows that some information can be extracted despite the low correlation. It also suggests that performance can be further improved with better features and/or classification models.
(4) HCC is a key cancer type and deserves to be included. It remains unclear to what extent performance can be improved. Time will tell.

Hope this brings interesting points. Feel free to share your thoughts below.
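
For readers who want to check numbers such as 0.038 vs 0.162 on their own predictions, here is a minimal sketch of a per-gene Pearson score averaged over a gene panel. It assumes the reported number is the mean of per-gene correlations across spots, which is an assumption and not taken from the benchmark code. The names `pearson` and `mean_gene_pearson` and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when one vector is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def mean_gene_pearson(pred, true):
    """Average the per-gene correlation over all genes in `true`.

    `pred` and `true` map a gene name to a list of values, one per spot.
    Genes with an undefined correlation are skipped (an assumption).
    """
    scores = {gene: pearson(pred[gene], true[gene]) for gene in true}
    valid = [s for s in scores.values() if not math.isnan(s)]
    mean = sum(valid) / len(valid) if valid else float("nan")
    return mean, scores


# Toy data (hypothetical): one gene predicted perfectly, one anti-correlated.
true = {"GENE_A": [1, 2, 3, 4], "GENE_B": [1, 2, 3, 4]}
pred = {"GENE_A": [2, 4, 6, 8], "GENE_B": [4, 3, 2, 1]}
mean_score, per_gene = mean_gene_pearson(pred, true)
```

Looking at `per_gene` rather than only the mean can show whether a low cohort score comes from a few poorly predicted genes or from the whole panel.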

NBitBuilder · 2024-06-28T22:42:08Z

Thank you very much! I learned a lot from your insightful comments.

jinxixiang · 2024-07-01T18:47:06Z

PAAD of hest-bench failed while loading.

I tried to reproduce your results, but the data-loading of 'PAAD' failed.

Specifically, in line

adata = load_adata(expr_path, genes=genes, barcodes=barcodes, normalize=args.normalize)

raise KeyError(
KeyError: "Values ['AMY2A', 'GATM', 'CFTR', 'CFB', 'FSTL3', 'PPY', 'MDM2', 'SFRP2', 'FBN1', 'TCIM', 'NTN4', 'GCG', 'DST', 'AQP8', 'COL5A2', 'PECAM1', 'CAVIN1', 'MS4A6A', 'GPRC5A', 'CTSK', 'SFRP4', 'THBS2', 'MYLK', 'FBLN1', 'PDGFRB', 'C1orf162', 'PMP22', 'BASP1', 'CD93', 'THY1', 'ASPN', 'LTBP2', 'ACTG2', 'MEST', 'EHF', 'INS', 'PROX1', 'GPX2', 'TFPI', 'MALL', 'FHL2'], from ['AMY2A', 'GATM', 'CFTR', 'CFB', 'VCAN', 'ANPEP', 'FSTL3', 'PPY', 'EPCAM', 'CXCL6', 'MDM2', 'SFRP2', 'CXCL2', 'FBN1', 'TCIM', 'NTN4', 'GCG', 'DST', 'AQP8', 'COL5A2', 'PECAM1', 'CAVIN1', 'MS4A6A', 'CXCR4', 'ACTA2', 'GPRC5A', 'CTSK', 'SFRP4', 'PTPRC', 'THBS2', 'MYLK', 'FBLN1', 'PDGFRB', 'AIF1', 'C1orf162', 'PMP22', 'BASP1', 'CD93', 'THY1', 'ASPN', 'LTBP2', 'ACTG2', 'MEST', 'EHF', 'INS', 'PROX1', 'GPX2', 'TFPI', 'MALL', 'FHL2'], are not valid obs/ var names or indices."

Projects IDC and PRAD worked smoothly.

I compared my local data with the files in your HF repo; they are not corrupted.

Please give me some advice. Thanks!
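
A KeyError like the one above means that some genes in the requested panel are not among the variable names of the expression matrix. A quick way to see which ones is a plain membership check, sketched below. The helper `split_panel` is hypothetical, and the lists are a small toy subset of the gene names in the error message, not the real panel or matrix.

```python
def split_panel(panel_genes, var_names):
    """Split a gene panel into genes present in and missing from var_names."""
    available = set(var_names)
    present = [g for g in panel_genes if g in available]
    missing = [g for g in panel_genes if g not in available]
    return present, missing


# Toy subset (hypothetical): in practice var_names would come from the
# loaded expression matrix and panel_genes from the panel JSON file.
panel_genes = ["AMY2A", "GATM", "VCAN", "ANPEP"]
var_names = ["VCAN", "ANPEP", "EPCAM"]

present, missing = split_panel(panel_genes, var_names)
```

If `missing` covers most of the panel, as in the error above, the panel file itself is probably the wrong one for that dataset.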

pauldoucet · 2024-07-01T19:04:03Z

Hi, thanks for your interest in HEST!

It seems like the benchmark is picking up the wrong gene panel (mean_50genes.json instead of var_50genes.json). Can you try removing the mean_50genes.json panel from your PAAD directory, please?

jinxixiang · 2024-07-01T19:09:52Z

Sure, it works now!

pauldoucet · 2024-07-01T19:12:42Z

Great!
Your OS is probably also picking mean_50genes.json by default for the other tasks. I'll make a quick pull request to change that.
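
To illustrate the kind of problem described here: if a loader takes whichever panel file a directory listing returns first, the result can differ between systems, since listing order is not guaranteed. The sketch below selects the panel by explicit file name instead. It is not the actual change from the pull request; the function `pick_gene_panel` and the directory layout are assumptions made for illustration.

```python
import os
import tempfile


def pick_gene_panel(task_dir, preferred="var_50genes.json"):
    """Return the path of the preferred panel file in task_dir.

    Raises instead of falling back to whatever the listing returns first.
    """
    path = os.path.join(task_dir, preferred)
    if not os.path.isfile(path):
        candidates = sorted(
            f for f in os.listdir(task_dir) if f.endswith("genes.json")
        )
        raise FileNotFoundError(
            f"{preferred} not found in {task_dir}; panels present: {candidates}"
        )
    return path


# Demo on a temporary directory holding both panel files (empty placeholders).
with tempfile.TemporaryDirectory() as task_dir:
    for name in ("mean_50genes.json", "var_50genes.json"):
        with open(os.path.join(task_dir, name), "w") as fh:
            fh.write("{}")
    chosen = os.path.basename(pick_gene_panel(task_dir))
```

Failing loudly when the expected panel is absent makes a wrong-panel situation visible right away instead of surfacing later as a KeyError on gene names.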

pauldoucet · 2024-07-01T19:30:14Z

Just fixed the bug in pull request #14. Can you do a quick git pull before re-running the benchmark?
Thanks!

jinxixiang · 2024-07-01T19:57:47Z

Sure, thank you for the update.

pauldoucet added the scientific-discussion label Jul 2, 2024
