Best Practices for Sequential vs. Parallel Regression of Multiple Covariates in Harmony #263

Open
vertesy opened this issue Nov 5, 2024 · 0 comments


vertesy commented Nov 5, 2024

Hello,

I'm using Seurat with Harmony for batch correction in my scRNA-seq analysis, and I have a question regarding the regression of multiple covariates.

Background:

I want to regress out three covariates from my data:

  • Library
  • SampleType
  • CellCyclePhase

Initially, I attempted to regress out all three covariates in parallel. I concatenated the corresponding metadata columns into a single label, split the merged object by that label, and provided the result to Harmony. This fails at the splitting step because some of the combined categories are too small or empty.

```
Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
Error in validObject(object = object) : 
  invalid class “Assay5” object: Layers must be two-dimensional objects
```

I understand that small categories will also be a problem for correction, even if I fix the failing data split.
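To find the combined categories that would be too small before splitting, you could tabulate the concatenated labels first. The plain-Python sketch below does this on a hypothetical list of per-cell metadata records. The function names, the `min_cells` threshold, and the example values are all illustrative assumptions, not part of Seurat or Harmony. Combinations with zero cells never appear among the observed labels, so the sketch also checks the full cross-product of the levels observed for each covariate.

```python
from collections import Counter
from itertools import product


def combined_label(cell, covariates, sep="_"):
    """Concatenate the values of several metadata columns into one label."""
    return sep.join(str(cell[c]) for c in covariates)


def sparse_combinations(cells, covariates, min_cells=30, sep="_"):
    """Return (counts, sparse). `sparse` lists every combination of observed
    levels with fewer than `min_cells` cells, including empty combinations."""
    counts = Counter(combined_label(cell, covariates, sep) for cell in cells)
    # Levels observed for each covariate on its own.
    levels = [sorted({str(cell[c]) for cell in cells}) for c in covariates]
    sparse = []
    for combo in product(*levels):
        label = sep.join(combo)
        n = counts.get(label, 0)
        if n < min_cells:
            sparse.append((label, n))
    return counts, sparse


# Hypothetical metadata for a handful of cells.
cells = [
    {"Library": "L1", "SampleType": "A", "CellCyclePhase": "G1"},
    {"Library": "L1", "SampleType": "A", "CellCyclePhase": "G1"},
    {"Library": "L1", "SampleType": "A", "CellCyclePhase": "S"},
    {"Library": "L2", "SampleType": "B", "CellCyclePhase": "G1"},
]
counts, sparse = sparse_combinations(
    cells, ["Library", "SampleType", "CellCyclePhase"], min_cells=2
)
for label, n in sparse:
    print(label, n)
```

Even this tiny example produces 8 possible combinations, of which only one reaches the threshold. With 25 libraries, several sample types, and 3 phases, the number of combinations grows multiplicatively in the same way.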

I am not sure how to solve this. The options I see are:

  1. Ignore some covariates
  2. Subset to SampleType 1 and correct for the remaining covariates (Library, CellCyclePhase); repeat for SampleType 2. This is suboptimal.
  3. Regress out cell cycle scores in ScaleData(), and provide covariates SampleType and Library to Harmony. (or variations thereof)
    1. One issue is that regression in ScaleData() removes differences much less effectively than Harmony.
    2. (Related to #262, "Is Regressing Covariates in Both ScaleData and RunHarmony Redundant and Potentially Problematic?")
  4. Iterative / Sequential / Serial Harmony corrections.
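For option 3, it may help to recall what regression in ScaleData() does conceptually. With `vars.to.regress`, each feature is fit against the covariates with a linear model by default, and the residuals are kept. The sketch below illustrates this for one feature and one continuous score (for example an S-phase score) in plain Python. The data are made up, and this is not Seurat's implementation, which handles many features and covariates and also scales the result.

```python
def regress_out(y, x):
    """Return residuals of y after an ordinary least-squares fit y ~ a + b*x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0  # constant covariate: nothing to remove
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical expression of one gene and a per-cell S-phase score.
s_score = [0.1, 0.4, 0.2, 0.9, 0.7]
expr = [1.0, 2.1, 1.4, 3.8, 3.0]
residuals = regress_out(expr, s_score)
```

Only the component that is linear in the score is removed. Any non-linear dependence on the covariate remains in the residuals.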

I recall that the Harmony authors discussed a "serial Harmony" approach, in which covariates are corrected sequentially rather than in parallel, but I have not been able to find that discussion again.

My Questions:

  1. Is there a recommended practice for handling situations where there are too many sparse categories (for example, because concatenating covariates produces them)?

    (other than "don't do it")

  2. Can I legitimately overcome the small-category problem by running Harmony sequentially, and should that give results equivalent to correcting for all covariates in parallel in Harmony (assuming both are possible)?

    • Could sequential regression help mitigate issues arising from sparse category combinations?
  3. How can I implement sequential regression of covariates in Harmony within Seurat?

    • Should I feed the "harmony" reduction into RunHarmony() instead of "pca" when correcting for the 2nd and 3rd variable?
    • Are there recommended workflows or code examples for applying Harmony multiple times, each time correcting for a single covariate?
    • Do I need to adjust Harmony parameters? For example, Library has 25 categories, while Phase has 3.
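The chaining pattern in question 3 can be sketched without Seurat: correct for one covariate, then pass the corrected embedding to the next correction. In the plain-Python sketch below, `center_by_group()` is a hypothetical and deliberately crude stand-in for a batch correction. It uses per-group mean-centering, not Harmony, and the four-cell, one-dimensional data are made up. Because Library and Phase are confounded here, the second step moves the Library means away from zero again. So in this toy example the order of corrections matters, but that says nothing definitive about how Harmony behaves.

```python
from collections import defaultdict


def group_means(values, labels):
    """Mean of `values` within each label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, labels):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def center_by_group(values, labels):
    """Toy correction (NOT Harmony): subtract each group's mean from its members."""
    means = group_means(values, labels)
    return [v - means[g] for v, g in zip(values, labels)]


def sequential_correction(values, covariates, correct=center_by_group):
    """Apply `correct` once per covariate, feeding each output into the next step."""
    for labels in covariates:
        values = correct(values, labels)
    return values


# Hypothetical one-dimensional embedding for four cells.
values = [0.0, 2.0, 4.0, 6.0]
library = ["L1", "L1", "L2", "L2"]
phase = ["G1", "S", "G1", "G1"]

corrected = sequential_correction(values, [library, phase])
print(group_means(corrected, phase))    # phase means are ~0 after the last step
print(group_means(corrected, library))  # library means drift away from 0
```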

Additional Context:

Thank you for your time.
