Bulk signature analysis #83
Comments
I think that we're seeing signal with this approach, and combined with a couple of orthogonal approaches we've discussed already (see below), I think these results could fit nicely into a larger story. Besides the analyses and future data collection efforts we discussed today (and outlined above), it would be great to get your opinions, so we can discuss them and so I can better manage and prioritize my time. Thanks!
I think the most compelling finding would be: (1) find one signature, (2) reassure that it's real by interpreting the meaning of the features that compose it (qualitative analysis), (3) show it's robust across batches (which might benefit from trimming to as few features as possible), (4) show it can prospectively predict sensitivity/resistance (either binary resistant/sensitive or continuous, using EC50 for the latter). Note that (3) isn't strictly necessary if you can do (4); you could always include controls run in the same batch as the query cell lines if necessary. The bullets under robustness are a lot of work and not particularly crucial unless they help you develop approaches that make (4) work well. As discussed in check-in today, if the WT parental line is not aligned well enough across batches to classify properly, that's a problem. I'd focus on normalization methods first before attempting (4). Showing the signature is similar across other classes of drugs or other drugs of the same class (your first set of bullets) is neat but not required for a complete story IMO. eLife has a "Research Advance" format that would suit that addendum if it turns out to be interesting.
Thanks for these thoughts! They will be great to refer to when deciding next steps. I definitely agree that different normalization methods would help, and it is something that we can try next.
I dug a bit deeper into this by applying the CloneAE signature to batch 1 and batch 2 data. These are the only other batches with cloneA and cloneE data that I've processed so far. We're getting cloneA and cloneE but the wildtype parental line is a bit wonky.
Next steps for bulk analysis
Normalization
I will try sphering (aka whitening) the bulk profile data before input into the signature builder. The catch is that sphering requires a negative control, and we don't have a good one in these platemaps. We could try harmonizing the single cells before generating bulk profiles and then proceed. The downside is that Harmony places the data in a non-CellProfiler feature space, which would be bad from the interpretation perspective. I can try the inverse-Harmony approach that @sMyn42 is assessing, but these are still uncharted waters. This exhausts the list of normalization/batch effect correction approaches the lab has used (at least recently). I could try other approaches outside our comfort zone, but this may not be the best use of time.
Signature subsampling
We know that the cloneAE signature is robust within batch (from a training/test set perspective). Ignoring the fact that this could be a signature of clonal selection, I can still perform a systematic signature reduction experiment to test feature redundancy (and try to find a reduced signature).
Could you use a subset of the parental WT line as the negative control? It IS a baseline of sorts, after all. Then the held-out parental WT samples can allow you to check if the alignment worked well (ideally you choose the held-out ones from particular plate locations or something, not just a literal random subset of the parental WTs).
This would work for single cell profiles, but not for well-aggregated bulk (we only have three replicates). It could work with site-aggregated bulk (~27 pseudo-replicates), but we previously decided not to go with site-level aggregation in #70. There were two reasons not to use this approach: (1) we observed quite a bit of site-to-site variability, and (2) we've never done it before. I can see a path forward overcoming point 1, since we are trying a normalization approach that would adjust for site-to-site variability.
In #82, I performed an analysis of bulk (aggregated) signatures from two compiled datasets. #58 is an initial attempt at this analysis, but it used earlier (and lower quality) data.
I summarize the experiment immediately below, and then describe the results in more detail further below.
Summary
The Clone AE results are promising. The signature and method clearly work in both training and testing splits, and there appears to be some sort of dose response. What this dose response means biologically is unclear, but technically (in the resistant lines) it means that the signature features become less extreme in their ranking. This means that the absolute values of the signature features are higher in the Wildtype_parental profiles.
The Four Clone signature applied to the Clone AE data is odd. The results (at least for the DMSO-treated samples) are mostly outside the null, and the score is less extreme than with the Clone AE signature, but the sign is flipped! This could result from a programmatic anomaly in fitting the linear models (I checked one possible cause and ruled it out), a metadata label mixup in the four clone dataset, or a method that isn't robust across batches.
The signatures applied to the four clone dataset (even the four clone signature) are less conclusive. I am not nearly as confident in these data as I am in the Batch 8 profiles.
Next steps
Signature titration
The number of features in the signature is high. Since one goal of the project is to identify a smaller set of features to potentially use as a biomarker of drug resistance, I will perform a "signature titration" analysis in which I systematically add features (starting with the most significant) and quantify the average difference in test set TotalScore between sensitive and resistant clones. This approach will give us a way to select the minimal set of features required to separate the clone types.
More data collection
We will work with the Rockefeller team to decide next steps in data collection. I see two additional datasets we could collect (note that I have not yet processed batch 9 or 10).
I would also like to double-check the platemap metadata labels for batches 4, 5, 6, and 7.
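As a rough illustration of the signature titration idea described above, here is a minimal sketch. It assumes features arrive already ranked by significance and tagged "up" or "down", and it uses a deliberately simple stand-in score (`toy_score`) rather than the TotalScore used in this analysis. All function names, labels, and data here are hypothetical.

```python
from statistics import mean


def toy_score(profile, up, down):
    """Stand-in score (NOT singscore): mean of 'up' features minus mean of 'down' features."""
    up_part = mean(profile[f] for f in up) if up else 0.0
    down_part = mean(profile[f] for f in down) if down else 0.0
    return up_part - down_part


def titrate_signature(ranked_features, profiles, labels, score_fn=toy_score):
    """Add features one at a time and track the separation between clone types.

    ranked_features: list of (feature_name, "up" | "down"), most significant first.
    profiles: list of dicts mapping feature name to value (test set samples).
    labels: list of "resistant" / "sensitive", aligned with profiles.
    Returns a list of (n_features, mean resistant score - mean sensitive score).
    """
    up, down, results = [], [], []
    for name, direction in ranked_features:
        (up if direction == "up" else down).append(name)
        scores = [score_fn(p, up, down) for p in profiles]
        resistant = mean(s for s, lab in zip(scores, labels) if lab == "resistant")
        sensitive = mean(s for s, lab in zip(scores, labels) if lab == "sensitive")
        results.append((len(up) + len(down), resistant - sensitive))
    return results


# Hypothetical toy data: two samples, three ranked features.
ranked = [("f1", "up"), ("f2", "down"), ("f3", "up")]
profiles = [
    {"f1": 2.0, "f2": -1.0, "f3": 0.0},  # resistant
    {"f1": 0.0, "f2": 1.0, "f3": 0.0},   # sensitive
]
labels = ["resistant", "sensitive"]

for n_features, separation in titrate_signature(ranked, profiles, labels):
    print(n_features, separation)
```

In this toy example the separation peaks at two features and drops when the uninformative third feature is added, which is the kind of curve one would inspect to pick a reduced signature.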
Data
Clone A/E
CloneA, CloneE
WT_parental
variance_threshold, correlation_threshold, drop_na_columns, blocklist, drop_outliers
WT_parental to two clones. The signature may include features representing clonal selection.
Four Clone
WT002, WT008, WT009, WT009
BZ001, BZ008, BZ017, BZ018
WT_parental
variance_threshold, correlation_threshold, drop_na_columns, blocklist, drop_outliers
Signature generation
Procedure
For each dataset, I perform the following procedure:
four_clone only; cloneAE is one batch)
cloneAE dataset and in the linear model, testing an individual feature using the plates covariate is actually 6 comparisons!)
Metadata_clone_type_indicator covariate; this is the "PreSignature".
Metadata_Plate and Metadata_batch covariates.
Volcano plots
These plots visualize feature significance for each linear model covariate
Clone AE
[Figure: volcano plots, Clone AE dataset]
Four Clone
[Figure: volcano plots, Four Clone dataset]
Result
The signatures contain many features, and we make a distinction between features "up" and features "down":
Apply signatures
Approach
Because we have two datasets and two signatures, I applied each signature to each dataset independently. I also apply each signature with 1,000 random permutations to define a null distribution.
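To make the null-distribution idea concrete, here is a minimal sketch under stated assumptions. It uses a simplified rank-based directional score as a stand-in (not the actual scoring implementation used in this analysis), and it assumes each permutation draws random feature sets matching the sizes of the "up" and "down" sets. The function names, the permutation scheme, and the toy profile are all hypothetical.

```python
import random
from statistics import mean


def rank_score(profile, up, down):
    """Simplified directional rank score bounded in [-1, 1].

    Features are ranked by value and the ranks are rescaled to [-1, 1]; the
    score is half the difference between the mean rescaled rank of the 'up'
    and 'down' sets. Ties are broken arbitrarily. Needs at least 2 features.
    """
    ordered = sorted(profile, key=profile.get)  # ascending by value
    n = len(ordered)
    centered = {f: 2 * i / (n - 1) - 1 for i, f in enumerate(ordered)}
    up_part = mean(centered[f] for f in up) if up else 0.0
    down_part = mean(centered[f] for f in down) if down else 0.0
    return (up_part - down_part) / 2


def permutation_null(profile, n_up, n_down, n_perm=1000, seed=0):
    """Score n_perm random signatures with the same up/down set sizes."""
    rng = random.Random(seed)
    features = sorted(profile)
    null = []
    for _ in range(n_perm):
        drawn = rng.sample(features, n_up + n_down)
        null.append(rank_score(profile, drawn[:n_up], drawn[n_up:]))
    return null


def empirical_p(observed, null):
    """Two-sided empirical p-value with the usual +1 correction."""
    extreme = sum(abs(s) >= abs(observed) for s in null)
    return (extreme + 1) / (len(null) + 1)


# Hypothetical single-sample profile with 20 features.
profile = {f"feat_{i}": float(i) for i in range(20)}
up = ["feat_17", "feat_18", "feat_19"]
down = ["feat_0", "feat_1", "feat_2"]

observed = rank_score(profile, up, down)
null = permutation_null(profile, len(up), len(down))
print(observed, empirical_p(observed, null))
```

Whether an observed score falls "outside the null" can then be judged by comparing it against the spread of the permuted scores, per sample.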
Method
I use the singscore method (Foroutan et al.). This is a "single sample" method to detect signature enrichment. It is a relatively simple, rank-based approach bounded between -1 and 1, where a score of 1 means that the sample is enriched for signature features.
Results
Comparison 1: Clone AE Dataset - Clone AE Signature
Comparison 2: Clone AE Dataset - Four Clone Signature
Comparison 3: Four Clone Dataset - Clone AE Signature
Comparison 4: Four Clone Dataset - Four Clone Signature