Merge pull request #1623 from milaboratory/feature-imputed
imputed features for trees
gnefedev authored Apr 22, 2024
2 parents 1852330 + 3e02b2d commit 88d34d8
Showing 7 changed files with 79 additions and 49 deletions.
2 changes: 1 addition & 1 deletion build.gradle.kts
@@ -134,7 +134,7 @@ val toObfuscate: Configuration by configurations.creating {
val obfuscationLibs: Configuration by configurations.creating


val mixcrAlgoVersion = "4.6.0-87-fix-local-preset-on-windows"
val mixcrAlgoVersion = "4.6.0-93-feature-imputed"
// may be blank (will be inherited from mixcr-algo)
val milibVersion = ""
// may be blank (will be inherited from mixcr-algo or milib)
3 changes: 3 additions & 0 deletions changelogs/v4.6.1.md
@@ -3,6 +3,7 @@
- Consensus assembly in `assemble` is now performed separately for each chain. This prevents differences in expression
levels between chains from affecting the consensus assembly algorithm. This change is especially important for single-cell
presets with cell-level assembly (most of the MiXCR presets for single-cell data).
- Export of trees and tree nodes now supports imputed features

## 🛠️ Minor improvements & fixes

@@ -33,6 +34,8 @@
- Command `groupClones` was renamed to `assembleCells`. The old name still works but is hidden from help. Report and
output file names in the `analyze` step were also renamed accordingly.
- Fixed germline calculation for `VCDR3Part` and `JCDR3Part` when indels are present inside CDR3
- Fixed export of trees when data were assembled by a feature whose reference point has an offset
- Export of `VJJunction germline` in `shmTrees` exports now outputs the `mrca` content as the most plausible sequence

## New Presets

8 changes: 6 additions & 2 deletions itests/case-buld_trees_on_data_with_a_hole.sh
@@ -30,8 +30,8 @@ for filename in $FILES; do
R1=${id}_R1.fastq.gz
R2=${id}_R2.fastq.gz

    # skip FR1 and CDR2
mixcr analyze --verbose --assemble-clonotypes-by '[{CDR1Begin:CDR2Begin},{FR3Begin:FR4End}]' mikelov-et-al-2021 trees_samples/$R1 trees_samples/$R2 $id
# skip part of FR1 and whole CDR2
mixcr analyze --verbose --assemble-clonotypes-by '[{CDR1Begin(-1):CDR2Begin},{FR3Begin:FR4End}]' mikelov-et-al-2021 trees_samples/$R1 trees_samples/$R2 $id
done

mixcr findAlleles \
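The updated test assembles clonotypes by `{CDR1Begin(-1):CDR2Begin}`, a range whose start is shifted by an offset from `CDR1Begin`. As a hedged illustration of what such a `Name(offset)` reference point means, here is a minimal Java sketch that parses it and resolves it against known reference point positions, assuming the offset is counted in nucleotides. The class name, the regular expression, and the example positions are hypothetical; this is not MiXCR's parser.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not MiXCR code): a reference point written as
 * "Name" or "Name(offset)", e.g. "CDR1Begin(-1)".
 */
final class RefPointWithOffset {
    // A name, then an optional signed integer offset in parentheses.
    private static final Pattern SYNTAX =
            Pattern.compile("([A-Za-z0-9]+)(?:\\((-?\\d+)\\))?");

    final String name;
    final int offset;

    private RefPointWithOffset(String name, int offset) {
        this.name = name;
        this.offset = offset;
    }

    static RefPointWithOffset parse(String text) {
        Matcher m = SYNTAX.matcher(text.trim());
        if (!m.matches())
            throw new IllegalArgumentException("Not a reference point: " + text);
        // No parentheses means no shift.
        int offset = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        return new RefPointWithOffset(m.group(1), offset);
    }

    /** Absolute position = position of the named point + offset (assumed nucleotides). */
    int resolve(Map<String, Integer> positions) {
        Integer base = positions.get(name);
        if (base == null)
            throw new IllegalArgumentException("Unknown reference point: " + name);
        return base + offset;
    }
}
```

For example, if `CDR1Begin` sat at position 78 of some sequence, `CDR1Begin(-1)` would resolve to 77, one position before the usual start of CDR1.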
@@ -51,8 +51,12 @@ mixcr findShmTrees \
$(ls alleles/*.clns) trees/result.shmt

mixcr exportShmTrees trees/result.shmt trees/trees.tsv
# test imputed
mixcr exportShmTrees -treeId -nFeatureImputed VDJRegion mrca -allNFeaturesImputed mrca -aaFeatureImputed VDJRegion mrca -allAAFeaturesImputed mrca trees/result.shmt trees/trees.imputed.tsv

mixcr exportShmTreesWithNodes trees/result.shmt trees/trees_with_nodes.tsv
# test imputed
mixcr exportShmTreesWithNodes -treeId -nodeId -nFeatureImputed VDJRegion mrca -allNFeaturesImputed mrca -aaFeatureImputed VDJRegion mrca -allAAFeaturesImputed mrca trees/result.shmt trees/trees_with_nodes.imputed.tsv

mixcr exportPlots shmTrees trees/result.shmt trees/plots.pdf

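The imputed columns written by the `exportShmTrees` and `exportShmTreesWithNodes` calls in this script mark germline-filled positions in lowercase, while observed letters stay uppercase (the Java test further down checks that the observed CDR3 appears uppercase inside the imputed `VDJRegion`). A minimal sketch for post-processing one exported cell, assuming the value has already been read from the TSV as a plain `String`; the class name is hypothetical and TSV parsing is left out.

```java
/**
 * Hypothetical helper: summarize an imputed sequence where observed letters
 * are uppercase and germline-imputed letters are lowercase.
 */
final class ImputedStats {
    /** Number of lowercase (germline-imputed) letters. */
    static int imputedCount(String seq) {
        int n = 0;
        for (int i = 0; i < seq.length(); i++)
            if (Character.isLowerCase(seq.charAt(i)))
                n++;
        return n;
    }

    /** Fraction of letters that were actually observed (uppercase); 0.0 for no letters. */
    static double observedFraction(String seq) {
        int letters = 0;
        int observed = 0;
        for (int i = 0; i < seq.length(); i++) {
            char c = seq.charAt(i);
            if (!Character.isLetter(c))
                continue; // ignore non-letter symbols such as stop codons in amino acid output
            letters++;
            if (Character.isUpperCase(c))
                observed++;
        }
        return letters == 0 ? 0.0 : (double) observed / letters;
    }
}
```

For `acgTTGCA`, three letters are imputed and the observed fraction is 0.625.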
30 changes: 24 additions & 6 deletions regression/cli-help/exportShmSingleCellTrees.txt
@@ -13,12 +13,18 @@ Usage: mixcr exportShmSingleCellTrees [--include-one-chain-trees] [--only-observ
<gene_feature> [(germline|mrca|parent)]]... [-allNFeatures
[(germline|mrca|parent)]]... [-aaFeature <gene_feature>
[(germline|mrca|parent)]]... [-allAAFeatures
[(germline|mrca|parent)]]... [-nLength <gene_feature[,
gene_feature]...> [(germline|mrca|parent)]]... [-allNLength
[(germline|mrca|parent)]]... [-aaLength <gene_feature[,
gene_feature]...> [(germline|mrca|parent)]]... [-allAALength
[(germline|mrca|parent)]]... [-nMutations <gene_feature>
[(germline|mrca|parent)]
[(germline|mrca|parent)]]... [-nFeatureImputed <gene_feature>
[(germline|mrca|parent)]]... [-allNFeaturesImputed
[<from_reference_point> <to_reference_point>]
[(germline|mrca|parent)]]... [-aaFeatureImputed
<gene_feature> [(germline|mrca|parent)]]...
[-allAAFeaturesImputed [<from_reference_point>
<to_reference_point>] [(germline|mrca|parent)]]... [-nLength
<gene_feature[,gene_feature]...> [(germline|mrca|parent)]]...
[-allNLength [(germline|mrca|parent)]]... [-aaLength
<gene_feature[,gene_feature]...> [(germline|mrca|parent)]]...
[-allAALength [(germline|mrca|parent)]]... [-nMutations
<gene_feature> [(germline|mrca|parent)]
[(substitutions|indels|inserts|deletions)]]...
[-allNMutations [(germline|mrca|parent)]
[(substitutions|indels|inserts|deletions)]]...
@@ -150,6 +156,18 @@ Possible fields to export
Export amino acid sequences for all covered gene features.
If first arg is omitted, then feature will be printed for current
node. Otherwise - for corresponding `parent`, `germline` or `mrca`.
-nFeatureImputed <gene_feature> [(germline|mrca|parent)]
Export nucleotide sequence using letters from germline (marked
lowercase) for uncovered regions.
-allNFeaturesImputed [<from_reference_point> <to_reference_point>] [(germline|mrca|parent)]
Export nucleotide sequences for all covered gene features. By default,
boundaries will be `FR1Begin FR4End`.
-aaFeatureImputed <gene_feature> [(germline|mrca|parent)]
Export amino acid sequence using letters from germline (marked
lowercase) for uncovered regions.
-allAAFeaturesImputed [<from_reference_point> <to_reference_point>] [(germline|mrca|parent)]
Export amino acid sequences for all covered gene features. By default,
boundaries will be `FR1Begin FR4End`.
-nLength <gene_feature[,gene_feature]...> [(germline|mrca|parent)]
Export count of nucleotides of specified gene feature.
If second arg is omitted, then length will be printed for current
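Per the `exportShmSingleCellTrees` help text, the optional `(germline|mrca|parent)` argument chooses whose sequence a column describes: when it is omitted, the current node; otherwise the node's parent, the germline, or the MRCA. A hypothetical Java model of that selection follows; the `Tree` and `Node` types are invented for illustration, and the germline and MRCA sequences are simply assumed to be stored on the tree.

```java
/**
 * Hypothetical model of the optional (germline|mrca|parent) argument.
 * Not MiXCR code: Tree and Node are illustrative stand-ins.
 */
final class NodeSequenceSketch {
    enum Base { GERMLINE, MRCA, PARENT }

    static final class Tree {
        final String germline; // assumed: germline sequence for the tree
        final String mrca;     // assumed: reconstructed most recent common ancestor
        Tree(String germline, String mrca) {
            this.germline = germline;
            this.mrca = mrca;
        }
    }

    static final class Node {
        final String sequence;
        final Node parent; // null for the topmost node
        Node(String sequence, Node parent) {
            this.sequence = sequence;
            this.parent = parent;
        }
    }

    /** A null base stands for an omitted argument, i.e. the current node. */
    static String sequenceFor(Tree tree, Node node, Base base) {
        if (base == null)
            return node.sequence;
        switch (base) {
            case GERMLINE:
                return tree.germline;
            case MRCA:
                return tree.mrca;
            case PARENT:
                // Without a parent there is nothing to report for this column.
                return node.parent == null ? null : node.parent.sequence;
            default:
                throw new IllegalStateException("Unhandled base: " + base);
        }
    }
}
```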
25 changes: 21 additions & 4 deletions regression/cli-help/exportShmTrees.txt
@@ -7,10 +7,15 @@ Usage: mixcr exportShmTrees [--filter-min-nodes <n>] [--filter-min-height <n>] [
[-chains] [-treeHeight] [-vHit] [-jHit] [-vGene] [-jGene] [-vFamily]
[-jFamily] [-nFeature <gene_feature> (germline|mrca)]... [-allNFeatures
(germline|mrca)]... [-aaFeature <gene_feature> (germline|mrca)]...
[-allAAFeatures (germline|mrca)]... [--force-overwrite] [--no-warnings]
[--verbose] [--help] [[--filter-in-feature <gene_feature>]
[--pattern-max-errors <n>] (--filter-aa-pattern <pattern> |
--filter-nt-pattern <pattern>)] trees.shmt [trees.tsv]
[-allAAFeatures (germline|mrca)]... [-nFeatureImputed <gene_feature>
(germline|mrca)]... [-allNFeaturesImputed [<from_reference_point>
<to_reference_point>] (germline|mrca)]... [-aaFeatureImputed
<gene_feature> (germline|mrca)]... [-allAAFeaturesImputed
[<from_reference_point> <to_reference_point>] (germline|mrca)]...
[--force-overwrite] [--no-warnings] [--verbose] [--help]
[[--filter-in-feature <gene_feature>] [--pattern-max-errors <n>]
(--filter-aa-pattern <pattern> | --filter-nt-pattern <pattern>)] trees.shmt
[trees.tsv]
Export SHMTree as a table with a row for every SHM root in a table (single row if no single cell
data)
trees.shmt Input file produced by 'findShmTrees' command.
@@ -79,3 +84,15 @@ Possible fields to export
type
-allAAFeatures (germline|mrca)
Export amino acid sequences for all covered gene features.
-nFeatureImputed <gene_feature> (germline|mrca)
Export nucleotide sequence using letters from germline (marked
lowercase) for uncovered regions.
-allNFeaturesImputed [<from_reference_point> <to_reference_point>] (germline|mrca)
Export nucleotide sequences for all covered gene features. By default,
boundaries will be `FR1Begin FR4End`.
-aaFeatureImputed <gene_feature> (germline|mrca)
Export amino acid sequence using letters from germline (marked
lowercase) for uncovered regions.
-allAAFeaturesImputed [<from_reference_point> <to_reference_point>] (germline|mrca)
Export amino acid sequences for all covered gene features. By default,
boundaries will be `FR1Begin FR4End`.
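The `*Imputed` options export a feature using germline letters (marked lowercase) wherever the sequence itself does not cover it. The hypothetical sketch below shows that idea on two strings that are already aligned position by position; the `'.'` placeholder for uncovered positions and the gap-free alignment are simplifying assumptions, not how MiXCR represents alignments internally.

```java
/**
 * Hypothetical sketch of an "imputed" feature: observed letters where the
 * sequence covers a position (uppercase), germline letters elsewhere (lowercase).
 * Assumes gap-free alignment and '.' as the marker for an uncovered position.
 */
final class ImputeSketch {
    static final char UNCOVERED = '.';

    static String impute(String observed, String germline) {
        if (observed.length() != germline.length())
            throw new IllegalArgumentException("observed and germline must be aligned to equal length");
        StringBuilder out = new StringBuilder(observed.length());
        for (int i = 0; i < observed.length(); i++) {
            char c = observed.charAt(i);
            if (c == UNCOVERED)
                out.append(Character.toLowerCase(germline.charAt(i))); // filled from germline
            else
                out.append(Character.toUpperCase(c)); // actually observed, mutations kept
        }
        return out.toString();
    }
}
```

For example, `impute("..GTAC..", "TTGAACCC")` yields `ttGTACcc`: the observed `T` (a substitution relative to the germline `A`) is kept, and the uncovered flanks come from the germline in lowercase.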
48 changes: 18 additions & 30 deletions regression/cli-help/exportShmTreesWithNodes.txt
@@ -22,12 +22,14 @@ Usage: mixcr exportShmTreesWithNodes [--split-by-tags (Molecule|Cell|Sample)] [-
[-allQFeatures [<from_reference_point>
<to_reference_point>]]... [-aaFeature <gene_feature>
[(germline|mrca|parent)]]... [-allAAFeatures
[(germline|mrca|parent)]]... [-nFeatureImputed
<gene_feature>]... [-allNFeaturesImputed
[<from_reference_point> <to_reference_point>]]...
[-aaFeatureImputed <gene_feature>]... [-allAAFeaturesImputed
[<from_reference_point> <to_reference_point>]]...
[-minFeatureQuality <gene_feature>]... [-allMinFeaturesQuality
[(germline|mrca|parent)]]... [-nFeatureImputed <gene_feature>
[(germline|mrca|parent)]]... [-allNFeaturesImputed
[<from_reference_point> <to_reference_point>]
[(germline|mrca|parent)]]... [-aaFeatureImputed <gene_feature>
[(germline|mrca|parent)]]... [-allAAFeaturesImputed
[<from_reference_point> <to_reference_point>]
[(germline|mrca|parent)]]... [-minFeatureQuality
<gene_feature>]... [-allMinFeaturesQuality
[<from_reference_point> <to_reference_point>]]...
[-allNFeaturesWithMinQuality [<from_reference_point>
<to_reference_point>]]... [-allNFeaturesImputedWithMinQuality
@@ -235,32 +237,18 @@ Possible fields to export
Export amino acid sequences for all covered gene features.
If first arg is omitted, then feature will be printed for current
node. Otherwise - for corresponding `parent`, `germline` or `mrca`.
-nFeatureImputed <gene_feature>
Export nucleotide sequence of specified gene feature using letters
from germline (marked lowercase) for uncovered regions (only for
nodes with clones)
-allNFeaturesImputed [<from_reference_point> <to_reference_point>]
-nFeatureImputed <gene_feature> [(germline|mrca|parent)]
Export nucleotide sequence using letters from germline (marked
lowercase) for uncovered regions for all gene features between
specified reference points (in separate columns).
For example, `-allNFeaturesImputed FR3Begin FR4End` will export
`-nFeatureImputed FR3`, `-nFeatureImputed CDR3`, `-nFeatureImputed
FR4`.
By default, boundaries will be got from analysis parameters if
possible or `FR1Begin FR4End` otherwise. (only for nodes with clones)
-aaFeatureImputed <gene_feature>
Export amino acid sequence of specified gene feature using letters
from germline (marked lowercase) for uncovered regions (only for
nodes with clones)
-allAAFeaturesImputed [<from_reference_point> <to_reference_point>]
lowercase) for uncovered regions.
-allNFeaturesImputed [<from_reference_point> <to_reference_point>] [(germline|mrca|parent)]
Export nucleotide sequences for all covered gene features. By default,
boundaries will be `FR1Begin FR4End`.
-aaFeatureImputed <gene_feature> [(germline|mrca|parent)]
Export amino acid sequence using letters from germline (marked
lowercase) for uncovered regions for all gene features between
specified reference points (in separate columns).
For example, `-allAAFeaturesImputed FR3Begin FR4End` will export
`-aaFeatureImputed FR3`, `-aaFeatureImputed CDR3`,
`-aaFeatureImputed FR4`.
By default, boundaries will be got from analysis parameters if
possible or `FR1Begin FR4End` otherwise. (only for nodes with clones)
lowercase) for uncovered regions.
-allAAFeaturesImputed [<from_reference_point> <to_reference_point>] [(germline|mrca|parent)]
Export amino acid sequences for all covered gene features. By default,
boundaries will be `FR1Begin FR4End`.
-minFeatureQuality <gene_feature>
Export minimal quality of specified gene feature (only for nodes with
clones)
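The help text removed in this hunk explained that `-allNFeaturesImputed FR3Begin FR4End` exports `-nFeatureImputed FR3`, `-nFeatureImputed CDR3` and `-nFeatureImputed FR4` as separate columns. A minimal sketch of that expansion, assuming a simplified fixed list of region boundaries (MiXCR defines many more reference points); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: expand "FROM TO" into one gene feature per region
 * between two reference points, using a simplified boundary list.
 */
final class FeatureRangeSketch {
    // Feature FEATURES[i] spans BOUNDARIES[i] .. BOUNDARIES[i + 1].
    private static final String[] BOUNDARIES = {
        "FR1Begin", "CDR1Begin", "FR2Begin", "CDR2Begin",
        "FR3Begin", "CDR3Begin", "FR4Begin", "FR4End"
    };
    private static final String[] FEATURES = {
        "FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4"
    };

    static List<String> featuresBetween(String from, String to) {
        int start = indexOf(from);
        int end = indexOf(to);
        if (start >= end)
            throw new IllegalArgumentException(from + " must come before " + to);
        List<String> result = new ArrayList<>();
        for (int i = start; i < end; i++)
            result.add(FEATURES[i]);
        return result;
    }

    private static int indexOf(String point) {
        for (int i = 0; i < BOUNDARIES.length; i++)
            if (BOUNDARIES[i].equals(point))
                return i;
        throw new IllegalArgumentException("Unknown reference point: " + point);
    }
}
```

With no arguments the documented default is `FR1Begin FR4End`, which in this simplified model expands to all seven regions from `FR1` to `FR4`.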
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2014-2023, MiLaboratories Inc. All Rights Reserved
* Copyright (c) 2014-2024, MiLaboratories Inc. All Rights Reserved
*
* Before downloading or accessing the software, please read carefully the
* License Agreement available at:
@@ -66,7 +66,7 @@ public void test1() throws Exception {
assemble.cloneSet,
assembleFullSeq.cloneSet)) {
for (VDJCObject o : it) {
VDJCObject.CaseSensitiveNucleotideSequence seq = o.getIncompleteFeature(feature);
CaseSensitiveNucleotideSequence seq = o.getIncompleteFeature(feature);
if (seq == null)
continue;

@@ -109,7 +109,7 @@ public void test3() throws Exception {

RunMiXCR.AlignResult align = RunMiXCR.align(params);
VDJCAlignments al = align.alignments.get(0);
VDJCObject.CaseSensitiveNucleotideSequence seq = al.getIncompleteFeature(VDJRegion);
CaseSensitiveNucleotideSequence seq = al.getIncompleteFeature(VDJRegion);
assertTrue(seq.toString().contains(al.getFeature(CDR3).getSequence().toString().toUpperCase()));
}

@@ -137,7 +137,7 @@ public void test5() throws Exception {
for (int i = 0; i < align.alignments.size(); ++i) {
VDJCAlignments al = align.alignments.get(i);
NSequenceWithQuality cdr3 = al.getFeature(CDR3);
VDJCObject.CaseSensitiveNucleotideSequence seq = al.getIncompleteFeature(VDJRegion);
CaseSensitiveNucleotideSequence seq = al.getIncompleteFeature(VDJRegion);
if (cdr3 != null && seq == null) {
assertTrue(
al.getBestHit(GeneType.Variable).getAlignment(0).getSequence2Range().getFrom() != 0
@@ -194,9 +194,9 @@ public void test7() throws Exception {
for (int i = 0; i < assemble.cloneSet.size(); i++) {

VDJCObject cl = assemble.cloneSet.get(i);
VDJCObject.CaseSensitiveNucleotideSequence f = cl.getIncompleteFeature(L2);
CaseSensitiveNucleotideSequence f = cl.getIncompleteFeature(L2);

NucleotideSequence lSeq = f.getSeq()[0];
NucleotideSequence lSeq = f.getSequence(0);
NucleotideSequence germ = cl.getBestHit(GeneType.Variable).getGene().getFeature(L2);
assertEquals(germ, lSeq);
}
