STAR 2.7.8a --- 2021/02/20 ::: Major STARsolo updates

alexdobin released this 20 Feb 22:43
· 150 commits to master since this release

This release contains many major and minor STARsolo upgrades, bug fixes, and behavior changes.
STARsolo detailed description: https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md

Major new features:

  • --runMode soloCellFiltering option for cell filtering (calling) of the raw count matrix, without re-mapping
  • Input from SAM/BAM for STARsolo, with options --soloInputSAMattrBarcodeSeq and --soloInputSAMattrBarcodeQual to specify SAM tags for the barcode read sequence and qualities
  • --clipAdapterType CellRanger4 option for 5' TSO adapter and 3' polyA-tail clipping of the reads to better match CellRanger >= 4.0.0 mapping results
  • --soloBarcodeMate to support scRNA-seq protocols in which one of the paired-end mates contains both barcode sequence and cDNA (e.g. 10X 5' protocol)
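
As a rough illustration of the first item, the sketch below assembles a command line for re-running cell filtering on an existing raw matrix. The helper name and the directory paths are hypothetical, and the positional order (raw matrix directory, then output prefix) is an assumption to check against the STARsolo documentation linked above; the sketch only builds the argument list and does not run STAR.

```python
import shlex


def build_cell_filtering_cmd(raw_dir, out_prefix,
                             cell_filter="EmptyDrops_CR", star_bin="STAR"):
    """Build an argument list for cell filtering without re-mapping.

    raw_dir and out_prefix are hypothetical example paths; the positional
    order after --runMode soloCellFiltering is an assumption.
    """
    return [
        star_bin,
        "--runMode", "soloCellFiltering",
        raw_dir,        # directory holding the raw count matrix
        out_prefix,     # prefix for the filtered output
        "--soloCellFilter", cell_filter,
    ]


cmd = build_cell_filtering_cmd("Solo.out/Gene/raw/", "filtered_rerun/")
# Print a shell-quoted version for inspection; pass cmd to subprocess.run to execute.
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems if it is later handed to `subprocess.run`.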

New options:

  • --soloCellFilter EmptyDrops_CR option for cell filtering (calling) nearly identical to that of CellRanger 3 and 4
  • --readFilesSAMattrKeep to specify which SAM attributes from the input SAM to keep in the output
  • --soloUMIdedup 1M_Directional_UMItools option matching the "directional" method in UMI-tools (Smith, Heger and Sudbery, Genome Research 2017)
  • --soloUMIdedup NoDedup option for counting reads per gene, i.e. no UMI deduplication
  • --soloUMIdedup 1MM_CR option for 1 mismatch UMI deduplication similar to CellRanger >= 3.0
  • --soloUMIfiltering MultiGeneUMI_CR option to filter out lower-count UMIs that map to more than one gene, matching CellRanger >= 3.0
  • --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts option, which allows 1MM multimatching to the whitelist (WL) for barcodes with N-bases (to better match CellRanger >= 3.0)
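
To make the difference between the NoDedup and directional settings concrete, here is a simplified sketch of the two counting ideas for the UMIs of one cell and one gene. It follows the directional rule described for UMI-tools (a more abundant UMI absorbs a neighbour at one mismatch when its count is at least twice the neighbour's count minus one). It is not STAR's implementation, the function names are made up for illustration, and it assumes all UMIs have the same length.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def count_no_dedup(umis):
    """NoDedup idea: every read counts, so the result is the read count."""
    return len(umis)


def count_directional(umis):
    """Directional idea: count clusters grown from the most abundant UMIs."""
    counts = Counter(umis)
    # Visit UMIs from most to least abundant (ties broken alphabetically).
    ordered = sorted(counts, key=lambda u: (-counts[u], u))
    assigned = set()
    n_clusters = 0
    for seed in ordered:
        if seed in assigned:
            continue
        n_clusters += 1
        assigned.add(seed)
        stack = [seed]
        while stack:
            parent = stack.pop()
            for child in ordered:
                if child in assigned:
                    continue
                # Absorb a 1-mismatch neighbour only if it is sufficiently rarer.
                if (hamming(parent, child) == 1
                        and counts[parent] >= 2 * counts[child] - 1):
                    assigned.add(child)
                    stack.append(child)
    return n_clusters


# Made-up UMIs for illustration only.
umis = ["AAAA"] * 10 + ["AAAT"] * 2 + ["CCCC"] * 3 + ["CCCG"] * 3
print(count_no_dedup(umis), count_directional(umis))
```

In this toy input, `AAAT` is absorbed into `AAAA`, while `CCCC` and `CCCG` have equal counts and stay separate, so the directional count is 3 against 18 reads.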

Changes in behavior:

  • The UMI deduplication/correction specified in --soloUMIdedup is used for statistics output, filtering, and UB tag in BAM output.
  • If the UMI or CB is not defined, the UB and CB tags in BAM output will contain "-" (instead of the tags being omitted).
  • With the --soloUMIfiltering MultiGeneUMI option, reads with multi-gene UMIs will have UB tag "-" in BAM output.
  • Different --soloUMIdedup counts, if requested, are recorded in separate .mtx files.
  • Cell-filtered Velocyto matrices are generated using Gene cell filtering.
  • Velocyto spliced/unspliced/ambiguous counts are reported in separate .mtx files.
  • Read clipping options --clip* now require specifying the values for all read mates, even if they are identical.
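
Downstream scripts that read the CB and UB tags may need to treat "-" as "undefined" after this change. A minimal sketch of that handling on a SAM-format text line follows; the function name and the example record are made up for illustration.

```python
def barcode_tags(sam_line):
    """Return the CB and UB values from one SAM record.

    A tag that is absent or set to "-" is reported as None.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Optional fields start after the 11 mandatory SAM columns (TAG:TYPE:VALUE).
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    result = {}
    for name in ("CB", "UB"):
        value = tags.get(name)
        result[name] = None if value in (None, "-") else value
    return result


# Made-up record: undefined cell barcode, defined UMI.
line = "read1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:-\tUB:Z:ACGTACGTAC"
print(barcode_tags(line))
```

Mapping both the missing tag and the "-" placeholder to `None` lets the same code read BAM output written before and after this release.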

Bugfixes:

  • Issue #1107: fixed a bug causing a seg-fault for --soloType SmartSeq with only one (pair of) fastq file(s)
  • Issue #1129: fixed an issue with short barcode sequences and --soloBarcodeReadLength 0
  • Issue #796: fixed a problem with GX/GN tag output for the --soloFeatures GeneFull option
  • PR #1012: fixed a bug with the --soloCellFilter TopCells option
  • Fixed an issue that was causing a slightly underestimated value of Q30 'Bases in RNA read' in Solo.out/Gene/Summary.csv