Releases: alexdobin/STAR
Alpha release: bug fix
Fixed another seg-fault issue introduced in 2.7.10a
Alpha release: bug fix
- Issue #1469: fixed seg-fault introduced in 2.7.10a.
STAR 2.7.10a --- 2022/01/14 ::: New features, behavior changes and bug fixes
New options and features:
- Implemented --soloCellReadStats Standard option to output read statistics for each cell barcode.
- Allowed --clip5pAdapterSeq to be defined with the --clipAdapterType CellRanger4 option.
- Implemented --soloCBmatchWLtype ED2 to allow mismatches and one insertion+deletion (edit distance <=2) for --soloType CB_UMI_Complex.
- Implemented Solo BAM tags gx gn: output ';'-separated gene IDs and names for both unique- and multi-gene reads. Note that GX/GN tags are used to output gene ID/name for unique-gene reads.
- Implemented --soloFeatures GeneFull_ExonOverIntron GeneFull_Ex50pAS options which prioritize exonic over intronic overlaps for pre-mRNA counting.
- Added script extras/scripts/soloCountMatrixFromBAM.awk to re-create Solo count matrix from the BAM output.
Changes in behavior:
- Changed --soloType CB_samTagOut behavior: if the barcode cannot be matched to the passlist, CB:Z:- will be recorded (previously the CB tag was absent for such reads).
- Changed Solo summary statistics outputs in Barcodes.stats and Features.stats files.
- Changed Solo BAM tags GX GN behavior: for missing values, "-" is output instead of omitting the tag.
- Changed Solo BAM tags output for multiple --soloFeatures: now the first feature on the list is used for GX,GN,XB,UB tags.
- Changed Solo SJ behavior: it no longer depends on whether the alignment is concordant to a Gene.
- Fixed a bug that resulted in slightly different solo counts if --soloFeatures Gene and GeneFull were used together with --soloCBmatchWLtype 1MM_multi_pseudocounts option.
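The Solo BAM tag conventions listed above (';'-separated gx/gn values, "-" for a missing GX, GN or CB value) can be handled downstream along these lines. This is a minimal sketch that works on SAM text records (for example from samtools view); the function name parse_solo_tags and the example record, including its gene IDs and names, are invented for illustration.

```python
def parse_solo_tags(sam_line, wanted=("CB", "UB", "GX", "GN", "gx", "gn")):
    """Extract selected optional tags from one tab-separated SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the 11 mandatory SAM columns come first
        tag, _sam_type, value = field.split(":", 2)
        if tag not in wanted:
            continue
        if value == "-":
            tags[tag] = None              # "-" marks a missing value
        elif tag in ("gx", "gn"):
            tags[tag] = value.split(";")  # possibly several genes
        else:
            tags[tag] = value
    return tags


# Invented record for illustration; gene IDs and names are placeholders.
example = "\t".join([
    "read1", "0", "chr1", "100", "255", "4M", "*", "0", "0", "ACGT", "IIII",
    "NH:i:1", "CB:Z:-", "GX:Z:-", "gx:Z:GENE_ID_1;GENE_ID_2", "gn:Z:GeneA;GeneB",
])
tags = parse_solo_tags(example)
print(tags)
```

Mapping "-" to None keeps reads with an unmatched barcode or without a unique gene distinguishable from reads that simply lack the tag.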
Bug fixes
- PR #1425: Assign supplementary alignment to correct mate when mates fully overlap. Many thanks to Sebastian @suhrig for resolving this problem in the chimeric detection.
- Fixed a bug introduced in 2.7.9a for --quantMode TranscriptomeSAM output that resulted in both mapped and unmapped output for some reads. Many thanks to Diane Trout (@caltech) for helping to track this bug.
- Issue #1223: fixed the N_unmapped value reported in ReadsPerGene.out.tab. The single-end (i.e. partially mapped) alignments are not excluded from N_unmapped.
- Issues #535, #1350: fixed a long-standing problem that resulted in a seg-fault when mapping to the rabbit genome.
- Issue #1316: fixed the seg-fault which occurred if --soloType CB_samTagOut and --soloCBwhitelist None are used together.
- Issue #1177: throw an error in case the BAM file does not contain NH and AS tags for duplication removal jobs (--runMode inputAlignmentsFromBAM --bamRemoveDuplicatesType UniqueIdenticalNotMulti).
- Issue #1262: fixed the bug that prevented EM matrix output when only EM option is specified in --soloMultiMappers.
- Issue #1230: fixed the bug that caused seg-faults for --runMode soloCellFiltering runs.
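To check the N_unmapped value mentioned under Issue #1223, the summary rows of ReadsPerGene.out.tab can be separated from the per-gene rows. The sketch below assumes the layout described in the STAR manual: four tab-separated columns (name, unstranded counts, counts for each of the two strand options), with the first rows named N_unmapped, N_multimapping, N_noFeature and N_ambiguous. The function name and the sample numbers are invented.

```python
SUMMARY_ROWS = ("N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous")


def read_reads_per_gene(lines):
    """Split ReadsPerGene.out.tab rows into summary rows and gene rows."""
    summary, genes = {}, {}
    for line in lines:
        if not line.strip():
            continue
        name, *counts = line.rstrip("\n").split("\t")
        counts = tuple(int(c) for c in counts)
        target = summary if name in SUMMARY_ROWS else genes
        target[name] = counts
    return summary, genes


# Invented numbers, for illustration only.
sample = [
    "N_unmapped\t120\t120\t120\n",
    "N_multimapping\t40\t40\t40\n",
    "N_noFeature\t15\t300\t18\n",
    "N_ambiguous\t7\t1\t3\n",
    "GENE_ID_1\t250\t2\t248\n",
]
summary, genes = read_reads_per_gene(sample)
print(summary["N_unmapped"], genes["GENE_ID_1"])
```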
STAR 2.7.9a --- 2021/05/05 ::: STARsolo: multi-gene reads
Major updates:
- STARsolo can perform counting of multi-gene (multi-mapping) reads with --soloMultiMappers EM [Uniform Rescue PropUnique] options.
- PR #1163: SIMDe takes care of correct SIMD extensions based on -m g++ flag: compilation option CXXFLAGS_SIMD is preset to -mavx2, but can be set to the desired target architecture. Many thanks to Michael R. Crusoe @mr-c, Evan Nemerson @nemequ and Steffen Möller @smoe!
New options and features:
- New option: --soloUMIfiltering MultiGeneUMI_All to filter out all UMIs mapping to multiple genes (for uniquely mapping reads)
- New script extras/scripts/calcUMIperCell.awk to calculate total number of UMIs per cell and filtering status from STARsolo matrix.mtx
- New option: --outSJtype None to omit outputting splice junctions to SJ.out.tab
- Simple script to convert BED spliced junctions (SJ.out.tab) to BED12 for UCSC display: extras/scripts/sjBED12.awk
- PR #1164: SOURCE_DATE_EPOCH to make the build more reproducible
- PR #1157: print STAR command line and version information to stdout
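The per-cell UMI totals that extras/scripts/calcUMIperCell.awk derives from matrix.mtx can be approximated as follows. This is not a port of that script (which also reports filtering status); it is a minimal sketch assuming a Matrix Market coordinate file with three columns per entry, where rows are features and columns are cell barcodes. The function name umis_per_cell and the sample matrix are invented.

```python
from collections import Counter


def umis_per_cell(mtx_lines):
    """Sum counts per column (cell barcode) of a coordinate-format .mtx file."""
    rows = (ln.split() for ln in mtx_lines
            if ln.strip() and not ln.startswith("%"))
    # First non-comment line: number of features, cells and stored entries.
    n_features, n_cells, n_entries = (int(x) for x in next(rows))
    totals = Counter()
    for _feature_idx, cell_idx, count in rows:
        # float() also accepts fractional counts from multi-gene matrices.
        totals[int(cell_idx)] += float(count)
    # Matrix Market indices are 1-based; cells without entries get 0.
    return [totals.get(i, 0.0) for i in range(1, n_cells + 1)]


sample_mtx = """%%MatrixMarket matrix coordinate integer general
%
3 2 4
1 1 5
2 1 1
3 2 2
1 2 7
""".splitlines()

per_cell = umis_per_cell(sample_mtx)
print(per_cell)
```

The position of each total corresponds to the line number of the barcode in the accompanying barcodes.tsv.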
Changes in behavior:
- Minor changes to statistics output (Features.csv and Summary.csv) to accommodate multimappers.
- Modified option: --limitIObufferSize now requires two numbers: separate sizes for input and output buffers.
Bug fixes
- PR #1156: clean opal/opal.o
- Issue #1166: seg-fault for STARsolo --soloCBwhitelist None (no whitelist) with barcodes longer than 16b
- Issue #1167: STARsolo CR/UR SAM tags are scrambled in TranscriptomeSAM file Aligned.toTranscriptome.out.bam. This bug appeared in 2.7.7a.
- Issue #1177: Added file checks for the --inputBAMfile.
- Issue #1180: Output the actual number of alignments in NH attributes even if --outSAMmultNmax is set to a smaller value.
- Issue #1190: Allow GX/GN output for non-STARsolo runs.
- Issue #1220: corrupt SAM/BAM files for --outFilterType BySJout. The bug was introduced with the changes in 2.7.7a.
- Issue #1211: scrambled CB tags in BAM output for --soloCBwhitelist None --soloFeatures Gene GeneFull.
- Fixed a bug causing seg-faults with --clipAdapterType CellRanger4 option.
STAR 2.7.8a --- 2021/02/20 ::: Major STARsolo updates
This release contains many major and minor STARsolo upgrades, bug fixes, and behavior changes.
STARsolo detailed description: https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md
Major new features:
- --runMode soloCellFiltering option for cell filtering (calling) of the raw count matrix, without re-mapping.
- Input from SAM/BAM for STARsolo, with options --soloInputSAMattrBarcodeSeq and --soloInputSAMattrBarcodeQual to specify SAM tags for the barcode read sequence and qualities.
- --clipAdapterType CellRanger4 option for 5' TSO adapter and 3' polyA-tail clipping of the reads to better match CellRanger >= 4.0.0 mapping results.
- --soloBarcodeMate to support scRNA-seq protocols in which one of the paired-end mates contains both barcode sequence and cDNA (e.g. 10X 5' protocol).
New options:
- --soloCellFilter EmptyDrops_CR option for cell filtering (calling) nearly identical to that of CellRanger 3 and 4.
- --readFilesSAMattrKeep to specify which SAM attributes from the input SAM to keep in the output.
- --soloUMIdedup 1MM_Directional_UMItools option matching the "directional" method in UMI-tools, Smith, Heger and Sudbery (Genome Research 2017).
- --soloUMIdedup NoDedup option for counting reads per gene, i.e. no UMI deduplication.
- --soloUMIdedup 1MM_CR option for 1 mismatch UMI deduplication similar to CellRanger >= 3.0.
- --soloUMIfiltering MultiGeneUMI_CR option filters lower-count UMIs that map to more than one gene, matching CellRanger >= 3.0.
- --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts option which allows 1MM multimatching to WL for barcodes with N-bases (to better match CellRanger >= 3.0).
Changes in behavior:
- The UMI deduplication/correction specified in --soloUMIdedup is used for statistics output, filtering, and UB tag in BAM output.
- If UMI or CB are not defined, the UB and CB tags in BAM output will contain "-" (instead of missing these tags).
- For --soloUMIfiltering MultiGeneUMI option, the reads with multi-gene UMIs will have UB tag "-" in BAM output.
- Different --soloUMIdedup counts, if requested, are recorded in separate .mtx files.
- Cell-filtered Velocyto matrices are generated using Gene cell filtering.
- Velocyto spliced/unspliced/ambiguous counts are reported in separate .mtx files.
- Read clipping options --clip* now require specifying the values for all read mates, even if they are identical.
Bugfixes:
- Issue #1107: fixed a bug causing seg-fault for --soloType SmartSeq with only one (pair of) fastq file(s).
- Issue #1129: fixed an issue with short barcode sequences and --soloBarcodeReadLength 0.
- Issue #796: fixed a problem with GX/GN tag output for --soloFeatures GeneFull option.
- PR #1012: fixed the bug with --soloCellFilter TopCells option.
- Fixed an issue that was causing slightly underestimated value of Q30 'Bases in RNA read' in Solo.out/Gene/Summary.csv.
STAR 2.7.7a --- 2020/12/28 ::: STARconsensus
Major new feature: STARconsensus: mapping RNA-seq reads to consensus genome.
- Insert (consensus) variants from a VCF file into the reference genome at the genome generation step with --genomeTransformVCF Variants.vcf --genomeTransformType Haploid
- Map to the transformed genome. Alignments (SAM/BAM) and spliced junctions (SJ.out.tab) can be transformed back to the original (reference) coordinates with --genomeTransformOutput SAM and/or SJ
- More information: https://github.com/alexdobin/STAR/tree/master/docs/STARconsensus.md
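The two STARconsensus steps above can be sketched as a pair of command lines. The snippet only assembles and prints the commands; it does not run STAR. All paths are hypothetical placeholders, and the remaining flags (--runMode genomeGenerate, --genomeDir, --genomeFastaFiles, --readFilesIn) are the usual STAR options for genome generation and mapping. Further parameters a real run would need (annotations, threads, output prefix) are left out.

```python
import shlex


def consensus_commands(fasta, vcf, genome_dir, reads):
    """Build the genome-generation and mapping commands for STARconsensus."""
    generate = [
        "STAR", "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        # Insert the VCF variants into the reference at generation time.
        "--genomeTransformVCF", vcf,
        "--genomeTransformType", "Haploid",
    ]
    align = [
        "STAR", "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        # Report alignments and junctions in original reference coordinates.
        "--genomeTransformOutput", "SAM", "SJ",
    ]
    return generate, align


# Hypothetical file names, for illustration only.
generate_cmd, align_cmd = consensus_commands(
    "genome.fa", "Variants.vcf", "genome_consensus", ["reads_1.fastq", "reads_2.fastq"]
)
print(shlex.join(generate_cmd))
print(shlex.join(align_cmd))
```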
Minor bug fixes:
- Deprecated --genomeConsensusFile option. Please use --genomeTransformVCF and --genomeTransformType options instead.
- Issue #1040: fixed a bug causing rare seg-faults for paired-end --soloType SmartSeq runs.
- Issue #1071: fixed a bug that can cause a crash for STARsolo runs with a small number of cells.
STAR 2.7.6a --- 2020/09/19
Major new feature:
Output multimapping chimeric alignments in BAM format using
--chimMultimapNmax N>1 --chimOutType WithinBAM --outSAMtype BAM Unsorted [and/or] SortedByCoordinate
Many thanks to Sebastian @suhrig who implemented this feature!
A more detailed description from Sebastian in PR #802.
Minor features and bug fixes:
- Issue #1008: fixed the problem with Unmapped.out.mate? output for --soloType CB_samTagOut.
- PR #1012: fixed the bug with --soloCellFilter TopCells option.
- Issue #786: fixed the bug causing the Different SJ motifs problem for overlapping mates.
- Issue #945: GX/GN can be output for all --soloType, as well as for non-solo runs.
STAR 2.7.5c --- 2020/08/16
Bug-fix release.
- Issue #988: proceed reading from GTF after a warning that exon end is past chromosome end.
- Issue #978: fixed corrupted transcriptInfo.tab in genome generation for cases where GTF file contains extra chromosomes not present in FASTA files.
- Issue #945: output GX/GN for --soloFeatures GeneFull.
- Implemented removal of control characters from the ends of input read lines, for compatibility with files pre-processed on Windows.
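The control-character clean-up in the last item can be illustrated as below. The notes do not say exactly which characters STAR removes, so this sketch assumes ASCII codes below 32 (which covers the carriage return left by Windows line endings). It is not STAR's implementation, and the function name is invented.

```python
def strip_trailing_control(line: str) -> str:
    """Drop ASCII control characters (code < 32) from the end of a line."""
    end = len(line)
    while end > 0 and ord(line[end - 1]) < 32:
        end -= 1
    return line[:end]


# A FASTQ sequence line saved with Windows (CRLF) line endings.
cleaned = strip_trailing_control("ACGTTGCA\r\n")
print(repr(cleaned))
```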
STAR 2.7.5b --- 2020/08/01
Bug-fix release.
- Issue #558: Fixed a bug that can cause a seg-fault in STARsolo run with paired-end reads that have protruding ends.
- Issue #952: Increased the maximum allowed length of the SAM tags in the input SAM files.
- Issue #955: fixed seg-fault-causing bug for --soloFeatures SJ option.
- Issue #963: When reading GTF file, skip any exons that extend past the end of the chromosome, and give a warning.
- Issue #965: output genome sizes with and without padding into Log.out.
- Docker build: switched to debian:stable-slim in the Dockerfile.
- --soloType CB_samTagOut now allows output of (uncorrected) UMI sequences and quality scores with SAM tags UR and UY.
- Throw an error if FIFO file cannot be created on non-Linux partitions.
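The GTF handling described under Issue #963 (skip exons that extend past the chromosome end, with a warning) can be reproduced as a pre-check on an annotation file. This is a hedged sketch, not STAR's code: the function name filter_exons, the warning text and the chromosome lengths are invented, and it relies only on the standard GTF layout (column 1 chromosome, column 3 feature type, columns 4 and 5 the 1-based start and end).

```python
def filter_exons(gtf_lines, chrom_lengths):
    """Return (kept_lines, warnings), dropping exons past the chromosome end."""
    kept, warnings = [], []
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        is_exon = (not line.startswith("#")
                   and len(cols) >= 5 and cols[2] == "exon")
        # Chromosomes with unknown length are passed through unchanged.
        if is_exon and cols[0] in chrom_lengths:
            limit = chrom_lengths[cols[0]]
            if int(cols[4]) > limit:
                warnings.append(
                    f"exon {cols[0]}:{cols[3]}-{cols[4]} extends past "
                    f"chromosome end ({limit}); skipped"
                )
                continue
        kept.append(line)
    return kept, warnings


# Invented annotation lines and chromosome length.
gtf = [
    "#!comment\n",
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n',
    'chr1\tsrc\texon\t900\t1100\t.\t+\t.\tgene_id "g2";\n',
]
kept, warnings = filter_exons(gtf, {"chr1": 1000})
print(len(kept), warnings)
```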
STAR 2.7.5a --- 2020/06/16
Major new features:
- Implemented STARsolo quantification for Smart-seq with --soloType SmartSeq option.
- Implemented --readFilesManifest option to input a list of input read files.
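A manifest for --readFilesManifest can be generated with a few lines of code. The column layout used here (read 1 file, read 2 file or "-" for single-end reads, read group, separated by tabs) is an assumption based on the STAR manual and should be checked against the documentation for the installed version. File names, read-group labels and the function name are hypothetical.

```python
def manifest_lines(samples):
    """Yield one tab-separated manifest line per (read1, read2, read_group)."""
    for read1, read2, read_group in samples:
        # Single-end samples carry "-" in place of the second file name.
        yield "\t".join([read1, read2 or "-", read_group])


# Hypothetical samples: one paired-end, one single-end.
samples = [
    ("cellA_1.fastq", "cellA_2.fastq", "cellA"),
    ("cellB.fastq", None, "cellB"),
]
manifest = list(manifest_lines(samples))
print("\n".join(manifest))
```

Writing these lines to a file and passing its path to --readFilesManifest replaces a long --readFilesIn list, which is convenient for Smart-seq runs with one file (pair) per cell.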
Minor features and bug fixes:
- Change in STARsolo SJ output behavior: junctions are output even if reads do not match genes.
- Fixed a bug with solo SJ output for large genomes.
- N-characters in --soloAdapterSequence are not counted as mismatches, allowing for multiple adapters (e.g. ddSeq).
- SJ.out.tab is sym-linked as features.tsv for Solo SJ output.
- Issue #882: 3rd field is now optional in Solo Gene features.tsv with --soloOutFormatFeaturesGeneField3.
- Issue #883: Patch for FreeBSD in SharedMemory and Makefile improvements.
- Issue #902: Fixed seg-fault for STARsolo CB/UB SAM attributes output with --soloFeatures GeneFull --outSAMunmapped Within options.
- Issue #934: Fixed a problem with annotated junctions that was causing very rare seg-faults.
- Issue #936: Throw an error if an empty whitelist is provided to STARsolo.