Ready for 2.7.9a
alexdobin committed May 4, 2021
1 parent 4b66c5e commit 1ebfe0b
Showing 9 changed files with 60 additions and 9 deletions.
4 changes: 2 additions & 2 deletions CHANGES.md
@@ -1,8 +1,8 @@
STAR 2.7.9a --- 2021/02/20 ::: STARsolo updates
STAR 2.7.9a --- 2021/05/05 ::: STARsolo updates
=====================================================
**Major updates:**
* STARsolo can perform counting of multi-gene (multi-mapping) reads with --soloMultiMappers EM [Uniform Rescue PropUnique] options.
* PR #1163: SIMDe takes care of correct SIMD extensions based on -m g++ flag: compilation option CXXFLAGS_SIMD is preset to -mavx2, but can be set to the desired target architecture. Many thanks to Michael R. Crusoe @mr-c, Evan Nemerson @nemequ and Steffen Möller @smoe!
* PR #1163: [SIMDe](https://github.com/simd-everywhere/simde) takes care of correct SIMD extensions based on -m g++ flag: compilation option CXXFLAGS_SIMD is preset to -mavx2, but can be set to the desired target architecture. Many thanks to Michael R. Crusoe @mr-c, Evan Nemerson @nemequ and Steffen Möller @smoe!

**New options and features:**
* New option: --soloUMIfiltering MultiGeneUMI_All to filter out all UMIs mapping to multiple genes (for uniquely mapping reads)
9 changes: 7 additions & 2 deletions README.md
@@ -1,5 +1,5 @@
STAR 2.7
========
STAR 2.7.9a
==========
Spliced Transcripts Alignment to a Reference
© Alexander Dobin, 2009-2021
https://www.ncbi.nlm.nih.gov/pubmed/23104886
@@ -52,6 +52,11 @@ Compile under Linux
cd STAR/source
make STAR
```
For processors that do not support AVX extensions, specify the target SIMD architecture, e.g.
```
make STAR CXXFLAGS_SIMD=sse
```


Compile under Mac OS X
----------------------
10 changes: 10 additions & 0 deletions RELEASEnotes.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,13 @@
STAR 2.7.9a --- 2021/05/05 ::: STARsolo updates
=====================================================

* [**Counting *multi-gene* (multimapping) reads**](#multi-gene-reads)
* STARsolo uses the [SIMDe](https://github.com/simd-everywhere/simde) package, which supports different types of SIMD extensions. For processors that do not support AVX extensions, specify the target SIMD architecture, e.g.
```
make STAR CXXFLAGS_SIMD=sse
```


STAR 2.7.8a --- 2021/02/20
===========================
**Major STARsolo updates and many bug fixes**
Binary file modified bin/Linux_x86_64/STAR
Binary file not shown.
Binary file modified bin/Linux_x86_64/STARlong
Binary file not shown.
Binary file modified bin/Linux_x86_64_static/STAR
Binary file not shown.
Binary file modified bin/Linux_x86_64_static/STARlong
Binary file not shown.
39 changes: 37 additions & 2 deletions docs/STARsolo.md
@@ -1,6 +1,10 @@
**STARsolo**: mapping, demultiplexing and quantification for single cell RNA-seq
=================================================================================

Major update in STAR 2.7.9a (2021/05/05)
-----------------------------------------
* [**Counting *multi-gene* (multimapping) reads**](#multi-gene-reads)

Major updates in STAR 2.7.8a (2021/02/20)
-----------------------------------------
* [**Cell calling (filtering) similar to CellRanger:**](#emptydrop-like-filtering)
@@ -181,7 +185,7 @@ int>0: maximum number of mismatches allowed in adapter sequence

--------------------------------------
Cell filtering (calling)
--------------------
--------------------------------------
In addition to raw, unfiltered output of gene/cell counts, STARsolo performs cell filtering (a.k.a. cell calling), which aims to select a subset of cells that are likely to be "real" cells as opposed to empty droplets (containing ambient RNA).
Two types of filtering are presently implemented: simple (knee-like) and advanced EmptyDrop-like. The selected filtering is also used to produce summary statistics for filtered cells in the Summary.csv file, which is similar to CellRanger's summary and is useful for Quality Control.

@@ -230,9 +234,40 @@ Quantification of different transcriptomic features
--soloFeatures Gene GeneFull SJ Velocyto
```

------------------------------------------------------
Multi-gene reads
------------------------------------------------------
Multi-gene reads are concordant with (i.e. align equally well to) transcripts of two or more genes. One class of multi-gene reads consists of those that map uniquely to a genomic region where two or more genes overlap. Another class consists of reads that map to multiple loci in the genome, with each locus annotated to a different gene.

Including multi-gene reads allows for more accurate gene quantification and, more importantly, enables detection of gene expression from certain classes of genes that are supported only by multi-gene reads, such as overlapping genes and highly similar paralog families.

The multi-gene read recovery options are specified with ```--soloMultiMappers```. Several algorithms are implemented:

```
--soloMultiMappers Uniform
```
uniformly distributes each multi-gene UMI to all genes in its gene set. Each gene gets a fractional count of 1/N_genes, where N_genes is the number of genes in the set. This is the simplest possible option, and it offers higher sensitivity for gene detection at the expense of lower precision.
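
The uniform rule above can be illustrated with a minimal Python sketch. It is not STAR's implementation: the input representation (one tuple of gene names per multi-gene UMI) and the function name `uniform_counts` are assumptions made for illustration, and only the multi-gene contribution to each gene is computed.

```python
from collections import defaultdict


def uniform_counts(multi_gene_umis):
    """Split each multi-gene UMI equally among the genes in its gene set.

    multi_gene_umis: iterable of tuples of gene names (hypothetical format),
    one tuple per multi-gene UMI.
    Returns a dict mapping gene -> fractional multi-gene count.
    """
    counts = defaultdict(float)
    for gene_set in multi_gene_umis:
        share = 1.0 / len(gene_set)  # 1/N_genes for this UMI
        for gene in gene_set:
            counts[gene] += share
    return dict(counts)


if __name__ == "__main__":
    # One UMI shared by two genes, another shared by four genes.
    print(uniform_counts([("A", "B"), ("A", "B", "C", "D")]))
```

In this toy input, genes A and B each receive 1/2 + 1/4, while C and D each receive 1/4.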

```
--soloMultiMappers PropUnique
```
distributes the multi-gene UMIs proportionally to the number of unique UMIs per gene. UMIs that map to genes that are not supported by unique UMIs are distributed uniformly.
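
A minimal Python sketch of this proportional rule follows. It is an illustration under stated assumptions, not STAR's code: the names `prop_unique_counts` and `unique_counts` are hypothetical, each multi-gene UMI is represented as a tuple of gene names, and the uniform fallback is assumed to apply only when no gene in the set has any unique UMIs.

```python
from collections import defaultdict


def prop_unique_counts(multi_gene_umis, unique_counts):
    """Distribute each multi-gene UMI in proportion to unique-gene UMI counts.

    multi_gene_umis: iterable of tuples of gene names (hypothetical format).
    unique_counts: dict mapping gene -> number of unique-gene UMIs.
    Returns a dict mapping gene -> fractional multi-gene count.
    """
    counts = defaultdict(float)
    for gene_set in multi_gene_umis:
        total_unique = sum(unique_counts.get(g, 0) for g in gene_set)
        for gene in gene_set:
            if total_unique > 0:
                counts[gene] += unique_counts.get(gene, 0) / total_unique
            else:
                # Assumed fallback: no unique support in this set, so split evenly.
                counts[gene] += 1.0 / len(gene_set)
    return dict(counts)


if __name__ == "__main__":
    unique = {"A": 3, "B": 1}
    print(prop_unique_counts([("A", "B"), ("C", "D")], unique))
```

Here the UMI shared by A and B is split 3:1, while the UMI shared by C and D, which have no unique UMIs, is split evenly.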

```
--soloMultiMappers EM
```
uses Maximum Likelihood Estimation (MLE) to distribute multi-gene UMIs among their genes, taking into account other UMIs (both unique- and multi-gene) from the same cell (i.e. with the same CB). The Expectation-Maximization (EM) algorithm is used to find the gene expression values that maximize the likelihood function. Recovering multi-gene reads via an MLE-EM model was previously used to quantify transposable elements in bulk RNA-seq {[TEtranscripts](https://doi.org/10.1093/bioinformatics/btv422)} and in scRNA-seq {[Alevin](https://doi.org/10.1186/s13059-019-1670-y); [Kallisto-bustools](http://www.nature.com/articles/s41587-021-00870-2)}.
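
The EM idea can be sketched for a single cell as below. This is a generic textbook-style EM written for illustration, not STAR's implementation: the initialization (unique counts plus uniform shares), the stopping rule, and the names `em_counts`, `n_iter`, and `tol` are all assumptions, since those details are not specified here.

```python
def em_counts(multi_gene_umis, unique_counts, n_iter=100, tol=1e-6):
    """Estimate per-gene UMI counts (unique + multi-gene) for one cell via EM.

    multi_gene_umis: list of tuples of gene names (hypothetical format).
    unique_counts: dict mapping gene -> number of unique-gene UMIs.
    Returns a dict mapping gene -> expected total UMI count.
    """
    genes = set(unique_counts)
    for gene_set in multi_gene_umis:
        genes.update(gene_set)

    # Assumed starting point: unique counts plus uniform multi-gene shares.
    theta = {g: float(unique_counts.get(g, 0)) for g in genes}
    for gene_set in multi_gene_umis:
        for g in gene_set:
            theta[g] += 1.0 / len(gene_set)

    for _ in range(n_iter):
        # E-step: split each multi-gene UMI in proportion to current estimates.
        # M-step: new estimate = unique counts + expected multi-gene shares.
        new_theta = {g: float(unique_counts.get(g, 0)) for g in genes}
        for gene_set in multi_gene_umis:
            total = sum(theta[g] for g in gene_set)
            for g in gene_set:
                new_theta[g] += theta[g] / total
        change = max(abs(new_theta[g] - theta[g]) for g in genes)
        theta = new_theta
        if change < tol:
            break
    return theta


if __name__ == "__main__":
    print(em_counts([("A", "B")], {"A": 3}))
```

In this toy case gene A has three unique UMIs and gene B has none, so the iterations push nearly all of the shared UMI toward A while the total count stays at four.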

```
--soloMultiMappers Rescue
```
distributes multi-gene UMIs to their gene set proportionally to the sum of the number of unique-gene UMIs and uniformly distributed multi-gene UMIs in each gene ([Mortazavi et al.](https://www.nature.com/articles/nmeth.1226)). It can be thought of as the first step of the EM algorithm.
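
The rescue rule above can be sketched in Python as a two-pass computation. This is an illustrative reading of the description, not STAR's code: the function name `rescue_counts` and the tuple-per-UMI input format are assumptions.

```python
from collections import defaultdict


def rescue_counts(multi_gene_umis, unique_counts):
    """Distribute multi-gene UMIs using unique counts plus uniform shares as weights.

    multi_gene_umis: list of tuples of gene names (hypothetical format).
    unique_counts: dict mapping gene -> number of unique-gene UMIs.
    Returns a dict mapping gene -> fractional multi-gene count.
    """
    # Pass 1: per-gene weight = unique-gene UMIs + uniformly distributed shares.
    weights = defaultdict(float)
    for gene, n in unique_counts.items():
        weights[gene] += n
    for gene_set in multi_gene_umis:
        for gene in gene_set:
            weights[gene] += 1.0 / len(gene_set)

    # Pass 2: split each multi-gene UMI in proportion to those weights.
    counts = defaultdict(float)
    for gene_set in multi_gene_umis:
        total = sum(weights[g] for g in gene_set)
        for gene in gene_set:
            counts[gene] += weights[gene] / total
    return dict(counts)


if __name__ == "__main__":
    print(rescue_counts([("A", "B")], {"A": 3}))
```

With three unique UMIs on gene A and one UMI shared by A and B, the weights are 3.5 and 0.5, so the shared UMI is split 7:1.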

Any combination of these options can be specified, and the different multi-gene flavors will be output into different files. The unique-gene UMI counts are output into the *matrix.mtx* file in the *raw/Gene* directory, while the sum of unique+multi-gene UMI counts will be output into *UniqueAndMult-EM.mtx, UniqueAndMult-PropUnique.mtx, UniqueAndMult-Rescue.mtx, UniqueAndMult-Uniform.mtx* files.

--------------------------------------
BAM tags
-----------------
--------------------------------------
* To output BAM tags into SAM/BAM file, add them to the list of standard tags in
```
--outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM
7 changes: 4 additions & 3 deletions source/parametersDefault.xxd
@@ -1246,8 +1246,9 @@ unsigned char parametersDefault[] = {
0x43, 0x6f, 0x6f, 0x72, 0x64, 0x69, 0x6e, 0x61, 0x74, 0x65, 0x20, 0x2e,
0x2e, 0x2e, 0x20, 0x61, 0x6c, 0x69, 0x67, 0x6e, 0x6d, 0x65, 0x6e, 0x74,
0x73, 0x20, 0x69, 0x6e, 0x20, 0x42, 0x41, 0x4d, 0x20, 0x66, 0x6f, 0x72,
0x6d, 0x61, 0x74, 0x2c, 0x20, 0x75, 0x6e, 0x73, 0x6f, 0x72, 0x74, 0x65,
0x64, 0x2e, 0x20, 0x52, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x73, 0x20,
0x6d, 0x61, 0x74, 0x2c, 0x20, 0x73, 0x6f, 0x72, 0x74, 0x65, 0x64, 0x20,
0x62, 0x79, 0x20, 0x63, 0x6f, 0x6f, 0x72, 0x64, 0x69, 0x6e, 0x61, 0x74,
0x65, 0x2e, 0x20, 0x52, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x73, 0x20,
0x2d, 0x2d, 0x6f, 0x75, 0x74, 0x53, 0x41, 0x4d, 0x74, 0x79, 0x70, 0x65,
0x20, 0x42, 0x41, 0x4d, 0x20, 0x53, 0x6f, 0x72, 0x74, 0x65, 0x64, 0x42,
0x79, 0x43, 0x6f, 0x6f, 0x72, 0x64, 0x69, 0x6e, 0x61, 0x74, 0x65, 0x0a,
@@ -4469,4 +4470,4 @@ unsigned char parametersDefault[] = {
0x65, 0x76, 0x65, 0x6c, 0x6f, 0x70, 0x6d, 0x65, 0x6e, 0x74, 0x5f, 0x65,
0x6e, 0x64, 0x0a
};
unsigned int parametersDefault_len = 53619;
unsigned int parametersDefault_len = 53631;
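
As a reading aid for the byte-level hunk above, the changed bytes can be decoded back into the help text they encode. The Python sketch below is only illustrative; the helper name `decode_xxd` is hypothetical, and the byte strings are copied from the removed and added lines of this diff.

```python
def decode_xxd(fragment: str) -> str:
    """Decode a comma-separated list of 0x.. byte literals into ASCII text."""
    tokens = [t.strip() for t in fragment.replace("\n", " ").split(",")]
    return bytes(int(t, 16) for t in tokens if t).decode("ascii")


# Bytes from the two removed lines of the first parametersDefault.xxd hunk.
OLD_BYTES = """
0x6d, 0x61, 0x74, 0x2c, 0x20, 0x75, 0x6e, 0x73, 0x6f, 0x72, 0x74, 0x65,
0x64, 0x2e, 0x20, 0x52, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x73, 0x20,
"""

# Bytes from the three added lines of the same hunk.
NEW_BYTES = """
0x6d, 0x61, 0x74, 0x2c, 0x20, 0x73, 0x6f, 0x72, 0x74, 0x65, 0x64, 0x20,
0x62, 0x79, 0x20, 0x63, 0x6f, 0x6f, 0x72, 0x64, 0x69, 0x6e, 0x61, 0x74,
0x65, 0x2e, 0x20, 0x52, 0x65, 0x71, 0x75, 0x69, 0x72, 0x65, 0x73, 0x20,
"""

if __name__ == "__main__":
    old_text = decode_xxd(OLD_BYTES)
    new_text = decode_xxd(NEW_BYTES)
    print(repr(old_text))
    print(repr(new_text))
    print(len(new_text) - len(old_text))
```

The old fragment decodes to "mat, unsorted. Requires " and the new one to "mat, sorted by coordinate. Requires ", so the hunk corrects a help string that described coordinate-sorted BAM output as unsorted. The 12-byte difference matches the change in `parametersDefault_len` from 53619 to 53631.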
