RPKM_normalization

Update 2020/10/24: You can also use a simplified and faster version of the normalization script from here.

RPKM for RNAseq V1.3

Usage for sample input provided:

perl rpkm_script_beta.pl sample_count_test.count 2:9 28 > sample_count_test.rpkm

Description

In the example above, columns 2 through 9 of 'sample_count_test.count' hold the count data,
and column 28 holds the length of each gene, calculated from the Gencode GTF (see the NOTE below).

General usage:
perl rpkm_script_beta.pl input_count_file.txt ActualColumnStart:ActualColumnEnd ColumnGeneLength > OUTPUT_RPKM_FILE

ActualColumnStart = The first column that contains counts. For example, if GeneID is in the first column and counts start in the second column, this should be '2'

ActualColumnEnd = The last column for which you need RPKM

ColumnGeneLength = The column that contains the length of each gene (**NOTE below)
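
To make the column arguments concrete, here is a minimal Python sketch of how a specification such as 2:9 could be read. It assumes the range is 1-based and inclusive, as the sample usage suggests. The function name parse_column_range is hypothetical, and this is not the code of rpkm_script_beta.pl.

```python
def parse_column_range(spec: str) -> list[int]:
    """Turn a 1-based, inclusive 'start:end' spec into 0-based column indices."""
    start_text, end_text = spec.split(":")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid column range: {spec!r}")
    return list(range(start - 1, end))


# Columns 2 through 9 of the sample file, as 0-based indices.
count_columns = parse_column_range("2:9")
# The gene-length column (28 in the sample) is also given as a 1-based number.
length_column = 28 - 1
```

With the sample arguments, count_columns covers eight columns and length_column points at the 28th field of each row.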

**NOTE: Steps to prepare your input

The length of each gene can be obtained from the Gencode GTF with the following command (successfully tested up to Gencode V19)

awk -F'\t' '$3=="gene" {split($9,a,";"); print a[1]"\t"$5-$4}' gencode.vXX.annotation.gtf | sed 's/gene_id //; s/"//g' > YOUR_GENE_LENGTH_FILE
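
For readers who prefer to see the extraction step spelled out, the following Python sketch does roughly what the command above does: it keeps only 'gene' records, takes the gene_id from the first attribute, and records end minus start. The function name gene_lengths_from_gtf and the example record are made up for illustration and are not part of this repository.

```python
def gene_lengths_from_gtf(lines):
    """Map gene_id -> (end - start) for every 'gene' record in GTF lines."""
    lengths = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip GTF header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        start, end = int(fields[3]), int(fields[4])
        # The first attribute looks like: gene_id "SOME_ID"
        first_attribute = fields[8].split(";")[0].strip()
        key, _, value = first_attribute.partition(" ")
        if key != "gene_id":
            continue
        # Mirrors the command above ($5-$4). GTF coordinates are 1-based and
        # inclusive, so the inclusive span would be end - start + 1.
        lengths[value.strip('"')] = end - start
    return lengths


# Made-up records for illustration only, not real annotation.
example = [
    "##description: example header line\n",
    'chr1\tHAVANA\tgene\t101\t600\t.\t+\t.\tgene_id "GENE_A.1"; gene_type "protein_coding";\n',
    'chr1\tHAVANA\texon\t101\t200\t.\t+\t.\tgene_id "GENE_A.1"; exon_number 1;\n',
]
print(gene_lengths_from_gtf(example))
```

Only the 'gene' record contributes to the result; the exon record and the header line are skipped.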

Combine input_count_file.txt and YOUR_GENE_LENGTH_FILE on GeneID (the first column of both files)

join -j1 <(sort input_count_file.txt) <(sort YOUR_GENE_LENGTH_FILE) > OUTPUT_ANNOTATED_COUNT_FILE
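
The join step can be pictured with a small Python sketch. It performs an inner join on the first field, so genes without a length are dropped and the length ends up as the last column. The function name join_on_first_column and the toy rows are assumptions for illustration. Unlike join, this sketch keeps the order of the count file rather than sorted order.

```python
def join_on_first_column(count_rows, length_rows):
    """Inner-join two lists of already-split rows on their first field."""
    length_by_gene = {row[0]: row[1:] for row in length_rows}
    joined = []
    for row in count_rows:
        gene_id = row[0]
        if gene_id in length_by_gene:  # genes missing a length are dropped
            joined.append(row + length_by_gene[gene_id])
    return joined


# Toy data: two count columns per gene, and lengths for two of the three genes.
counts = [["GENE_A", "10", "20"], ["GENE_B", "0", "5"], ["GENE_C", "7", "7"]]
lengths = [["GENE_A", "499"], ["GENE_C", "1999"]]
for row in join_on_first_column(counts, lengths):
    print("\t".join(row))
```

In this toy case the counts occupy columns 2 to 3 and the gene length is in column 4, so the arguments to the script would be 2:3 and 4.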

Run the script on OUTPUT_ANNOTATED_COUNT_FILE

perl rpkm_script_beta.pl OUTPUT_ANNOTATED_COUNT_FILE ActualColumnStart:ActualColumnEnd ColumnGeneLength > OUTPUT_ANNOTATED_RPKM_FILE

RPKM calculation

RPKM = (10^9 * C)/(N * L), where

C = Number of reads mapped to a gene

N = Total mapped reads in the experiment

L = Gene length in base pairs
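
As a worked example, the Python sketch below applies the standard RPKM definition (reads per kilobase of gene per million mapped reads), which is equivalent to 10^9 * C / (N * L) when L is given in base pairs. The function names are hypothetical, and taking N as the sum of one sample column is an assumption rather than a statement about what rpkm_script_beta.pl does internally.

```python
def rpkm(count, total_mapped_reads, gene_length_bp):
    """Reads per kilobase of gene per million mapped reads."""
    per_million = total_mapped_reads / 1_000_000
    length_kb = gene_length_bp / 1_000
    return count / (per_million * length_kb)


def rpkm_column(counts, gene_lengths_bp):
    """Normalize one sample column; N is assumed to be the column's total count."""
    total = sum(counts)
    return [rpkm(c, total, length) for c, length in zip(counts, gene_lengths_bp)]


# Made-up numbers: 500 reads on a 2,000 bp gene in a sample of 10 million reads.
print(rpkm(500, 10_000_000, 2_000))  # 25.0
```

Dividing by the length in kilobases and the library size in millions keeps the two normalizations visible as separate steps.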

Author: Santhilal Subhash
