Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help with generation of PSI results #184

Open
chandaayan opened this issue Feb 15, 2024 · 1 comment
Open

Help with generation of PSI results #184

chandaayan opened this issue Feb 15, 2024 · 1 comment

Comments

@chandaayan
Copy link

I have been trying to perform alternative splicing using SUPPA2. Briefly, I have 28 human samples on which RNAseq was performed. From those, I have generated the count files with Kallisto using gencode.v44 indices. Using the same gencode.v44 gtf file, I have generated the different .ioe files for alternate splicing events. Now, my original abundance.tsv file generated by kallisto looked like this:

$ head abundance.tsv
target_id length eff_length est_counts tpm
ENST00000456328.2|ENSG00000290825.1|-|OTTHUMT00000362751.1|DDX11L2-202|DDX11L2|1657|lncRNA| 1657 1658 0 0
ENST00000450305.2|ENSG00000223972.6|OTTHUMG00000000961.2|OTTHUMT00000002844.2|DDX11L1-201|DDX11L1|632|transcribed_unprocessed_pseudogene| 632 633 0 0
ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|WASH7P-201|WASH7P|1351|unprocessed_pseudogene| 1351 1352 0 0
ENST00000619216.1|ENSG00000278267.1|-|-|MIR6859-1-201|MIR6859-1|68|miRNA| 68 69 0 0
ENST00000473358.1|ENSG00000243485.5|OTTHUMG00000000959.2|OTTHUMT00000002840.1|MIR1302-2HG-202|MIR1302-
Now I processed it to look like this:

$ head abundance_preprocessed.tsv
ENST00000456328.2 0.000000
ENST00000450305.2 0.000000
ENST00000488147.1 0.000000
ENST00000619216.1 0.000000
ENST00000473358.1 0.000000
ENST00000469289.1 0.000000
ENST00000607096.1 0.000000
ENST00000417324.1 0.000000
ENST00000461467.1 0.000000
ENST00000606857.1 0.000000
One of the generated ioe files looks like this:

$ head gencode.v44_A3_strict.ioe
seqname gene_id event_id alternative_transcripts total_transcripts
chr1 ENSG00000237491.10 ENSG00000237491.10;A3:chr1:779092-803919:779092-803951:+ ENST00000655765.1,ENST00000670700.1,ENST00000657896.1,ENST00000658648.1,ENST00000656571.1,ENST00000669749.1,ENST00000666217.1,ENST00000665719.1 ENST00000665719.1,ENST00000412115.2,ENST00000666217.1,ENST00000658648.1,ENST00000669749.1,ENST00000657896.1,ENST00000670700.1,ENST00000655765.1,ENST00000656571.1
chr1 ENSG00000237491.10 ENSG00000237491.10;A3:chr1:779092-803919:779092-803922:+ ENST00000655765.1,ENST00000670700.1,ENST00000657896.1,ENST00000658648.1,ENST00000656571.1,ENST00000669749.1,ENST00000666217.1,ENST00000665719.1 ENST00000665719.1,ENST00000443772.2,ENST00000666217.1,ENST00000658648.1,ENST00000669749.1,ENST00000657896.1,ENST00000670700.1,ENST00000655765.1,ENST00000656571.1
chr1 ENSG00000237491.10 ENSG00000237491.10;A3:chr1:807323-809622:807323-809658:+ ENST00000412115.2,ENST00000670700.1,ENST00000657896.1,ENST00000658648.1 ENST00000665719.1,ENST00000658648.1,ENST00000588951.5,ENST00000669749.1,ENST00000657896.1,ENST00000670700.1,ENST00000412115.2,ENST00000656571.1

So now I am trying to generate Psi results using the slurm:
#!/bin/bash
#SBATCH --job-name=suppa_psi_calculation
#SBATCH --output=/home/achanda/%x_%j.out
#SBATCH --error=/home/achanda/%x_%j.err
#SBATCH --time=04:00:00
#SBATCH --cpus-per-task=16
#SBATCH --mem=128G

Load Python or any required modules

module load python3

Directories

SUPPA_DIR=/software/SUPPA-2.3
EVENT_DIR=
/AS_events
PROCESSED_DIR=/Processed # Correct directory for processed files
OUT_DIR=
/psi_results

Create the output directory

mkdir -p ${OUT_DIR}

Iterate over each event type

for EVENT_FILE in ${EVENT_DIR}/*.ioe; do
EVENT_TYPE=$(basename ${EVENT_FILE} .ioe) # Correct extraction of the event type

# Iterate over each sample's processed file in the Processed directory
for PROCESSED_FILE in ${PROCESSED_DIR}/*_abundance_preprocessed.tsv; do
    SAMPLE_NAME=$(basename ${PROCESSED_FILE} _abundance_preprocessed.tsv)  # Correct extraction of the sample name
    OUTPUT_PREFIX="${OUT_DIR}/${SAMPLE_NAME}_${EVENT_TYPE}"

    echo "Starting processing of ${SAMPLE_NAME} for event type ${EVENT_TYPE}..."
    python3 ${SUPPA_DIR}/suppa.py psiPerEvent -i ${EVENT_FILE} -e ${PROCESSED_FILE} -o ${OUTPUT_PREFIX}

    if [ $? -eq 0 ]; then
        echo "Successfully processed ${SAMPLE_NAME} for event type ${EVENT_TYPE}."
    else
        echo "Failed to process ${SAMPLE_NAME} for event type ${EVENT_TYPE}. Check error logs for details."
    fi
done

done

However,

$ head suppa_psi_calculation_28165525.err
INFO:lib.tools:File ~/Sample1_abundance_preprocessed.tsv opened in reading mode.
INFO:psiCalculator:Buffering transcript expression levels.
ERROR:lib.tools:1, in line 2. Skipping line...
ERROR:lib.tools:2, in line 3. Skipping line...
ERROR:lib.tools:3, in line 4. Skipping line...
ERROR:lib.tools:4, in line 5. Skipping line...
ERROR:lib.tools:5, in line 6. Skipping line...
ERROR:lib.tools:6, in line 7. Skipping line...
ERROR:lib.tools:7, in line 8. Skipping line...
ERROR:lib.tools:8, in line 9. Skipping line...

Can someone help with what is going wrong? I can provide further details if needed.

@EduEyras
Copy link
Member

EduEyras commented Mar 3, 2024

Hi Chandaayan,
your abundance file seems to have many zeroes.
Do you get non-zero abundances for your transcripts? That could trigger some of the errors.
E.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants