GitHub - GoekeLab/sg-nex-data: Nanopore RNA-Seq data from the Singapore Nanopore-Expression Project

The SG-NEx project is an international collaboration initiated at the Genome Institute of Singapore to provide reference transcriptomes for 5 of the most commonly used cancer cell lines using Nanopore long read RNA-Seq data:

Transcriptome profiling is done using PCR-cDNA sequencing ("PCR-cDNA"), amplification-free cDNA sequencing ("direct cDNA"), direct sequencing of native RNA (“direct RNA”), and short read RNA-Seq. All samples are sequenced with at least 3 high quality replicates. For a subset of samples spike-in RNAs are included and matched m6A profiling is available.

The raw, aligned, and processed data is hosted on the AWS open data registry (see below for data access and analysis tutorial).

Sign up for data release notifications and updates

You can sign up for the sg-nex-updates email list to receive notifications about upcoming data releases:

https://groups.google.com/forum/#!forum/sg-nex-updates/join

Data Release and Access

Latest Release (v0.6)

This release includes 113 samples from 13 different cell lines.

Data Access

You can access the following data through the AWS Open Data Registry:

raw files (fast5)
raw files (blow5)
basecalled files (fastq)
aligned reads (genome and transcriptome) (bam)
tracks for visualisation (bigwig and bigbed)
processed data for differential RNA modification analysis (json, for use with xPore)
processed data for identification of m6A (json, for use with m6Anet)
annotation files
detailed sample and experiment information

You can browse the S3 data here: 1) fast5, fastq, and bam and 2) blow5.

Please refer to the data access tutorial which describes the S3 data structure and how to access files with AWS CLI. The direct links to the data are listed in the sample spreadsheet.

Here are the locations for the spike-in concentrations used in SG-NEx samples:

Citation: Please cite the pre-print describing the SG-NEx data resource when using these data, and add the following details: "The SG-NEx data was accessed on [DATE] at registry.opendata.aws/sg-nex-data".

Chen, Y. et al. "A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines." bioRxiv (2021). doi: https://doi.org/10.1101/2021.04.21.440736

Release Note & Updates

Version Number: V0.6.0
Date: 2024-11-21
Replacement of fastq and bam files

fastq files basecalled from fast5 converted blow5 files using Guppy 6.4.2
bam files using updated fastq files with Minimap2-2.22

Added in code for SG-NEx manuscript

Version Number: V0.5.1
Date: 2024-04-15 Release of new sample

new RNA004 sample of Hek293T (SGNex_Hek293T_directRNA_replicate5_run1)
pod5, fastq, genome and transcriptome aligned bam files are included in this release

Version Number: V0.5.0
Date: 2024-03-08
Release of new samples

direct RNA data for H9 and HEYA8 samples
cDNA and direct cDNA samples for H9 and HEYA8
cDNA promethion samples of Hct116 samples using SQK-PCS110 (100 million reads on average)
cDNA sample of Hct116 sampe using the SQK-PCS111

Update of existing sample files

SGNex_MCF7_cDNAStranded_replicate2_run1.fastq.gz additional info characters removed before @ for the first read
SGNex_K562_cDNAStranded_replicate3_run3.fastq.gz line48000 added 1 character of “ for quality to match sequence length
SGNex_A549_directRNA_replicate5_run1.tar.gz updated as previous version is incomplete
SGNex_MCF7-EV_directRNA_replicate1_run1.fastq.gz updated on ENA as it is a duplicated file
SGNex_MCF7_directRNA_replicate2_run2 fixed with this command “zcat SGNex_MCF7_directRNA_replicate2_run2.fastq.gz | sed 's/.*@/@/g' | sed '$d' | gzip > SGNex_MCF7_directRNA_replicate2_run2_fixed.fastq.gz” thanks to Alex

Version Number: V0.4.0
Date: 2023-03-06
Update of the SG-NEx data on AWS. Includes raw signal data in blow5 format.

Version Number: V0.3.0
Date: 2022-07-28
Initial release of the SG-NEx data on AWS. Includes Nanopore direct RNA, cDNA, direct cDNA-Seq, short read RNA-Seq and m6ACE-Seq.

Release History

You can find previous releases here in the release history

Browse the data

You can now browse the data using the UCSC genome browser:

View the SG-NEx data in the UCSC Genome Browser

By default only selected tracks are shown, but you can visualise all reads (bigbed tracks) and their coverage tracks (bigwig) from each individual sample.

Data Processing

All data was aligned against the human genome version Grch38 (please refer to the data access tutorial for reference files). We collaborated with nf-core to develop nanoseq, a standardardized pipeline for Nanopore RNA-Seq data processing.

Use Cases and Applications

You can browse a list of articles that review or use the SG-NEx data here. If you have used the data for your own research, feel free to add a publication entry.

Data Analysis Tutorials and Workflows

The following short tutorials are available that demonstrate how to analyse the SG-NEx data:

Transcript discovery and quantification of SG-NEx samples (using Bambu)

Additional, more detailed workflows can be found here:

Contributing

We welcome contributions from all long read RNA-seq tool developers! You may follow the steps below to contribute:

Fork this repository
Add your tutorial document to the docs folder
Adding your tutorial workflow link in the Data Analysis Tutorials and Workflows section in README.md in this format: tutorial title
Submit a pull request.

Acknowledgements

GIS Sequencing Platform and Data Generation
Hwee Meng Low, Yao Fei, Sarah Ng, Wendy Soon, CC Khor

Cancer Genomics and RNA Modifications
Viktoriia Iakovleva, Puay Leng Lee, Lixia Xin, Hui En Vanessa Ng, Jia Min Loo, Xuewen Ong, Hui Qi Amanda Ng, Suk Yeah Polly Poon, Hoang-Dai Tran, Kok Hao Edwin Lim, Huck Hui Ng, Boon Ooi Patrick Tan, Huck-Hui Ng, N.Gopalakrishna Iyer, Wai Leong Tam, Wee Joo Chng, Leilei Chen, Ramanuj DasGupta, Yun Shen Winston Chan, Qiang Yu, Torsten Wüstefeld, Wee Siong Sho Goh

Statistical Modeling and Data Analytics
Ying Chen, Nadia M. Davidson, Yuk Kei Wan, Hasindu Gamaarachchi, Andre Sim, Harshil Patel, Min Hao Ling, Yu Song Chuah, Naruemon Pratanwanich, Christopher Hendra, Laura Watten, Chelsea Sawyer, Dominik Stanojevic, Philip Andrew Ewels, Andreas Wilm, Mile Sikic, Alexandre Thiery, Michael I. Love, Alicia Oshlak, Jonathan Göke

Citing the SG-NEx project

The SG-NEx resource is described in:

Chen, Ying, et al. "A systematic benchmark of Nanopore long read RNA sequencing for transcript level analysis in human cell lines." bioRxiv (2021). doi: https://doi.org/10.1101/2021.04.21.440736

Please cite this pre-print when using these data, and add the following details: "The SG-NEx data was accessed on [DATE] at registry.opendata.aws/sg-nex-data".

Contact

Questions about SG-NEx? Please add an entry in the Discussions Forum. You can also contact Jonathan Göke

Name		Name	Last commit message	Last commit date
Latest commit History 300 Commits
docs		docs
images		images
manuscript/code		manuscript/code
templates		templates
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Content

Sign up for data release notifications and updates

Data Release and Access

Browse the data

Data Processing

Use Cases and Applications

Data Analysis Tutorials and Workflows

Contributing

Acknowledgements

Citing the SG-NEx project

Contact

About

Releases

Packages

Contributors 11

Languages

GoekeLab/sg-nex-data

Folders and files

Latest commit

History

Repository files navigation

Content

Sign up for data release notifications and updates

Data Release and Access

Browse the data

Data Processing

Use Cases and Applications

Data Analysis Tutorials and Workflows

Contributing

Acknowledgements

Citing the SG-NEx project

Contact

About

Topics

Resources

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Contributors 11

Languages

Packages