We have sequenced the CEPH1463 (NA12878/GM12878, Ceph/Utah pedigree) human genome reference standard on the Oxford Nanopore MinION using 1D ligation kits (450 bp/s) and R9.4 chemistry (FLO-MIN106).
Human genomic DNA from the GM12878 human cell line (Ceph/Utah pedigree) was either purchased from Coriell (cat no NA12878; sample type "DNA") or extracted from the cultured cell line (sample type "Cells"). As the DNA is native, modified bases will be preserved.
We encourage the reuse of this data, which is released under the Creative Commons CC-BY license, in your own analyses and publications. If you do reuse it, we would be grateful if you would cite the reference below.
Miten Jain, Sergey Koren, Karen H Miga, Josh Quick, Arthur C Rand, Thomas A Sasani, John R Tyson, Andrew D Beggs, Alexander T Dilthey, Ian T Fiddes, Sunir Malla, Hannah Marriott, Tom Nieto, Justin O'Grady, Hugh E Olsen, Brent S Pedersen, Arang Rhie, Hollian Richardson, Aaron R Quinlan, Terrance P Snutch, Louise Tee, Benedict Paten, Adam M Phillippy, Jared T Simpson, Nicholas J Loman & Matthew Loose. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nature Biotechnology. doi:10.1038/nbt.4060.
Miten Jain, Sergey Koren, Josh Quick, Arthur C Rand, Thomas A Sasani, John R Tyson, Andrew D Beggs, Alexander T Dilthey, Ian T Fiddes, Sunir Malla, Hannah Marriott, Karen H Miga, Tom Nieto, Justin O'Grady, Hugh E Olsen, Brent S Pedersen, Arang Rhie, Hollian Richardson, Aaron Quinlan, Terrance P Snutch, Louise Tee, Benedict Paten, Adam M. Phillippy, Jared T Simpson, Nicholas James Loman, Matthew Loose. Nanopore sequencing and assembly of a human genome with ultra-long reads. bioRxiv. doi: https://doi.org/10.1101/128835.
The rel3 release consists of the full dataset, and has two new rapid kit runs with a new long DNA extraction method:
- 39 flowcells
- 91240120433 bases
- 14183584 reads
flowcell_id | reads | bases | Date | Centre | SampleType | Kit | Pore | Links |
---|---|---|---|---|---|---|---|---|
FAB23716 | 356209 | 1409812422 | 14/07/16 | UBC | DNA | Rapid | R9 | FASTQ |
FAB39088 | 658224 | 3287994454 | 19/09/16 | Notts | DNA | Ligation | R9.4 | FASTQ |
FAB39075 | 466329 | 2439355478 | 20/09/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB39043 | 436976 | 2273008592 | 23/09/16 | Bham | DNA | Ligation | R9.4 | FASTQ |
FAB42706 | 430660 | 1966505502 | 12/10/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB41174 | 117057 | 687394987 | 13/10/16 | Bham | DNA | Ligation | R9.4 | FASTQ |
FAB42260 | 267644 | 1399557161 | 13/10/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB42804 | 16669 | 75062609 | 14/10/16 | Bham | DNA | Ligation | R9.4 | FASTQ |
FAB42316 | 572838 | 3275026637 | 14/10/16 | Notts | DNA | Ligation | R9.4 | FASTQ |
FAB42205 | 317654 | 1686630108 | 14/10/16 | Notts | DNA | Ligation | R9.4 | FASTQ |
FAB42561 | 233678 | 1520513556 | 19/10/16 | Notts | DNA | Ligation | R9.4 | FASTQ |
FAB42473 | 644869 | 3357548938 | 19/10/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB42395 | 38291 | 179704035 | 20/10/16 | Norwich | DNA | Ligation | R9.4 | FASTQ |
FAB42476 | 435158 | 2363036522 | 27/10/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB42451 | 817629 | 4530477841 | 28/10/16 | Notts | DNA | Ligation | R9.4 | FASTQ |
FAB42704 | 276152 | 1750149482 | 28/10/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB42828 | 33527 | 163405138 | 01/11/16 | Norwich | DNA | Ligation | R9.4 | FASTQ |
FAB42810 | 322058 | 2020615256 | 02/11/16 | Norwich | DNA | Ligation | R9.4 | FASTQ |
FAB42798 | 193551 | 1339441522 | 03/11/16 | Norwich | DNA | Ligation | R9.4 | FASTQ |
FAB45280 | 128234 | 799554798 | 11/11/16 | Norwich | DNA | Ligation | R9.4 | FASTQ |
FAB46664 | 491346 | 2038018797 | 15/11/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB46683 | 72605 | 286275511 | 17/11/16 | Bham | DNA | Ligation | R9.4 | FASTQ |
FAB45332 | 530938 | 2864140853 | 17/11/16 | UBC | DNA | Ligation | R9.4 | FASTQ |
FAB43577 | 426941 | 2539015084 | 18/11/16 | UCSC | DNA | Ligation | R9.4 | FASTQ |
FAB44989 | 558224 | 3443824633 | 18/11/16 | UCSC | DNA | Ligation | R9.4 | FASTQ |
FAF01169 | 339447 | 2913892142 | 22/11/16 | Bham | Cells | Ligation | R9.4 | FASTQ |
FAF01441 | 254705 | 2203636947 | 22/11/16 | Bham | Cells | Ligation | R9.4 | FASTQ |
FAB45277 | 53547 | 445641679 | 22/11/16 | Notts | Cells | Ligation | R9.4 | FASTQ |
FAB45321 | 299174 | 2584017112 | 22/11/16 | Notts | Cells | Ligation | R9.4 | FASTQ |
FAF01127 | 632728 | 4972081712 | 25/11/16 | Bham | Cells | Ligation | R9.4 | FASTQ |
FAF01132 | 689781 | 5455971336 | 25/11/16 | Bham | Cells | Ligation | R9.4 | FASTQ |
FAB49712 | 632158 | 4906148911 | 28/11/16 | Bham | Cells | Ligation | R9.4 | FASTQ |
FAF01253 | 471698 | 3695661984 | 28/11/16 | Bham | Cells | Ligation | R9.4 | FASTQ |
FAB45321* | 123037 | 1043504055 | 28/11/16 | Notts | Cells | Ligation | R9.4 | FASTQ |
FAB49914 | 309175 | 2841008085 | 28/11/16 | Notts | Cells | Ligation | R9.4 | FASTQ |
FAB45271 | 472656 | 3689043164 | 28/11/16 | Notts | Cells | Ligation | R9.4 | FASTQ |
FAB49164 | 746333 | 4438258089 | 06/12/16 | UCSC | DNA | Ligation | R9.4 | FASTQ |
FAB49908 | 224380 | 3141600861 | 09/12/16 | Bham | Cells | Rapid | R9.4 | FASTQ |
FAF04090 | 91304 | 1213584440 | 09/12/16 | Bham | Cells | Rapid | R9.4 | FASTQ |
Please verify downloads against MD5 hashes.
[*] This flowcell ID was input incorrectly.
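As a quick sanity check, the per-flowcell table above can be totalled and compared with the headline figures for the release (39 flowcells, 14183584 reads, 91240120433 bases). The following is a minimal sketch in Python; the function names are illustrative and not part of any released tooling, and only two table rows are embedded here as sample input.

```python
def parse_flowcell_rows(table_text):
    """Parse pipe-delimited table rows; header and separator lines are skipped."""
    rows = []
    for line in table_text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Data rows have an integer read count in the second column.
        if len(cells) < 3 or not cells[1].isdigit():
            continue
        rows.append({
            "flowcell_id": cells[0],
            "reads": int(cells[1]),
            "bases": int(cells[2]),
        })
    return rows


def summarise(rows):
    """Total the reads and bases and derive the mean read length."""
    total_reads = sum(row["reads"] for row in rows)
    total_bases = sum(row["bases"] for row in rows)
    mean_length = total_bases / total_reads if total_reads else 0.0
    return {
        "flowcells": len(rows),
        "reads": total_reads,
        "bases": total_bases,
        "mean_read_length": mean_length,
    }


# Two rows copied from the table above, used only as sample input.
SAMPLE = """\
flowcell_id | reads | bases | Date | Centre | SampleType | Kit | Pore | Links |
---|---|---|---|---|---|---|---|---|
FAB23716 | 356209 | 1409812422 | 14/07/16 | UBC | DNA | Rapid | R9 | FASTQ |
FAB39088 | 658224 | 3287994454 | 19/09/16 | Notts | DNA | Ligation | R9.4 | FASTQ |
"""

summary = summarise(parse_flowcell_rows(SAMPLE))
print(summary)
```

Pasting the full table into `SAMPLE` gives totals that can be compared directly with the figures listed for the release.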
Rel4 adds 23140190547 bases in 1415868 reads, predominantly using the new ultra-long read protocol.
asic_id | nreads | mn | count | n50 | flowcell | centre | kit | date | Links |
---|---|---|---|---|---|---|---|---|---|
16056159 | 82138 | 21998 | 1806857522 | 114375 | FAF15665 | Notts | Ultra | 10/03/2017 | FASTQ |
17958431 | 53723 | 23321 | 1252868852 | 77045 | FAF13748 | Notts | Ultra | 10/03/2017 | FASTQ |
2901545329 | 41385 | 20506 | 848632752 | 54473 | FAF10039 | Bham | Ultra | 01/03/2017 | FASTQ |
3439856925 | 19674 | 30217 | 594496244 | 121393 | FAF09968 | Bham | Ultra | 03/03/2017 | FASTQ |
3709819546 | 73755 | 26946 | 1987434656 | 117805 | FAF09277 | Bham | Ultra | 03/06/2017 | FASTQ |
3976726082 | 75692 | 24191 | 1831031405 | 88882 | FAF14035 | Notts | Ultra | 08/03/2017 | FASTQ |
4109802543 | 61227 | 25048 | 1533616061 | 104528 | FAF15694 | Bham | Ultra | 06/03/2017 | FASTQ |
4111860526 | 65142 | 25171 | 1639658993 | 93299 | FAF09713 | Bham | Ultra | 07/03/2017 | FASTQ |
4178920553 | 270189 | 10106 | 2730589684 | 24848 | FAF18554 | UBC | Rapid | 06/03/2017 | FASTQ |
4244782843 | 9663 | 33401 | 322753214 | 102804 | FAF15630 | Notts | Ultra | 09/03/2017 | FASTQ |
4245291640 | 72936 | 20524 | 1496943560 | 92109 | FAF09640 | Bham | Ultra | 07/03/2017 | FASTQ |
4249180049 | 68169 | 25394 | 1731054841 | 119444 | FAF09701 | Bham | Ultra | 03/03/2017 | FASTQ |
82266371 | 71155 | 24602 | 1750584936 | 118548 | FAF15586 | Bham | Ultra | 08/03/2017 | FASTQ |
87644245 | 451020 | 8012 | 3613667827 | 13920 | FAF05869 | UBC | Ligation | 08/03/2017 | FASTQ |
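The `mn` and `n50` columns above appear to be the mean read length and the read length N50 for each flowcell, with `count` being the total number of bases (for example, 1806857522 / 82138 ≈ 21998 for FAF15665). That reading of the column names is an assumption rather than a documented definition. Under it, both statistics can be recomputed from a list of read lengths, as in this small Python sketch (the function names are illustrative):

```python
def mean_length(lengths):
    """Arithmetic mean of the read lengths (0.0 for an empty list)."""
    return sum(lengths) / len(lengths) if lengths else 0.0


def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_of_bases = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_of_bases:
            return length
    return 0  # no reads


# Toy read lengths, not taken from the dataset.
toy_lengths = [2, 3, 4, 5, 6]
print(mean_length(toy_lengths), n50(toy_lengths))
```

Because N50 is weighted by bases rather than by reads, a small number of very long reads can raise it well above the mean, which is the pattern seen in the rows for the ultra-long protocol.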
### Alignments by flowcell
Reads for the rel3 (30x coverage) dataset were aligned against the pre-computed 1000 Genomes GRCh38 BWA database with decoys at ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome/ using BWA MEM (commit: 5961611c358e480110793bbf241523a3cfac049b) with parameters `-x ont2d`. Alignment statistics were calculated using `samtools stats` (samtools version 1.3.1).
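For readers who want to reproduce a comparable alignment, the commands can be assembled roughly as follows. This is a sketch only: the file names, the thread count and the sort step are assumptions, and the exact command lines used for the released BAM files are not given here. Only the `-x ont2d` preset and the use of `samtools stats` come from the description above.

```python
import shlex


def bwa_mem_command(reference, fastq, threads=4):
    """bwa mem with the ont2d preset; the thread count is an arbitrary choice."""
    return ["bwa", "mem", "-x", "ont2d", "-t", str(threads), reference, fastq]


def samtools_sort_command(out_bam):
    """Coordinate-sort SAM read from stdin into a BAM file (assumed step)."""
    return ["samtools", "sort", "-o", out_bam, "-"]


def samtools_stats_command(bam):
    """Summary statistics for one BAM file."""
    return ["samtools", "stats", bam]


# Hypothetical file names, for illustration only.
reference = "GRCh38_with_decoys.fa"
fastq = "FAB23716.fastq"
bam = "FAB23716.sorted.bam"

pipeline = " | ".join(
    shlex.join(cmd)
    for cmd in (bwa_mem_command(reference, fastq), samtools_sort_command(bam))
)
print(pipeline)
print(shlex.join(samtools_stats_command(bam)))
```

The sketch only prints the command lines, so they can be inspected or pasted into a shell. Running them requires `bwa` and `samtools` to be installed and the reference to be indexed.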
FileID | Sequences | Mapped | Mapped MQ0 | Unmapped | Bases Mapped | Avg Length | BAM | BAI |
---|---|---|---|---|---|---|---|---|
FAB23716 | 356209 | 319259 | 26702 | 36950 | 1165998694 | 3957 | BAM | BAI |
FAB39088 | 658224 | 613044 | 35394 | 45180 | 3007307322 | 4995 | BAM | BAI |
FAB39075 | 466329 | 425117 | 28167 | 41212 | 2146453407 | 5230 | BAM | BAI |
FAB39043 | 436976 | 415389 | 21043 | 21587 | 2113140439 | 5201 | BAM | BAI |
FAB42706 | 430660 | 375374 | 17378 | 55286 | 1867123361 | 4566 | BAM | BAI |
FAB41174 | 117057 | 114520 | 4186 | 2537 | 652217119 | 5872 | BAM | BAI |
FAB42260 | 267644 | 246982 | 15624 | 20662 | 1263089767 | 5229 | BAM | BAI |
FAB42804 | 16669 | 13311 | 1755 | 3358 | 53666089 | 4503 | BAM | BAI |
FAB42316 | 572838 | 512994 | 18985 | 59844 | 3100596254 | 5717 | BAM | BAI |
FAB42205 | 317654 | 282502 | 12561 | 35152 | 1601397762 | 5309 | BAM | BAI |
FAB42561 | 233678 | 225141 | 10255 | 8537 | 1420740185 | 6506 | BAM | BAI |
FAB42473 | 644869 | 611138 | 32539 | 33731 | 3112342902 | 5206 | BAM | BAI |
FAB42395 | 38291 | 36477 | 2059 | 1814 | 167168840 | 4693 | BAM | BAI |
FAB42476 | 435158 | 416969 | 20908 | 18189 | 2214880871 | 5430 | BAM | BAI |
FAB42451 | 817629 | 779328 | 36986 | 38301 | 4178966543 | 5540 | BAM | BAI |
FAB42704 | 276152 | 263722 | 12926 | 12430 | 1619875186 | 6337 | BAM | BAI |
FAB42828 | 33527 | 27843 | 2442 | 5684 | 146819837 | 4873 | BAM | BAI |
FAB42810 | 322058 | 305070 | 16802 | 16988 | 1808343119 | 6274 | BAM | BAI |
FAB42798 | 193551 | 185739 | 8749 | 7812 | 1232035338 | 6920 | BAM | BAI |
FAB45280 | 128234 | 122219 | 6336 | 6015 | 743280816 | 6235 | BAM | BAI |
FAB46664 | 491346 | 456247 | 27622 | 35099 | 1862427349 | 4147 | BAM | BAI |
FAB46683 | 72605 | 64739 | 5307 | 7866 | 269213160 | 3942 | BAM | BAI |
FAB45332 | 530938 | 497862 | 26392 | 33076 | 2620752139 | 5394 | BAM | BAI |
FAB43577 | 426941 | 410137 | 19835 | 16804 | 2344990054 | 5946 | BAM | BAI |
FAB44989 | 558224 | 536572 | 25936 | 21652 | 3161900821 | 6169 | BAM | BAI |
FAF01169 | 339447 | 315489 | 16481 | 23958 | 2677881316 | 8584 | BAM | BAI |
FAF01441 | 254705 | 238834 | 12458 | 15871 | 2010117898 | 8651 | BAM | BAI |
FAB45277 | 53547 | 51957 | 2132 | 1590 | 426639054 | 8322 | BAM | BAI |
FAB45321 | 299174 | 283355 | 15165 | 15819 | 2366003310 | 8637 | BAM | BAI |
FAF01127 | 632728 | 605633 | 27192 | 27095 | 4640355789 | 7858 | BAM | BAI |
FAF01132 | 689781 | 655357 | 33564 | 34424 | 4966810089 | 7909 | BAM | BAI |
FAB49712 | 632158 | 612752 | 26264 | 19406 | 4594356245 | 7760 | BAM | BAI |
FAF01253 | 471698 | 454434 | 20639 | 17264 | 3430678969 | 7834 | BAM | BAI |
FAB45321 | 123037 | 118311 | 5891 | 4726 | 952851126 | 8481 | BAM | BAI |
FAB49914 | 309175 | 296250 | 12281 | 12925 | 2673848960 | 9188 | BAM | BAI |
FAB45271 | 472656 | 450702 | 20148 | 21954 | 3468377327 | 7804 | BAM | BAI |
FAB49164 | 746333 | 718351 | 32664 | 27982 | 4107087899 | 5946 | BAM | BAI |
FAB49908 | 224380 | 211060 | 11903 | 13320 | 2898563539 | 14001 | BAM | BAI |
FAF04090 | 91304 | 83164 | 6072 | 8140 | 1085757398 | 13291 | BAM | BAI |
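The columns in the table above can be combined into per-flowcell rates. The sketch below is illustrative (the function and key names are not part of the dataset); it checks that mapped and unmapped reads add up to the sequence count and derives the mapped fraction and the share of mapped reads with mapping quality 0, using the FAB23716 row as input.

```python
def mapping_summary(sequences, mapped, mapped_mq0, unmapped, bases_mapped):
    """Derive simple rates from one row of the alignment statistics table."""
    if mapped + unmapped != sequences:
        raise ValueError("mapped + unmapped does not equal the sequence count")
    return {
        "mapped_fraction": mapped / sequences,
        "mq0_fraction_of_mapped": mapped_mq0 / mapped,
        "bases_mapped_per_mapped_read": bases_mapped / mapped,
    }


# Values copied from the FAB23716 row.
fab23716 = mapping_summary(
    sequences=356209,
    mapped=319259,
    mapped_mq0=26702,
    unmapped=36950,
    bases_mapped=1165998694,
)
for name, value in fab23716.items():
    print(f"{name}: {value:.4f}")
```

For this flowcell roughly 90% of reads map, and about 8% of the mapped reads have mapping quality 0.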
Flowcell alignments were separated into individual chromosomes using `samtools merge`.
Chrom | Mapped # | Mapped MQ0 | Bases Mapped | Avg Length | BAM | BAI |
---|---|---|---|---|---|---|
chr1 | 1075867 | 43397 | 6829526262 | 6744 | BAM | BAI |
chr2 | 1062314 | 31802 | 6755642896 | 6842 | BAM | BAI |
chr3 | 858643 | 24189 | 5487703898 | 6757 | BAM | BAI |
chr4 | 845677 | 30723 | 5395140705 | 6890 | BAM | BAI |
chr5 | 774613 | 23499 | 4953273570 | 6821 | BAM | BAI |
chr6 | 723047 | 24496 | 4618883250 | 6762 | BAM | BAI |
chr7 | 696473 | 28231 | 4382999832 | 6772 | BAM | BAI |
chr8 | 617988 | 23361 | 3968911801 | 6844 | BAM | BAI |
chr9 | 539660 | 25898 | 3428430670 | 6764 | BAM | BAI |
chr10 | 594688 | 20787 | 3805443564 | 6845 | BAM | BAI |
chr11 | 583055 | 17748 | 3710684724 | 6855 | BAM | BAI |
chr12 | 586663 | 17891 | 3734922623 | 6840 | BAM | BAI |
chr13 | 440615 | 17662 | 2844212242 | 6904 | BAM | BAI |
chr14 | 383777 | 15752 | 2439119767 | 6713 | BAM | BAI |
chr15 | 359853 | 19556 | 2268233023 | 6838 | BAM | BAI |
chr16 | 386401 | 22680 | 2425913744 | 6787 | BAM | BAI |
chr17 | 369036 | 22907 | 2302471086 | 6661 | BAM | BAI |
chr18 | 339094 | 13053 | 2172098564 | 6807 | BAM | BAI |
chr19 | 257039 | 10926 | 1472760724 | 6266 | BAM | BAI |
chr20 | 291960 | 13226 | 1829244829 | 6659 | BAM | BAI |
chr21 | 192383 | 24988 | 1207807437 | 6792 | BAM | BAI |
chr22 | 172934 | 10514 | 1041347396 | 6665 | BAM | BAI |
chrX | 658347 | 28769 | 4210769167 | 7076 | BAM | BAI |
chrY | 23378 | 5292 | 133803203 | 7869 | BAM | BAI |
chrM | 59363 | 658 | 91949786 | 1628 | BAM | BAI |
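One plausible way to produce per-chromosome files like those listed above is to run `samtools merge` once per chromosome with its `-R` region option across all flowcell BAM files. This is an assumption about the procedure, not a record of the commands actually used, and the file names below are hypothetical. The `-R` option requires the inputs to be coordinate-sorted and indexed.

```python
import shlex

# The 25 sequences listed in the per-chromosome table.
CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def merge_region_command(region, out_bam, in_bams):
    """samtools merge restricted to one region across several input BAMs."""
    return ["samtools", "merge", "-R", region, out_bam, *in_bams]


# Hypothetical per-flowcell BAM file names.
flowcell_bams = ["FAB23716.sorted.bam", "FAB39088.sorted.bam"]

for chrom in CHROMOSOMES:
    command = merge_region_command(chrom, f"{chrom}.sorted.bam", flowcell_bams)
    print(shlex.join(command))
```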
FAST5 files for 30x dataset have been split by chromosome according to the above alignments, meaning that some files may be found in multiple archives (they can be made non-redundant by reference to the filename). Each complete 'part' contains 100,000 reads and should be roughly in sort order along the chromosome to aid region-by-region analysis.
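Because the same FAST5 file can appear in more than one chromosome archive, a combined file list can be made non-redundant by keeping the first path seen for each filename. A minimal Python sketch, assuming the archives have already been extracted or listed and that member paths use forward slashes (the paths shown are invented for illustration):

```python
import posixpath


def unique_by_filename(paths):
    """Keep the first path seen for each basename, preserving input order."""
    seen = set()
    unique = []
    for path in paths:
        name = posixpath.basename(path)
        if name not in seen:
            seen.add(name)
            unique.append(path)
    return unique


# Invented example paths: read_0002.fast5 occurs in two archives.
listing = [
    "chr1.part01/read_0001.fast5",
    "chr1.part01/read_0002.fast5",
    "chr2.part01/read_0002.fast5",
    "chr2.part01/read_0003.fast5",
]
print(unique_by_filename(listing))
```

Keeping the first occurrence preserves the approximate sort order of the archive in which each read first appears.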
- canu.20x.contigs.fasta, md5 `12ab2c03983ab1afd256ec826e89d786`
- canu.30x.contigs.fasta, md5 `b22086754b4bc4555db59f6fd7a82e47`
- canu.30x.contigs.polished1.fasta, md5 `dcdd543ddc8947276024e9a6ad8d9990`
- canu.30x.contigs.polished2.fasta, md5 `de4d0af9782a9f853ae51c30370867b9`
- canu.30x.contigs.polished3.fasta, md5 `dc94952a44637988978908a79415704a`
- canu.35x.contigs.fasta, md5 `ff5ef9c98ec70c3c4145e8dcac3178e4`
- canu.35x.contigs.polished2.fasta, md5 `d91214b1ca89aaabe95ca5ff52cee50e`
- canu.chr20.metrichor.fasta, md5 `d2ad6a6ed1260fd32c19c704f02a6d3a`
- canu.chr20.metrichor.nanopolish.fasta, md5 `888d2d81571a09467a9091a1f0e589ad`
- canu.chr20.metrichor.nanopolish.polished2.fasta, md5 `4510f05409243c473c2e92b6cb245aaf`
- canu.chr20.metrichor.polished2.fasta, md5 `9b456459b45ed8549bac50fcf2027039`
- canu.chr20.nanonet.fasta, md5 `f63b89396bdba099348f51cb8d23ff13`
- canu.chr20.nanonet.nanopolish.fasta, md5 `57f56dbdb0d91ed640f28745df01e6d6`
- canu.chr20.nanonet.nanopolish.polished2.fasta, md5 `a4ad5a39760f1a1bed2237a1c033fe9a`
- canu.chr20.nanonet.polished2.fasta, md5 `4bc6f3284dc209a661ddc9f21a80f829`
- canu.chr20.scrappie.fasta, md5 `70e4dcc72a87ad071da655066a4b77f9`
- canu.chr20.scrappie.nanopolish.fasta, md5 `122e36327c9d7c9f44387a71e05acbe6`
- canu.chr20.scrappie.nanopolish.polished2.fasta, md5 `1287822a8941a806473c2304c9e95185`
- canu.chr20.scrappie.polished2.fasta, md5 `0681fdf186a4562da03f8997a614e3bc`
- mhc_haplotypeA.pilon.fasta, md5 `d3d60c4b49f194980bcba87dfb0ce9bb`
- mhc_haplotypeB.pilon.fasta, md5 `020f0b2d54e1dba0bdb44b77becdb048`
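A downloaded file can be checked against its listed MD5 hash with a few lines of Python. The sketch below uses only the standard library and reads the file in chunks so that large FASTA files are never held in memory at once. The function names are illustrative, and the file is assumed to have been downloaded into the working directory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """True if the file's MD5 matches the expected hash (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Example call, assuming the file is present locally:
# verify("canu.20x.contigs.fasta", "12ab2c03983ab1afd256ec826e89d786")
```

A result of `False` usually indicates a truncated or corrupted download, in which case the file should be fetched again.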
Figure: A typical read length distribution from a flowcell where we have run a cell-extracted DNA library. The y-axis shows the count of bases. Mean read length ~8.6kb with N50 of ~12.5kb (vertical line). Reads longer than 60kb are not expected due to limitations of the QIAGEN extraction kit employed.
We would like to acknowledge the support of Oxford Nanopore Technologies in generating this dataset, with particular thanks to Rosemary Dokos, Oliver Hartwell, Jonathan Pugh and Clive Brown. We would like to thank Radoslaw Poplawski and Simon Thompson for technical assistance with configuring and optimising the CLIMB platform file system. We are grateful to Angel Pizarro and Jed Sundwall at Amazon Web Services for hosting this dataset as an AWS Open Data set.