Future Applications
Genomes with non-standard GC content pose a challenge for all sequencing methods, but future development of RUFUS may offer a solution. The current version of RUFUS predicts k-mer copy number distributions poorly in AT-rich genomes. In these genomes, GC content does not appear to be consistent across a copy number peak in the RUFUS model. This breaks the assumption made in RUFUS.model that copy number peaks are multiples of the single-copy peak. If RUFUS were to bin k-mers by GC content, the copy number peak distribution of each bin could be modeled separately. This would allow RUFUS to improve the confidence and quality of calls in genomes with non-standard GC content.
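The binning idea can be illustrated with a minimal sketch. This is not RUFUS code: the function names, the number of bins, and the use of the most frequent count as a stand-in for the single-copy peak are all assumptions made for illustration, and the k-mer counts are toy values.

```python
from collections import Counter, defaultdict

def gc_fraction(kmer):
    """Fraction of G/C bases in a k-mer."""
    return sum(base in "GC" for base in kmer) / len(kmer)

def bin_kmers_by_gc(kmer_counts, n_bins=4):
    """Group k-mer counts into GC bins (hypothetical helper).

    Returns {bin_index: Counter({count: number_of_kmers_with_that_count})},
    i.e. one count histogram per GC bin.
    """
    histograms = defaultdict(Counter)
    for kmer, count in kmer_counts.items():
        # Clamp so that a GC fraction of 1.0 falls in the last bin.
        b = min(int(gc_fraction(kmer) * n_bins), n_bins - 1)
        histograms[b][count] += 1
    return dict(histograms)

def single_copy_peak(histogram):
    """Crude peak estimate: the most frequent count in the histogram."""
    return max(histogram, key=histogram.get)

# Toy counts: AT-rich k-mers are covered less deeply than GC-rich ones.
kmer_counts = {
    "AATT": 10, "ATAT": 10, "TTAA": 11,
    "GCGC": 30, "CGCG": 30, "GGCC": 29,
}
hists = bin_kmers_by_gc(kmer_counts, n_bins=2)
peaks = {b: single_copy_peak(h) for b, h in hists.items()}
print(peaks)
```

With a separate peak per bin, higher copy number peaks could be modeled as multiples of that bin's own single-copy peak rather than of one genome-wide value. A real model would fit a distribution to each histogram instead of taking the mode.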
Currently, RUFUS is designed to run controlled experiments comparing a control subject to two or three closely related individuals. Often, researchers do not have access to data from individuals closely related to the subject of interest. Researchers may also want to know whether a given mutation is novel in a population, not just in a close family. While RUFUS could take in a human reference to do these studies, the use of a reference would introduce all the systematic errors associated with assembly. Instead, future versions of RUFUS would use raw Illumina sequence data from the 1000 Genomes Project to produce a population k-mer hash table. This population hash table would only need to be generated once. It would also be roughly the same size as the hash table for a single individual, because the vast majority of k-mers are shared between individuals. With this hash table, RUFUS would be able to detect whether a variant is novel in a population in roughly the same time as it would take to run a trio analysis.
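As a rough sketch of the population hash table idea, the example below builds a set of k-mers from population reads once and then reports subject k-mers that are absent from it. It is an illustration under simplifying assumptions, not the RUFUS implementation: the function names are hypothetical, a Python `set` stands in for the hash table, and reverse complements, sequencing errors, and count thresholds are ignored.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_population_table(population_reads, k):
    """Build the population k-mer table (hypothetical helper).

    This step would only need to be run once per population.
    """
    table = set()
    for read in population_reads:
        table |= kmers(read, k)
    return table

def novel_kmers(subject_reads, population_table, k):
    """Return subject k-mers that never occur in the population table."""
    subject = set()
    for read in subject_reads:
        subject |= kmers(read, k)
    return subject - population_table

# Toy data: the subject read differs from the population by one base.
population_reads = ["ACGTACGT", "CGTACGTA"]
subject_reads = ["ACGTTCGT"]

table = build_population_table(population_reads, k=4)
novel = novel_kmers(subject_reads, table, k=4)
print(sorted(novel))
```

Because k-mers shared between individuals are stored only once, the table grows slowly as more individuals are added, which is the property the paragraph above relies on.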