Order of chromosomes in BigBed.Writer #2
@jonathanBieler, does the following cover your idea?

```julia
# Split a name into characters, turning each run of digits into one Int,
# e.g. "chr10" -> ['c', 'h', 'r', 10]. Every digit run is converted, not just the first.
function parse_seqname(str::AbstractString)::Vector{Union{Char, Int}}
    arr = Union{Char, Int}[]
    for m in eachmatch(r"[0-9]+|[^0-9]", str)
        c = first(m.match)
        push!(arr, isdigit(c) ? parse(Int, m.match) : c)
    end
    return arr
end

# Element order: integers sort before characters. A local helper is used
# instead of adding Base.isless(::Int, ::Char) methods (type piracy).
elem_isless(a::Int, b::Int) = a < b
elem_isless(a::Char, b::Char) = a < b
elem_isless(::Int, ::Char) = true
elem_isless(::Char, ::Int) = false

# Natural ordering of sequence names, e.g. "chr2" < "chr10" < "chrM".
function seqname_isless(str1::AbstractString, str2::AbstractString)::Bool
    a, b = parse_seqname(str1), parse_seqname(str2)
    for (x, y) in zip(a, b)
        isequal(x, y) || return elem_isless(x, y)
    end
    return length(a) < length(b)
end
```

```julia
julia> seqname_isless("chr1", "chrM")
true

julia> seqname_isless("chrM", "chr1")
false

julia> seqnames = ["chr11", "chr10", "chr01", "chr100", "chr1" , "chrM", "chr010"];

julia> sort(seqnames)
7-element Array{String,1}:
 "chr01"
 "chr010"
 "chr1"
 "chr10"
 "chr100"
 "chr11"
 "chrM"

julia> sort(seqnames, lt=seqname_isless)
7-element Array{String,1}:
 "chr01"
 "chr1"
 "chr10"
 "chr010"
 "chr11"
 "chr100"
 "chrM"

julia> seqnames = string.(1:11);

julia> sort(seqnames)
11-element Array{String,1}:
 "1"
 "10"
 "11"
 "2"
 "3"
 "4"
 "5"
 "6"
 "7"
 "8"
 "9"

julia> sort(seqnames, lt=seqname_isless)
11-element Array{String,1}:
 "1"
 "2"
 "3"
 "4"
 "5"
 "6"
 "7"
 "8"
 "9"
 "10"
 "11"
```
My issue isn't that the chromosome list is in one specific order or another, just that you need to know (or be able to set) the order in which it is stored internally in the writer, e.g.:

```julia
# Values of `writer.chromnames`, ordered by their keys.
chromlist(writer) = collect(values(writer.chromnames))[sortperm(collect(keys(writer.chromnames)))]
```
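A minimal sketch of what such a `chromlist` computes, assuming (hypothetically; the actual type of `writer.chromnames` in BigBed.jl is not confirmed in this thread) that the field is a `Dict` mapping integer chromosome IDs to names. It uses a plain `Dict` so it runs without BigBed.jl; `names_in_id_order` and `names_in_id_order_pairs` are illustrative names, not part of any package.

```julia
# Hypothetical stand-in for `writer.chromnames`: chromosome ID => name.
# Assumption: IDs were assigned after sorting the names lexicographically.
chromnames = Dict{UInt32, String}(0 => "1", 1 => "10", 2 => "2")

# Return the chromosome names ordered by their integer ID.
# `keys` and `values` of an unmodified Dict iterate in matching order,
# so a `sortperm` over the keys can be applied to the values.
names_in_id_order(d::AbstractDict) = collect(values(d))[sortperm(collect(keys(d)))]

# Equivalent form: sort the (id => name) pairs by ID, then keep the names.
names_in_id_order_pairs(d::AbstractDict) = last.(sort(collect(d); by = first))

println(names_in_id_order(chromnames))   # ["1", "10", "2"]
```

The pair-based form avoids relying on `keys` and `values` iterating in the same order, at the cost of materializing the pairs.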
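Here's an MWE:

```julia
using BigBed

output = open("data.bb", "w")
writer = BigBed.Writer(output, [("1", 12345), ("2", 9100), ("10", 123)])
# The writer stores the chromosomes as "1", "10", "2", so writing "2" before "10"
# triggers `ArgumentError: disordered intervals`.
write(writer, ("1", 101, 150, "gene 1"))
write(writer, ("2", 211, 250, "gene 2"))
write(writer, ("10", 211, 250, "gene 3"))
close(writer)
```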
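Until the writer keeps the given order, one possible workaround is to write the records in the order the writer uses internally. The sketch below assumes, based on this thread, that the writer sorts chromosome names as plain strings (`"1" < "10" < "2"`); it only reorders tuples and does not call BigBed.jl. `reorder_for_writer` is a hypothetical helper.

```julia
# Records in the (chrom, start, stop, name) form used in the MWE.
records = [("1", 101, 150, "gene 1"),
           ("2", 211, 250, "gene 2"),
           ("10", 211, 250, "gene 3")]

# Hypothetical helper: order records by chromosome name using plain string
# comparison (assumed to match the writer's internal order), then by start.
reorder_for_writer(recs) = sort(recs; by = r -> (r[1], r[2]))

for r in reorder_for_writer(records)
    println(r)   # each record would be passed to `write(writer, r)`
end
```

This only helps if the assumed internal order is right; a writer that keeps the caller's order would make the reordering unnecessary.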
Actually, it would be better if I could specify the order myself. Just commenting out the sort in the writer solves the problem (though it might create others, I'm not sure). But conceptually I would leave the choice of ordering to the user, since it might be imposed by external constraints, like a file you got from another tool.
@jonathanBieler, I agree that this is an issue/annoyance and have made a note of it in https://github.com/BioJulia/BigBed.jl/projects/1.
I'm trying to write data to a BigBed file for each human chromosome. I defined my writer as follows:

Here `chrs` is `["1", "2", ...]`. Then I loop over `chrs` and do some write operations, but I get `ArgumentError: disordered intervals`, because the writer reorders the chromosomes as `["1", "10", "11"]`. I managed to get around it by getting the chromosomes in the right order with:

Would it be possible to have the writer keep the ordering it's given (maybe using an `OrderedDict`), or is there a good reason why it gets reordered?

Alternatively, a `chromlist` method to get the chromosomes in the right order would help.
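A sketch of the order-preserving alternative: assign each chromosome an ID from its position in the user-supplied list (the order an `OrderedDict` would keep) and check each record against that order. It uses `enumerate` into a plain `Dict` to avoid a package dependency; `chrom_ids`, `OrderChecker`, and `check_order!` are hypothetical names, and this is not how BigBed.jl currently works.

```julia
# Hypothetical: map each chromosome name to its position in the
# user-supplied list, so IDs follow the caller's order.
chrom_ids(chromlist) = Dict(name => i for (i, (name, _len)) in enumerate(chromlist))

# Tracks the last (chromosome ID, start) seen, to reject disordered records.
mutable struct OrderChecker
    ids::Dict{String, Int}
    last::Tuple{Int, Int}
end
OrderChecker(chromlist) = OrderChecker(chrom_ids(chromlist), (0, 0))

# Throw an ArgumentError if a record goes backwards in (ID, start) order.
function check_order!(c::OrderChecker, rec)
    key = (c.ids[rec[1]], rec[2])
    key < c.last && throw(ArgumentError("disordered intervals"))
    c.last = key
    return c
end

checker = OrderChecker([("1", 12345), ("2", 9100), ("10", 123)])
for rec in [("1", 101, 150, "gene 1"), ("2", 211, 250, "gene 2"), ("10", 211, 250, "gene 3")]
    check_order!(checker, rec)   # passes: IDs follow the given order "1", "2", "10"
end
```

With IDs taken from the caller's list, the MWE's write sequence is in order; only a record that moves backwards (for example, another `"1"` record after `"10"`) is rejected.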