Input format error #8
Comments
Hi, thank you for creating this issue. The problem is caused by the GTF file format, which is explained here. Note that fields are tab-separated, and the last column (field 9) is a semicolon-separated list of tag-value pairs. You missed the semicolons. Tejaas looks for three tags in this field, namely "gene_name", "gene_id" and "gene_type". The first column should be the chromosome number, with or without the 'chr' prefix. However, with the input file that you are using, the debug messages should have included a line which says,
I am surprised that this message was not produced. Would it be possible for you to share the GTF file that you are using so that I can check why it is not producing this particular error message?
Explanation of the error messages
This happens because an empty array is used. No genes were selected because the gene names provided in the expression file did not match the gene names in the GTF file (obtained from field 9 of the GTF with tag "gene_id"). In your GTF file, no "gene_id" tag-value pair is provided.
Troubleshooting 1: You found the same error because the gene_id values in the gencode.v19 GTF are different from the gene names that you are using.
Troubleshooting 2: Tejaas allows missing genotypes.
Troubleshooting 3: I think this is because the GTF file was not tab-separated.
Fixed GTF File format
Note the new tag-value pairs and the tab separation between fields.
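The format rules described in this comment (nine tab-separated fields, a semicolon-separated field 9 carrying "gene_id", "gene_name" and "gene_type", and gene IDs that match the expression file) can be checked before running Tejaas. The following is only a minimal sketch under those stated rules; it is not the parser Tejaas uses, and the function names `parse_attributes`, `check_gtf_line` and `shared_gene_ids` are hypothetical helpers introduced here for illustration.

```python
# Hypothetical pre-flight check for a GTF file; not part of Tejaas.
REQUIRED_TAGS = ("gene_id", "gene_name", "gene_type")


def parse_attributes(field9):
    """Parse 'tag "value"; tag "value";' (GTF field 9) into a dict."""
    attrs = {}
    for item in field9.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        # The tag is the first word; the rest is the (usually quoted) value.
        tag, _, value = item.partition(" ")
        attrs[tag] = value.strip().strip('"')
    return attrs


def check_gtf_line(line):
    """Return a list of problems found in one GTF record (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return ["expected 9 tab-separated fields, found %d" % len(fields)]
    attrs = parse_attributes(fields[8])
    return ["missing tag: " + tag for tag in REQUIRED_TAGS if tag not in attrs]


def shared_gene_ids(gtf_lines, expression_ids):
    """Return the gene IDs present in both the GTF and the expression file."""
    gtf_ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            gene_id = parse_attributes(fields[8]).get("gene_id")
            if gene_id:
                gtf_ids.add(gene_id)
    return gtf_ids & set(expression_ids)
```

Running `check_gtf_line` over the first few records shows whether a file is space-separated or is missing semicolons or tags. An empty result from `shared_gene_ids` corresponds to the "no genes were selected" situation explained above.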
Thank you so much for this extremely detailed response. I am unfortunately still having trouble with my GTF file format. I revised the input GTF to look like this:
The error I am now hitting is:
I checked that the GTF file is properly tab-separated. Attached are the first 1000 lines of the GTF that I am using, as an example. Thank you again for all your time. I really appreciate it.
The GTF file format is correct. I found that the problem was that the PyPI release version of our package did not include the latest GitHub changes. We sincerely apologize for your trouble. I hope you will find Tejaas useful for detecting trans-eQTLs. If you need any help in selecting the parameters or running the software, please let us know. I have now updated the package to the latest GitHub version. You can update it using:
and hopefully it will work. If not, please do let us know. Also, note that the
The program runs for me now with the update you mentioned! An additional question: I don't know if I am misunderstanding the --chrom option. My understanding is that if I have multiple chromosomes in the VCF (each chromosome designated as an integer in the CHROM column of the VCF), then I can use this parameter to designate which chromosome to choose SNPs from (i.e., --chrom 2 --include-SNPs 0:100 would mean the first 100 SNPs from chr2). However, regardless of what integer I put for --chrom, it defaults to picking the SNPs from the first chromosome in the VCF. I could even put a number that doesn't exist as a chromosome in the VCF and it would still run and default to SNPs on the first chromosome. I have been getting around this problem by pre-subsetting the VCF so that each file contains only one chromosome. I see that the parallelization example says "file_path_here_${CHRM}.vcf.gz", so perhaps subsetting the VCF into one chromosome per file is what I am meant to do? Just wondering if I am misunderstanding this, since this is a required parameter.
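For reference, the pre-subsetting workaround described in this comment (one VCF file per chromosome) can be sketched as follows. This only illustrates the workaround; it makes no claim about how Tejaas handles --chrom internally. The helper names `open_text`, `filter_vcf_lines` and `write_chromosome_vcf` are hypothetical, and the sketch assumes the CHROM column holds the same label that is passed in (for example "2", not "chr2").

```python
import gzip


def open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file based on its extension."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def filter_vcf_lines(lines, chrom):
    """Yield header lines plus records whose CHROM column equals chrom."""
    chrom = str(chrom)
    for line in lines:
        # Header lines start with '#'; CHROM is the first tab-separated column.
        if line.startswith("#") or line.split("\t", 1)[0] == chrom:
            yield line


def write_chromosome_vcf(in_path, out_path, chrom):
    """Write a copy of the VCF at in_path containing only one chromosome."""
    with open_text(in_path) as src, open_text(out_path, "wt") as dst:
        dst.writelines(filter_vcf_lines(src, chrom))
```

Note that an output path ending in `.gz` is written with plain gzip here, not bgzip, so tools that require a bgzip-compressed, indexed VCF would need a separate compression step.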
Hello,
I am hitting an error with what I assume to be a file format issue. This is the error message that I get:
This is my Tejaas command:
This is the format of my expression file:
My VCF file:
My annotation file:
This GTF was generated from a GFF3 using rtracklayer in R, and then I manually added gene_type "protein_coding"; to the last field.
Troubleshooting that I tried:
but ended up with a different error:
Now I am not sure which input file is likely causing the issue.
I am able to successfully run the example files that came with the download.
Any suggestions would be much appreciated. Thank you.