Question about how to run Tephra correctly? #51
Comments
I've downloaded the gene file (GCF_000004555.2_CB4_rna.fna) for Caenorhabditis briggsae from NCBI and provided the repeat database (cbrrep.ref) sourced from Repbase. However, even after running for some time, the program remains stuck at the message 'Command - 'tephra illrecomb' started at: 21-09-2023 12:12:25'.
Hi @CSU-KangHu,
The repeat database is used for annotating unclassified elements. Transposons are classified into superfamilies according to 1) structure, and 2) the presence and 3) order of coding domains. If none of this information is present, it is difficult to classify the TE other than by a similarity search against a database of TEs. If this search fails to find significant matches (using published thresholds for matches), the elements are marked as unclassified according to order; otherwise they are annotated at the superfamily level of the significant matches. The database is not used for family-level classification.
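To make the fallback order described above concrete, here is a minimal sketch of the decision logic. It is not Tephra's actual code: the function name, the hit fields (`superfamily`, `identity`, `length`, `score`), and the threshold values are all hypothetical placeholders standing in for the published thresholds mentioned above.

```python
def annotate_superfamily(structural_call, db_hits, min_identity=80.0, min_length=80):
    """Illustrative fallback: structure/domains first, then database similarity.

    structural_call: superfamily inferred from structure and coding domains,
                     or None if that information is absent (hypothetical input).
    db_hits: list of dicts describing similarity-search hits (hypothetical fields).
    min_identity, min_length: placeholder thresholds, not Tephra's real values.
    """
    # 1) Structure and coding-domain evidence takes priority when present.
    if structural_call is not None:
        return structural_call

    # 2) Otherwise keep only hits that pass the significance thresholds.
    significant = [
        hit for hit in db_hits
        if hit["identity"] >= min_identity and hit["length"] >= min_length
    ]

    # 3) No significant match: the element stays unclassified.
    if not significant:
        return "unclassified"

    # 4) Annotate at the superfamily level of the best significant match.
    best = max(significant, key=lambda hit: hit["score"])
    return best["superfamily"]
```

Under these assumptions, an element with no structural call and no hits above the thresholds comes back as `"unclassified"`, which is the case where the repeat database matters.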
This is a good suggestion. I will think about making this optional. The idea is to provide as much information as possible in the annotation process so you end up with a high confidence set of annotations rather than a large number of predictions that may lead to conflicts at later stages of the analysis. This is based on my experience but I can provide more details on this point and where to get the files that are needed.
I see in the log you provided that it is receiving the ABRT signal, so something has killed the tephra process that was running, or some of the sub-processes are being killed. You might want to check the memory usage and whether you are reaching limits (CPU, RAM, or disk space) on your computer, or with the queueing system if it is a computing cluster. I don't believe this is related to the input, but you will definitely want to provide non-empty files. There are some validation steps on the input, but I will add checks to make sure the inputs are not empty in the next version.
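Until such checks exist, a quick pre-flight script can catch empty inputs and low disk space before a long run starts. This is only a sketch using the Python standard library, not part of Tephra; the function name `preflight` and the `min_free_gb` default are assumptions chosen for illustration.

```python
import shutil
from pathlib import Path


def preflight(inputs, workdir, min_free_gb=10.0):
    """Return a list of problems found before launching a long run.

    inputs: paths to the genome, gene file, and repeat database.
    workdir: directory where output will be written.
    min_free_gb: hypothetical minimum free space, adjust to your genome size.
    """
    problems = []
    for name in inputs:
        path = Path(name)
        if not path.is_file():
            problems.append(f"missing input: {path}")
        elif path.stat().st_size == 0:
            problems.append(f"empty input: {path}")

    # Free space on the filesystem holding the working directory, in GB.
    free_gb = shutil.disk_usage(workdir).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free in {workdir}")
    return problems
```

An empty list means none of these basic problems were found. It does not cover memory limits or limits imposed by a cluster scheduler, which still need to be checked separately.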
Hi @sestaton, I subsequently attempted to run the rice genome (GCF_001433935.1_IRGSP-1.0_genomic.fna) with the same configuration. However, after running for nearly 8 hours, Tephra encountered an error. May I ask what could be the reason for this?
Thank you for the feedback, @CSU-KangHu. It is hard to tell exactly what is happening other than that no TIRs were found, which is unexpected for rice. Perhaps the data could not be written to disk? Checking that you have ample disk space would be a good place to start. Also, if you could point me to the input files you are using, I will try to recreate the issue to better understand what is happening.
Thank you for your assistance. The program's input and output data are stored in the /home/huk/tephra_lib/rice directory, and there is still sufficient space. Download links for input data:
By the way, both the Arabidopsis thaliana and Caenorhabditis briggsae genome and gene files were downloaded from NCBI.
Hi,
I'm trying to use Tephra to generate the TE library for Caenorhabditis briggsae (GCF_000004555.2_CB4_genomic.fna). I downloaded the docker image from Docker Hub and successfully ran the container. I've read through the Specifications and example usage and have only changed the genome, outfile, repeatdb, genefile, threads parameters in the all section, leaving the rest as default.
I have three main questions, and I'd appreciate your assistance:
I have a small suggestion for Tephra. I think only the genome parameter should be mandatory, while the rest can be optional. The program should be able to run even if the user doesn't provide them.