Error in database downloading #52
Comments
I have the same error here. Is this resolved? Best,
@jianshu93 I believe I encountered a similar problem and fixed it by editing read_gtdb_taxonomy.py, which for me was located: I changed the server address at line 18 from: to:
Hi @leonmhartman, Thanks! It actually works. I am downloading it right now. I will be using it to remove host contigs from the American Gut Project. Perhaps I will have more questions later on, once I actually run the program to evaluate the quality of my genomes. Thanks, Jianshu
Hi @jianshu93 Unfortunately, it seems that this only partially worked. The main DB files downloaded successfully overnight (~9 hrs), but the makedb process failed to complete due to the following error (note that I have edited the file paths from the original for brevity):
You're probably better equipped than me to troubleshoot this (I'm not a bioinformatician), but it seems odd that the MD5SUM file is corrupt; if it is, surely other people would have had issues. Anyway, my next strategy is to change the URL again, download the previous DB (release 214.1), and hope that the MD5SUM file for that dataset is okay. Cheers,
@jianshu93 It turns out that problems with the MD5SUM file were flagged on the GTDB forum several weeks ago (see here). I have added a post on the forum asking the GTDB admins to update the corrupt file.
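For anyone who wants to check a downloaded checksum file by hand before restarting makedb, here is a minimal sketch. It assumes the MD5SUM file uses the usual md5sum layout (a 32-character hex digest, whitespace, then a file name), which I have not confirmed for the GTDB release. The helper names `md5_of_file` and `parse_md5sum_listing` are made up for illustration and are not part of mdmcleaner.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum_listing(text):
    """Split an md5sum-style listing into valid entries and malformed lines.

    Assumed line layout: '<32 hex chars>  <file name>'. Anything else
    (for example an HTML error page) is collected in the second return value.
    """
    entries, malformed = {}, []
    hex_chars = set("0123456789abcdefABCDEF")
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split(None, 1)
        looks_valid = (
            len(parts) == 2
            and len(parts[0]) == 32
            and set(parts[0]) <= hex_chars
        )
        if not looks_valid:
            malformed.append(line)
            continue
        name = parts[1].strip().lstrip("*")  # '*' marks binary mode in md5sum output
        if name.startswith("./"):
            name = name[2:]
        entries[name] = parts[0].lower()
    return entries, malformed
```

If `malformed` comes back non-empty, the checksum file itself is suspect; otherwise each digest in `entries` can be compared with `md5_of_file` for the matching download.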
Hi @leonmhartman, Thank you so much for it! Let me know when you get a response from the authors. I will use the old version first (v207 or something). Thanks! Jianshu
Hi @leonmhartman I mentioned the problem to the GTDB-Tk team and they solved it today. You can try again now. I am also trying to download the newest database.
Hi @leonmhartman, The gtdb problem is solved, however: --03a: downlad SILVA data--
Traceback (most recent call last): This is related to downloading the SILVA database, any idea? Thanks,
Hi @jianshu93 Thanks for contacting the GTDB-Tk team. Like you, the file update allowed me to continue the makedb process, but I have now encountered the same error that you posted :( I'll have a closer look at it tomorrow. In the meantime I have run MAGpurify on my data. It's not my preferred option, but it runs and removes some discordant contigs from my test MAGs. Cheers,
I think it is because the website https://www.arb-silva.de/fileadmin/silva_databases/current is not working; it was under maintenance. Any idea? Jianshu
Hi @jianshu93 Good pick-up! Wow, we are really having some bad luck! The SILVA archive is back online now; however, restarting the makedb process still failed for me until I deleted the VERSION.txt file (it's actually an HTML file), which contained info about the status of the SILVA website. After deleting the file, the makedb process was able to continue, and it will be interesting to see how far I get this time. Cheers,
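A quick way to spot this kind of stale file is to check whether a supposedly plain-text download starts like an HTML page. The sketch below is only a rough heuristic. The function name `looks_like_html` is hypothetical, and I am assuming the maintenance page began with a doctype or `<html>` tag, which may not hold for every server.

```python
def looks_like_html(path, probe_bytes=512):
    """Heuristic: True if the file starts like an HTML document.

    Only the first few hundred bytes are inspected, so this stays cheap
    even for large downloads.
    """
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes).lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

Running it on VERSION.txt before restarting makedb shows whether the file should be deleted and downloaded again.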
Let me know what you get, but the website link is not available in a browser, no idea why. Line 23: silva_server = "https://www.arb-silva.de/fileadmin/silva_databases/current" Thanks,
It seems that accessing SILVA with a web-browser via that link is forbidden, but other actions are ok (for example, see below). My makedb process is also still running and no new errors have been reported.
Thank you so much. Let's wait and see. I am starting from scratch, so it will take a couple more hours. Additionally, see my pull request to use unicode parsing of the version number, in case there are some non-unicode characters, e.g., version 2.1_1. Thanks,
@jianshu93 sorry to tag you, but did you manage to create the database successfully and run mdmcleaner with it?
Hi!
I'm trying to download the database using mdmcleaner makedb, and I've got an error like this:
01a: download GTDB data--
Traceback (most recent call last):
File "/mnt/storage/lab4/progs/miniconda3/envs/mdmcleaner/bin/mdmcleaner", line 10, in <module>
sys.exit(main())
^^^^^^
File "/mnt/storage/lab4/progs/miniconda3/envs/mdmcleaner/lib/python3.11/site-packages/mdmcleaner/mdmcleaner.py", line 231, in main
read_gtdb_taxonomy.main(args, configs)
File "/mnt/storage/lab4/progs/miniconda3/envs/mdmcleaner/lib/python3.11/site-packages/mdmcleaner/read_gtdb_taxonomy.py", line 1142, in main
getNprepare_dbdata_nonncbi(args.outdir, verbose=args.verbose, settings=configs.settings)
File "/mnt/storage/lab4/progs/miniconda3/envs/mdmcleaner/lib/python3.11/site-packages/mdmcleaner/read_gtdb_taxonomy.py", line 1036, in getNprepare_dbdata_nonncbi
progressdump = _download_dbdata_nonncbi(targetdir, progressdump, verbose=verbose)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/storage/lab4/progs/miniconda3/envs/mdmcleaner/lib/python3.11/site-packages/mdmcleaner/read_gtdb_taxonomy.py", line 714, in _download_dbdata_nonncbi
progressdump["gtdb_download_dict"], progressdump["gtdb_version"] = download_gtdb_stuff(gtdb_source_dict, targetdir, verbose=verbose)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/storage/lab4/progs/miniconda3/envs/mdmcleaner/lib/python3.11/site-packages/mdmcleaner/read_gtdb_taxonomy.py", line 316, in download_gtdb_stuff
download_dict = get_download_dict(sourcedict, targetfolder)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/storage/lab4/progs/miniconda3/envs/mdmcleaner/lib/python3.11/site-packages/mdmcleaner/read_gtdb_taxonomy.py", line 300, in get_download_dict
okdownloadfilelist, allisfine = check_gtdbmd5file(which_md5filename(targetfolder), targetfolder, sourcedict[x]["pattern"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/storage/lab4/progs/miniconda3/envs/mdmcleaner/lib/python3.11/site-packages/mdmcleaner/read_gtdb_taxonomy.py", line 243, in which_md5filename
return glob.glob(targetdir + "/" + MD5FILEPATTERN_GTDB)[0] # --> assumes there is only one hit, therefore takes only the first of the list returned by glob.glob(); todo: make sure md5sum file is always deleted after db-setup! otherwise there may be problems if preexisting dbs are updated
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
Can you please help me to solve it?
Regards,
Maria
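For context on the traceback above: the last frame indexes `glob.glob(...)[0]`, so the IndexError means glob found no file matching MD5FILEPATTERN_GTDB in the target folder, most likely because the checksum file was never downloaded. The sketch below shows one way such a lookup could fail with a clearer message. It is not mdmcleaner's code; `find_single_file` is a hypothetical helper and the pattern is whatever the caller passes in.

```python
import glob
import os


def find_single_file(targetdir, pattern):
    """Return the one file in targetdir matching pattern, or raise a clear error."""
    hits = sorted(glob.glob(os.path.join(targetdir, pattern)))
    if not hits:
        # This is the situation that surfaces as 'list index out of range'.
        raise FileNotFoundError(
            f"no file matching {pattern!r} in {targetdir!r}; "
            "the download may have failed or the server address may be outdated"
        )
    if len(hits) > 1:
        # Leftovers from an earlier database setup would end up here.
        raise RuntimeError(f"expected one match for {pattern!r}, found {len(hits)}: {hits}")
    return hits[0]
```

With a check like this, an empty download folder is reported as a missing file rather than as an index error several calls deep.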