EzBioCloud

Python script designed to streamline bioinformatics analysis and facilitate data extraction from EzBioCloud

EzBioCloud is a public bioscience data and analytics portal focusing on the taxonomy, ecology, genomics, metagenomics, and microbiome of Bacteria and Archaea.

Unfortunately, EzBioCloud does not provide API keys, so this script automates large-scale microbiome analysis with a different approach: a Selenium WebDriver that drives Chrome automatically.

The program downloads the key data from EzBioCloud in an orderly way and extracts specific values, e.g. total valid reads, percentage of valid reads, species, and their percentages.

1. Input

First, the script asks the user for:

the path where the experiment folder should be created and the experiment name,
the login and password for EzBioCloud,
all sample IDs.

The fastq files for all samples must already be uploaded to EzBioCloud.
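The prompts listed above could be collected along these lines. This is only a sketch: `RunConfig`, `parse_sample_ids`, and `ask_user` are hypothetical names, and the prompt wording and the comma- or space-separated ID format are assumptions rather than the script's actual behavior.

```python
from dataclasses import dataclass
from getpass import getpass
from pathlib import Path


@dataclass
class RunConfig:
    """Hypothetical container for everything the user is asked for."""
    experiment_dir: Path  # <base path>/<experiment name>
    login: str
    password: str
    sample_ids: list[str]


def parse_sample_ids(raw: str) -> list[str]:
    """Split a comma- or whitespace-separated string into unique IDs, keeping order."""
    ids = raw.replace(",", " ").split()
    return list(dict.fromkeys(ids))


def ask_user(ask=input, ask_secret=getpass) -> RunConfig:
    """Prompt for the experiment location, credentials, and sample IDs."""
    base = Path(ask("Path where the experiment folder should be created: ").strip())
    name = ask("Experiment name: ").strip()
    login = ask("EzBioCloud login: ").strip()
    # getpass keeps the password from being echoed in the terminal.
    password = ask_secret("EzBioCloud password: ")
    sample_ids = parse_sample_ids(ask("Sample IDs (comma or space separated): "))
    if not sample_ids:
        raise ValueError("At least one sample ID is required.")
    return RunConfig(base / name, login, password, sample_ids)
```

Calling `ask_user()` with no arguments prompts interactively; the two callables are parameters only so that the function can be exercised without a terminal.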

2. Downloading and file management

The WebDriver opens EzBioCloud, logs in, and searches for the first sample ID.

The .xlsx files and .png charts for genus and species are downloaded and moved into the given folder.

Because changing the download folder location in Chrome through the WebDriver is problematic, the files are first downloaded into the user's default Downloads folder and then renamed and moved into the given folder. You can change the path of the download folder here:

source_folder = r'C:\Users\Asus\Downloads'

Remember that the download folder MUST be empty before the script starts.
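The download-then-move approach can be illustrated with the standard library alone. The sketch below is not the repository's code: `ensure_empty` and `collect_downloads` are hypothetical helpers, and prefixing each file with the sample ID is an assumed naming scheme.

```python
import shutil
from pathlib import Path


def ensure_empty(folder: Path) -> None:
    """Raise if the download folder already contains anything."""
    leftovers = [p.name for p in folder.iterdir()]
    if leftovers:
        raise RuntimeError(f"{folder} is not empty: {leftovers}")


def collect_downloads(source_folder: Path, target_folder: Path, sample_id: str) -> list[Path]:
    """Move every finished download into target_folder, prefixed with the sample ID."""
    target_folder.mkdir(parents=True, exist_ok=True)
    moved = []
    for item in sorted(source_folder.iterdir()):
        # Skip folders and Chrome's temporary files for downloads still in progress.
        if not item.is_file() or item.suffix == ".crdownload":
            continue
        destination = target_folder / f"{sample_id}_{item.name}"
        shutil.move(str(item), str(destination))
        moved.append(destination)
    return moved
```

Checking that the folder is empty up front is what makes it safe to treat every file found there afterwards as belonging to the current sample.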

Sample file after this step:

3. Create INFO.txt file

The total valid reads and percentage of valid reads values are extracted and written to an INFO.txt file.
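Writing the file itself is straightforward. A minimal sketch, assuming the two values have already been read from the page; `write_info_file` is a hypothetical helper and the line layout shown is an assumption, not necessarily the format the script produces.

```python
from pathlib import Path


def write_info_file(sample_folder: Path, total_valid_reads: int, percent_valid_reads: float) -> Path:
    """Write the two read statistics to INFO.txt inside the sample folder."""
    info_path = sample_folder / "INFO.txt"
    lines = [
        f"Total valid reads: {total_valid_reads}",
        f"Percentage valid reads: {percent_valid_reads:.2f}%",
    ]
    info_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return info_path
```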

4. Create details.xlsx

The main goal is to create a single details.xlsx file for every sample, based on the downloaded files and the EzBioCloud app. The Excel sheet lists all genera found in the microbiome, sorted by percentage, and adds a separate Details column with the species detected in the sample for each genus.
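The row-building logic for such a sheet might look roughly like the sketch below, which also applies the 1% cutoff used in this step. It works on plain tuples rather than the downloaded .xlsx files; `build_details`, the input shapes, and the text format of the Details column are all assumptions made for illustration.

```python
def build_details(genera, species, threshold=1.0):
    """Return (genus, percentage, details) rows sorted by genus percentage, descending.

    genera:  list of (genus_name, percentage)
    species: list of (species_name, genus_name, percentage)
    Entries at or below the threshold are dropped.
    """
    rows = []
    for genus, pct in sorted(genera, key=lambda g: g[1], reverse=True):
        if pct <= threshold:
            continue
        # Keep only species of this genus that pass the same threshold.
        kept = [
            (name, sp_pct)
            for name, sp_genus, sp_pct in species
            if sp_genus == genus and sp_pct > threshold
        ]
        kept.sort(key=lambda s: s[1], reverse=True)
        details = "; ".join(f"{name} ({sp_pct:.2f}%)" for name, sp_pct in kept)
        rows.append((genus, pct, details))
    return rows
```

The resulting rows could then be written to details.xlsx with any spreadsheet library.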

The threshold is set to 1%: only genera and species with a percentage above 1% are processed and shown in the final Excel sheet. #BEFORE

Genus file example:

...

Species file example:

... #AFTER

Output details.xlsx file example (final Excel file):

5. Comparing contig similarity in a taxonomic group

During alignment, EzBioCloud sometimes assigns reads to a taxonomic group instead of a specific species. A taxonomic group is defined as a group of taxa (species/subspecies) that cannot be differentiated solely by 16S rRNA sequences. A typical example is the case of Escherichia coli and Shigella spp., which show almost identical 16S rRNA sequences. It is safer to identify such 16S rRNA sequences as members of a species group that contains very similar 16S rRNA sequences than to potentially misassign them to E. coli. For example:

In this situation, contig data is used to show the most likely species (a contig is a set of identical and sometimes overlapping sequences that together represent a consensus region of DNA). The WebDriver performs the following steps:

Find the taxonomic group in the EzBioCloud Taxonomic hierarchy:

Take the first contig's top hit

Compare the similarity percentages of all 5 Hit Species Name entries:

In the example above, the first four species names are taken, written out in an organized way together with the taxonomic group percentage, and added to the details.xlsx file:

Rules for extracting Hit Species Name:

Take every Hit Species Name with 100% Similarity.
If no Hit Species Name has 100% Similarity, take every Hit Species Name with Similarity above 99%.
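The two rules above translate into a short function. This is a sketch under the assumption that the hits have already been scraped into (name, similarity) pairs; `select_hit_species` is a hypothetical name, not a function from the repository.

```python
def select_hit_species(hits):
    """Pick species names from (hit_species_name, similarity_percent) pairs.

    Rule 1: take every hit with 100% similarity.
    Rule 2: if none has 100%, take every hit with similarity above 99%.
    """
    exact = [name for name, similarity in hits if similarity >= 100.0]
    if exact:
        return exact
    return [name for name, similarity in hits if similarity > 99.0]
```

If no hit passes either rule, the function returns an empty list, so the caller has to decide how to report a taxonomic group with no confident species match.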
