IG-Buddy: Doris Lab IG Pipeline Tool

IG-Buddy automates the Doris Lab's sequence data extraction and processing pipeline. It converts .bam files to .fasta, splits the .fasta files into smaller chunks, indexes them as .2bit files, and uses BLAT to extract the sequences that match strain-specific targets.

Requirements

To run this tool, the following programs must be installed and accessible in your system's PATH:

- samtools 1.9 - for converting .bam files to .fasta
- seqtk - for splitting .fasta files and retrieving specific sequences
- faToTwoBit - for converting .fasta files to .2bit
- blat and blatSrc - for aligning sequences to targets

Ensure that these tools are available on your system before starting.
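
One way to confirm the tools are reachable before starting is to look each one up on PATH. The sketch below is not part of IG-Buddy: `missing_tools` and `REQUIRED_TOOLS` are hypothetical names, and the executable names are assumed from the list above.

```python
import shutil

# Executable names are assumptions based on the requirements list;
# adjust them to match how the tools are installed on your system.
REQUIRED_TOOLS = ["samtools", "seqtk", "faToTwoBit", "blat"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

An empty result means every listed executable was found; it does not check tool versions.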

Setup Instructions

Clone this repository:

git clone https://github.com/kyraezikeuzor/ig-buddy.git
cd ig-buddy

Prepare a folder structure in the following format:
- Create a folder named bio-sample-[number].
- Place the .bam file for your sample within this folder.
Place your strain-specific target files in a designated location (or in the same directory as this tool for ease of use).
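
The folder layout described above can also be created from Python. This is a minimal sketch under the assumption that the sample number is known and the .bam file already exists; `prepare_sample_dir` is a hypothetical helper, not a function provided by IG-Buddy.

```python
from pathlib import Path
import shutil


def prepare_sample_dir(root, number, bam_file):
    """Create root/bio-sample-<number> and copy the .bam file into it."""
    sample_dir = Path(root) / f"bio-sample-{number}"
    sample_dir.mkdir(parents=True, exist_ok=True)
    destination = sample_dir / Path(bam_file).name
    shutil.copy2(bam_file, destination)  # copy rather than move, so the original stays in place
    return destination
```

For example, `prepare_sample_dir(".", 1, "sample.bam")` would create `bio-sample-1/sample.bam` under the current directory.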

Usage

Set Up Environment

Run the setup functions to initialize the environment variables:

set_target_options()  # Loads strain-specific target options from `targets.txt`.
set_fatotwobit_script("/path/to/fatotwobit")  # Set the path to the faToTwoBit script.
set_blat_script("/path/to/blat")  # Set the path to the BLAT script.
set_igblast_script("/path/to/igblast")  # Set the path to the IgBlast script.

Start the data processing pipeline with your .bam file by following these steps:

Steps

Convert .bam to .fasta:
```
convert_bam_to_fasta("sample.bam", "sample.fasta")
```
Converts .bam files to .fasta format for further processing.
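
For orientation, samtools provides a `fasta` subcommand that writes the reads of a .bam file to standard output in FASTA format. The sketch below shows one way such a conversion could be wrapped in Python. It is an illustration only, not IG-Buddy's actual `convert_bam_to_fasta`, and the helper names are hypothetical.

```python
import subprocess


def bam_to_fasta_command(bam_path):
    """Build the samtools command that prints a .bam file as FASTA on stdout."""
    return ["samtools", "fasta", bam_path]


def bam_to_fasta(bam_path, fasta_path):
    """Run samtools and redirect its FASTA output into fasta_path."""
    with open(fasta_path, "w") as fasta_out:
        # check=True raises CalledProcessError if samtools exits with an error
        subprocess.run(bam_to_fasta_command(bam_path), stdout=fasta_out, check=True)
```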

Split .fasta into smaller chunks:
```
split_fasta_file("sample.fasta", "output_prefix")
```
Splits the .fasta file into 10 smaller files for more manageable analysis.
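
To make the idea of splitting concrete, here is a pure-Python sketch that distributes FASTA records round-robin across a fixed number of chunks. The requirements list names seqtk for this step, so this is a stand-in illustration rather than how `split_fasta_file` works; both function names are hypothetical, and the default of 10 chunks is taken from the description above.

```python
def read_fasta_records(lines):
    """Yield each FASTA record as a list of lines, starting at its '>' header."""
    record = []
    for line in lines:
        if line.startswith(">") and record:
            yield record
            record = []
        if line.strip():  # ignore blank lines
            record.append(line.rstrip("\n"))
    if record:
        yield record


def split_records(records, n_chunks=10):
    """Distribute records round-robin into n_chunks lists."""
    chunks = [[] for _ in range(n_chunks)]
    for index, record in enumerate(records):
        chunks[index % n_chunks].append(record)
    return chunks
```

Round-robin assignment keeps the chunks close to equal in record count without needing to know the total number of records in advance.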

Index .fasta files using faToTwoBit:
```
index_fasta_file("chunk_0001.fasta", "chunk_0001.2bit")
```
Converts .fasta files to .2bit format for faster access by BLAT.
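
The faToTwoBit program takes an input .fasta path followed by an output .2bit path. A sketch of indexing a chunk with a derived output name might look like the following; `twobit_path_for` and `fasta_to_2bit` are hypothetical helpers, and the executable is assumed to be reachable as `faToTwoBit`.

```python
from pathlib import Path
import subprocess


def twobit_path_for(fasta_path):
    """Derive the .2bit file name from a .fasta file name."""
    return Path(fasta_path).with_suffix(".2bit")


def fasta_to_2bit(fasta_path, fatotwobit="faToTwoBit"):
    """Convert one .fasta file to .2bit next to it and return the new path."""
    output = twobit_path_for(fasta_path)
    subprocess.run([fatotwobit, str(fasta_path), str(output)], check=True)
    return output
```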

Extract sequences of interest using BLAT:
```
extract_sequences_of_interest("database.2bit", "query.txt", "output.txt")
```
Uses BLAT to extract sequences that match specific targets, outputting them in BLAST format.
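
On the command line, BLAT takes a database, a query, and an output file, and its `-out=blast` option selects BLAST-style output. The query may be a sequence file or a text file listing sequence files one per line. The sketch below shows how such a call could be assembled; it is an assumption about the general shape of the invocation, not the exact command IG-Buddy runs, and `blat_command` is a hypothetical name.

```python
import subprocess


def blat_command(database, query, output, blat="blat"):
    """Build a BLAT invocation that writes BLAST-format output."""
    return [blat, database, query, "-out=blast", output]


def run_blat(database, query, output):
    """Run BLAT and raise CalledProcessError if it fails."""
    subprocess.run(blat_command(database, query, output), check=True)
```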

Extract identifiers with the highest score:
```
identifiers = extract_identifiers("target_file.txt")
```
Retrieves identifiers of sequences with the highest score based on strain-specific targets.
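
As an illustration of picking the highest-scoring hits, the sketch below scans alignment rows and keeps the identifiers that share the top score. It rests on several assumptions that the README does not confirm: that the rows are tab-separated with 12 columns (as in BLAT's `blast8` tabular output), that the bit score is the last column, and that the identifier of interest is the subject ID in the second column. The real `extract_identifiers` may parse a different layout.

```python
def top_scoring_identifiers(lines):
    """Return the subject IDs that share the highest bit score.

    Assumes 12 tab-separated columns per row, with the subject ID in
    column 2 and the bit score in column 12 (an assumption, see above).
    """
    best_score = None
    best_ids = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 12:
            continue  # skip comment lines and malformed rows
        subject_id, score = fields[1], float(fields[11])
        if best_score is None or score > best_score:
            best_score, best_ids = score, [subject_id]
        elif score == best_score and subject_id not in best_ids:
            best_ids.append(subject_id)
    return best_ids
```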

Append identifiers to a master file:
```
append_list_of_identifiers("master_file.txt", identifiers)
```
Adds identifiers to a specified master file for further analysis.
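
Appending, as opposed to overwriting, means identifiers from earlier runs are preserved. A minimal sketch of that behavior, assuming one identifier per line (the file layout used by the real `append_list_of_identifiers` is not specified here), with `append_identifiers` as a hypothetical name:

```python
def append_identifiers(master_path, identifiers):
    """Append identifiers to master_path, one per line, creating the file if needed."""
    with open(master_path, "a", encoding="utf-8") as master:  # "a" keeps existing content
        for identifier in identifiers:
            master.write(f"{identifier}\n")
```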

Retrieve sequences that match specific identifiers:
```
match_sequences("sample.fasta", "identifiers.txt", "matching_sequences.fasta")
```
Extracts sequences from a .fasta file based on a list of identifiers.
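
Since the requirements list names seqtk for retrieving specific sequences, the underlying operation is likely similar to `seqtk subseq`, which reads a .fasta file and a list of names and prints the matching records to standard output. The wrapper below is a sketch under that assumption; `subseq_command` and `extract_matching` are hypothetical names.

```python
import subprocess


def subseq_command(fasta_path, ids_path, seqtk="seqtk"):
    """Build the seqtk command that prints records whose names are listed in ids_path."""
    return [seqtk, "subseq", fasta_path, ids_path]


def extract_matching(fasta_path, ids_path, output_path):
    """Write the matching records to output_path."""
    with open(output_path, "w") as fasta_out:
        subprocess.run(subseq_command(fasta_path, ids_path), stdout=fasta_out, check=True)
```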

Error Handling

Errors during file operations or external command executions will be caught and displayed, making it easier to troubleshoot issues such as missing files or incorrect paths.
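
The general pattern of catching and reporting such errors can be sketched as follows. This is an illustration of the approach, not the tool's own error-handling code; `run_tool` is a hypothetical helper that returns a status and a message instead of raising.

```python
import subprocess


def run_tool(command):
    """Run an external command and return (ok, message) instead of raising."""
    try:
        completed = subprocess.run(command, check=True, capture_output=True, text=True)
    except FileNotFoundError:
        # The executable is missing or the path to it is wrong
        return False, f"Command not found: {command[0]}"
    except subprocess.CalledProcessError as error:
        # The tool started but exited with a non-zero status
        return False, f"{command[0]} exited with code {error.returncode}: {error.stderr.strip()}"
    return True, completed.stdout
```

Separating the "not found" case from the "ran but failed" case makes it easier to tell a PATH problem apart from a bad input file.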

Contributing

If you'd like to contribute to this project, please fork the repository and use a feature branch. Pull requests are welcome.

License

This project is licensed under the MIT License.
