GID

GisAids Influenza Downloader

Introduction

This project downloads data, such as metadata and genetic sequences, from EpiFlu, the influenza virus database in GISAID.

search_page.py downloads all entries from EpiFlu; it cannot filter for specific datasets.
download_item.py selects up to 8,000 entries at a time. You must then click the 'Download' button in the browser, select the required file type, and download the file manually.
Each time download_item.py initiates a download, the program opens a new browser page for login.
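
Because of the 8,000-entry limit, a larger set of entries has to be handled in several rounds. The following is a minimal sketch of that batching, assuming the entry identifiers are held in a list. The names `iter_batches` and `MAX_BATCH_SIZE` and the demo identifiers are hypothetical and are not taken from download_item.py.

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SIZE = 8000  # selection limit per download, as stated above


def iter_batches(
    entries: Sequence[str], batch_size: int = MAX_BATCH_SIZE
) -> Iterator[List[str]]:
    """Yield consecutive slices of `entries` with at most `batch_size` items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(entries), batch_size):
        yield list(entries[start:start + batch_size])


if __name__ == "__main__":
    # Hypothetical entry identifiers, used only for illustration.
    demo_ids = [f"entry_{i}" for i in range(20000)]
    for number, batch in enumerate(iter_batches(demo_ids), start=1):
        print(f"batch {number}: {len(batch)} entries")
```

With 20,000 identifiers this produces three batches, the last one smaller than the limit.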

Before you begin using this script, ensure that your system meets the following requirements:

GISAID Account: A GISAID username and password are needed to log in.
Operating System: A Linux desktop environment is required to run the scripts.
Firefox Browser: The scripts use Selenium, which requires the Firefox browser. Install Firefox if it is not already installed.
Firefox WebDriver: You will also need to download the Firefox WebDriver, which allows Selenium to interact with Firefox. The WebDriver can be downloaded from the following URL:
- GeckoDriver
After downloading, extract the WebDriver and place it in the root directory of the project.
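
As an illustration of why the driver goes in the project root, here is a minimal sketch of how a script could locate it and pass it to Selenium. It assumes the Selenium 4 API (`Service` and `webdriver.Firefox(service=...)`) and an executable named `geckodriver`; the helpers `find_geckodriver` and `start_firefox` are hypothetical, and the project's own scripts may start the browser differently.

```python
from pathlib import Path
from typing import Optional


def find_geckodriver(project_root: Path) -> Optional[Path]:
    """Return the path to geckodriver in `project_root`, or None if it is missing."""
    candidate = project_root / "geckodriver"  # assumed executable name
    return candidate if candidate.is_file() else None


def start_firefox(project_root: Path):
    """Start Firefox through Selenium with the driver found in `project_root`."""
    # Imported here so the path check above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service

    driver_path = find_geckodriver(project_root)
    if driver_path is None:
        raise FileNotFoundError(f"geckodriver not found in {project_root}")
    return webdriver.Firefox(service=Service(executable_path=str(driver_path)))


if __name__ == "__main__":
    # Only report whether the driver is present; this does not launch a browser.
    print(find_geckodriver(Path.cwd()))
```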

Clone the Repository: First, clone this repository to your local machine using Git.
Set up Conda Environment: Use the environment.yml file to create a Conda environment with all the necessary dependencies:
```
conda env create -f environment.yml
```

Follow these steps to use this project:

In download_item.py and search_page.py, set the username and password variables (username = '' and password = '') to your GISAID credentials, and set the page_number variable to the maximum number of pages.
Execute the search_page.py program to download all search page entry information:
```
python search_page.py
```
After the download is complete, execute combine_json.py to merge all the page information:
```
python combine_json.py
```
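
The merge step can be pictured with the following minimal sketch. It assumes each search page was saved as a separate JSON file containing a list of entries; the file pattern `page_*.json` and the function `combine_pages` are hypothetical and may not match what combine_json.py actually does.

```python
import json
from pathlib import Path
from typing import Any, List


def combine_pages(page_dir: Path, pattern: str = "page_*.json") -> List[Any]:
    """Concatenate the JSON lists stored in every file matching `pattern`."""
    combined: List[Any] = []
    # Files are read in lexicographic name order, so page_10 comes before page_2.
    for page_file in sorted(page_dir.glob(pattern)):
        with page_file.open(encoding="utf-8") as handle:
            records = json.load(handle)
        if not isinstance(records, list):
            raise ValueError(f"{page_file} does not contain a JSON list")
        combined.extend(records)
    return combined


if __name__ == "__main__":
    entries = combine_pages(Path.cwd())
    print(f"merged {len(entries)} entries")
```

Raising an error on a non-list file makes a partially written or malformed page visible instead of silently merging it.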
Execute download_item.py to download the required entry data:
```
python download_item.py
```