This repo contains a readme.md file explaining how to install Airflow locally using Docker, which is the fastest way to get Airflow running on your computer. This guide assumes you already have Python installed; if you don't, install it from python.org.
- Create a virtual environment → `python -m venv venv`
- Make sure the Python being used is the one in the virtual environment by running `which python`
- Activate the virtual environment → `source venv/Scripts/activate` (on Windows) or `source venv/bin/activate` (on Linux/Mac)
- Install the Airflow package → `pip3 install apache-airflow==2.3.4`
- Or use the following command, which installs the Celery extra and pins dependencies to the official constraints file:
  ```bash
  pip3 install "apache-airflow[celery]==2.3.4" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.3.4/constraints-3.7.txt"
  ```
- Download Docker from this link
- Follow the steps in the official Airflow documentation to install Airflow via Docker
- Fetch docker-compose.yaml →
  ```bash
  curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.4.0/docker-compose.yaml'
  ```
- Create a .env file in the same directory as docker-compose.yaml and add the following lines. You can also add any other environment variables that you need to use throughout your code (e.g., GOOGLE_APPLICATION_CREDENTIALS)
  ```
  AIRFLOW_IMAGE_NAME=apache/airflow:2.3.4
  AIRFLOW_UID=50000
  ```
- Add a Dockerfile to your directory with the following lines. Name the file exactly `Dockerfile` (no extension), and VSCode will automatically detect that it is a Dockerfile
  ```dockerfile
  FROM apache/airflow:2.3.4

  # Install any dependencies that did not come with the official Airflow image
  # Make sure to run `pip freeze > requirements.txt` first
  COPY requirements.txt .
  RUN pip3 install -r requirements.txt

  # You can create a folder in your local directory called py_scripts and put there
  # any Python modules that you want to import in your DAGs.
  # Include this line to add that folder to the PYTHONPATH inside the Airflow
  # environment so that you can import your modules right away
  ENV PYTHONPATH="$PYTHONPATH:/opt/airflow/py_scripts"
  ```
- If you want to send emails from your DAGs, add the following parameters under `_PIP_ADDITIONAL_REQUIREMENTS` (i.e., in the `environment` section) of the docker-compose.yaml file
- To generate the password, follow the steps explained in this blog post. Create an environment variable in the .env file called GOOGLE_PASSWORD by simply typing `GOOGLE_PASSWORD=fjkvbsjfls`, and pass the variable to the docker-compose.yaml file as shown below
  ```yaml
  # EMAIL CONFIG
  AIRFLOW__SMTP__SMTP_HOST: smtp.gmail.com
  AIRFLOW__SMTP__SMTP_PORT: 587
  AIRFLOW__SMTP__SMTP_USER: john.doe@gmail.com
  AIRFLOW__SMTP__SMTP_PASSWORD: ${GOOGLE_PASSWORD}
  AIRFLOW__SMTP__SMTP_MAIL_FROM: john.doe@gmail.com
  ```
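For reference, here is a minimal sketch of a DAG that sends an email through the SMTP settings above; the DAG id, schedule, and recipient address are placeholders, not part of this guide.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.email import EmailOperator

with DAG(
    dag_id="email_smtp_test",          # placeholder DAG id
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,            # trigger manually from the UI
    catchup=False,
) as dag:
    send_test_email = EmailOperator(
        task_id="send_test_email",
        to="john.doe@gmail.com",       # placeholder recipient
        subject="Airflow SMTP test",
        html_content="If you received this, the SMTP settings work.",
    )
```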
- Change `AIRFLOW__CORE__LOAD_EXAMPLES` to `false` so that you don't load the default Airflow DAG examples and clutter your UI
- Comment out this line in the docker-compose file → `image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.3.4}`
- ... and uncomment this line → `build: .`
- Add more volumes (mappings of local folders into the containers) if you need to
  ```yaml
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./py_scripts:/opt/airflow/py_scripts
  ```
  What you are doing here is mapping your local directories on the left-hand side to the Airflow directories on the right-hand side. `./dags`, `./logs`, and `./plugins` are the default volumes that get created when you run the docker-compose.yaml file
- You can create a folder in your local directory called py_scripts and put there any Python modules that you want to import in your DAGs. If you do this, you need to add another volume to the docker-compose.yaml file (see the import sketch below for how such a module is then used in a DAG)
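To tie together the Dockerfile's `ENV PYTHONPATH` line and the `py_scripts` volume, here is a minimal sketch of how a module in that folder could be imported from a DAG; `helpers.py` and `transform_data` are hypothetical names used only for illustration.

```python
# dags/py_scripts_demo.py (hypothetical file)
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# helpers.py lives in ./py_scripts, which is mounted at /opt/airflow/py_scripts
# and added to PYTHONPATH in the Dockerfile, so it can be imported directly.
from helpers import transform_data  # hypothetical module and function

with DAG(
    dag_id="py_scripts_demo",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,  # trigger manually from the UI
    catchup=False,
) as dag:
    PythonOperator(task_id="transform", python_callable=transform_data)
```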
- Run `docker-compose up -d`
- Make sure the Docker application is open and running in the background. You can always set it to open automatically as soon as you log on to your computer so that you don't have to remember to open it each time you run docker-compose
- Run `docker ps` to check on the status of Airflow's services. Every service should be healthy as shown below
- Now, go to your browser and type in localhost:8080. The username and password are both `airflow`. Congratulations, you are now running Airflow locally on your computer. It should look something like this
To add a path to the PYTHONPATH on Windows, use the environment variables window from the Start menu. The new value to add under the `Path` variable is `%PYTHONPATH%`.

N.B. You will need to restart your computer for this change to take effect.
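If you want to double-check that the change was picked up, a small script like the one below (run from a new terminal after the restart) prints what Python actually sees; nothing in it is specific to Airflow.

```python
# Optional check that the PYTHONPATH change is visible to Python after restarting.
import os
import sys

print(os.environ.get("PYTHONPATH"))  # should contain the folder you added
print(sys.path)                      # that folder should also appear here
```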
By running this command, you will install the necessary libraries to query data from BigQuery tables. However, you will still need to configure the credentials so that your requests don't get blocked:

```bash
pip3 install "apache-airflow[celery]==2.3.4" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.3.4/constraints-3.7.txt"
```
- Download the Google Cloud SDK
- Open the Google Cloud SDK terminal after the installation finishes and type in the following commands one by one
  ```bash
  gcloud auth application-default login --disable-quota-project
  gcloud config set project {insert_project_name_without_curly_braces}
  gcloud auth login --enable-gdrive-access --no-add-quota-project-to-adc
  ```
- Run `docker-compose up` to start the Airflow environment, go to localhost:8080 after all services are spun up, click on "Admin --> Connections" to create a connection, and fill in the fields as shown in the screenshot below
- The scopes that you need to enter in the Scopes (comma separated) field are shown below. Make sure to separate them by commas
- Go to the folder where your application_default_credentials.json file is stored
  - On Windows, it's usually stored in this directory: `C:\Users\%USERNAME%\AppData\Roaming\gcloud\application_default_credentials.json`
  - On Mac/Linux, it's usually stored in this directory: `$HOME/.config/gcloud/application_default_credentials.json`
- Copy the JSON file to one of the folders that get mounted into the Airflow environment. In this example, I use the folder py_scripts, which I created a volume for in step #11 under the "Steps to install Airflow locally" section. You can also put it in any of the other three volumes that Airflow creates by default. In the end, it should look something like this
- Important: Remember to create a volume in the docker-compose.yaml that maps the folder where the JSON file is stored into the Airflow environment
- At the beginning of the DAG script that is stored in the dags folder, add the following lines
  ```python
  import os

  os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/opt/airflow/{volume_where_the_JSON_file_is_stored}/application_default_credentials.json"
  ```
- Now, you should be able to query data in BigQuery through Airflow
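As an end-to-end smoke test, here is a minimal sketch of a DAG task that runs a trivial BigQuery query. It assumes the google-cloud-bigquery client library is listed in requirements.txt and that the credentials file is mounted under py_scripts as described above; the DAG id and project id are placeholders.

```python
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Point the Google client libraries at the mounted credentials file (see above).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = (
    "/opt/airflow/py_scripts/application_default_credentials.json"
)


def query_bigquery():
    # Requires google-cloud-bigquery in requirements.txt (assumption).
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # placeholder project id
    for row in client.query("SELECT 1 AS ok").result():
        print(row.ok)


with DAG(
    dag_id="bigquery_smoke_test",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,  # trigger manually from the UI
    catchup=False,
) as dag:
    PythonOperator(task_id="query_bq", python_callable=query_bigquery)
```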