
Explore Airflow KubernetesExecutor on AWS and kops

Chengzhi Zhao
Data Engineering Space
3 min read · Oct 9, 2018


With the recent launch of Apache Airflow 1.10 came some exciting changes. One impressive feature is the KubernetesExecutor, which lets users execute each task inside a Kubernetes cluster; this way, you get self-healing pods and can quickly scale your DAGs using Kubernetes.

The Bloomberg team originally started the KubernetesExecutor and contributed it back to the Airflow community.

From Airflow official docs:

The kubernetes executor is introduced in Apache Airflow 1.10.0. The Kubernetes executor will create a new pod for every task instance.

The Airflow team also has an excellent tutorial on how to use minikube to play with the KubernetesExecutor in your local environment. But you may want a different setup than minikube for production, and I hope this blog gives you some ideas for running the KubernetesExecutor in production.

After exploring the new KubernetesExecutor, there are two things to notice here:

  • A new pod for every task instance: even if your task is as simple as a print statement, it still follows the full lifecycle of a pod: create the pod -> execute the code -> destroy the pod (see the sketch after this list).
  • Tasks take time to start executing: creating the container takes time, and on minikube I usually saw a 30-second to 1-minute wait before a task finished.
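
To make the first point concrete, here is a minimal sketch of a trivial DAG; the DAG id, schedule, and callable below are placeholders of mine, not from the original post. Even a task that only prints a line gets its own pod that is created, runs the code, and is destroyed.

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def print_hello():
    # Even a task this trivial follows the full pod lifecycle:
    # create a pod -> execute the code -> destroy the pod.
    print("hello from a brand new pod")


dag = DAG(
    dag_id="kubernetes_executor_demo",  # hypothetical DAG id
    start_date=datetime(2018, 10, 1),
    schedule_interval=None,
    catchup=False,
)

hello = PythonOperator(
    task_id="print_hello",
    python_callable=print_hello,
    dag=dag,
)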

Step 0: set up kops on AWS and a Dockerfile for Airflow

Setting up kops on AWS and a Dockerfile for Airflow is out of scope for this post. You can refer to more information here.

Step 1: change the executor in airflow.cfg

Find the following line in your airflow.cfg (it lives under the [core] section) and set it to:

executor = KubernetesExecutor
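
If you want to confirm which executor Airflow picked up after editing the file, a quick check from a Python shell might look like the sketch below; this verification step is my addition rather than part of the original walkthrough.

from airflow.configuration import conf

# Should print "KubernetesExecutor" once airflow.cfg has been updated.
print(conf.get("core", "executor"))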

Step 2: update the [kubernetes] section in airflow.cfg

worker_container_repository = $IMAGE
worker_container_tag = $VERSION
dags_volume_claim = airflow-dags
logs_volume_claim = airflow-logs

worker_container_repository and worker_container_tag are the default image and tag used when you don’t specify them in your Airflow operator. Since we are…
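
As a rough illustration of that per-operator override (the original text is truncated here), the KubernetesExecutor in Airflow 1.10 reads an executor_config from the task; the image name below is a placeholder, and the task assumes the dag object from the earlier sketch.

from airflow.operators.python_operator import PythonOperator


def heavy_job():
    print("running with a custom worker image")


# executor_config lets a single task override the default image set by
# worker_container_repository / worker_container_tag.
heavy = PythonOperator(
    task_id="heavy_job",
    python_callable=heavy_job,
    executor_config={
        "KubernetesExecutor": {
            "image": "my-registry/airflow-worker:custom",  # placeholder image
        }
    },
    dag=dag,  # assumes the DAG defined in the earlier sketch
)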
