Learn how to install Apache Airflow locally on Mac, Windows, and Linux with this comprehensive guide. Get hands-on experience with this popular workflow management tool.

Apache Airflow is an open-source platform for managing and scheduling data workflows. It allows you to define, organize, and manage tasks and dependencies that make up a workflow. Airflow provides a user-friendly interface for monitoring and managing the execution of your workflows, making it a popular choice for data engineers, data scientists, and developers.
Airflow can be used in a variety of scenarios, including:
ETL (Extract, Transform, Load) processes: Airflow can be used to automate the process of extracting data from multiple sources, transforming the data into a desired format, and loading the data into a target database.
Data processing pipelines: Airflow can be used to automate complex data processing pipelines that involve multiple stages and tasks.
Machine learning workflows: Airflow can be used to automate the deployment and management of machine learning models, including training, evaluation, and deployment.
Monitoring and alerting: Airflow can be used to automate the process of monitoring data and triggering alerts when certain conditions are met.
To use Apache Airflow, you need to have a basic understanding of Python and know how to create virtual environments. Once you've installed Airflow and set up the environment, you can start defining and executing workflows using the Airflow UI and Python code.
Before diving into Apache Airflow, there are a few prerequisites to keep in mind. To ensure a smooth and successful Airflow experience, it is recommended that your environment meets the following requirements:
Python version: Airflow is compatible with Python 3.7, 3.8, 3.9, and 3.10.
Supported databases: Airflow has been tested with PostgreSQL versions 11, 12, 13, 14, and 15, MySQL versions 5.7 and 8, SQLite version 3.15.0 or later, and experimental support for MSSQL versions 2017 and 2019.
Kubernetes: Airflow has been tested with Kubernetes versions 1.20.2, 1.21.1, 1.22.0, 1.23.0, and 1.24.0.
It's important to make sure your environment meets these requirements to ensure the best possible experience with Apache Airflow.
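If you want to confirm that your local interpreter falls within the supported range before installing anything, a quick check like the following works (a minimal sketch; the supported set below simply mirrors the Python versions listed above):
import sys

# Python minor versions tested with Airflow 2.x at the time of writing
SUPPORTED = {(3, 7), (3, 8), (3, 9), (3, 10)}

version = sys.version_info[:2]
if version in SUPPORTED:
    print(f"Python {version[0]}.{version[1]} is in Airflow's tested range")
else:
    print(f"Python {version[0]}.{version[1]} is outside the tested range; consider switching interpreters")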
Here are the official resources for downloading Apache Airflow and related documentation:
Download Apache Airflow: The latest version of Apache Airflow can be downloaded from the official Apache Airflow website at https://airflow.apache.org/.
Installation Guide: Detailed installation instructions can be found in the official Apache Airflow documentation at https://airflow.apache.org/docs/stable/installation.html.
User Guide: A comprehensive guide to using Apache Airflow can be found in the official documentation at https://airflow.apache.org/docs/stable/userguide/index.html.
Tutorials: There are several tutorials available that cover different aspects of Apache Airflow, including creating and managing data pipelines. These tutorials can be found in the official documentation at https://airflow.apache.org/docs/stable/tutorial.html.
API Reference: The API reference for Apache Airflow can be found in the official documentation at https://airflow.apache.org/docs/stable/api.html.
Release Notes: The release notes for each version of Apache Airflow can be found in the official documentation at https://airflow.apache.org/docs/stable/releases.html.
These resources should provide you with everything you need to get started with Apache Airflow and understand how to use it effectively.
Here is a step-by-step tutorial on how to create and run a workflow using Apache Airflow:
Install Apache Airflow: The first step is to install Apache Airflow. You can install it using pip, by running the following command:
pip install "apache-airflow==2.2.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.3/constraints-no-providers-3.8.txt"The given command is a pip command used to install a specific version of Apache Airflow. The command is used to install Apache Airflow version 2.2.3.
The "pip install" part of the command tells pip to install a package, and the package specified is "apache-airflow==2.2.3", which means that we want to install version 2.2.3 of Apache Airflow.
The "--constraint" option is used to specify a constraint file for the installation. The constraint file contains a list of package versions that are compatible with the installed package. In this case, the constraint file is
specified as "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.3/constraints-no-providers-3.9.txt", which means that the package versions listed in the constraint file should not contain any versions for Python 3.9.
So, in summary, this command is used to install Apache Airflow version 2.2.3 while ensuring that the installed packages do not contain any versions for Python 3.9.
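Because the constraint file has to match both the Airflow version and your Python version, a small helper like the one below can build the right pip command for whatever interpreter you happen to be running. This is an illustrative sketch, not an official tool; it just reproduces the URL pattern shown above:
import sys

AIRFLOW_VERSION = "2.2.3"

# Build the constraints URL that matches the running interpreter,
# following the same pattern as the command above.
python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
constraints_url = (
    "https://raw.githubusercontent.com/apache/airflow/"
    f"constraints-{AIRFLOW_VERSION}/constraints-no-providers-{python_version}.txt"
)
print(f'pip install "apache-airflow=={AIRFLOW_VERSION}" --constraint "{constraints_url}"')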
Initialize the Airflow database: Before using Apache Airflow, you need to initialize its database by running the following command:
jai@MacBook-Air airflow % airflow db init
Once the database is initialized, Apache Airflow is installed and ready to use on your computer.
Running Airflow for the first time creates the airflow home folder in your home directory (~/airflow by default), so navigate to it:
jai@MacBook-Air ~ % cd airflow
jai@MacBook-Air airflow % ls
airflow-scheduler.err airflow-scheduler.pid airflow-webserver.out airflow.db
airflow-scheduler.log airflow-webserver.err airflow-webserver.pid logs
airflow-scheduler.out airflow-webserver.log airflow.cfg webserver_config.py
Before diving into the intricacies of the Airflow metastore database (airflow.db), we'll create an Airflow user and set up what's needed to access the database. We'll also touch on the airflow.cfg file and why you might edit it (see the sketch after the user-creation command below).
jai@MacBook-Air airflow % airflow users create \
--username jai \
--firstname jai \
--lastname giri \
--role Admin \
--email admin@example.org
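The command prompts you to set a password for the new user. As for airflow.cfg, you rarely need to touch it for a local install, but a few settings are worth knowing about: the DAGs folder, the executor, and the metadata database connection. The sketch below reads them through Airflow's own configuration module; it assumes Airflow 2.2.x is installed and AIRFLOW_HOME is the default ~/airflow:
# Illustrative only: inspect a few commonly referenced airflow.cfg settings.
from airflow.configuration import conf

print("DAGs folder:", conf.get("core", "dags_folder"))
print("Executor:", conf.get("core", "executor"))
print("Metadata DB:", conf.get("core", "sql_alchemy_conn"))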
The Apache Airflow system operates through two essential components - the Webserver and the Scheduler. To properly parse and execute your DAGs, you need to run both. Let's begin by starting the Webserver in the background (daemon mode) with the following command:
jai@MacBook-Air airflow % airflow webserver --daemon
or
jai@MacBook-Air airflow % airflow webserver -D
With the Webserver now running in the background, we can similarly launch the Scheduler using the following command:
jai@MacBook-Air airflow % airflow scheduler -D
or
jai@MacBook-Air airflow % airflow scheduler --daemon
Access the Airflow UI: To access the Airflow UI, open your web browser and navigate to http://localhost:8080. You should see the Airflow dashboard, where you can manage and monitor your workflows.
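If the page doesn't load, you can check whether the webserver and scheduler report as healthy via the /health endpoint. The snippet below is a quick sketch that assumes the default port 8080 and that the requests library is available (it ships as an Airflow dependency):
import requests

# Query the webserver's health endpoint; it reports on the metadatabase
# and the scheduler's latest heartbeat.
resp = requests.get("http://localhost:8080/health")
print(resp.status_code)
print(resp.json())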

Example 1:
Create your first workflow: To create your first workflow, you'll need to write some Python code. A basic workflow in Airflow consists of one or more tasks, which are defined using Python functions. For example, here's a simple workflow that prints "Hello, World!" to the console:
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

# Default arguments applied to every task in the DAG
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2021, 1, 1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    'hello_world',
    default_args=default_args,
    description='A simple example of a DAG',
    schedule_interval=timedelta(hours=1),
)

def hello_world():
    print('Hello, World!')

# Wrap the Python function in a task that belongs to the DAG
hello_world_task = PythonOperator(
    task_id='hello_world_task',
    python_callable=hello_world,
    dag=dag,
)
Save this code as a Python file in your Airflow DAGs folder (by default ~/airflow/dags; create the folder if it doesn't exist), and Airflow will pick it up as your first workflow.
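Before waiting on the scheduler, you can also confirm that Airflow parses the file and registers the DAG. The sketch below uses DagBag, which loads every file in the configured DAGs folder; it assumes the file above was saved there (for example as ~/airflow/dags/hello_world.py):
from airflow.models import DagBag

# Parse the configured DAGs folder and check that our DAG loaded cleanly.
dag_bag = DagBag()
print(dag_bag.import_errors)  # should be an empty dict
if "hello_world" in dag_bag.dags:
    print(dag_bag.dags["hello_world"].task_ids)  # ['hello_world_task']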
These are the basic steps to download, install, and use Apache Airflow. Once you have Airflow set up, you can start exploring its many features and using it to automate and manage your data pipelines.
Example 2:
This example shows how to use Apache Airflow to automate a simple data pipeline: we'll extract data from a CSV file, transform it, and load it into a database. First, set up an environment and initialize Airflow (you can skip this if you already did it above):
python3 -m venv airflow-env
source airflow-env/bin/activate
pip install apache-airflow
jai@MacBook-Air airflow % airflow db init
Create a sample CSV file (for example, sample.csv) with the following contents:
id,name,age
1,John Doe,35
2,Jane Doe,30
3,Bob Smith,40
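If you'd rather not create the file by hand, the following sketch writes the same sample data with Python's csv module; the path is a placeholder, so point it wherever the extract step below will look for the file:
import csv

rows = [
    {"id": 1, "name": "John Doe", "age": 35},
    {"id": 2, "name": "Jane Doe", "age": 30},
    {"id": 3, "name": "Bob Smith", "age": 40},
]

# Write the sample rows to a CSV file with an id,name,age header.
with open("/path/to/sample.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name", "age"])
    writer.writeheader()
    writer.writerows(rows)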
Extract the data: Define a function that reads the CSV file using Python's built-in csv library. Here's an example:
import csv

def extract_data_from_csv(**kwargs):
    # Read every row of the CSV into a list of dicts and return it,
    # so Airflow can pass it to the next task via XCom.
    file_path = '/path/to/sample.csv'
    data = []
    with open(file_path, 'r') as file:
        reader = csv.DictReader(file)
        for row in reader:
            data.append(row)
    return data
Transform the data: Next, define a function that adds a new field, full_name, which is the concatenation of name and age. Here's an example:
def transform_data(**kwargs):
    # Pull the extracted rows from XCom and add a derived full_name field.
    ti = kwargs['ti']
    data = ti.xcom_pull(task_ids='extract_data_from_csv')
    for row in data:
        row['full_name'] = row['name'] + '-' + str(row['age'])
    return data
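If you want to sanity-check the transformation logic without running Airflow, you can call the function directly with a tiny stand-in for the task instance. This is purely an illustrative stub; FakeTI is not part of Airflow:
class FakeTI:
    """Minimal stand-in for Airflow's task instance, just for local testing."""
    def xcom_pull(self, task_ids):
        return [{"id": "1", "name": "John Doe", "age": "35"}]

print(transform_data(ti=FakeTI()))
# -> [{'id': '1', 'name': 'John Doe', 'age': '35', 'full_name': 'John Doe-35'}]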
Load the data: Finally, define a function that loads the transformed rows into a SQLite database. Here's an example:
import sqlite3

def load_data_into_database(**kwargs):
    # Pull the transformed rows from XCom and write them into SQLite.
    ti = kwargs['ti']
    data = ti.xcom_pull(task_ids='transform_data')
    conn = sqlite3.connect('/path/to/db.sqlite')
    cursor = conn.cursor()
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS data (
            id INTEGER PRIMARY KEY,
            name TEXT,
            age INTEGER,
            full_name TEXT
        )
    ''')
    for row in data:
        cursor.execute(
            'INSERT OR REPLACE INTO data (id, name, age, full_name) VALUES (?, ?, ?, ?)',
            (row['id'], row['name'], row['age'], row['full_name']),
        )
    conn.commit()
    conn.close()
Define the DAG: Now wire the three tasks together into a DAG and set the task dependencies:
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'you',
    'depends_on_past': False,
    'start_date': datetime(2022, 1, 1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'data_pipeline',
    default_args=default_args,
    description='A simple data pipeline example',
    schedule_interval=timedelta(hours=1),
)

extract_data_task = PythonOperator(
    task_id='extract_data_from_csv',
    python_callable=extract_data_from_csv,
    dag=dag,
)

transform_data_task = PythonOperator(
    task_id='transform_data',
    python_callable=transform_data,
    dag=dag,
)

load_data_task = PythonOperator(
    task_id='load_data_into_database',
    python_callable=load_data_into_database,
    dag=dag,
)

# Run extract, then transform, then load
extract_data_task >> transform_data_task >> load_data_task
Save the DAG file in your DAGs folder, then start the webserver (and make sure the scheduler is running) so the pipeline can be triggered and monitored from the UI:
airflow webserver -p 8080
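Once the pipeline has run at least once, you can verify the load step outside Airflow by reading a few rows back from SQLite. A quick sketch, using the same placeholder database path as load_data_into_database:
import sqlite3

# Read back a handful of rows to confirm the table was created and populated.
conn = sqlite3.connect('/path/to/db.sqlite')
for row in conn.execute('SELECT id, name, age, full_name FROM data LIMIT 5'):
    print(row)
conn.close()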
That's it! You now know how to use Apache Airflow to automate a simple data pipeline. With this knowledge, you can start building more complex pipelines to handle larger datasets and more sophisticated transformations.
Extra Stuff:
Here are the steps to download and install Apache Airflow on Mac/Linux and Windows:
On Mac/Linux:
Create a virtual environment: It's a good practice to create a virtual environment for your Airflow installation. To create a virtual environment, run the following command:
python3 -m venv airflow-env
Activate the virtual environment: Once you've created the virtual environment, activate it by running the following command:
source airflow-env/bin/activate
Install Apache Airflow: To install Apache Airflow, run the following command:
pip install apache-airflow
Initialize the Airflow database: Before using Apache Airflow, you need to initialize its database by running the following command:
airflow db init
On Windows:
Create a virtual environment: It's a good practice to create a virtual environment for your Airflow installation. To create a virtual environment, run the following command:
python -m venv airflow-env
Activate the virtual environment: Once you've created the virtual environment, activate it by running the following command:
airflow-env\Scripts\activate
Install Apache Airflow: To install Apache Airflow, run the following command:
pip install apache-airflow
Initialize the Airflow database: Before using Apache Airflow, you need to initialize its database by running the following command:
airflow db init
Once you've followed these steps, you should have Apache Airflow installed and ready to use on your computer.
Start the Airflow web server: To start the Airflow web server, run the following command:
airflow webserver --daemon
or
airflow webserver -D
Start the Airflow scheduler: To start the Airflow scheduler, run the following command:
airflow scheduler -D
or
airflow scheduler --daemon
Access the Airflow UI: To access the Airflow UI, open your web browser and navigate to http://localhost:8080. You should see the Airflow dashboard, where you can manage and monitor your workflows.
To uninstall Apache Airflow, first deactivate the virtual environment:
deactivate
Then uninstall the package. For Windows:
pip uninstall apache-airflow
For macOS/Linux:
pip3 uninstall apache-airflow
Delete the airflow home folder: Apache Airflow creates a home folder during installation, which stores configuration and log files. You can delete the home folder after uninstalling Apache Airflow. The default location for the home folder is ~/airflow, but you can check the location of your home folder in the airflow.cfg file.
Remove the virtual environment: If you have used a virtual environment for installing Apache Airflow, you can remove the environment after uninstalling Apache Airflow. You can do this by deleting the folder that contains the environment.
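If you prefer to do this cleanup from Python rather than the file manager, a sketch like the following removes both folders; the paths assume the defaults used throughout this guide (~/airflow and a local airflow-env directory), so adjust them if yours differ:
# Careful: this permanently deletes the Airflow home folder and the virtual environment.
import shutil
from pathlib import Path

shutil.rmtree(Path.home() / "airflow", ignore_errors=True)
shutil.rmtree(Path("airflow-env"), ignore_errors=True)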
That's it! You have successfully uninstalled Apache Airflow from your system.
Conclusion
In conclusion, Apache Airflow is a powerful open-source platform for managing and scheduling data pipelines. With its ability to run on different operating systems, support for multiple databases, and comprehensive documentation, Apache Airflow is a great choice for anyone looking to manage their data pipelines efficiently. Whether you're a data engineer, data scientist, or data analyst, Apache Airflow can help you automate and streamline your data workflows. With the resources provided above, you should be able to get started with Apache Airflow easily and quickly.