Apache has released a new major version of Airflow. Everyone who uses this tool knows that even minor changes can transform how DAGs work or block them entirely. Apache Airflow 2.0 is not yet available on the managed cloud platforms, but data pipelines are our domain, so we took a close look at what's new. Here we describe our first impressions and the changes you should be prepared for in the new major version.
Airflow 2.0 has arrived — the biggest differences between Airflow 1.10.x and 2.0

New User Interface
Airflow 2.0 got a totally new look based on the Flask App Builder module. With the new dashboard, it is now easier to find the information you need and navigate your DAGs. This version adds extra filters that make it easier to search for specific DAGs and to filter them by their displayed tags.


The DAG Run screen also has a new layout with additional information such as "Run Type," "External Trigger," and the configuration applied to the run.
On the task screen, you'll find a documentation section, which can be very helpful in transferring knowledge from the development phase to the support phase. A very useful "Auto-Refresh" switch has also appeared on the DAG screen: if you're monitoring the execution of a DAG, you can enable it and focus on other activities while the view updates itself.
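As an illustration, here is a minimal sketch of how a task can carry its own documentation so that it shows up in that section; the DAG name, task id, and Markdown content are invented for the example.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("docs_example", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    extract = BashOperator(task_id="extract_orders", bash_command="echo 'extracting orders'")
    # Markdown assigned to doc_md is rendered in the task's documentation
    # section in the UI.
    extract.doc_md = """
#### Extract orders
Pulls yesterday's orders from the source system; contact the data team if it fails.
"""
```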
If you create a very complex DAG with many tasks, you can now aggregate tasks into logical groups. This helps you identify the stage at which your process is stuck. Imagine how difficult it would be to determine which step of an ETL process failed if your DAG had hundreds of tasks. Now you can group them into sections, and even nest sections within sections to narrow down problematic logic more easily.
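Below is a minimal sketch of how such sections can be declared with the TaskGroup context manager introduced in Airflow 2.0; the DAG name, group ids, and task ids are only illustrative.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.utils.task_group import TaskGroup

with DAG("grouped_etl", start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    start = DummyOperator(task_id="start")

    # Tasks declared inside a TaskGroup are shown as one collapsible box
    # in the Graph View; groups can also be nested inside other groups.
    with TaskGroup(group_id="extract") as extract:
        pull_orders = DummyOperator(task_id="pull_orders")
        pull_customers = DummyOperator(task_id="pull_customers")

    with TaskGroup(group_id="transform") as transform:
        clean = DummyOperator(task_id="clean")
        join = DummyOperator(task_id="join")
        clean >> join

    end = DummyOperator(task_id="end")

    start >> extract >> transform >> end
```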


Sometimes even a team of developers using coding standards has difficulty figuring out where connections are being used. The new version of Airflow has not only added new connection types but also a description field, making it easier to identify what a connection is used for.

Another interesting addition is the "Plugins" page available in the "Admin" menu. This page contains information about your installed plugins. To be honest, it may not be a full plugin management engine, but it lists the installed extensions and helps developers identify conflicts when they don't have administrator access to the system.

Airflow is increasingly used as a component of larger systems. If you are considering integrating your system with Airflow, know that you now get good API documentation. You use Swagger? There you go. You don't know Swagger but you know Redoc? No problem. The new version of Airflow documents its API in both formats, so you can work with Airflow without going through the user interface at all.


A Redesigned Scheduler
You may have experienced multiple DAG execution problems in previous versions due to scheduler bugs, for example:
- Delays in task pickup (lag when switching from one task to the next),
- Problems with retries or with distribution to workers.
The scheduler is a core component of Apache Airflow. In version 2.0, the focus was on improving its key elements to reduce scheduling delays and to let many schedulers run simultaneously, scaling horizontally, without tasks being missed when schedulers are replicated. The new version also optimizes resource usage: the scheduler works faster without using more CPU or memory. Version 2.0 supports a high-availability setup, so if a system runs more than one scheduler, we can expect zero downtime. This is possible because each scheduler operates fully independently of the others.
This solution helps improve performance when executing many DAGs in parallel mode; in some tests, performance increased tenfold.

REST API
Airflow can now serve as a complete management tool. With the REST API, you can list all DAGs, trigger them, and manage task instances, and that is only a small part of what the API can do for you. It offers methods to add new connections as well as list them, and because it exposes information about existing connections, other systems can reuse it. It is also possible to read variables stored in Airflow or XCom results, so you can track the output of a task while a DAG is being processed. Of course, you can also easily control DAG execution, including triggering DAG runs with a specified configuration, or fetch a simple representation of a DAG.
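As a rough sketch, this is what talking to the stable REST API can look like with the requests library. It assumes a webserver at http://localhost:8080 with the basic-auth API backend enabled; the DAG id, credentials, variable key, and configuration below are placeholders.

```python
import requests

BASE_URL = "http://localhost:8080/api/v1"
AUTH = ("admin", "admin")  # placeholder credentials

# List all DAGs registered in the metadata database
dags = requests.get(f"{BASE_URL}/dags", auth=AUTH).json()
print([d["dag_id"] for d in dags["dags"]])

# Trigger a run of a DAG with a custom configuration
run = requests.post(
    f"{BASE_URL}/dags/example_dag/dagRuns",
    auth=AUTH,
    json={"conf": {"load_date": "2021-01-01"}},
).json()
print(run["dag_run_id"], run["state"])

# Read a variable stored in Airflow (placeholder key)
var = requests.get(f"{BASE_URL}/variables/my_variable", auth=AUTH).json()
print(var["value"])
```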
Smart Sensor
Airflow 2.0 offers an extended sensor mechanism called the Smart Sensor. A smart sensor checks the status of registered sensor tasks in batches and stores the sensor state in the database. This improves performance and addresses the old problem of many idle sensors occupying worker slots.
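Smart sensors are an early-access feature and have to be switched on explicitly. The snippet below is a sketch of the relevant airflow.cfg section based on the 2.0 documentation; treat the shard count and the sensor class list as examples rather than recommended values.

```ini
[smart_sensor]
use_smart_sensor = True
# number of smart sensor DAGs that poke the registered sensors in batches
shards = 5
shard_code_upper_limit = 10000
# sensor classes that may be consolidated into smart sensors
sensors_enabled = NamedHivePartitionSensor, MetastorePartitionSensor
```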
DAG Versioning
In previous versions of Airflow, you could add new tasks to an existing DAG. However, this had some undesirable effects, such as orphaned tasks (tasks without a status) in a DAG run: tasks added in the current version of a DAG would also show up in earlier executions. Because of this, you might have had problems checking logs or viewing the code actually assigned to a given DAG run. Version 2.0 adds support for storing multiple versions of serialized DAGs and correctly displays the relationships between DAG runs and DAGs.
DAG Serialization
The new version changes how DAGs are parsed. Previously, both the Web Server and the Scheduler needed access to the DAG files. In Airflow 2.0, only the scheduler needs access to the DAG files: it parses them and writes the serialized DAGs to the metadata database, while the Web Server reads the serialized DAGs from that database and no longer touches the files. From this change, we get:
- High availability for the scheduler
- DAG versioning
- Faster DAG deployment
- Lazy DAG loading
- A stateless Web Server
A New Way to Define DAGs
The new version offers a new way to define DAGs with the TaskFlow API. Now you can use Python decorators to define your DAG.
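A minimal sketch of the decorator style is shown below; the DAG name, task names, and values are invented for the example.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule_interval=None, start_date=datetime(2021, 1, 1), catchup=False)
def simple_etl():
    @task()
    def extract():
        # the returned value is pushed to XCom automatically
        return {"order_id": 42, "amount": 100.0}

    @task()
    def transform(order: dict):
        return {**order, "amount_with_tax": order["amount"] * 1.23}

    @task()
    def load(enriched: dict):
        print(f"Loading {enriched}")

    # calling the decorated functions wires up the dependencies
    load(transform(extract()))


etl_dag = simple_etl()
```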
As the example above shows, you can write simple DAGs faster in pure Python, with clear dependency handling. XCom is also easier to use, since values returned by one task are passed to the next automatically. Additionally, the new version offers task decorators and supports custom XCom backends.
What Else?
Airflow 2.0 is not monolithic. It has been split into its core and 61 provider packages. Each of these packages is intended for a specific external service, a particular database (MySQL or Postgres), or a protocol (HTTP/FTP). This allows you to perform a custom Airflow installation and build a tool according to your individual requirements.
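For example, operators for Postgres or HTTP now come from separately installed provider packages and are imported from the airflow.providers namespace. A small sketch, assuming the postgres and http providers are installed:

```python
# Installed separately from the core, e.g.:
#   pip install apache-airflow-providers-postgres
#   pip install apache-airflow-providers-http
from airflow.providers.http.operators.http import SimpleHttpOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator
```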
Additionally, you get:
- Extended support for the Kubernetes Executor
- Plugin manager
- KEDA Queues