Apache Airflow is open-source software that lets you programmatically author, schedule, and monitor data pipelines. Airflow is written in Python, and workflows are defined as Python scripts. Thanks to plug-and-play operators, tasks can be executed on Microsoft Azure, Google Cloud Platform, or Amazon Web Services. Apache Airflow provides an API and a web UI for visualizing and monitoring workflows. Logs, task history, and Jinja templating are also available, which significantly expands what your pipeline code can do.

What is Apache Airflow?

Apache Airflow main principles:

  • Scalable – thanks to its modular architecture, Airflow can scale out to many workers as the workload grows.
  • Dynamic – Airflow pipelines are defined in Python, which allows for dynamic pipeline generation.
  • Open Source – there are no barriers or lengthy procedures here. A community of many active users is willing to share their experiences.
  • Extensible – easy to define operators and extend libraries to fit the environment.
  • Elegant – Airflow pipelines are lean, explicit and user friendly.
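
As a minimal sketch of the "Dynamic" principle, the snippet below generates one pipeline per entry in a plain Python list. The source names, the `extract` and `load` callables, and the `ingest_<name>` naming pattern are hypothetical. The Airflow import is deferred so that the spec-building part runs without Airflow installed, and `build_dag` assumes an Airflow 2.x release from 2.4 onward, where `DAG` accepts the `schedule` argument.

```python
from datetime import datetime

# Hypothetical source systems; in practice this could come from a config file.
SOURCES = ["orders", "customers", "payments"]


def build_specs(sources):
    """Return one plain-dict pipeline description per source."""
    return [
        {"dag_id": f"ingest_{name}", "tasks": ["extract", "load"], "source": name}
        for name in sources
    ]


def extract(source):
    print(f"extracting {source}")


def load(source):
    print(f"loading {source}")


def build_dag(spec):
    """Turn a spec into an Airflow DAG (assumes Airflow 2.4+ is installed)."""
    # Imported lazily so the rest of this module works without Airflow.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    callables = {"extract": extract, "load": load}
    dag = DAG(
        dag_id=spec["dag_id"],
        start_date=datetime(2024, 1, 1),
        schedule=None,  # triggered manually in this sketch
        catchup=False,
    )
    previous = None
    for task_name in spec["tasks"]:
        task = PythonOperator(
            task_id=task_name,
            python_callable=callables[task_name],
            op_kwargs={"source": spec["source"]},
            dag=dag,
        )
        if previous is not None:
            previous >> task  # run tasks in the listed order
        previous = task
    return dag


SPECS = build_specs(SOURCES)
```

In a DAG file, the generated objects would typically be exposed at module level, for example with `for spec in SPECS: globals()[spec["dag_id"]] = build_dag(spec)`, so that Airflow can discover them when it parses the file.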

Apache Airflow services we perform:

  • Deploying and monitoring Airflow instances
  • Migrating Airflow instances
  • Migrating workflows
  • Upgrading Airflow to the newest versions
  • Resolving issues with Airflow components
  • Spotting and fixing Airflow bugs
  • Writing DAGs with all kinds of operators
  • Writing custom plugins

With our extensive practical experience, we can provide comprehensive services for Apache Airflow:


Platform:

  • on premises or in the Cloud
  • hardware scaling
  • fault tolerance

Software:

  • required components
  • choosing an approach to prepare the optimal workflow build

Security:

  • implementation of SSO authentication
  • key vault utilization for credentials or sensitive data storage
  • designing multiple levels of access for specific target groups

Rs-service
  • installation of all prerequisites on the selected platform
  • Airflow installation in the selected environment: bare metal, virtual machines, containerized in Docker, orchestrated via Kubernetes

Rs-service
  • building DAGs using Python: static or dynamic
  • building custom operators not available out of the box
  • setting up automated monitoring or alerting for whole workflows or selected tasks
  • building custom user interfaces integrated with Airflow (using JavaScript) via dedicated operators that dynamically trigger building specific tasks or workflows based on user input
  • monitoring the DAG execution progress, access to logs, and other functions

Rs-service
  • testing implemented solutions
  • error debugging, including source code analysis both for an implemented solution and Airflow itself

Your benefits with Airflow:

  • low cost 
  • fast and smooth DAG synchronization 
  • the ability to debug issues deep inside Pods or Nodes
  • effective autoscaling
  • newest Airflow releases
  • easy access to logs

Our team of 40+ experts has successfully delivered projects for global clients in the area of Airflow implementation:

  • Deployed over 70 Airflow instances on Azure Kubernetes through CI/CD pipelines, with autoscaling, the Istio service mesh enabled, and Grafana monitoring dashboards with alerts
  • Provided L4 support for over 70 Apache Airflow instances in an organization
  • Migrated over 70 Airflow instances from VMs to Kubernetes
  • Experienced in upgrading Airflow to newer versions
  • Experienced in migrating workflows from Apache Oozie to Apache Airflow
  • Experienced in investigating and resolving scheduler performance issues
  • Able to spot and fix Airflow bugs independently when a fix is needed sooner than the Airflow community can provide one
  • Experienced in writing DAGs with any kind of operator for Azure services, Google Cloud Platform services, Spark, JDBC, and others
  • Experienced in writing custom plugins when necessary

Our clients were featured in:

  • Fortune
  • Gartner
  • Fast Company
  • Forbes

Apache Airflow FAQ

What is Airflow?

Apache Airflow is an open-source workflow management platform started at Airbnb in October 2014. It lets you programmatically author and schedule data workflows and monitor them through the built-in user interface. Airflow is commonly used to orchestrate ETL (Extract, Transform, Load) data pipelines.

What problems does Airflow help to resolve?

It helps you programmatically control workflows by setting task dependencies and monitoring tasks within each DAG in a Web UI. Airflow offers detailed logs for each task in very complex workflows.
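
To illustrate what "setting task dependencies" means, here is a small standard-library sketch that resolves a run order from declared dependencies. It is not Airflow's scheduler and uses no Airflow API; the task names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each key is a task; its set holds the tasks that must finish first.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(deps):
    """Return one valid run order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

In an Airflow DAG file, the same chain would usually be declared on operator objects as `extract >> transform >> load >> notify`, and the Web UI would show each task's state and logs.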

What are the fundamentals of Airflow?

  • Scalable: Airflow's modular architecture scales out as workloads grow.
  • Dynamic: Pipelines defined in Python allow for dynamic pipeline generation.
  • Extensible: Operators are easily defined.
  • Elegant: Airflow pipelines are lean and coherent.

When should you use Apache Airflow in your organization?

If you need an open-source workflow automation tool, Apache Airflow is well worth considering. This Python-based technology makes it easy to set up and maintain data workflows.

Get in touch with us

Contact us to see how we can help you.
We’ll get back to you within 4 hours on working days (Mon – Fri, 9am – 5pm).

Piotr Iwanicki

Business Developer

Address

Grochowska 306/308
03-840 Warsaw

Mail us

hello@dsstream.com
