What is Apache Airflow?
Apache Airflow's main principles:
- Scalable – thanks to its modular architecture and message-queue-based coordination, Airflow can scale out to an arbitrary number of workers.
- Dynamic – Airflow pipelines are defined in Python, which allows for dynamic pipeline generation (see the sketch after this list).
- Open Source – there are no barriers or lengthy procedures here, and a large community of active users is willing to share their experiences.
- Extensible – it is easy to define your own operators and extend libraries to fit your environment.
- Elegant – Airflow pipelines are lean, explicit, and user-friendly.
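To make the "Dynamic" principle concrete, here is a minimal sketch of dynamic pipeline generation: one Python loop producing a separate DAG per source system. The source names and DAG ids are illustrative, and the schedule argument assumes Airflow 2.4+ (older versions use schedule_interval).

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # One loop, many DAGs: each illustrative source gets its own pipeline.
    for source in ["crm", "billing", "clickstream"]:
        with DAG(
            dag_id=f"ingest_{source}",
            start_date=datetime(2024, 1, 1),
            schedule="@daily",
            catchup=False,
        ) as dag:
            BashOperator(
                task_id="extract",
                bash_command=f"echo extracting {source}",
            )
        # Exposing each DAG object at module level makes it visible to Airflow.
        globals()[f"ingest_{source}"] = dag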
Apache Airflow services we perform:
- Deploying and monitoring Airflow instances
- Migrating Airflow instances
- Migrating workflows
- Upgrading Airflow to the latest versions
- Resolving issues with Airflow components
- Spotting and fixing Airflow bugs
- Writing DAGs with all kinds of operators, including custom ones (a short sketch follows this list)
- Writing custom plugins
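As a sketch of what custom-operator work looks like, the snippet below subclasses Airflow's standard BaseOperator; the AuditLogOperator name and its behavior are hypothetical, invented for illustration.

    from airflow.models.baseoperator import BaseOperator

    class AuditLogOperator(BaseOperator):
        """Hypothetical operator that records an audit message for a run."""

        template_fields = ("message",)  # enables Jinja templating, e.g. "{{ ds }}"

        def __init__(self, message: str, **kwargs):
            super().__init__(**kwargs)
            self.message = message

        def execute(self, context):
            # A real implementation would call a hook here; this one only logs.
            self.log.info("Audit: %s (run_id=%s)", self.message, context["run_id"])

Once defined, it is used in a DAG like any built-in operator, e.g. AuditLogOperator(task_id="audit", message="loaded {{ ds }}").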
With our extensive practical experience, we can provide comprehensive services for Apache Airflow:
Platform:
- on premises or in the Cloud
- hardware scaling
- fault tolerance
Software:
- required components
- choosing the right approach to building optimal workflows
Security:
- implementation of SSO authentication
- key vault utilization for credentials or sensitive data storage (an example configuration follows this list)
- designing multiple levels of access for specific target groups
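As a sketch of the key vault point above: recent versions of the apache-airflow-providers-microsoft-azure package ship a secrets backend that lets Airflow read connections from Azure Key Vault instead of its metadata database. The vault URL and prefix below are placeholders in an airflow.cfg excerpt, not a complete configuration.

    [secrets]
    backend = airflow.providers.microsoft.azure.secrets.key_vault.AzureKeyVaultBackend
    backend_kwargs = {"connections_prefix": "airflow-connections", "vault_url": "https://example-vault.vault.azure.net/"}

With this in place, a secret stored as airflow-connections-my-db resolves whenever a task requests the connection id my-db.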
Implementation:
- installation of all prerequisites on the selected platform
- Airflow installation in the selected environment: bare metal, virtual machines, containerized in Docker, orchestrated via Kubernetes
- building DAGs using Python: static or dynamic
- building custom operators not available out of the box
- setting up automated monitoring or alerting for whole workflows or selected tasks (see the callback sketch after this list)
- building custom user interfaces (in JavaScript) that integrate with Airflow via dedicated operators and dynamically trigger specific tasks or workflows based on user input
- monitoring DAG execution progress, accessing logs, and other operational functions
- testing implemented solutions
- error debugging, including source-code analysis of both the implemented solution and Airflow itself
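The monitoring and alerting item above (the callback sketch referenced there) can be as lightweight as a task-level on_failure_callback. The notify_team function below is a placeholder; a real callback would post to Slack, PagerDuty, or e-mail.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    def notify_team(context):
        # The callback context carries the failing task instance and its log URL.
        ti = context["task_instance"]
        print(f"ALERT: {ti.dag_id}.{ti.task_id} failed, logs: {ti.log_url}")

    with DAG(
        dag_id="monitored_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={"on_failure_callback": notify_team},  # applied to every task
    ):
        # This task fails on purpose so the callback fires.
        BashOperator(task_id="load", bash_command="exit 1")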
Your benefits with Airflow:
- low cost (open-source software, no licensing fees)
- fast and smooth DAG synchronization
- the ability to debug issues deep inside Pods or Nodes
- effective autoscaling
- newest Airflow releases
- easy access to logs
Our team of 40+ experts has successfully delivered projects for global clients in the area of Airflow implementation:
- Successfully deployed over 70 Airflow instances on Azure Kubernetes Service through CI/CD pipelines, with autoscaling, the Istio service mesh, and Grafana monitoring dashboards with alerting
- L4 support for over 70 instances of Apache Airflow across an organization
- Migrated over 70 instances of Airflow from VMs to Kubernetes
- Experienced in upgrading Airflow to newer versions
- Experienced in migrating workflows from Apache Oozie to Apache Airflow
- Experienced in investigating and resolving scheduler performance issues
- Able to spot and fix Airflow bugs ourselves when a fix is needed on short notice, without waiting on the Airflow community
- Experienced in writing DAGs with all kinds of operators for Azure services, Google Cloud Platform services, Spark, JDBC, and others
- Experienced in writing custom plugins when necessary
Clients
We were very impressed with their thoroughness of research and their approach to kicking off the project.
Adam Murray,
Head of Product Development, Sportside
Their commitment, knowledge, and good communication resulted in high performance and a comfortable work atmosphere.
Maciej Moscicki,
CEO, Macmos Stream
Apache Airflow FAQ
What is Airflow?
Apache Airflow is an open-source workflow management platform started at Airbnb in October 2014. Airflow lets you programmatically author, schedule, and monitor data workflows via its built-in user interface, and it is widely used as an orchestration tool for ETL (Extract, Transform, Load) data pipelines.
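As a minimal sketch of what "programmatically author" means in practice, the hello-world DAG below defines and schedules a single Python task. All names are illustrative, and the schedule argument assumes Airflow 2.4+ (older versions use schedule_interval).

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def say_hello():
        print("Hello from Airflow")

    with DAG(
        dag_id="hello_world",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ):
        PythonOperator(task_id="hello", python_callable=say_hello)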
What problems does Airflow help to resolve?
It helps you programmatically control workflows by setting task dependencies and monitoring tasks within each DAG in a web UI. Airflow offers detailed logs for each task, even in very complex workflows.
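Dependencies themselves are plain Python, declared with the >> operator. In this minimal sketch (task names are illustrative; EmptyOperator requires Airflow 2.3+, older versions use DummyOperator), extract must finish before the two load tasks run in parallel.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.empty import EmptyOperator

    with DAG(dag_id="dependency_demo", start_date=datetime(2024, 1, 1),
             schedule=None, catchup=False):
        extract = EmptyOperator(task_id="extract")
        load_a = EmptyOperator(task_id="load_a")
        load_b = EmptyOperator(task_id="load_b")
        extract >> [load_a, load_b]  # fan-out: both loads depend on extract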
What are the fundamentals of Airflow?
- Scalable: Airflow can scale out to an arbitrary number of workers.
- Dynamic: Pipelines defined in Python allow for dynamic pipeline generation.
- Extensible: Operators are easily defined.
- Elegant: Airflow pipelines are lean and coherent.
When should you use Apache Airflow in your organization?
If you need an open-source workflow automation tool, you should definitely consider adopting Apache Airflow. This Python-based technology makes it easy to set up and maintain data workflows.
Get in touch with us
Contact us to see how we can help you.
We’ll get back to you within 4 hours on working days (Mon – Fri, 9am – 5pm).
Dominik Radwański
Service Delivery Partner
Address
Poland
DS Stream sp. z o.o.
Grochowska 306/308
03-840 Warsaw, Poland
United States of America
DS Stream LLC
1209 Orange St,
Wilmington, DE 19801