Service Level Agreement

When you use a data orchestration tool, you need to learn quickly when something isn’t working as it should, and you need a good strategy for fixing problems when they occur. Apache Airflow’s SLA (Service Level Agreement) feature comes with a useful notification mechanism that alerts you when tasks take longer than expected. Why is it useful, and how do you use it? Read on to learn more.

In the business world, many service providers offer their users standard SLAs that explicitly describe the level of service both sides of the contract have agreed on. Airflow, however, approaches the matter a bit differently: its SLA defines the time by which a task (or every task in a DAG) should have finished, counted from the start of the DAG run. Airflow sends a notification when a task misses that expected timing. The notification doesn’t have to be an email, either; it can also be sent to a Slack channel, for example. But let’s start from the very beginning…

What is an SLA – service level agreement?

In the business world, a Service Level Agreement (SLA) is a document that specifies the level of service expected by a customer and guaranteed by the vendor. It usually lists the metrics used to assess whether the service conforms to the contract. The SLA also sets out the steps to be taken if the service does not conform and the penalties for the service provider if the agreed-on service level is not achieved. An SLA is usually concluded between two different companies (business partners, or a service provider and a customer), but it can also be between two departments of the same organization. The most popular types of SLA are:

  • Customer SLA – between a provider and an external customer (a person or organization that is not a part of the provider’s company).
  • Internal SLA – between two departments of the same institution.
  • Multilevel SLA – in which many parties are involved. 

SLA contract elements

Service Level Agreements commonly consist of two major components: one that defines the services and one that covers their management. Each should give the contracting parties the information they need to ensure the agreed-on level of service.

The service element of an SLA contract specifies the services offered by the provider. It is good practice to also state what is excluded from the service, so there is no room for doubt. A service level agreement is a rather complex document, and it should describe the services in detail for the sake of business transparency. The conditions of service availability, the responsibilities of all parties involved, escalation procedures and additional costs should therefore all be part of the contract.

The management element of the SLA document explains how the level of service should be measured. It should list the metrics the parties will use to assess the quality of service and describe the methods, standards, reporting processes and frequency of service level assessments. It is also the place to clarify how disputes should be resolved if necessary. Finally, don’t forget about SLA updates: market conditions change quickly, and you may need to adjust the SLA to new conditions, expectations or regulations, so the contract should also describe how it will be kept up to date.

Importance of SLA reporting

By now, you can probably see why you need an SLA for the services your company relies on. It is an important part of any contract, especially an IT contract. A service level agreement pulls together the essential information about the contracted service: from the SLA document you can learn what performance is guaranteed for the solution you’re paying for and how to check that the quality of service does not fall below what you agreed to.

With an SLA, you don’t need to establish your own methods and strategy for measuring performance (though you can, of course, use additional tools to double-check). The entire measurement process, how often it takes place, and the tools and metrics to be used are described in the SLA document for all parties. It also sets out the responsibilities and expectations of both the provider and the customer (or other parties).

Thanks to the SLA:

  • you don’t need to spend time planning how to assess a new service.
  • doubts about whether the service level is too low can be settled against agreed criteria, which reduces disagreements between the parties.
  • all parties involved are protected by a comprehensive agreement that covers the most important matters.

Service level agreement – examples of use in the IT industry

Companies all over the world make an effort to measure their performance, security and more. IT organizations use various monitoring tools to make sure their business and technology solutions perform at the right level. Still, these tools only provide data for assessing performance; on their own, they can’t do much to improve it.

Are you wondering precisely what SLA contracts are used for in IT? Their purpose is mostly to ensure:

  • a high level of service availability – the service provider’s goal is to make sure the service is available whenever the SLA says it should be.
  • low defect rates – the SLA specifies the acceptable number or percentage of errors in major deliverables, and the provider should keep the actual defect rate below this threshold.
  • reliability – when a company invests in business tools, it expects to be able to use the applications with all their features, at full capacity, whenever the service is supposed to be available.
  • security – given strict regulatory requirements, organizations have to keep their data safe at all times, so the service provider has to keep antivirus software up to date and use modern cybersecurity solutions.
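To make the availability and defect-rate targets a little more concrete, here is a minimal sketch of how such SLA metrics could be computed and compared with contractual thresholds. The function names and the default targets (99.9% availability, at most 2% defective deliverables) are hypothetical values chosen for illustration, not figures taken from any real contract.

```python
def availability_percent(total_minutes, downtime_minutes):
    """Share of the measurement period during which the service was up, in percent."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def defect_rate_percent(deliverables, defective):
    """Share of deliverables that contained errors, in percent."""
    if deliverables <= 0:
        raise ValueError("deliverables must be positive")
    return 100.0 * defective / deliverables


def meets_sla(availability, defect_rate, min_availability=99.9, max_defect_rate=2.0):
    """True when both measured values stay within the (hypothetical) agreed targets."""
    return availability >= min_availability and defect_rate <= max_defect_rate
```

For example, a 30-day month has 43,200 minutes; 30 minutes of downtime gives roughly 99.93% availability, which meets a 99.9% target, while 50 minutes of downtime (about 99.88%) does not.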

Service Level Agreement – Airflow

So what does an SLA mean in Airflow? You may be wondering how Apache Airflow monitors the performance of the tasks and DAGs it runs. Here, the SLA is the time by which a task (and, by extension, the tasks of a DAG) should have succeeded. If a task does not finish within this predefined time, Airflow records an “SLA miss” and sends an email alert to the addresses configured for the task. An SLA miss therefore means that the task exceeded its expected completion time. The notification contains details about the tasks that missed their SLA. The event is also recorded in Airflow’s metadata database, and you can view it, with details, in Airflow’s web UI.
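If email is not enough (for example, if you would rather alert a Slack channel), Airflow 2.x also lets you pass a Python function to the DAG’s sla_miss_callback argument. The sketch below assumes Airflow 2.x, where that callback receives the arguments dag, task_list, blocking_task_list, slas and blocking_tis. The function names sla_miss_alert and format_sla_miss_message are hypothetical, and the final log call is only a placeholder for whatever messaging integration you use. The code reads attributes of the objects Airflow passes in, so it needs no Airflow imports of its own.

```python
import logging

log = logging.getLogger(__name__)


def format_sla_miss_message(dag_id, slas):
    """Build a short, human-readable summary of the SLA misses for one DAG."""
    lines = [f"SLA missed in DAG '{dag_id}':"]
    for miss in slas:
        # Each SlaMiss record identifies the task and the logical date of the affected run.
        lines.append(f"  - task '{miss.task_id}' for run {miss.execution_date}")
    return "\n".join(lines)


def sla_miss_alert(dag, task_list, blocking_task_list, slas, blocking_tis):
    """Hypothetical sla_miss_callback using the Airflow 2.x callback signature."""
    message = format_sla_miss_message(dag.dag_id, slas)
    if blocking_tis:
        # Task instances Airflow reports as blocking the late tasks.
        blocked_by = ", ".join(ti.task_id for ti in blocking_tis)
        message += f"\nBlocked by: {blocked_by}"
    # Placeholder: replace this log call with your Slack (or other) integration.
    log.warning(message)
    # Airflow ignores the return value; returning the text just makes testing easier.
    return message
```

To wire it up, pass the function when creating the DAG, e.g. DAG(..., sla_miss_callback=sla_miss_alert). Keeping the message formatting in a separate function means it can be tested without a running Airflow instance.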

The SLA monitoring functionality that sends these notifications can be enabled by adding just one line of code to your task operator: its sla parameter. Airflow checks for potential violations as part of its scheduling loop, and when it detects a miss, it sends the email alert; the late task itself is not cancelled and is allowed to run to completion. Keep in mind that a task-level SLA is measured from the start of the DAG run, not from the start of the task itself (setting an SLA on individual tasks is optional).
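Here is a minimal sketch of that one line in a DAG file. It assumes Airflow 2.4 or later within the 2.x line (the schedule argument was introduced in 2.4, and the SLA feature was removed in Airflow 3) and a working SMTP configuration so that SLA-miss emails can actually be delivered. The DAG id, task ids, commands and email address are hypothetical.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "email": ["data-team@example.com"],  # hypothetical recipient of SLA-miss emails
    "sla": timedelta(hours=1),           # default SLA applied to every task in the DAG
}

with DAG(
    dag_id="daily_sales_report",         # hypothetical DAG
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                   # SLAs are only checked for scheduled runs
    catchup=False,
    default_args=default_args,
) as dag:
    extract = BashOperator(
        task_id="extract",
        bash_command="echo extracting",
    )
    build_report = BashOperator(
        task_id="build_report",
        bash_command="echo building report",
        # The one line that matters: override the default SLA for this task.
        # It is measured from the start of the DAG run, not from when this task starts.
        sla=timedelta(minutes=90),
    )
    extract >> build_report
```

For example, if a run is scheduled to start at 00:00, build_report must succeed by 01:30. A build_report that only starts at 01:25 and runs for ten minutes finishes at 01:35 and therefore misses its SLA, even though the task itself was quick. Manually triggered runs are not checked for SLA misses.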

To sum up

Airflow’s SLA email alerts are very useful. They make your day-to-day work easier, because you learn about delays as soon as they occur and can react properly. This helps you keep your data pipelines running efficiently. Don’t hesitate to contact us if you need assistance setting up your SLAs; we’ll be happy to share more valuable tips on using Apache Airflow with you.

Author

  • Tomasz is a Kubernetes Team Leader and CI/CD expert, evangelizing DevOps culture at DS Stream. For our customers, Tomasz delivers end-to-end MLOps solutions on GCP and is architecting a multi-cloud Airflow as a Service product. He never stops learning new technologies and spreading them across the organization. In a previous life he was a Barça and Premier League fan; these days he spends all his free time preparing his 2-year-old son to be Robert Lewandowski’s successor.

Tomasz Stachera
