Case Study: MLOps platform for scaling deep learning model training and inference.

Key information

Time Frame

2022

Client Industry

FMCG

Cloud Platform

Microsoft Azure

Technologies

Python, PyTorch, Docker, Kubernetes, Azure, CI/CD with GitHub Actions, PostgreSQL, ServiceBus, OpenTelemetry

Project size

9 consultants

Challenge

Our client faced the challenge of scaling deep learning models to handle high traffic and large image datasets efficiently. They sought a solution that would democratize access to these models, enabling data scientists to test them rapidly and run batch predictions without having to manage the underlying infrastructure.

Project Description

In this MLOps project, our team focused on deploying a web application on Azure Kubernetes Service (AKS) to democratize access to deep learning models. Leveraging Azure's infrastructure and services, we designed and implemented a solution that enabled data scientists to quickly test and deploy models while ensuring scalability and cost-efficiency.

Solution

An ML platform that automatically creates an experiment to test and train deep learning models, then containerizes the trained models for online and batch inference.

Key implementations:

  • An automated CI/CD process that meets the organization's standards and best practices.
  • Horizontal and vertical resource autoscaling (number of workers and GPU allocation).
  • Model versioning, model monitoring, and model retraining pipelines (triggered when data or concept drift is detected).
  • Containerization of trained ML models as microservices for optimized model inference.
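The drift-triggered retraining in the list above can be sketched in a few lines. This is an illustrative example only, not the client's actual pipeline: the Population Stability Index (a common drift score) stands in for whatever detector is used in practice, and `retrain` is a hypothetical callback into the retraining pipeline.

```python
import numpy as np

PSI_THRESHOLD = 0.2  # common rule of thumb: PSI > 0.2 signals significant drift

def population_stability_index(reference, live, bins=10):
    """Score how far the live feature distribution has moved from the
    training-time reference distribution."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_counts, _ = np.histogram(reference, bins=edges)
    live_counts, _ = np.histogram(live, bins=edges)
    # Convert counts to proportions, with a small floor to avoid log(0).
    ref_pct = np.clip(ref_counts / ref_counts.sum(), 1e-6, None)
    live_pct = np.clip(live_counts / live_counts.sum(), 1e-6, None)
    return float(np.sum((live_pct - ref_pct) * np.log(live_pct / ref_pct)))

def maybe_retrain(reference, live, retrain):
    """Trigger the retraining pipeline only when drift is detected."""
    if population_stability_index(reference, live) > PSI_THRESHOLD:
        retrain()
        return True
    return False

# Example: a clearly shifted live distribution should trigger retraining.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5_000)
drifted = rng.normal(1.5, 1.0, 5_000)
triggered = maybe_retrain(reference, drifted, retrain=lambda: None)
```

In a real deployment the `retrain` callback would submit a training job rather than run inline, and the reference sample would come from the model's training data snapshot.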

Outcome/Benefits

The deployment of the web application on AKS yielded significant benefits for our client. By customizing worker autoscaling in Kubernetes, we achieved a cost-effective solution for handling high traffic and large datasets. This approach proved to be less expensive than Azure Machine Learning (AML) managed endpoints, offering greater flexibility and control over resource allocation.
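For reference, the proportional scaling rule that Kubernetes' Horizontal Pod Autoscaler applies is simple to state; the sketch below reproduces it, with the metric, target, and replica bounds as illustrative placeholders rather than the client's actual configuration.

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 20) -> int:
    """Kubernetes HPA rule: desired = ceil(current * observed / target),
    clamped to the allowed replica range. The metric could be GPU
    utilization, requests per worker, queue depth, etc."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(desired, max_replicas))

# Example: 4 workers at 90% utilization against a 60% target -> 6 workers.
scaled_up = desired_replicas(4, 90.0, 60.0)
```

Running the same rule on a custom metric (rather than the CPU default) is what makes this kind of worker autoscaling cheaper and more controllable than a fully managed endpoint.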

Furthermore, our optimization of GPU usage through memory sharing among workers reduced redundant costs associated with GPU underutilization, enhancing overall cost-efficiency.
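The case study does not detail how the memory sharing was implemented; as a rough, CPU-only illustration of the underlying idea (in production this would typically involve GPU-side mechanisms for sharing a loaded model between worker processes), the sketch below uses Python's standard-library shared memory so that several workers read one copy of a hypothetical weight buffer instead of each holding their own.

```python
import numpy as np
from multiprocessing import shared_memory

# Hypothetical model weights, loaded once instead of once per worker.
weights = np.arange(1_000, dtype=np.float32)

# Publish the weights in a named shared-memory block.
shm = shared_memory.SharedMemory(create=True, size=weights.nbytes)
master = np.ndarray(weights.shape, dtype=weights.dtype, buffer=shm.buf)
master[:] = weights

def attach_worker(name, shape, dtype):
    """An inference worker maps the existing block by name; no extra
    copy of the weights is allocated in the worker."""
    handle = shared_memory.SharedMemory(name=name)
    return np.ndarray(shape, dtype=dtype, buffer=handle.buf), handle

view_a, handle_a = attach_worker(shm.name, weights.shape, weights.dtype)
view_b, handle_b = attach_worker(shm.name, weights.shape, weights.dtype)

same_data = bool(np.array_equal(view_a, weights))
master[0] = -1.0  # an update is visible to every worker at once
propagated = bool(view_a[0] == -1.0) and bool(view_b[0] == -1.0)

# Tear down: drop the numpy views, then close handles and unlink the block.
del master, view_a, view_b
handle_a.close()
handle_b.close()
shm.close()
shm.unlink()
```

The same one-copy-many-readers pattern applied to GPU memory is what removes the redundant per-worker cost the paragraph above describes.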

The implementation of CI/CD pipelines using GitHub Actions allowed for seamless testing, validation, and deployment of new features, empowering our client to iterate quickly and deliver value to their end users.
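A pipeline of this kind typically maps onto a small workflow file. The fragment below is purely illustrative — the registry name, image name, and deployment step are placeholders, and registry/cluster login steps are omitted for brevity.

```yaml
name: ci-cd
on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest  # validate changes before anything is deployed

  build-and-deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build the model-serving image and push it to the registry.
      - run: docker build -t myregistry.azurecr.io/model-api:${{ github.sha }} .
      - run: docker push myregistry.azurecr.io/model-api:${{ github.sha }}
      # Roll the new image out to AKS.
      - run: kubectl set image deployment/model-api model-api=myregistry.azurecr.io/model-api:${{ github.sha }}
```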

Conclusion

Through strategic utilization of Azure’s powerful infrastructure and technologies like Docker, Kubernetes, and PyTorch, we successfully addressed our client’s challenges and delivered a scalable, cost-effective, and user-friendly platform for democratizing deep learning in the FMCG industry. This project exemplifies our commitment to innovation and efficiency, driving tangible business outcomes for our clients in a rapidly evolving digital landscape. 

