16 May 2024
Case Study: MLOps platform for scaling deep learning model training and inference
Key information
Challenge
Our client confronted the challenge of scaling deep learning models to handle high traffic and large image datasets efficiently. They sought a solution that would democratize access to these models, enabling data scientists to test them rapidly and run batch predictions without the complexity of managing infrastructure.
Project Description
In this MLOps project, our team deployed a web application on Azure Kubernetes Service (AKS) to democratize access to deep learning models. Using Azure’s infrastructure and services, we designed and implemented a solution that lets data scientists quickly test and deploy models while keeping the platform scalable and cost-efficient.
Solution
The solution is an ML platform that automatically creates an experiment to train and test deep learning models, and then containerizes the trained models for online and batch inference.
Key implementations:
- Automated CI/CD process that meets the organization’s standards and best practices.
- Horizontal and vertical resource autoscaling (number of workers and GPU).
- Model versioning, model monitoring, and model retraining pipelines (triggered when data drift or concept drift is detected).
- Containerization of trained ML models as microservices for optimized model inference.
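To illustrate the drift-triggered retraining item above, here is a minimal sketch of one common way to flag data drift: the Population Stability Index (PSI) between a reference sample and a recent sample of a single numeric feature. The case study does not say which drift metric the platform uses, so the metric, the function names (`population_stability_index`, `should_retrain`), the bin count, and the 0.2 threshold (a widely used rule of thumb) are all assumptions made for illustration.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI of `current` relative to `reference` (hypothetical helper)."""
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 when all reference values are identical.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def should_retrain(
    reference: Sequence[float],
    current: Sequence[float],
    threshold: float = 0.2,  # assumed rule-of-thumb threshold, not from the case study
) -> bool:
    """Signal that a retraining pipeline should be triggered."""
    return population_stability_index(reference, current) > threshold
```

In a setup like the one described, a scheduled monitoring job could call `should_retrain` on each monitored feature and start the retraining pipeline when it returns `True`. Concept drift would typically need a separate check on model quality against labeled data.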
Outcome/Benefits
The deployment of the web application on AKS yielded significant benefits for our client. By customizing autoscaling of workers in Kubernetes, we achieved a cost-effective solution for handling high traffic and large datasets. This approach proved to be less expensive than Azure Machine Learning (AML) managed endpoints, offering greater flexibility and control over resource allocation.
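As a rough illustration of what a worker autoscaling rule can look like, the sketch below applies the proportional formula documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The custom rules actually used in the project are not described in this case study, so the function name `desired_workers`, the load metric, the tolerance, and the worker limits are hypothetical.

```python
import math


def desired_workers(
    current_workers: int,
    current_load_per_worker: float,
    target_load_per_worker: float,
    min_workers: int = 1,     # assumed lower bound
    max_workers: int = 10,    # assumed upper bound
    tolerance: float = 0.1,   # skip scaling when load is within 10% of target
) -> int:
    """Return the worker count that brings per-worker load back toward the target."""
    ratio = current_load_per_worker / target_load_per_worker
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: avoid churn from small fluctuations.
        return current_workers
    desired = math.ceil(current_workers * ratio)
    return max(min_workers, min(max_workers, desired))
```

For example, with 4 workers each handling 20 queued requests against a target of 10, the rule asks for 8 workers; if load drops to 2 requests per worker, it scales down to 1.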
Furthermore, our optimization of GPU usage through memory sharing among workers reduced redundant costs associated with GPU underutilization, enhancing overall cost-efficiency.
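The cost effect of sharing GPU memory among workers can be reasoned about with simple arithmetic. The sketch below is not the project's implementation; it only estimates how many workers fit on one GPU and how many GPUs a given number of workers needs. The function names and all figures (16 GB GPU, 3 GB per worker, 1 GB reserve) are hypothetical.

```python
import math


def workers_per_gpu(
    gpu_memory_gb: float,
    worker_memory_gb: float,
    reserve_gb: float = 1.0,  # assumed headroom for runtime overhead
) -> int:
    """Return how many workers can share one GPU's memory."""
    usable = gpu_memory_gb - reserve_gb
    return max(int(usable // worker_memory_gb), 0)


def gpus_needed(total_workers: int, per_gpu: int) -> int:
    """Return the number of GPUs required when `per_gpu` workers share each GPU."""
    if per_gpu < 1:
        raise ValueError("a worker does not fit on a single GPU")
    return math.ceil(total_workers / per_gpu)


# Hypothetical figures: 8 workers, 3 GB each, on 16 GB GPUs.
shared = gpus_needed(8, workers_per_gpu(16, 3))  # workers share GPUs
dedicated = gpus_needed(8, 1)                    # one GPU per worker
```

With these assumed numbers, sharing needs 2 GPUs instead of 8, which is the kind of underutilization cost the paragraph above refers to.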
The implementation of CI/CD pipelines using GitHub Actions allowed for seamless testing, validation, and deployment of new features, empowering our client to iterate quickly and deliver value to their end-users.
Conclusion
By combining Azure’s infrastructure with technologies such as Docker, Kubernetes, and PyTorch, we addressed our client’s challenges and delivered a scalable, cost-effective, and user-friendly platform for democratizing deep learning in the fast-moving consumer goods (FMCG) industry. This project exemplifies our commitment to innovation and efficiency, driving tangible business outcomes for our clients in a rapidly evolving digital landscape.