Declarative vs Scripted Pipeline – Key differences

Introduction

In the Fast-Moving Consumer Goods (FMCG) industry, Machine Learning Operations (MLOps) is crucial for optimizing and automating the deployment of machine learning models. This article delves into the technical aspects of key tools and technologies used in MLOps, focusing on how they can be leveraged by developers to streamline processes in the FMCG sector.

Overview of Popular MLOps Tools

TensorFlow

TensorFlow is an open-source platform for machine learning that provides a comprehensive ecosystem of libraries, tools, and community resources. It is widely used for developing, training, and deploying machine learning models.

Core Components:

  • TensorFlow Core: The primary API for building and training models.
  • TensorFlow Extended (TFX): An end-to-end platform for building and deploying production machine learning pipelines.
  • TensorFlow Serving: A flexible, high-performance serving system for machine learning models, designed for production environments.
  • TensorFlow Lite: A lightweight solution for mobile and embedded devices.

Key Features:

  • Eager Execution: An imperative programming environment that evaluates operations immediately.
  • Keras API: A high-level API for building and training models.
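  • Distribution Strategies: Support for training on multiple GPUs and distributed environments.

Example Usage:

```python
import tensorflow as tf
from tensorflow import keras

# Assumption: MNIST stands in for the undefined train_images/train_labels;
# its flattened 28x28 images match the 784-feature input below.
(train_images, train_labels), _ = keras.datasets.mnist.load_data()
train_images = train_images.reshape(-1, 784).astype('float32') / 255.0

# Define a simple sequential model
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(train_images, train_labels, epochs=5)

# Save the model (recent Keras versions require a .keras or .h5 extension)
model.save('my_model.keras')
```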

Kubernetes

Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. It is essential for managing complex microservices architectures in MLOps.

Core Components:

  • Pods: The smallest deployable units, each containing one or more containers.
  • Services: Define a logical set of Pods and a policy for accessing them.
  • Deployments: Declaratively manage replicated Pods, including rolling updates and rollbacks.
  • ConfigMaps and Secrets: Store configuration data and sensitive information, respectively.

Key Features:

  • Horizontal Pod Autoscaling: Automatically adjusts the number of Pods based on CPU utilization or other select metrics.
  • Helm: A package manager for Kubernetes, maintained as a separate project, that helps define, install, and upgrade complex Kubernetes applications.

Example Usage:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-ml-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-ml-app
  template:
    metadata:
      labels:
        app: my-ml-app
    spec:
      containers:
      - name: ml-container
        image: my-ml-image
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-ml-service
spec:
  selector:
    app: my-ml-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: LoadBalancer
```
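
To make the Horizontal Pod Autoscaling feature listed above more concrete, the sketch below reproduces the scaling rule described in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The function name, the replica bounds, and the sample numbers are illustrative assumptions; the 0.1 tolerance mirrors the documented default.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Replica count following the documented HPA scaling rule.

    The bounds and defaults are illustrative assumptions, not values
    taken from any real cluster.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Inside the tolerance band the replica count is left unchanged.
    if abs(current_metric / target_metric - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured minimum and maximum.
    return max(min_replicas, min(max_replicas, desired))


# Three Pods averaging 90% CPU against a 60% target scale out to five.
print(desired_replicas(3, 90, 60))
```

In a real cluster this calculation is done by the autoscaler itself; the sketch is only meant to show why a sustained metric above target adds Pods and a metric close to target changes nothing.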

DS STREAM Implementation with Kubernetes

For a client in the FMCG sector, DS Stream set up a web application on Azure Kubernetes Service (AKS) to make deep learning models easily accessible. We used Kubernetes’ horizontal pod autoscaling to ensure the app could handle high traffic smoothly, adjusting resources as needed.

In another project, DS Stream utilized Kubernetes namespaces to keep environments separate and allocate resources properly on Azure for model inferencing tasks. This method helped us manage resources efficiently and save costs through shared infrastructure.
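
As a rough illustration of namespace-based separation with per-environment resource limits, the sketch below builds a Namespace and a matching ResourceQuota as Python dictionaries and prints them as a single JSON `List`, a format that `kubectl apply -f` accepts. The namespace name and the CPU, memory, and GPU limits are hypothetical; the source does not describe the values DS Stream used.

```python
import json


def namespace_with_quota(name: str, cpu: str, memory: str, gpus: int) -> dict:
    """Build a v1 List holding a Namespace and a ResourceQuota scoped to it."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{name}-quota", "namespace": name},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                # Extended resources such as GPUs are limited via requests.<name>
                "requests.nvidia.com/gpu": str(gpus),
            }
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [namespace, quota]}


if __name__ == "__main__":
    # Hypothetical development environment for inference workloads
    manifest = namespace_with_quota("inference-dev", "4", "16Gi", 1)
    print(json.dumps(manifest, indent=2))
```

One namespace per environment, each with its own quota, lets several environments share a cluster while keeping any one of them from consuming all of its capacity.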

MLflow

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, including experimentation, reproducibility, and deployment.

Core Components:

  • MLflow Tracking: Records and queries experiments: code, data, config, and results.
  • MLflow Projects: A format for packaging data science code in a reusable and reproducible way.
  • MLflow Models: A format for packaging machine learning models to make them easy to deploy.
  • MLflow Model Registry: A centralized model store to collaboratively manage the full lifecycle of an MLflow Model.

Key Features:

  • Experiment Tracking: Log parameters, code versions, metrics, and output files.
  • Model Packaging: Package models in various "flavors" (e.g., scikit-learn, TensorFlow, PyTorch, or a generic Python function).
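  • Deployment: Deploy models to various platforms, including REST APIs, cloud services, and edge devices.

Example Usage:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: synthetic data stands in for the undefined X_train/y_train
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Set the experiment name
mlflow.set_experiment('my-experiment')

# Start a new run
with mlflow.start_run():
    # Train a model
    model = RandomForestRegressor(n_estimators=100)
    model.fit(X_train, y_train)

    # Log model parameters
    mlflow.log_param('n_estimators', 100)

    # Log model metrics (RMSE on the held-out split)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    mlflow.log_metric('rmse', rmse)

    # Log the model
    mlflow.sklearn.log_model(model, 'random-forest-model')
```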

Kubeflow

Kubeflow is an open-source platform designed to make deployments of machine learning workflows on Kubernetes simple, portable, and scalable.

Core Components:

  • Kubeflow Pipelines: A platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers.
  • Katib: A Kubernetes-native project for automated hyperparameter tuning.
  • KFServing (since renamed KServe): A system for serving machine learning models on Kubernetes, optimized for inference workloads.

Key Features:

  • Reproducible Pipelines: Create and manage portable, scalable ML workflows.
  • Hyperparameter Tuning: Automate the search for the best hyperparameters.
  • Model Serving: Deploy and serve models with high performance and scale.

Example Usage:

```python
from kfp import compiler, dsl

# Note: dsl.ContainerOp belongs to the KFP v1 SDK (kfp<2.0)
@dsl.pipeline(
    name='My Pipeline',
    description='An example pipeline'
)
def my_pipeline():
    # Define pipeline steps
    train_op = dsl.ContainerOp(
        name='Train Model',
        image='gcr.io/my-project/train-image:latest',
        arguments=['--model-dir', '/mnt/models']
    )
    serve_op = dsl.ContainerOp(
        name='Serve Model',
        image='gcr.io/my-project/serve-image:latest',
        arguments=['--model-dir', '/mnt/models']
    )
    # Run the serving step only after training has finished
    serve_op.after(train_op)

# Compile the pipeline
compiler.Compiler().compile(my_pipeline, 'my_pipeline.yaml')
```
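
The hyperparameter tuning that Katib automates can be pictured with a plain-Python random search, one of the search algorithms Katib supports. The sketch below does not use Katib's API: the `random_search` helper, the `toy_loss` objective, and the search ranges are all hypothetical stand-ins for a real training job that reports a validation loss.

```python
import random


def random_search(objective, space, trials=20, seed=0):
    """Sample `trials` random configurations and keep the lowest-loss one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(trials):
        # Draw each hyperparameter uniformly from its (low, high) range.
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_loss(params):
    # Hypothetical stand-in for a validation loss returned by a training run;
    # its minimum sits at learning_rate=0.1, dropout=0.2.
    return (params["learning_rate"] - 0.1) ** 2 + (params["dropout"] - 0.2) ** 2


space = {"learning_rate": (0.001, 0.5), "dropout": (0.0, 0.5)}
best_params, best_score = random_search(toy_loss, space, trials=200)
print(best_params, best_score)
```

On a cluster, each trial would typically run as its own training job rather than a function call, which is the part a tool like Katib takes care of.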

DS STREAM Implementation with Kubeflow

In a project to centralize operations on Google Cloud Platform (GCP), DS Stream used Kubeflow to enhance machine learning workflows. By implementing Kubeflow Pipelines, we automated the deployment of ML workflows, making them scalable and portable across different cloud setups.

Another project involved moving code to Kubeflow on GCP. This process included separating environments with Kubernetes namespaces, using CI/CD branching strategies on GitHub, running data validation checks, setting up Docker images, and allocating resources for model training. These steps ensured a smooth, repeatable transition from development to production.

Integrating MLOps Tools into FMCG Business Processes

Integrating MLOps tools into FMCG business processes involves several key steps:

Data Ingestion and Processing:

  • Use tools like Apache Kafka or Google Pub/Sub for real-time data ingestion.
  • Process data using Apache Beam or Spark, and store processed data in data warehouses like BigQuery or Snowflake.

Model Training and Experimentation:

  • Train models with TensorFlow and track experiments with MLflow.
  • Implement distributed training using Kubernetes and TensorFlow.
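
For multi-worker TensorFlow training on Kubernetes, each worker Pod needs a `TF_CONFIG` environment variable that describes the whole cluster and that Pod's own role. The sketch below builds that JSON with the standard library only. The host names follow a StatefulSet-style DNS pattern and are hypothetical, as are the port and the helper name.

```python
import json


def build_tf_config(worker_hosts, task_index):
    """Return the TF_CONFIG JSON string for one worker in the cluster."""
    if not 0 <= task_index < len(worker_hosts):
        raise IndexError("task_index out of range")
    return json.dumps({
        "cluster": {"worker": list(worker_hosts)},
        "task": {"type": "worker", "index": task_index},
    })


# Hypothetical Pod addresses for a three-worker training job
workers = [f"trainer-{i}.trainer.ml.svc.cluster.local:2222" for i in range(3)]
tf_config = build_tf_config(workers, 1)
print(tf_config)
```

Each Pod would export its own value as `TF_CONFIG` before the training script creates a `tf.distribute.MultiWorkerMirroredStrategy`, which reads the variable to find its peers.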

DS STREAM Implementation of Model Training:

DS Stream created an automated CI/CD pipeline with GitHub Actions to manage the deployment and scaling of worker pods in AKS. This setup made batch inferencing and model monitoring efficient, ensuring high performance and scalability while keeping costs low through smart resource use.

Model Deployment and Serving:

  • Deploy models using KFServing or TensorFlow Serving.
  • Use Kubernetes for managing containerized applications and ensuring scalability.
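
Once a model is behind TensorFlow Serving, clients usually reach it through its REST API, which expects a POST to `/v1/models/<name>:predict` with an `instances` array in the JSON body. The sketch below only constructs such a request and does not send it. The host name and model name are hypothetical, and port 8501 is assumed because it is TensorFlow Serving's conventional REST port.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and model name; one all-zero input with 784 features
request = build_predict_request("tf-serving.example.internal", "my_model",
                                [[0.0] * 784])
print(request.full_url)
# With a live server, the call would be: urllib.request.urlopen(request)
```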

DS STREAM Implementation of Model Deployment:

In one project, DS Stream prepared Docker images and set up resource allocation for GPU-based model inferencing on Azure. This approach ensured efficient deployment and scaling of worker pods based on incoming requests, resulting in significant cost savings compared to traditional managed endpoints.

Monitoring and Maintenance:

  • Implement monitoring using Prometheus and Grafana to track model performance.
  • Use MLflow and Kubeflow for continuous model validation and retraining.
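
Prometheus collects metrics by scraping a plain-text endpoint, so a model service only has to expose its numbers in that text format for Grafana to chart them. The sketch below renders a gauge by hand to show what the format looks like. The metric name and labels are hypothetical, label-value escaping is left out for brevity, and a real service would more likely rely on a client library such as `prometheus_client`.

```python
def render_gauge(name, help_text, samples):
    """Render gauge samples in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical latency metric for a demand-forecasting model
text = render_gauge(
    "model_prediction_latency_seconds",
    "Latency of the last prediction.",
    [({"model": "demand_forecast", "version": "3"}, 0.042)],
)
print(text)
```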

DS STREAM Implementation of Monitoring and Maintenance:

For model monitoring, DS Stream used OpenTelemetry to keep track of application performance and detect data drift. Automated processes for retraining and monitoring models ensured that the deployed models remained accurate and reliable over time.
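
The source does not say which drift statistic was used, so the sketch below shows one common choice, the Population Stability Index (PSI), purely as an illustration. It bins a baseline sample and a live sample over the baseline's range and sums `(actual - expected) * ln(actual / expected)` across bins. The function name, bin count, and sample data are assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a baseline sample and a live sample of one numeric feature."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("baseline sample has no spread")
    width = (hi - lo) / bins

    def fractions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range fall into the edge bins.
            idx = max(0, min(bins - 1, int((v - lo) / width)))
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    exp_frac, act_frac = fractions(expected), fractions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_frac, act_frac))


baseline = [i / 100 for i in range(100)]
shifted = [v + 0.3 for v in baseline]
print(population_stability_index(baseline, baseline))
print(population_stability_index(baseline, shifted))
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, which could serve as one possible trigger for an automated retraining job.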

Conclusion

For developers in the FMCG industry, mastering the tools and technologies of MLOps is essential for optimizing machine learning workflows and enhancing business operations. By leveraging TensorFlow, Kubernetes, MLflow, and Kubeflow, developers can build robust, scalable, and efficient ML systems that drive significant value for their organizations. Understanding the technical aspects and integration strategies of these tools will enable developers to streamline processes, reduce operational costs, and deliver high-quality ML solutions.

FAQ: Technical Guide to MLOps Tools and Technologies for FMCG Developers

1. What are the core components of TensorFlow used in MLOps for the FMCG industry?

  • TensorFlow Core: The primary API for building and training models.
  • TensorFlow Extended (TFX): An end-to-end platform for building and deploying production machine learning pipelines.
  • TensorFlow Serving: A high-performance serving system for production environments.
  • TensorFlow Lite: A lightweight solution for mobile and embedded devices.

2. How does Kubernetes help manage machine learning workflows in the FMCG sector?

  • Kubernetes automates the deployment, scaling, and management of containerized applications. It helps manage complex microservices architectures, supports horizontal pod autoscaling, and provides tools like Helm for packaging and deploying Kubernetes applications.

3. What is MLflow, and how is it used in MLOps for FMCG?

  • MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, including experimentation, reproducibility, and deployment. It offers components like MLflow Tracking for logging experiments, MLflow Projects for packaging code, and MLflow Models for packaging and deploying models.

4. What advantages does Kubeflow offer for FMCG machine learning workflows?

  • Kubeflow simplifies the deployment of machine learning workflows on Kubernetes. It provides components like Kubeflow Pipelines for building and deploying ML workflows, Katib for hyperparameter tuning, and KFServing for serving models with high performance and scalability.

5. How can FMCG developers integrate MLOps tools into business processes?

  • Developers can integrate MLOps tools by:
      • Using data ingestion tools like Apache Kafka or Google Pub/Sub.
      • Processing data with Apache Beam or Spark and storing it in data warehouses like BigQuery.
      • Training models with TensorFlow and tracking experiments with MLflow.
      • Deploying models with KFServing or TensorFlow Serving.
      • Monitoring performance with Prometheus and Grafana, and running continuous validation and retraining with MLflow and Kubeflow.

Author

Jakub Grabski

Kuba is a recent graduate in Engineering and Data Analysis from AGH University of Science and Technology in Krakow. He joined DS STREAM in June 2023, driven by his interest in AI and emerging technologies. Beyond his professional endeavors, Kuba is interested in geopolitics, techno music, and cinema.