Intro to ML Deployment with GKE

Deploying machine learning models – what is it about?

Leveraging data science and putting ML models into production can add value to a business, and more companies every day use data to operate more efficiently. Before deciding which tools are best for your company, make sure you understand what deployment actually means. Deploying a machine learning model means making it available in a production environment so that other systems can use it to generate predictions. This is quite a complex process.

A typical machine learning project starts with a business problem that can be solved using ML technology. The solution may involve developing a new product or enhancing an existing one with machine learning capabilities, for example in the form of a supervised learning model. After defining the project goals, company experts need to determine whether appropriate training and validation data are available. If they are, data scientists and machine learning engineers can step in and select the right model. Once the model has been trained, its predictions can be made available to other systems and users, and that is deployment. Learn more about using data models in production to take advantage of data science solutions for business.


Data science is an interdisciplinary field that takes considerable knowledge and experience to master, and deploying machine learning models is not an easy task. This article describes how data scientists and data engineers can deploy a machine learning model as a real-time inference microservice on Google Cloud Platform using Kubernetes, a container orchestration system for managing containerized applications. Google Kubernetes Engine (GKE) is used for its simplicity and because it reduces the operational cost of managing Kubernetes clusters.

If Docker and Python Flask sound unfamiliar, it helps to skim through those topics first; both are also explained briefly later in this article.

Deployment of machine learning model – example

The machine learning model used for this deployment is a binary classifier trained on census data. Given an individual's personal information such as age, work class, education and marital status, the model predicts whether that person's annual income is over $50k.

For demonstration purposes, a microservice directory ml-census containing the required files has been created as follows:


  • Dockerfile
  • requirements.txt
  • model_deployment.yaml

With this particular model, it is easier to walk through the whole deployment process and to discuss the details in the remaining sections of this article.

Containerizing machine learning models 

Deploying a machine learning model requires specific tools. A model depends on a particular set of libraries and often needs a lot of processing power, so it has to run in an environment that is reproducible and easy to scale. Containerizing machine learning models is an approach that helps here, and every part of a machine learning system (including model training, testing and serving) can be containerized. One of the most popular tools is Docker, which enables developers to containerize their code efficiently. For more on how to containerize applications on GCP using Docker, you may find this tutorial helpful.


A Docker container image is a file that consists of the source code and all the dependencies the code requires to run. An image is sometimes described as a snapshot of an application and its virtual environment at the moment the snapshot was taken.

The image is built from a Dockerfile, a blueprint containing a set of instructions.

FROM python:3.7-slim-buster
COPY . /app
WORKDIR /app
RUN pip install --upgrade pip && \
    pip install -r /app/requirements.txt --no-cache-dir

First, the existing image python:3.7-slim-buster is pulled from Docker Hub as a base image on which the next layers are built. The contents of the microservice directory, including requirements.txt, are copied to the folder /app in the image. Then the Python modules required by the source code are installed in the versions defined in requirements.txt. The working directory is set to /app when the image runs. Once the final image has been built and tagged as, e.g., {project_id}/model_deployment:latest, it should be pushed to the Google Container Registry, which is accessible to GKE.

Source Code

Machine learning models are not code: a machine learning model consists of a given algorithm together with weights learned from data and hyperparameters chosen during training. Once the models have been validated and reviewed, they are ready to be wrapped in source code that serves their predictions.

The source code uses Flask, a Python micro web framework for developing RESTful API services. Before the machine learning model can be accessed for predictions, some authentication should be set up for security, as below.

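import datetime
import jwt
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/login')
def login():
    # Expect HTTP Basic credentials; users and secret_key are loaded at startup
    auth = request.authorization
    if auth and auth.username.lower() in users \
            and auth.password == users[auth.username.lower()]:
        token = jwt.encode({'user': auth.username, 'exp': datetime.datetime.utcnow() + datetime.timedelta(seconds=720)}, secret_key)
        # PyJWT 1.x returns bytes here; PyJWT 2.x already returns a str
        return jsonify({'token': token.decode('utf-8')})
    return jsonify({'error message': 'your password or username is invalid'}), 401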
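
To make the token less opaque, here is a minimal standard-library sketch of what an HS256-signed token with an expiry time looks like. It is a stand-in for illustration only, not the PyJWT implementation used by the microservice; the helper names, the secret demo-secret and the fixed timestamp are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url after restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    """HMAC-SHA256 over 'header.payload' using the shared secret."""
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"),
                    hashlib.sha256).digest()


def encode_token(payload: dict, secret: str) -> str:
    """Build header.payload.signature (hypothetical helper)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def decode_token(token: str, secret: str, now: float = None) -> dict:
    """Return the payload, or raise ValueError if forged or expired."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(header_b64 + "." + payload_b64, secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] < (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload


issued_at = 1_000_000  # fixed timestamp so the example is reproducible
token = encode_token({"user": "census", "exp": issued_at + 720}, "demo-secret")
print(decode_token(token, "demo-secret", now=issued_at + 10))
# {'user': 'census', 'exp': 1000720}
```

The payload is only encoded, not encrypted, so anyone holding the token can read the user name; the signature merely proves that the token was issued by a holder of the secret key.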

After a successful login, the user name and an expiry time are encoded into a signed token. The client sends this token with later requests, where it is checked by the decorator token_required. Any Python function wrapped by this decorator is only available if the token is valid.

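from functools import wraps

def token_required(f):
    @wraps(f)
    def authenticate(*args, **kwargs):
        token = request.headers.get('Authorization')
        if not token:
            return jsonify({'error message': 'Token is missing!'}), 403
        try:
            # Raises if the signature does not match or the token has expired
            data = jwt.decode(token, secret_key, algorithms=['HS256'])
        except jwt.InvalidTokenError:
            return jsonify({'error message': 'Token is invalid!'}), 403
        return f(*args, **kwargs)
    return authenticate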
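
The decorator pattern can be tried without Flask. The sketch below is a simplified illustration under stated assumptions: the request headers are passed in as a plain dict, and the set VALID_TOKENS is a hypothetical stand-in for a successful jwt.decode call.

```python
from functools import wraps

VALID_TOKENS = {"demo-token"}  # hypothetical stand-in for jwt.decode succeeding


def token_required(handler):
    """Run the handler only when the request carries a known token."""
    @wraps(handler)  # keep the handler's name and docstring
    def authenticate(headers, *args, **kwargs):
        token = headers.get("Authorization")
        if not token:
            return {"error message": "Token is missing!"}, 403
        if token not in VALID_TOKENS:
            return {"error message": "Token is invalid!"}, 403
        return handler(headers, *args, **kwargs)
    return authenticate


@token_required
def predict(headers, inputs):
    """Placeholder for the model call; reports how many rows it received."""
    return {"rows": len(inputs)}, 200


print(predict({}, [[1, 2]]))
# ({'error message': 'Token is missing!'}, 403)
print(predict({"Authorization": "nope"}, [[1, 2]]))
# ({'error message': 'Token is invalid!'}, 403)
print(predict({"Authorization": "demo-token"}, [[1, 2], [3, 4]]))
# ({'rows': 2}, 200)
```

Using functools.wraps keeps each wrapped function's own name, which matters in Flask because route endpoints are named after the view function by default.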

Therefore, the machine learning model can provide its inference service if the authentication has been successful. 


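@app.route('/prediction', methods=['POST'])
@token_required
def predict_output():
    # Expects a JSON body of the form {"inputs": [...]}
    input_request = request.get_json()
    outputs = ml_model.predict(input_request['inputs']).tolist()
    # Translate numeric class indices into readable labels
    outputs = [prediction_decoding[output] for output in outputs]
    return jsonify({'predictions': outputs})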

if __name__ == '__main__':

    # config.yaml is provided by the ConfigMap mounted at /configuration
    with open('/configuration/config.yaml', 'r') as config_file:
        cfg = yaml.load(config_file, Loader=yaml.FullLoader)

    # file_io.FileIO reads the gs:// paths listed in the configuration
    with file_io.FileIO(cfg['model_path'], 'rb') as infile:
        ml_model = joblib.load(infile)

    with file_io.FileIO(cfg['users'], 'r') as infile:
        users = json.load(infile)

    with file_io.FileIO(cfg['prediction_decoding'], 'r') as infile:
        prediction_decoding = json.load(infile)
        prediction_decoding = {int(key): value for key, value in prediction_decoding.items()}

    with file_io.FileIO(cfg['secret'], 'r') as infile:
        secret_key = infile.read()

    # Listen on all interfaces so the container port is reachable from outside
    app.run(host='0.0.0.0', port=5090, debug=False)

The global variables defined under if __name__ == '__main__' come from config.yaml, which is generated by a ConfigMap mounted into the Docker container that runs the source code above. Hence, any configuration data or secret keys can be kept separate from the image at this stage of machine learning model deployment. The full source code and remaining relevant files can be found in this GitHub repository.
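
One detail of the startup code is easy to overlook: the keys of prediction_decoding are converted to integers. The sketch below shows why, using hypothetical file contents; which class index maps to which label is an assumption made for illustration.

```python
import json

# Hypothetical contents of prediction_decoding.json (the index-to-label mapping is assumed)
raw = '{"0": "Annual Salary <50K", "1": "Annual Salary >50K"}'

prediction_decoding = json.loads(raw)
# JSON object keys are always strings, while the model returns integer class indices
prediction_decoding = {int(key): value for key, value in prediction_decoding.items()}

model_outputs = [0, 0, 1, 0]  # stands in for ml_model.predict(...).tolist()
labels = [prediction_decoding[output] for output in model_outputs]
print(labels)
# ['Annual Salary <50K', 'Annual Salary <50K', 'Annual Salary >50K', 'Annual Salary <50K']
```

Without the conversion, looking up the integer 0 in a dictionary keyed by the string "0" would raise a KeyError.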

Google Kubernetes Engine for Machine Learning Model Deployment

A popular approach to deploying machine learning models into a production environment is to expose them as RESTful API microservices, which can be deployed in the Google Cloud environment. Kubernetes is increasingly used in data science and is well suited to this task.

GKE ConfigMap

To deploy the census prediction microservice on GKE, a GKE cluster first needs to be created. Navigate to the Kubernetes Engine section on GCP and create a cluster named census-ai, selecting version 1.18.17-gke.1901. Then enable basic authentication and issue a client certificate in the security section. Alternatively, create the GKE cluster by running the following commands in the Cloud Shell terminal.

$ gcloud config set project {project-id}

$ gcloud container clusters create census-ai --cluster-version 1.18.17-gke.1901 --machine-type n1-standard-1 \
    --num-nodes 1 --issue-client-certificate --enable-basic-auth --zone {zone-name}


Once the cluster has been provisioned, run the next command in Cloud Shell to fetch the cluster's credentials so that kubectl can connect to it.

$ gcloud container clusters get-credentials census-ai --zone {zone-name}

Then check that the cluster is reachable by using kubectl, the Kubernetes command-line tool.

$ kubectl get nodes
NAME                                       STATUS   ROLES    AGE    VERSION
gke-census-ai-default-pool-ae7c12c2-dq0r   Ready    <none>   112m   v1.18.17-gke.1901

Prior to model deployment, the GKE resources need to be specified in a YAML file. ConfigMaps and Secrets are Kubernetes objects that store configuration data as key-value pairs. They separate configuration artifacts from Docker images in order to keep containerized apps portable. Unlike ConfigMaps, Secrets hold base64-encoded values and are intended for confidential information such as credentials. To keep things simple, only one ConfigMap is used here.

    model_path: gs://{bucket_name}/model_deployment/model.pkl
    users: gs://{bucket_name}/model_deployment/users.json
    secret: gs://{bucket_name}/model_deployment/secret.txt
    prediction_decoding: gs://{bucket_name}/model_deployment/prediction_decoding.json


The above ConfigMap, named census-config, is defined at the very beginning of model_deployment.yaml. Setting the field kind to ConfigMap creates that Kubernetes configuration object. The field data then generates config.yaml, which contains the unencrypted parameters that the source code reads at runtime.
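
As a rough sketch of how the complete object fits together, the ConfigMap can be assembled as a Python dictionary and printed as JSON, a format that kubectl apply -f accepts alongside YAML. The field values come from the description above; the variable names are assumptions, and this is not the article's model_deployment.yaml itself.

```python
import json

BUCKET = "{bucket_name}"  # placeholder kept exactly as in the article

# The four settings that the source code reads from /configuration/config.yaml
config_yaml = "\n".join([
    f"model_path: gs://{BUCKET}/model_deployment/model.pkl",
    f"users: gs://{BUCKET}/model_deployment/users.json",
    f"secret: gs://{BUCKET}/model_deployment/secret.txt",
    f"prediction_decoding: gs://{BUCKET}/model_deployment/prediction_decoding.json",
]) + "\n"

config_map = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "census-config"},
    # Each key under data becomes a file when the ConfigMap is mounted as a volume
    "data": {"config.yaml": config_yaml},
}

print(json.dumps(config_map, indent=2))
```

Because the whole YAML document is stored as a single string under the key config.yaml, mounting the ConfigMap at /configuration yields the file /configuration/config.yaml that the source code opens.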


Next, model_deployment.yaml defines a Pod specification. Pods are the Kubernetes wrappers around Docker containers. Pods are created by setting the field kind to Deployment, which is a blueprint for building Pods. The Deployment is named ml-census to distinguish it from other Deployments, and its label is an arbitrary key-value pair used to map the Deployment to the Pods carrying the same key-value pair. The spec field states that the Deployment runs one Pod and identifies which Pods to manage by matching that label.


      volumes:
      - name: configuration
      containers:
      - name: census-app
        image: {project_id}/model_deployment:latest
        command: ["python", ""]
        ports:
        - containerPort: 5090
        volumeMounts:
        - name: configuration


The next field is the Pod template, which carries the same key-value label as the Deployment ml-census. The Pod template's specification declares a volume of configuration data. In this case, the volume comes from the ConfigMap census-config and is mounted at the path /configuration in the container census-app. In general, the volumeMounts field makes ConfigMap and Secret objects in a Pod accessible to its containers. The container census-app uses the custom image built earlier and stored in the Google Container Registry to launch the source code on port 5090 at startup. Note that ConfigMap and Secret resources need to exist before the Pods that mount them are created.
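
The relationship between the Deployment, its selector and the Pod template is easier to see when the whole object is laid out. The following sketch builds it as a Python dictionary from the names given in this article; the label app: census is the one the Service matches on, and the script name in SCRIPT is hypothetical because the entry-point file is not named here.

```python
import json

LABELS = {"app": "census"}
SCRIPT = "app.py"  # hypothetical: the article does not name the entry-point script

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "ml-census", "labels": LABELS},
    "spec": {
        "replicas": 1,  # the Deployment runs one Pod
        "selector": {"matchLabels": LABELS},
        "template": {
            "metadata": {"labels": LABELS},
            "spec": {
                # Volume backed by the ConfigMap census-config
                "volumes": [{"name": "configuration",
                             "configMap": {"name": "census-config"}}],
                "containers": [{
                    "name": "census-app",
                    "image": "{project_id}/model_deployment:latest",
                    "command": ["python", SCRIPT],
                    "ports": [{"containerPort": 5090}],
                    # Makes config.yaml appear under /configuration
                    "volumeMounts": [{"name": "configuration",
                                      "mountPath": "/configuration"}],
                }],
            },
        },
    },
}

# A Deployment only manages Pods whose labels match its selector
selector = deployment["spec"]["selector"]["matchLabels"]
assert selector == deployment["spec"]["template"]["metadata"]["labels"]

print(json.dumps(deployment, indent=2))
```

The volume name configuration appears twice on purpose: once where the volume is declared and once where the container mounts it, and the two must match.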

GKE Service

Besides the Deployment, a Service is another Kubernetes resource defined in model_deployment.yaml. GKE clusters assign a distinct, ephemeral cluster IP to each Pod, so every time a Pod breaks down and recovers, it gets a new IP address. The Service object provides a stable IP address (for the lifetime of the Service) through which its Pods can be reached. This way a recovered Pod remains reachable and can continue to expose its microservice.

    - port: 90
      targetPort: 5090

As shown above, the field kind is set to Service in order to create one. This Service object is linked to the Pod that carries the same label, app: census. Setting the Service type to LoadBalancer makes the microservice externally available. The Service is exposed on port 90 and forwards incoming requests to the matched Pod, which listens on port 5090.
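
Put together, the Service described here could look like the sketch below, again written as a Python dictionary. The helper service_url is hypothetical and only shows how the client-facing address follows from the Service port; the IP address is a documentation placeholder, not a real endpoint.

```python
import json

service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "ml-census-service"},
    "spec": {
        "type": "LoadBalancer",          # request an external IP address
        "selector": {"app": "census"},   # route traffic to Pods with this label
        "ports": [{"port": 90, "targetPort": 5090}],
    },
}


def service_url(external_ip: str, manifest: dict) -> str:
    """Build the base URL clients call (hypothetical helper)."""
    port = manifest["spec"]["ports"][0]["port"]
    return f"http://{external_ip}:{port}/"


print(json.dumps(service, indent=2))
print(service_url("203.0.113.10", service))
# http://203.0.113.10:90/
```

Clients only ever see port 90; the container port 5090 stays an internal detail that can change without affecting callers.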



Once the above Kubernetes resources have been defined, the machine learning model can be deployed by running the following command in Cloud Shell:

$ kubectl apply -f model_deployment.yaml

To verify that the Kubernetes objects have been created successfully, run the following commands:

$ kubectl get configmap
NAME               DATA   AGE
census-config      1      2m23s
kube-root-ca.crt   1      5m2s

$ kubectl get pod
NAME                         READY   STATUS    RESTARTS   AGE
ml-census-6b8bd8d887-dpnv5   1/1     Running   0          4m10s

$ kubectl get service
NAME                TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
kubernetes          ClusterIP                   <none>        443/TCP        7m38s
ml-census-service   LoadBalancer                              90:30763/TCP   4m43s

ML Census Microservice

To get predictions from the ML census microservice, authentication is required. Using the external IP address and port exposed by the Kubernetes Service object, a token can be obtained.

import requests
endpoint = ''  # base URL of the Service (external IP and port), ending with '/'
response = requests.get(endpoint+'login', auth=('census', 'census'))


Once the token has been generated, it can be passed along with a single prediction request.

headers = {'Authorization': response.json()['token']}
input = [...]  # census feature values of one individual
single_prediction = requests.post(endpoint+'prediction', headers=headers,
  json={'inputs': input})

Output: {'predictions': ['Annual Salary <50K']}

In addition, a batch prediction is possible as well.

inputs = [...]  # census feature values of several individuals
batch_prediction = requests.post(endpoint+'prediction', headers=headers,
  json={'inputs': inputs})

Output: {'predictions': ['Annual Salary <50K',
 'Annual Salary <50K',
 'Annual Salary >50K',
 'Annual Salary <50K']}
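
Because the token issued by login is valid for 720 seconds, a long-running client needs to log in again from time to time. The sketch below is one possible way to check this on the client side, assuming the token is a standard three-part JWT whose exp claim is a Unix timestamp; the helper names, the 30-second margin and the fake token are assumptions made for illustration.

```python
import base64
import json
import time


def token_expiry(token: str) -> float:
    """Read the 'exp' claim from a JWT payload without verifying the signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))["exp"]


def needs_refresh(token: str, now: float = None, margin: float = 30.0) -> bool:
    """True when the token expires within `margin` seconds (or already has)."""
    now = time.time() if now is None else now
    return token_expiry(token) - now <= margin


def _fake_token(exp: int) -> str:
    """Build an unsigned token for the demo only; a real one comes from login."""
    body = json.dumps({"user": "census", "exp": exp}).encode("utf-8")
    return "header." + base64.urlsafe_b64encode(body).rstrip(b"=").decode("ascii") + ".signature"


issued_at = 1_000_000  # fixed timestamp so the example is reproducible
token = _fake_token(issued_at + 720)
print(needs_refresh(token, now=issued_at + 60))   # False
print(needs_refresh(token, now=issued_at + 700))  # True
```

When needs_refresh returns True, the client would call the login endpoint again and rebuild the Authorization header before sending further prediction requests. The server still validates every token itself, so this check only saves a failed round trip.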

Once the ML census model has been deployed and tested, many other microservices can be added as extra features. GKE is able to manage multiple pods and allocate additional resources if needed.

Do you wonder how to put machine learning models into production, or would you like to learn more about different technologies (like deep learning, logistic regression, real-time analytics and others) and ways to use data for your company's benefit? Our data science experts can explain why deploying machine learning models is beneficial for your company. Leveraging machine learning can help you become more competitive. We understand the complexity of running a business, which is why our data scientists can advise you on the best solutions for your company.
