Model Deployment to AKS
AI Azure Machine Learning Clemens Siebler  

Deploying Machine Learning Models to Azure Kubernetes Service

In last week’s post, we looked at how to use Automated Machine Learning to build and deploy models as APIs. This week, we’ll take the same model we trained, but deploy it to Azure Kubernetes Service (AKS). Machine Learning model deployment is a critical task, and AKS is the recommended way to run Machine Learning models in production on Azure.

Why, you ask? Out of the box, this provides several benefits, such as:

  • Authentication for our deployed API
  • Data Collection of input data and predictions to Azure Blob (this helps us to further re-train our model and detect e.g. data drift)
  • Monitoring of our model using Application Insights
  • Fine-granular autoscaling options
  • And last but not least, re-use of existing Kubernetes resources (we’re probably running our app on AKS anyway)

In detail, we’ll be building out the following architecture:

Model Deployment to AKS

We’ll be

  • using Azure Machine Learning workspace to train and register our model (already done in this post)
  • pushing the containerized model to Azure Container Registry (this time, we’ll add features like Data Collection)
  • deploying the model to Azure Kubernetes Service (AKS)
  • analyzing data input and model output in Azure and general model telemetry in Application Insights

Let’s get started!

Preparing our Model Container

To get started, let’s first import all necessary packages and authenticate to our existing Workspace from the last post:

from azureml.core import Workspace
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.webservice import Webservice, AksWebservice
from azureml.core.model import Model

import azureml.core

ws = Workspace.from_config()

As pointed out before, we want to re-use the model we’ve trained and registered in the last post. For this, we need the model name. We can find it either in the Portal in our Workspace under Models:

Model in the Portal

Alternatively, we can retrieve all registered models in a workspace directly in Python:

models = Model.list(ws)

Once we’ve found the correct one, we can reference it in our code:

model = Model(ws, name="AutoML811537828best")
print(model.name, model.id, model.version, sep = '\t')

For packaging our model into a container, we need to create our scoring script and define the Conda environment. Sure, we could re-use the one from the last post, but since we’ll be adding the Data Collection feature, we need to adapt it a bit. We save the following script as score.py:

import pickle
import json
import pandas as pd
import time
import azureml.train.automl
from sklearn.externals import joblib
from azureml.core.model import Model
from azureml.monitoring import ModelDataCollector

def init():
    global model
    model_name = 'AutoML811537828best'
    print ("Initializing model at " + time.strftime("%H:%M:%S"))
    model_path = Model.get_model_path(model_name = model_name)
    model = joblib.load(model_path)
    global inputs_dc, prediction_dc
    inputs_dc = ModelDataCollector(model_name, identifier="inputs", feature_names=["Text"])
    prediction_dc = ModelDataCollector(model_name, identifier="predictions", feature_names=["prediction"])

def run(rawdata):
    try:
        data = json.loads(rawdata)['text']
        result = model.predict(pd.DataFrame(data, columns = ['Text']))
        print("Saving input data and output prediction at " + time.strftime("%H:%M:%S"))
        inputs_dc.collect(data)
        prediction_dc.collect(result)
    except Exception as e:
        result = str(e)
        return json.dumps({"error": result})
    return json.dumps({"result": result.tolist()})

The code should look very familiar, except that we have added the ModelDataCollector import. This allows us to persist model data input and associated predictions to Azure Blob.

Since we will be using the Data Collection feature, we also need to include the azureml-monitoring package in our Conda environment. Furthermore, the azureml-defaults package is required for deployments to Kubernetes:

from azureml.core.conda_dependencies import CondaDependencies

cd = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],
                              pip_packages=['azureml-defaults', 'azureml-train-automl', 'azureml-monitoring'])

conda_env_file_name = 'automl-sentiment-dc-env.yml'
cd.save_to_file('.', conda_env_file_name)

Lastly, we can build the container image and have it pushed to the Azure Container Registry in our Workspace:

from azureml.core.image import ContainerImage

image_config = ContainerImage.image_configuration(execution_script = "score.py",
                                                  runtime = "python",
                                                  conda_file = conda_env_file_name,
                                                  description = "IMDB Sentiment with Data Collection",
                                                  tags = {'datasource': "imdb"})

image = ContainerImage.create(name = "imdb-sentiment-with-dc",
                              models = [model],
                              image_config = image_config,
                              workspace = ws)

image.wait_for_creation(show_output = True)

Once completed, we can create our AKS cluster.

Creating an AKS Cluster

Next, we want to create our Kubernetes cluster. In this case, we’ll stay with the default config.

aks_name = 'aksautomlclemens' 

cts = ws.compute_targets

if aks_name in cts and cts[aks_name].type == 'AKS':
    print('Found existing AKS cluster, will use it!')
    aks_target = cts[aks_name]
else:
    print('Creating a new AKS cluster...')
    prov_config = AksCompute.provisioning_configuration()
    aks_target = ComputeTarget.create(workspace = ws,
                                      name = aks_name,
                                      provisioning_configuration = prov_config)
    print('Waiting for cluster creation completion...')
    aks_target.wait_for_completion(show_output = True)

print('Cluster state:', aks_target.provisioning_state)
print('Cluster is ready!', aks_target)

In contrast to using Azure Container Instances, deploying to AKS gives us a lot more knobs to configure. Have a look at the documentation here. It is worth noting that for production setups, AKS needs to be configured with at least 3 nodes, running at least a total minimum of 12 vCores. Since the above default configuration will deploy 3 agent nodes of type Standard_D3_v2 (4 vCores), we’ll end up with a total of 3*4=12 vCores. Perfect!

For test/dev deployments, we can specify cluster_purpose in AksCompute.provisioning_configuration and set it to AksCompute.ClusterPurpose.DEV_TEST (see the documentation). This allows us to also deploy clusters with fewer vCores; however, a minimum of 2 vCores in total is recommended.
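Put together, a dev/test provisioning call might look like this. This is a sketch: the VM size, node count, and cluster name below are illustrative assumptions, not values from the deployment above.

```python
from azureml.core.compute import AksCompute, ComputeTarget

# DEV_TEST purpose relaxes the 12-vCore production requirement
prov_config = AksCompute.provisioning_configuration(
    cluster_purpose = AksCompute.ClusterPurpose.DEV_TEST,
    vm_size = "Standard_D2_v2",  # 2 vCores per node
    agent_count = 1)             # a single node meets the 2-vCore minimum

aks_dev_target = ComputeTarget.create(workspace = ws,
                                      name = 'aks-dev-test',
                                      provisioning_configuration = prov_config)
aks_dev_target.wait_for_completion(show_output = True)
```

Keep in mind that dev/test clusters are not supported for production workloads.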

Deploying our Model Image

Now that we have our AKS cluster running and our model containerized, we can finally start our model deployment:

aks_config = AksWebservice.deploy_configuration(collect_model_data=True,
                                                enable_app_insights=True)

aks_service_name ='sentiment-api-with-data-coll'

wss = Webservice.list(workspace = ws, compute_type='AKS')

if any(webservice.name == aks_service_name for webservice in wss):
    print('Model with same name already deployed')
else:
    if aks_target.provisioning_state == "Succeeded":
        aks_service = Webservice.deploy_from_image(workspace = ws,
                                                   name = aks_service_name,
                                                   image = image,
                                                   deployment_config = aks_config,
                                                   deployment_target = aks_target)
        aks_service.wait_for_deployment(show_output = True)
        print('Service state:', aks_service.state)
        print('Service details:', aks_service)
    else:
        raise ValueError("Failed to deploy service to AKS - Error: ", aks_target.provisioning_errors)

In our AksWebservice.deploy_configuration, we explicitly enable data collection as well as Application Insights for telemetry collection. As our cluster is already up and running, deployment will be much quicker than spinning up an Azure Container Instance!
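The same deploy configuration also exposes the fine-granular autoscaling options mentioned at the beginning. A sketch with illustrative values (the replica counts, utilization target, and resource limits are assumptions, not tuned settings):

```python
from azureml.core.webservice import AksWebservice

aks_config = AksWebservice.deploy_configuration(
    collect_model_data = True,           # Data Collection to Azure Blob
    enable_app_insights = True,          # telemetry to Application Insights
    autoscale_enabled = True,
    autoscale_min_replicas = 1,
    autoscale_max_replicas = 4,
    autoscale_target_utilization = 70,   # scale out above ~70% utilization
    cpu_cores = 1,
    memory_gb = 1)
```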

Analyzing our Results

First, let’s validate that our API is working properly. When deploying to AKS, the model is automatically protected by an API key, which we can regenerate using the Portal or some Python code (aks_service.regen_key('Primary') or 'Secondary'):

import requests
import json

key1, key2 = aks_service.get_keys()

headers = {'Content-Type':'application/json',
           'Authorization': 'Bearer ' + key1}

data = {"text": ['the food was horrible',
                 'wow, this movie was truely great, I totally enjoyed it!',
                 'why the heck was my package not delivered on time?']}

resp = requests.post(aks_service.scoring_uri, json=data, headers=headers)
print("Prediction Results:", resp.json())

The results look promising:

Prediction Results: {"result": ["0", "1", "0"]}
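Since the API returns the raw class labels as strings, a small helper can make the output human-readable. This is a sketch: pretty_print_predictions is a hypothetical helper, and the mapping of "1" to positive is inferred from the example predictions above.

```python
import json

def pretty_print_predictions(texts, response_json):
    """Pair each input text with its predicted sentiment label."""
    labels = {"0": "negative", "1": "positive"}  # assumption, inferred from the examples
    results = json.loads(response_json)["result"]
    return [(text, labels[r]) for text, r in zip(texts, results)]

texts = ['the food was horrible',
         'wow, this movie was truely great, I totally enjoyed it!',
         'why the heck was my package not delivered on time?']

for text, label in pretty_print_predictions(texts, '{"result": ["0", "1", "0"]}'):
    print(f'{label:>8}: {text}')
```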

Model Telemetry

In the Azure Portal, we can view the model telemetry in the associated Application Insights instance. Per default, this instance will have the same name as your Workspace, unless you have been re-using an existing Application Insights instance. Once we open up App Insights, we’ll see a high level overview for our deployed API, featuring failed requests, response time, number of requests and availability:

Application Insights Telemetry

From here, we can explore the Application Map. However, unless we re-use Application Insights for multiple services (e.g., a web app calling our API), it won’t look very exciting at this point.

More interesting is querying Log Analytics. On the main screen of Application Insights, select Logs (Analytics), enter traces as the query, set the time range, and hit Run:

Model logs in Log Analytics

This allows us to view and analyze the logs that our containerized model writes to stdout and stderr.

Perfect, now we know general telemetry metrics of our model and can also perform some troubleshooting in case things go south. This should cover us from the infrastructure side, so let’s look into data collection of our model next.

Data Collection for our Model

Since we’ve enabled data collection for our model, all input data and prediction results will now get logged to Azure Blob. However, there is a 15 minute delay until data shows up – keep that in mind when testing!

So where is the data stored? By default, it ends up in the default Storage Account, which was created during the creation of the Workspace. We can find the input data and predictions under:

  • modeldata container
  • subscription_id
  • resource_group
  • workspace_name
  • webservice_name
  • model_name –> model_version
  • identifier (e.g., inputs and predictions in our case)
  • year –> month –> day
  • data.csv

Data Collection to Azure Blob
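Because the path components above are deterministic, we can assemble the blob prefix in code when fetching collected files programmatically. This is a sketch with placeholder names: collected_data_prefix is a hypothetical helper, and the zero-padding of month and day may differ from what the service actually writes.

```python
from datetime import date

def collected_data_prefix(subscription_id, resource_group, workspace_name,
                          webservice_name, model_name, model_version,
                          identifier, day=None):
    """Build the blob path under the 'modeldata' container where
    Data Collection stores its daily data.csv files."""
    day = day or date.today()
    return "/".join([subscription_id, resource_group, workspace_name,
                     webservice_name, model_name, str(model_version), identifier,
                     f"{day.year}", f"{day.month:02d}", f"{day.day:02d}", "data.csv"])

# Placeholder values - substitute your own subscription, resource group, etc.
prefix = collected_data_prefix("my-subscription-id", "my-rg", "my-workspace",
                               "sentiment-api-with-data-coll", "AutoML811537828best",
                               1, "inputs", date(2019, 8, 30))
print(prefix)
```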

Our input data has the following format:

2019-08-30T14:31:52.112225,,20392dae-cc2a-476a-b61b-3c92f65823d8,the food was horrible,"wow, this movie was truely great, I totally enjoyed it!",why the heck was my package not delivered on time?
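Such a row can be parsed with Python’s csv module, which handles the quoted text field correctly. A small sketch (the exact column layout, e.g. the empty second field, may vary between SDK versions):

```python
import csv
import io

row = ('2019-08-30T14:31:52.112225,,20392dae-cc2a-476a-b61b-3c92f65823d8,'
       'the food was horrible,'
       '"wow, this movie was truely great, I totally enjoyed it!",'
       'why the heck was my package not delivered on time?')

fields = next(csv.reader(io.StringIO(row)))
timestamp, _, correlation_id = fields[:3]  # first three metadata columns
texts = fields[3:]                          # the collected input features

print(timestamp)       # 2019-08-30T14:31:52.112225
print(correlation_id)  # 20392dae-cc2a-476a-b61b-3c92f65823d8
print(texts)           # the three input texts
```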

Our result predictions follow the same layout.


By having access to the model input data, we can now study our input data, adapt and retrain our model, or monitor data drift, making sure that we always have a high-quality model running in production!

Last but not least, we can run aks_service.delete() to delete our API service and aks_target.delete() to delete our Kubernetes cluster.


Machine Learning model deployment to Azure Kubernetes Service is a robust way to run Machine Learning models on Azure in production and at scale. Features like authentication, data collection of input data and prediction results, and rich monitoring capabilities even enable use in enterprise scenarios.

If you have any questions or want to give feedback, feel free to reach out via Twitter or just comment below.
