Anomaly Detection on Time-Series Data with Azure

Introduction

Anomaly detection on time-series data is a crucial component of many modern systems, such as predictive maintenance, security applications, or sales performance monitoring. It lets us detect events that look suspicious or fall outside the distribution of the majority of the data points.

This post explains how to perform anomaly detection on time-series data with Azure. We’ll walk through several examples of how different underlying data patterns affect the actual detection of anomalies.

Why Anomaly Detection?

Detecting anomalies in time-series data is not only tedious but often also very difficult for humans, because the anomalies can be hard to spot due to small nuances. Anomaly detection, also known as outlier detection, can therefore be applied to many different applications. With the Anomaly Detector API service in Azure, we can, for example, address the following challenges:

  • Predictive Maintenance – Detect anomalies in machine generated data, e.g., detect unusual sensor readings to prevent the failure of a machine
  • Security – Detect unusual behaviour patterns, e.g., a user logging in significantly more often from a known or unknown location
  • Sales Monitoring – Detect unusual sales patterns of, e.g., a certain product (maybe due to faulty pricing by marking it “too cheap” or “too pricey” by accident)
  • Operations Monitoring – Detect unusual event patterns in IT infrastructure, e.g., transactions on a database or number of error logs

Python Example

The Anomaly Detector API documentation provides good quickstart examples for several languages, including Python. The API lets us adjust the sensitivity of the detection and supports both batch processing and real-time analysis of new data points.

Let’s have a look at a minimal example for batch processing:
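A minimal sketch of such a batch call, using only the Python standard library, could look as follows. It assumes the v1.0 REST route for whole-series detection and key-based authentication; the endpoint and key are placeholders, and the helper names `build_batch_request` and `detect_batch` are hypothetical, chosen for illustration only.

```python
import json
import urllib.request

# Placeholders: replace with the endpoint and key of your own resource.
ENDPOINT = "https://your-resource.cognitiveservices.azure.com"
API_KEY = "your-subscription-key"

# Assumed v1.0 REST route for detecting anomalies over an entire series.
BATCH_PATH = "/anomalydetector/v1.0/timeseries/entire/detect"


def build_batch_request(payload, endpoint=ENDPOINT, key=API_KEY):
    """Build (but do not send) the POST request for batch detection."""
    return urllib.request.Request(
        url=endpoint.rstrip("/") + BATCH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )


def detect_batch(payload, endpoint=ENDPOINT, key=API_KEY):
    """Send the whole series in one call and return the parsed JSON reply."""
    request = build_batch_request(payload, endpoint, key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only inspect the request here; calling detect_batch needs real credentials.
    demo = build_batch_request({"granularity": "daily", "series": []})
    print(demo.get_method(), demo.full_url)
```

Splitting request construction from sending keeps the network call in one place, so the payload and headers can be checked without touching the service.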

Our source data comes from this example and has the following format:

{
  "granularity": "daily",
  "series": [
    {
      "timestamp": "2018-03-01T00:00:00Z",
      "value": 32858923
    },
    {
      "timestamp": "2018-03-02T00:00:00Z",
      "value": 29615278
    },
    ...
  ]
}

Optionally, we can specify the period, if we know the underlying seasonality of the data (e.g., daily or weekly).
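As a rough sketch, the request body in the format above can be assembled from a plain list of values, with the optional settings added only when they are known. The helper `build_payload` is hypothetical, it only handles daily steps, and the field names `period` and `sensitivity` are assumed to match the request schema of the v1.0 API.

```python
from datetime import datetime, timedelta, timezone


def build_payload(values, start, period=None, sensitivity=None):
    """Build a daily-granularity request body from raw values.

    `start` is the timestamp of the first value; each following value is
    assumed to be exactly one day later.
    """
    series = [
        {
            "timestamp": (start + timedelta(days=index)).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "value": value,
        }
        for index, value in enumerate(values)
    ]
    payload = {"granularity": "daily", "series": series}
    # Optional fields are left out entirely when not provided, so the
    # service falls back to its own defaults (e.g. detecting the period).
    if period is not None:
        payload["period"] = period
    if sensitivity is not None:
        payload["sensitivity"] = sensitivity
    return payload


if __name__ == "__main__":
    first_day = datetime(2018, 3, 1, tzinfo=timezone.utc)
    print(build_payload([32858923, 29615278], first_day, period=7))
```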

Results Analysis

Let’s have a look at our results and examine what happens when we artificially add linear or exponential growth to our example data:

Anomaly detection on the original data

In case of our original source data, we can see that the service accurately detected all the anomalies:

[Figure: Anomaly Detection results]

The dotted blue line indicates the expected values, the light blue areas indicate the thresholds, and the black line shows the actual observed values. The anomalies are clearly visible and marked with red dots.

If we look at the raw detection results, we can see that the service automatically predicted the correct period of 7 days.

[Figure: Anomaly Detection Data]
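To read such raw results programmatically, a small helper can pair each input point with the verdict returned for it. This is a sketch that assumes the batch response carries `period`, `isAnomaly` and `expectedValues` as parallel fields; the function name `summarize_detection` is hypothetical, and the sample values are made up purely to show the shape and are not actual service output.

```python
def summarize_detection(series, response):
    """Return the detected period and the points flagged as anomalies."""
    flagged = [
        {
            "timestamp": point["timestamp"],
            "value": point["value"],
            "expected": expected,
        }
        for point, is_anomaly, expected in zip(
            series, response["isAnomaly"], response["expectedValues"]
        )
        if is_anomaly
    ]
    return {"period": response.get("period"), "anomalies": flagged}


if __name__ == "__main__":
    # Made-up illustration data, not real results from the service.
    sample_series = [
        {"timestamp": "2018-03-01T00:00:00Z", "value": 100},
        {"timestamp": "2018-03-02T00:00:00Z", "value": 500},
        {"timestamp": "2018-03-03T00:00:00Z", "value": 102},
    ]
    sample_response = {
        "period": 7,
        "isAnomaly": [False, True, False],
        "expectedValues": [100.0, 101.0, 102.0],
    }
    print(summarize_detection(sample_series, sample_response))
```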

Anomaly detection with underlying linear growth

Next, let’s simulate an underlying linear growth in our data pattern. As we can see, the anomaly detection still works as expected:

[Figure: Anomaly Detection results with linear growth]
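The post does not spell out how the trend was added, so the following is only one plausible way to simulate it: add a fixed increment per time step on top of the original values. The function name `add_linear_growth` and the `slope` parameter are hypothetical.

```python
def add_linear_growth(values, slope):
    """Add a linear trend: the i-th value is raised by slope * i."""
    return [value + slope * index for index, value in enumerate(values)]


if __name__ == "__main__":
    # A flat toy series makes the added trend easy to see.
    print(add_linear_growth([10, 10, 10, 10], slope=2))
```

A larger `slope` makes the trend dominate the original fluctuations, which is the situation in which the detection results start to change.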

In fact, the Anomaly Detector Service still correctly identifies the same anomalies. However, if we increase the slope of the underlying linear growth, the results start to change. This is not surprising as the small nuances in our anomalies get “eaten up” by the large upwards slope. In this case, we would need to manually adjust the sensitivity of the detection algorithm.

Anomaly detection with underlying exponential growth

Out of curiosity, let’s see how the service performs if we add an underlying exponential growth to our time-series data. In this example, we’ve used 5% growth per value:

[Figure: Anomaly Detection results with exponential growth]
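One way to read “5% growth per value” is compounding: each value is scaled by 1.05 raised to its position in the series. That interpretation is an assumption, and the function name `add_exponential_growth` is hypothetical.

```python
def add_exponential_growth(values, rate=0.05):
    """Scale the i-th value by (1 + rate) ** i, i.e. compounding growth."""
    return [value * (1.0 + rate) ** index for index, value in enumerate(values)]


if __name__ == "__main__":
    # With a flat toy series, the multiplier alone drives the curve.
    for grown in add_exponential_growth([100.0] * 5):
        print(round(grown, 2))
```

Because the factor multiplies the values rather than adding to them, the weekly fluctuations are scaled up along with the trend, so their absolute size keeps growing over time.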

At first glance, this doesn’t look as promising as the other two results: the Anomaly Detector API does not seem to be able to model the underlying distribution properly and hence is not able to classify “outliers” correctly. The predicted blue curve does not seem to grow “fast enough” to keep up with the actual data.

However, we might need to consider:

  • The service still detects a 7-day seasonality pattern for our source data. At the same time, the exponential growth obviously dilutes this pattern over time. Especially in a real-world scenario, we need to ask ourselves whether our data can really keep growing exponentially. Would it eventually reset again and expose a longer seasonality pattern?
  • More generally: Is this a realistic scenario or actually an anomaly? Observing exponential growth in, e.g., error messages is probably a sign that something is not working!

Summary

Anomaly detection is an important component of many modern applications, like predictive maintenance, security or performance monitoring. The Azure Anomaly Detector API offers a simple way to detect anomalies in time-series data. Outlier detection can be performed either in batch mode or in real time on new data points.

From our initial results, the service appears to work best when the data has either no underlying growth or moderate linear growth. In both cases, the algorithm is able to detect the seasonality of our data (e.g., daily or weekly). In our short experiments, the service does not seem able to model data with underlying exponential growth very well. However, it needs to be analyzed further whether such an underlying distribution is actually common for the mentioned use cases, or whether such a pattern would itself point towards an anomaly.

The last example in this post clearly indicates that “just using” the Anomaly Detector API might not be the best idea, especially when the underlying distribution of the data is not well understood or appears “random”. Even though we are working with a service here, we still need to have a basic understanding of what our data looks like.

If you have any questions or want to give feedback, feel free to reach out via Twitter or just comment below.