Microsoft Bot Framework v4 explained (JavaScript)

Introduction

Over the past year at Microsoft, I worked with a lot of partners and customers to build their customer-facing chatbots on Azure using the Bot Framework. The Bot Framework is Microsoft’s open source framework for building chat- and speech-enabled bots and virtual assistants. This post points out the key differences between Microsoft Bot Framework v3 and the new v4 – all with a focus on the JavaScript release.

In May 2018, Microsoft announced the preview of its new open source Bot Framework version 4. In September 2018, the new release became generally available, and by the end of 2018, version 4.2 had been released.

Most of us had to learn the hard way that version 4 was basically a complete rewrite of version 3 and required a fairly different way of designing and coding chatbots. In this post, we’ll walk through those differences and highlight the key concepts, so that you can also get started quickly with Bot Framework v4.

Core Concepts of Bot Framework v4

Firstly, let’s examine the different components that make up a bot on Azure.

Bot Architecture on Azure

One of the things that is fairly similar to v3 is the overall bot architecture on Azure.

Let’s start with the user. Users do not connect directly to the bot; instead, they talk to it via a Channel through the Channel Connector service in Azure. These connectors are identical to those in v3. They connect our bot to different platforms like Skype, Microsoft Teams, Kik, Facebook Messenger, and so on. Each Channel Connector sends incoming messages from the user to the /api/messages path of the bot’s API endpoint.

Figure: Bot Architecture on Azure

A bot (shown in the middle) is a simple web service that exposes one single API on the /api/messages path, as discussed before. This API is used for receiving incoming messages from the user and answering them.

The bot’s code is based on Bot Framework as the underlying “engine” and is hosted and executed on an Azure App Service. Similar to v3, a bot in v4 is stateless, and the state of a conversation with a user is persisted after every single message. Azure Blob Storage and Cosmos DB are supported as persistence layers for a bot. As a result, a bot can easily be scaled out horizontally in order to handle more incoming messages.

The bot code leverages Azure Cognitive Services, mainly the Language Understanding service (LUIS) for natural language processing, as well as other services like QnA Maker (for simple question/answer pairs) or the Speech API for processing and answering with speech.

Bot Framework v4 has deeper integration into Application Insights. Therefore, monitoring the bot and its infrastructure, as well as how users “flow” through conversations with the bot, can be performed by Application Insights. We’ll spend more time on this in the Monitoring section below.

Bot Deployment Options on Azure

Out of the box, Azure Bot Service currently only supports Azure App Service as the platform for v4 bots. While v3 also supported Azure Functions (after all, the Bot Framework is stateless), I personally see no reason why a v4 Node.js-based bot shouldn’t be able to run in an Azure Function. My personal guess is that it has just not been integrated into the Azure platform yet.

Is this a big pity? Not really – in my experience most bots have been deployed on Azure App Service anyway.

Bot Framework v4 Concepts

Let’s switch over to the coding side and examine the new concepts of v4. Hint: now might be a good moment to forget most of what you know from v3.

Adapter

The Adapter is a necessary component of every bot. After the web service endpoint (under /api/messages) has received a message from the user (or more generally speaking, an Activity), it is forwarded to the Adapter. The Adapter unwraps it, performs authentication, maps it to the user, etc. As an output, it creates a TurnContext object for us, which our actual bot code can process in the current Turn.

Turn & TurnContext

Bot Framework v4 represents interactions between users and the bot as Turns. Each Activity a user performs generates a new Turn. For example, a message from the user to the bot starts a new Turn, but numerous other activities start a new Turn as well (more on that later).

For each Turn, our bot receives a TurnContext object (generated by the Adapter). The TurnContext contains information about the current conversation, the activity that triggered the turn, the user state and further data points.
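To make the relationship between Adapter, Turn, and TurnContext more concrete, here is a deliberately simplified model in plain JavaScript. It does not use the SDK: SketchAdapter, turnNumber, and responses are names made up for this illustration, and authentication and channel handling are left out. It only shows that every incoming Activity results in one Turn with its own context object.

```javascript
// Simplified model of an adapter; this is NOT the SDK's implementation.
class SketchAdapter {
  constructor() {
    this.turnNumber = 0; // hypothetical counter, for illustration only
  }

  // Wraps one incoming activity in a fresh context and runs the bot logic.
  async processActivity(activity, logic) {
    this.turnNumber += 1;
    const context = {
      turnNumber: this.turnNumber,
      activity,
      responses: [],
      // Collects replies instead of sending them to a channel.
      async sendActivity(text) {
        this.responses.push({ type: 'message', text });
      },
    };
    await logic(context);
    return context.responses;
  }
}

// Bot logic: receives the context of the current turn.
async function botLogic(context) {
  if (context.activity.type === 'message') {
    await context.sendActivity(
      `Turn ${context.turnNumber}: you said "${context.activity.text}"`
    );
  }
}
```

Calling processActivity() three times with a message, a typing activity, and another message yields three turns, of which only the first and third produce a reply.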

Activities

Activities are the events that our bot receives from its users. Probably the two most prominent activities are conversationUpdate and message. While message is self-explanatory (a message sent from the user to the bot), conversationUpdate is triggered when a user or the bot joins a conversation. Other Activities include contactRelationUpdate (when the user adds the bot to, or removes it from, their contact list) or typing (triggered when the user is typing). A full list of all supported Activities can be found here.
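As a rough illustration of how bot code can tell these Activities apart, the sketch below switches on the activity’s type field. The helper describeActivity is hypothetical, and the objects passed in are hand-written stand-ins that only carry the fields used here (type, text, membersAdded, action), so treat the field handling as an assumption rather than a complete picture of the Activity schema.

```javascript
// Hypothetical helper: summarizes an incoming activity based on its type.
function describeActivity(activity) {
  switch (activity.type) {
    case 'message':
      return `User said: ${activity.text}`;
    case 'conversationUpdate': {
      // membersAdded lists the participants that joined, if any.
      const names = (activity.membersAdded || []).map((member) => member.name);
      return names.length > 0
        ? `Joined: ${names.join(', ')}`
        : 'Conversation updated';
    }
    case 'contactRelationUpdate':
      return activity.action === 'add'
        ? 'Bot was added to the contact list'
        : 'Bot was removed from the contact list';
    case 'typing':
      return 'User is typing';
    default:
      return `Unhandled activity type: ${activity.type}`;
  }
}

console.log(describeActivity({ type: 'message', text: 'hello' }));
console.log(describeActivity({ type: 'typing' }));
```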

Middleware

The Adapter produces the TurnContext object by passing the initial request through the Bot Framework’s Middleware. The Middleware is a pipeline that, for example, restores the state of the conversation and potentially performs language understanding or translation. This Middleware pipeline can be extended with additional processing steps and is executed on every incoming Activity.
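The pipeline idea can be sketched in a few lines of plain JavaScript. Each middleware below has an onTurn(context, next) method, which mirrors the general shape of middleware in the SDK; the runner runPipeline, the trace array on the context, and both example middlewares are assumptions made up for this illustration, not framework code.

```javascript
// Hypothetical runner: calls each middleware in order, then the bot logic.
async function runPipeline(middlewares, context, botLogic) {
  async function dispatch(index) {
    if (index === middlewares.length) {
      await botLogic(context);
      return;
    }
    // Each middleware decides when (and whether) to continue the pipeline.
    await middlewares[index].onTurn(context, () => dispatch(index + 1));
  }
  await dispatch(0);
}

// Records what happens before and after the rest of the pipeline.
const traceMiddleware = {
  async onTurn(context, next) {
    context.trace.push('trace:before');
    await next();
    context.trace.push('trace:after');
  },
};

// Normalizes the incoming text before the bot logic sees it.
const normalizeMiddleware = {
  async onTurn(context, next) {
    if (context.activity.type === 'message') {
      context.activity.text = context.activity.text.trim().toLowerCase();
    }
    await next();
  },
};
```

Because every middleware awaits next(), code placed after that call runs on the way back out, once the bot logic has finished. That is the natural place for work such as logging or saving state.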

State and Persistency

The Middleware component automatically restores the conversation state (e.g., the dialog the conversation with the user is currently in) and also restores any custom user state. In contrast to Bot Framework v3, we are now responsible for manually updating both states. Both Azure Blob Storage and Cosmos DB are supported targets for persisting state.
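What “manually updating state” means in practice can be shown with a small model. The sketch below is not the SDK’s state API: InMemoryStore and handleTurn are hypothetical names, and a Map stands in for Azure Blob Storage or Cosmos DB. The point is only the pattern of reading state at the start of a turn, changing it, and explicitly writing it back.

```javascript
// Hypothetical storage stand-in; a real bot would persist to a remote store.
class InMemoryStore {
  constructor() {
    this.items = new Map();
  }

  async read(key) {
    const raw = this.items.get(key);
    return raw === undefined ? {} : JSON.parse(raw);
  }

  async write(key, value) {
    // Serialize so that callers cannot mutate stored state by reference.
    this.items.set(key, JSON.stringify(value));
  }
}

// One turn: load the state, update it, and (optionally) save it again.
async function handleTurn(store, conversationId, persist = true) {
  const state = await store.read(conversationId);
  state.turnCount = (state.turnCount || 0) + 1;
  if (persist) {
    await store.write(conversationId, state); // the step we must not forget
  }
  return state.turnCount;
}
```

If a turn runs with persist set to false, its change is lost: the next turn reads the previously saved value again. Forgetting the explicit save has the same effect in a real bot.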

Conversation Flow

This is probably the area where the most changes happened.

First of all, Dialogs are not a “must-use” concept in v4. For bots and assistants that perform single-shot operations, e.g. “turn off the lights” (similar to Alexa), we probably do not need to use any dialogs, but can rather just use regular classes and call their methods.

However, Dialogs are the way to go for more complex and nested conversations:

A Dialog is composed of one or more WaterfallSteps. This allows for a linear conversation flow, as indicated in this example:

  • Dialog starts
  • Waterfall Step 1: Bot asks something
  • User answers
  • Waterfall Step 2: Bot processes the response and asks something else
  • User answers
  • Waterfall Step 3: Bot processes the response and answers
  • Dialog ends

In this example, the Dialog would contain three WaterfallStep entries. Steps 1 and 2 would each contain a Prompt. A Prompt is a single-step Dialog that asks the user something. The concept of Prompts is similar to its counterpart in v3, and several built-in Prompts are included. In addition, custom Prompts with custom Validators can be written for better reusability of code.
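The three-step flow above can be modelled without the SDK to show the mechanics: the dialog remembers which step comes next, and the user’s answer to one step is handed to the following step as its result. The names steps and continueDialog and the shape of the returned objects are assumptions for this sketch; only the idea of per-step values and result follows the framework’s waterfall concept.

```javascript
// Each step receives the values collected so far and the user's last answer.
const steps = [
  // Step 1: bot asks something.
  () => ({ prompt: 'What is your name?' }),
  // Step 2: bot processes the response and asks something else.
  (step) => {
    step.values.name = step.result;
    return { prompt: `Hi ${step.values.name}, which city are you in?` };
  },
  // Step 3: bot processes the response and answers; the dialog ends.
  (step) => ({ done: `Thanks ${step.values.name}, noted: ${step.result}.` }),
];

// Hypothetical driver: runs exactly one step per incoming user message.
function continueDialog(state, userText) {
  if (state.stepIndex === undefined) {
    state.stepIndex = 0; // dialog starts
    state.values = {};
  }
  const outcome = steps[state.stepIndex]({ values: state.values, result: userText });
  if (outcome.done !== undefined) {
    delete state.stepIndex; // dialog ends, the next message starts a new one
    delete state.values;
    return outcome.done;
  }
  state.stepIndex += 1;
  return outcome.prompt;
}

const demoState = {};
console.log(continueDialog(demoState, 'hello'));
console.log(continueDialog(demoState, 'Ada'));
console.log(continueDialog(demoState, 'Paris'));
```

Note that the state object is the only thing that survives between messages, which is why it has to be persisted after every turn.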

How does a Dialog receive data? Similar to v3, either by having the data passed into the Dialog via the TurnContext object or by reading it from the custom user state.

Multiple Dialogs and Prompts are grouped together in a DialogSet. As a Dialog can no longer have child Dialogs, DialogSets are the way to group Dialogs and Prompts. In v3, we often used a Root Dialog and routed to the individual sub-dialogs. In v4, a DialogSet would take the place of the Root Dialog and contain all our sub-Dialogs.

Our actual Bot

Our actual bot is nothing more than a class with a single method called onTurn() which receives the TurnContext object during each invocation. From there, we decide if we want to leverage dialogs or just perform single-shot responses.
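A minimal sketch of such a class is shown below. It assumes a context object that exposes activity and sendActivity(), as the TurnContext does; LightsBot, the dialogRunner object with its run() method, and the reply text are hypothetical and only serve to show the decision between a single-shot response and handing over to dialogs.

```javascript
class LightsBot {
  // dialogRunner is a hypothetical object exposing run(turnContext).
  constructor(dialogRunner) {
    this.dialogRunner = dialogRunner;
  }

  // Called once per turn with the context of that turn.
  async onTurn(turnContext) {
    if (turnContext.activity.type !== 'message') {
      return; // ignore non-message activities in this sketch
    }
    const text = (turnContext.activity.text || '').trim().toLowerCase();
    if (text === 'turn off the lights') {
      // Single-shot operation: answer directly, no dialog needed.
      await turnContext.sendActivity('Okay, turning off the lights.');
    } else {
      // Anything else is handed over to the dialog handling.
      await this.dialogRunner.run(turnContext);
    }
  }
}
```

For a quick local check, onTurn() can be called with a hand-written context object whose sendActivity() simply collects the replies in an array.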

Monitoring

With the new Application Insights integration for Bot Framework v4, it becomes significantly easier to monitor the bot and its surrounding infrastructure:

Figure: Application Map through Application Insights

Application Insights generates an Application Map based on all events occurring in the infrastructure. For example, we can see that our bot talks to Azure Blob (upper left), has been accessed by the Webchat Channel (top), or uses LUIS for language understanding (upper right). Furthermore, we see which calls generated errors (e.g., 404 or 500 errors) and their associated latencies.

With the User Flow feature in Application Insights, we can see how requests flow through our bot code. This enables us to know which dialogs our users frequently take and allows us to explore where we might have performance bottlenecks or a bumpy flow through the conversation.

Figure: User Flow in Application Insights

In this example, we see how messages flow into the /api/messages path and then trigger various Waterfall steps in the dialogs.

The Application Insights integration package is available via botbuilder-applicationinsights and requires only a few lines of code for integration.

Authentication

Yes, finally a decent library for performing authentication has been included in the Bot Framework. Full instructions on its setup are available in the Azure documentation here.

Conclusion

Bot Framework v4 is a complete rewrite of Bot Framework v3, hence both are very different in terms of functionality, as well as coding patterns. In this post we’ve looked at most of the changes and discussed most of the new concepts. With all those differences between the framework versions, there are quite a few things that work better with v4, but also some things that hopefully will improve over time.

Things are better

Overall, I believe the following capabilities make v4 a more versatile framework for building bots:

  • Monitoring has been improved drastically – we can now monitor the bot itself, its infrastructure, and the users’ flow through our bot
  • Authentication has been simplified
  • We now have a more consistent coding model across the different languages (C#, JS, etc.)
  • We can better reuse dialog code
  • Overall, Bot Framework v4 offers more freedom in terms of designing and integrating bots

Things that need improvement

On the flip side, there are a few things that hopefully will get some more attention in the future:

  • Overall, building a bot in v4 takes a bit more reading to get started
  • Documentation is still a bit behind
  • Code bloat – Bot Framework v4 requires us to write a lot more lines of code compared to v3. Better code refactoring capabilities will hopefully help mitigate this, but it will definitely require some thinking and optimisation before it pays off

Last but not least, if you are familiar with Bot Framework v3, you’ll be able to get the hang of v4 after a day or two. Understanding the new concepts for updating state and handling dialogs and turns helped me to quickly transition some v3 bots to the new framework version.

If you have any questions or want to give feedback, feel free to reach out via Twitter or just comment below.
