Azure Search Quickstart Tutorial

Overview

This post is a quickstart tutorial on Azure Search, a “search-as-a-service” offering in Azure that allows developers to quickly index and search through vast amounts of information. We’ll walk through its key features and look at how to use them through its REST API. We’ll also take a quick look at Analyzers, Suggesters, and Synonyms.
This post is structured into five sections:
  1. Understanding the key concepts of Azure Search
  2. Deploying an Azure Search service in Azure
  3. Creating an Index
  4. Putting data into an Index
  5. Searching an Index
Let’s get started!

Understanding the key concepts of Azure Search

Azure Search is a “search-as-a-service” offering in Azure. It allows us to index data and search through it. Retrieved data can be filtered, faceted, scored and ranked, auto-completed and enriched through synonyms. By supporting geo-location, features like “find near me” are easy to implement. Furthermore, Azure Cognitive Search allows adding support for extracting text out of images, videos, or audio content.
Azure Search Overview
As a first step, we briefly need to understand partitions and replicas.

Partitions and Replicas

In order to handle large amounts of requests, Azure Search has the concept of partitions and replicas:
  • Partitions allow our Azure Search service to store and search through more documents
  • Replicas allow our Azure Search service to handle a higher load of search queries
Or speaking in terms of avoiding bottlenecks:
  • If we need to serve more concurrent queries, we’ll add more replicas
  • If we need to index more documents, we’ll add more partitions
Azure Search offers a free service tier, which consists of a single partition and a single replica of your data. For production, this is not sufficient: to be covered by the SLA, a Search service needs at least 2 replicas for read-only workloads (queries) and at least 3 replicas for read/write workloads (queries and indexing).
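The sizing rules above can be sketched in a few lines of Python. The helper names are hypothetical and not part of any Azure SDK. The sketch only encodes the replica thresholds mentioned above, plus the assumption that capacity is counted in search units, i.e. replicas multiplied by partitions:

```python
def search_units(replicas: int, partitions: int) -> int:
    """Search units used by a service: replicas multiplied by partitions."""
    if replicas < 1 or partitions < 1:
        raise ValueError("a service needs at least one replica and one partition")
    return replicas * partitions


def sla_coverage(replicas: int) -> str:
    """Map a replica count to the SLA level described in the text."""
    if replicas >= 3:
        return "read/write"
    if replicas == 2:
        return "read-only"
    return "none"


if __name__ == "__main__":
    # (replicas, partitions) combinations, from the free tier to a larger setup
    for replicas, partitions in [(1, 1), (2, 1), (3, 2)]:
        print(replicas, partitions, search_units(replicas, partitions), sla_coverage(replicas))
```

For example, 3 replicas and 2 partitions add up to 6 search units and qualify for the read/write SLA.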

Indexes and Documents

An Index is the persistent store for documents and other constructs in Azure Search. A single instance of Azure Search can have multiple Indexes. A document is a single unit of searchable data in our index.
Here are some examples:
  • In a hotel booking website, each hotel would be one document in our index
  • In a blog, each blog post would be one document in our index
  • In a web shop, each item being sold would be one document in our index
If we relate this concept to databases, we can roughly say that an Index corresponds to a table, while a document corresponds to a row.
All operations that we’ll perform on Azure Search are with regard to a specific Index.
Now that we understand the basic concepts of Azure Search, we can start creating our test instance.

Deploying an Azure Search service in Azure

Deploying an Azure Search service is straightforward, as we just need to specify the sku for the service level we want to use. For our example, free is sufficient:
az group create -n clemens-search-rg -l westeurope
az search service create -n clemens-search -g clemens-search-rg --sku free -l westeurope
Our response looks good:
{
  "hostingMode":"default",
  "id":"/subscriptions/xxxxx/resourceGroups/clemens-search-rg/providers/Microsoft.Search/searchServices/clemens-search",
  "location":"West Europe",
  "name":"clemens-search",
  "partitionCount":1,
  "provisioningState":"succeeded",
  "replicaCount":1,
  "resourceGroup":"clemens-search-rg",
  "sku":{
    "name":"free"
  },
  "status":"running",
  ...
  "type":"Microsoft.Search/searchServices"
}
Great, our Azure Search instance is now running under https://clemens-search.search.windows.net. In the next step, we’ll create a search Index.

Creating an Index

Before creating an Index, we need to understand that each index has one or more fields:
  • Each field has a type (string, list, boolean, integer, double, date or GPS location)
  • Each field also has attributes:
      • Key – unique ID per document, used to look a document up; every index must have exactly one key, and it must be a String
      • Retrievable – field is returned in search results
      • Filterable – field can be used in filter queries
      • Sortable – search results can be sorted by this field
      • Facetable – field can be used in a faceted navigation structure for user self-directed filtering (used for grouping multiple documents together, e.g., item location, shipper, etc.)
      • Searchable – field becomes full-text searchable
More on field attributes can be found here.
Each Index has a unique name and can be created in the portal or via the API. Let’s fire some raw HTTPS calls against our new instance to create our first index:
POST https://clemens-search.search.windows.net/indexes?api-version=2017-11-11
Content-Type: application/json
api-key: xxxxxxxx
In the body, we’ll specify the fields and their attributes for our index:
{
  "name": "hotels",
  "fields": [
    {"name": "hotelId", "type": "Edm.String", "key": true, "searchable": false, "sortable": false, "facetable": false},
    {"name": "baseRate", "type": "Edm.Double"},
    {"name": "description", "type": "Edm.String", "filterable": false, "sortable": false, "facetable": false},
    {"name": "description_fr", "type": "Edm.String", "filterable": false, "sortable": false, "facetable": false, "analyzer": "fr.lucene"},
    {"name": "hotelName", "type": "Edm.String", "facetable": false},
    {"name": "category", "type": "Edm.String"},
    {"name": "tags", "type": "Collection(Edm.String)"},
    {"name": "parkingIncluded", "type": "Edm.Boolean", "sortable": false},
    {"name": "smokingAllowed", "type": "Edm.Boolean", "sortable": false},
    {"name": "lastRenovationDate", "type": "Edm.DateTimeOffset"},
    {"name": "rating", "type": "Edm.Int32"},
    {"name": "location", "type": "Edm.GeographyPoint"}
  ]
}
The response contains all fields and their associated attributes:
{
  "@odata.context":"https://clemens-search.search.windows.net/$metadata#indexes/$entity",
  "@odata.etag":"\"0x8D608170F1D644C\"",
  "name":"hotels",
  "fields":[
    {
      "name":"hotelId",
      "type":"Edm.String",
      "searchable":false,
      "filterable":true,
      "retrievable":true,
      "sortable":false,
      "facetable":false,
      "key":true,
      "indexAnalyzer":null,
      "searchAnalyzer":null,
      "analyzer":null,
      "synonymMaps":[]
    },
  ...
  ],
  "scoringProfiles":[],
  "defaultScoringProfile":null,
  "corsOptions":null,
  "suggesters":[],
  "analyzers":[],
  "tokenizers":[],
  "tokenFilters":[],
  "charFilters":[]
}
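The same call can also be scripted. The following is a minimal sketch using only the Python standard library; the function name and the shortened field list are illustrative assumptions, while the URL, headers, and api-version are the ones used in the raw request above. The request is only built here, not sent:

```python
import json
import urllib.request

SERVICE_URL = "https://clemens-search.search.windows.net"  # service used in this post
API_VERSION = "2017-11-11"


def build_create_index_request(index: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an index."""
    url = f"{SERVICE_URL}/indexes?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(index).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# A shortened version of the index definition shown earlier.
hotels_index = {
    "name": "hotels",
    "fields": [
        {"name": "hotelId", "type": "Edm.String", "key": True, "searchable": False},
        {"name": "hotelName", "type": "Edm.String", "facetable": False},
        {"name": "rating", "type": "Edm.Int32"},
    ],
}

request = build_create_index_request(hotels_index, api_key="xxxxxxxx")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to inspect the payload before anything hits the service.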
In the portal, we can easily see that our new index has been created:
Our Azure Search Index in the Portal
We can also get a quick, visual overview on what fields are contained in it:
Details for our Azure Search Index
Now that we have our Index set up, let’s briefly look into a few more advanced concepts.

Analyzers

Each field can optionally have an Analyzer attached to it. Analyzers are components that process text in search queries and indexed documents. If we look back at our example above, we see that the analyzer fields have all been set to null, as we didn’t specify any.
Analyzers usually perform the following tasks:
  • Remove stopwords (non-essential words) and punctuation from documents and queries
  • Break phrases into smaller chunks
  • Transform all text to lower-case
  • Reduce words to root forms (allows searching regardless of tense of words)
As a result, Analyzers improve our overall search robustness significantly and help deliver better search results. Analyzers can be added during Index creation or added to existing fields at a later point. Adding them later will force a rebuild of the Index, so performance will be impacted during the rebuild.
Overall, Azure Search comes with many Language Analyzers for different languages, but we can also add or write our own. More details can be found here.
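To make the analyzer tasks listed above more tangible, here is a toy analyzer in Python. It is purely illustrative and is not how Azure Search or Lucene implement their analyzers: the stopword list is tiny and the stemming is a crude suffix-stripping heuristic.

```python
import re

STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}  # tiny illustrative list


def naive_stem(token: str) -> str:
    """Very rough suffix stripping; real analyzers use proper stemmers."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list:
    """Lower-case, split on non-alphanumerics, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOPWORDS]


print(analyze("The hotels in town, renovated!"))  # ['hotel', 'town', 'renovat']
print(analyze("Renovating a hotel"))              # ['renovat', 'hotel']
```

Because both the document and the query pass through the same steps, "renovated" and "Renovating" end up as the same token and therefore match each other.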

Suggesters

Suggesters enable autocomplete suggestions (“search-as-you-type”) in Azure Search. This makes it easy to implement typeahead, which users expect from modern applications. As with Analyzers, we need to add Suggesters to our Index.
The overall requirements are pretty straightforward:
  • Only one Suggester per Index
  • The Suggester needs to be tied to specific fields
  • Suggesters work best on short fields, such as names, titles, etc. (instead of long text fields)
Suggesters are accessed via the Suggestion API, which is optimized for low latency:
GET https://clemens-search.search.windows.net/indexes/[index name]/docs/suggest?search=hotel&suggesterName=[suggester name]&api-version=2017-11-11
api-key: xxxxxxx
We need to input at least 3 characters in order to get a result back. By specifying a Suggester on multiple fields, a web shop using Azure Search could offer typeahead not only for product names, but also for product groups or vendor names.
More details regarding Suggesters can be found here.
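As a rough sketch, a Suggester is declared in the index definition next to the fields, and a suggestion request passes the partial input together with the Suggester's name. The suggester name "sg" and the chosen source fields are assumptions made for illustration; the snippet only builds the JSON fragment and the request URL and does not call the service:

```python
from urllib.parse import urlencode

SERVICE_URL = "https://clemens-search.search.windows.net"  # service used in this post
API_VERSION = "2017-11-11"

# Fragment that sits next to "fields" in the index definition.
# "sg" is a hypothetical suggester name; the source fields are short string fields.
suggesters = [
    {
        "name": "sg",
        "searchMode": "analyzingInfixMatching",
        "sourceFields": ["hotelName", "category"],
    }
]


def suggest_url(index_name: str, partial_text: str, suggester_name: str) -> str:
    """Build the GET URL for a suggestion request."""
    query = urlencode(
        {"search": partial_text, "suggesterName": suggester_name, "api-version": API_VERSION}
    )
    return f"{SERVICE_URL}/indexes/{index_name}/docs/suggest?{query}"


print(suggest_url("hotels", "fan", "sg"))
```

The api-key header from the earlier requests would still need to be sent along with this GET request.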

Synonyms

Synonyms allow us to specify equivalent terms for expanding the scope of a query. As an example, if the user is searching for the term “cat”, the Synonym feature can automatically add further terms, such as “kitten” and “kitties”.
Synonym Expansion and Mapping
As Synonyms are applied at query time, we can add or change synonyms without performance impact or having to rebuild the index. Synonyms can be defined to either expand the search query or map it:

Expansion

By specifying a comma-separated list of synonyms, Azure Search will automatically expand the search query:
NYC, New York, New York City
A search query for “NYC” will be expanded to “NYC” OR “New York” OR “New York City”.

Explicit Mapping

By specifying an explicit mapping convention, Azure Search automatically maps all specified terms to a standard term:
NYC, New York, New York City => New York City
A search query for either “NYC”, or “New York”, or “New York City” will be replaced with a search query for “New York City”. This mapping is uni-directional. More details on Synonyms and how to set them can be found here.
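The difference between the two rule styles can be simulated with a small Python sketch. This is not Azure Search's implementation, only a toy model of the behaviour described above; the function names are made up for illustration:

```python
def parse_rule(rule: str):
    """Parse one rule into (terms to match, terms to search for instead)."""
    if "=>" in rule:
        # Explicit mapping: everything on the left is replaced by the right side.
        left, right = rule.split("=>", 1)
        sources = [term.strip() for term in left.split(",")]
        targets = [term.strip() for term in right.split(",")]
    else:
        # Expansion: every term in the list is equivalent to all the others.
        sources = targets = [term.strip() for term in rule.split(",")]
    return sources, targets


def rewrite_query(term: str, rules: list) -> list:
    """Return the terms a query term is rewritten to (OR-ed together by a real engine)."""
    for rule in rules:
        sources, targets = parse_rule(rule)
        if term.lower() in (source.lower() for source in sources):
            return targets
    return [term]


expansion = ["NYC, New York, New York City"]
mapping = ["NYC, New York, New York City => New York City"]

print(rewrite_query("NYC", expansion))  # ['NYC', 'New York', 'New York City']
print(rewrite_query("NYC", mapping))    # ['New York City']
```

A term without a matching rule, such as "Boston", is passed through unchanged.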

Putting data into an Index

Azure Search supports two ways for getting data into an Index:
  • By programmatically pushing data into Azure Search via its API
  • By automatically pulling data from a data source into the Index
Depending on the use case, we might want to use one, or even both methods.

Pushing data into an Index

With a single API call, we can push up to 1000 documents at once into an Index (max. 16 MB per request). The API call is straightforward: we only need to POST JSON documents with an @search.action for each document:
POST https://clemens-search.search.windows.net/indexes/hotels/docs/index?api-version=2017-11-11
Content-Type: application/json
api-key: xxxxxxx
Our request body looks like this:
{
  "value":[
    {
      "@search.action":"upload",
      "hotelId":"1",
      "baseRate":199.0,
      "description":"Best hotel in town",
      "description_fr":"Meilleur hôtel en ville",
      "hotelName":"Fancy Stay",
      "category":"Luxury",
      "tags":[
        "pool",
        "view",
        "wifi",
        "concierge"
      ],
      "parkingIncluded":false,
      "smokingAllowed":false,
      "lastRenovationDate":"2010-06-27T00:00:00Z",
      "rating":5,
      "location":{
        "type":"Point",
        "coordinates":[
          -122.131577,
          47.678581
        ]
      }
    },
    {
      "@search.action":"upload",
      "hotelId":"2",
      "baseRate":79.99,
      "description":"Cheapest hotel in town",
      "description_fr":"Hôtel le moins cher en ville",
      "hotelName":"Roach Motel",
      "category":"Budget",
      "tags":[
        "motel",
        "budget"
      ],
      "parkingIncluded":true,
      "smokingAllowed":true,
      "lastRenovationDate":"1982-04-28T00:00:00Z",
      "rating":1,
      "location":{
        "type":"Point",
        "coordinates":[
          -122.131577,
          49.678581
        ]
      }
    },
    {
      "@search.action":"merge",
      "hotelId":"3",
      "baseRate":279.99,
      "description":"Surprisingly expensive",
      "lastRenovationDate":null
    },
    {
      "@search.action":"delete",
      "hotelId":"4"
    }
  ]
}
Our response returns a short report for each document:
{
  "@odata.context":"https://clemens-search.search.windows.net/indexes('hotels')/$metadata#Collection(Microsoft.Azure.Search.V2017_11_11.IndexResult)",
  "value":[
    {
      "key":"1",
      "status":true,
      "errorMessage":null,
      "statusCode":201
    },
    {
      "key":"2",
      "status":true,
      "errorMessage":null,
      "statusCode":201
    },
    {
      "key":"3",
      "status":true,
      "errorMessage":null,
      "statusCode":201
    },
    {
      "key":"6",
      "status":true,
      "errorMessage":null,
      "statusCode":200
    }
  ]
}
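When there are more documents than fit into one request, they need to be split into batches. The following is a minimal sketch of such a batching helper, based only on the two limits mentioned above (1000 documents and 16 MB per request). The function name and the sample documents are made up for illustration, and nothing is sent to the service:

```python
import json

MAX_DOCS_PER_BATCH = 1000               # documents per request, as stated above
MAX_BYTES_PER_BATCH = 16 * 1024 * 1024  # 16 MB per request, as stated above


def build_batches(documents, action="upload", batch_size=MAX_DOCS_PER_BATCH):
    """Yield JSON request bodies holding at most batch_size documents each."""
    for start in range(0, len(documents), batch_size):
        chunk = documents[start:start + batch_size]
        # Every document carries its own @search.action next to its fields.
        body = json.dumps({"value": [{"@search.action": action, **doc} for doc in chunk]})
        if len(body.encode("utf-8")) > MAX_BYTES_PER_BATCH:
            raise ValueError("batch exceeds 16 MB; use a smaller batch_size")
        yield body


docs = [{"hotelId": str(i), "hotelName": f"Hotel {i}"} for i in range(2500)]
batches = list(build_batches(docs))
print([len(json.loads(body)["value"]) for body in batches])  # [1000, 1000, 500]
```

Each yielded body can then be POSTed to the docs/index endpoint shown above.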

Pulling existing data into an Index

We often already have a data source that contains our data and is well maintained, but does not offer flexible search capabilities. In this case, we can define an Azure Search Indexer to pull in data from one of the following sources:
  • Azure Blob storage
  • Azure Table storage
  • Azure Cosmos DB
  • Azure SQL database
  • SQL Server on Azure VMs
Pulling existing data into an Index
Indexers define how frequently data is fetched from the source. Most Indexers support tracking changes in the source data, e.g., they automatically add new documents or remove deleted ones. Setting up an Indexer in the Azure Portal involves the following steps:
  1. Import Data in Portal
  2. Select Data Source
  3. Select and configure the Data Source
  4. Customize the preliminary target Index
  5. (Optional) Setup Analyzers and a Suggester
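Besides the portal, an Indexer can also be described as a small JSON document and created through the REST API. The sketch below only builds such a definition. The indexer and data source names are hypothetical, and the 5-minute minimum interval is an assumption based on my understanding of the service limits, so please verify the details against the official Indexer documentation:

```python
import json


def indexer_definition(name, data_source_name, target_index_name, interval_minutes=60):
    """Return a JSON-serializable body describing an Indexer and its schedule."""
    if interval_minutes < 5:
        # Assumption: schedules shorter than 5 minutes are not accepted.
        raise ValueError("interval_minutes must be at least 5")
    hours, minutes = divmod(interval_minutes, 60)
    # The schedule interval is an ISO 8601 duration, e.g. PT2H or PT1H30M.
    interval = "PT" + (f"{hours}H" if hours else "") + (f"{minutes}M" if minutes else "")
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": target_index_name,
        "schedule": {"interval": interval},
    }


# Hypothetical names for an indexer that refreshes the hotels index every two hours.
body = indexer_definition("hotels-indexer", "hotels-datasource", "hotels", interval_minutes=120)
print(json.dumps(body, indent=2))
```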
The Search Explorer in the Azure Portal allows us to quickly check if data is pulled correctly into Azure Search:
Search Explorer allows quick checking to see if the Indexer worked
Looks good! Lastly, let’s look at how we can search our Index.

Searching an Index

Now that we have data in our Index, we can search through it. A search query fully specifies how the search should be performed:
  • Match criteria for finding documents in the Index
  • Execution instructions for the search engine
  • Directives for shaping the response (which fields to return, sorting, filtering, etc.)
Unless otherwise specified, a search query runs as a full-text search against all searchable fields, and results are ordered by their search score. Only an empty search (search=*) returns an un-scored result set in arbitrary order. Let’s look at an example:
For searching our hotel index, we can use the GET format and encode all terms into the URL:
GET https://clemens-search.search.windows.net/indexes/hotels/docs?search=budget&$select=hotelName&api-version=2017-11-11
Content-Type: application/json
api-key: xxxxxxx
Alternatively, we can use the POST format, which is more readable and easier to construct for complex queries:
POST https://clemens-search.search.windows.net/indexes/hotels/docs/search?api-version=2017-11-11
Content-Type: application/json
api-key: xxxxxxx
{
  "search": "budget",
  "select": "hotelName"
}
Both will return an identical response:
{
  "@odata.context":"https://clemens-search.search.windows.net/indexes('hotels')/$metadata#docs(hotelName)",
  "value":[
    {
      "@search.score":0.77480876,
      "hotelName":"Roach Motel"
    }
  ]
}
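The two request formats can be generated from the same inputs, which makes their equivalence easy to see. This is a small sketch with hypothetical helper names; it only constructs the URL and body and leaves sending the request (with the api-key header) to the caller:

```python
import json
from urllib.parse import urlencode

SERVICE_URL = "https://clemens-search.search.windows.net"  # service used in this post
API_VERSION = "2017-11-11"


def search_get_url(index_name, search, select=None):
    """GET format: all query options are encoded into the URL."""
    params = {"search": search}
    if select:
        params["$select"] = select  # OData-style option names carry a $ prefix in the URL
    params["api-version"] = API_VERSION
    return f"{SERVICE_URL}/indexes/{index_name}/docs?{urlencode(params, safe='$')}"


def search_post(index_name, search, select=None):
    """POST format: options go into a JSON body, without the $ prefix."""
    url = f"{SERVICE_URL}/indexes/{index_name}/docs/search?api-version={API_VERSION}"
    body = {"search": search}
    if select:
        body["select"] = select
    return url, json.dumps(body)


print(search_get_url("hotels", "budget", select="hotelName"))
print(search_post("hotels", "budget", select="hotelName"))
```

For long or deeply nested queries, the POST variant avoids URL length limits and escaping issues.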

Search Options

Let’s have a look at an example search query from the documentation:
{
  "queryType": "simple",
  "search": "seattle townhouse* +\"lake\"",
  "searchFields": "description, city",
  "count": "true",
  "select": "listingId, street, status, daysOnMarket, description",
  "top": "10",
  "orderby": "daysOnMarket"
}
This is definitely a bit more powerful than our previous example, so let’s look at the different options:
  • queryType: sets the parser. simple is the default full-text parser, while full enables the Lucene query syntax with support for regular expressions, proximity search (terms that appear near each other), fuzzy and wildcard search, etc.
  • search: our search terms as text; * can be used as a wildcard
  • searchFields: constrains the query to specific fields
  • count: returns how many documents were found in total
  • select: selects the fields that should be returned in the response
  • top: limits the number of returned hits
  • orderby: orders the results
More information on search queries can be found here.

Further Reading

Azure Search has extensive official documentation with lots of examples.
Azure Cognitive Search adds data extraction, natural language processing, and image processing to the Azure Search indexing pipeline. This makes it possible to index unstructured documents, such as images, videos, or audio files, and to make them searchable. More details can be found on the Azure Cognitive Search website.

Summary

Azure Search is a “search-as-a-service” offering in Azure. It allows us to search through structured, as well as semi-structured and unstructured data (by using Cognitive Search). Azure Search is built upon the concepts of Indexes and documents, where each document belongs to an index. All operations, such as adding, deleting, or updating documents, are performed on a specific index.
Azure Search has several features that can make search a lot more robust:
  • Analyzers can automatically remove unnecessary words, reduce words to their root forms, remove punctuation, etc.
  • Synonym support automatically expands or maps search queries
  • Suggesters enable typeahead, which is expected by users in modern applications
Data can either be pushed programmatically into Azure Search, or added by having an Indexer pull it on a periodic basis from a data source such as Azure Blob storage, Table storage, Cosmos DB, Azure SQL, and others.
If you have any questions or found this Azure Search quickstart tutorial helpful, feel free to reach out to me on Twitter @clemenssiebler.
