Enriched Search Experience poster with some sound Cloud reference architectures

Building AI Applications for Azure Cloud

Alex Anikiev

--

There’s a vast array of Cognitive Services (Language, Vision and Speech, just to name a few) available as PaaS in each major Cloud these days. In this article we share a point of view on building PaaS-based secure Artificial Intelligence (AI) applications for Azure Cloud and offer some comments about the qualities of a sample solution architecture. It is a typical requirement across different industries to be able to gain meaningful insights from a variety of structured, semi-structured and unstructured data in the Cloud, including domain-specific analysis use cases.

If you are reading this article we assume you are already familiar with Azure Cloud; in any case, you can always find more information about Azure Cloud and its Cloud Computing Services here.

In a related article here dedicated to the Edge we have already touched upon the key assumptions and considerations behind technology choices for the target scenario. In this article we highlight just a few important points for implementing this workload in the Cloud:

  • App is a PaaS-based Artificial Intelligence (AI) app for knowledge mining in the Cloud. Naturally, we take advantage of the wide set of Cognitive Services already available in Azure Cloud as PaaS with SLAs (including Cognitive Search), as well as other necessary PaaS services from identity, compute, storage and networking to databases, caches, BI, etc. We only containerize those components which are not present in Azure Cloud as PaaS and deploy them into Azure Kubernetes Service (AKS) cluster(s).
  • Complete web application and services (web APIs). Deploying a web application with a front-end and a back-end in the Cloud requires work to properly secure it. Azure Active Directory (AAD) is used as the identity backbone for authentication & authorization.
  • App can be deployed in Azure Cloud through Azure DevOps (ADO) pipelines or GitHub Actions.

Architecture

Azure Cloud has a lot of PaaS (Platform-as-a-Service) services to offer, which makes building AI applications a pleasant experience. When designing a solution architecture we’ll logically look at the first-party Microsoft capabilities first and then pick the best-of-breed OSS (Open Source Software) to augment our solution where necessary. In the related article we delved deep into the decision-making process around programming languages and frameworks, so here we’ll get straight to the proposed architecture. The following diagram illustrates a simplified sample solution architecture:

Sample solution architecture

At a high level the proposed solution architecture takes advantage of different data stores, employs various automated pipelines for processing data ingest (so-called Content Cracking Pipelines), and utilizes Azure Cloud platform capabilities across the board to minimize the amount of code written while fully achieving the goals. When it comes to custom code, we prefer to use the well-proven MLOpsPython template available here for all things Python + MLOps. To minimize the amount of custom code (and the associated future technical debt) we like to leverage Azure Custom Vision and/or the Azure Machine Learning (AML) AutoML feature for scenarios where we bring our own data and the algorithmic part has already been provided by Microsoft as a managed service, be it a simple regression model for predicting numeric values or a more sophisticated CNN for multi-label image classification.

If we zoom into the sample solution architecture above, we’ll see that for handling different types of content, such as documents, images and media content, it is efficient to have a set of dedicated content cracking pipelines (3), one per content type. The Documents Cracking Pipeline is the most straightforward: an unstructured document (in any format) comes in and we crack it open with the help of Azure Cognitive Search to extract what needs to be extracted. The Images Cracking Pipeline requires an extra step to get from an image to text (using OCR); in addition, we may extract a lot more useful information from images as needed using standard and custom Computer Vision (CV) capabilities (a minimal sketch of the OCR step follows the diagram below). The Media Content Cracking Pipeline (for audio and video files) may be implemented using Azure Video Indexer, which is now called Azure Video Analyzer for Media (more info available here).

Sample Content Cracking Pipelines for documents, images and media content
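
For illustration, here is a minimal sketch of the OCR step of the Images Cracking Pipeline, assuming the azure-cognitiveservices-vision-computervision Python package and placeholder endpoint/key/image values; it submits an image URL to the Computer Vision Read API and polls for the extracted text:

```python
import time
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials

# Placeholder endpoint and key for a Computer Vision resource.
client = ComputerVisionClient(
    "https://<your-resource>.cognitiveservices.azure.com/",
    CognitiveServicesCredentials("<subscription-key>"),
)

# Kick off the asynchronous Read (OCR) operation and extract the operation ID.
poller = client.read("https://example.com/scanned-page.png", raw=True)
operation_id = poller.headers["Operation-Location"].split("/")[-1]

# Poll until OCR completes, then collect the recognized lines of text.
while True:
    result = client.get_read_result(operation_id)
    if result.status not in (OperationStatusCodes.running, OperationStatusCodes.not_started):
        break
    time.sleep(1)

if result.status == OperationStatusCodes.succeeded:
    for page in result.analyze_result.read_results:
        for line in page.lines:
            print(line.text)
```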

The proposed architecture is flexible and modular, which makes the end “product” extensible and applicable in many scenarios and across industries. To equip the solution with more features we can always find new AI capabilities that have become available in Azure Cloud, in the OSS community, or from other vendors; if we treat all capabilities as individual APIs, making a new one available in the solution is truly a plug-n-play exercise. In case you build your ML models from scratch (say, in custom Python code) you may consider the MLOpsPython reference architecture which helps to operationalize your custom code for production. We’ve dived deep into the MLOps topic in this article if you are interested to learn more.

Finally, because this Cloud-based solution uses the same data model and API standards as the Edge-based solution described here, we’ve combined the 2 solution architectures into a single one by using a Kafka event streaming layer (deployed in an AKS cluster in Azure Cloud) to build a Hybrid Cloud knowledge mining solution that can be deployed across Azure Public Cloud, Azure Government Cloud, S/TS Cloud, Azure Stack Hub/Edge, etc. and operate in online, offline or occasionally connected mode. Thus the entire sample solution architecture may look like the following:

Sample Hybrid solution architecture

PaaS services

Microsoft Azure Cloud provides a variety of PaaS (Platform-as-a-Service) services with respective SLAs, which is more than enough to build the application of our interest. These PaaS components may be viewed as micro-services comprising the overall micro-services-based solution architecture. The sample app referred to in this article is relatively small; the sample list of 32 PaaS services is presented in the table below:

It is common nowadays for modern sophisticated AI apps to take advantage of multiple types of data stores and intelligence services at once to achieve their objectives, and our app is no exception. The app leverages different types of databases according to their strengths: non-relational for handling “master” data, relational for handling “transactional” data, and graph databases for working with Subject-Predicate-Object triples.

Best-of-breed AI apps often utilize the best of the OSS (Open Source Software) according to the respective licenses, and so does our AI app. There may be multiple choices (*) considered for Open Information Extraction (OIE) triples extraction as described in the table above. Because these OSS components are containerized and deployed in the Cloud, we can take full advantage of Kubernetes clusters established with AKS to provide these capabilities.

Central to the data ingest process are the Content Cracking Pipeline(s) written in NodeJS, which orchestrate the extraction of different types of information and implement all the necessary supportive skills. Once the ingest is completed, end users are able to query and explore the information in the convenience of a web browser.

Noteworthy: The capabilities landscape in the Cloud (as well as on the Edge) is ever-changing and always expanding. That’s why it’d be wise to keep an eye on the latest improvements and newly introduced capabilities, because what you implemented yesterday by hand (and/or using OSS) may already be available out-of-the-box in the Cloud today or be coming tomorrow. For example, the Azure Cognitive Services documentation provides convenient “What’s new” pages like this for CV or this for Text Analytics. Even more specifically, the text summarization feature entered Public Preview in November 2021 in Azure Cloud, while before that we had implemented the same capability by alternate means.

Resource allocation

While you may be somewhat constrained in resources when building applications for the Edge, the Cloud provides a virtually plentiful pool of resources with a lot of options to scale up/down and scale in/out as necessary (including on demand). By and large, one of the important decisions to be made is what kind of compute to use for the custom-built components: dedicated compute or serverless compute. There are nuances associated with this decision, and we shed some more light on the topic in another article here.

Functionality

As illustrated in the Showcase section below, the functionality of a knowledge mining app in the Cloud typically contains the necessary pieces to ingest data into the system, process the data by extracting the useful nuggets, and then provide means for users to effectively explore the information obtained. That’s why logically the functionality may be divided into 2 main parts: Content Cracking and Content Exploration. Content Cracking is more like a science: despite the technical complexity of the underlying algorithms, it is still pretty straightforward what’s going on there, especially when all AI capabilities are well packaged and most of them are managed services/APIs available in Azure Cloud. Content Exploration is more like an art.

Traditionally knowledge mining applications would provide a Bing/Google-like user interface for entering search terms and let users scroll through lists of results (with some extra tags). However, the demands of the modern day set the bar and the expectations of the users much higher: for example, in addition to the found content we might be interested in analyzing the associated sentiment(s), or in exploring the content as a dynamic and interactive network of nodes and connections. In fact, these new challenges can effectively be solved today. Specifically, the Bing News API can be used to analyze the freshest and most time-sensitive data on different topical subjects, along with the Twitter or Facebook APIs which provide an endless barrel for sentiment intelligence. This can build a solid foundation for emerging topics analysis, trends detection, etc. depending on the use case at hand.

Semantic search and graph search techniques prove more efficient for complex topic investigations where a number of “static” searches simply won’t add up to a deeper insight automagically, and you need to unfold the insight step-by-step by traversing a graph and validating or invalidating hypotheses one by one. There may be various options to tackle the graph challenge. What we found efficient are the Strategic Graph and the Tactical Graph approaches. With a Strategic Graph you typically apply a top-down approach: starting with numerous search results you gradually reduce the search space down to a handful of insights. This makes a Strategic Graph a perfect option for Cloud-native apps which can tap into large datasets and compute. With a Tactical Graph you typically go bottom-up instead: starting with a smaller subset of search results you gradually uncover what you are looking for by expanding the promising areas of the search space (and collapsing unpromising areas once you invalidate certain hypotheses). Graphs allow you to model relational and cause-effect links, and can help to build out a basis for more in-depth scenario modeling and behavioral & systems simulation use cases (for example, state machines).

Below diagram depicts a sample design of a Strategic Graph. The main building blocks for the graph are triples (Subject-Predicate-Object) obtained by using OpenIE (Open Information Extraction) which we store in a graph database. For a given search term the resulting graph may be several levels deep (supporting the width and depth of analysis). Once the graph is built out we use graph traversal technique to search through the graph in addition to interactive expansion or collapsing of sub-graphs for a smooth user experience. For Strategic Graph you may consider using Apache TinkerPop+ Gremlin API or Azure Cosmos DB + Gremlin API (great for graph traversals).

Strategic graph sample design
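
To make the traversal idea concrete, here is a minimal sketch (Python + the gremlinpython driver; the Cosmos DB account, database, graph, key and search term are placeholder assumptions) of walking a triple graph two hops out from vertices matching a search term:

```python
from gremlin_python.driver import client, serializer

# Placeholder Cosmos DB Gremlin API endpoint and credentials.
gremlin_client = client.Client(
    "wss://<cosmos-account>.gremlin.cosmos.azure.com:443/",
    "g",
    username="/dbs/<database>/colls/<graph>",
    password="<primary-key>",
    message_serializer=serializer.GraphSONSerializersV2d0(),
)

# Find vertices whose text matches the search term and walk two hops
# out along predicate edges, returning the full paths for rendering.
query = (
    "g.V().has('text', 'azure')"
    ".repeat(out()).times(2).emit()"
    ".path().by('text')"
)

for path in gremlin_client.submit(query).all().result():
    print(path)

gremlin_client.close()
```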

To render a Tactical Graph we also use triples. However, for a given search term the resulting graph is supposed to be shallow and only depict the essential information (for example, for a quick situational awareness analysis), as illustrated in the diagram below. In case the user wants to go deeper into a certain sub-graph or cut off an unwanted sub-graph, they are also able to expand or collapse the graph interactively as appropriate. For a Tactical Graph you may consider using Apache Jena Fuseki + SPARQL (great for pattern matching), as sketched after the diagram below.

Tactical graph sample design
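
Correspondingly, a minimal sketch (Python + the SPARQLWrapper package; the Fuseki dataset URL and search term are placeholder assumptions) of pattern matching over the triples for a Tactical Graph might look like this:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder URL of a Fuseki dataset holding the OpenIE triples.
sparql = SPARQLWrapper("http://localhost:3030/triples/query")
sparql.setQuery("""
    SELECT ?s ?p ?o
    WHERE {
        ?s ?p ?o .
        FILTER(CONTAINS(LCASE(STR(?s)), "azure"))
    }
    LIMIT 50
""")
sparql.setReturnFormat(JSON)

# Print the matching Subject-Predicate-Object triples.
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```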

Security and Identity

When it comes to security and identity for Cloud-native apps, the rule of thumb for us is to leverage managed identities where possible. Azure allows you to use system-assigned managed identities and user-assigned managed identities associated with different services (where available). A system-assigned managed identity is tied to the life cycle of the service it is associated with (one-to-one relationship), while one can assign one or many user-assigned managed identities to a single service (one-to-many relationship), and a user-assigned managed identity “survives” even when one of the associated services gets physically deleted. Managed identity is not a panacea though (even though it is typically perfect and most secure for service-to-service communications), and in some scenarios you may want to use alternate means.

In the example below we highlight a sample scenario which leverages multiple authentication flows across different components, including front-end and back-end. Our UI (front-end) is an SPA (Single Page Application) which uses Azure AD integrated auth and is fronted with Azure Application Gateway (AAG). Our back-end APIs are protected with Azure API Management (to harden things up you can also set up networking rules for your server so that only the API Management resource IP address has access to the server). For client-server communication we utilize the OAuth 2.0 implicit grant flow (described here and here) as a way for the client to obtain access tokens and securely call server-side APIs. Additional claims in JWT access tokens may also be used to implement RBAC (Role-Based Access Control) based functionality on both the client and the server side. The server has Zero Trust in the client, that’s why any incoming server request is validated to confirm the validity of the provided access token (a minimal validation sketch follows the diagram below). For the rest of the back-end communication (if managed identities can’t be used for different reasons) you may consider implementing the OAuth 2.0 client credentials flow with the respective secrets and/or certificates securely stored in Azure Key Vault and only provided at the time of an automated deployment of components, say, via Azure DevOps.

Sample solution security design
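
To make the Zero Trust point concrete, here is a minimal sketch (Python + the PyJWT package; the tenant ID and application ID URI are placeholder assumptions) of validating an incoming Azure AD access token on the server side:

```python
import jwt
from jwt import PyJWKClient

TENANT_ID = "<tenant-id>"        # placeholder AAD tenant
AUDIENCE = "api://<client-id>"   # placeholder app ID URI of the API

# Resolve AAD's published signing keys (JWKS) for the tenant.
jwks_client = PyJWKClient(
    f"https://login.microsoftonline.com/{TENANT_ID}/discovery/v2.0/keys"
)

def validate_token(token: str) -> dict:
    # Pick the signing key referenced in the token header.
    signing_key = jwks_client.get_signing_key_from_jwt(token)
    # Reject tokens with a bad signature, wrong audience/issuer, or expiry.
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience=AUDIENCE,
        issuer=f"https://login.microsoftonline.com/{TENANT_ID}/v2.0",
    )

claims = validate_token("<access-token>")
roles = claims.get("roles", [])  # claims usable for RBAC checks
```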

Last but not least in this section: you can find comprehensive Security Testing guidelines for web apps here. And by the way, Azure API Management offers solid protection against the OWASP Top 10 and beyond, which you should certainly consider employing.

Scalability and Portability

With so many PaaS services available in Azure Cloud, building Cloud-native apps becomes a much more pleasant exercise. Typically the different components, such as compute, storage and networking services, offer a rich array of scalability options. With serverless compute it is typical to leverage on-demand Cloud elasticity and scalability to meet ever-changing demands. From the portability perspective, moving your Cloud-native distributed architecture solution which relies on PaaS services from a specific vendor to the Edge or to another Cloud is no easy task, unless you architected your solution to be fully containerized in the first place (and running on Kubernetes in any Cloud or on the Edge). In fact, many of Microsoft’s first-party Cognitive Services are already available as containers. If you want to learn more about the same capability as described in this article running on the Edge, please consider reviewing this article.

Documentation

We’ve already described the value of good documentation in this article. Here we just stress the importance of well-documented APIs and the Open API specification, and how easy it is to expose your APIs based on an Open API spec by using Azure API Management.

OSS and Licensing

Some of the components of our solution architecture may be based on OSS, which means we have to be mindful about the licenses and usage terms. We discussed this topic in more detail in the similar section of this article. The rule of thumb for us, though, is to containerize and properly secure all the respective OSS components and predominantly deploy them into AKS (Azure Kubernetes Service) clusters for production use.

Deployment Automation and DevOps

Azure Cloud provides multiple convenient choices for deployment automation and DevOps (or GitOps), including Terraform, Bicep, GitHub Actions, ARM templates, etc. The choice is yours, depending on organizational standards, the expertise on the team, preferences and/or other factors.

Showcase

If interested, you may review this video published by our trusted partners, which talks about the advantages of building AI applications in Azure Cloud using a similar Enriched Search Experience Project sample Cloud-native app as an example. Also, if you would like to see an example of a sophisticated Artificial Intelligence (AI) app deployed on Azure Stack Edge using GitOps, you are welcome to watch a Channel 9 video here.

Cloud Native and Serverless Platform

As described in the diagram below (Evolution to Cloud Native by Lee Calcote ©), over time software engineering practices have shifted towards SRE + Eventing + Functions + Serverless compute, which are all interconnected and in a way represent sides of the same coin.

Evolution to Cloud Native by Lee Calcote ©

Azure Cloud conveniently provides all the ingredients for success in building modern Cloud Native apps. Let’s take a quick look at the components of the Azure Serverless Platform, see how they can be applied when designing modern Cloud Native apps, and also how the initial sample solution architecture may be further enhanced by taking advantage of some additional components.

Azure Serverless Platform

In the initial solution architecture we assumed that the content ingest process is dependent upon file(s) being dropped in a Blob Storage (for example, a user using the cross-platform Azure Storage Explorer app to drop files). This can be further automated by introducing automated events with the help of Azure Event Grid for receiving discrete or streaming data from upstream systems as necessary (a minimal sketch of such an event-driven handoff follows below). Other components of the Azure Serverless Platform we’d like to highlight are Azure Functions (the whole orchestration logic and APIs may be based on Azure Functions instead of an Azure Web App (server component)), Azure Application Insights (which we’ve already been using for Observability & Monitoring), and Azure Logic Apps or Azure Power Apps (which can aid with Testability and more).
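
For illustration, a minimal sketch (Python Azure Functions, v2 programming model; the function name and the handoff step are placeholder assumptions) of an Event Grid-triggered function reacting to a blob drop might look like this:

```python
import logging

import azure.functions as func

app = func.FunctionApp()

# Fires when Event Grid delivers a BlobCreated event from the Storage account.
@app.event_grid_trigger(arg_name="event")
def on_blob_created(event: func.EventGridEvent):
    payload = event.get_json()
    blob_url = payload.get("url", "")
    logging.info("Blob created, starting content cracking for: %s", blob_url)
    # Placeholder: hand off to the appropriate Content Cracking Pipeline,
    # e.g. by enqueueing a message for the orchestration layer.
```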

Noteworthy: To always be up to speed with the latest best practices for Cloud Native apps development, please consider reviewing the Azure Architecture Center often. This is just one example of a popular approach leveraging Event Grid in a serverless application.

Extensibility is a common and much desired trait of modern AI apps. Typically, even before you roll out a new AI capability based on a trained model, you’d want to have a plan for how to retrain your model when the time comes and a new set of data is available, or when you enhance your training code. Nowadays these AI capabilities are often implemented using Python itself, one of the Python-based frameworks, or their Python-based APIs, for example, for statistical calculations (Scikit-Learn) or CV/NLP/Speech/etc. models (PyTorch or TensorFlow). For these purposes we found that using Azure Machine Learning (AML) for hosting models and exposing API endpoints yields numerous benefits, even for packaging Python artifacts (code) and not necessarily models themselves. This way you can standardize the processes for your Data Science department and also take advantage of the rich capabilities of AML endpoints with authentication, SSL, different options for compute targets (ACI, AKS, managed endpoints, etc.) and MLOps/DevOps automation.

Azure Machine Learning Models & Endpoints

Noteworthy: Going the last mile when rolling out a new AI capability may be a rocky road when you know that, before the roll-out, the endpoint needs to be properly secured with authentication, SSL, etc., in addition to delivering the business value of the model you’ve worked so hard on. Below we provide an illustration of how AML may help to take this headache away by automating Release pipelines for AML endpoints with authentication and SSL enabled. Specifically, in the pipeline code below we take advantage of Azure Key Vault for storing a PFX certificate to automate the enablement of SSL, and provide primary and secondary keys for key-based authentication for an ACI-based endpoint.
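
As a stand-in for the full pipeline, here is a minimal sketch (azureml-core SDK v1; the model name, environment name, certificate files and custom domain are placeholder assumptions) of the deployment step such a Release pipeline might run, enabling key-based auth and SSL on an ACI endpoint:

```python
from azureml.core import Model, Workspace
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()
model = Model(ws, name="my-model")  # placeholder registered model

inference_config = InferenceConfig(
    entry_script="score.py",
    environment=ws.environments["my-inference-env"],  # placeholder environment
)

# Enable key-based auth and SSL; the PEM files could be exported from
# the PFX certificate fetched out of Azure Key Vault by the pipeline.
deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=2,
    auth_enabled=True,
    ssl_enabled=True,
    ssl_cert_pem_file="cert.pem",
    ssl_key_pem_file="key.pem",
    ssl_cname="my-endpoint.contoso.com",
)

service = Model.deploy(ws, "scoring-aci", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.get_keys())  # primary and secondary keys for callers
```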

There are different ways of packaging your Python-based solution artifacts, as described in the diagram below:

Different ways of packaging your Python-based solution artifacts

Typically we register models in the AML Model registry before deploying them into endpoints; these models may be trained using different frameworks. However, instead of a model you may also want to register Python code artifacts which implement useful calculations and produce business value. For example, consider a Python class/module packaged into a folder, or business logic directly embedded into your scoring script (score.py) along with a JSON configuration.

The code below (a minimal sketch following AML’s standard init/run scoring contract, with an illustrative config file name) shows how to bring a JSON config into your scoring script:
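
```python
# A minimal sketch of score.py; the config.json file name and its keys
# are illustrative assumptions.
import json
import os

config = {}

def init():
    global config
    # AZUREML_MODEL_DIR points at the root of the registered artifact folder.
    config_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "config.json")
    with open(config_path) as f:
        config = json.load(f)

def run(raw_data):
    data = json.loads(raw_data)
    # Drive the business logic with values taken from the config.
    threshold = config.get("threshold", 0.5)
    return {"score": int(data.get("value", 0) > threshold)}
```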

You may also choose to pickle the JSON config, in which case the code may look like the following (again a sketch, assuming the config was pickled at registration time):
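
```python
# The same idea, assuming the config was pickled when the artifacts were
# registered, e.g. pickle.dump(config, open("config.pkl", "wb")).
import os
import pickle

config = {}

def init():
    global config
    config_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "config.pkl")
    with open(config_path, "rb") as f:
        config = pickle.load(f)
```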

In case you supply the business logic in Python classes/modules, the following code demonstrates how to dynamically load them in your scoring script (a sketch; the module file and class names are illustrative):
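
```python
# A sketch of dynamically loading a packaged Python module with business
# logic; business_logic.py and the Calculator class are illustrative.
import importlib.util
import os

logic = None

def init():
    global logic
    module_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "business_logic.py")
    spec = importlib.util.spec_from_file_location("business_logic", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    logic = module.Calculator()

def run(raw_data):
    return logic.calculate(raw_data)
```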

Truly, you can mix and match these useful nuggets to package your “models” the way you’d like, especially taking into consideration that in AML you may package multiple “models” under a single endpoint.

AutoML (Automated Machine Learning) in AML

We’ve just discussed how to package your “models” in different ways. However, typically the most time-consuming exercise (apart from working with the data, which is always a must-have pre-requisite) is still building and testing the models themselves. And although there’s not much way around it when you are building something custom completely from scratch, there are challenges which have already been well addressed by SOTA (State-of-the-art) approaches. Despite their inherent complexity, these SOTA-based capabilities are becoming more of a utility/commodity, and repeatedly delivering them is becoming increasingly easier. This is where Automated Machine Learning comes in. AutoML in AML already supports automated training for classification, regression, time-series forecasting, and computer vision (including image classification, object detection and instance segmentation) tasks. We found that using AutoML may significantly reduce development time (time-to-value) as well as your future technical debt.

Below we illustrate an AutoML regression model and how the resulting Pickle model gets packaged along with the scoring script and all necessary dependencies. We also found the AutoML Explainability feature (preview) quite useful, as illustrated below.

AutoML Regression model sample
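
For orientation, a minimal sketch (azureml-train-automl SDK v1; the dataset, label column and experiment names are placeholder assumptions) of submitting such an AutoML regression run might look like this:

```python
from azureml.core import Dataset, Experiment, Workspace
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
train_ds = Dataset.get_by_name(ws, "my-training-data")  # placeholder dataset

automl_config = AutoMLConfig(
    task="regression",
    training_data=train_ds,
    label_column_name="target",  # placeholder label column
    primary_metric="normalized_root_mean_squared_error",
    experiment_timeout_hours=1,
    model_explainability=True,   # enables the Explainability feature
)

experiment = Experiment(ws, "automl-regression-sample")
run = experiment.submit(automl_config, show_output=True)

# The best run carries the packaged Pickle model and scoring script.
best_run, fitted_model = run.get_output()
```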

Similar to leveraging AutoML for classic ML tasks with Scikit-Learn producing a Pickle model, AutoML may also be leveraged for DL (Deep Learning). Below we illustrate how to train Image Classification models in AutoML, for example using Vision Transformer networks or Residual networks. As a result you get a fully-packaged trained PyTorch model (similar to what you get with this PyTorch tutorial) with a lot of goodies. Please find more information about training CV models in AutoML here.

AutoML CV model sample (preview)

Clean architecture

Initially the solution architecture above was implemented using the JS stack (JavaScript/TypeScript/ReactJS for the client side and NodeJS for the server side) for consistency. However, over time, for better extensibility and scalability, we re-implemented the server (orchestration) layer of the solution on the .NET Core platform according to Clean Architecture principles. Please find more information about Clean Architecture in “Clean Architecture with ASP.NET Core” with Steve “Ardalis” Smith here and here.

Noteworthy: You can find more information about common web application architectures here. For architecting modern web apps with ASP.NET Core and Azure Cloud, please consider consulting the guidance in the ebook here (or this ebook by O’Reilly). Also, the Azure Architecture Center provides more detailed guidance per topic as required; for example, the best practices for caching can be found here.

Clean architecture rules:

  • The Application Core contains the Domain Model
  • All projects depend on the Application Core
  • Dependencies point inward towards this core
  • Inner projects define interfaces
  • Outer projects implement them
  • Avoid direct dependency on the Infrastructure project*

Clean Architecture Templates by Steve “Ardalis” Smith on GitHub

On GitHub, Steve “Ardalis” Smith and team provide a number of useful boilerplate templates to get started with Clean Architecture quickly. Namely, there’s a template for a web app (with UI) and another one for a worker process (without UI), as depicted below:

Clean Architecture GitHub Templates by Steve “Ardalis” Smith in Visual Studio

Put into practice and adapted to the purposes of our project, the Clean Architecture template for the server (orchestration) layer may look like the following:

Sample Project structure

In this template we take advantage of Dependency Inversion/Dependency Injection (DI), Separation of Concerns, and the CQRS (Command-Query Responsibility Segregation) & Mediator (MediatR) patterns. The template specifically aims for very important solution architecture properties such as testability, maintainability and extensibility. The dependencies between projects in the template are illustrated below:

Sample orchestration project dependencies

The same orchestration engine can handle multiple functional flows, such as the data ingest flow and the ML inference flow, as depicted below:

DB Ingest & ML Inference execution routes in Clean Architecture solution template

Testing distributed architecture solutions may be challenging because multiple components may be involved in each particular flow. Let’s consider an end-to-end flow for data ingest where the data arrives from outside (upstream systems) via Event Hub + Event Grid, then gets processed via our orchestration engine and finally lands in a data store of our choice. Our testing harness may provide individual unit tests and the necessary integration tests; however, we may also leverage Logic Apps and the plenty of connectors available there for an easy-to-build end-to-end flow. An example of such an end-to-end flow which uses the Logic Apps Blob connector and Event Hub connector is presented below:

Azure Logic App for mocking, functional & load testing
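
To drive such a flow during functional or load testing, a minimal sketch (Python + the azure-eventhub package; the connection string, hub name and payload are placeholder assumptions) of sending a mock ingest event might look like this:

```python
from azure.eventhub import EventData, EventHubProducerClient

# Placeholder connection string and hub name for the ingest Event Hub.
producer = EventHubProducerClient.from_connection_string(
    "<event-hub-connection-string>", eventhub_name="ingest"
)

with producer:
    # A mock ingest message pointing at a test document to be cracked.
    batch = producer.create_batch()
    batch.add(EventData('{"documentUrl": "https://example.com/test-doc.pdf"}'))
    producer.send_batch(batch)
```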

Similar to Logic Apps, we might leverage Power Apps (also equipped with a lot of connectors to common capabilities) or custom-built Function Apps (in case we need something purely custom) to aid with testability and more. In fact, it is not uncommon for some components of a solution to be based entirely on low-code platform capabilities.

Disclaimer

Opinions expressed are solely those of the author and do not express the views and opinions of the author’s current employer, Microsoft.

Books you might enjoy reading

You might enjoy reading the following books in line with the topic of this article:

  • “Design Patterns: Elements of reusable object-oriented software” by Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides (1994) (link)
  • “An atypical ASP.NET Core 5 design patterns guide: A SOLID adventure into architectural design patterns, .NET 5, and C#” by Carl-Hugo Marcotte (2020) (link)
  • “Release it! Design and deploy production-ready software” by Michael Nygard (2018) (link)
  • “Site Reliability Engineering: How Google runs production systems” by Betsy Beyer, Chris Jones, Jennifer Petoff & Niall Richard Murphy (2016) (link)
