Microsoft HoloLens 2 MR headset (on the right) & Meta Oculus Quest 2 VR headset (on the left)

Thoughts about Rapid Prototyping and Simulations

Alex Anikiev

--

With the rapid pace of business today, rapid prototyping becomes an extremely effective method to bring your product to market fast and then iterate on it as necessary, or to deliver your project on time with quality. Agile methodology lays a great foundation for both product and project work from the process perspective. The coding part is also becoming increasingly affordable and accessible: modern devices and services typically come well-equipped with ready-to-use SDKs and samples. From the AI and Data Science perspective there's also no lack of world-class first-party managed services or open source capabilities built on proven SOTA (state-of-the-art) approaches. One example is the suite of Cognitive Services available in Microsoft Azure Cloud, which includes (but is not limited to) Language, Vision, Speech and Decision capabilities. For the convenience of developers, Azure Cognitive Services capabilities have been grouped into respective portals (studios) where the relevant use cases are highlighted:

Azure Cognitive Services — Language Studio: https://language.cognitive.azure.com/

Azure Cognitive Services — Vision Studio: https://preview.vision.azure.com/

Azure Cognitive Services — Speech Studio: https://speech.microsoft.com/portal

Please find more information about Azure Cognitive Services here: https://docs.microsoft.com/en-us/azure/cognitive-services/what-are-cognitive-services

In addition to the capabilities highlighted above, Microsoft Azure Cloud has many more to offer, such as OpenAI (currently in Preview), Cognitive Search, Video Analyzer for Media (formerly Video Indexer), Custom Vision, Azure Machine Learning Studio, etc. All of these are perfect building blocks for rapid prototyping and for bringing your solution to life quickly and with confidence.

AI/ML capabilities in the Cloud and on the Edge make it possible to implement various use cases in different industries. One of the areas of rapid growth nowadays is simulations. Simulations may take many forms, from simulation of abstract processes (theoretical simulations) to simulation of particular events or object/subject behaviors (practical or applied simulations). Great advancements in XR (eXtended Reality) also contribute to the ability of developers to implement some amazing experiences using simulations (which is going to be the subject of our next article :)).

However, in this article we'll share our experience implementing a more classic use case: a simulation of events leveraging AI/ML, using the example of a classification task based on labelled data. For example, we want to implement a complex system for obtaining and analyzing crowd insights leveraging image classifiers, time-series algorithms, advanced statistical methods, etc. For the image classification part we aim to leverage SOTA Computer Vision approaches with the help of Azure Machine Learning (AML).

In our previous article (here) we've already shared our experience in building image classifiers using Azure Machine Learning (AML) and its awesome AutoML feature. In this article we'll further narrow down the scope of our interest to the data: the process of data labelling and the generation of synthetic data when it is appropriate.

Whether you are an AI/ML Engineer or a Data Scientist, you will likely agree that collecting quality data and properly (accurately) labelling it for your supervised Machine Learning tasks (including Computer Vision image classification) is vitally needed and at the same time the most tedious process of all. We picked the crowd insights use case for illustration here because we also wanted to stress the complexity of real-world use cases, their high demands on the diversity/representation of your training data for your solution to generalize well enough, and the challenges you face when you need to scale your solution out.

The diagram below depicts a sample thinking process which one may go through while solving a Computer Vision image classification challenge for the crowd insights use case. In this article we'll walk through the different stages of bringing your solution from a rapid prototype (Proof of Concept, POC) to an MVP (Minimum Viable Product) and all the way to a production-grade, scalable product or project.

Illustration of the thinking process to facilitate labelled data acquisition for ML

First, we've just got a task on hand and we're eager to get going and prove the point that our solution works on the data we've acquired (assuming this data is of sufficient quantity and quality). Based on the textbook guidance, we'll get to the data labelling before we can train our image classifier (on the train set) and test it (on the test set).

Manual Labelling

By using the Azure Machine Learning (AML) Data Labelling feature we can set up a Labelling project and manually label our data samples. Depending on the number of classes for image classification, you may need to label quite a lot of images to have a sufficient representation of each class and ensure reliable inference results. For example, the Azure Custom Vision portal suggests having a minimum of ~50 images per class for somewhat reliable inference results (obviously, this number is a simple quantitative measure, not a qualitative one, that's why “somewhat”). Data labelling is a tedious process, and just when you think you are done labelling data for a particular scene in question, you discover that you'll need to take care of another scene, and there you go again labelling data for that scene. Needless to say, the environmental characteristics, lighting conditions, image quality, etc. may vary from scene to scene, which makes your labelling efforts a moving target. Then you start thinking: maybe I should outsource this work to someone else? This could be an option, but is there a better way? Or maybe I should be using an Object Detection approach instead of an Image Classification approach, or even use some already pretrained models? All of these are legitimate concerns. We'll keep exploring them in the next section.

But to close off this section, as a developer you may also be interested in manipulating your labelled data further, maybe writing some automation scripts, etc. To better understand how this can be done, let's see how Azure Machine Learning (AML) stores the results of our Data Labelling project. The illustration below shows how Azure Machine Learning (AML) allows you to export the labelled data as a CSV file, a COCO file or an Azure ML dataset.

Export Labelling results in AML as a CSV file

If you choose to export the labelled data as a CSV file, it is important to understand that the actual images corresponding to the labels will be stored elsewhere (for example, in the associated Azure Blob Storage as linked to in the Url field). Thus, for future manipulation it might be handy to “combine” your labels and images so that the name of each image has its label as a file name prefix. The sample script below does exactly this:
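A minimal sketch of the idea (not the exact script from the project) is shown below. It assumes the CSV export contains Url and Label columns and that you have access to the Blob Storage container backing the AML datastore; the container name and CSV file name are placeholders:

```python
# Sketch: download labelled images listed in the CSV export and prefix each file name with its label.
# Assumes a CSV with "Url" and "Label" columns and access to the linked Azure Blob Storage account.
import os
import pandas as pd
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
CONTAINER_NAME = "azureml-blobstore"        # placeholder: container backing the AML datastore
OUTPUT_DIR = "labelled_images"

os.makedirs(OUTPUT_DIR, exist_ok=True)
labels = pd.read_csv("labels_export.csv")   # placeholder: CSV exported from the Data Labelling project

blob_service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = blob_service.get_container_client(CONTAINER_NAME)

for _, row in labels.iterrows():
    # The Url column points at the image blob; keep only the path relative to the container
    blob_path = row["Url"].split(f"/{CONTAINER_NAME}/", 1)[-1]
    file_name = os.path.basename(blob_path)
    # Prefix the file name with its label so the label "travels" with the image
    target = os.path.join(OUTPUT_DIR, f"{row['Label']}_{file_name}")
    with open(target, "wb") as f:
        f.write(container.download_blob(blob_path).readall())
```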

Hint: One concrete example of leveraging this technique is when we label data in an Azure Machine Learning (AML) Data Labelling project, export the results and import them into the Custom Vision portal to quickly train our image classifiers and experiment with things. Once enough experiments have been conducted in the Custom Vision portal, we may come back to Azure Machine Learning (AML) and set up permanent MLOps Model Training Pipelines to make the re-training process fully repeatable.

Autodesk Maya 3D Modeling

Manual labelling works until you need to label massive volumes of images. This logically leads to the idea of synthetic data generation and then using synthetic data for training classifiers, or possibly using a hybrid approach by combining synthetic data with real data to achieve the best results.

For the crowd insights use cases you often work with people (persons) as subjects, but for a broader use case it may be animals, or different objects such as cars, motorcycles, bicycles, etc.

In this section we want to explore what it would take to model a crowd scene in 3D and render a labelled image from it. If we can script out the scene creation and rendering, we'd be able to generate the necessary volumes of synthetic labelled data via code.

First off, there are many 3D modeling software packages available: Blender, Maya, Houdini, etc. For the purposes of this article we're going to use Autodesk Maya 2022. Maya has a rich toolset for 3D modeling and rendering (including the Maya Software, Maya Hardware and Arnold renderers).

The next challenge to solve is to figure out how to build a model of a human to place into the Maya 3D scene. For this purpose we're going to leverage the MakeHuman OSS. Please find more information about MakeHuman software here: http://www.makehumancommunity.org/

MakeHuman software

MakeHuman models are amazingly detailed and configurable, which will allow us to introduce the necessary diversity into our resulting image dataset.

Next, we need to bring our MakeHuman models into Maya, for which we can leverage the Filmbox (FBX) file format supported by both MakeHuman and Maya.

Export MakeHuman model as FBX file:

Export MakeHuman model as FBX

And then import an FBX file into Maya as shown below:

Import FBX human model into Maya

One thing to note is that MakeHuman models are indeed very detailed and realistic (especially if we look at the number of vertices, faces and triangles which make up one human model, which is quite astonishing). MakeHuman models are pre-baked this way and ready for further configuration if needed.

MakeHuman model level of detail

After we bring one or more template MakeHuman models into Maya, we can leverage Maya’s Python SDK to automate the process of scene creation. We may write a Python script and execute it in Maya’s Script Editor, or we may create a custom Maya plug-in and run it as shown below:

Running Maya Python plug-in
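For the plug-in route, a minimal command plug-in skeleton using Maya's Python API 2.0 might look like the sketch below. The command name and its body are placeholders; the actual scene generation logic is shown in the next script:

```python
# Sketch: minimal Maya Python API 2.0 command plug-in (load via the Plug-in Manager or cmds.loadPlugin).
import maya.api.OpenMaya as om

def maya_useNewAPI():
    """Tell Maya this plug-in uses the Python API 2.0."""
    pass

class GenerateScenesCmd(om.MPxCommand):
    kPluginCmdName = "generateSyntheticScenes"  # placeholder command name

    @staticmethod
    def creator():
        return GenerateScenesCmd()

    def doIt(self, args):
        # Placeholder: call your scene generation / rendering logic here
        om.MGlobal.displayInfo("Generating synthetic scenes...")

def initializePlugin(plugin):
    om.MFnPlugin(plugin).registerCommand(GenerateScenesCmd.kPluginCmdName, GenerateScenesCmd.creator)

def uninitializePlugin(plugin):
    om.MFnPlugin(plugin).deregisterCommand(GenerateScenesCmd.kPluginCmdName)
```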

The sample script below illustrates how to automate scene generation in Maya using Python:
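The following is a minimal sketch of the approach using maya.cmds rather than the original gist; the mesh names, placement ranges and label scheme are placeholders:

```python
# Sketch: duplicate imported MakeHuman template meshes, randomize their transforms and render a frame.
# Run from Maya's Script Editor; names like "makehuman_template1" are placeholders for your imported meshes.
import random
import maya.cmds as cmds

TEMPLATES = ["makehuman_template1", "makehuman_template2"]  # placeholder mesh names
OUTPUT_LABEL = "crowd_5"                                    # label encoded into the rendered image name

for i in range(5):  # number of "people" to place into the scene
    template = random.choice(TEMPLATES)
    copy = cmds.duplicate(template, name=f"{template}_copy{i}")[0]
    # Random placement, orientation and scale to add variability to the scene
    cmds.move(random.uniform(-10, 10), 0, random.uniform(-10, 10), copy)
    cmds.rotate(0, random.uniform(0, 360), 0, copy)
    s = random.uniform(0.9, 1.1)
    cmds.scale(s, s, s, copy)

# Keep the label in the rendered image name and render the current frame through the chosen camera
cmds.setAttr("defaultRenderGlobals.imageFilePrefix", OUTPUT_LABEL, type="string")
cmds.render("persp")  # use your scene camera here instead of the default perspective camera
```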

In this script we use template MakeHuman models imported into Maya as meshes: we duplicate them in place, then move, rotate and scale them to add an element of randomness, and finally render labelled images. As a result, we can generate a lot of labelled data for a single scene. When adding more scenes to the mix, we achieve a multiplier effect with even more volume. Please note that Maya allows you to control all aspects of the 3D scene, including camera positioning, lighting conditions, etc.

Synthetic Data Generation with Python

Using Maya 3D modeling capabilities to generate large volumes of synthetic data is very convenient, but there are some nuances which may trip you up. Just to name a few: it may be difficult to precisely reconstruct the scene itself in Maya, or the quality of real images may be significantly lower than the quality of images rendered from Maya. And, by and large, achieving photo-realistic quality with 3D models in Maya may be a challenge.

In this case, it may make sense to leverage some real images of people instead of, or in addition to, 3D models, and composite them into a real image of an empty scene. The illustration below contrasts real people photographs with images of 3D models:

People models (Photos) & MakeHuman models

To better illustrate the point we leveraged a collection of Microsoft Office Cutout people and other custom-built models, and stored them in a secured private GitHub repo here: https://github.com/alexanikiev/cutoutpeople.

More MakeHuman models

We used MakeHuman software to create a number of custom models to ensure enough diversity and representation in the training data set. We stored these images as PNG files with transparent backgrounds.

Hint: You can remove the background from your PNG images using a simple Python script or one of the online tools such as Adobe’s tool: adobe.com/express/feature/image/remove-background
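For the script option, a minimal sketch with Pillow could look like this. It assumes a near-uniform, light background that can simply be keyed out; photos with complex backgrounds are better handled by dedicated tools:

```python
# Sketch: turn a near-white background transparent with Pillow.
# Works for renders with a clean background; for real photos, prefer a dedicated background-removal tool.
from PIL import Image

def remove_background(src_path: str, dst_path: str, threshold: int = 240) -> None:
    img = Image.open(src_path).convert("RGBA")
    pixels = [
        (r, g, b, 0) if r > threshold and g > threshold and b > threshold else (r, g, b, a)
        for (r, g, b, a) in img.getdata()
    ]
    img.putdata(pixels)
    img.save(dst_path, "PNG")

remove_background("model.png", "model_transparent.png")  # placeholder file names
```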

More variability for MakeHuman models (dressed in different colors, etc.)

When working with synthetic data there's never enough variability and you'd always wish for more. Some creative ways of introducing more variability into your data set include varying the shapes/sizes of your human models, varying their race/pose/etc., and dressing them differently. MakeHuman allows you to make a lot of modifications to your human models at design time, including redressing, and some can be done at run time by means of simple Python scripting (for example, dressing models in clothes of different colors).

We shared a sample minimalistic synthetic scene image generation script in Python on GitHub here: https://github.com/alexanikiev/synscenepy
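The repository above contains the full version; a minimal sketch of the idea (with placeholder file names and label scheme, not the actual synscenepy code) looks roughly like this:

```python
# Sketch: composite transparent cutout PNGs onto an empty scene image at random positions
# and encode the label (number of people) in the output file name.
import os
import random
from PIL import Image

BACKGROUND = "empty_scene.png"      # placeholder: photo of the empty scene
CUTOUT_DIR = "cutouts"              # placeholder: folder of transparent PNGs of people
OUTPUT_DIR = "synthetic"

os.makedirs(OUTPUT_DIR, exist_ok=True)
cutouts = [os.path.join(CUTOUT_DIR, f) for f in os.listdir(CUTOUT_DIR) if f.endswith(".png")]

for i in range(100):                # number of synthetic scene images to generate
    scene = Image.open(BACKGROUND).convert("RGBA")
    count = random.randint(0, 10)   # how many people appear in this scene (this is the label)
    for _ in range(count):
        person = Image.open(random.choice(cutouts)).convert("RGBA")
        scale = random.uniform(0.5, 1.0)
        person = person.resize((int(person.width * scale), int(person.height * scale)))
        x = random.randint(0, max(0, scene.width - person.width))
        y = random.randint(0, max(0, scene.height - person.height))
        scene.paste(person, (x, y), person)  # use the cutout itself as the transparency mask
    scene.convert("RGB").save(os.path.join(OUTPUT_DIR, f"{count}_scene_{i}.jpg"))
```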

This way we can generate a lot of labelled images via code for training image classifiers and achieve promising inference results. You may further boost the results by combining real labelled data with synthetically generated labelled data.

Using Point Clouds and LiDAR Scanning

Using Python scripting to generate volumes of labelled data is indeed a simple and effective method. However, there are other challenges associated with this approach, which include the difficulty of achieving the right level of diversity in your images (depending on the number of models you have at your disposal and your ability to vary them to generate more unique variants), and additional privacy concerns which may arise while using photographs of real people during training and inference.

Luckily, there are technologies already adopted in the market which allow us to overcome these challenges. One noteworthy technology in this context is LiDAR (Light Detection and Ranging). Please find more information about LiDAR here: https://en.wikipedia.org/wiki/Lidar. LiDAR scanners allow you to build a Point Cloud-based representation of 3D space. LiDAR scanners are widely used in geo-spatial research, but can also be used in different industry scenarios including Retail and crowd insights use cases. A Point Cloud is a set of data points in space; think of a set of (X, Y, Z) coordinates. Please find more information about Point Clouds here: https://en.wikipedia.org/wiki/Point_cloud. A recommended format to represent and exchange LiDAR Point Cloud data is LAS. Please find more information about the LAS format here: https://en.wikipedia.org/wiki/LAS_file_format.
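As a quick illustration (not part of the original workflow), a LAS file can be loaded into a plain (X, Y, Z) array with the open source laspy package; the file name below is a placeholder:

```python
# Sketch: load LiDAR points from a LAS file into an N x 3 array of (X, Y, Z) coordinates.
# Requires the open source laspy package (pip install laspy).
import numpy as np
import laspy

las = laspy.read("scan.las")                  # placeholder file name
points = np.vstack((las.x, las.y, las.z)).T   # shape: (number_of_points, 3)
print(points.shape)
```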

In application to our crowd insights use case, a Point Cloud can be represented in Maya as a MASH network. Once you create a MASH element based on a mesh in Maya, you can control the number of points to be displayed via the MASH_Distribute node as shown below:

Point Cloud as Maya MASH

Now, to turn this visual MASH into an array of (X, Y, Z) coordinates, we can look at its MASH_Python node and the associated Python code as shown below:

Point Cloud data in Maya
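For reference, the code inside a MASH_Python node is based on Maya's openMASH module. A minimal sketch along the lines of the node's default template might look like the following; the exact attribute names are an assumption, so please check the MASH documentation:

```python
# Sketch for a MASH_Python node: print the (X, Y, Z) coordinate of every point in the network.
# Attribute names follow the node's default template and are assumptions; verify against the MASH docs.
import openMASH

md = openMASH.MASHData(thisNode)  # thisNode is provided by the MASH_Python node itself
for i in range(md.count()):
    p = md.position[i]
    print(p.x, p.y, p.z)
md.setData()                      # write the data back so the network keeps evaluating
```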

This way we can effectively model Point Clouds in Maya. Below we illustrate a Point Cloud for a MakeHuman model at different degrees of fidelity, from 100 data points to 100,000 data points:

Sample Point Clouds in Maya

With a sample Python script we can visualize the Point Cloud data points as a plot:

It's a long script because of the 1000 data points hard-coded in it, but it is a self-sufficient one which you can copy-paste and run yourself to generate the exact same plot of a human-shaped Point Cloud.
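For readers who cannot copy the embedded script, here is a much shorter sketch of the same idea, assuming the points are loaded from a file (for example, exported from the MASH_Python node) instead of being hard-coded:

```python
# Sketch: visualize Point Cloud (X, Y, Z) data points as a 3D scatter plot with matplotlib.
# The points are loaded from a CSV file (placeholder name) rather than being hard-coded.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection)

points = np.loadtxt("pointcloud.csv", delimiter=",")  # N x 3 array of (X, Y, Z) coordinates

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(points[:, 0], points[:, 1], points[:, 2], s=2)
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
plt.show()
```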

By leveraging LiDAR scanners (for example, this one), the “images” (based on Point Clouds) you use for inference will be “anonymous”, which helps satisfy potential privacy requirements. At the same time, you can use Maya MASHes to model Point Clouds of human models for generating a labelled training set via code. Obviously, when using Point Clouds to represent humans, it is also going to be easier to satisfy the diversity and representation requirements for your synthetically generated training set (you can mostly focus on the essential characteristics of human models such as size/form/pose/etc.).

Simulation UX (User Experience)

When working on business value-adding AI/ML capabilities and focusing so much on the quality of your models and the overall solution architecture, you should not forget about the User Experience either. A sound user experience certainly adds a lot to the success of your product or project. Depending on the details of your use case, you will invest appropriately in building the right User Experience yourself or use already available components (OSS or part of a vendor/partner platform which you may be using).

In this article we'll just share a single handy example of how you may rapidly build a UI for your simulation project using Python. There's a number of great Python libraries available out there for these purposes. Below we provide an example of such a UI using the Streamlit Python library:
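A minimal sketch of such an app is shown below; the simulation logic here is a placeholder random walk, not an actual crowd model. Save it as app.py and launch it with streamlit run app.py:

```python
# Sketch of a minimal Streamlit UI for a simulation app (run with: streamlit run app.py).
# The "simulation" is a placeholder: a random walk of the number of people in the scene over time.
import numpy as np
import pandas as pd
import streamlit as st

st.title("Crowd Simulation")

# User inputs drive the simulation parameters
num_steps = st.slider("Number of time steps", min_value=10, max_value=500, value=100)
start_count = st.number_input("Initial crowd size", min_value=0, max_value=1000, value=50)

if st.button("Run simulation"):
    # Placeholder simulation: random arrivals/departures around the initial crowd size
    deltas = np.random.randint(-3, 4, size=num_steps)
    counts = np.clip(start_count + np.cumsum(deltas), 0, None)
    st.line_chart(pd.DataFrame({"people": counts}))
    st.metric("Final crowd size", int(counts[-1]))
```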

The Streamlit library allows you to build an interactive UI in a web browser from a number of prebuilt components, including different forms of user input, various types of charts, etc.

Running Streamlit app

Streamlit apps implement Responsive Design out of the box and look good on devices with different form factors, including desktops, laptops and mobile devices. If needed, Streamlit apps can be easily containerized and deployed in the Cloud via DevOps (just like other Python apps). Below is an illustration of an interactive UI for a simulation app built in Streamlit:

Sample interactive UI for a simulation app using Streamlit

Please find more information about Python Streamlit library here: https://streamlit.io/.

Also, please consider using Microsoft Power Apps and/or Power BI for your UX needs. These capabilities are great not only for rapid prototyping, but also for building available and secure apps in the Cloud.

Thank You

PS. Please consider supporting Ukraine 🇺🇦 Armed Forces ❤️ here: https://bank.gov.ua/en/about/support-the-armed-forces.
