Artificial Intelligence for the Creation of Synthetic Environments

By Bodhiswatta Chatterjee
Applied Researcher

Nov 30th, 2020

Generating massive numbers of labeled training examples using 3D models in a geospatially accurate setting is a challenge. But what if you add feature attributes such as material IDs and textures that would allow the collection of signatures for targets using different physics-based sensor modalities such as radar, infrared and night vision? And what if you also add the ability to build and maintain a 3D database from the torrent of geospatial big data sources, including satellite imagery, IoT sensors, GIS data, Lidar and other sensor types?

Impossible, right? Wrong.

VELOCITY is a 3D database production solution designed to solve all of these problems and challenges through automated workflows and AI-ready approaches for processing disparate data types.

Built by Presagis, VELOCITY is based on over 30 years of experience in creating 3D databases to support simulation and training exercises for military and civilian applications. Our approach removes the human-in-the-loop to create simulation-ready 3D databases in hours as opposed to months. VELOCITY uses standards-based geo-processing to create richly attributed and highly accurate 3D synthetic terrains designed for training and simulation databases.

For example, users can include 3D models anywhere in the scene, generate labeled training examples, and leverage multi-sensor views of features including radar, infra-red, night vision and other modalities. Users can also include weather, entities and patterns of life and other simulations into their databases while taking full advantage of feature datasets that include material identification, textures and other physics-based attributes.

So how does VELOCITY do this?

In order to train AI networks, synthetic data can be generated using satellite imagery. Here’s how we accomplished it:

We crop the area of interest and extract a footprint vector of the data a user wants to generate.
Once extracted, we place the data in the VELOCITY automated pipeline and use it to generate various permutations of the data required (different textures, colors, materials, etc.). This allows users to create richer, more realistic environments.
Using these various permutations, we then generate geo-specific 3D models of buildings, landmarks, and hundreds of other features.
Once the 3D models have been generated, we use Vega Prime – the Presagis 3D rendering software – to accurately place the models on top of the satellite imagery.
Snapshots of the 3D model and imagery are taken from many different angles using many different sensors, which then allows us to generate hundreds of thousands of labelled images, ready to use for AI training.
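The permutation step above can be pictured as a Cartesian product over attribute lists. The sketch below is only an illustration of that idea: the attribute names and values are hypothetical and are not taken from VELOCITY's actual libraries.

```python
from itertools import product

# Hypothetical attribute values, chosen only for illustration.
TEXTURES = ["brick", "concrete", "glass"]
COLORS = ["grey", "beige"]
MATERIALS = ["masonry", "metal"]


def attribute_permutations(textures, colors, materials):
    """Return one dict per combination of texture, color and material."""
    return [
        {"texture": t, "color": c, "material": m}
        for t, c, m in product(textures, colors, materials)
    ]


variants = attribute_permutations(TEXTURES, COLORS, MATERIALS)
print(len(variants))  # one variant per texture/color/material combination
```

Each variant could then drive one rendering pass, which is how a small set of attributes multiplies into a large number of labeled images.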

Simple as the description above sounds, there is a large amount of heavy lifting happening behind the scenes. And reader be warned: things are about to get a lot more technical…

On the left is a 3D representation of OpenStreetMap (OSM) data, which covers only a few large buildings. On the right is the same imagery in CDB format, created with AI and VELOCITY.

The Science behind VELOCITY

Over the past decade, Artificial Intelligence (AI) has proven its effectiveness in solving hard, long-standing open problems in computer vision. Effective solutions to problems like image classification, object detection and semantic segmentation have been achieved using a specific type of AI technique called Deep Neural Networks. In remote sensing research, the analysis and extraction of geospatial features from remote sensor imagery has always been of interest for creating 3D synthetic environments.

Creation of synthetic environments requires a large amount of Geographic Information System (GIS) data in the form of vectors such as building footprints, road networks, vegetation scatter, hydrography, etc. Publicly available GIS information (e.g. OpenStreetMap [OSM]) often contains too little information and is not correlated with the Electro-Optical (EO) or Infrared (IR) imagery of the area. Manually labeled data of much better quality can be acquired, albeit at a higher cost.

Current advances in computer vision allow object detection and semantic segmentation with relatively high accuracy using deep neural networks. These are ideal for extracting features (buildings, roads, trees, water, etc.) from remote sensor imagery. A simple conversion of the extracted features into vectors is enough to feed a 3D synthetic environment reconstruction. The extracted features can be used directly to create a geo-typical synthetic environment using VELOCITY, an AI-powered content creation tool.

Building Footprint Extraction

Among all the geospatial features, building footprints are considered the most important, as buildings are the defining feature of a 3D urban synthetic environment. With the availability of very-high-resolution satellite imagery, the remote sensing community is pursuing automatic techniques for extracting building footprints for cities with varied building types. There are a few interesting, high-quality datasets [INRIA, AIRS, SPACENET] openly available for training and benchmarking deep neural networks for building segmentation and building footprint extraction.

A generic process of AI-based building footprint extraction starts with the remote sensor imagery being fed to a Convolutional Neural Network (CNN) based semantic segmentation model. The model produces a binary segmentation mask in which each pixel of the input image is classified as either a building or a non-building pixel. The next step is to extract the building boundaries from the binary segmentation mask as a post-processing step, and to refine the extracted footprint vectors if required. The most important component of this pipeline is the CNN-based neural network architecture, as the quality of the building footprint vectors depends heavily on the prediction accuracy of the model.
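To make the mask-to-buildings step more concrete, here is a minimal sketch in plain Python. It assumes the network outputs a per-pixel building probability, thresholds it at 0.5 (an assumed value), and groups building pixels into 4-connected regions. It stands in for the boundary extraction step only in spirit: it reports one bounding box per region rather than a traced contour.

```python
from collections import deque


def binarize(prob_map, threshold=0.5):
    """Turn per-pixel building probabilities into a 0/1 mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


def building_regions(mask):
    """Group building pixels into 4-connected regions.

    Returns one bounding box (min_row, min_col, max_row, max_col) per
    region, in row-major order of discovery.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited building pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


# Toy 4x4 probability map with two separate buildings.
probs = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.2, 0.0],
    [0.0, 0.1, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.9],
]
mask = binarize(probs)
boxes = building_regions(mask)
print(boxes)
```

In practice an image-processing library would trace the actual contour of each region; the flood fill here only shows how a pixel mask becomes a set of discrete building candidates.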

Convolutional Neural Network Architectures

In computer vision, the task of masking out pixels belonging to different classes of objects, such as background or buildings, is referred to as semantic segmentation. Recently, there has been a lot of research to find the best CNN architecture for semantic segmentation. Among all the architectures, U-Net is the most widely used, thanks to its simple, easy-to-train design and its adoption by the remote sensing community.

The U-Net architecture is also known as an encoder-decoder style architecture. The first part of the neural network is called the encoder, as it extracts features from the input image, and the latter part of the network is called the decoder, as it maps the downsampled features back to the spatial regions (pixels) of the input image. There are skip connections between the encoder and decoder blocks at every level of the network. The skip connections speed up training because they facilitate faster gradient back-propagation. A lot of research has been done to find the best block architecture for U-Net style segmentation networks.
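One way to see how the encoder, decoder and skip connections fit together is to trace tensor shapes through the network. The sketch below assumes the classic U-Net channel scheme (channels double and spatial size halves at each encoder level, and the decoder reverses this); the function name and defaults are illustrative and do not describe the network used in our study.

```python
def unet_shapes(height, width, base_channels=64, depth=4):
    """Trace (channels, height, width) through a U-Net-style network."""
    encoder = []
    c, h, w = base_channels, height, width
    for _ in range(depth):
        encoder.append((c, h, w))        # kept for the skip connection
        c, h, w = c * 2, h // 2, w // 2  # downsample to the next level
    bottleneck = (c, h, w)

    decoder = []
    for skip_c, skip_h, skip_w in reversed(encoder):
        h, w = h * 2, w * 2              # upsample back to the skip's size
        if (h, w) != (skip_h, skip_w):
            raise ValueError("input size must be divisible by 2**depth")
        concat_c = c // 2 + skip_c       # upsampled features + skip features
        c = skip_c                       # channels after the decoder block
        decoder.append({"concat": (concat_c, h, w), "out": (c, h, w)})
    return encoder, bottleneck, decoder


encoder, bottleneck, decoder = unet_shapes(256, 256)
for level in decoder:
    print(level)
```

The `concat` entry at each decoder level shows where the encoder features re-enter the network, which is what lets the decoder recover fine spatial detail such as building edges.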

In the original U-Net design, each block at each level consisted of two convolution layers. Further research showed that more complex block structures give better prediction accuracy; the blocks most frequently used in remote sensing are ResNet, DenseNet and Wide ResNet blocks. In one of our studies we found that dense blocks with feature recalibration using squeeze-and-excitation (SE) blocks work best for building segmentation. We used a fully convolutional Dense U-Net style architecture with 103 convolutional layers. We introduced feature recalibration for each dense block because not every feature extracted by the CNN layers is equally important. To test this intuition, we benchmarked our model on the INRIA Aerial Image Labeling challenge and obtained results significantly better than the state of the art on this dataset. This research was conducted in collaboration with Concordia University’s Immersive and Creative Technologies lab; more details about this work can be found here.
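For readers unfamiliar with squeeze-and-excitation, the following plain-Python sketch shows the three steps of a generic SE block: squeeze (global average pooling per channel), excitation (a small bottleneck with ReLU followed by a sigmoid gate) and scaling. The weights are hand-picked for the example rather than learned, biases are omitted, and this is not the code used in our network.

```python
import math


def squeeze_excite(feature_maps, w1, w2):
    """Recalibrate the channels of `feature_maps` (C maps, each H x W).

    w1: reduction weights, shape (C_reduced, C).
    w2: expansion weights, shape (C, C_reduced).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in fmap]
        for fmap, gate in zip(feature_maps, gates)
    ]
    return scaled, gates


# Two 2x2 channels and hand-picked (not learned) weights.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
w1 = [[1.0, 0.0]]    # reduce 2 channels to 1 hidden unit
w2 = [[0.0], [2.0]]  # expand back to 2 channel gates
scaled, gates = squeeze_excite(features, w1, w2)
print(gates)
```

A gate close to 1 leaves a channel almost unchanged, while a gate close to 0 suppresses it; this is the sense in which features are "recalibrated" according to their importance.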

Results of building footprint extraction using ICT-Net on Hawaii imagery. (a) Color imagery. (b) Single-channel prediction mask, the same size as the input imagery, obtained as the output of the network. (c) Confidence of the network prediction shown as a heat map; red to blue signifies high to low confidence for the building class. (d) Extracted and refined building footprints shown as green polygons on top of the imagery.

Segmentation mask to 3D synthetic environment

The output of the segmentation neural network model is a mask of the same size as the input image, with each pixel labeled as building or non-building. The next step is to extract the building contours (boundaries) from the binary mask in the form of vectors. The extracted vectors often contain jittery building boundaries and need to be refined. In most cases a polygon simplification algorithm like Douglas-Peucker would suffice. To obtain simplified but very high quality building footprint vectors, we apply a set of post-processing steps that selects one or more of the following techniques based on a predefined threshold value: (i) bounding box replacement, (ii) square simplification or (iii) the Douglas-Peucker algorithm for polygon simplification.
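As an illustration of the simplification step, here is a minimal, standard-library sketch of the Douglas-Peucker algorithm. The `simplify_ring` helper, which handles a closed footprint by splitting it into two chains, is our own assumption for this example; the tolerance value is arbitrary, and the bounding box and square simplification options are not shown.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Simplify an open polyline, dropping points within `tolerance`."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


def simplify_ring(ring, tolerance):
    """Simplify a closed ring given without a repeated closing vertex."""
    start = ring[0]
    far = max(
        range(len(ring)),
        key=lambda i: math.hypot(ring[i][0] - start[0], ring[i][1] - start[1]),
    )
    first_half = douglas_peucker(ring[: far + 1], tolerance)
    second_half = douglas_peucker(ring[far:] + ring[:1], tolerance)
    return first_half[:-1] + second_half[:-1]


# A jittery square footprint, as a segmentation mask might produce.
jittery = [(0, 0), (5, 0.2), (10, 0), (10.1, 5), (10, 10), (5, 9.9), (0, 10), (-0.1, 5)]
clean = simplify_ring(jittery, tolerance=0.5)
print(clean)
```

A larger tolerance removes more vertices but risks cutting real corners off a footprint, which is one reason a single fixed setting is unlikely to suit every building.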

The refined vectors are fed into the VELOCITY framework to create a synthetic database.

An automated pipeline, VELOCITY allows us to process GIS data on a large scale and create procedural databases. It has the capacity to manipulate, attribute and process the GIS source data (which can consist of vectors, imagery and raster data). An attribute such as height is inferred using algorithms that are based on the area of the footprint. If no additional information is given in the pipeline, the roof type and color are randomly assigned to the buildings from a library of building templates, resulting in a variety of 3D models in the database. This creates a geo-typical representation of the real-world environment once VELOCITY takes the processed data and starts publishing the new content database (i.e. the 3D scene).
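The attribution step can be sketched as follows. Everything specific here is an assumption made for illustration: the height rule (one extra floor per 200 square meters, capped at 10 floors), the floor height and the small template lists are hypothetical and do not reflect VELOCITY's actual algorithms or template library.

```python
import random


def footprint_area(polygon):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical template values, for illustration only.
ROOF_TYPES = ["flat", "gable", "hip"]
ROOF_COLORS = ["grey", "red", "brown"]


def estimate_height(area_m2, floor_height_m=3.0):
    """Toy rule: larger footprints get more floors, capped at 10."""
    floors = min(10, 1 + int(area_m2 // 200))
    return floors * floor_height_m


def attribute_building(polygon, rng):
    """Attach an inferred height and a random roof type and color."""
    area = footprint_area(polygon)
    return {
        "area_m2": area,
        "height_m": estimate_height(area),
        "roof_type": rng.choice(ROOF_TYPES),
        "roof_color": rng.choice(ROOF_COLORS),
    }


rng = random.Random(42)  # seeded so that a run is repeatable
building = attribute_building([(0, 0), (20, 0), (20, 15), (0, 15)], rng)
print(building)
```

Seeding the random generator keeps a generated database reproducible from run to run while still giving neighboring buildings varied roofs.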

#AI #Automation #Content Creation #Synthetic Environment

VELOCITY uses AI-powered automation to quickly create synthetic, large-scale, 3D urban environments.

vIITSEC Paper Presentation

Presagis is proud to present a paper discussing advancements in the field of AI/ML in content creation.

Titled "Creating Geospecific Synthetic Environments Using Deep Learning and Process Automation" (presented by authors Bodhiswatta Chatterjee, Bhakti Patel, and Hermann Brassard), the paper was presented at vIITSEC 2020 and is now available.

View/Download the Paper from NTSA