Max Ferguson

Facilitating Dynamic Building Information Models with Mobile Robots

Automation in construction and facility management could significantly improve efficiency and productivity throughout the lifecycle of modern facilities. However, for robots and autonomous systems to operate effectively in dynamic and unstructured environments such as construction sites, they must be able to infer or obtain a semantic model of the environment.

Previous efforts in Civil Engineering and Robotics have focused on developing static maps of the environment, often neglecting the possibility that objects can be moved. This project investigates how information from mobile robots can be used to dynamically update an existing model of the building environment with semantically rich information. A new object-detection-first (ODF) computer vision algorithm is developed to track and localize common building site objects. Additional algorithms are proposed for merging and removing objects from the building information model. The proposed system is validated using data collected with a purpose-built mobile robot.

Data is collected using a SLAM-enabled mobile robot with RGB-D sensors. Images from the mobile robot vision system are analysed with our ODF computer vision algorithm. Objects are added to and removed from the building information model in real time.

Framework

We start by proposing a generic framework for updating building information models using mobile robots. In this framework, each mobile robot obtains information about its environment using a combination of LiDAR and RGB-D cameras. An object recognition algorithm is used to detect objects in the field-of-view of the mobile robot and determine the position of each object relative to the mobile robot. The global position of the mobile robot is estimated using a localization algorithm such as Adaptive Monte Carlo Localization (AMCL) or Continuous-Time SLAM. An object tracking and global object localization layer estimates the global positions of detected objects and filters false-positive detections. Information is broadcast to the building information model, where it is used to dynamically update the model.


The proposed computational framework
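The data flow through this framework can be illustrated with a short sketch. The names below (the `detector`, `tracker`, and `bim` objects and the `robot_pose.transform` method) are illustrative placeholders rather than the actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # object class, e.g. "chair"
    confidence: float  # detector confidence score
    position: tuple    # (x, y, z) relative to the robot camera frame

def process_frame(rgb, depth, robot_pose, detector, tracker, bim):
    """One iteration of the proposed update loop (illustrative sketch only)."""
    # 1. Detect objects in the robot's field of view.
    detections = detector.detect(rgb, depth)

    # 2. Transform each detection from the robot frame to the global frame,
    #    using the robot pose estimated by AMCL or continuous-time SLAM.
    global_detections = [robot_pose.transform(d) for d in detections]

    # 3. Track objects over time to filter false-positive detections.
    confirmed = tracker.update(global_detections)

    # 4. Broadcast confirmed objects to the building information model.
    for obj in confirmed:
        bim.add_or_update(obj)
```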

Object-Detection-First Computer Vision Algorithm

The core component of the proposed system is a computer vision algorithm capable of classifying and localizing objects of interest in images captured by a camera on the mobile robot. Inspired by recent progress in 2D instance segmentation, an object-detection-first system is developed, where objects are first identified in 2D images before their size and location are estimated. To maximize detection accuracy, the system is based on the Mask R-CNN architecture, which has obtained state-of-the-art accuracy in a number of instance segmentation challenges [21]. Mask R-CNN consists of two stages. The first stage, called a Region Proposal Network (RPN), proposes candidate object bounding boxes. The second stage performs classification, bounding-box regression, and instance segmentation on selected regions of the image. The second stage is extended to predict the three-dimensional size of each object as well as its color in the absence of shadows.


Head architecture of the proposed object detection network. Numbers denote spatial resolution and channels. Arrows denote either convolution, deconvolution, or fully connected layers, as can be inferred from context (convolution preserves the spatial dimensions while deconvolution increases them). All convolution layers are 3×3, except the output convolution layer, which is 1×1. Deconvolution layers are 2×2 with stride 2. The ReLU activation function is used in hidden layers.
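As an illustration, the mask branch described in this caption can be sketched in PyTorch as follows. The channel width (256) and the use of four 3×3 convolution layers follow the original Mask R-CNN mask head and are assumptions here; the additional size and color outputs described above are not shown.

```python
import torch.nn as nn

class MaskHead(nn.Module):
    """Mask branch sketch: 3x3 convs -> 2x2 stride-2 deconv -> 1x1 output conv."""

    def __init__(self, in_channels=256, num_classes=81):
        super().__init__()
        layers = []
        # Four 3x3 convolution layers with ReLU (channel width assumed).
        for _ in range(4):
            layers += [nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_channels = 256
        # 2x2 deconvolution with stride 2 doubles the spatial resolution.
        layers += [nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2),
                   nn.ReLU(inplace=True)]
        # 1x1 output convolution produces one mask per class.
        layers += [nn.Conv2d(256, num_classes, kernel_size=1)]
        self.head = nn.Sequential(*layers)

    def forward(self, roi_features):
        # roi_features: (N, 256, H, W) pooled region features -> (N, num_classes, 2H, 2W) masks
        return self.head(roi_features)
```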

The input to the system is a three-dimensional array of image pixels, in RGB. The input image can have any size, but we choose to resize it to have a maximum side length of 600 px. The output is a list of detected objects, each with a class label, bounding box, segmentation mask, and estimated size and color.
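To illustrate this input/output contract, the sketch below runs the off-the-shelf Mask R-CNN from torchvision (not the extended ODF network) on a single RGB frame; the file name is a placeholder, and the size and color predictions of the extended network are not produced by this stock model.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Standard torchvision Mask R-CNN; the extended ODF heads (size, color) are not included.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("frame_000123.png").convert("RGB")  # placeholder file name
with torch.no_grad():
    prediction = model([to_tensor(image)])[0]

boxes = prediction["boxes"]    # (N, 4) bounding boxes in pixel coordinates
labels = prediction["labels"]  # (N,) class indices
scores = prediction["scores"]  # (N,) confidence scores
masks = prediction["masks"]    # (N, 1, H, W) soft instance masks
```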

Size Estimation is integrated into the ODF prediction stage [expand] [Loss function]

Color Estimation is integrated into the ODF prediction stage [expand] [Loss function]

Position Estimation is undertaken using the depth image and the predicted mask. The approximate position of each foreground object is calculated using depth information from the RGB-D camera. The object depth Z_camera is estimated by averaging the depth measurements across the area within the segmentation mask. The position of the object can then be estimated using the pinhole camera model: [Equations] where f is the focal length of the camera and (x_com, y_com) is the center of mass of the mask in pixel coordinates.
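A minimal NumPy sketch of this calculation is shown below, assuming a standard pinhole model with focal lengths (fx, fy) and principal point (cx, cy) taken from the camera intrinsics; the exact equations used in the paper may differ.

```python
import numpy as np

def estimate_object_position(depth, mask, fx, fy, cx, cy):
    """Estimate the (X, Y, Z) position of a detected object in the camera frame.

    depth : (H, W) depth image in metres from the RGB-D camera
    mask  : (H, W) boolean instance segmentation mask from the detector
    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels
    """
    ys, xs = np.nonzero(mask)
    # Average the depth measurements inside the mask, ignoring invalid (zero) readings.
    depths = depth[ys, xs]
    z = float(np.mean(depths[depths > 0]))
    # Centre of mass of the mask in pixel coordinates.
    x_com, y_com = xs.mean(), ys.mean()
    # Standard pinhole camera back-projection.
    x = (x_com - cx) * z / fx
    y = (y_com - cy) * z / fy
    return np.array([x, y, z])
```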

Depth Masks

The approximate position of each foreground object is calculated using depth information from the RGB-D camera.

Operational system is trialled in a modern facility

Mobile robot recognizing objects

Object Association and Tracking

In general, it is inappropriate to directly update a building information model using predictions from an object detector, as false detections occur with non-trivial probability. To overcome this issue, a Kalman filter is used to track the position of objects across successive camera frames. The Kalman filter provides several features that benefit the proposed system:

  • Correction of the predicted position based on new measurements
  • Reduction of false-positive detections introduced by the object detector
  • Association of multiple objects to their tracks

Objects are currently tracked in the image coordinate system. Future work will investigate tracking objects in a global three-dimensional coordinate system. The Kalman filter is initialized using a constant-velocity model.
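A minimal sketch of such a constant-velocity Kalman filter for a single track in image coordinates is shown below; the process and measurement noise values are illustrative assumptions, and the data-association logic used to assign detections to tracks is omitted.

```python
import numpy as np

class ConstantVelocityTrack:
    """Track one object's (u, v) image position with a constant-velocity model."""

    def __init__(self, u, v, dt=1.0):
        # State: [u, v, du, dv]; start with zero velocity and large uncertainty.
        self.x = np.array([u, v, 0.0, 0.0])
        self.P = np.eye(4) * 100.0
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)  # constant-velocity dynamics
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # only position is measured
        self.Q = np.eye(4) * 0.01  # process noise (assumed)
        self.R = np.eye(2) * 5.0   # measurement noise (assumed)

    def predict(self):
        # Propagate the state forward one frame.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        # Correct the prediction with a new detection (u, v).
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```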


Association of detections to tracks

Mobile Robot

A mobile robot is designed and constructed to test the proposed algorithms. The mobile robot is based on the A4WD1 rover platform from Lynxmotion. An NVIDIA TX2 computer is fitted to the robot for data processing and recording. The mobile robot is also fitted with a Zed RGB-D camera for collecting RGB-D images. An RPLIDAR A2 horizontal laser scanner is mounted on the mobile robot for collecting laser scan data.



The global position of the mobile robot within the building is continuously estimated using an advanced simultaneous localization and mapping (SLAM) algorithm [4]. The RPLIDAR sensor provides laser scan data for SLAM at approximately 10 Hz, and the Zed RGB-D camera provides visual odometry for SLAM at 60 Hz. An initial map of the building is created by manually driving the mobile robot around the building while running the SLAM algorithm.
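Assuming a ROS-based implementation (not stated explicitly above), the laser scan and visual odometry streams could be consumed roughly as in the sketch below before being passed to the SLAM node; the topic names are assumptions.

```python
import rospy
from sensor_msgs.msg import LaserScan
from nav_msgs.msg import Odometry

def scan_callback(scan):
    # LaserScan from the RPLIDAR A2 (~10 Hz): ranges in metres over the scan angle range.
    rospy.loginfo("scan with %d beams", len(scan.ranges))

def odom_callback(odom):
    # Visual odometry from the Zed camera (~60 Hz): estimated pose of the robot base.
    p = odom.pose.pose.position
    rospy.loginfo("visual odometry: x=%.2f y=%.2f", p.x, p.y)

if __name__ == "__main__":
    rospy.init_node("sensor_listener")
    rospy.Subscriber("/scan", LaserScan, scan_callback)      # topic name assumed
    rospy.Subscriber("/zed/odom", Odometry, odom_callback)   # topic name assumed
    rospy.spin()
```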

Mobile Robot SLAM

Mobile robot localization using SLAM.

Paper

A 2D-3D Object Detection System for Updating Building Information Models with Mobile Robots
M. Ferguson, K. H. Law
[Coming Soon]

Dataset

The dataset is split into training and test sequences. The training sequence contains 5121 objects labelled across 821 RGB-D images. A further 1010 objects across 186 images form the test set.

Download [Coming Soon]

Source Code

SurfaceNet Source Code [Coming Soon]
Mobile Robot Source Code [Coming Soon]