Communications-aware distributed deep reinforcement learning in multi-robot teams for underwater inspection tasks

Gianni Di Caro

CMU-Q Point of Contact

Mobile multi-robot systems (MMRS) can effectively provide automated services in many scenarios of practical interest that feature large regions of operation, multiple tasks to be executed over extended time spans, and areas that are hazardous or hard to access. This is the case, for example, in many surveillance, security, monitoring, inspection, logistics, and disaster response applications.
The strategic goal of this project is to develop novel algorithmic and software tools for tackling underwater inspection tasks (UIT) using autonomous MMRS. UIT of socio-economic interest include the inspection of offshore oil and gas platforms, port infrastructure, ship hull integrity, archaeological and extraction sites, sunken vessels, and fish farms. The use of MMRS naturally provides parallelism, spatial distribution, redundancy of resources, robustness, and cooperative synergies, which together are a good fit for the requirements of UIT. However, system-level controls (planning and coordination) and communications need to be effectively and jointly designed to unleash the true potential of an MMRS. This design presents a number of fundamental challenges. The actions and motion of individual robots, as well as of the joint system, must be planned carefully to ensure mission completion and generate the desired synergies. Coordination is necessary to coherently adapt the plans when needed, to let robots cooperate, and to avoid conflicts, interference, and collisions. Communications are the glue that lets the robots effectively implement coordination. In underwater scenarios, communications play a critical role, given that they are commonly based on acoustic transmissions, which are inherently short range, low bandwidth, and relatively unreliable (Chitre, Shahabudeen, and Stojanovic 2008). Both centralized and distributed architectures have been extensively proposed for control and communication in MMRS.
For planning and coordination, centralization has the advantage of relying on a holistic view of the system. However, it is typically computationally expensive (i.e., computations do not scale well with the number of robots and tasks) and requires reliable, high-bandwidth communications to gather the robots’ information and to send out plans. This is particularly relevant when closed-loop replanning must be performed iteratively to deal with dynamic and uncertain events, which is typically the case in underwater scenarios.
In this project we will adopt a distributed architecture for controls and communications. This choice rests on the fact that underwater communications do not meet the requirements of a centralized approach in a dynamic scenario, while evidence shows that in a distributed approach team synergies can be obtained with relatively low communication overhead (Rathnam and Birk 2011). Planning and coordination will happen in a fully distributed fashion, supported by local communications among the robots. To explicitly account for underwater constraints and to define a general solution for autonomous MMRS deployment, no special networking infrastructure is assumed to be in place. Instead, communications will be supported by a mobile ad hoc network (MANET) established by the robots themselves, in which communications are based on range-limited acoustic transmissions and data is disseminated over multiple hops. In distributed MMRS, the interplay between controls (planning, coordination) and communications is a central problem that can be understood as a multi-objective optimization problem, giving rise to a control-communication dilemma (Scherer and Rinner (2020) give a general overview of the models that have been adopted, and Feo, Kudelski, Gambardella, and Di Caro (2013) show some specific solutions).
Joint robot plans must be defined to maximize mission performance, which in the case of UIT amounts to inspecting the given region and reliably assessing the presence of issues with minimal makespan. At the same time, robots must stay within communication range in order to exchange information and thereby support coordination and joint action planning. However, network provisioning might clash with mission goals, making the optimal balance hard to achieve in practice. For example, a simple solution to network provisioning would be to let the robots stay relatively close to each other at all times during the mission. However, this would prevent the robots from spreading out over the area, thereby increasing the mission’s makespan. A number of approaches have been proposed to address the control-communication dilemma in MMRS, with a predominance of solutions based on the explicit formulation of mathematical programming and/or optimal control models. Communication requirements are usually integrated into the model constraints or utility and expressed in different forms (e.g., enforcing continual connectivity, establishing routing paths, admitting delay-tolerant data exchanges, setting up meeting points for recurrent connectivity, and so on). While these approaches have merits and have proven quite effective in a number of cases, they also have drawbacks: they are complex to formulate, they usually involve a number of critical parameters that hardly generalize across different environments, they need to be solved heuristically since finding optimal solutions would be intractable, and they rely on assumptions about the communications and the problem in general that are not always realistic (we refer to the related work in (Scherer and Rinner 2020) for an overview and a list of references).
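The tension described above can be illustrated with a minimal sketch. It is not the project's formulation: the function names, the disc model of acoustic range, and the fixed `disconnect_penalty` are all assumptions made for illustration. The sketch builds the communication graph from robot positions, checks multi-hop connectivity with a breadth-first search, and scalarizes the two objectives into a single per-step reward.

```python
import math
from collections import deque

def comm_graph(positions, comm_range):
    """Adjacency lists: two robots are neighbours if within comm_range
    (a simple disc model of acoustic range, assumed for illustration)."""
    n = len(positions)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= comm_range:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def is_connected(adj):
    """True if every robot can reach every other over one or more hops."""
    if not adj:
        return True
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)

def step_reward(newly_inspected_cells, positions, comm_range,
                disconnect_penalty=5.0):
    """Hypothetical scalarized reward: coverage gain minus a penalty
    whenever the multi-hop network is partitioned."""
    connected = is_connected(comm_graph(positions, comm_range))
    return newly_inspected_cells - (0.0 if connected else disconnect_penalty)

# Three robots in 3D; the chain is connected over two hops, the split is not.
chain = [(0, 0, 0), (8, 0, 0), (16, 0, 0)]
split = [(0, 0, 0), (8, 0, 0), (30, 0, 0)]
print(step_reward(3, chain, 10.0))  # 3.0
print(step_reward(3, split, 10.0))  # -2.0
```

A fixed penalty is only one way to weigh the two objectives; a learned policy could equally be rewarded for delay-tolerant or recurrent connectivity, as in the alternatives listed above.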
In this project, we aim to jointly tackle planning, coordination, and communication using an approach based on Distributed Deep Reinforcement Learning (D-DRL), rather than relying on explicit mathematical modeling. The goal is to let the robots jointly learn an action policy that balances mission accomplishment against the need to support the necessary networking. A reinforcement learning (RL) (Sutton and Barto 2018) approach has been chosen for a number of reasons. First, it is model free and has the potential to adapt to changes and to different environments. Underwater scenarios feature both dynamic changes (e.g., due to environment dynamics and to the robots’ interactions) and properties that are inherently difficult to model with sufficient precision (e.g., water currents and temperature gradients, visibility, speed and reliability of acoustic transmissions, geometry of the structure to inspect), so an adaptive model-free approach seems a suitable choice. We expect that it can generalize and adapt better than statically defined mathematical models. Moreover, compared to these models, the approach is potentially simpler, since it does not require the model to be made explicit, and it can naturally output a stochastic policy, which is expected to be more robust than a deterministic one.
The main operational goal of the project is to devise innovative solutions for the application of D-DRL to MMRS in the context of dynamic, communication-restricted real-world problems featuring the control-communication dilemma. The need for Deep RL is justified by two facts: the joint state of the MMRS and the environment is very large and possibly continuous, which calls for neural network approximators, and robot actions have both temporal and causal correlations, which call for recurrent networks. Therefore, a convolutional neural network approximator with recurrent connections will be employed for policy learning.
DRL, and even more so D-DRL, usually requires very long training times and extensive episodic samples in order to learn effectively and reach convergence. This has been a major impediment to the application of RL to real-world robotics. However, it is now well understood that realistic simulators make it possible to learn effectively (and safely) in simulation before moving to field tests for further refinement. For example, this is now a consolidated approach in the application of RL to self-driving cars (Dosovitskiy 2017). In this project we will take the same path: we will develop a realistic simulator and train the D-DRL algorithm in simulation. Given that for UIT even a single field mission would be extremely costly in time and money, developing everything in simulation is a reasonable, flexible, and cost-effective approach. To support this way of proceeding, we need a simulator for UIT that provides both realistic simulation of acoustic communications in a MANET and realistic simulation of physical motion and sensing in MMRS. Since such a simulator is not fully available, we set its development as a second operational goal of the project, instrumental to achieving the strategic goal.
This additional operational goal of the project is to provide a simulation package for RL training and experimentation in UIT using MMRS. The simulation environment will be developed in the ROS and NS-3 environments and will blend multi-robot simulation, underwater multi-hop communications, and 3D modeling of typical structures to inspect. It can be used for offline and online D-DRL, as well as for validation tests and experiments. In the simulation scenario, robots will be equipped with (virtual) cameras that will be used to perform visual recognition of possible issues (e.g., cracks in a pipeline).
Project impact: The proposed algorithms will provide novel solutions for problem representation and policy learning in D-DRL for MMRS, explicitly tackling the inclusion of communication constraints. The project’s outputs will be original contributions to the emerging fields of DRL and D-DRL, which are gaining considerable momentum in the scientific community. Moreover, to the best of our knowledge, our system will be the first application of DRL to underwater inspection tasks. The simulation package will also be a tangible contribution to the scientific community, since it might become a reference platform for training RL agents, as well as for performing realistic experiments in underwater scenarios. We expect that our solutions, once migrated to physical platforms, will be able to perform UIT in an automated fashion and with unparalleled efficiency and accuracy. The availability of such robotics and software tools will have a clear socio-economic impact, especially in the context of Qatar’s oil and gas industry.

Project

UREP 27-146-2-043

Year

2021

Status

Closed