
We are delighted to announce that our article “Iterative Oblique Decision Trees Deliver Explainable RL Models” was accepted and is now part of the special issue “Advancements in Reinforcement Learning Algorithms” in the MDPI journal Algorithms (impact factor 2.2, CiteScore 3.7).

Explainability in AI and RL (known as XAI and XRL) is becoming increasingly important. In our paper we investigate several ways to replace complex “black box” deep reinforcement learning (DRL) models with intrinsically interpretable decision trees (DTs), which require orders of magnitude fewer parameters. A highlight of our paper is that, on seven classic control RL problems, the DTs achieve rewards similar to those of the DRL models, sometimes even surpassing them. The key to this success is an iterative sampling method that we have developed.
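For readers new to the term: unlike a standard axis-parallel tree, which tests a single feature at each inner node, an oblique DT splits on a linear combination of all state features, which is what keeps the trees shallow. As a toy illustration (this class and its weights are made up here, not taken from the paper):

```python
import numpy as np

class ObliqueNode:
    """Inner node of an oblique decision tree: splits on w·x + b <= 0."""
    def __init__(self, w, b, left, right):
        self.w, self.b = np.asarray(w), b
        self.left, self.right = left, right  # children: nodes or leaf actions

    def predict(self, x):
        child = self.left if self.w @ x + self.b <= 0.0 else self.right
        return child.predict(x) if isinstance(child, ObliqueNode) else child

# A depth-1 oblique tree for MountainCar with hypothetical weights:
# push left (action 0) when velocity <= 0, push right (action 2) otherwise,
# i.e. a single line in (position, velocity) space pumps energy into the swing.
tree = ObliqueNode(w=[0.0, 1.0], b=0.0, left=0, right=2)
print(tree.predict(np.array([-0.5, 0.01])))  # -> 2 (push right)
```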

In our work, we present and compare three different methods of collecting samples to train DTs from DRL agents. We test our approaches on seven problems, including all classic control environments from OpenAI Gym, LunarLander, and the CartPole-SwingUp challenge. Our iterative approach, which combines exploration by the DTs with the DRL agent’s predictions, is particularly able to generate shallow, understandable, oblique DTs that solve the challenges and even outperform the DRL agents they were trained from. Additionally, we demonstrate how, given their simpler structure and fewer parameters, DTs allow for inspection and insight, and offer a higher degree of explainability.
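As a rough sketch of what such an iterative scheme can look like (our own simplification, not the paper’s exact algorithm): roll out the current DT, let the DRL “oracle” relabel the states the DT actually visits, and refit the tree on the growing dataset. The snippet assumes a Gymnasium environment and a stable-baselines3-style oracle exposing predict(obs, deterministic=True), and uses scikit-learn’s axis-parallel DecisionTreeClassifier as a stand-in for the oblique trees used in the paper:

```python
import gymnasium as gym
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def iterative_dt_training(env_id, oracle, iterations=10, episodes_per_iter=20):
    """Iteratively distill a DRL oracle into a shallow decision tree."""
    env = gym.make(env_id)
    states, actions = [], []
    dt = None
    for _ in range(iterations):
        for _ in range(episodes_per_iter):
            obs, _ = env.reset()
            done = False
            while not done:
                # Ask the oracle for the "correct" action in this state ...
                expert_action, _ = oracle.predict(obs, deterministic=True)
                states.append(obs)
                actions.append(int(expert_action))
                # ... but follow the DT (once one exists), so the dataset
                # covers the states the DT itself reaches.
                if dt is None:
                    act = int(expert_action)
                else:
                    act = int(dt.predict(obs.reshape(1, -1))[0])
                obs, _, terminated, truncated, _ = env.step(act)
                done = terminated or truncated
        # Refit a shallow tree on all samples collected so far.
        dt = DecisionTreeClassifier(max_depth=5).fit(np.array(states),
                                                     np.array(actions))
    return dt
```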
To readers interested in explainable AI, and in understandable reinforcement learning in particular, we recommend taking a look at our open-access article.

Decision surface MountainCar
The figure shows the decision surfaces of DRL models (1st column) and various DT models (2nd and 3rd columns) on the environments MountainCar (upper row) and MountainCarContinuous (lower row). The little black dots visualize several episodes, showing how the car rolls back and forth in the valley until it finally reaches the goal on the mountain top (x = 0.5). The DRL models exhibit more complicated decision surfaces, while the DT models reach the same performance (number in round brackets in each title) with simpler decision surfaces.
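Such decision-surface plots are easy to reproduce for MountainCar, whose state is just (position, velocity): evaluate a trained policy on a grid of states and color each point by the chosen action. A minimal sketch, assuming a policy object with a scikit-learn-style predict (e.g. the DT from the sketch above):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_decision_surface(policy, title, resolution=300):
    """Color the (position, velocity) plane of MountainCar by chosen action."""
    # MountainCar observation bounds: position in [-1.2, 0.6],
    # velocity in [-0.07, 0.07].
    pos = np.linspace(-1.2, 0.6, resolution)
    vel = np.linspace(-0.07, 0.07, resolution)
    P, V = np.meshgrid(pos, vel)
    grid = np.column_stack([P.ravel(), V.ravel()])
    A = policy.predict(grid).reshape(P.shape)
    plt.contourf(P, V, A, levels=[-0.5, 0.5, 1.5, 2.5], cmap="coolwarm")
    plt.xlabel("position")
    plt.ylabel("velocity")
    plt.title(title)
    plt.colorbar(label="action (0 = left, 1 = idle, 2 = right)")
    plt.show()
```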