
As mentioned in a previous blog post, we developed an iterative algorithm for training decision trees (DTs) from trained deep reinforcement learning (DRL) agents. The algorithm combines the simple structure of DTs with the predictive power of well-performing DRL agents. In our publication, we tested the idea on seven different control problems and successfully trained shallow DTs for each of these challenges, containing orders of magnitude fewer parameters than the DRL agents whose behavior they imitate.
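The core idea of such oracle-based distillation can be sketched as an imitation-learning loop in the style of DAgger: roll out the current tree policy, relabel the visited states by querying the DRL oracle, and retrain the tree on the aggregated dataset. The sketch below is a minimal illustration under assumed toy dynamics and a hand-coded stand-in oracle; it is not the paper's actual algorithm, environments, or hyperparameters.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in environment and oracle (assumptions, not the paper's setup):
# states are 2D points, the "oracle" picks action 1 iff x0 + x1 > 0.
rng = np.random.default_rng(0)

def oracle_policy(states):
    """Stand-in for the trained DRL agent queried as an oracle."""
    return (states.sum(axis=1) > 0).astype(int)

def rollout(policy, n_steps=200):
    """Collect the states visited while following `policy` (toy random-walk dynamics)."""
    s = rng.normal(size=2)
    visited = []
    for _ in range(n_steps):
        visited.append(s.copy())
        a = policy(s.reshape(1, -1))[0]
        s = s + (0.1 if a == 1 else -0.1) + rng.normal(scale=0.05, size=2)
    return np.array(visited)

# Iteration 0: bootstrap with states from the oracle's own rollouts.
states = rollout(oracle_policy)
tree = DecisionTreeClassifier(max_depth=3).fit(states, oracle_policy(states))

# Later iterations: roll out the *tree*, relabel its states with the oracle,
# and retrain on the aggregated dataset (DAgger-style data aggregation).
for _ in range(5):
    states = np.vstack([states, rollout(tree.predict)])
    tree = DecisionTreeClassifier(max_depth=3).fit(states, oracle_policy(states))

agreement = (tree.predict(states) == oracle_policy(states)).mean()
```

Querying the oracle on the states the tree itself visits is what makes the loop iterative: a tree trained only on the oracle's own trajectories would never see the states where its mistakes lead it.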

So much for simulation. The real world is generally more challenging.

Thanks to a fruitful collaboration with Prof. Tichelmann's Lab of Applied Artificial Intelligence, we were able to put the idea to the test on a real-world robotics task. The lab operates a physical implementation of the cart-pole swing-up environment, a well-known benchmark for control and reinforcement learning. A pendulum is attached to a cart via an unactuated hinge; only by swift movements of the cart to the left and right can the pendulum first be swung up and then balanced in its unstable upright equilibrium. In a previous bachelor's thesis, a DQN agent was trained to solve this challenge successfully, and we used this DRL agent as the oracle for our experiment. Although the additional challenges of a real-world setup were noticeable, the algorithm proved its robustness and found a DT on par with the DQN agent while using fewer parameters. Further details can be found in our latest paper.
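To make the task concrete: the controller observes the cart's position and velocity plus the pole's angle and angular velocity, and can only push the cart sideways. The sketch below simulates the classic frictionless cart-pole equations of motion with a naive alternating push; all constants and the bang-bang policy are illustrative assumptions, not the lab's hardware parameters or the trained agent's policy.

```python
import math

# Illustrative constants (not the lab's actual parameters).
# State: (cart position, cart velocity, pole angle, pole angular velocity),
# with angle pi = hanging down and 0 = balanced upright.
G, M_CART, M_POLE, L, DT = 9.81, 1.0, 0.1, 0.5, 0.02

def step(state, force):
    """One Euler step of the standard frictionless cart-pole dynamics."""
    x, x_dot, theta, theta_dot = state
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    total = M_CART + M_POLE
    temp = (force + M_POLE * L * theta_dot**2 * sin_t) / total
    theta_acc = (G * sin_t - cos_t * temp) / (
        L * (4.0 / 3.0 - M_POLE * cos_t**2 / total))
    x_acc = temp - M_POLE * L * theta_acc * cos_t / total
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)

# Start hanging down; swing-up requires pumping energy into the pendulum
# by pushing the cart back and forth (here: a crude alternating schedule).
state = (0.0, 0.0, math.pi, 0.0)
for t in range(100):
    force = 10.0 if (t // 25) % 2 == 0 else -10.0
    state = step(state, force)
```

The swing-up phase is what makes this harder than the textbook balancing-only cart-pole: the agent must first drive the system far from the target state before it can stabilize it there.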

A video shows the DT agent in operation.