Pruning replay buffer for efficient training of deep reinforcement learning

(1) Madison West High School, Madison, Wisconsin, (2) Harvard University, Cambridge, Massachusetts

https://doi.org/10.59720/23-068

Reinforcement learning (RL) is a type of machine learning that develops artificial intelligence by training an algorithm over many generations to learn which strategies to use in various situations. RL has applications in virtually every field, from transportation to research. However, RL is resource intensive, partly because it requires a large replay buffer, which stores the data collected in each episode. This study examines replay buffer reward mechanics to inform the creation of new pruning methods for improving RL efficiency. Specifically, we developed a novel approach designed to reduce the storage complexity of the replay buffer and training data and thus improve model efficiency. We created three algorithms for this purpose, Threshold Replay Buffer Pruning (TRBP), Cluster Replay Buffer Pruning (CRBP), and Inverse Threshold Replay Buffer Pruning (ITRBP), each testing a competing theory of reward mechanics. We hypothesized that TRBP's theory would best reflect real-world conditions, and our results corroborated this. TRBP achieved a 2-fold reduction in replay buffer size with only a 5% reduction in score, while CRBP and ITRBP performed much worse. These results supported the hypothesis that TRBP's reward thesis is the most accurate of the three and demonstrated that TRBP is a potentially effective replay buffer pruning algorithm.
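To make the threshold idea concrete, the following is a minimal Python sketch of a replay buffer with reward-threshold pruning. It is illustrative only: the specific criterion (dropping transitions whose reward falls below a fixed cutoff), the class name, and the pruning schedule are assumptions for demonstration, not the paper's exact TRBP specification.

```python
import random
from collections import deque


class ThresholdPrunedReplayBuffer:
    """Replay buffer that can drop low-reward transitions to shrink storage.

    Hypothetical sketch of threshold-style pruning; the actual TRBP
    criterion and schedule may differ from what is shown here.
    """

    def __init__(self, capacity, reward_threshold):
        self.capacity = capacity
        self.reward_threshold = reward_threshold
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # Store one transition from an episode.
        self.buffer.append((state, action, reward, next_state, done))

    def prune(self):
        # Keep only transitions whose reward meets the threshold,
        # reducing buffer size and subsequent training cost.
        kept = [t for t in self.buffer if t[2] >= self.reward_threshold]
        self.buffer = deque(kept, maxlen=self.capacity)

    def sample(self, batch_size):
        # Uniformly sample a training batch from the (pruned) buffer.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

In use, an agent would call add() after each environment step, invoke prune() periodically (e.g., at episode boundaries), and draw training batches with sample(); an inverse-threshold variant would instead discard transitions above the cutoff.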
