Reinforcement Learning Recommendation with Attention-Guided State Modeling
Abstract
This study proposes a novel reinforcement learning-based recommendation algorithm that effectively captures dynamic user preferences and optimizes long-term engagement. Unlike traditional recommendation models that focus solely on short-term accuracy, the proposed framework formulates recommendation as a sequential decision-making problem within a Markov Decision Process (MDP). It employs an attention-enhanced gated recurrent unit (GRU) network to model temporal dependencies in user-item interactions and introduces a hybrid reward-shaping strategy that integrates explicit feedback (ratings) with implicit engagement signals (clicks, dwell time). A deep Q-learning architecture with separate online and target networks ensures stable convergence under sparse and delayed feedback. Experiments conducted on the Yelp dataset show that the proposed RL-Rec algorithm outperforms existing baselines such as MF, NeuMF, GRU4Rec, and DQNRec by substantial margins, improving Precision@10 by 13.6%, NDCG@10 by 15.4%, and cumulative reward by 13.0%. The results also demonstrate smoother reward convergence and higher recommendation diversity, indicating an improved exploration-exploitation balance. Ablation studies confirm that both the attention mechanism and the recurrent state modeling contribute substantially to accuracy and policy stability. Overall, this research highlights the potential of reinforcement learning to drive next-generation recommendation algorithms that are adaptive, interpretable, and robust in dynamic environments.
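To make the architecture described above concrete, the sketch below illustrates the three components the abstract names: an attention-enhanced GRU state encoder, a hybrid shaped reward, and a deep Q-learning update that uses separate online and target networks. This is a minimal PyTorch illustration under assumed details; the class names, layer sizes, the 0.5/0.3/0.2 reward weights, and the Huber loss are our assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnGRUStateEncoder(nn.Module):
    """Encodes a user's interaction history into an MDP state vector.
    Hypothetical sketch; dimensions are illustrative assumptions."""
    def __init__(self, num_items: int, emb_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, emb_dim, padding_idx=0)
        self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)  # additive attention over time steps

    def forward(self, item_seq: torch.Tensor) -> torch.Tensor:
        # item_seq: (batch, seq_len) of item ids; 0 marks padding
        h, _ = self.gru(self.item_emb(item_seq))           # (batch, seq_len, hidden)
        scores = self.attn(torch.tanh(h)).squeeze(-1)      # (batch, seq_len)
        scores = scores.masked_fill(item_seq == 0, -1e9)   # exclude padded steps
        alpha = F.softmax(scores, dim=-1).unsqueeze(-1)    # attention weights
        return (alpha * h).sum(dim=1)                      # (batch, hidden) state

class QNetwork(nn.Module):
    """Maps an encoded state to Q-values over the item/action space."""
    def __init__(self, num_items: int, hidden_dim: int = 128):
        super().__init__()
        self.encoder = AttnGRUStateEncoder(num_items, hidden_dim=hidden_dim)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_items),
        )

    def forward(self, item_seq: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(item_seq))

def shaped_reward(rating, clicked, dwell_time, w=(0.5, 0.3, 0.2)):
    """Hybrid reward mixing explicit and implicit feedback.
    The weights are assumed, not taken from the paper."""
    return w[0] * rating + w[1] * clicked + w[2] * dwell_time

def dqn_update(online, target, optimizer, batch, gamma=0.99):
    """One temporal-difference step; the frozen target network supplies
    the bootstrap value, which stabilizes learning under sparse rewards."""
    seq, action, reward, next_seq, done = batch
    q = online(seq).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target(next_seq).max(dim=1).values
        y = reward + gamma * (1.0 - done) * q_next
    loss = F.smooth_l1_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In a setup like this, the target network would be refreshed by periodically copying the online weights (target.load_state_dict(online.state_dict())), which is the standard mechanism behind the dual-network stabilization the abstract describes.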
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License.
Mind forge Academia also operates under the Creative Commons Licence CC-BY 4.0, which allows you to copy and redistribute the material in any medium or format, for any purpose, even commercially, provided that you give appropriate attribution.