Improving Reasoning Stability in Large Language Models via Iterative Self-Questioning and Semantic Calibration


Callum Redfern

Abstract

Large language models (LLMs) have demonstrated strong capabilities in natural language understanding and generation; however, their reasoning stability and consistency remain limited, particularly on multi-step inference tasks. This paper proposes a novel framework that integrates iterative self-questioning with semantic calibration to improve reasoning robustness. The approach introduces a multi-stage reasoning loop in which intermediate outputs are recursively evaluated and refined using a confidence-aware scoring mechanism. Experiments are conducted on multiple benchmark datasets, including GSM8K, StrategyQA, and MultiArith. Compared with standard chain-of-thought prompting, the proposed method improves reasoning accuracy on GSM8K from 78.4% to 85.9% (+7.5 percentage points). On StrategyQA, accuracy increases from 71.2% to 76.8%, while consistency across repeated runs improves by 12.3%. Hallucination rates are also reduced by 18.6%, as measured by factual consistency metrics. Ablation studies show that semantic calibration yields the largest performance gain (+4.2%), followed by iterative refinement (+3.1%). These results demonstrate that structured reasoning-enhancement mechanisms can substantially improve the reliability of LLM outputs in complex reasoning scenarios.
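The abstract does not include an implementation, but the refinement loop it describes can be sketched roughly as follows. Everything in this sketch is an assumption rather than the author's actual method: the function name iterative_self_questioning, the prompt wording, the stopping threshold, and the pluggable score callable standing in for the paper's confidence-aware semantic calibration.

    from typing import Callable

    def iterative_self_questioning(
        llm: Callable[[str], str],
        score: Callable[[str, str], float],
        question: str,
        max_rounds: int = 3,
        threshold: float = 0.9,
    ) -> str:
        """Refine an answer until its confidence score clears a threshold.

        `llm` maps a prompt to a completion; `score` maps (question, answer)
        to a confidence in [0, 1]. Both are placeholders for whatever model
        and calibration metric the paper actually uses.
        """
        # Initial chain-of-thought style draft.
        answer = llm(f"Question: {question}\nThink step by step, then answer.")
        for _ in range(max_rounds):
            confidence = score(question, answer)  # semantic-calibration stand-in
            if confidence >= threshold:
                break  # draft judged stable enough; stop refining
            # Self-questioning step: ask the model to critique and revise its draft.
            answer = llm(
                f"Question: {question}\n"
                f"Draft answer: {answer}\n"
                "Identify any flaws in the draft, then give a corrected answer."
            )
        return answer

In practice, score might be instantiated with a self-consistency vote across samples or an entailment-based factuality check; the sketch fixes only the control flow of the loop, not the calibration metric itself.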

Article Details

How to Cite
Redfern, C. (2026). Improving Reasoning Stability in Large Language Models via Iterative Self-Questioning and Semantic Calibration. Journal of Computer Science and Software Applications, 6(4). Retrieved from https://mfacademia.org/index.php/jcssa/article/view/271