Abstract:
In underground coal mines, mining and transportation operations in confined spaces generate high concentrations of coal dust, causing severe dust-fog occlusion and feature degradation in surveillance video. This markedly degrades image quality and hampers the ability of object detection algorithms to recognize abnormal human behaviors. To counter the interference of dusty, foggy underground environments with behavior detection, we propose GRR-YOLO, an improved method based on YOLOv11n. To handle the heavy coal dust in images, we introduce GDformer, an image dedusting module based on dual-domain (frequency-spatial) fusion. By performing learnable feature transformations and fusion between the frequency and spatial domains, it jointly models global information and local details; residual channel prior (RCP) information further enhances the restoration of local details, significantly improving image clarity. In the backbone, we incorporate ReLookNet, a feature extraction network guided by contextual priors. A dynamic contextual information-flow enhancement module built on gated dynamic spatial aggregation (GDSA) provides dual guidance of features and weights, strengthening the model's global semantic understanding of scenes and its modeling of contextual dependencies. Finally, we propose Re-reviewFPN, a feature fusion network with a recalibration mechanism. Through the selective boundary aggregation (SBA) module and a lightweight feature enhancement module (FEM), a bidirectional interaction mechanism achieves complementary enhancement of boundary details and high-level semantics, optimizing cross-scale feature fusion.
Experimental results on DsLMF+, a dedicated underground coal mine behavior dataset, show that GRR-YOLO achieves a mean average precision (mAP@0.5) of 84.3% and an F1 score of 79.1%, outperforming state-of-the-art models including recent YOLO variants and RT-DETR-R18. Notably, the model remains highly compact, with only 2.4M parameters and 6.2 GFLOPs, and reaches an inference speed of 253 FPS, satisfying real-time processing requirements in underground environments. These results confirm that GRR-YOLO effectively mitigates dust-induced image degradation, substantially improving the accuracy and robustness of behavior detection. Moreover, it strikes a strong balance among accuracy, computational efficiency, and model complexity, underscoring its potential for practical deployment in real-world mining scenarios.