Abstract:
In underground coal mines, mining and transportation operations in confined spaces generate high concentrations of coal dust, causing severe dust-fog occlusion and feature degradation in surveillance video. This markedly degrades image quality and hampers the ability of object detection algorithms to recognize abnormal human behaviors. To counter the interference of dusty, foggy underground environments with behavior detection, we propose GRR-YOLO, an improved method based on YOLOv11n. To handle the heavy coal dust in images, we introduce GDformer, an image dedusting module based on dual-domain (frequency-spatial) fusion. By performing learnable feature transformations and fusion between the frequency and spatial domains, it jointly models global information and local details; residual channel prior (RCP) information further enhances the restoration of local details, significantly improving image clarity. In the backbone, we incorporate ReLookNet, a feature extraction network guided by contextual priors. A dynamic contextual information-flow enhancement module built on gated dynamic spatial aggregation (GDSA) provides dual guidance of features and weights, strengthening the model's global semantic understanding of scenes and its modeling of contextual dependencies. Finally, we propose Re-reviewFPN, a feature fusion network with a recalibration mechanism. Through the selective boundary aggregation (SBA) module and a lightweight feature enhancement module (FEM), a bidirectional interaction mechanism achieves complementary enhancement of boundary details and high-level semantics, optimizing cross-scale feature fusion.
Experimental results on DsLMF+, a dedicated underground coal mine behavior dataset, show that GRR-YOLO achieves a mean average precision (mAP@0.5) of 84.3% and an F1 score of 79.1%, outperforming state-of-the-art models including recent YOLO variants and RT-DETR-R18. Notably, the model remains highly compact, with only 2.4M parameters and 6.2 GFLOPs, and reaches an inference speed of 253 FPS, satisfying real-time processing requirements in underground environments. These results confirm that GRR-YOLO effectively mitigates dust-induced image degradation, substantially improving the accuracy and robustness of behavior detection. Moreover, it strikes a strong balance among accuracy, computational efficiency, and model complexity, underscoring its potential for practical deployment in real-world mining scenarios.