SUN Chuanmeng,JIAO Bin,FU Yiyan,et al. Underground coal-rock image recognition using Swin-UNet with agent attention mechanism[J]. Coal Science and Technology,2025,53(11):158−171. DOI: 10.12438/cst.2025-0946

Underground coal-rock image recognition using Swin-UNet with agent attention mechanism

  • Coal-rock image segmentation under complex underground mining conditions must cope with low illumination, high noise, and motion blur. To address these challenges, this paper proposes an improved semantic segmentation model named Agent Swin-UNet, which integrates an Agent Attention mechanism into the Swin-UNet (Shifted Window Transformer U-Net) framework. The model adopts Swin Transformer as the backbone network and uses its hierarchical window-based and shifted-window multi-head self-attention (W-MSA/SW-MSA) to establish long-range illumination dependencies, thereby alleviating detail loss in dark regions and degradation of local features. An Agent Attention Module is embedded into the skip connections between the encoder and decoder. This module introduces a triple-cooperative mechanism that employs agent tokens to realize an "aggregation-broadcast" style of feature interaction, reducing computational complexity from O(N^2) to O(Nn), where N is the number of tokens and n is the much smaller number of agent tokens, while preserving global semantic modeling capability. By incorporating a spatially aware bias, the model adapts better to the noise distribution and suppresses unstructured interference, while the integration of depthwise convolution (DWC) strengthens local texture reconstruction, improving boundary delineation and fine-detail recovery. To mitigate the severe foreground-background imbalance inherent in coal-rock imagery, a composite loss function combining cross-entropy, Dice, and multi-scale structural similarity (MS-SSIM) losses is designed. This hybrid supervision optimizes training from three perspectives (classification consistency, regional overlap, and structural similarity), enhancing semantic coherence and boundary completeness under class-imbalance conditions.
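The "aggregation-broadcast" interaction described above can be illustrated with a minimal single-head sketch in plain Python. It is not the authors' implementation: the average pooling used to form agent tokens, the helper names, and the toy inputs are all assumptions, and the spatially aware bias and the DWC branch are omitted. It shows only why the cost scales with N·n rather than N·N: the N tokens never attend to each other directly, only through n agent tokens.

```python
import math

def matmul(a, b):
    """Multiply an (r x k) list-of-lists matrix by a (k x c) one."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(val - peak) for val in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def scaled(m, factor):
    return [[val * factor for val in row] for row in m]

def pool_agents(q, n_agents):
    """Average-pool N query tokens into n_agents agent tokens.
    Assumption: contiguous-group pooling; N must be divisible by n_agents."""
    if len(q) % n_agents:
        raise ValueError("token count must be divisible by n_agents")
    group = len(q) // n_agents
    agents = []
    for i in range(n_agents):
        chunk = q[i * group:(i + 1) * group]
        agents.append([sum(col) / group for col in zip(*chunk)])
    return agents

def agent_attention(q, k, v, n_agents):
    """Hypothetical single-head agent attention; q, k, v are N x d."""
    scale = 1.0 / math.sqrt(len(q[0]))
    agents = pool_agents(q, n_agents)                                   # n x d
    # Aggregation: agents act as queries over all N keys and values.
    agent_scores = softmax_rows(scaled(matmul(agents, transpose(k)), scale))  # n x N
    agent_v = matmul(agent_scores, v)                                   # n x d
    # Broadcast: each of the N queries attends only to the n agents.
    weights = softmax_rows(scaled(matmul(q, transpose(agents)), scale))  # N x n
    return matmul(weights, agent_v), weights                            # N x d, N x n

# Toy usage: N = 8 tokens, d = 4 channels, n = 2 agent tokens.
q = [[0.1 * (i + j) for j in range(4)] for i in range(8)]
k = [[0.05 * (i - j) for j in range(4)] for i in range(8)]
v = [[float(i * 4 + j) for j in range(4)] for i in range(8)]
out, weights = agent_attention(q, k, v, n_agents=2)
print(len(out), len(out[0]))  # 8 4
```

Both matrix products involving the full token set have one dimension of size n, so the attention maps are n x N and N x n instead of N x N.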
Experiments on the Shaanxi-Shanxi-Hebei Structural Coal Dataset demonstrate that Agent Swin-UNet achieves 91.26% mIoU and 88.81% mPA on the standard test set, outperforming Segmenter, DeepLabv3, and the baseline Swin-UNet. Under noise interference with an intensity of 0.05, its mIoU remains at 84.14%, indicating strong robustness to noise. Ablation studies further confirm that the Agent Attention Module is the principal source of the performance improvement, particularly in high-noise environments (noise intensity > 0.05). The proposed method provides a robust and efficient solution for rapid coal-rock segmentation and intelligent excavation in complex underground environments.
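For readers unfamiliar with the reported metrics, the sketch below computes mIoU and mPA from a confusion matrix using the commonly used definitions (per-class intersection over union and per-class pixel accuracy, each averaged over classes). The abstract does not state the exact definitions used in the paper, so this is an assumption; the function names and the toy labels are hypothetical and unrelated to the paper's data.

```python
def confusion_matrix(truth, pred, n_classes):
    """cm[t][p] counts pixels with true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm

def miou_mpa(cm):
    """Return (mean IoU, mean per-class pixel accuracy) over classes present."""
    n = len(cm)
    ious, accs = [], []
    for c in range(n):
        tp = cm[c][c]
        gt = sum(cm[c])                              # pixels whose true class is c
        predicted = sum(cm[r][c] for r in range(n))  # pixels predicted as c
        union = gt + predicted - tp
        if union:
            ious.append(tp / union)
        if gt:
            accs.append(tp / gt)
    return sum(ious) / len(ious), sum(accs) / len(accs)

# Toy example: flattened 8-pixel masks, 0 = rock, 1 = coal (labels assumed).
truth = [0, 0, 0, 1, 1, 1, 1, 0]
pred = [0, 0, 1, 1, 1, 1, 0, 0]
cm = confusion_matrix(truth, pred, n_classes=2)
miou, mpa = miou_mpa(cm)
print(cm)  # [[3, 1], [1, 3]]
```

In this toy case each class has 3 correct pixels out of a union of 5, so the mIoU is 0.6, and 3 of 4 true pixels are recovered per class, so the mPA is 0.75.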