Abstract:
In the complex, dynamic environment of open-pit mines, accurate segmentation of the road's drivable area is a key bottleneck for deploying autonomous driving technology. Traditional visual perception methods struggle with challenges such as long-range dependency modeling and the discrimination of low-contrast regions, owing to the limitations of local feature extraction and the shortcomings of single-modal information. To this end, a depth-aware model for the drivable area of open-pit roads based on spatial-depth cross-modal fusion is proposed. The model uses the Pyramid Vision Transformer (PVTv2) as the backbone of the spatial branch and employs a linear attention mechanism to efficiently extract multi-scale semantic features, addressing the long-range dependency problem while reducing computational complexity on high-resolution images. The DepthAnything model is used to generate robust depth maps, and the CATANet token aggregation mechanism is introduced to mine the spatial geometric information in the depth features, helping to distinguish regions with similar color and texture. A multi-level adaptive fusion module is designed to realize fine-grained interaction between semantic and geometric features through dynamically weighted channel-level and pixel-level attention and cross-scale context propagation. A multi-component joint loss function, including weighted cross-entropy and a depth regularization loss, is constructed to address class imbalance and boundary ambiguity. The results show that the model performs well on a self-built open-pit mine road dataset, achieving a mean intersection over union (mIoU) of 87.6%, significantly better than traditional models such as FCN and U-Net. The depth variance (DV) is only 0.052, effectively ensuring the uniformity of depth values within the road area.
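The multi-level adaptive fusion described above can be illustrated with a minimal NumPy sketch. This is a simplified, hypothetical rendering of the idea, not the paper's implementation: channel-level weights are derived from globally pooled statistics and pixel-level weights from a spatially pooled map, and both are used to convexly blend the semantic and geometric feature tensors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fusion(sem, geo):
    """Blend semantic (spatial-branch) and geometric (depth-branch) features.

    sem, geo: feature tensors of shape (C, H, W). A hypothetical
    simplification of multi-level adaptive fusion: channel-level
    attention from global average pooling, then pixel-level attention
    from the channel-mean of the intermediate fusion.
    """
    # Channel-level attention: one weight per channel from pooled statistics
    chan = sigmoid(sem.mean(axis=(1, 2)) - geo.mean(axis=(1, 2)))      # (C,)
    fused = chan[:, None, None] * sem + (1 - chan)[:, None, None] * geo
    # Pixel-level attention: one weight per spatial location
    pix = sigmoid(fused.mean(axis=0))                                  # (H, W)
    return pix[None] * sem + (1 - pix)[None] * geo
```

Because both stages are convex combinations, the fused features stay within the elementwise range spanned by the two input branches, which keeps the fusion numerically stable regardless of the learned weighting.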
Ablation experiments verify the necessity and effectiveness of each core component. Video sequence tests show that the model remains robust under complex road conditions and extreme weather. The model thus provides a reliable environmental perception scheme for unmanned open-pit mining systems, which is of significance for the construction of intelligent mines.
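The two reported metrics can be computed as follows. This is a minimal sketch under stated assumptions: mIoU is averaged over classes that appear in the prediction or ground truth, and depth variance is taken over the depth values inside the predicted road mask (the paper's exact depth normalization is not specified here).

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection over union over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                       # skip classes absent from both
            ious.append(inter / union)
    return float(np.mean(ious))

def depth_variance(depth, road_mask):
    """Variance of depth values inside the predicted road area.

    A low value indicates the predicted road region has uniform depth,
    i.e. the segmentation does not bleed into off-road structures.
    """
    return float(depth[road_mask].var())
```

A perfect segmentation yields an mIoU of 1.0, and a perfectly flat depth map inside the road mask yields a depth variance of 0, which is why lower DV values indicate more geometrically consistent road predictions.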