Abstract:
In the complex, dynamic environment of open-pit mines, accurate segmentation of the road's drivable area is a key bottleneck for deploying autonomous driving technology. Traditional visual perception methods struggle with challenges such as long-range dependency modeling and the discrimination of low-contrast regions, owing to the limitations of local feature extraction and the shortcomings of single-modal information. To this end, a depth-aware model for the drivable area of open-pit roads based on spatial-depth cross-modal fusion is proposed. The model uses the Pyramid Vision Transformer (PVTv2) as the backbone of the spatial branch and employs a linear attention mechanism to efficiently extract multi-scale semantic features, addressing the long-range dependency problem while reducing computational complexity on high-resolution images. The DepthAnything model is used to generate robust depth maps, and the CATANet token aggregation mechanism is introduced to mine the spatial geometric information in the depth features, helping to distinguish regions with similar color and texture. A multi-level adaptive fusion module is designed to realize fine-grained interaction between semantic and geometric features through dynamically weighted channel-level and pixel-level attention and cross-scale context propagation. A multi-component joint loss function, including weighted cross-entropy and a depth regularization loss, is constructed to address class imbalance and boundary ambiguity. The results show that the model performs well on a self-built open-pit mine road dataset, achieving a mean intersection over union (mIoU) of 87.6%, significantly better than traditional models such as FCN and U-Net. The depth variance (DV) is only 0.052, effectively ensuring the uniformity of depth values within the road area.
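The multi-level adaptive fusion described above can be illustrated with a minimal NumPy sketch. This is a simplified, hypothetical rendering of the idea, not the paper's implementation: channel-level weights are derived from globally pooled statistics and pixel-level weights from a spatially pooled map, and both are used to convexly blend the semantic and geometric feature tensors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fusion(sem, geo):
    """Blend semantic (spatial-branch) and geometric (depth-branch) features.

    sem, geo: feature tensors of shape (C, H, W). A hypothetical
    simplification of multi-level adaptive fusion: channel-level
    attention from global average pooling, then pixel-level attention
    from the channel-mean of the intermediate fusion.
    """
    # Channel-level attention: one weight per channel from pooled statistics
    chan = sigmoid(sem.mean(axis=(1, 2)) - geo.mean(axis=(1, 2)))      # (C,)
    fused = chan[:, None, None] * sem + (1 - chan)[:, None, None] * geo
    # Pixel-level attention: one weight per spatial location
    pix = sigmoid(fused.mean(axis=0))                                  # (H, W)
    return pix[None] * sem + (1 - pix)[None] * geo
```

Because both stages are convex combinations, the fused features stay within the elementwise range spanned by the two input branches, which keeps the fusion numerically stable regardless of the learned weighting.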
Ablation experiments verify the necessity and effectiveness of each core component. Video sequence tests show that the model remains robust under complex road conditions and extreme weather. The model thus provides a reliable environmental perception scheme for unmanned open-pit mining systems, which is of significance for the construction of intelligent mines.
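The two reported metrics can be computed as follows. This is a minimal sketch under stated assumptions: mIoU is averaged over classes that appear in the prediction or ground truth, and depth variance is taken over the depth values inside the predicted road mask (the paper's exact depth normalization is not specified here).

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection over union over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                       # skip classes absent from both
            ious.append(inter / union)
    return float(np.mean(ious))

def depth_variance(depth, road_mask):
    """Variance of depth values inside the predicted road area.

    A low value indicates the predicted road region has uniform depth,
    i.e. the segmentation does not bleed into off-road structures.
    """
    return float(depth[road_mask].var())
```

A perfect segmentation yields an mIoU of 1.0, and a perfectly flat depth map inside the road mask yields a depth variance of 0, which is why lower DV values indicate more geometrically consistent road predictions.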