Analysis of Pedestrian Semantic Segmentation Technology in Autonomous Driving Scenarios under Occlusion Conditions

Yingxin He

doi:10.54691/c3jh3t05

Keywords:

Semantic Segmentation; Occlusion-Aware Modeling; Instance-Level Reasoning; Boundary Refinement; Temporal Feature Alignment.

Abstract

Semantic segmentation has become a cornerstone of visual scene understanding, particularly in safety-critical domains such as autonomous driving, robotics, and urban surveillance. Recent advances in convolutional and Transformer-based deep learning models have yielded strong performance on standard benchmarks under ideal conditions. However, traditional semantic segmentation methods struggle in real-world scenes characterized by occlusions, overlaps, and partial visibility, which often result in prediction failures and poor generalization. In response to these limitations, occlusion-aware segmentation has emerged as a new paradigm, incorporating visibility modeling, occlusion reasoning, structural restoration, and temporal completion strategies. This paper presents a comprehensive survey of both traditional and occlusion-aware semantic segmentation approaches, with a structured analysis of their evolution, strengths, and limitations. The review highlights the growing importance of integrating structural and contextual reasoning to improve robustness in occluded environments and identifies future directions for developing scalable, accurate, and occlusion-resilient segmentation systems.
