D-GPH: Dynamic Graph-Prior Hybrid Detector with Uncertainty-Aware Refinement

Authors

  • Yuxi Han

DOI:

https://doi.org/10.54691/kc64wt10

Keywords:

Object Detection; Mask R-CNN; OverLoCK; Dynamic Prior Router; Manhattan-constrained Graph Attention; Heterogeneous Fusion; Varifocal Loss; Uncertainty-Aware Refinement.

Abstract

As a classic two-stage object detection paradigm, Mask R-CNN achieves robust instance awareness through region proposals and feature alignment. Each stage of its structure nevertheless has limitations. The backbone, which extracts multi-scale features from the image layer by layer, is usually a convolutional network such as ResNet and struggles to exploit a global context prior to suppress background noise in degraded environments. The neck relies on a fixed multi-scale fusion scheme, cannot adapt its representation to different image content, and is limited by the local receptive field of the convolution operator, so its global modeling is easily disturbed by uncorrelated long-range targets. The detection head performs classification and bounding-box regression on candidate regions (RoIs) and usually applies the same forward pass and loss to hard and easy samples alike. This is insufficient for distinguishing blurred targets with low confidence and small area, which leads to missed and false detections. To address these problems, this paper proposes a dynamic graph-prior hybrid detection framework named D-GPH. Building on the global context prior generated by the backbone, the framework introduces a Dynamic Prior Router (DPR), which generates an adaptive prior injection strength for each layer according to the current image content and thereby modulates the features per layer and per image before FPN fusion. In the neck, this paper constructs a Manhattan-constrained Graph Attention module (MC-GAT), which explicitly suppresses non-local noise by introducing a spatial distance penalty and works with a heterogeneous fusion layer to jointly extract global topology and local texture details.
In the detection stage, this paper designs an Uncertainty-Aware Refinement head (UAR head), which establishes a quality evaluation mechanism using varifocal loss together with an exponential moving average (EMA) of IoU, and triggers a second refinement prediction based on channel recalibration for hard samples, so as to significantly improve the localization accuracy of complex targets. Experimental results show that D-GPH outperforms the Mask R-CNN baseline on the MS COCO dataset, especially in detecting blurred targets and hard samples.
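
The quality evaluation idea in the UAR head can be illustrated with a minimal Python sketch. It is not the paper's implementation. The `varifocal_loss` function follows the standard per-element varifocal loss definition; the `alpha` and `gamma` defaults, the `IoUQualityTracker` class, its `momentum`, `init`, and `margin` parameters, and the rule that a sample is "hard" when its IoU falls below the running average by a margin are all assumptions made for illustration.

```python
import math


def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    """Per-element varifocal loss.

    p: predicted IoU-aware classification score in (0, 1).
    q: target score; the IoU with the ground-truth box for a positive
       sample, 0 for a negative sample.
    alpha, gamma: down-weighting factors for negatives (assumed defaults).
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
    if q > 0:
        # Positive sample: binary cross-entropy weighted by the target q.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative sample: focal-style down-weighting by alpha * p**gamma.
    return -alpha * (p ** gamma) * math.log(1.0 - p)


class IoUQualityTracker:
    """Hypothetical helper: tracks an EMA of IoU and flags hard samples."""

    def __init__(self, momentum=0.9, init=0.5, margin=0.1):
        self.momentum = momentum  # weight kept on the previous average
        self.ema = init           # running IoU estimate
        self.margin = margin      # assumed slack below the average

    def update(self, iou):
        """Fold one observed IoU into the running average and return it."""
        self.ema = self.momentum * self.ema + (1.0 - self.momentum) * iou
        return self.ema

    def is_hard(self, iou):
        """Assumed rule: hard if the IoU is below the EMA minus the margin."""
        return iou < self.ema - self.margin


tracker = IoUQualityTracker()
tracker.update(0.8)
for iou in (0.3, 0.6):
    label = "refine" if tracker.is_hard(iou) else "keep"
    print(f"IoU={iou:.1f} loss={varifocal_loss(0.5, iou):.4f} -> {label}")
```

In a detector, samples flagged by such a rule would be routed to the second, channel-recalibrated refinement pass described above, while the remaining samples would keep their first-pass prediction.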

Published

2026-04-20

Section

Articles

How to Cite

Han, Y. (2026). D-GPH: Dynamic Graph-Prior Hybrid Detector with Uncertainty-Aware Refinement. Scientific Journal of Technology, 8(4), 73-87. https://doi.org/10.54691/kc64wt10