Comparative Performance Analysis of YOLOv12 and RF-DETR in Face Detection

Authors

  • David Hendrawan, STMIK Widya Cipta Dharma, Indonesia
  • Wahyuni, STMIK Widya Cipta Dharma, Indonesia
  • Pitrasacha Adytia, STMIK Widya Cipta Dharma, Indonesia
DOI:

https://doi.org/10.63158/journalisi.v8i2.1561

Keywords:

YOLOv12, RF-DETR, WIDER FACE, Inference Latency, Edge Deployment

Abstract

Face detection in dense and occluded environments remains a significant challenge in computer vision. This study compares the CNN-based YOLOv12 with the Transformer-based RF-DETR to identify the better balance between accuracy and latency for resource-constrained edge computing. Multiple model variants were evaluated on the WIDER FACE dataset using an NVIDIA T4 GPU. Because GPU memory was a constraint when training the RF-DETR Medium variant, a batch size of 8 was used for all models. Detection metrics (precision, recall, F1-score, mAP) were computed only on the validation set, while a 100-image subset of the test set was used exclusively for inference efficiency benchmarking, kept completely separate from detection evaluation. YOLOv12X achieved the best overall detection performance (F1-score: 0.764, mAP@50:95: 0.440), significantly outperforming RF-DETR Medium. For real-time applications, YOLOv12M demonstrated the highest efficiency (36.17 FPS vs. 23.32 FPS). Qualitatively, YOLOv12 maintained high sensitivity in crowded scenes, whereas RF-DETR detected small-scale faces stably despite its lower recall. Under these constrained-hardware conditions, YOLOv12 appears to be a viable solution for surveillance systems, while RF-DETR offers a stable alternative for small-object detection when computational overhead and training budgets are less restrictive.
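
The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names (f1_score, fps_to_latency_ms, benchmark_fps), the warm-up count, and the timing loop are assumptions made for illustration. It shows the standard F1 definition as the harmonic mean of precision and recall, the conversion from FPS to per-image latency, and one simple way to time inference over a fixed image subset.

```python
import time
from typing import Any, Callable, Iterable


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def fps_to_latency_ms(fps: float) -> float:
    """Average per-image latency in milliseconds for a given throughput."""
    return 1000.0 / fps


def benchmark_fps(
    infer: Callable[[Any], Any],
    images: Iterable[Any],
    warmup: int = 5,  # assumed warm-up count, not taken from the paper
) -> float:
    """Time `infer` over all images and return images per second.

    `infer` is a hypothetical callable wrapping a detector. On a GPU it
    should block until the result is ready, otherwise asynchronous
    execution makes the measured time too short.
    """
    items = list(images)
    for image in items[:warmup]:  # untimed warm-up runs
        infer(image)
    start = time.perf_counter()
    for image in items:
        infer(image)
    elapsed = time.perf_counter() - start
    return len(items) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Throughputs reported in the abstract, converted to latency.
    for fps in (36.17, 23.32):
        print(f"{fps} FPS is about {fps_to_latency_ms(fps):.1f} ms per image")
```

Applied to the reported throughputs, the conversion gives roughly 27.6 ms per image at 36.17 FPS and 42.9 ms per image at 23.32 FPS.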

Published

2026-04-26

Issue

Section

Articles

How to Cite

[1]
D. Hendrawan, Wahyuni, and P. Adytia, “Comparative Performance Analysis of YOLOv12 and RF-DETR in Face Detection”, journalisi, vol. 8, no. 2, pp. 2414–2440, Apr. 2026, doi: 10.63158/journalisi.v8i2.1561.