Performance Comparison of YOLOv10, YOLOv11, and YOLOv12 Models on Human Detection Datasets

Authors

  • Viky Hendriko, Multi Data Palembang University, Indonesia
  • Dedy Hermanto, Multi Data Palembang University, Indonesia

DOI:

https://doi.org/10.47709/brilliance.v5i1.6447

Keywords:

Dataset, Detection, Model, Performance, You Only Look Once

Abstract

One of the most popular object detection models is You Only Look Once (YOLO), and humans are among the most frequently targeted objects for detection. Despite the variety of available human detection datasets, only a few studies have compared dataset performance across different versions of the YOLO algorithm. This study compares the performance of YOLOv10, YOLOv11, and YOLOv12 on eight datasets: CrowdHuman, CityPersons, WiderPerson, Mall Dataset, INRIA, Microsoft Common Objects in Context (MS COCO), PASCAL VOC, and MOT17. Precision, recall, mAP@50, and mAP@50-95 are used to measure each YOLO version's performance on each dataset. The results indicate that performance varies by dataset for each YOLO version, so model performance depends on the characteristics of the dataset. On the MOT17 dataset, the best results are obtained by YOLOv12, with 0.909 precision, 0.775 recall, 0.88 mAP@50, and 0.695 mAP@50-95. On the CityPersons dataset, however, YOLOv11 performs best, with 0.782 precision, 0.529 recall, 0.694 mAP@50, and 0.476 mAP@50-95. Therefore, selecting the YOLO version appropriate to the dataset's complexity is essential to obtaining the most optimal detection model.
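For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch (not the authors' code) of how detection precision and recall are computed at a single IoU threshold of 0.5; mAP@50 and mAP@50-95 extend this by averaging precision over confidence thresholds and, for mAP@50-95, over IoU thresholds from 0.5 to 0.95. The `(x1, y1, x2, y2)` box format and the greedy matching strategy are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def precision_recall(preds, gts, iou_thr=0.5):
    """Match each prediction to at most one unmatched ground truth
    (greedy, best IoU first per prediction) and count TP/FP/FN."""
    matched = set()
    tp = 0
    for p in preds:
        best, best_iou = None, iou_thr
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best, best_iou = j, v
        if best is not None:
            matched.add(best)
            tp += 1
    fp = len(preds) - tp   # unmatched predictions
    fn = len(gts) - tp     # undetected ground truths
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall
```

For example, with two ground-truth boxes and two predictions of which only one overlaps a ground truth, both precision and recall come out to 0.5 (one true positive, one false positive, one missed detection).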

References

Alif, Mujadded Al Rabbani, and Muhammad Hussain. 2025. “YOLOv12: A Breakdown of the Key Architectural Features.” arXiv preprint.

Dalal, N., and B. Triggs. 2005. “Histograms of Oriented Gradients for Human Detection.” Pp. 886–93 vol. 1 in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). Vol. 1.

Everingham, M., L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. 2007. “The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results.” Retrieved March 16, 2025 (http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html).

Everingham, M., L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. 2012. “The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results.” Retrieved March 16, 2025 (http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html).

Everingham, Mark, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. 2010. “The Pascal Visual Object Classes (VOC) Challenge.” International Journal of Computer Vision 88(2):303–38. doi: 10.1007/s11263-009-0275-4.

Fang, Wei, Lin Wang, and Peiming Ren. 2019. “A Novel Squeeze YOLO-Based Real-Time People Counting Approach.” International Journal of Bio-Inspired Computation 14(4):1. doi: 10.1504/ijbic.2019.10024002.

Jiao, Licheng, Fan Zhang, Fang Liu, Shuyuan Yang, Lingling Li, Zhixi Feng, and Rong Qu. 2020. “A Survey of Deep Learning-Based Object Detection.” IEEE Access 7(3):128837–68. doi: 10.1109/ACCESS.2019.2939201.

Jocher, Glenn, Jing Qiu, and Ayush Chaurasia. 2023. “Ultralytics YOLO.”

Kaur, Jaskirat, and Williamjeet Singh. 2022. “Tools, Techniques, Datasets and Application Areas for Object Detection in an Image: A Review.” Multimedia Tools and Applications 81(27):38297–351. doi: 10.1007/s11042-022-13153-y.

Lin, Tsung Yi, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. “Microsoft COCO: Common Objects in Context.” Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8693 LNCS(PART 5):740–55. doi: 10.1007/978-3-319-10602-1_48.

Lou, Haitong, Xuehu Duan, Junmei Guo, Haiying Liu, Jason Gu, Lingyun Bi, and Haonan Chen. 2023. “DC-YOLOv8: Small-Size Object Detection Algorithm Based on Camera Sensor.” Electronics (Switzerland) 12(10):1–14. doi: 10.3390/electronics12102323.

Loy, Chen Change, Shaogang Gong, and Tao Xiang. 2013. “From Semi-Supervised to Transfer Counting of Crowds.” Proceedings of the IEEE International Conference on Computer Vision 2256–63. doi: 10.1109/ICCV.2013.270.

Milan, Anton, Laura Leal-Taixe, Ian Reid, Stefan Roth, and Konrad Schindler. 2016. “MOT16: A Benchmark for Multi-Object Tracking.” arXiv preprint, 1–12.

Pahlevi, Said Mirza. 2024. Kecerdasan Buatan Dengan Deep Computer Vision [Artificial Intelligence with Deep Computer Vision]. Jakarta: PT Elex Media Komputindo.

Paszke, Adam, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. “PyTorch: An Imperative Style, High-Performance Deep Learning Library.” Advances in Neural Information Processing Systems 32(NeurIPS).

Qiu, Xiaoyang, Yajun Chen, Wenhao Cai, Meiqi Niu, and Jianying Li. 2024. “LD-YOLOv10: A Lightweight Target Detection Algorithm for Drone Scenarios Based on YOLOv10.” Electronics (Switzerland) 13(16). doi: 10.3390/electronics13163269.

Sanchez, S. A., H. J. Romero, and A. D. Morales. 2020. “A Review: Comparison of Performance Metrics of Pretrained Models for Object Detection Using the TensorFlow Framework.” IOP Conference Series: Materials Science and Engineering 844(1). doi: 10.1088/1757-899X/844/1/012024.

Sapkota, Ranjan, Zhichao Meng, Martin Churuvija, Xiaoqiang Du, Zenghong Ma, and Manoj Karkee. 2024. “Comprehensive Performance Evaluation of YOLOv10, YOLOv9 and YOLOv8 on Detecting and Counting Fruitlet in Complex Orchard Environments.” arXiv preprint (March).

Shao, Shuai, Zijian Zhao, Boxun Li, Tete Xiao, Gang Yu, Xiangyu Zhang, and Jian Sun. 2018. “CrowdHuman: A Benchmark for Detecting Human in a Crowd.” arXiv preprint, 1–9.

Srivastava, Shrey, Amit Vishvas Divekar, Chandu Anilkumar, Ishika Naik, Ved Kulkarni, and V. Pattabiraman. 2021. “Comparative Analysis of Deep Learning Image Detection Algorithms.” Journal of Big Data 8(1). doi: 10.1186/s40537-021-00434-w.

Surbakti, Agung Wibowo Ardiyanta, and Rahmi Eka Putri. 2022. “Penghitung Pengunjung Dan Deteksi Masker Menggunakan OpenCV Dan YOLO [Visitor Counting and Mask Detection Using OpenCV and YOLO].” Chipset 3(02):83–93. doi: 10.25077/chipset.3.02.83-93.2022.

Tian, Yunjie, Qixiang Ye, and David Doermann. 2025. “YOLOv12: Attention-Centric Real-Time Object Detectors.” arXiv preprint.

Wang, Ao, Hui Chen, Lihao Liu, and Kai Chen. 2024. “YOLOv10: Real-Time End-to-End Object Detection.” arXiv preprint, 1–18.

Zhang, Shanshan, Rodrigo Benenson, and Bernt Schiele. 2017. “CityPersons: A Diverse Dataset for Pedestrian Detection.” Pp. 4457–65 in Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). doi: 10.1109/CVPR.2017.474.

Zhang, Shifeng, Yiliang Xie, Jun Wan, Hansheng Xia, Stan Z. Li, and Guodong Guo. 2020. “WiderPerson: A Diverse Dataset for Dense Pedestrian Detection in the Wild.” IEEE Transactions on Multimedia 22(2):380–93. doi: 10.1109/TMM.2019.2929005.

Zhao, Chenjie, Ryan Wen Liu, Jingxiang Qu, and Ruobin Gao. 2024. “Deep Learning-Based Object Detection in Maritime Unmanned Aerial Vehicle Imagery: Review and Experimental Comparisons.” Engineering Applications of Artificial Intelligence 128:1–32. doi: 10.1016/j.engappai.2023.107513.

Published

2025-07-21

How to Cite

Hendriko, V., & Hermanto, D. (2025). Performance Comparison of YOLOv10, YOLOv11, and YOLOv12 Models on Human Detection Datasets. Brilliance: Research of Artificial Intelligence, 5(1), 440–450. https://doi.org/10.47709/brilliance.v5i1.6447
