Hyperparameter Sensitivity of Vanilla Knowledge Distillation for Compact CNNs on CIFAR-100

Mochamad Rizal Fauzan; Raden Muhammad Rafi Rachman; Shifa Rangga Saputra; Daffa Irsyad Nugraha

doi:10.47709/cnahpc.v8i2.8239

Authors

Mochamad Rizal Fauzan, National Taipei University of Technology
Raden Muhammad Rafi Rachman, Universitas Pendidikan Indonesia
Shifa Rangga Saputra, Universitas Pendidikan Indonesia
Daffa Irsyad Nugraha, Universitas Pendidikan Indonesia

DOI:

https://doi.org/10.47709/cnahpc.v8i2.8239

Keywords:

CIFAR-100, compact neural networks, knowledge distillation, loss balancing, temperature scaling

Abstract

Knowledge distillation has become an effective strategy for improving compact convolutional neural networks, yet the performance of vanilla knowledge distillation in lightweight image classification is still often reported using default hyperparameter settings without systematic justification. This study addresses the limited empirical understanding of how two core vanilla knowledge distillation hyperparameters, temperature scaling (T) and loss balancing (α), affect compact convolutional neural networks under a unified experimental setting. Using CIFAR-100 as the benchmark dataset, a ResNet-50 teacher was employed to distill knowledge into two lightweight student models, MobileNetV2 and ShuffleNetV2 ×1.0. Performance was evaluated using top-1 accuracy, top-5 accuracy, parameter count, and inference latency. The teacher achieved 81.24% top-1 accuracy and 96.05% top-5 accuracy. Under the default distillation setting, MobileNetV2 improved from 79.18% to 80.83% top-1 accuracy and from 95.77% to 96.40% top-5 accuracy, while reducing latency from 3.98 ms to 3.44 ms. ShuffleNetV2 ×1.0 improved from 77.00% to 78.36% top-1 accuracy and from 94.81% to 95.45% top-5 accuracy, with only a marginal latency increase from 4.23 ms to 4.29 ms. To examine hyperparameter sensitivity, an ablation study was conducted on MobileNetV2 with T = 2, 4, and 6, and α = 0.3, 0.5, and 0.7. The best configuration was obtained at T = 4 and α = 0.3, yielding 80.88% top-1 accuracy and 96.51% top-5 accuracy. These results show that vanilla knowledge distillation consistently improves compact convolutional neural networks, but its effectiveness depends strongly on careful hyperparameter selection rather than inherited default settings.
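
For readers unfamiliar with the two hyperparameters studied, the following is a minimal sketch of a vanilla knowledge distillation loss for a single example, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation. In particular, the code assumes that the loss-balancing weight (called alpha in the code) multiplies the hard-label cross-entropy and that (1 - alpha) multiplies the soft-label term; some formulations use the opposite weighting, and the abstract does not say which applies here. The function names and the example logits are hypothetical.

```python
import math
from itertools import product


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """Vanilla KD loss for one example.

    Assumed convention: alpha weights the hard-label cross-entropy and
    (1 - alpha) weights the T^2-scaled KL term. The paper may use the
    opposite weighting.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student) if p > 0)
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * ce + (1.0 - alpha) * (T ** 2) * kl


# The 3 x 3 ablation grid named in the abstract: T in {2, 4, 6}, alpha in {0.3, 0.5, 0.7}.
GRID = list(product([2, 4, 6], [0.3, 0.5, 0.7]))

if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]  # hypothetical logits, for illustration only
    student = [1.5, 0.8, 0.3]
    for T, alpha in GRID:
        loss = kd_loss(student, teacher, label=0, T=T, alpha=alpha)
        print(f"T={T} alpha={alpha} loss={loss:.4f}")
```

A higher temperature flattens both softened distributions, so the student is trained on more of the teacher's relative ranking among non-target classes, while the balancing weight controls how much that signal counts against the ground-truth label. In a real training loop the same quantity would be computed per batch with a tensor library.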

Author Biography

Mochamad Rizal Fauzan, National Taipei University of Technology

Mochamad Rizal Fauzan received the B.Ed. degree in Electrical Engineering Education from Universitas Pendidikan Indonesia, Bandung, Indonesia, in 2025. He is currently pursuing the M.Sc.Eng. degree in Electrical Engineering and Computer Science at National Taipei University of Technology, Taipei, Taiwan. Since 2025, he has been a Graduate Research Assistant at the Ubiquitous Computing Laboratory, Department of Electrical Engineering, National Taipei University of Technology. His research interests include artificial intelligence of things (AIoT), computer vision, machine learning, deep learning, edge AI, and intelligent monitoring systems. He has been actively involved in research and development projects related to object detection, environmental monitoring, smart automation, and technology-enhanced engineering education. He has also contributed to scientific publications, international academic presentations, and innovation-oriented engineering projects.
