Hyperparameter Sensitivity of Vanilla Knowledge Distillation for Compact CNNs on CIFAR-100
DOI: https://doi.org/10.47709/cnahpc.v8i2.8239
Keywords: CIFAR-100, compact neural networks, knowledge distillation, loss balancing, temperature scaling
Abstract
Knowledge distillation has become an effective strategy for improving compact convolutional neural networks, yet results for vanilla knowledge distillation in lightweight image classification are still often reported with default hyperparameter settings and no systematic justification. This study addresses the limited empirical understanding of how the two core hyperparameters of vanilla knowledge distillation, the softmax temperature (T) and the loss-balancing weight (α), affect compact convolutional neural networks under a unified experimental setting. Using CIFAR-100 as the benchmark dataset, a ResNet-50 teacher was used to distill knowledge into two lightweight student models, MobileNetV2 and ShuffleNetV2 ×1.0. Performance was evaluated using top-1 accuracy, top-5 accuracy, parameter count, and inference latency. The teacher achieved 81.24% top-1 accuracy and 96.05% top-5 accuracy. Under the default distillation setting, MobileNetV2 improved from 79.18% to 80.83% top-1 accuracy and from 95.77% to 96.40% top-5 accuracy, with measured latency falling from 3.98 ms to 3.44 ms. ShuffleNetV2 ×1.0 improved from 77.00% to 78.36% top-1 accuracy and from 94.81% to 95.45% top-5 accuracy, with only a marginal latency increase from 4.23 ms to 4.29 ms. To examine hyperparameter sensitivity, an ablation study was conducted on MobileNetV2 with T = 2, 4, and 6, and α = 0.3, 0.5, and 0.7. The best configuration was obtained at T = 4 and α = 0.3, yielding 80.88% top-1 accuracy and 96.51% top-5 accuracy. These results show that vanilla knowledge distillation consistently improves compact convolutional neural networks, but that its effectiveness depends strongly on careful hyperparameter selection rather than on inherited default settings.
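To make the roles of the two hyperparameters concrete, the following is a minimal sketch of a Hinton-style vanilla distillation loss evaluated over the 3 × 3 ablation grid described in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the loss formula, so the convention that α weights the hard-label term and (1 − α) the soft-target term, the T² scaling factor, and the three-class example logits are all assumptions, and the function names are hypothetical.

```python
import math
from itertools import product


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature, alpha):
    """Distillation loss for a single example.

    ASSUMPTION: alpha weights the hard-label term and (1 - alpha) the
    soft-target term; the source does not state which convention it uses.
    """
    # Hard-label cross-entropy, computed at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # KL(teacher || student) on the temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    # ASSUMPTION: the customary T^2 factor, which keeps soft-target
    # gradient magnitudes comparable across temperatures.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


# Hypothetical 3-class logits, purely for illustration.
student = [1.0, 0.5, -0.2]
teacher = [2.0, 0.3, -1.0]

# Grid from the abstract: T in {2, 4, 6}, alpha in {0.3, 0.5, 0.7}.
grid = list(product([2, 4, 6], [0.3, 0.5, 0.7]))
losses = {(t, a): kd_loss(student, teacher, 0, t, a) for t, a in grid}
for (t, a), value in losses.items():
    print(f"T={t} alpha={a}: loss={value:.4f}")
```

In an actual sweep, each grid point would correspond to a full training run of the student, with top-1 and top-5 accuracy compared across configurations; the loop here only evaluates the loss on one toy example to show how T softens the distributions and how α shifts weight between the two terms.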
License
Copyright (c) 2026 Mochamad Rizal Fauzan, Raden Muhammad Rafi Rachman, Shifa Rangga Saputra, Daffa Irsyad Nugraha

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.