Systematic Literature Review: A Comparison of Clustering Methods in Data Mining
DOI: https://doi.org/10.47709/cnahpc.v8i1.7333
Keywords: Clustering, Data Mining, K-Means, DBSCAN, Hierarchical Clustering, Method Evaluation, Unsupervised Data Analysis
Abstract
Clustering is one of the fundamental techniques in data mining: it groups data instances by their inherent similarities without relying on predefined labels. The technique plays a crucial role in numerous domains, including customer behavior analysis, pattern recognition, anomaly detection, and bioinformatics, and in many other applications that require a deeper understanding of the hidden structures within data. Over the past decades, a wide range of clustering methods has been developed, including K-Means, DBSCAN, hierarchical clustering, other density-based approaches, model-based clustering, and more recent algorithms that incorporate machine learning and deep learning paradigms. Each method offers distinct advantages and limitations and suits different data characteristics and analytical objectives. This study compares these methods through a systematic literature review (SLR). The SLR process includes identifying relevant articles, screening them for quality and eligibility, extracting essential data, and synthesizing the findings according to predefined systematic criteria. The primary aims of this review are to identify emerging research trends, to understand methodological advancements, to assess the performance of different clustering methods across diverse data contexts (such as varying dataset sizes, noise levels, dimensionality, and cluster distributions), and to provide insights into the key factors that influence the selection of appropriate clustering techniques. The findings indicate that no single clustering method consistently outperforms the others in all scenarios. Certain algorithms may produce optimal results on low-dimensional datasets yet perform inadequately on complex, high-dimensional data. Conversely, some methods are effective at identifying clusters with irregular shapes but require sensitive parameter tuning or incur higher computational costs.
Therefore, the choice of clustering technique should be guided by the specific characteristics of the dataset, the objectives of the analysis, and evaluation criteria such as accuracy, computational efficiency, interpretability, and robustness to noise. Overall, this review aims to serve as a comprehensive reference for researchers, practitioners, and decision-makers selecting the most suitable clustering method for their analytical needs. The study also highlights potential avenues for future research, including the development of hybrid algorithms, automated parameter selection techniques, and the integration of clustering with modern machine learning approaches to improve performance and adaptability across diverse data environments.
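The abstract's contrast between methods that assume compact clusters and methods that follow irregular shapes can be made concrete with a small experiment. The sketch below is an illustration under stated assumptions, not the review's own method or data: it uses a hypothetical synthetic dataset of two concentric rings, hand-written pure-Python versions of K-Means (Lloyd's algorithm, seeded with the first k points) and DBSCAN, and hypothetical helper names (`make_rings`, `kmeans`, `dbscan`, `same_partition`) and parameter values (`eps=1.0`, `min_samples=3`) chosen only for this example. Because K-Means assigns each point to its nearest centroid, two clusters are always separated by a straight line, so it cannot isolate a ring enclosed by another ring; DBSCAN instead links points through dense neighborhoods and can follow the ring shape.

```python
import math


def make_rings(n_inner=20, n_outer=40, r_inner=1.0, r_outer=5.0):
    """Hypothetical toy data: two concentric rings of evenly spaced 2-D points."""
    points, truth = [], []
    for n, radius, label in ((n_inner, r_inner, 0), (n_outer, r_outer, 1)):
        for i in range(n):
            angle = 2 * math.pi * i / n
            points.append((radius * math.cos(angle), radius * math.sin(angle)))
            truth.append(label)  # true ring membership, used only for checking
    return points, truth


def kmeans(points, k, max_iter=100):
    """Lloyd's K-Means; seeding with the first k points is an assumption of this sketch."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the old centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: assignments no longer change
        centroids = new_centroids
    return labels


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns a cluster id per point, or -1 for noise."""
    n = len(points)
    # Brute-force eps-neighborhoods; each neighborhood includes the point itself.
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # only core points expand the cluster
    return labels


def same_partition(a, b):
    """True if two labelings group the points identically, ignoring label names."""
    return len(set(a)) == len(set(b)) == len(set(zip(a, b)))


points, truth = make_rings()
print("K-Means matches rings:", same_partition(kmeans(points, k=2), truth))  # False
print("DBSCAN matches rings:", same_partition(dbscan(points, eps=1.0, min_samples=3), truth))  # True
```

On this toy data, K-Means cannot reproduce the two rings, while DBSCAN recovers them exactly. The DBSCAN result, however, depends on `eps`: a value above the 4-unit gap between the rings would merge them into a single cluster, which mirrors the parameter sensitivity noted in the abstract. The brute-force neighbor search here is O(n²); library implementations typically use spatial indexes to reduce that cost.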
License
Copyright (c) 2025 Roni Setyawan, Budi Murtiyasa

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.