Systematic Literature Review: A Comparison of Clustering Methods in Data Mining

Authors

  • Roni Setyawan, Universitas Muhammadiyah Surakarta
  • Budi Murtiyasa, Universitas Muhammadiyah Surakarta

DOI:

https://doi.org/10.47709/cnahpc.v8i1.7333

Keywords:

Clustering, Data Mining, K-Means, DBSCAN, Hierarchical Clustering, Method Evaluation, Unsupervised Data Analysis

Abstract

Clustering is a fundamental data mining technique that groups data instances by inherent similarity without relying on predefined labels. It plays a crucial role in many domains, including customer behavior analysis, pattern recognition, anomaly detection, bioinformatics, and other applications that require a deeper understanding of hidden structure in data. Over the past decades, a wide range of clustering methods has been developed, including K-Means, DBSCAN and other density-based approaches, Hierarchical Clustering, model-based clustering, and more recent algorithms that incorporate machine learning and deep learning paradigms. Each method offers distinct advantages and limitations and suits different data characteristics and analytical objectives. This study compares these methods through a systematic literature review (SLR). The SLR process includes identifying relevant articles, screening them for quality and eligibility, extracting essential data, and synthesizing findings according to predefined criteria. The primary aims of the review are to identify emerging research trends, understand methodological advancements, assess the performance of clustering methods across diverse data contexts (varying dataset sizes, noise levels, dimensionality, and cluster distributions), and provide insight into the key factors that influence the selection of an appropriate clustering technique. The findings indicate that no single clustering method consistently outperforms the others in all scenarios. Certain algorithms may produce optimal results on low-dimensional datasets yet perform inadequately on complex, high-dimensional data. Conversely, some methods are effective at identifying irregularly shaped clusters but require careful parameter tuning or incur higher computational costs. The choice of clustering technique should therefore be guided by the specific characteristics of the dataset, the objectives of the analysis, and evaluation criteria such as accuracy, computational efficiency, interpretability, and robustness to noise. Overall, this review aims to serve as a comprehensive reference for researchers, practitioners, and decision-makers selecting the most suitable clustering method for their analytical needs. The study also highlights potential avenues for future research, including the development of hybrid algorithms, automated parameter selection techniques, and the integration of clustering with modern machine learning approaches to enhance performance and adaptability across data environments.
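
The point that cluster shape affects which method works can be illustrated with a small sketch. The example below is not taken from the reviewed studies. It is a minimal, standard-library-only illustration under stated assumptions: the data are two synthetic concentric rings, and the helper names (ring, kmeans, dbscan, agreement), the initial centroids, and the parameter values eps=0.8 and min_pts=3 are all hypothetical choices made for this example.

```python
import math

def ring(radius, n):
    """Return n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def kmeans(points, centroids, iters=100):
    """Plain Lloyd's algorithm with caller-supplied initial centroids."""
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new.append((sum(x for x, _ in members) / len(members),
                            sum(y for _, y in members) / len(members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid
        if new == centroids:
            break
        centroids = new
    return labels

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns cluster ids, with -1 marking noise."""
    # Neighborhoods include the point itself (distance 0 <= eps).
    neighbors = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps]
                 for p in points]
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # only core points expand the cluster
    return labels

def agreement(pred, truth):
    """Share of points matching truth, under the better of the two label swaps."""
    same = sum(p == t for p, t in zip(pred, truth)) / len(truth)
    swapped = sum(p == 1 - t for p, t in zip(pred, truth)) / len(truth)
    return max(same, swapped)

# Synthetic data (an assumption for illustration): inner ring and outer ring.
points = ring(1.0, 20) + ring(4.0, 40)
truth = [0] * 20 + [1] * 40

km = kmeans(points, [points[0], points[20]])
db = dbscan(points, eps=0.8, min_pts=3)
print("K-Means agreement with rings:", agreement(km, truth))
print("DBSCAN agreement with rings: ", agreement(db, truth))
```

On this toy dataset DBSCAN recovers both rings, while K-Means with two centroids cannot: it separates points with a straight boundary, and no straight line separates a ring from the ring that surrounds it. The DBSCAN result depends on the chosen eps and min_pts, which is the kind of parameter sensitivity the abstract refers to.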

Published

2026-01-19

How to Cite

Setyawan, R., & Murtiyasa, B. (2026). Systematic Literature Review: A Comparison of Clustering Methods in Data Mining. Journal of Computer Networks, Architecture and High Performance Computing, 8(1), 36–53. https://doi.org/10.47709/cnahpc.v8i1.7333
