A Systematic Literature Review on the Optimization of K-Means and Agglomerative Clustering for Student Performance Segmentation: A Comparative Analysis of Elbow and Silhouette Methods
DOI: https://doi.org/10.17977/um039v10i12025p73-88
Keywords: K-Means Clustering, Student Performance Segmentation, Cluster Validation, Elbow Method, Silhouette Score, Educational Data Mining.
Abstract
This systematic literature review critically examines the optimization of K-Means and Agglomerative Clustering algorithms for segmenting student academic performance. The study aims to map current research trends, evaluate the comparative application of the Elbow and Silhouette validation methods, and identify significant gaps that limit the pedagogical utility of clustering outputs. The review followed the PRISMA protocol and searched the Scopus database, yielding 26 studies for the final qualitative synthesis and thematic analysis. Findings reveal a field dominated by quantitative, engineering-driven refinements and a clear trend towards algorithmic hybridization, particularly the use of metaheuristics to strengthen K-Means. While the Elbow and Silhouette methods are canonical, their application is often procedural rather than critically comparative. A core limitation is a pronounced theoretical deficit: clusters derived predominantly from secondary data are statistically valid but pedagogically inert because they are minimally integrated with frameworks from education or the learning sciences. The discussion underscores that the field's primary bottleneck is not algorithmic but interpretative, stemming from a methodological monoculture and a disconnect between computational output and actionable educational insight. The conclusion emphasizes the imperative for future research to develop explanatory, theory-guided models, employ mixed-methods and longitudinal designs, and address contextual equity in data provenance, so that segmentation moves from a descriptive technique to a tool for genuinely understanding and supporting diverse learners.
