Prediction of Admission Decisions using Machine Learning Models: An Analysis of the Holistic Undergraduate Admissions Review Process in Korea
Abstract
This study aimed to improve the accuracy with which outcomes of the holistic undergraduate admissions review process can be predicted by applying machine learning models. In Korea, the holistic undergraduate admissions review process is widely adopted and relies entirely on human evaluators. The present study instead employed and compared five machine learning algorithms (Gradient Boosting, Random Forest, Support Vector Regression, Logistic Regression, and XGBoost) to identify the most effective predictive model. The analysis used a dataset of 1,554 application records from the 2024 application cycle. To improve model performance, Latent Dirichlet Allocation (LDA) was applied to extract meaningful features from unstructured text data. Among the five models, XGBoost performed best in predicting admission outcomes. Interview scores (overall scores, academic competency, suitability for the major), application type (e.g., general student admission), and document evaluation scores emerged as the dominant predictors in the model, which supports the effectiveness of the XGBoost model. The findings provide practical recommendations for improving prediction accuracy in holistic undergraduate admissions as well as directions for future research in data-driven educational assessment.
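The workflow summarized in the abstract (topic features extracted from text with LDA, then several classifiers compared on the combined features) can be illustrated with a minimal sketch. It is not the authors' implementation: the records, the `interview_score` column, the three-topic setting, and the 5-fold accuracy comparison are all assumptions made for illustration, and only three scikit-learn models are shown. If the `xgboost` package is installed, its `XGBClassifier` could be added to the `models` dictionary in the same way.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Made-up toy records (NOT the study's data): free text, one score, one outcome.
texts = [
    "led science club and built a robot",
    "volunteer tutoring in mathematics for younger students",
    "robot competition team captain and coding project",
    "community volunteer work and student council",
    "independent research project in chemistry lab",
    "student council president and debate club",
    "coding project and mathematics olympiad",
    "volunteer at library and reading club",
] * 5
interview_score = np.tile([88, 72, 91, 65, 85, 70, 93, 60], 5).astype(float)
admitted = (interview_score >= 80).astype(int)  # hypothetical labeling rule

# Step 1: turn unstructured text into per-document topic proportions with LDA.
counts = CountVectorizer(stop_words="english").fit_transform(texts)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
topic_features = lda.fit_transform(counts)  # one row per record, rows sum to 1

# Step 2: combine topic features with the structured score.
# For simplicity LDA is fitted on all records before cross-validation;
# a stricter setup would fit it inside each training fold.
X = np.column_stack([topic_features, interview_score])

# Step 3: compare candidate models by mean 5-fold cross-validated accuracy.
models = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
scores = {
    name: cross_val_score(model, X, admitted, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}
best_model = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: mean accuracy = {score:.3f}")
print("Best model on this toy data:", best_model)
```

Because the toy labels are derived directly from the made-up score, the accuracies printed here say nothing about real admissions data; the sketch only shows how topic proportions and structured scores can enter the same model comparison.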
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.