Exploring student course transition prediction in online schools using positive and unlabeled data
Miki Katsuragi, Kenji Tanaka
Abstract
In this study, we analyzed the daily learning data of students enrolled in an online school to predict school withdrawal using machine learning techniques. Specifically, we focused on predicting student withdrawal one month in advance from students' attribute data, class enrollment data, and teacher–student communication records over the preceding three months. Traditional binary classification simply labels each outcome as Positive or Negative. We instead employed a Positive and Unlabeled (PU) learning approach, which treats students who have not withdrawn as unlabeled rather than negative. This lets the model account not only for the timing of withdrawal but also for the possibility of future withdrawal. This approach improved the recall rate, raising the accuracy of our predictions of student course changes from 63% to 72%.
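The abstract does not spell out which PU procedure was used. Purely as an illustration, the sketch below follows the classic Elkan and Noto (2008) recipe. It trains a probabilistic classifier g(x) to separate labeled positives (students observed to withdraw) from unlabeled students. It then estimates the label frequency c = P(labeled | positive) as the mean of g on held-out labeled positives and rescales scores as g(x)/c. The plain logistic regression, the two-feature toy inputs, the learning rate, the 20% hold-out and the 0.5 threshold are all assumptions, not the authors' setup. The method also assumes labeled positives are selected completely at random (SCAR).

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weights, bias); hypothetical simple model type


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model: Model, x: Vector) -> float:
    # g(x): estimated probability that x is a *labeled* positive, P(s=1 | x).
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logreg(X: List[Vector], s: List[int],
                 epochs: int = 500, lr: float = 0.1) -> Model:
    # Fit g(x) by batch gradient descent on the log loss, using s (labeled or not) as target.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, s):
            err = score((w, b), x) - label
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fit_pu(X: List[Vector], s: List[int],
           holdout_frac: float = 0.2, seed: int = 0) -> Tuple[Model, float]:
    # Elkan-Noto: hold out some labeled positives, train g on the rest,
    # then estimate c = P(s=1 | y=1) as the mean of g over the held-out positives.
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    pos = [i for i in idx if s[i] == 1]
    hold = set(pos[:max(1, int(len(pos) * holdout_frac))])
    train = [i for i in idx if i not in hold]
    model = train_logreg([X[i] for i in train], [s[i] for i in train])
    c = sum(score(model, X[i]) for i in hold) / len(hold)
    return model, c


def predict_proba_pu(model: Model, c: float, x: Vector) -> float:
    # P(y=1 | x) = P(s=1 | x) / c, clipped to at most 1.
    return min(1.0, score(model, x) / c)
```

Because c is below 1, g(x)/c is never smaller than g(x). Any student the naive "labeled vs. unlabeled" classifier flags at a fixed threshold is therefore still flagged after the correction, so recall can only stay the same or rise at that threshold. This is consistent with, though not proof of, the recall gain the abstract reports.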
Keywords
Submitted date: 01/06/2024
Accepted date: 08/31/2024