Employee Attrition Prediction Using Logistic Regression and Selectkbest: A Case Study on The IBM HR Analytics Employee Attrition and Performance Dataset

Authors

  • Putra Nurhuda Makatita, Universitas Teknologi Bandung
  • Gusnaldi Pramudita, Universitas Teknologi Bandung
  • Muhammad Roprop Al Muntaha, Universitas Teknologi Bandung
  • Ilham Arya Yuda, Universitas Teknologi Bandung
  • Rani Rahma Wulan, Universitas Teknologi Bandung

Keywords:

Machine learning, Logistic Regression, SelectKBest, ANOVA F-Test

Abstract

Employee attrition is a significant strategic concern for organizations because it directly affects overall performance, productivity, and long-term sustainability. High attrition rates increase recruitment and training costs, drain skilled and experienced employees, lower morale among remaining staff, and disrupt critical business operations. In response, many organizations are turning to predictive analytics to anticipate employee turnover and implement effective retention strategies. This study proposes a machine learning approach that predicts employee attrition with the Logistic Regression algorithm. Logistic Regression is chosen for its effectiveness in binary classification tasks and for its interpretability, which is essential for human resource (HR) professionals making data-driven decisions. To enhance the model’s performance, the SelectKBest feature selection technique is applied with the ANOVA F-test as its scoring function. This method identifies the features most strongly associated with attrition, which helps reduce noise and computational complexity while improving model accuracy. The study uses the IBM HR Analytics Employee Attrition & Performance Dataset, which contains demographic and organizational attributes such as age, monthly income, job role, tenure, and job satisfaction. The data undergo a preprocessing phase that includes numerical transformation, encoding of categorical variables, normalization, and feature selection. By combining Logistic Regression with feature selection, this research aims to deliver an accurate and interpretable predictive model. The results are expected to help HR departments proactively identify high-risk employees and take strategic action to reduce attrition, supporting better workforce planning and organizational stability.
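
The feature-scoring step described in the abstract can be illustrated with a minimal sketch. The code below computes the one-way ANOVA F statistic for each feature against the binary attrition label and keeps the k highest-scoring features, which is the idea behind scikit-learn's SelectKBest when it is paired with the f_classif scoring function. The function names (anova_f, select_k_best) and the toy rows are hypothetical illustrations, not the authors' code and not values from the IBM dataset.

```python
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F statistic for one numeric feature grouped by class label."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = fmean(values)
    # Between-group sum of squares: how far each class mean sits from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of samples around their own class mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(columns, labels, k):
    """Return the names of the k highest-scoring features and all F scores."""
    scores = {name: anova_f(vals, labels) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy, made-up rows (NOT taken from the IBM dataset); 1 = left, 0 = stayed.
attrition = [1, 1, 1, 0, 0, 0]
features = {
    "monthly_income": [2000, 2500, 2200, 6000, 6500, 5800],
    "age": [25, 45, 33, 30, 41, 36],
    "years_at_company": [1, 2, 1, 8, 10, 7],
}

selected, f_scores = select_k_best(features, attrition, k=2)
print(selected)
```

On this toy data the income and tenure columns separate the two classes far better than age, so they are the two features kept. In a scikit-learn workflow the same selection would typically sit in a pipeline ahead of LogisticRegression, though the abstract does not specify the exact configuration used.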

Published

2026-02-15

How to Cite

Makatita, P. N., Pramudita, G., Al Muntaha, M. R., Yuda, I. A., & Wulan, R. R. (2026). Employee Attrition Prediction Using Logistic Regression and Selectkbest: A Case Study on The IBM HR Analytics Employee Attrition and Performance Dataset. Journal of Artificial Intelligence Computer Engineering Science and Technology, 1(1), 1–9. Retrieved from https://journals.eduped.org/index.php/jaicest/article/view/1649
