Comparison of the Accuracy of Principal Component Analysis (PCA) and Correlation-Based Feature Selection (CFS) in Classifying Employee Contract Extensions Using the Naïve Bayes Method

Authors

  • Dewi Sartika Universitas Indo Global Mandiri
  • Imelda Saluza Universitas Indo Global Mandiri
  • Muhammad Haviz Irfani Universitas Indo Global Mandiri

DOI:

https://doi.org/10.36982/jiig.v13i2.2292

Abstract

PT. Oasis Waters International Palembang conducts regular staff performance reviews, the findings of which are used to make recommendations for employee contract extensions. The Human Resource Department (HRD) assigns a numerical value to each of 25 attributes. Classification is the process of assigning a label or class to examples whose attribute values are known. The Naïve Bayes technique is a simple classification approach based on probability estimates. Based on observations, one of the 25 criteria was deemed the most relevant in determining the recommendation for an employee contract renewal. This study therefore compares two pre-processing approaches, Principal Component Analysis (PCA) and Correlation-based Feature Selection (CFS), for classifying employee contract extensions at PT. Oasis Waters International Palembang. The results show that CFS has a positive influence on classification performance, while PCA does not: accuracy increases by 30% with CFS. Both approaches, however, improve the model's reliability: the Root Mean Square Error (RMSE) falls from 0.6325 to 0.1845 with CFS and to 0.5123 with PCA.
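
To make the idea behind CFS concrete, the following is a minimal sketch of its merit heuristic with greedy forward selection. It is not the authors' implementation. It assumes the standard CFS merit formula, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), and uses absolute Pearson correlation as a stand-in, since the abstract does not say which correlation measure was used. The function names and the toy data are hypothetical and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cfs_merit(subset, columns, target):
    """CFS merit: k * r_cf / sqrt(k + k*(k-1) * r_ff).

    r_cf is the mean absolute feature-class correlation and r_ff the mean
    absolute feature-feature correlation (Pearson is an assumption of this sketch).
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    pairs = [(i, j) for idx, i in enumerate(subset) for j in subset[idx + 1:]]
    r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, target):
    """Greedily add the feature that most improves the merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        scored = [(cfs_merit(selected + [j], columns, target), j) for j in remaining]
        merit, candidate = max(scored, key=lambda t: t[0])
        if merit <= best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = merit
    return selected, best


# Hypothetical toy data: 8 employees, 3 numeric attributes, binary label.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # attribute 0: tracks the label
    [0, 0, 0, 0, 2, 2, 2, 2],  # attribute 1: redundant copy of attribute 0
    [1, 0, 1, 0, 1, 0, 1, 0],  # attribute 2: uncorrelated with the label
]
selected, merit = cfs_forward_select(columns, target)
print(selected, merit)
```

On this toy data the redundant and the uncorrelated attributes do not raise the merit, so only one attribute is kept. That mirrors the abstract's observation that a single criterion can dominate the recommendation. PCA, by contrast, builds new components from all attributes rather than discarding any.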

Keywords: Naïve Bayes, Principal Component Analysis, Correlation-based Feature Selection, Confusion Matrix, Root Mean Square Error

Author Biographies

Dewi Sartika, Universitas Indo Global Mandiri

Informatics Engineering Study Program

Imelda Saluza, Universitas Indo Global Mandiri

Informatics Management Study Program

Muhammad Haviz Irfani, Universitas Indo Global Mandiri

Informatics Engineering Study Program

Published

2022-08-01

How to Cite

Sartika, D., Saluza, I., & Irfani, M. H. (2022). Perbandingan Akurasi Metode Principal Component Analysis (PCA) dan Correlation-Based Feature Selection (CFS) Pada Klasifikasi Perpanjangan Kontrak Karyawan Menggunakan Metode Naïve Bayes. Jurnal Ilmiah Informatika Global, 13(2). https://doi.org/10.36982/jiig.v13i2.2292

Section

Articles