Comparison of the Accuracy of Principal Component Analysis (PCA) and Correlation-Based Feature Selection (CFS) in Classifying Employee Contract Extensions Using the Naïve Bayes Method
DOI:
https://doi.org/10.36982/jiig.v13i2.2292
Abstract
PT. Oasis Waters International Palembang conducts regular employee performance reviews, and the results are used to recommend whether an employee's contract should be extended. The Human Resource Department (HRD) scores each employee numerically on 25 attributes. Classification is the process of assigning a label or class to an example whose attribute values are known. Naïve Bayes is a simple classification method based on probability estimates. Observation showed that one of the 25 attributes was regarded as the most relevant in determining the contract-extension recommendation. This study therefore compares two pre-processing methods, Principal Component Analysis (PCA) and Correlation-based Feature Selection (CFS), for classifying employee contract extensions at PT. Oasis Waters International Palembang. The results show that CFS improves classification performance while PCA does not: accuracy increases by 30% with CFS. Both methods, however, improve the reliability of the model: Root Mean Square Error (RMSE) falls from 0.6325 to 0.1845 with CFS and to 0.5123 with PCA.
Keywords: Naïve Bayes, Principal Component Analysis, Correlation-based Feature Selection, Confusion Matrix, Root Mean Square Error
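As an illustration of the two quantities the abstract relies on, the following minimal Python sketch computes a CFS-style subset merit and the RMSE. It is not the authors' implementation. It assumes the general merit formula k·r_cf / sqrt(k + k(k−1)·r_ff), uses absolute Pearson correlation as the correlation measure (CFS implementations often use symmetric uncertainty on discretized data instead), and applies a simple greedy forward search. All function names and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(vx * vy)

def cfs_merit(columns, subset, target):
    """Merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff).

    r_cf is the mean |correlation| between each feature and the class;
    r_ff is the mean |correlation| between pairs of features.
    """
    k = len(subset)
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(columns, target, tol=1e-12):
    """Greedy forward search: keep adding the feature that raises the merit most."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        merit, idx = max(
            (cfs_merit(columns, selected + [i], target), i) for i in remaining
        )
        if merit <= best + tol:
            break  # no candidate improves the merit, so stop
        selected.append(idx)
        best = merit
        remaining.remove(idx)
    return selected, best

def rmse(predicted, actual):
    """Root mean square error between predictions and true values."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

# Toy data (hypothetical): three feature columns and a binary class label.
target = [0, 0, 0, 1, 1, 1]
columns = [
    [1, 2, 1, 4, 5, 4],  # strongly related to the class
    [1, 2, 1, 4, 5, 3],  # related to the class but redundant with column 0
    [3, 1, 2, 2, 3, 1],  # unrelated to the class
]
selected, merit = cfs_forward_select(columns, target)
print("selected feature indices:", selected, "merit:", round(merit, 4))
```

In this toy example the redundant and unrelated columns lower the merit, so the search keeps a single feature. This mirrors the abstract's observation that one attribute dominated the recommendation, although the real result depends on the company's data.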
License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.