Income Classification Using the Random Forest Algorithm: A Case Study of the Adult Income Dataset
DOI: https://doi.org/10.36982/jiig.v16i2.5407
Abstract
This research classifies a person's income from demographic attributes using the Random Forest algorithm, a popular ensemble learning method in machine learning. The dataset is Adult Income from the UCI Machine Learning Repository, which contains more than 32,000 records with 15 attributes, including age, gender, education, education level, employment type, and marital status. The research process covers data preprocessing, model pipeline creation, training, and performance evaluation. Preprocessing consisted of removing irrelevant attributes, normalizing numerical data, and applying one-hot encoding to categorical data. The model was trained with default parameters and evaluated using accuracy, precision, recall, F1-score, and the confusion matrix. The model achieved an accuracy of 85.44%, with higher performance on the ≤50K income class than on the >50K class. The low recall on the >50K class indicates that the model is biased towards the majority class, which may be caused by class imbalance in the data. The model therefore needs improvement through hyperparameter tuning, handling of the class imbalance, or exploration of other algorithms such as Gradient Boosting. This research is expected to serve as a basis for developing accurate and applicable data-driven prediction systems in economics, policy planning, and decision support systems that require analysis of individual income potential.
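The evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. The ten labels below are invented toy data, not results from the study, and the helper names `confusion_counts` and `class_report` are hypothetical. The sketch only shows how overall accuracy can look acceptable while recall on the minority >50K class stays low.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def class_report(y_true, y_pred, positive):
    """Compute accuracy plus precision, recall and F1 for one class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy, imbalanced labels (assumption for illustration only):
# seven "<=50K" records and three ">50K" records.
y_true = ["<=50K"] * 7 + [">50K"] * 3
# A classifier that leans towards the majority class:
# it misses two of the three ">50K" records.
y_pred = ["<=50K"] * 7 + [">50K", "<=50K", "<=50K"]

report_high = class_report(y_true, y_pred, positive=">50K")
report_low = class_report(y_true, y_pred, positive="<=50K")

for name, report in ((">50K", report_high), ("<=50K", report_low)):
    print(name, {key: round(value, 3) for key, value in report.items()})
```

In this toy case the majority class is always recovered, while most minority-class records are missed. This is the same kind of pattern the abstract attributes to class imbalance, though the numbers here are purely illustrative.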
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
