Similarity Identification Model of Thesis Titles with Mahalanobis Distance Approach
DOI:
https://doi.org/10.36982/jseci.v3i01.5413
Keywords:
Thesis, Mahalanobis Distance, identification, threshold, TF-IDF
Abstract
This study identifies similar thesis titles using the Mahalanobis Distance, a metric that measures the distance between vectors while accounting for the data distribution and the correlation between variables. Each thesis title is first represented as a vector using the TF-IDF scheme, and the similarity between titles is then computed with the Mahalanobis Distance. The test results show that the method produces similarity values between titles, but its performance is not yet optimal for similarity classification. The highest precision obtained, 1.0, indicates that the pairs the method flags as similar are truly similar. However, the recall of only 0.5 indicates that many similar pairs go undetected, resulting in an F1-score of only 0.638. This reveals an imbalance between the method's ability to detect similar pairs and the correctness of the pairs it flags. Although accuracy is relatively high, ranging from 0.958 to 0.988, it does not necessarily reflect the overall effectiveness of the method in handling minor classification errors. Threshold testing also shows that a value of 0.1 performs best among the thresholds tested because it maintains a balance between precision, recall, F1-score, and accuracy.
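The pipeline summarized in the abstract (TF-IDF vectors, pairwise Mahalanobis Distance, a threshold, then precision, recall, and F1-score) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the tokenization, the IDF variant, how a singular covariance matrix is handled, or how the threshold is applied, so the whitespace tokenizer, the smoothed IDF, the pseudo-inverse, the distance cutoff, and all function names below are assumptions made for illustration.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(titles):
    """Build a TF-IDF matrix (rows = titles, columns = terms)."""
    docs = [t.lower().split()for t in titles]  # assumed whitespace tokenizer
    vocab = sorted({w for d in docs for w in d})
    index = {w: j for j, w in enumerate(vocab)}
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency
    X = np.zeros((n, len(vocab)))
    for i, d in enumerate(docs):
        for w, count in Counter(d).items():
            tf = count / len(d)
            idf = math.log((1 + n) / (1 + df[w])) + 1  # assumed smoothed IDF
            X[i, index[w]] = tf * idf
    return X, vocab


def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances between the rows of X (needs >= 2 rows)."""
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Assumption: with more terms than titles the covariance is singular,
    # so the Moore-Penrose pseudo-inverse stands in for the inverse.
    cov_inv = np.linalg.pinv(cov)
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.sqrt(np.clip(d2, 0.0, None))  # clip tiny negative round-off


def flag_similar(D, cutoff):
    """Return index pairs (i, j), i < j, whose distance is at most cutoff."""
    n = D.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n) if D[i, j] <= cutoff}


def precision_recall_f1(predicted, actual):
    """Score a set of predicted pairs against a set of truly similar pairs."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical titles and ground truth, for illustration only.
    titles = [
        "thesis title similarity with mahalanobis distance",
        "thesis title similarity with mahalanobis distance",
        "expert system for diagnosing rice plant diseases",
        "similarity of thesis titles using cosine distance",
    ]
    X, _ = tfidf_matrix(titles)
    D = mahalanobis_matrix(X)
    predicted = flag_similar(D, cutoff=1.5)  # illustrative cutoff
    print(precision_recall_f1(predicted, actual={(0, 1), (0, 3)}))
```

Because the cutoff here is applied to the raw distance, a smaller value flags fewer pairs, which tends to raise precision and lower recall. This is the kind of trade-off the threshold test in the abstract explores.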
