Implementation of Weighted Tree Similarity and Cosine Sorensen-Dice Algorithms for Semantic Search in Document Repository Information System

Amrullah, Abdurrosyiid (2021) Implementation of Weighted Tree Similarity and Cosine Sorensen-Dice Algorithms for Semantic Search in Document Repository Information System. Journal Of Development Research, 5 (1). pp. 12-19. ISSN 2579-9347

Text: Persetujuan Publikasi Jurnal.pdf (272kB)
Text (Published Article): 143 (20kB). Restricted to Repository staff only until 14 June 2027.

Official URL: http://journal.unublitar.ac.id/jdr/index.php/jdr/a...

Abstract

Document search can follow several approaches, including full-text search, plain metadata search, and semantic search. This study combines the Weighted Tree Similarity algorithm with the Cosine Sorensen-Dice algorithm to calculate similarity for semantic search. Document metadata is represented as a tree with labeled nodes, labeled branches, and weighted branches. The similarity of the subtree edge labels is calculated with Cosine Sorensen-Dice, while the total similarity of a document is calculated with Weighted Tree Similarity. The document metadata structure uses the taxonomy of owner, description, title, disposition content, and type. The result of this research is a document search application that applies taxonomic weights to file storage. In the experiments, the combination of Weighted Tree Similarity with Tanimoto Cosine achieved an average recall of 58%, precision of 88%, and accuracy of 83%, while the combination of Weighted Tree Similarity with Cosine Sorensen-Dice achieved an average recall of 66%, precision of 88%, and accuracy of 85%. The combination with Cosine Sorensen-Dice therefore performs better than the combination with Tanimoto Cosine for document search at the University of Muhammadiyah Gresik, with an average recall of 66% and an average accuracy of 85%. The similarity value computed on text labels with Cosine Sorensen-Dice is also influenced by the number of terms and documents in the repository.
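
The approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each metadata tree is flattened to one level of branches named after the taxonomy in the abstract, that the branch weights of each tree sum to 1, and that the per-branch label similarity is the plain average of cosine similarity and the Sorensen-Dice coefficient. The abstract does not state how the two measures are combined, so that averaging step, the example weights, and all function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def sorensen_dice(a: str, b: str) -> float:
    """Sorensen-Dice coefficient between the term sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def label_similarity(a: str, b: str) -> float:
    # Assumption: average the two measures. The abstract does not
    # specify how Cosine and Sorensen-Dice are combined.
    return (cosine(a, b) + sorensen_dice(a, b)) / 2


def weighted_tree_similarity(query: dict, doc: dict) -> float:
    """Sum per-branch label similarity, scaled by the mean branch weight.

    Each tree maps a branch label to a (weight, text) pair. Branches
    present in only one tree contribute nothing to the total.
    """
    total = 0.0
    for branch in query.keys() & doc.keys():
        w_query, text_query = query[branch]
        w_doc, text_doc = doc[branch]
        total += label_similarity(text_query, text_doc) * (w_query + w_doc) / 2
    return total


# Hypothetical weights over the taxonomy named in the abstract.
query_tree = {
    "owner": (0.1, "informatics engineering"),
    "title": (0.4, "semantic search"),
    "description": (0.3, "document repository search"),
    "disposition content": (0.1, "archive"),
    "type": (0.1, "article"),
}
doc_tree = {
    "owner": (0.1, "informatics engineering"),
    "title": (0.4, "semantic document"),
    "description": (0.3, "document repository system"),
    "disposition content": (0.1, "archive"),
    "type": (0.1, "report"),
}

score = weighted_tree_similarity(query_tree, doc_tree)
print(f"similarity: {score:.3f}")
```

With weights that sum to 1 in both trees, the score lies between 0 and 1, and two identical trees score 1. Ranking repository documents by this score against a query tree gives the search result order.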

Item Type: Article
Uncontrolled Keywords: weighted tree similarity; semantic search; cosine similarity; sorensen dice similarity;
Subjects: Engineering > Informatics Engineering
Engineering
Divisions: Faculty of Engineering > Informatics Engineering Study Program
Depositing User: abdurrosyiid amrullah
Date Deposited: 15 Jun 2022 04:05
Last Modified: 15 Jun 2022 04:05
URI: http://eprints.umg.ac.id/id/eprint/6020
