Udayana Networking | Universitas Udayana

Journal article

I Made Arta Purniawan, Gusti Made Arya Sasmita, I Putu Agus Eka Pratama

Volume: 3 Number: 1 Published: April 2022

JITTER Jurnal Ilmiah Teknologi dan Komputer

Abstract

News clustering aims to identify the news groups formed by applying the K-Means method to word weights produced by the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm. The clustering process uses news crawled from the detik.com site over a one-year period (2018), totaling 124,509 news stories stored in a CSV (Comma-Separated Values) file. Before clustering, the dataset goes through a text preprocessing stage consisting of case folding, tokenizing, stopword removal, and stemming. The TF-IDF method assigns a weight to each keyword in each category to measure the similarity of keywords to the available categories; the K-Means method then groups documents based on their similarity. K-Means is applied twice, using 16 and 12 centroids respectively, because the first run produces groups/clusters that cannot be identified since they contain common words, so a second run is needed. Based on testing on the 124,509 news stories, 27 news groups were successfully identified, and the application showed adequate capability in processing large data.
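The pipeline described in the abstract (preprocessing, TF-IDF weighting, then K-Means) can be illustrated with a minimal sketch. This is not the authors' implementation: the English toy sentences, the `STOPWORDS` set, the naive plural-stripping stand-in for stemming, the particular TF-IDF variant (relative term frequency times log(N/df)), and the fixed `init_indices` used to seed the centroids are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the list used in the article is not given.
STOPWORDS = {"the", "a", "in", "of", "and", "to", "on"}

def preprocess(text):
    """Case folding, tokenizing, stopword removal, and a naive stemming stand-in."""
    tokens = re.findall(r"[a-z]+", text.lower())        # case folding + tokenizing
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    # Placeholder for a real stemmer: strip a trailing "s" from longer words.
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

def tfidf(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))       # document frequency
    vectors = []
    for d in docs:
        counts = Counter(d)
        size = max(len(d), 1)                           # guard against empty docs
        vectors.append([(counts[t] / size) * math.log(n / df[t]) for t in vocab])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, init_indices, iterations=20):
    """Plain K-Means; centroids are seeded from the documents at init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    k = len(centroids)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy corpus standing in for crawled news (assumption, not the article's data).
raw = [
    "The team wins the football match",
    "Football players score goals in the match",
    "Stocks fall on the market",
    "The market and stocks rally",
]
docs = [preprocess(t) for t in raw]
vocab, vectors = tfidf(docs)
labels = kmeans(vectors, init_indices=[0, 2])
print(labels)  # prints [0, 0, 1, 1]
```

In this toy run the two sports sentences and the two market sentences land in separate clusters. A cluster dominated by words common to many documents would be hard to label, which is the situation the abstract gives as the reason for a second K-Means run.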

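CLUSTERING BERITA MENGGUNAKAN ALGORITMA TF-IDF DAN K-MEANS DENGAN MEMANFAATKAN SUMBER DATA CRAWLING PADA SITUS DETIK.COM (News Clustering Using the TF-IDF and K-Means Algorithms with Crawled Data from the detik.com Site)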

Related Publication

Penerapan Core App Quality pada Aplikasi Manajemen Laundry Berbasis Android

I Wayan Ananta Radityawan, Anak Agung Ketut Agung Cahyawan Wiranatha, I Putu Agus Eka Pratama

National Journal... 2023

Smart Security Risk Management pada Bali Smart Island menggunakan OSINT, OTGv4.2, dan ISO 31000 2018

I Putu Agus Eka Pratama

Accredited National Journal ... 2023

Data Mining Association Rules Menggunakan Algoritma Apriori untuk Menemukan Pola Pembelian Wisatawan pada Pasar Seni Guwang Bali

I Putu Agus Eka Pratama

Accredited National Journal ... 2023

Pengujian High Availability pada Asynchronous DNS Berbasis Restknot menggunakan Algoritma Round Robin

I Putu Agus Eka Pratama, Putu Veda Andreyana, Putu Ramaditya Nurjana

Accredited National Journal ... 2024