Udayana Networking | Universitas Udayana

Journal article

Ni Made Ayu Widiastuti, Ketut Artawa, I Made Rajeg, I Nyoman Udayana, Gede Primahadi Wijaya Rajeg

Volume: 6, Number: 1, Published: June 2024

THE INTERNATIONAL JOURNAL OF SOCIAL SCIENCES WORLD (TIJOSSW)

Abstract

This study aims to determine which corpus files contain tokens returned by keyword searches and to select representative data samples based on the availability of those tokens. The paper reports a case study of data collection procedures in corpus linguistics and the careful considerations they involve. The data source is the Indonesian section of the Leipzig Corpora Collection (LCC). The corpus sampling frame was adapted from structured guidelines that define the criteria for selecting balanced and representative samples. Two main procedures were used to obtain the representative data samples: (i) determining the availability of corpus files containing tokens returned by the keyword searches; and (ii) selecting the representative data samples based on the availability of those tokens. The result is ten files covering the ten years from 2013 to 2022, each containing approximately 1,000 linguistic expressions, which serve as the balanced and representative data samples.

Keywords: representative sample, target domain, availability, procedures, corpus

Selecting Representative Data Samples from Corpus Collection with Specific Target Domains

Related Publications

A Preliminary Experimental Study on Inherent Association of Verbs to Specific Nouns

I Gede Semara Dharma Putra, Gede Primahadi Wijaya Rajeg

Accredited national journal ... 2024

Exploring Grammatical and Semantic Profiles of ANGRY and MAD - A Corpus-Based Study

Ida Ayu Saskara Tranggana Suari, Gede Primahadi Wijaya Rajeg, I Nengah Sudipa

National journal... 2024

The (non)canonical status of the ka- passive in Balinese

I Nyoman Udayana, Gede Primahadi Wijaya Rajeg, Ida Ayu Made Puspani

Indexed international journal... 2025

Selecting Representative Data Samples from Corpus Collection with Specific Target Domains

Ni Made Ayu Widiastuti, Ketut Artawa, I Made Rajeg, I Nyoman Udayana, Gede Primahadi Wijaya Rajeg

Indexed international journal... 2024