Journal article

Working with a Linguistic Corpus Using R: An Introductory Note with Indonesian Negating Construction

Gede Primahadi Wijaya Rajeg, Karlina Denistia, I Made Rajeg

Volume: 36, Number: 1, Published: February 2018

Jurnal Ilmiah Masyarakat Linguistik Indonesia

Abstract

This paper demonstrates the use of R as a unified data science environment for corpus linguistics through a series of corpus-based analyses of the Indonesian Negating Construction. The data come from an online-news corpus of approximately 17 million word tokens, part of the Indonesian Leipzig Corpora. First, we find that tidak is the most frequent negating form in our corpus. Second, we find that tak has a significantly higher type frequency than tidak for negated predicates in the [ter-X-kan] schema. This finding adds quantitative nuance to a description in an Indonesian reference grammar, which states that (i) in present-day Indonesian tidak is also commonly used to negate ter- predicates, while (ii) the obligatory use of tak to negate ter- predicates belongs to past usage. Lastly, we refine the second finding by applying Distinctive Collexeme Analysis, which shows that, compared to tidak, tak collocates strongly with specific verbs, predominantly those in the [ter-X-kan] schema. This finding offers a deeper characterisation of tidak and tak.

Keywords: R programming language; Quantitative Corpus Linguistics; Distinctive Collexeme Analysis; Indonesian Negating Constructions
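As a rough illustration of the Distinctive Collexeme Analysis named in the abstract, the sketch below computes, for a single word, which of two competing constructions it prefers and how strongly. It assumes one common formulation of the method: a one-tailed Fisher-Yates exact test on a 2x2 table, with strength reported as the negative base-10 logarithm of the p-value. The paper's own analyses are carried out in R; Python is used here only to keep the example self-contained. The function names and all counts are hypothetical and are not taken from the paper's corpus.

```python
from math import comb, log10


def hypergeom_pmf(k, word_total, total_a, total_b):
    """Probability that exactly k of the word's occurrences fall in construction A."""
    return (comb(total_a, k) * comb(total_b, word_total - k)
            / comb(total_a + total_b, word_total))


def distinctive_collexeme(a, b, total_a, total_b):
    """One-tailed Fisher-Yates exact test for one word across two constructions.

    a, b             -- frequency of the word in construction A and in construction B
    total_a, total_b -- overall token frequency of construction A and of construction B
    Returns (preferred construction, p-value, strength = -log10(p)).
    """
    word_total = a + b
    expected_a = word_total * total_a / (total_a + total_b)
    lowest = max(0, word_total - total_b)   # smallest possible count in A
    highest = min(word_total, total_a)      # largest possible count in A
    if a >= expected_a:
        preferred = "A"
        tail = range(a, highest + 1)        # outcomes at least as skewed towards A
    else:
        preferred = "B"
        tail = range(lowest, a + 1)         # outcomes at least as skewed towards B
    p = sum(hypergeom_pmf(k, word_total, total_a, total_b) for k in tail)
    strength = float("inf") if p == 0 else -log10(p)
    return preferred, p, strength


# Hypothetical counts: a verb occurs 30 times with negator A and 10 times with
# negator B, where A has 100 tokens overall and B has 200.
preferred, p, strength = distinctive_collexeme(30, 10, 100, 200)
print(preferred, p, strength)
```

In a full analysis this calculation would be repeated for every word attested after either negator, and the words would then be ranked by strength within each construction.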