CNLI-TR Augmentation Pipeline

Summary

This repository contains the text augmentation pipeline for the CNLI-TR seed data.

Introduction

CNLI-TR is a Turkish challenge dataset created to assess the natural language inference (NLI) abilities of language models. It contains sentence triplets: one sentence that is potentially ambiguous between a De Re and a De Dicto reading, one De Re paraphrase, and one De Dicto paraphrase.

All seed data was created manually by trained linguists who are native speakers of Turkish.

The code and resources in this repository are used to augment the seed data into a large, manually corrected NLI dataset.

Contents

Unique sentence ID generator (id_generator.py): Each sentence in the seed and augmented data sets has a unique alphanumeric identifier. A sentence ID consists of three letters followed by an underscore and a 5-digit number. In the seed data, the first letter of the ID indicates the contributor who wrote the sentence. The IDs were created by an algorithm that generates random numbers and strings.
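The ID format described above can be illustrated with a minimal sketch. This is not the code in id_generator.py: the function name generate_sentence_id, the use of lowercase ASCII letters, the zero-padded number, and the set-based uniqueness check are all assumptions made for illustration.

```python
import random
import string


def generate_sentence_id(contributor_initial, existing_ids, rng=random):
    """Return a new ID of the form 'xyz_01234' that is not in existing_ids.

    Hypothetical helper: the first letter marks the contributor, the other
    two letters and the 5-digit number are drawn at random.
    """
    if len(contributor_initial) != 1 or contributor_initial not in string.ascii_letters:
        raise ValueError("contributor_initial must be a single ASCII letter")
    while True:
        # Assumption: IDs use lowercase letters only.
        letters = contributor_initial.lower() + "".join(
            rng.choices(string.ascii_lowercase, k=2)
        )
        # Assumption: the number is zero-padded to exactly five digits.
        candidate = f"{letters}_{rng.randrange(100000):05d}"
        if candidate not in existing_ids:
            existing_ids.add(candidate)  # remember the ID so it is never reused
            return candidate
```

Passing the same set of existing IDs on every call keeps the identifiers unique across the seed and augmented data; passing a seeded random.Random instance as rng makes a run reproducible.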

Augmentation pipeline (): The augmentation pipeline uses the seed data to generate sentence triplets. Details will be added soon.

=== Machine-readable metadata ================================
Data available since: 11.2022
License: CC BY-SA 4.0
Includes text: yes
Contributors: Marşan, Büşra; Atlamaz, Ümit; Demirok, Ömer; Kuzgun, Aslı; 
Oksal, Ceren; Doğan, Merve; Gök, Serra; Korkmaz, Arda
Contact: busra.marsan@boun.edu.tr 
===============================================================================

Footnotes

https://isimbulamadim.com/
"f" for feminine, "m" for masculine, "u" for unisex.
"ar" for Arabic, "ge" for Georgian, "gr" for Greek, "hb" for Hebrew, "mg" for Mongolian, "pr" for Persian, "tr" for Turkish, and "n/a" for unknown origins. Please note that some names were recorded as having two origins, e.g. ar-tr.
intensional operator: Any expression O that combines with sentences φ to form well-formed expressions (usually sentences) Oφ and whose extension [[O]]^{M,i} at an index i in a model M takes sentential intensions, i.e. functions from indices to truth values. (Wehmeier, K. F. (2018). Are quantifiers intensional operators?. Inquiry.)
