Bilkent Turkish Writings Dataset

This dataset contains the turkish creative writings of Turkish 101 and Turkish 102 courses between 2014-2018. It contains 4 publicly published writings of students 2 for each course. The writings in this dataset promotes creativity, content, composition, grammar, spelling and punctuation.

The writings can be found here as bunch of PDFs.

The dataset is continuously growing since each semester new texts are published publicly.

Currently, there are 6,844 writings in this dataset which is 33.1MB of data in a csv file.

Description of Turkish 101 & 102 in Bilkent University

This course is the first of a sequence of two courses designed to develop creative writing skills of the students through their own writings in Turkish. It is an active learning course. Students write their own blogs and instructors comment and send feedback about the creativity, content, composition, grammar, spelling and punctuation of the writing regularly.

Downloading the dataset

The data can be found in ./data/texts.csv.

git clone https://github.com/selimfirat/bilkent-turkish-writings-dataset.git
mv ./bilkent-turkish-writings-dataset/data/texts.csv <TARGET_PATH>

How to scrape from scratch

git clone https://github.com/selimfirat/bilkent-turkish-writings-dataset.git
pip install -r requirements.txt
cd bilkent-turkish-writings-dataset/scraper
scrapy crawl bilkent_turkish_writings
cd ../
python convert_to_text.py

In the end, there will be ~2GB of PDFs(it worth to continuous crawling & preprocessing) which can be deleted after the converting to text is done. The last two line suggested to executed using this notebook.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
data		data
scraper		scraper
.gitignore		.gitignore
README.md		README.md
convert_to_text.ipynb		convert_to_text.ipynb
convert_to_text.py		convert_to_text.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bilkent Turkish Writings Dataset

Description of Turkish 101 & 102 in Bilkent University

Downloading the dataset

How to scrape from scratch

About

Releases

Packages

Languages

selimfirat/bilkent-turkish-writings-dataset

Folders and files

Latest commit

History

Repository files navigation

Bilkent Turkish Writings Dataset

Description of Turkish 101 & 102 in Bilkent University

Downloading the dataset

How to scrape from scratch

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages