Keystroke-Essay-Score-Prediction

This is an independent prediction project using data from the Kaggle competition "Linking Writing Processes to Writing Quality": https://www.kaggle.com/competitions/linking-writing-processes-to-writing-quality/overview

Data dictionary

Source: https://www.kaggle.com/code/alexia/kerasnlp-starter-notebook-writing-quality

Let's explore the data, which consist of:

events
activities
scores

Events

Column	Definition
essay_id_comp	The unique ID of the essay
event_id	The index of the event, ordered chronologically
down_time	The time of the down event in milliseconds
up_time	The time of the up event in milliseconds
action_time	The duration of the event (the difference between down_time and up_time)
activity	The category of activity which the event belongs to
down_event	The name of the event when the key/mouse is pressed
up_event	The name of the event when the key/mouse is released
text_change	The text that changed as a result of the event (if any)
cursor_position	The character index of the text cursor after the event
word_count	The word count of the essay after the event
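
To make the column definitions above concrete, here is a minimal sketch of how per-essay features might be derived from raw event rows. It is not taken from the project notebook: the function name `summarize_events`, the feature names, and the toy rows are all hypothetical, and each row is assumed to be a plain dict keyed by the column names in the table.

```python
from collections import defaultdict


def summarize_events(events):
    """Aggregate raw event rows (dicts keyed by the Events columns) per essay.

    Hypothetical helper for illustration; the feature names are assumptions.
    """
    by_essay = defaultdict(list)
    for row in events:
        by_essay[row["essay_id_comp"]].append(row)

    features = {}
    for essay_id, rows in by_essay.items():
        rows.sort(key=lambda r: r["event_id"])  # event_id is chronological
        # Pause = gap between releasing one key and pressing the next one.
        pauses = [
            nxt["down_time"] - cur["up_time"]
            for cur, nxt in zip(rows, rows[1:])
        ]
        features[essay_id] = {
            "n_events": len(rows),
            # action_time is defined as up_time - down_time, so recompute it here.
            "total_action_time_ms": sum(r["up_time"] - r["down_time"] for r in rows),
            "mean_pause_ms": sum(pauses) / len(pauses) if pauses else 0.0,
            "final_word_count": rows[-1]["word_count"],
        }
    return features


# Toy rows, invented for illustration only (not real competition data).
toy_events = [
    {"essay_id_comp": "a1", "event_id": 1, "down_time": 1000, "up_time": 1080, "word_count": 0},
    {"essay_id_comp": "a1", "event_id": 2, "down_time": 1300, "up_time": 1370, "word_count": 1},
    {"essay_id_comp": "a1", "event_id": 3, "down_time": 1500, "up_time": 1590, "word_count": 1},
]
print(summarize_events(toy_events))
```

Recomputing the duration from `down_time` and `up_time` rather than reading `action_time` directly keeps the sketch independent of that column and doubles as a sanity check on the definition.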

Activities

Activity Name	Definition
Nonproduction	The event does not alter the text in any way
Input	The event adds text to the essay
Remove/Cut	The event removes text from the essay
Paste	The event changes the text through a paste input
Replace	The event replaces a section of text with another string
Move From [x1, y1] To [x2, y2]	The event moves a section of text spanning character index x1, y1 to a new location x2, y2
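
Because the move activity embeds its character indices in the label itself, it usually needs to be parsed before it can be treated as a category. The sketch below shows one way to do that, assuming the label follows the `Move From [x1, y1] To [x2, y2]` pattern exactly as written in the table; `parse_activity` and the example indices are hypothetical.

```python
import re

# Assumes the label format shown in the Activities table, with integer indices.
MOVE_PATTERN = re.compile(r"^Move From \[(\d+), (\d+)\] To \[(\d+), (\d+)\]$")


def parse_activity(activity):
    """Return (category, span); span is None unless the event is a move."""
    match = MOVE_PATTERN.match(activity)
    if match:
        x1, y1, x2, y2 = (int(g) for g in match.groups())
        return "Move", {"from": (x1, y1), "to": (x2, y2)}
    # All other activities are already plain category names.
    return activity, None


print(parse_activity("Input"))
print(parse_activity("Move From [284, 292] To [282, 290]"))  # made-up indices
```

Collapsing every move event into a single `Move` category keeps the number of distinct activity values small, while the parsed span remains available if the distance moved turns out to be a useful feature.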

Scores

Column	Definition
essay_id_comp	The unique ID of the essay
score	The score the essay received out of 6 (the prediction target for the competition)

The goal is to analyze keystroke and mouse-event patterns and predict the essay score assigned by graders, which indicates the quality of the writing. I chose this project because:

I'm interested in keystrokes as biometric verification and as a means of verifying academic integrity, and this is a good way to familiarize myself with methods for analyzing this kind of data
I would like to work on prediction projects where I must generate features myself from raw data

The Jupyter notebook is organized into three sections: exploratory data analysis, rudimentary prediction, and a final pipeline that uses various machine learning models. It concludes with an analysis of the prediction accuracy of each method. A PDF report will be uploaded from Overleaf at the project's conclusion.
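
As a rough illustration of what a rudimentary prediction step can look like, the sketch below predicts the mean training score for every essay and measures the error. This is not the notebook's actual code: using root mean squared error (RMSE) as the accuracy measure is an assumption, and the function names and toy scores are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean_baseline(train_scores):
    """Return a predictor that always outputs the mean training score."""
    mean_score = sum(train_scores) / len(train_scores)
    return lambda n_essays: [mean_score] * n_essays


# Toy scores, invented for illustration only (not real competition data).
toy_train = [3.0, 4.0, 5.0]
toy_valid = [3.0, 5.0]

predict = mean_baseline(toy_train)
baseline_rmse = rmse(toy_valid, predict(len(toy_valid)))
print(baseline_rmse)
```

A constant predictor like this gives a floor to compare against: any model built on keystroke features should achieve a lower error on held-out essays to be worth keeping.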
