the processing of the dataset #5

Open
yjcreation opened this issue Feb 7, 2022 · 2 comments
Comments

@yjcreation
yjcreation commented Feb 7, 2022

The ASSISTment 09-10 dataset has a field order_id, which the official website explains as: "these id's are chronological, and refer to the id of the original problem log."

So when processing this dataset, after grouping by user_id, shouldn't we sort by order_id? Otherwise the chronological order of each user's answers is destroyed. Although the preprocessing step constructs a timestamp, it cannot fully guarantee the order of each user's questions. After sorting each user's problems by order_id, the results of the program changed.
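
To make the suggestion concrete, here is a minimal sketch of grouping by user_id and then sorting each user's rows by order_id. It is not the repository's preprocessing code: the toy rows, the `correct` column, and the helper name `build_sequences` are assumptions for illustration only.

```python
from collections import defaultdict

# Toy interaction log; rows are deliberately NOT in chronological order.
# user_id and order_id follow the dataset fields discussed above;
# "correct" is a hypothetical label column used only for illustration.
rows = [
    {"user_id": 1, "order_id": 30, "correct": 1},
    {"user_id": 2, "order_id": 12, "correct": 0},
    {"user_id": 1, "order_id": 10, "correct": 0},
    {"user_id": 2, "order_id": 11, "correct": 1},
    {"user_id": 1, "order_id": 20, "correct": 1},
]


def build_sequences(rows):
    """Group rows by user_id, then sort each user's rows by order_id."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["user_id"]].append(row)
    return {
        user: sorted(user_rows, key=lambda r: r["order_id"])
        for user, user_rows in by_user.items()
    }


sequences = build_sequences(rows)
for user, seq in sequences.items():
    print(user, [r["order_id"] for r in seq], [r["correct"] for r in seq])
```

Without the per-user sort, user 1's answer sequence would be read in file order (30, 10, 20) rather than chronological order (10, 20, 30), which changes the sequence the model is trained on.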

@THUwangcy
Owner

This is probably a real issue. We did not notice the order_id field, and we assumed that the original row order in the dataset was already chronological. We should probably rerun the experiments on this dataset.

@yjcreation
Author

yjcreation commented Feb 8, 2022

1. We reran the experiments on this dataset and obtained the following results:
image

2. The ASSISTment 2012 dataset was sorted by 'timestamp': (int(start) + int(end)) // 2, which may destroy the sequential order of the data. For example, one record has start = 1 and end = 7, so timestamp = 4; another record has start = 2 and end = 4, so timestamp = 3. The second record is then placed before the first, even though it started later. Wouldn't it make more sense to sort the dataset by start_time or end_time?
@THUwangcy
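
The example above can be checked with a small sketch. The record names "A" and "B" and the dictionary layout are assumptions for illustration; only the midpoint formula comes from the preprocessing described above.

```python
# The two records from the example: A starts first but finishes last.
records = [
    {"name": "A", "start": 1, "end": 7},
    {"name": "B", "start": 2, "end": 4},
]


def midpoint(record):
    """Timestamp as constructed in the preprocessing: midpoint of start and end."""
    return (int(record["start"]) + int(record["end"])) // 2


# Sort the same records under three different keys and compare the orders.
by_midpoint = [r["name"] for r in sorted(records, key=midpoint)]
by_start = [r["name"] for r in sorted(records, key=lambda r: r["start"])]
by_end = [r["name"] for r in sorted(records, key=lambda r: r["end"])]

print("midpoint:", by_midpoint)
print("start:   ", by_start)
print("end:     ", by_end)
```

For these two records the midpoint key agrees with end time and disagrees with start time, so start_time and end_time are not interchangeable either. Which one to use depends on whether a response should be ordered by when the student began the problem or by when the answer was submitted.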
