the processing of the dataset #5

yjcreation · 2022-02-07T14:34:13Z

The ASSISTment 09-10 dataset has a field order_id, which the official website explains as: "these id's are chronological, and refer to the id of the original problem log."

So when processing this dataset, after grouping by user_id, shouldn't each user's records be sorted by 'order_id'? Otherwise the chronological order of each user's answers can be broken. Although the preprocessing step constructs a timestamp, that does not fully guarantee the order of each user's questions. After I sorted each user's records by order_id, the results of the program changed.
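A minimal sketch of the sorting step described above, using only the standard library. The toy rows and the `correct` field are hypothetical; only `user_id` and `order_id` are field names taken from this thread, and this is not the repository's actual preprocessing code.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical toy rows, deliberately listed out of chronological order.
rows = [
    {"user_id": 1, "order_id": 30, "correct": 1},
    {"user_id": 2, "order_id": 12, "correct": 0},
    {"user_id": 1, "order_id": 10, "correct": 0},
    {"user_id": 2, "order_id": 11, "correct": 1},
    {"user_id": 1, "order_id": 20, "correct": 1},
]


def build_sequences(rows):
    """Group rows by user_id, with each user's rows ordered by order_id."""
    # Sorting on (user_id, order_id) makes groupby see each user once,
    # with that user's rows already in chronological order.
    ordered = sorted(rows, key=itemgetter("user_id", "order_id"))
    return {
        user: [r["correct"] for r in group]
        for user, group in groupby(ordered, key=itemgetter("user_id"))
    }


sequences = build_sequences(rows)
print(sequences)
```

If the data is held in a pandas DataFrame, `df.sort_values(["user_id", "order_id"])` before grouping should have the same effect.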

THUwangcy · 2022-02-08T08:31:03Z

This is probably a real issue. We did not notice the order_id field and assumed that the original order in the dataset was already chronological. Maybe we should rerun the experiments on this dataset.

yjcreation · 2022-02-08T10:52:55Z

1. We reran the experiments on this dataset and obtained the following results:

2. The ASSISTment 2012 dataset is sorted by 'timestamp': (int(start) + int(end)) // 2, which may break the sequential order of the data. For example, one record has start = 1 and end = 7, so timestamp = 4; another record has start = 2 and end = 4, so timestamp = 3. The second record started later but is sorted first. Wouldn't it make more sense to sort the dataset by start_time or end_time?
@THUwangcy
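The midpoint example above can be checked with a short sketch. The record labels "A" and "B" are hypothetical; the `start` and `end` values are the ones from the example, and the midpoint formula is the one quoted in the comment.

```python
# The two records from the example; "name" is a hypothetical label.
records = [
    {"name": "A", "start": 1, "end": 7},
    {"name": "B", "start": 2, "end": 4},
]


def midpoint(record):
    """Timestamp as quoted in the thread: integer midpoint of start and end."""
    return (int(record["start"]) + int(record["end"])) // 2


# Order the same records under three different sort keys.
by_midpoint = [r["name"] for r in sorted(records, key=midpoint)]
by_start = [r["name"] for r in sorted(records, key=lambda r: r["start"])]
by_end = [r["name"] for r in sorted(records, key=lambda r: r["end"])]

print(by_midpoint, by_start, by_end)
```

For these two records the midpoint order matches the end-time order but not the start-time order, so which key is appropriate depends on whether a sequence should reflect when a problem was opened or when it was answered.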
