User Feedback Prediction and Response Generation
◇ Task 9 - User Feedback Prediction and Response Generation
Online conversation systems usually provide a user feedback mechanism, such as like and dislike buttons. A user who is satisfied with a response can click the like button; a user who is not can click the dislike button. This feedback signal is the user's vote on the quality of the response and also reflects the user's preferences. Using this signal to improve the quality of a conversation system is a worthwhile research direction. This task includes two tracks:
● Track 1: Prediction of likes and dislikes: Given a (query, reply) pair, predict the probabilities of a like and of a dislike.
● Track 2: Conversation generation based on likes and dislikes: Incorporate like and dislike data into conversation generation to improve response quality and earn more likes.
Organizers: Renmin University of China and Xiaomi AI Lab
Contact: Shuang DONG (dongshuang1@xiaomi.com)
Type | Query # | Average Reply # | Avg Like # per Reply | Avg Dislike # per Reply |
---|---|---|---|---|
train | 16000 | 3.14 | 16.15 | 8.42 |
dev | 2000 | 3.07 | 19.84 | 9.41 |
test | 2000 | 3.16 | 30.57 | 12.19 |
We provide two files, train.jsonl and dev.jsonl. Each line of a file is one item in JSON format. The following is one parsed item:
{
"query": "你喜欢听什么歌",
"replys": [
{
"reply": "你喜欢什么,小爱就喜欢什么",
"dislike": 3,
"like": 2
},
{
"reply": "你知道的,我什么歌都喜欢。🙃",
"dislike": 2,
"like": 3
},
{
"reply": "你知道的,我什么歌都喜欢。",
"dislike": 4,
"like": 6
},
{
"reply": "亲爱哒,最喜欢的还是土嗨土嗨的那种,热闹啊!",
"dislike": 2,
"like": 4
},
{
"reply": "受你影响,品味现在和你差不多了。",
"dislike": 47,
"like": 146
},
{
"reply": "最喜欢的还是土嗨土嗨的那种,热闹啊!",
"dislike": 34,
"like": 141
},
{
"reply": "受你影响,品味现在和你差不多了。😘",
"dislike": 11,
"like": 59
},
{
"reply": "我喜欢甜甜的歌曲,生活就该多点甜嘛,我给你唱一首吧!",
"dislike": 0,
"like": 22
}
]
}
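As an illustration of the item layout above, the following is a minimal sketch of reading JSONL lines and computing a per-reply like ratio. The helper names `iter_items` and `like_ratio` are hypothetical and are not part of the released code; only the field names (`query`, `replys`, `reply`, `like`, `dislike`) come from the example.

```python
import json


def iter_items(lines):
    """Yield one parsed item per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def like_ratio(reply):
    """Fraction of votes that are likes; None when a reply has no votes."""
    total = reply["like"] + reply["dislike"]
    return reply["like"] / total if total else None


# A one-line sample in the same shape as the item shown above.
sample = (
    '{"query": "你喜欢听什么歌", "replys": '
    '[{"reply": "你知道的,我什么歌都喜欢。", "dislike": 4, "like": 6}]}'
)

for item in iter_items([sample]):
    for reply in item["replys"]:
        print(item["query"], reply["reply"], like_ratio(reply))
```

To read a real file, pass an open file handle instead of the list, for example `iter_items(open("train.jsonl", encoding="utf-8"))`.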
Final results (Track 1):
Rank | Team | Institution | Score |
---|---|---|---|
1 | 师弟师妹带带我 | 大连理工大学、吉林大学 | 92.13 |
2 | dunnlp | 易盾 | 92.00 |
3 | zut | 中原工学院 | 91.73 |
4 | YNU-HPCC | 云南大学 | 91.63 |
5 | HTDZNLP | 杭州航天电子技术有限公司 | 91.40 |
6 | 666 | 浙江工业大学 | 91.24 |
7 | Tryourbest classification | 苏州大学 | 90.94 |
8 | little_spice | 天津科技大学 | 90.72 |
Final results (Track 2):
Rank | Team | Institution | Score |
---|---|---|---|
1 | YNU-HPCC | 云南大学 | 1.656 |
2 | Devs | 东北大学 | 1.562 |
3 | little_spice | 天津科技大学 | 1.409 |
4 | 666 | 浙江工业大学 | 1.388 |
5 | ZUT | 中原工学院 | 1.214 |
6 | HTDZNLP | 杭州航天电子技术有限公司 | 1.202 |
For Track 1, the test dataset is named datasets_test_track1.jsonl, which consists of 1500 samples. Participants are required to submit a results file with the same number of rows as the test dataset. Each row should contain one score per reply to that row's query, with scores separated by tabs (\t). The required format is as follows:
0.6
0.6
...
0.6\t0.6\t0.6
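A minimal sketch of producing rows in this format is shown below. The function name `format_rows` and the four-decimal formatting are assumptions made for illustration; the task description only requires one tab-separated score per reply on each row.

```python
def format_rows(predictions):
    """Turn a list of per-query score lists into tab-separated rows.

    predictions[i] holds one predicted score per reply of the i-th query.
    The four-decimal precision is an arbitrary choice, not a task requirement.
    """
    return ["\t".join(f"{score:.4f}" for score in row) for row in predictions]


# Three queries: two with a single reply, one with three replies.
rows = format_rows([[0.6], [0.6], [0.6, 0.6, 0.6]])
print("\n".join(rows))
```

Writing `"\n".join(rows)` to a text file gives one row per test query, which keeps the row count equal to the number of lines in the test file.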
For each query-reply pair, the ground truth is a probability distribution over the labels 0 and 1, computed from the ratio of likes to dislikes. Each prediction is scored with the formula 1/(1+kl), where kl is the Kullback-Leibler divergence between the predicted probability distribution and the ground truth. Please refer to the evaluation.py file for details.
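The scoring formula can be sketched roughly as follows. This is not the official evaluation.py: the direction of the divergence (ground truth relative to prediction), the clipping constant `EPS`, and the names `bernoulli_kl` and `score` are all assumptions made for illustration.

```python
import math

EPS = 1e-12  # assumed clipping constant that avoids log(0); not from the task


def bernoulli_kl(p, q):
    """KL(P || Q) between two-outcome distributions [p, 1-p] and [q, 1-q]."""
    q = min(max(q, EPS), 1.0 - EPS)
    kl = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0:  # terms with zero true probability contribute nothing
            kl += a * math.log(a / b)
    return kl


def score(like, dislike, predicted_like_prob):
    """Score one reply as 1 / (1 + kl); assumes the reply has at least one vote."""
    true_like_prob = like / (like + dislike)
    return 1.0 / (1.0 + bernoulli_kl(true_like_prob, predicted_like_prob))


# A perfect prediction has zero divergence, so the score reaches its maximum.
print(score(6, 4, 0.6))
# A prediction further from the true ratio scores lower.
print(score(6, 4, 0.1))
```

Because the divergence is never negative, the score under this sketch stays in the interval (0, 1], and it drops as the prediction moves away from the observed like ratio.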
For Track 2, the test dataset is named datasets_test_track2.jsonl, which contains 500 samples. Participants are again required to submit a results file with the same number of rows as the test dataset. Each row should contain the generated reply for the corresponding query. The format should be as follows:
不喜欢
在呢
...
不好意思,刚刚走神了
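Since each row must hold exactly one reply, a generated reply that contains a line break would shift the row count. The sketch below shows one way to guard against that; the function name `to_submission_lines` and the choice to collapse whitespace into single spaces are assumptions, not task requirements.

```python
def to_submission_lines(replies):
    """Return one line per reply, collapsing internal whitespace and newlines.

    This keeps the number of output rows equal to the number of queries.
    Note that it also normalizes ordinary runs of spaces to a single space.
    """
    return [" ".join(reply.split()) for reply in replies]


lines = to_submission_lines(["在呢\n", "不好意思,\n刚刚走神了"])
print("\n".join(lines))
```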
Each reply will be scored by manual annotation, with possible scores of 0 (unlikely to be liked), 1 (potentially liked), and 2 (highly likely to be liked). The final score is the average of these scores.
- Our dataset is licensed under CC BY 4.0, and our code is licensed under the Apache License 2.0.