semeval-2016-2017-task3-subtaskA-unannotated
menshikh-iv released this 05 Feb 16:35
The SemEval 2016 / 2017 Task 3 Subtask A unannotated dataset contains 189,941 questions and 1,894,456 comments in English, collected from the Community Question Answering (CQA) web forum of Qatar Living. It can be used as a corpus for language modelling.
Related issue #18
attribute | value |
---|---|
File size | 224MB |
Number of records | 189941 |
Read more:
- http://alt.qcri.org/semeval2017/task3/
- http://alt.qcri.org/semeval2017/task3/data/uploads/semeval2017-task3.pdf
Produced by: https://github.com/Witiko/semeval-2016_2017-task3-subtaskA-unannotated-english
Example:
import gensim.downloader as api

# Iterate over the threads; each thread holds one question and its comments.
for thread in api.load("semeval-2016-2017-task3-subtaskA-unannotated"):
    print("Question subjects: {}\n".format(thread["RelQuestion"]["RelQSubject"]))
    print("Question body: {}\n".format(thread["RelQuestion"]["RelQBody"]))
    print("Relevant comments: ")
    for idx, relcomment in enumerate(thread["RelComments"]):
        print("\t#{}: {}\n".format(idx + 1, relcomment["RelCText"]))
    break  # show only the first thread

"""
Output:
Question subjects: Thailand:IT Minsitry blocks CNN; Facebook;
Question body: The state of Internet in Thailand:IT Minsitry blocks CNN; Facebook; Yahoo; Flickr Thai Immigration website listed as dangerousFull story: http://www.thaivisa.com/forum/Thai-Govt-Blocks-Cnn-Yahoo-Financ-t321851.html
Relevant comments:
#1: have they blocked porn??? <img src="http://www.qatarliving.com/files/images/Da.gif">
#2: like trying to contain a tsunami with a hand towel ************************************ I'm Jack's complete lack of surprise
#3: oops double post.. ----------------- "HE WHO DARES WINS" Derek Edward Trotter
#4: What next they gonna ban all *** tourist from entering the country? ----------------- "HE WHO DARES WINS" Derek Edward Trotter
#5: Or you can always make your own there with some thai babys Rules are a guideline for intelligent people; but they must be adhered to by idiots.
#6: why CNN? they want to die ignorant of what happens around?
"""
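Since the dataset is meant as a language modelling corpus, the threads usually need to be flattened into tokenized sentences first. The following is a minimal sketch of one way to do that, not part of the dataset tooling. It assumes only the record layout shown in the example above (`RelQuestion`, `RelQSubject`, `RelQBody`, `RelComments`, `RelCText`). The helper names `tokenize` and `iter_thread_sentences`, the regex-based tokenizer, the crude HTML tag stripping, and the `sample_threads` data are all hypothetical and chosen for illustration.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")       # crude matcher for leftover HTML tags
TOKEN_RE = re.compile(r"[a-z0-9']+")  # simple lowercase word-like tokens


def tokenize(text):
    """Strip HTML tags, lowercase the text, and split it into tokens."""
    # `text or ""` guards against missing fields (an assumption about the data).
    return TOKEN_RE.findall(TAG_RE.sub(" ", text or "").lower())


def iter_thread_sentences(threads):
    """Yield one token list per question subject, question body, and comment."""
    for thread in threads:
        question = thread["RelQuestion"]
        texts = [question["RelQSubject"], question["RelQBody"]]
        texts.extend(comment["RelCText"] for comment in thread["RelComments"])
        for text in texts:
            tokens = tokenize(text)
            if tokens:  # skip texts that contain no tokens at all
                yield tokens


# Hypothetical in-memory sample that mimics the record layout shown above.
sample_threads = [
    {
        "RelQuestion": {
            "RelQSubject": "Thailand blocks CNN",
            "RelQBody": "The state of Internet in Thailand",
        },
        "RelComments": [
            {"RelCText": 'have they blocked it??? <img src="http://example.com/a.gif">'},
        ],
    }
]

sentences = list(iter_thread_sentences(sample_threads))
```

To run this on the real data, pass `api.load("semeval-2016-2017-task3-subtaskA-unannotated")` in place of `sample_threads`. Because `iter_thread_sentences` is a generator, it is exhausted after one pass; a model that iterates over its corpus several times needs either a materialized list or a small wrapper class whose `__iter__` restarts the iteration.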