semeval-2016-2017-task3-subtaskA-unannotated

@menshikh-iv menshikh-iv released this 05 Feb 16:35
· 4 commits to master since this release

SemEval 2016 / 2017 Task 3 Subtask A unannotated dataset contains 189,941 questions and 1,894,456 comments in English collected from the Community Question Answering (CQA) web forum of Qatar Living. These can be used as a corpus for language modelling.

Related issue #18

| attribute         | value  |
|-------------------|--------|
| File size         | 224MB  |
| Number of records | 189941 |

Read more:

Produced by: https://github.com/Witiko/semeval-2016_2017-task3-subtaskA-unannotated-english

Example:

import gensim.downloader as api


for thread in api.load("semeval-2016-2017-task3-subtaskA-unannotated"):
    print("Question subject: {}\n".format(thread["RelQuestion"]["RelQSubject"]))
    print("Question body: {}\n".format(thread["RelQuestion"]["RelQBody"]))
    print("Relevant comments: ")
    for idx, relcomment in enumerate(thread["RelComments"]):
        print("\t#{}: {}\n".format(idx + 1, relcomment["RelCText"]))
    break  # only show the first thread

"""
Output:

Question subject: Thailand:IT Minsitry blocks CNN; Facebook;

Question body: The state of Internet in Thailand:IT Minsitry blocks CNN; Facebook; Yahoo; Flickr Thai Immigration website listed as dangerousFull story: http://www.thaivisa.com/forum/Thai-Govt-Blocks-Cnn-Yahoo-Financ-t321851.html

Relevant comments: 
	#1: have they blocked porn??? <img src="http://www.qatarliving.com/files/images/Da.gif">

	#2: like trying to contain a tsunami with a hand towel ************************************ I'm Jack's complete lack of surprise

	#3: oops double post.. ----------------- "HE WHO DARES WINS" Derek Edward Trotter

	#4: What next they gonna ban all *** tourist from entering the country? ----------------- "HE WHO DARES WINS" Derek Edward Trotter

	#5: Or you can always make your own there with some thai babys Rules are a guideline for intelligent people; but they must be adhered to by idiots.

	#6: why CNN? they want to die ignorant of what happens around?
"""
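
Since the dataset is meant as a corpus for language modelling, the threads usually need to be flattened into token lists first. The following is a minimal sketch of one way to do that, not part of the dataset or of gensim: the helper names (`thread_texts`, `tokenize`, `thread_sentences`) and the sample thread are hypothetical, the regex tokenizer is a deliberately simple stand-in for a real one, and skipping empty fields is a defensive assumption rather than a documented property of the data. Only the field names (`RelQuestion`, `RelQSubject`, `RelQBody`, `RelComments`, `RelCText`) come from the example above.

```python
import re

# Simple word tokenizer (an assumption; swap in any tokenizer you prefer).
TOKEN_RE = re.compile(r"\w+")


def thread_texts(thread):
    """Yield the subject, body, and comment texts of one thread, skipping empty fields."""
    question = thread.get("RelQuestion") or {}
    for key in ("RelQSubject", "RelQBody"):
        text = question.get(key)
        if text:
            yield text
    for comment in thread.get("RelComments") or []:
        text = comment.get("RelCText")
        if text:
            yield text


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def thread_sentences(threads):
    """Yield one token list per non-empty text found in the given threads."""
    for thread in threads:
        for text in thread_texts(thread):
            tokens = tokenize(text)
            if tokens:
                yield tokens


# Hypothetical sample thread that mimics the structure shown in the example above.
sample_threads = [
    {
        "RelQuestion": {
            "RelQSubject": "Visa renewal",
            "RelQBody": "How long does it take?",
        },
        "RelComments": [
            {"RelCText": "About two weeks."},
            {"RelCText": None},  # empty comments are skipped
        ],
    }
]

sentences = list(thread_sentences(sample_threads))
for tokens in sentences:
    print(tokens)
```

To run this on the real data, pass `api.load("semeval-2016-2017-task3-subtaskA-unannotated")` as `threads` instead of `sample_threads`. The resulting token lists have the shape that models such as gensim's Word2Vec expect as training sentences.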