Add Quora QA dataset (#2)
menshikh-iv authored Nov 14, 2017
1 parent 19b8560 commit 6fe859e
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions list.json
@@ -1,5 +1,12 @@
{
"corpora": {
"quora-duplicate-questions": {
"description": "over 400,000 lines of potential question duplicate pairs. Each line contains IDs for each question in the pair, the full text for each question, and a binary value that indicates whether the line truly contains a duplicate pair.",
"checksum": "d7cfa7fbc6e2ec71ab74c495586c6365",
"file_name": "quora-duplicate-questions.gz",
"source": "https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs",
"parts": 1
},
"wiki-en": {
"description": "Extracted Wikipedia dump from October 2017. Produced by `python -m gensim.scripts.segment_wiki -f enwiki-20171001-pages-articles.xml.bz2 -o wiki-en.gz`",
"checksum-0": "a7d7d7fd41ea7e2d7fa32ec1bb640d71",
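
The new entry pairs a `file_name` with a `checksum`, so a consumer can check a downloaded archive against it. The following is a minimal sketch of that check, not the project's actual download code. It assumes the 32-hex-digit `checksum` is an MD5 digest, which the diff does not state. The helper names `md5_of_file` and `verify_entry` are hypothetical. It only handles single-part entries (`"parts": 1`); entries that use per-part keys such as `checksum-0` are out of scope here.

```python
import hashlib
import json

# Fields copied from the entry added in this commit; normally read from list.json.
LIST_JSON = """
{
  "corpora": {
    "quora-duplicate-questions": {
      "checksum": "d7cfa7fbc6e2ec71ab74c495586c6365",
      "file_name": "quora-duplicate-questions.gz",
      "parts": 1
    }
  }
}
"""


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_entry(catalog, name, path):
    """Compare a local file against the single-part checksum of a corpus entry.

    Assumption: the "checksum" field holds an MD5 hex digest.
    """
    entry = catalog["corpora"][name]
    if entry.get("parts", 1) != 1:
        raise ValueError("multi-part entries are not handled by this sketch")
    return md5_of_file(path) == entry["checksum"].lower()


catalog = json.loads(LIST_JSON)
```

A caller would then run `verify_entry(catalog, "quora-duplicate-questions", local_path)` after downloading the archive, where `local_path` is wherever the file was saved, and treat a `False` result as a corrupted or outdated download.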