How to use dataset #17

pyfisch · 2020-02-06T14:48:48Z

Hi,

thanks for providing the dataset as a download. I downloaded it from the location mentioned in #12 (comment).
However, the format of the dataset files appears to differ from the files you get if you download the data yourself.

See this gist: I downloaded the first file, 12092740.data, myself from archive.org, while the second file was part of the downloaded dataset.

As you can see, the file I downloaded myself contains the attributes [XSUM]URL[XSUM], [XSUM]INTRODUCTION[XSUM] and [XSUM]RESTBODY[XSUM], but the file from the dataset has [SN]URL[SN], [SN]TITLE[SN], [SN]FIRST-SENTENCE[SN] and [SN]RESTBODY[SN].

My problem is that if I follow the tutorial at https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset, the scripts don't work with the unmodified dataset files.

Which changes do I need to make to the scripts?

Best,
Pyfisch


isabelcachola · 2020-03-16T22:30:15Z

@pyfisch I had the same issue and was able to resolve it with a quick data processing script, described here. Hope this helps!
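The processing script mentioned in the reply above isn't reproduced in this thread. As a minimal sketch of that kind of conversion, assuming the tutorial scripts expect the [XSUM]-tagged layout, the Python below splits a .data file in either tag style into sections and re-emits them with [XSUM] markers. The helper names (`parse_sections`, `to_xsum_format`, `convert_file`) and the `SN_TO_XSUM` mapping are hypothetical. In particular, mapping FIRST-SENTENCE to INTRODUCTION and dropping TITLE is a guess that should be checked against what the scripts actually read.

```python
import re
from pathlib import Path

# Matches a marker line such as "[SN]TITLE[SN]" or "[XSUM]INTRODUCTION[XSUM]".
# Assumption: each marker sits alone on its own line and its section body
# runs until the next marker (or the end of the file).
TAG_RE = re.compile(r"^\[(SN|XSUM)\]([A-Z-]+)\[\1\][ \t]*$", re.MULTILINE)

# Hypothetical mapping from [SN] sections to [XSUM] sections. TITLE has no
# obvious counterpart and is dropped; verify against what the scripts read.
SN_TO_XSUM = {"URL": "URL", "FIRST-SENTENCE": "INTRODUCTION", "RESTBODY": "RESTBODY"}


def parse_sections(text):
    """Split a .data file in either tag style into {section_name: body}."""
    matches = list(TAG_RE.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        # The body ends where the next marker starts, or at end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(2)] = text[m.end():end].strip()
    return sections


def to_xsum_format(sections, mapping=SN_TO_XSUM):
    """Re-emit parsed sections with [XSUM] markers, renaming them via `mapping`."""
    parts = [
        f"[XSUM]{new}[XSUM]\n{sections[old]}\n"
        for old, new in mapping.items()
        if old in sections
    ]
    return "\n".join(parts)


def convert_file(src, dst):
    """Rewrite one [SN]-style .data file as an [XSUM]-style file (UTF-8 assumed)."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(to_xsum_format(parse_sections(text)), encoding="utf-8")
```

Because `parse_sections` accepts both tag styles, it can also be run on a self-downloaded file to compare the two outputs section by section before converting the whole dataset.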
