How to use dataset #17

Open
pyfisch opened this issue Feb 6, 2020 · 1 comment

pyfisch commented Feb 6, 2020

Hi,

thanks for providing the dataset as a download. I downloaded the dataset from the location mentioned in #12 (comment).
However, the format of the dataset appears to differ from the files you get if you download the data yourself.

See this gist: the first file, 12092740.data, I downloaded myself from archive.org, while the second file was part of the downloaded dataset.

As you can see, the file I downloaded myself contains the attributes [XSUM]URL[XSUM], [XSUM]INTRODUCTION[XSUM] and [XSUM]RESTBODY[XSUM], but the file from the dataset has [SN]URL[SN], [SN]TITLE[SN], [SN]FIRST-SENTENCE[SN] and [SN]RESTBODY[SN].

My problem is that when I follow the tutorial at https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset, the scripts don't work with the unmodified files from the dataset.

Which changes do I need to make to the scripts?

Best,
Pyfisch

@isabelcachola

@pyfisch I had the same issue and was able to resolve it with a quick data processing script, described here. Hope this helps!
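
The linked script is not reproduced in this thread. As a rough illustration only, a conversion of this kind might look like the sketch below. It assumes that each [SN] tag sits on its own line, that [SN]FIRST-SENTENCE[SN] corresponds to [XSUM]INTRODUCTION[XSUM], and that [SN]TITLE[SN] can be dropped because no [XSUM] counterpart is mentioned above. The names `SN_TO_XSUM`, `parse_sn` and `sn_to_xsum` are hypothetical and not part of the XSum repository.

```python
import re

# Assumed mapping from [SN] section names to [XSUM] section names.
# This is a guess based on the tag names quoted in this issue, not a
# confirmed specification. TITLE has no listed [XSUM] counterpart, so
# it is left out of the output.
SN_TO_XSUM = {
    "URL": "URL",
    "FIRST-SENTENCE": "INTRODUCTION",
    "RESTBODY": "RESTBODY",
}

# Matches a line that consists only of a tag such as [SN]URL[SN].
SN_TAG = re.compile(r"^\[SN\](.+?)\[SN\]\s*$")


def parse_sn(text):
    """Split an [SN]-tagged document into {section name: body text}."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = SN_TAG.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def sn_to_xsum(text):
    """Re-emit the mapped sections with [XSUM] tags, in SN_TO_XSUM order."""
    sections = parse_sn(text)
    parts = []
    for sn_name, xsum_name in SN_TO_XSUM.items():
        if sn_name in sections:
            parts.append(f"[XSUM]{xsum_name}[XSUM]\n{sections[sn_name]}\n")
    return "\n".join(parts)
```

To convert a file, read it as text, pass the contents to `sn_to_xsum`, and write the result to a new file, keeping the original untouched. It is worth comparing one converted file against a file downloaded from archive.org before running the tutorial scripts on the whole dataset.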
