Report for incorrect sentence split (JNLPBA-IOBES) #2

Open
wonjininfo opened this issue Mar 28, 2019 · 1 comment

Hi,
Thanks for providing these useful resources!
While using them, we noticed that sentences in the JNLPBA-IOBES dataset appear to be split incorrectly: each sentence break falls one token too late.

MTL-Bioinformatics-2016/data/JNLPBA-IOBES/test.tsv starts with:

Number	O

of	O
glucocorticoid	B-protein
receptors	E-protein
in	O
lymphocytes	S-cell_type
and	O
their	O
sensitivity	O
to	O
hormone	O
action	O
.	O
The	O

study	O
demonstrated	O

while MTL-Bioinformatics-2016/data/JNLPBA/test.tsv starts with:

-DOCSTART-	O

Number	O
of	O
glucocorticoid	B-protein
receptors	I-protein
in	O
lymphocytes	B-cell_type
and	O
their	O
sensitivity	O
to	O
hormone	O
action	O
.	O

The	O
study	O

We used our own post-processing script to fix this and ran our experiments on the corrected dataset.
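
The script itself is not included in this thread. As a rough illustration only, the sketch below assumes that every sentence break in the IOBES file sits exactly one token too late, as the excerpt above suggests, and moves each blank line one token earlier. The function name `shift_sentence_breaks` is hypothetical, and the sketch does not restore the `-DOCSTART-` markers.

```python
def shift_sentence_breaks(lines):
    """Move every blank line (sentence break) one token earlier.

    Assumption: each break in the input sits exactly one token too late,
    i.e. after the first token of the following sentence.
    """
    fixed = []
    for line in lines:
        is_blank = line.strip() == ""
        if is_blank and fixed and fixed[-1].strip() != "":
            # Re-insert the break before the previous token, not after it.
            last_token = fixed.pop()
            fixed.append(line)
            fixed.append(last_token)
        else:
            fixed.append(line)
    # The first break now precedes the very first token; drop it.
    while fixed and fixed[0].strip() == "":
        fixed.pop(0)
    return fixed


# Shortened version of the excerpt from the IOBES file.
sample = ["Number\tO", "", "of\tO", ".\tO", "The\tO", "", "study\tO"]
result = shift_sentence_breaks(sample)
print("\n".join(result))
```

On a real file, the lines could be read with `open(path).read().split("\n")` and written back joined by `"\n"`. It would be worth comparing the result against the sentence boundaries in `JNLPBA/test.tsv` before relying on it.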

Once again, thank you so much for sharing these useful resources!

GamalC self-assigned this Mar 28, 2019

GamalC (Collaborator) commented Mar 28, 2019

Hi @wonjininfo. Many thanks for reporting this. I think others would appreciate having your script as well; would you mind sharing it? If you are willing, you can create a pull request or send me the script (gkoc2 at cam dot ac uk), and I will add it.
