Release filtered dataset without any tables but only text #7

Open
JBBalling opened this issue Feb 20, 2023 · 0 comments

Hello there,

Thank you for your fantastic work. Would it be possible to release a filtered version of the dataset that contains no samples with annotated tables?

Background: The reading order of running text can differ considerably from the reading order within tables. In my experiments with your model, the reading order gets mixed up on some documents with multi-column text layouts: some paragraphs are read left to right across the columns instead of top to bottom within each column of the two-column layout. My guess is that this is caused by the table samples in the dataset.

Would it be possible to filter out the images that contain a table, using a layout-segmentation or table-detection model, and release the filtered version of the dataset?

Alternatively, wouldn't it be better to release two separate datasets, one for tables and one for text?
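
To make the request concrete, here is a minimal sketch of the kind of filtering I have in mind. Everything in it is an assumption for illustration: `Sample`, `Detection`, `contains_table` and `split_by_tables` are hypothetical names, not part of this repository, and `detect` stands in for whatever layout-segmentation or table-detection model would actually be used. The 0.5 score threshold is likewise arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass
class Detection:
    """One region predicted by a (hypothetical) layout/table detector."""
    label: str    # e.g. "table", "text", "figure"
    score: float  # detector confidence in [0, 1]


@dataclass
class Sample:
    """One dataset sample; the fields are assumed for illustration only."""
    image_path: str
    words: List[str]


def contains_table(detections: Sequence[Detection], threshold: float = 0.5) -> bool:
    """Return True if any detection is a table with score >= threshold."""
    return any(d.label == "table" and d.score >= threshold for d in detections)


def split_by_tables(
    samples: Iterable[Sample],
    detect: Callable[[Sample], Sequence[Detection]],
    threshold: float = 0.5,
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (text_only, with_tables) using the given detector."""
    text_only: List[Sample] = []
    with_tables: List[Sample] = []
    for sample in samples:
        if contains_table(detect(sample), threshold):
            with_tables.append(sample)
        else:
            text_only.append(sample)
    return text_only, with_tables
```

Returning both halves would cover either option: releasing only the text-only part, or releasing the two parts as separate datasets.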

Thank you in advance!
