example data processing warning using google colab #54

amscosta · 2024-02-26T14:44:42Z

Hello,
The following warning is issued when processing one of the .xml files from the example data:
Processing: paperetl/file/data/0.xml
/usr/local/lib/python3.10/dist-packages/paperetl/file/tei.py:35: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument features="xml" into the BeautifulSoup constructor.
soup = BeautifulSoup(stream, "lxml")

Any clue how to avoid/correct that?
Thanks a lot.

amscosta · 2024-02-26T18:54:06Z

I am using the colab notebook.

davidmezzetti · 2024-02-28T14:32:03Z

You can ignore it like this:

import warnings
from bs4 import XMLParsedAsHTMLWarning

warnings.filterwarnings("ignore", category=XMLParsedAsHTMLWarning)
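If silencing the warning for the whole session feels too broad, the standard library's warnings.catch_warnings can limit the filter to a single block. The following is only a minimal sketch under stated assumptions: StandInWarning, suppressed and noisy_parse are hypothetical names invented here so the snippet runs without bs4 installed. In the Colab notebook you would pass bs4's XMLParsedAsHTMLWarning as the category and wrap the actual paperetl call.

```python
import warnings
from contextlib import contextmanager


class StandInWarning(UserWarning):
    """Hypothetical stand-in for bs4's XMLParsedAsHTMLWarning."""


@contextmanager
def suppressed(category):
    """Ignore warnings of `category` only inside the with-block."""
    with warnings.catch_warnings():
        # The filter added here is discarded when the block exits.
        warnings.simplefilter("ignore", category=category)
        yield


def noisy_parse():
    """Hypothetical stand-in for a call that triggers the warning."""
    warnings.warn("parsing XML with an HTML parser", StandInWarning)
    return "parsed"


# The warning is silenced here and nowhere else.
with suppressed(StandInWarning):
    result = noisy_parse()
```

Outside the with-block the previous warning filters are restored, so any other occurrence of the warning would still be reported.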

amscosta2022 · 2024-03-01T14:01:34Z

Thanks.
But the message says "using an XML parser will be more reliable".

davidmezzetti · 2024-03-02T15:11:48Z

Feel free to fork this project and try. Switching to the XML parser doesn't work in the tests I've run.
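For anyone who does try this in a fork, one general difference between HTML and XML parsers is worth knowing: HTML parsers lowercase tag names, while XML parsers preserve case. Whether this is what breaks the tests here is an assumption, not something confirmed in this thread. The sketch below illustrates the difference with the standard library only (html.parser and xml.etree.ElementTree are swapped in for BeautifulSoup's "lxml" and "xml" features so it runs without extra packages), and SAMPLE is a made-up TEI-like fragment.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Made-up TEI-like fragment with mixed-case element names.
SAMPLE = "<TEI><teiHeader><biblStruct>example</biblStruct></teiHeader></TEI>"


class TagCollector(HTMLParser):
    """Record the start-tag names exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(SAMPLE)
collector.close()
html_tags = collector.tags  # names are lowercased by the HTML parser

# The XML parser keeps the original case of every element name.
xml_tags = [element.tag for element in ET.fromstring(SAMPLE).iter()]

print(html_tags)
print(xml_tags)
```

Lookups written against lowercased names (for example "biblstruct") would find nothing once the names keep their original case, so a parser switch would likely need matching changes to every tag lookup.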
