Ensure clear connection between HTML nodes and plaintext #43

appledora · 2022-09-15T10:05:07Z

In GitLab by @geohci on Sep 15, 2022, 16:05

Many use-cases for HTML plaintext require some knowledge of where each word came from. For example, knowing which part of a sentence was a link or was italicized in the HTML can be crucial for training link-prediction models. The plaintext methods should therefore let us see which type of node contributed each character or word, while still making it easy to join the pieces into a pure string object.
