This repository has been archived by the owner on Sep 27, 2022. It is now read-only.

Ensure clear connection between HTML nodes and plaintext #43

Open
appledora opened this issue Sep 15, 2022 · 0 comments

Comments

@appledora
Owner

In GitLab by @geohci on Sep 15, 2022, 16:05

Many use cases for plaintext extracted from HTML require knowing where each word came from. For example, knowing which part of a sentence was a link or was italicized in the HTML can be crucial when training models for link prediction. The plaintext methods should therefore expose which type of node contributed each character or word, while still making it easy to join the pieces into a plain string object.
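
One possible shape for this, as a rough sketch only: return the plaintext as a list of spans, each tagged with the element that contributed it, and provide a helper that joins them into a string. Everything named here (`Span`, `SpanCollector`, `html_to_spans`, `join_spans`, the `"text"` label for top-level text) is hypothetical and not part of this library's existing API; the sketch uses Python's standard `html.parser` rather than the library's own node classes.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass
class Span:
    text: str
    node_type: str  # tag of the nearest enclosing element, or "text" at top level


class SpanCollector(HTMLParser):
    """Collects text chunks together with the tag that directly encloses them."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, innermost last
        self.spans = []   # collected Span objects, in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data:
            node_type = self.stack[-1] if self.stack else "text"
            self.spans.append(Span(data, node_type))


def html_to_spans(html):
    """Parse an HTML fragment into an ordered list of Span objects."""
    parser = SpanCollector()
    parser.feed(html)
    parser.close()  # flush any trailing text still buffered by the parser
    return parser.spans


def join_spans(spans):
    """Concatenate span texts into a pure string."""
    return "".join(span.text for span in spans)
```

With this sketch, `html_to_spans('<p>The <a href="/wiki/Cat">cat</a> is <i>small</i>.</p>')` yields five spans labelled `p`, `a`, `p`, `i`, `p`, and `join_spans` on that list gives back the plain sentence. Keeping the spans as the primary return value and the joined string as a derived view means callers who only want text pay nothing extra, while model-training code can still filter or label by node type.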
