This repository has been archived by the owner on Sep 27, 2022. It is now read-only.
Many use cases for plaintext extracted from HTML require knowing where each word came from. For example, knowing which part of a sentence was a link or was italicized in the HTML can be crucial when training models for link prediction. The plaintext methods should let us see which type of node contributed each character or word, while still making it easy to join the pieces into a plain string.
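One way to picture the request is a minimal sketch, assuming only Python's standard-library `html.parser` and not this repository's own parser. The names `Segment`, `SegmentCollector`, `html_to_segments`, and `join_segments` are hypothetical and only illustrate the shape of the output: a list of text pieces, each tagged with the node type that produced it, which can still be joined into one string.

```python
from html.parser import HTMLParser
from typing import List, NamedTuple

# Tags that never have a closing tag, so they must not be tracked as "open".
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class Segment(NamedTuple):
    """Hypothetical record: a piece of text plus the node type it came from."""
    text: str
    node_type: str  # innermost enclosing tag, or "text" if there is none


class SegmentCollector(HTMLParser):
    """Collects text segments labelled with their innermost enclosing tag."""

    def __init__(self) -> None:
        super().__init__()
        self.segments: List[Segment] = []
        self._open_tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the innermost matching tag and anything left open inside it.
        for i in range(len(self._open_tags) - 1, -1, -1):
            if self._open_tags[i] == tag:
                del self._open_tags[i:]
                break

    def handle_data(self, data):
        if data:
            node_type = self._open_tags[-1] if self._open_tags else "text"
            self.segments.append(Segment(data, node_type))


def html_to_segments(html: str) -> List[Segment]:
    collector = SegmentCollector()
    collector.feed(html)
    collector.close()  # flush any buffered trailing text
    return collector.segments


def join_segments(segments: List[Segment]) -> str:
    """Drop the node-type labels and return the plain string."""
    return "".join(segment.text for segment in segments)


segments = html_to_segments('See <a href="/wiki/Cat">cats</a> and <i>dogs</i>.')
plaintext = join_segments(segments)
```

Here `segments` keeps the provenance of each piece (for instance, `cats` is labelled `a` and `dogs` is labelled `i`), while `plaintext` is the ordinary string. Recording only the innermost tag is a simplification; a real implementation would probably want the full chain of ancestor nodes.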
In GitLab by @geohci on Sep 15, 2022, 16:05