Hi, I'm wondering whether it is possible to perform sentence tokenization on a list of words that have already been tokenized (without breaking the original word tokenization)?
I tried the answer in #38, but it seems that it no longer works in pybo 0.6.4.
The error comes from the fact that you're feeding sentence_tokenizer() a list of strings, whereas it expects a list of Token objects, which have attributes such as 'pos'.
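For reference, here is a minimal sketch of the mismatch. Only sentence_tokenizer() and the Token type with a pos attribute are named in this thread; the import paths and the tokenizer class name below are assumptions, so treat this as pseudocode for your setup rather than a verified pybo 0.6.4 snippet:

```python
# Sketch of the type mismatch; import paths and class names are assumptions.
from pybo import WordTokenizer, sentence_tokenizer  # assumed names

tok = WordTokenizer()
tokens = tok.tokenize("བཀྲ་ཤིས་བདེ་ལེགས།")  # list of Token objects (carry .pos, etc.)
sents = sentence_tokenizer(tokens)           # OK: Tokens expose the needed attributes

words = ["བཀྲ་ཤིས་", "བདེ་ལེགས", "།"]         # plain strings from manual tokenization
# sentence_tokenizer(words)  # fails: a str has no attribute 'pos'
```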
It is theoretically feasible to turn a list of words into tokens without breaking the original tokenization, but a lot of the information dynamically derived from the trie would be lost. So I don't really know how useful that would be, especially for sentence tokenization, which runs heuristics on the content of Token objects.
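If someone wants to experiment in the meantime, one hedged workaround is to wrap the pre-tokenized strings in lightweight stand-in objects that expose the attributes the sentence heuristics read. The attribute names below (content, pos, lemma, len) are assumptions about what those heuristics inspect, and everything normally derived from the trie is left as a placeholder, so results may well be degraded:

```python
from types import SimpleNamespace

def words_to_pseudo_tokens(words):
    """Wrap manually tokenized strings in Token-like stand-ins.

    The attribute names are assumptions about what the sentence
    heuristics inspect; trie-derived info (POS, lemma, senses) is
    unavailable here, so placeholder values are used instead.
    """
    return [
        SimpleNamespace(content=w, pos="UNKNOWN", lemma="", len=len(w))
        for w in words
    ]

pseudo_tokens = words_to_pseudo_tokens(["བཀྲ་ཤིས་", "བདེ་ལེགས", "།"])
# sentence_tokenizer(pseudo_tokens)  # may still misfire: heuristics expect real POS
```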
I may give it a try, since I have received requests from others too. I'll keep you posted.
drupchen changed the title from "Sentencize a list of tokens that have already been tokenized" to "Sentencize a list of tokens that have been manually tokenized by adding spaces" on Aug 16, 2019