Thanks for this amazing library. I'm looking forward to actually training and adapting some models with it.
After creating my first vocabulary, I noticed that many of the tokens contain an uppercase C or an uppercase D. Do these have a special meaning? I saw them referenced in the code, but I couldn't find an explanation of what they mean.
D, C and W are 'capcode' markers, used at capcode level 2. At capcode level 1, only a single marker is used instead: the character with code point 127 (ord 127, the ASCII DEL character).
D means delete the next space.
C means uppercase the next character.
W means uppercase the next word.
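To make these three markers concrete, here is a minimal Python sketch of a decoder that turns capcode-marked text back into normally cased text. It is only an illustration: `decode_capcode` is a hypothetical helper, not part of the TokenMonster API, and it assumes that every uppercase D, C or W in the input is a marker, that C applies to the next letter, that W uppercases the next run of letters, and that D removes a space only when it comes directly after the markers. The real capcode encoder may place markers differently, and the level-1 marker (code point 127) is not handled here.

```python
def decode_capcode(text: str) -> str:
    """Hypothetical sketch: undo the level-2 capcode markers D, C and W.

    Assumes every uppercase 'D', 'C' or 'W' in `text` is a marker and that
    all other letters are already lowercase.
    """
    out = []
    delete_space = False  # set by 'D': drop a space that follows the markers
    upper_char = False    # set by 'C': uppercase the next letter
    upper_word = False    # set by 'W': uppercase the next word
    in_word = False       # True once 'W' has started uppercasing letters

    for ch in text:
        # Markers only set pending flags; they produce no output themselves.
        if ch == "D":
            delete_space = True
            continue
        if ch == "C":
            upper_char = True
            continue
        if ch == "W":
            upper_word = True
            in_word = False
            continue

        # 'D' applies to the first non-marker character, if it is a space.
        if delete_space:
            delete_space = False
            if ch == " ":
                continue

        if ch.isalpha():
            if upper_char or upper_word:
                ch = ch.upper()
            upper_char = False  # 'C' is consumed by a single letter
            if upper_word:
                in_word = True
        elif upper_word and in_word:
            # The first non-letter after the word ends 'W' mode.
            upper_word = False
            in_word = False

        out.append(ch)
    return "".join(out)


print(decode_capcode("Chello Wworld"))  # Hello WORLD
print(decode_capcode("DC hello"))       # Hello
```

Because ordinary letters are assumed to be lowercase, any uppercase D, C or W can be treated unambiguously as a marker, which is why they show up so often inside vocabulary tokens.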