Info on sub-prompts and tokens #92
jdietzChina
started this conversation in
General
Replies: 0 comments
Anyone have more info on the 75-token limit and the idea of greedy merges and bags? Where can I learn more about what's going on here? I'm pasting the relevant section of the docs; I find this part fascinating! Thanks in advance!
@lllyasviel , thanks for all you do!
Parameter: description and detailed_descriptions
Let us introduce a concept called a "sub-prompt". If a prompt is less than 75 tokens and can describe a thing on its own, without relying on other prompts, we call it a "sub-prompt".
The description is a sub-prompt, and detailed_descriptions is a list of sub-prompts.
Note that each sub-prompt is strictly less than 75 tokens (and typically less than 40 tokens), so you can safely encode it with any CLIP model without worrying about the truncation position affecting the semantics.
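As a minimal sketch of that constraint check (not the actual Omost code), here is how one might validate the sub-prompt token budget. The `count_tokens` helper is a hypothetical stand-in; real code would use a CLIP BPE tokenizer instead of whitespace splitting:

```python
TOKEN_LIMIT = 75

def count_tokens(text):
    # Placeholder tokenizer for illustration only: approximates one token
    # per whitespace-separated word. Real code would use a CLIP tokenizer.
    return len(text.split())

def is_valid_subprompt(text):
    """A sub-prompt must be strictly under 75 tokens."""
    return count_tokens(text) < TOKEN_LIMIT

print(is_valid_subprompt("a handsome professional man in an office"))  # True
```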
The sub-prompt design also allows for better text encoding based on greedy merging. For example, if you have
sub-prompt A: 25 tokens
sub-prompt B: 35 tokens
sub-prompt C: 5 tokens
sub-prompt D: 60 tokens
sub-prompt E: 15 tokens
sub-prompt F: 25 tokens
then, since every sub-prompt is guaranteed to describe a thing independently, we can use a greedy method to merge them into bags like
bag 1 {A, B, C} : 65 tokens
bag 2 {D} : 60 tokens
bag 3 {E, F} : 40 tokens
where each bag is less than 75 tokens and can be encoded by any CLIP model in one pass (and the results then concatenated).
Encoding texts this way ensures that the text encoder never makes semantic truncation mistakes.
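The greedy merge described above can be sketched as follows. This is an illustrative reimplementation, not the project's actual code; it packs sub-prompts, in order, into bags so that each bag stays strictly under the 75-token limit:

```python
TOKEN_LIMIT = 75

def greedy_merge(subprompts):
    """Greedily pack (name, token_count) pairs into bags of < 75 tokens.

    Each sub-prompt is assumed to be strictly under 75 tokens on its own,
    so a single sub-prompt always fits in a fresh bag.
    """
    bags = []
    current, current_tokens = [], 0
    for name, tokens in subprompts:
        # Start a new bag if adding this sub-prompt would reach the limit.
        if current and current_tokens + tokens >= TOKEN_LIMIT:
            bags.append(current)
            current, current_tokens = [], 0
        current.append(name)
        current_tokens += tokens
    if current:
        bags.append(current)
    return bags

# The example from the docs above:
subprompts = [("A", 25), ("B", 35), ("C", 5), ("D", 60), ("E", 15), ("F", 25)]
print(greedy_merge(subprompts))  # [['A', 'B', 'C'], ['D'], ['E', 'F']]
```

Note that D and E are not merged even though 60 + 15 = 75, because each bag must stay strictly below 75 tokens.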
One may ask: if all sub-prompts are less than 75 tokens and semantically independent, why not just encode each one separately and then concatenate? The main reason is that we want the text embedding to be more coherent. For example, let's say sub-prompt A is "a man" while sub-prompt B is "handsome, professional"; merging them before encoding yields a more blended text embedding with the coherent features of a handsome professional man.