Unary trouble, Surrogate problems and Odd requests #12

ReNardier · 2021-10-10T08:43:47Z

I tried using unary (for fun) at https://rot47.net/base.html, and although it was only 12 (base 10) to unary, it froze my browser and I had to close it.
Then I saw your site in your discus post and really liked it. I thought of trying unary on yours as well, but chose not to.
But I then accidentally mis-selected all the characters in the base input and pressed 1. The good news is that it didn't crash; the "bad" news is that it did nothing. Could you add a unary option with a character output limit? I'd say a maximum of 1080 should be enough as a default, but users who want to change it could do so at their own risk. What do you think?
And of course a way to use a custom "1" for the unary base wouldn't go amiss.
As for the second part of the title, I wanted to try it with 𝍲𝍳𝍴𝍵𝍶 (the kanji ideographic tally marks). While writing this issue I noticed two problems with that: there is no 0, and the way it should be converted (1="1", 2="2", 3="3", 4="4", 5="5", 6="51", 7="52", etc.) isn't supported. The problem the title refers to, though, is that for characters 𐀀 to 􏿽 (and I suppose 10FFFE and 10FFFF as well, but those shouldn't be used) it seems to be using the UTF-16 surrogate characters https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Surrogates, and that is an obvious problem.
And while I'm asking for the tally output method (yeah, I'm doing that as well), do you think you could add one for 𝍷𝍸 (fence tally marks) too? The way they work is 1="1", 2="11", 3="111", 4="1111", 5="5", 6="51", etc., with 𝍷="1" and 𝍸="5". Also, when these were encoded in Unicode, it seems they thought it better that "2", "3" and "4" be handled at the font level using multiple 𝍷, so you'll have to see how best to handle that in the output (and perhaps even the input). And obviously a way to use custom characters for the tally ones as well.
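To make the two tally schemes concrete, here is a minimal Python sketch. The function names and defaults are hypothetical, not part of the converter, and it assumes that values of 10 and above simply repeat the "5" mark (so 10 would be "55"), which the examples above do not spell out.

```python
# Hypothetical helpers for illustration only; not taken from the converter.
FENCE_ONE = "\U0001D377"   # TALLY DIGIT ONE
FENCE_FIVE = "\U0001D378"  # TALLY DIGIT FIVE
# IDEOGRAPHIC TALLY DIGIT ONE through FIVE
IDEOGRAPHIC = "\U0001D372\U0001D373\U0001D374\U0001D375\U0001D376"


def fence_tally(n: int, one: str = FENCE_ONE, five: str = FENCE_FIVE) -> str:
    """Fence tally: 1="1", 4="1111", 5="5", 6="51"."""
    if n < 1:
        raise ValueError("tally marks have no zero; n must be >= 1")
    fives, ones = divmod(n, 5)
    # Assumption: each full group of five repeats the "5" mark.
    return five * fives + one * ones


def ideographic_tally(n: int, digits: str = IDEOGRAPHIC) -> str:
    """Ideographic tally: 1="1" ... 5="5", 6="51", 7="52"."""
    marks = list(digits)  # one entry per code point
    if len(marks) != 5:
        raise ValueError("expected exactly five tally digits")
    if n < 1:
        raise ValueError("tally marks have no zero; n must be >= 1")
    fives, rest = divmod(n, 5)
    out = marks[4] * fives
    if rest:
        out += marks[rest - 1]
    return out
```

Passing plain ASCII marks, as in `fence_tally(6, "1", "5")` or `ideographic_tally(7, "12345")`, covers the custom-character request and reproduces the "51" and "52" examples.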
Thanks in advance.
Edit: P.S. I've noticed that you can output bizarre stuff like 0123454321 and it's converted correctly. I like it, please don't remove it. Either say on the page (and in the code) that this is something that can be done, although it's rather pointless, and leave it, or put it behind a "for fun" option that is off by default (with an explanation of what it does in a widget). Thanks again.
Edit 2: I just thought of something. Perhaps you could make it so that if either the input or output field sees " ##" (a space followed by two hash signs) at the end (so "0123456789 ##"), the converter interprets that as "hold on, there might be a parameter I have to listen for there", so you could say stuff like ##paddingchar="#" or ##tallymark. What do you think?

zamicol · 2021-10-12T20:41:48Z

Would the following address your suggestions?

Support unary.
Support emoji characters.
Option for Unicode markup ("U+").
Support surrogate characters.

I think unary is interesting. Since internally it converts to base10, it would be a simple loop over the base10 representation of the input.
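A rough Python sketch of that idea, combined with the output limit and custom mark requested above. The name `to_unary` and its parameters are hypothetical; the default limit of 1080 is the value suggested in the issue, and the limit here counts marks rather than output characters, which differ if a custom mark is longer than one character.

```python
def to_unary(n: int, mark: str = "1", limit: int = 1080) -> str:
    """Return n repetitions of `mark`, refusing output above `limit` marks."""
    if n < 0:
        raise ValueError("unary cannot represent negative numbers")
    if n > limit:
        # Fail early instead of building a huge string that could freeze the page.
        raise ValueError(f"{n} marks exceeds the output limit of {limit}")
    # String repetition stands in for the loop over the base10 value.
    return mark * n
```

For example, `to_unary(12)` gives twelve `1` characters, and `to_unary(3, mark="|")` gives `|||`. Zero comes out as an empty string, which a real implementation might prefer to report explicitly.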

As far as tally marks are concerned, I think emojis present the same problem (base2 with an alphabet of 😀😍). The application is interpreting that as 4 characters.
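The count of 4 is what you get when a string is measured in UTF-16 code units, as JavaScript strings are: each of these emoji lies outside the Basic Multilingual Plane and takes a surrogate pair. The Python sketch below shows the difference and a conversion that splits the alphabet by code point instead. The names `utf16_units` and `to_base` are hypothetical, and whether the converter counts characters this way internally is an assumption.

```python
def utf16_units(s: str) -> int:
    """Number of UTF-16 code units in s (what JavaScript's .length reports)."""
    return len(s.encode("utf-16-le")) // 2


def to_base(n: int, alphabet: str) -> str:
    """Convert a non-negative integer using one alphabet digit per code point."""
    digits = list(alphabet)  # Python strings iterate by code point
    base = len(digits)
    if base < 2:
        raise ValueError("alphabet needs at least two characters")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return digits[0]
    out = []
    while n > 0:
        n, r = divmod(n, base)
        out.append(digits[r])
    return "".join(reversed(out))


alphabet = "😀😍"
print(utf16_units(alphabet))  # 4 code units: two surrogate pairs
print(len(alphabet))          # 2 code points: a base2 alphabet
print(to_base(5, alphabet))   # 😍😀😍 (binary 101)
```

Splitting by code point fixes emoji and the tally digits, but not sequences built from several code points (for example emoji with modifiers); those would need grapheme-aware splitting.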

zamicol · 2024-10-24T17:46:35Z

For my own notes, when this is done, cypherary (base 0) can be implemented as well. (Which should always result in "undefined".)
