Wide chars, UTF-8, terminal escapes and colors, etc. #69
Comments
I looked into the licensing some more, and I don't think there's actually a licensing problem with the Debian patches or most of their cows. Per their Looking at the Debian patch...
...hmm. I'm not very familiar with Perl Unicode support. Looks like this is programmatically doing the equivalent of Might be able to handle this more gracefully by using See:
…o 5.8.7 Addresses #69 and #65. This UTF-8 handling approach is based on Debian's UTF-8 handling patch for cowsay 3.03 at https://sources.debian.org/patches/cowsay/3.03%2Bdfsg2-8/utf8_width, discussed at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=254557. It has been in place on Debian since 2010, so I think we can consider it reasonably well tested and supported.
Added support for multibyte UTF-8 chars in inputs in b91f3d2, targeted for Cowsay 3.9.0, on the Seems to work fine for me:
Ah. It looks like some of our recent contributions are using non-ASCII UTF-8 characters.
I don't know what interaction this new "UTF-8 on inputs" change is having with cow source files, but it seems likely to be something like that. Maybe the problem is that we're changing STDOUT to UTF-8 encoding, and these UTF-8-using source files are not marked as UTF-8, so they get misinterpreted as single-byte Latin-1 source, converted to chars, and then upon output, Perl renders them as the UTF-8 encoding of that bogus single-byte-encoding interpretation. That'd explain the accented characters: high (8-bit, not 7-bit) bytes getting re-rendered. Maybe References
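To make that hypothesis concrete, here is a minimal sketch of the suspected double-encoding. It is written in Python purely for illustration (cowsay itself is Perl), and the function name `misread_as_latin1` is made up for this example; it is not a claim about what Perl does internally.

```python
def misread_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being interpreted as single-byte Latin-1 characters."""
    raw = text.encode("utf-8")       # the bytes as stored in a UTF-8 cow file
    return raw.decode("latin-1")     # each byte becomes one bogus character

original = "é"
garbled = misread_as_latin1(original)

# Writing the bogus characters to a UTF-8 stream encodes each of them again,
# so one 2-byte character turns into two accented-looking characters.
print(original, "->", garbled)
print(len(original.encode("utf-8")), "bytes ->", len(garbled.encode("utf-8")), "bytes")
```

Plain ASCII passes through this unchanged, which would fit with only the cows that use non-ASCII characters being affected.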
Trying Before: After: Yeah, that looks better. Did that in fc346f4.
Cowsay doesn't handle variable character widths well. It more or less assumes every character is 1 column wide (in display) and (I think) 1 byte long (in the input encoding). This means that non-ASCII characters in the cows or the message text are not handled well.
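The three notions of "length" that get conflated here can be compared with a small sketch. This is Python for illustration only, not cowsay's Perl; `display_width` is a hypothetical helper that approximates terminal columns from the Unicode East Asian Width property, and it does not cover every case (emoji sequences, ambiguous-width characters, and so on).

```python
import unicodedata

def display_width(text: str) -> int:
    """Approximate terminal columns: wide/fullwidth chars count 2, combining marks 0."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn over the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

for msg in ["MOO", "MÖÖÖ", "谢谢你"]:
    # bytes in UTF-8, code points, approximate display columns
    print(msg, len(msg.encode("utf-8")), len(msg), display_width(msg))
```

Only for the pure-ASCII message do all three numbers agree, which is the case the current balloon-sizing logic appears to assume.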
This is an expansion of #65 "Use Debian's UTF-8 Patch".
Aspects:
Bad-wrapped multi-byte char example:
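As a rough illustration of this failure mode (not the original output from this issue), the sketch below wraps a message by byte count and ignores character boundaries. It is Python rather than cowsay's Perl, and `wrap_bytes` is a made-up helper for this example.

```python
def wrap_bytes(data: bytes, width: int) -> list:
    """Naively split into chunks of `width` bytes, ignoring character boundaries."""
    return [data[i:i + width] for i in range(0, len(data), width)]

message = "Привет, мир!".encode("utf-8")
for chunk in wrap_bytes(message, 5):
    # errors="replace" shows U+FFFD wherever a character was cut in half
    print(chunk.decode("utf-8", errors="replace"))
```

Each Cyrillic letter is 2 bytes in UTF-8, so an odd wrap width splits a letter across two lines and both halves become invalid on their own.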
Considerations
The cow files distributed with cowsay are all UTF-8, regardless of what locale the user is running in or how their system is set up. (I think? Or are they actually ASCII/Latin-1, since they are Perl source code?)
Message input might be in the user's locale while the cow files are UTF-8. Custom cows (including in third-party cow herd packages) might be in other encodings, which may or may not be the same encoding as
Perl's standard library doesn't support char width detection, I don't think. Would need a CPAN module for that. We currently don't take any deps on modules. Would need to figure out how to do that. I think we'd vendor the module (ship a copy of it in cowsay itself), to avoid creating any external dependencies or a more complicated install process.
Testing
Examples:
cowsay "MÖÖÖ"
cowsay 'Привет, мир!'
cowsay 'Ищу свое лицо. Особых примет нет.'
Wide chars:
cowsay "我愛中國人"
cowsay 'でびあん/Debian'
cowsay 谢谢你
ANSI terminal escapes:
echo 'Hello, World!' | toilet -w 100 --metal | cowsay -n
figlet "Hello World!" | toilet -f term --metal | /usr/games/cowsay -n
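For the escape-sequence case, one possible approach is to strip the escapes before measuring a line, so that color codes don't count toward the balloon width. The sketch below is Python for illustration (not cowsay's Perl) and assumes the escapes are CSI sequences such as SGR color codes; `CSI_RE` and `visible_length` are hypothetical names, and other escape types (OSC, for example) are not handled.

```python
import re

# CSI sequence: ESC [ then parameter bytes, intermediate bytes, and a final byte.
CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

def visible_length(text: str) -> int:
    """Length of the text once CSI escape sequences are removed."""
    return len(CSI_RE.sub("", text))

colored = "\x1b[1;31mHello\x1b[0m"
print(len(colored), visible_length(colored))
```

The raw length counts the escape bytes as if they were printed, which would explain a balloon that is drawn too wide around colored input.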
TODO
use utf8;
)?wchar
type in encoding.)