Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic glyph scaling based on unicode width and following character content #4969

Closed
Tyriar opened this issue Feb 27, 2024 · 14 comments · Fixed by #4997
Closed

Automatic glyph scaling based on unicode width and following character content #4969

Tyriar opened this issue Feb 27, 2024 · 14 comments · Fixed by #4997
Assignees
Labels
area/addon/canvas area/addon/webgl type/enhancement Features or improvements to existing features
Milestone

Comments

@Tyriar
Copy link
Member

Tyriar commented Feb 27, 2024

We have a report related to GB18030 compliance (Chinese government standard) that calls out the rendering of the char overlapping with the follow character. This is because we think it's a narrow width char but render it exactly as the font renders.

image

We could handle this case and solve the problem of overlapping character once and for all by scaling down the glyph to fit in the cell. This is what Windows Terminal's old renderer did:

image

https://github.com/microsoft/terminal/blob/release-1.19/src/renderer/dx/CustomTextLayout.cpp#L795-L836

Of course when talking about overlapping we can't ignore emoji which overlap but often contain a space after as this is what most terminals do. If we started scaling down all emoji I'm sure we would get many bug reports complaining about it. To handle this particular case generically I think we should also consider the following character, if is just whitespace then render normally without scaling,

@jerch
Copy link
Member

jerch commented Feb 28, 2024

@Tyriar Hmm, while this would help as a general fix it still kinda feels like patching things at the wrong end. I also have some doubts that endusers will accept it - a misaligned glyph will disturb the reading flow a lot up to a point where ppl might not recognize its meaning anymore (ppl are used to read smaller/out-of-raster glyphs as symbols with a different meaning, like ™ has its very own meaning beside being "TM").

For the char/glyph at hand - does it get the correct width with the newer unicode versions @PerBothner included in the grapheme addon?

Edit: The situation with glyphs depending on the font creators' goodwill even for monospace fonts (since unicode does not impose strict widths) is really a nuisance - de facto every single monospace font would have to be tested upfront for problematic glyphs dropping out of the monospace grid, it is a really wicked situation. (We could at least build a test script, maybe even with a visual grid output to check for wcwidth --> glyph width conformance of a certain font.)

@PerBothner
Copy link
Contributor

The character 'Ⅻ' (216B) is listed as having "ambiguous" width, going back to at least Unicode 6.0.0. We should probably treat such characters as wide. I thought we already did, but I'd have to look at the code (when I'm not trying to get back to sleep) to see what is going on.

@Tyriar
Copy link
Member Author

Tyriar commented Feb 28, 2024

Isn't just making it wide on our side going to cause weird things to happen if conpty or the system thinks it's a narrow character? I end up having these discussions every year and want to just fix it once and for all. An example of another discussion I have is they don't like our beautiful custom powerline glyphs and want a square with a question mark to show instead, despite it being a PUA unicode region 😕

@jerch
Copy link
Member

jerch commented Feb 28, 2024

Yupp, we cannot fix it by just setting wcwidth to 2 in xterm.js - it would have to be in line with what the OS / app thinks about that char.

Btw thats the reason why I think that the terminal interface needs sequences to announce the used unicode version + extensions (+/- grapheme clustering, +/- bidi etc.), so app and terminal side can adopt or refuse that. The issue here - neither of them (app/OS side, TE side) have easy to use building blocks for that currently. It is almost impossible for cmdline devs to properly prepare their apps for that in C/C++, unless they have explicitly linked against a certain libicu version (thats for linux, idk by which lib macos integrates unicode support, for windows I am completely clueless).

What we'd need (a perfect world scenario):

  • per unicode version: fully specced wcwidth map (no ambiguous stuff), ideally with optional grapheme support
  • bake that into a stdlib to be used by TEs and cmdline apps for wcwidth calc
  • sequence to request a certain unicode version (+extension) from TE (maybe also baked into that lib)
  • convince all TE + cmdline devs to use that lib in their program
  • ideally ship that lib with a least denominator unicode version
  • point all monospace font creators to that lib regarding glyph widths

We hardly would ever get close to this, esp. with no official authority interested in this...

@lhecker
Copy link

lhecker commented Feb 28, 2024

FYI the corresponding Windows Terminal issue: microsoft/terminal#16779
My intention is to add an API to ConPTY that allows setting the width of ambiguous width glyphs. That may also (partially, but mostly) solve the issue for xterm.js, right?

@jerch
Copy link
Member

jerch commented Feb 28, 2024

@lhecker Yes that would halfway solve the broader issue (it would solve the issue at hand though), I also suggested something similar for PUA in the past. But you are right - it should be used for any ambiguous widths. It could also level out differences between unicode versions, at least for newly specced codepoints, that were unspecced before (which is the majority of changes between unicode versions). It still would not help with "active processing" changes, like additions to grapheme handling...

@jerch
Copy link
Member

jerch commented Feb 28, 2024

For ref - link to my PUA suggestion

@o-sdn-o
Copy link

o-sdn-o commented Feb 28, 2024

Perhaps this can somehow help in this issue. I asked a long time ago why not delegate specifying the size of glyphs to the user side/application:

https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/23

Perhaps there are some pitfalls in this that I am not aware of. A few days ago I started designing/implementing rendering of the vtm text buffer in the GUI to fully understand the viability of this concept.

@PerBothner
Copy link
Contributor

As I've mentioned before, I believe the best "fix" is to specify escape sequences or a mode where cursor addressing is in terms of "logical" characters and "logical" lines, independent of character width and wrapping. I.e. the application doesn't know or care how wide a character is or when/if a line gets wrapped. This would also allow use of variable-width fonts, which is more-or-less required by many writing systems.

@o-sdn-o
Copy link

o-sdn-o commented Feb 28, 2024

If the application doesn't care about the width of the characters, then the terminal itself arranges/places/lays out the glyphs (in its own way, even relying on font metrics). If an application is interested in precise cursor movement and glyph placement, then the app should be able to explicitly somehow specify the required layout/widths/etc.

@jerch
Copy link
Member

jerch commented Feb 29, 2024

@o-sdn-o Yes I remember your proposal. It is really sophisticated in what it can express while still being attached to a grid system. I dont know yet, how that would work in detail, e.g. whats the default assumption, if not explicit fragment size is given? Wouldn't that need a sane default/fallback, thus we are again at the current wcwidth madness?

@PerBothner Giving up the strict grid adherence would prolly work for 90% of all cmdline apps, that dont care for a specific screen state or certain alignment. Those are mostly apps just printing to the normal buffer. But more complicated apps on the alternate buffer still would need a more fine-grained control of the screen state (like all curses apps), so I dont see this yet as a general way to overcome the grid system in general, unless the model provides more "layouting primitives" (which opens Pandora's box of CSS-like requirements in the terminal).

my stance
I see both of your suggestions as interesting concepts to enhance terminal output caps in the future, given the TE has a capable enough output system. And yet again legacy constraints come into play - what about limited output systems, that cannot freely scale glyphs or are limited to monospaced/grid world? Imho both would need a strict monospace fallback - so here we are again still facing the wcwidth-system flaws with modern unicode specs. To make this clear - I really appreciate your works to bring more modern output concepts to the terminal, still I dont see how both ideas can make the current wcwidth-system obsolete. So we still have to deal with its flaws.


Suggestion, how to overcome the issue at hand (rough outline):

  • introduce a terminal mode to query/set ambiguous width default setting, the default setting might be applied by the TE, whenever no other rule overrides it
  • discourage the old ambiguous width selection from ECMA-35/ISO-2022 as outdated, as it cannot be applied anymore in unicode context, for real old-fashioned ISO-2022 usage it can stay in place overriding the default value
  • introduce a sequence to query/set wcwidth values for any codepoints/clusters, with this app side can reliably calculate screen state again even for cutting edge unicode pattern like joined emojis

With this app and TE side would be able to deal with most wcwidth related issues, as long as their wcwidth maps/unicode version is close enough.


Somewhat offtopic addressing the broader issue:
In the longer term we might also want have a better negotiation of unicode version/features support:

  • introduce a sequence to notify app side about supported unicode versions/features by the TE
  • make unicode version/extensions selectable from the list of supported versions
  • if TE has grapheme support enabled, its wcwidth impl must also be cluster aware

The sequence for selecting unicode version is a less pressing issue, as long as unicode versions stay mostly backwards-compatible (which is the case since v10) and the TEs manage to stick to recent versions.
A mismatch of unicode features like BiDi or grapheme support would be more disrupting. But both is really a big topic of its own, so prolly this issue here is not the right place to discuss that. (Also we have in xterm.js no good story or any devs to properly implement BiDi support, and grapheme support was just recently added by @PerBothner)

Also everything above still does not deal yet with font related issues, like when a font gets a glyph size wrong. But I think this is a separate topic and not the real cause behind the issue above.

@o-sdn-o
Copy link

o-sdn-o commented Feb 29, 2024

whats the default assumption, if not explicit fragment size is given?

In the absence of explicit specification of glyph sizes, a terminal operating in a GUI environment can set glyph sizes according to its own logic, for example, using the metrics of the font used.

In the case of a headless terminal, in the absence of an explicit specification of glyph sizes, it must inject into the output stream additional instructions with glyph sizes that will be used by the TE instance operating in the GUI, which is the final layer in this output chain.

The headless TE is reflowing on its side, so it must inject the cluster sizes into the output stream.

Shells/Readlines must inject cluster sizes into the output stream as they reflow and position the cursor.

TUI-apps must inject cluster sizes into the output stream as they render their UI and position the cursor.

Legacy applications are left to rely on the GUI terminal option switch for logic between wcwidths/font metrics, or whatever.

Meta data about the current cluster size should exist as a common cell-wise SGR attribute, like the background color, it is either set or missing.

We need to wait for this to be implemented in a prototype (vtm's GUI front-end) so that anyone can play with it and better understand the need for such complications in the terminal ecosystem.

@Tyriar
Copy link
Member Author

Tyriar commented Mar 13, 2024

This is a little more time sensitive on our end than trying to standardize something across all terminals like a new escape sequence and definitely beyond the scope of the time I have to dedicate to this problem. I just want a simple innocuous flag that is sufficient for passing GB18030 certification and not needing to revisit this problem every year.

My intention is to add an API to ConPTY that allows setting the width of ambiguous width glyphs. That may also (partially, but mostly) solve the issue for xterm.js, right?

@lhecker This would mostly solve it on Windows only, not sure what OS they use.

FWIW the glyph on my setup actually overlaps the 3rd cell as well, so even this would technically not solve the overlapping problem for the keen observer.

ppl are used to read smaller/out-of-raster glyphs as symbols with a different meaning, like ™ has its very own meaning beside being "TM")

@jerch I put together a little prototype that just rescales horizontally, not vertically, and I think it looks decent (ignore the emoji, we wouldn't allow that to happen if the following cell is free):

image

Tyriar added a commit to Tyriar/xterm.js that referenced this issue Mar 14, 2024
@Tyriar Tyriar added this to the 5.5.0 milestone Mar 14, 2024
@jerch
Copy link
Member

jerch commented Mar 14, 2024

@Tyriar Yes, looks like a good compromise to get a less disturbing output w'o waiting for the long and exhaustive spec iteration steps. 👍

Tyriar added a commit to Tyriar/xterm.js that referenced this issue Mar 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/addon/canvas area/addon/webgl type/enhancement Features or improvements to existing features
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants