Unicode support #2

mbudde · 2020-06-27T19:41:31Z

Column text width is calculated by counting bytes, which gives a wrong result if the text contains multibyte UTF-8 sequences or non-printable characters.
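A minimal sketch of the difference, in Rust since the tool's debug output points at src/main.rs. `approx_char_width` and `approx_display_width` are hypothetical helpers, not functions from tabulate. They only treat a handful of zero-width code points and control characters as width 0, they ignore double-width characters (e.g. CJK) and most combining marks, and they are no substitute for a full Unicode width table such as the one provided by the unicode-width crate.

```rust
/// Rough display width of one character, in terminal columns.
/// Hypothetical helper: an approximation, not a complete Unicode width table.
fn approx_char_width(c: char) -> usize {
    match c {
        // Zero-width space, non-joiner, joiner, and zero-width no-break space.
        '\u{200B}'..='\u{200D}' | '\u{FEFF}' => 0,
        // Combining Diacritical Marks block.
        '\u{0300}'..='\u{036F}' => 0,
        // Control characters are assumed to occupy no column here.
        c if c.is_control() => 0,
        // Everything else is assumed to be one column wide.
        _ => 1,
    }
}

/// Rough display width of a string: the sum of its per-character widths.
fn approx_display_width(s: &str) -> usize {
    s.chars().map(approx_char_width).sum()
}
```

Under these assumptions, the cell `ทtb` has `len()` 5 and `chars().count()` 3, and `approx_display_width` returns 3. A cell made of `a`, four U+FEFF, and `a` has `len()` 14 and `chars().count()` 6, while `approx_display_width` returns 2.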

aldnav · 2024-11-06T22:46:07Z

Hi @mbudde, do you have a sample input and an expected output?

aa ทtb ñññ
a ทt ñ

tabulate < test.txt
[src/main.rs:113:5] &args = Args {
    truncate: None,
    ratio: 1.0,
    lines: 1000,
    include_cols: None,
    exclude_cols: None,
    delim: " \t",
    output_delim: "  ",
    strict_delim: false,
    online: false,
    print_info: false,
}
aa  ทtb    ñññ
a   ทt     ñ

Seems expected, no?

mbudde · 2024-11-08T13:28:44Z

The expected output for your example is (two spaces between columns):

aa  ทtb  ñññ
a   ทt   ñ

Here is an example with zero-width characters (zero-width no-break space U+FEFF): test.txt
Input:

aaaa a
a<feff><feff><feff><feff>a a

Actual output:

aaaa            a
aa          a

Expected output:

aaaa  a
aa    a
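The expected output above could be produced by an alignment step along these lines. This is a sketch only: `align_rows` is a hypothetical function, not tabulate's actual code, and the width measurement is passed in as a parameter so that the same loop can be run with byte length or with a display-width function.

```rust
/// Pads every cell except the last one in each row to its column's width,
/// then joins the cells with `sep`. `width_of` decides how wide a cell is.
/// Hypothetical sketch, not the tool's real implementation.
fn align_rows<F>(rows: &[Vec<&str>], sep: &str, width_of: F) -> Vec<String>
where
    F: Fn(&str) -> usize,
{
    // Widest cell per column, measured with the caller's width function.
    let ncols = rows.iter().map(|r| r.len()).max().unwrap_or(0);
    let mut widths = vec![0usize; ncols];
    for row in rows {
        for (i, &cell) in row.iter().enumerate() {
            widths[i] = widths[i].max(width_of(cell));
        }
    }

    rows.iter()
        .map(|row| {
            let mut line = String::new();
            for (i, &cell) in row.iter().enumerate() {
                line.push_str(cell);
                // The last cell is left unpadded to avoid trailing spaces.
                if i + 1 < row.len() {
                    let pad = widths[i].saturating_sub(width_of(cell));
                    line.push_str(&" ".repeat(pad));
                    line.push_str(sep);
                }
            }
            line
        })
        .collect()
}
```

Called with a width function that ignores U+FEFF, for example `|s| s.chars().filter(|&c| c != '\u{FEFF}').count()`, and a separator of two spaces, the first column is measured as 4 wide for both rows of the input above, so the second column lines up as in the expected output.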

mbudde added the enhancement label Jun 27, 2020
