Unicode support #2

Open
mbudde opened this issue Jun 27, 2020 · 2 comments


mbudde (Owner) commented Jun 27, 2020

Column text width is calculated by counting bytes, which gives a wrong result if the text contains multibyte UTF-8 sequences or non-printable characters.
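To illustrate the mismatch, here is a minimal Rust sketch comparing the byte length of a string with a rough display width. The names `approx_display_width` and `is_zero_width` are hypothetical and not taken from the project, and the zero-width set is a small hand-picked assumption rather than a full Unicode width table. A real fix would more likely rely on a dedicated crate such as `unicode-width`.

```rust
/// Hypothetical helper: true for characters that are assumed to occupy
/// no terminal column. This is a small hand-picked set, not a complete
/// Unicode table (wide East Asian characters are not handled at all).
fn is_zero_width(c: char) -> bool {
    c.is_control()
        || matches!(
            c,
            '\u{0300}'..='\u{036F}'   // combining diacritical marks
            | '\u{200B}'..='\u{200D}' // zero-width space, non-joiner, joiner
            | '\u{2060}'              // word joiner
            | '\u{FEFF}'              // zero-width no-break space
        )
}

/// Hypothetical helper: approximate display width, counting one column
/// per character that is not considered zero-width.
fn approx_display_width(s: &str) -> usize {
    s.chars().filter(|c| !is_zero_width(*c)).count()
}

/// Returns (byte length, approximate display width) for comparison.
fn byte_len_vs_width(s: &str) -> (usize, usize) {
    (s.len(), approx_display_width(s))
}
```

With these definitions, `byte_len_vs_width("\u{0E17}tb")` (the cell `ทtb`) gives `(5, 3)`, and `byte_len_vs_width("a\u{FEFF}\u{FEFF}\u{FEFF}\u{FEFF}a")` gives `(14, 2)`, so sizing a column by `str::len` overestimates both.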

aldnav commented Nov 6, 2024

Hi @mbudde, do you have a sample input and an expected output?

Input (test.txt):

aa ทtb ñññ
a ทt ñ

Command:

tabulate < test.txt

Output:

[src/main.rs:113:5] &args = Args {
    truncate: None,
    ratio: 1.0,
    lines: 1000,
    include_cols: None,
    exclude_cols: None,
    delim: " \t",
    output_delim: "  ",
    strict_delim: false,
    online: false,
    print_info: false,
}
aa  ทtb    ñññ
a   ทt     ñ

Seems expected, no?

mbudde (Owner, Author) commented Nov 8, 2024

The expected output for your example is (two spaces between columns):

aa  ทtb  ñññ
a   ทt   ñ

Here is an example with zero-width characters (zero-width no-break space U+FEFF): test.txt
Input:

aaaa a
a<feff><feff><feff><feff>a a

Actual output:

aaaa            a
aa          a

Expected output:

aaaa  a
aa    a
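As a rough sketch of how the expected output could be produced, the Rust snippet below sizes and pads each column by an approximate display width instead of the byte count. `align_columns` and `approx_display_width` are hypothetical names, not the project's actual functions, and the zero-width set is an assumption covering only a few code points. The two-space `output_delim` mirrors the default shown in the debug output earlier in the thread.

```rust
/// Hypothetical helper: one column per character, except control
/// characters and a small assumed set of zero-width code points.
fn approx_display_width(s: &str) -> usize {
    s.chars()
        .filter(|c| {
            !c.is_control()
                && !matches!(
                    *c,
                    '\u{0300}'..='\u{036F}'   // combining diacritical marks
                    | '\u{200B}'..='\u{200D}' // zero-width space, non-joiner, joiner
                    | '\u{2060}'              // word joiner
                    | '\u{FEFF}'              // zero-width no-break space
                )
        })
        .count()
}

/// Hypothetical helper: pads every cell except the last one in a row to
/// the widest display width seen in its column, then joins the cells
/// with `output_delim`.
fn align_columns(rows: &[Vec<&str>], output_delim: &str) -> Vec<String> {
    let ncols = rows.iter().map(|r| r.len()).max().unwrap_or(0);

    // First pass: widest cell per column, measured in display columns.
    let mut widths = vec![0usize; ncols];
    for row in rows {
        for (i, cell) in row.iter().enumerate() {
            widths[i] = widths[i].max(approx_display_width(cell));
        }
    }

    // Second pass: pad by the same measure that was used for sizing.
    rows.iter()
        .map(|row| {
            let mut line = String::new();
            for (i, cell) in row.iter().enumerate() {
                line.push_str(cell);
                if i + 1 < row.len() {
                    let pad = widths[i] - approx_display_width(cell);
                    line.push_str(&" ".repeat(pad));
                    line.push_str(output_delim);
                }
            }
            line
        })
        .collect()
}
```

The key design point is that the same width measure is used both for sizing the column and for computing the padding. Mixing two measures (for example bytes for one and characters for the other) is one way to end up with the extra spaces seen in the actual output.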
