<filesystem>: MSVC's path conversion from wide to narrow throws an exception (fs::path(L"要").string()) #5093

Open
stevewgr opened this issue Nov 17, 2024 · 6 comments
Labels
filesystem C++17 filesystem info needed We need more info before working on this

Comments

@stevewgr

Describe the bug

MSVC's std::filesystem::path conversion from wide to narrow crashes with the following snippet:

#include <iostream>
#include <filesystem>
int main() {
    std::filesystem::path(L"要らない.exe").string();
    std::cout << "Hello World!\n";
}

I also tried a fresh sandbox with a fresh install of the latest VS Community, and the crash happens on all builds (x86 and x64, Debug and Release):
Microsoft C++ exception: std::system_error at memory location

I also tried compiling with and without Unicode enabled. Same behavior.

Expected behavior

The conversion should succeed and not crash the program.

STL version

Microsoft Visual Studio Community 2022 (64-bit) - Current
Version 17.11.1

@stevewgr stevewgr changed the title bug: MSVC's std::filesystem::path conversion from wide to narrow crashing (fs::path(L"要らない.exe").string()) bug: MSVC's std::filesystem::path conversion from wide to narrow crashing (fs::path(L"要").string()) Nov 17, 2024
@frederick-vs-ja
Contributor

The error code seems to be the following.

ERROR_NO_UNICODE_TRANSLATION

1113 (0x459)

No mapping for the Unicode character exists in the target multi-byte code page.

The exception can be avoided when the source file is encoded in UTF-8 and the program is compiled with /utf-8.

@CaseyCarter
Member

Works fine for me on my "Beta: Use UTF-8 for language support" machine, and the same for Compiler Explorer (https://www.godbolt.org/z/oGMbo3YE1). The problem is most likely:

  1. The compiler and editor have differing notions of the source encoding, so the compiler sees a gibberish string in the source file. Using a pure-ASCII encoding of the string literal (L"\u8981\u3089\u306a\u3044.exe") will avoid this.
  2. The active code page (the narrow encoding that the Win32 APIs, and therefore path, use at runtime) can't represent 要らない, so the transcoding in path::string fails (this is the error @frederick-vs-ja refers to above).
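
To tell the two causes apart, a repro can spell the literal with universal-character-names (which removes the source-encoding question) and catch the exception to read its error code. The following is a minimal sketch, not the reporter's code: NarrowResult, try_narrow, and kReproName are hypothetical names introduced here for illustration.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper type (not part of the STL): holds either the narrow
// string or the error code carried by the thrown std::system_error.
struct NarrowResult {
    std::optional<std::string> value; // set when the conversion succeeds
    std::error_code error;            // set when the conversion throws
};

// Pure-ASCII spelling of L"要らない.exe", independent of the source encoding.
inline const std::wstring kReproName = L"\u8981\u3089\u306a\u3044.exe";

// Builds a path from a wide string and converts it to a narrow string.
// Both steps sit inside the try block because some implementations
// transcode at construction rather than in string().
inline NarrowResult try_narrow(const std::wstring& wide) {
    try {
        const std::filesystem::path p(wide);
        return {p.string(), {}};
    } catch (const std::system_error& e) {
        return {std::nullopt, e.code()};
    }
}
```

Calling try_narrow(kReproName) and printing error.value() on failure should show whether the failure is the 1113 (ERROR_NO_UNICODE_TRANSLATION) case described above.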

@stevewgr
Author

The error code seems to be the following.

ERROR_NO_UNICODE_TRANSLATION
1113 (0x459)
No mapping for the Unicode character exists in the target multi-byte code page.

The exception can be avoided when the source file is encoded in UTF-8 and the program is compiled with /utf-8.

Yes, I already did that and could still reproduce it. Try disabling the beta feature in your Region system settings and then restart your computer; you'll be able to reproduce it. I tried with the file encoded as UTF-8 both with and without a BOM. I also tried the /utf-8 compiler flag, and even explicitly defining the code page for Korean characters with /source-charset:utf-8 /execution-charset:.949, based on the docs: https://learn.microsoft.com/en-us/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8?view=msvc-170

@stevewgr
Author

stevewgr commented Nov 17, 2024

Works fine for me on my "Beta: Use UTF-8 for language support" machine, and the same for Compiler Explorer (https://www.godbolt.org/z/oGMbo3YE1).

Of course that works, but unfortunately that's not a solution I can instruct the consumers of my application to use.
I tried that on Godbolt before submitting the ticket and it did work. I believe (though I'm not sure) that's because they don't use native Windows machines; it might be some bootstrapped or dockerized system, or it may already have UTF-8 for language support enabled. I remember Matt Godbolt talking about some of these challenges with MSVC in one of his talks. The setting can also be changed via PowerShell:

Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage' -Name 'ACP' -Value '65001'
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage' -Name 'OEMCP' -Value '65001'
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage' -Name 'MACCP' -Value '65001'

Here he talks further about some of the changes they made: https://xania.org/202407/msvc-on-ce
Since it's a recent article, who knows, maybe they actually use Wine on Linux now.

@CaseyCarter
Member

I'm not sure there's a bug here. If the library is throwing ERROR_NO_UNICODE_TRANSLATION to tell you that there are characters in the path that can't be represented in the active codepage, that's not a crash but expected behavior. If you believe the library is incorrect we need some more information to reproduce the problem. What is int(__std_fs_code_page())? What is the active code page in the console when the program runs?
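
For gathering that information, here is a rough sketch that reports the process-wide Win32 code page settings. It deliberately does not call the internal __std_fs_code_page() mentioned above; it assumes that the values from GetACP(), GetOEMCP(), and AreFileApisANSI() are useful context alongside it. The name describe_code_pages is hypothetical.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: summarizes the code pages that narrow Win32 APIs use
// in this process. On non-Windows platforms it returns a fixed note instead.
inline std::string describe_code_pages() {
#ifdef _WIN32
    std::string out = "ACP=" + std::to_string(GetACP());
    out += " OEMCP=" + std::to_string(GetOEMCP());
    // File APIs use the ANSI code page unless SetFileApisToOEM was called.
    out += AreFileApisANSI() ? " file-APIs=ANSI" : " file-APIs=OEM";
    return out;
#else
    return "not Windows: no ANSI/OEM code page";
#endif
}
```

An ACP of 65001 would indicate that UTF-8 is the active code page, which matches the "Beta: Use UTF-8 for language support" setting discussed earlier in the thread.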

@StephanTLavavej StephanTLavavej added info needed We need more info before working on this filesystem C++17 filesystem labels Nov 20, 2024
@StephanTLavavej
Member

We talked about this at the weekly maintainer meeting - I agree with Casey that this sounds by design but we need more info.

The filesystem codepage, the source character set (which is a non-issue if you use universal-character-names for your repro), and the execution character set (the last two are controlled by /utf-8 which we strongly recommend), are relevant here. Casey and I now believe that the console code page is not relevant (the repro doesn't attempt to write Unicode to the console, and if it had to for diagnostic purposes, compiling with /utf-8 and using <print> would write Unicode without introducing wacky questions about the console code page).
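
If the goal is a narrow string whose encoding does not depend on the filesystem code page, one option not raised in this thread is path::u8string(), which is specified to return UTF-8. The sketch below is an illustration under that assumption, not a recommendation from the maintainers; to_utf8 is a hypothetical name.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: returns the path as UTF-8 bytes in a std::string.
// u8string() yields std::string in C++17 and std::u8string in C++20, so the
// bytes are copied through iterators to compile under either standard.
inline std::string to_utf8(const std::filesystem::path& p) {
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}
```

The result is UTF-8 rather than the active code page, so it suits logging or serialization but should not be passed to narrow Win32 or C runtime file APIs unless the active code page is also UTF-8.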

@StephanTLavavej StephanTLavavej changed the title bug: MSVC's std::filesystem::path conversion from wide to narrow crashing (fs::path(L"要").string()) <filesystem>: MSVC's path conversion from wide to narrow throws an exception (fs::path(L"要").string()) Nov 20, 2024
4 participants