On 13/02/2020 14:42, Eli Zaretskii wrote, quoting me:
>>
>> [...snip...]
>>
>> 2) Representation of wchar_t as UTF-16LE entities makes it
>> impossible to have effective handling, in mbrtowc() and wcrtomb(),
>> for code points which lie off the BMP.  Such code points are
>> represented by surrogate pairs, in UTF-16, and there is no
>> standards conformant mechanism for passing such surrogate pairs
>> through the single wchar_t argument to either of these functions.
>>
>> I can address the first of these issues; the second is more
>> problematic.
>
> I think you can give up on 2).  Wide-character support in the CRT
> routines is fundamentally broken on MS-Windows, due to the use of
> UTF-16, for any codepoint beyond the BMP.

I certainly agree with this ... with the benefit of hindsight, I think
we can all agree that Microsoft's decision to standardize on UTF-16 for
their wchar_t was myopic, in the extreme, and the blinkered insistence,
which prevailed throughout their documentation for years, that Unicode
could mean nothing other than UTF-16LE, did them no credit at all.

> [...snip...]
>
> I don't think this can be fixed as long as wchar_t remains a 16-bit
> data type.  People who need their MinGW programs to do better should
> either (a) convert everything to UTF-8 and write their own code to
> manipulate UTF-8 strings, or (b) use replacements such as Gnulib
> (which, quite expectedly, uses a 32-bit data type for wide
> characters).
>
> So I think you should just document this as a Windows restriction, and
> move on.

That would be the easy cop-out, but the problem with doing so is that
we have had (fundamentally broken) implementations, in libmingwex.a,
for about fifteen years now, and I'm uncomfortable with providing
broken implementations.  Of course, I could reject bug reports against
the existing implementations, declaring them as no longer supported,
and flagging them as "won't fix".
(Withdrawing them altogether is hardly a viable option, since it would
break any legacy code which may have come to rely on them ... not least
of these being our own enhanced printf implementation, which requires
both mbrtowc() and wcrtomb()!)

It turns out that I can, quite easily, improve on the implementations
of those existing functions, within libmingwex.a, which are related to
wcrtomb(), and which convert from wchar_t to MBCS.  Conversion in the
opposite direction seems trickier; I won't give up on it just yet, but
I may need to document a limitation: calling any mbrtowc()-related
function with a wchar_t return-value buffer that has room for fewer
than two UTF-16 wchar_t entities may be unsafe.

-- 
Regards,
Keith.

Public key available from keys.gnupg.net
Key fingerprint: C19E C018 1547 DE50 E1D4 8F53 C0AD 36C6 347E 5A3F