[hdf-forum] Unicode filenames on Windows?

Quincey Koziol koziol at hdfgroup.org
Thu Jun 4 09:27:27 EDT 2009


Hi Andrew,

On Jun 3, 2009, at 4:38 PM, Andrew Collette wrote:

> Hi Quincey,
>
>>        Hmm, I don't think we do anything special to the strings we
>> pass to the file system.  Is there some particular problem you are
>> seeing?
>
> I can't figure out how to take an arbitrary sequence of Unicode code
> points and create an HDF5 file with that name on Windows.
>
> I have limited experience with Windows Unicode support, but I know
> that the way Microsoft implements Unicode is through a series of
> wide-character (2-byte "UCS-2") APIs.  Unlike most UNIX platforms,
> where you simply pass in a UTF-8 (or whatever) string through a char*,
> I think you actually have to call a separate function (e.g. fopen vs.
> _wfopen) to be able to handle generic Unicode filenames on Windows.
> Otherwise Windows treats a simple char* string as extended-ASCII,
> according to the current locale settings.  So if I'm on a French
> computer, I can get HDF5 to generate an e-with-an-accent, but not (for
> example) a name with Cyrillic letters.
>
> Currently, as far as I've found out in my investigations, there's no
> way to encode a generic Unicode string to char* on Windows and have it
> work with the filesystem; you have to use the UCS-2 functions.  I've
> peeked at H5FDwindows.c and it looks like you're using the traditional
> char* API.
>
> I realize it's probably not a priority for HDF5 development, but it
> would be nice if HDF5 could handle the full extent of names allowed by
> the filesystem.  It seems like the correct place for that is the
> Windows file driver.  One way would be to have two modes, perhaps set
> by the file access property list; in the first, it passes the raw
> bytes through to the filesystem (as is done now), and in the other, it
> performs two-way translation between UTF-8 strings (HDF5 user side)
> and the UCS-2/wchar API (Windows platform side).  This would have the
> additional benefit of maintaining HDF5's internal standardization on
> UTF-8.

	Seems like a reasonable idea.  I've filed a bug in our bug tracker
and it'll get prioritized with the other things there, but we'd also be
happy to accept a well-tested patch from the community.

	Quincey
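
For readers following the thread, here is a minimal sketch of the UTF-8 to
wide-character translation proposed above. It is not code from HDF5 or from
H5FDwindows.c. The names utf8_to_utf16 and fopen_utf8 are hypothetical, the
260-unit path limit is an assumption, and the decoder does not reject
overlong UTF-8 sequences. On Windows the same conversion is normally done
with the Win32 call MultiByteToWideChar using CP_UTF8; the hand-written
decoder is only there to show what the translation involves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: decode the NUL-terminated UTF-8 string `src` into
 * UTF-16 code units in `dst`, which holds `cap` units including the
 * terminating 0. Returns the number of units written (terminator excluded),
 * or (size_t)-1 on malformed input or insufficient space.
 * Simplification: overlong encodings are not rejected. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return (size_t)-1;

    while (*s) {
        uint32_t cp;
        int extra;

        /* The lead byte gives the sequence length and the top bits. */
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;
        s++;

        /* Each continuation byte contributes six more bits. */
        for (int i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            if (n + 1 >= cap)
                return (size_t)-1;
            dst[n++] = (uint16_t)cp;
        } else {
            /* Code points above U+FFFF need a surrogate pair. */
            if (n + 2 >= cap)
                return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    dst[n] = 0;
    return n;
}

/* Hypothetical wrapper: open a file whose name is given as UTF-8. */
FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
    uint16_t units[260];          /* assumed path limit for this sketch */
    wchar_t wpath[260], wmode[8];
    size_t i, n = utf8_to_utf16(path, units, 260);

    if (n == (size_t)-1)
        return NULL;
    for (i = 0; i <= n; i++)      /* copy units plus the terminator */
        wpath[i] = (wchar_t)units[i];
    for (i = 0; mode[i] && i < 7; i++)   /* mode strings are plain ASCII */
        wmode[i] = (wchar_t)(unsigned char)mode[i];
    wmode[i] = 0;
    return _wfopen(wpath, wmode);
#else
    /* Most UNIX file systems accept the UTF-8 bytes unchanged. */
    return fopen(path, mode);
#endif
}
```

A caller would pass the UTF-8 name straight through, for example
fopen_utf8(name, "rb"). Only the Windows branch performs a conversion,
which matches the "second mode" described in the quoted proposal, while the
existing pass-through behaviour corresponds to calling fopen directly.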
