External table file names not transliterated to OS character set [CORE6202] #6447

Open
firebird-automations opened this issue Dec 5, 2019 · 1 comment

@firebird-automations

Submitted by: Kjell Rilbe (kjellrilbe)

It seems that the file name specified for an external table is sent to the operating system's file operations without transliteration. This makes it impossible to use file names with non-ASCII characters.

For example, specifying the file name 'Teståäö.txt' will result in a file named 'TestÃ¥Ã¤Ã¶.txt', which is the Win-1252 interpretation of the byte sequence that the UTF-8 string 'Teståäö.txt' is encoded as.

In other words, it would appear that the file name, stored in UTF-8 (or UNICODE_FSS?) format, is sent as is to a Windows system call that expects the file name to be encoded in the operating system's code page, in this case Win-1252.
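The suspected mechanism can be reproduced outside Firebird. The following is only a minimal sketch, assuming the stored name is UTF-8 and the OS code page is Win-1252 (`cp1252` in Python); the function name `as_seen_by_os` is hypothetical and is not part of Firebird.

```python
def as_seen_by_os(name: str,
                  stored_charset: str = "utf-8",
                  os_codepage: str = "cp1252") -> str:
    """Simulate passing the stored bytes, untransliterated, to an API
    that interprets them in the OS code page (assumed: Win-1252)."""
    raw = name.encode(stored_charset)   # bytes as stored in the metadata
    return raw.decode(os_codepage)      # what the OS makes of those bytes


# Each of å, ä, ö becomes two bytes in UTF-8, and each of those bytes
# is read as a separate Win-1252 character.
mangled = as_seen_by_os("Teståäö.txt")
```

Under these assumptions `mangled` is 'TestÃ¥Ã¤Ã¶.txt', which matches the file name observed on disk.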

I've tried this in both isql and FlameRobin and got consistent results. The file name appears correct in RDB$RELATIONS.RDB$EXTERNAL_FILE but ends up wrong in the operating system, as described above.

I expect this to be rather easily fixed, considering the file name is always stored in the same character set (UTF8, or is it UNICODE_FSS?) and the operating system's character set is known. All that should be needed is to add transliteration of the stored file name before sending it to any operating system call.
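The proposed transliteration step could look roughly like this. It is a sketch under the same assumptions (UTF-8 storage, Win-1252 as the OS code page), not the actual Firebird code path, and `to_os_bytes` is a hypothetical helper name.

```python
def to_os_bytes(stored: bytes,
                stored_charset: str = "utf-8",
                os_codepage: str = "cp1252") -> bytes:
    """Transliterate a stored file name into the OS code page
    before handing it to a file system call."""
    text = stored.decode(stored_charset)
    # Raises UnicodeEncodeError if a character does not exist in the
    # OS code page; a real fix would need to report that to the user.
    return text.encode(os_codepage)


stored_name = "Teståäö.txt".encode("utf-8")
os_name = to_os_bytes(stored_name)
```

Here `os_name` holds one byte per character, so a Win-1252 system call would read it back as 'Teståäö.txt'.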

By the way, I think I've had similar issues with database file names, but have not tried it recently. Maybe it would be a good idea to go through all operating system file operations and make sure the file names passed are properly transliterated.

@firebird-automations

Commented by: Kjell Rilbe (kjellrilbe)

There are no viable workarounds other than avoiding non-ASCII names altogether (are we back in the 1980s?), because there is no valid way to encode a string in UTF-8 whose bytes will appear as "åäö" when interpreted as Win-1252. The byte values needed (0xE5 0xE4 0xF6) do not form valid UTF-8.
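This can be checked directly. A small sketch, again assuming Win-1252 as the OS code page; `is_valid_utf8` is a hypothetical helper.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Bytes the OS would need to receive in order to show "åäö".
needed = "åäö".encode("cp1252")        # 0xE5 0xE4 0xF6
workaround_possible = is_valid_utf8(needed)
```

`workaround_possible` comes out False: 0xE5 announces a three-byte UTF-8 sequence, but 0xE4 is not a continuation byte, so no UTF-8 string can carry these bytes.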
