Querying WIN1251 columns, characters like é ë replaced by ? [DNET900] #827

Closed
firebird-automations opened this issue Sep 24, 2019 · 7 comments

Comments

@firebird-automations

Submitted by: Erik Groenakkers (eriki419)

I am working with a legacy database which uses WIN1251 text columns (don't ask me why). I am now trying to read data through FirebirdClient 7.1.1.0 (and earlier versions), but characters like é and ë are replaced by question marks. I have tried connection character sets NONE, UTF8 and even WIN1251, all with the same result.

I tested this with WIN1252 columns, and that works fine with connection character set NONE, WIN1252 or UTF8. Is WIN1251 not supported?

Sample code (sensitive info removed):

using (FbConnection connection = new FbConnection(@"character set=UTF8;data source=localhost;initial catalog=<db path>;user id=<user>;password=<pass>"))
{
    connection.Open();
    using (var command = new FbCommand(@"select id, name from client", connection))
    {
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                Console.WriteLine($"ID {(int)reader["ID"]}, NAME {(string)reader["NAME"]}");
            }
        }
    }
}

@firebird-automations

Commented by: @cincuranet

To make it clear, the column is WIN1251 and the connection charset (in connection string) is UTF8, right?

@firebird-automations

Commented by: Erik Groenakkers (eriki419)

@jiri correct!

@firebird-automations

Commented by: @mrotteveel

WIN1251 is a Cyrillic character set and does not contain é or ë. If you stored those characters, then you likely did that with connection character set NONE, and you will need to fix it by casting to NONE and then to the right character set (probably WIN1252).

@firebird-automations

Commented by: Erik Groenakkers (eriki419)

@mark Good point. Why does Firebird not reject characters that are not part of the column character set?
The data originates from (old) Delphi applications that use the default NONE connection character set to connect to Firebird. When read back in Delphi under NONE or WIN1251, there is no problem. I guess in .NET this goes wrong due to .NET's preference for Unicode as an intermediate character set.

@firebird-automations

Commented by: @mrotteveel

If you connect with character set NONE the byte values received are simply stored as is (this is a bit of an oversimplification though). Bytes themselves only have meaning in a character set, so it is impossible to reject them because byte 0xEB in WIN1252 is ë, while in WIN1251 it is л.
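
To illustrate the point about byte 0xEB, here is a minimal Python sketch. It is not Firebird code; it assumes Python's `cp1252` and `cp1251` codecs are reasonable stand-ins for Firebird's WIN1252 and WIN1251, and the variable names are made up for the example.

```python
# One stored byte, as a NONE connection would pass it through unchanged.
stored = bytes([0xEB])

# The same byte decoded under two different character sets.
as_win1252 = stored.decode("cp1252")  # Western European interpretation
as_win1251 = stored.decode("cp1251")  # Cyrillic interpretation

print(as_win1252)  # ë
print(as_win1251)  # л
```

Both decodes succeed, so nothing in the byte itself says which interpretation was intended.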

A possible solution (but test it on a copy of your database first) is to do the following:

1. alter the character set of the column to NONE
2. update all rows, explicitly updating the column (i.e. UPDATE yourtable SET yourcolumn = yourcolumn)
3. alter the character set of the column to WIN1252

Note: don't change directly from WIN1251 to WIN1252, because that would yield transliteration errors on read. This is also why step 2 is necessary.
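
Why the detour through NONE matters can be sketched outside the database. The following Python snippet is only an analogy, assuming the `cp1251` and `cp1252` codecs approximate WIN1251 and WIN1252 and using a made-up sample value; it does not reproduce Firebird's actual conversion logic.

```python
# Bytes as written over a NONE connection by a client that used Windows-1252.
raw = "Zoë".encode("cp1252")  # b'Zo\xeb'

# What the column's declared WIN1251 character set claims those bytes mean.
misread = raw.decode("cp1251")  # 'Zoл'

# A direct WIN1251 -> WIN1252 change converts characters, not bytes,
# and 'л' has no representation in Windows-1252.
try:
    misread.encode("cp1252")
    direct_conversion_ok = True
except UnicodeEncodeError:
    direct_conversion_ok = False

# Going through "no character set" keeps the bytes and only relabels them.
repaired = raw.decode("cp1252")

print(direct_conversion_ok)  # False
print(repaired)              # Zoë
```

The failed `encode` call plays the role of the transliteration error mentioned in the note, while the relabelling step corresponds to passing through NONE before declaring the column as WIN1252.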

@firebird-automations

Commented by: Erik Groenakkers (eriki419)

@mark Thanks! I was hoping to avoid having to modify the database, but it looks like I'll have to.

@firebird-automations

Modified by: @cincuranet

status: Open [ 1 ] => Closed [ 6 ]

resolution: Won't Fix [ 2 ]
