Key: DNET-900
Type: Bug
Status: Closed
Resolution: Won't Fix
Priority: Major
Assignee: Jiri Cincura
Reporter: Erik Groenakkers
Votes: 0
Watchers: 1

.NET Data provider

Querying WIN1251 columns, characters like é ë replaced by ?

Created: 24/Sep/19 10:36 AM   Updated: 02/Dec/19 04:36 PM
Component/s: ADO.NET Provider
Affects Version/s:
Fix Version/s: None

Environment: Win10, x64/x86, .NET 4.7.2, Firebird 2.5

Description
I am working with a legacy database which uses WIN1251 text columns (don't ask me why). I am trying to read the data through FirebirdClient (current and earlier versions), but characters like é and ë are replaced by question marks. I have tried the connection character sets NONE, UTF8 and even WIN1251, all with the same result.

I tested this with WIN1252 columns, and that works fine with connection character set NONE, WIN1252 or UTF8. Is WIN1251 not supported?

Sample code (sensitive info removed):

// requires: using System; using FirebirdSql.Data.FirebirdClient;
using (FbConnection connection = new FbConnection(@"character set=UTF8;data source=localhost;initial catalog=<db path>;user id=<user>;password=<pass>"))
{
    connection.Open(); // ExecuteReader requires an open connection
    using (var command = new FbCommand(@"select id, name from client", connection))
    using (var reader = command.ExecuteReader())
        while (reader.Read())
            Console.WriteLine($"ID {(int)reader["ID"]}, NAME {(string)reader["NAME"]}");
}

Comments
Jiri Cincura added a comment - 24/Sep/19 10:47 AM
To make it clear, the column is WIN1251 and the connection charset (in connection string) is UTF8, right?

Erik Groenakkers added a comment - 24/Sep/19 11:34 AM
@Jiri correct!

Mark Rotteveel added a comment - 24/Sep/19 05:26 PM
WIN1251 is a Cyrillic character set and does not contain é or ë. If you stored those characters, then you likely did so with connection character set NONE, and you will need to fix it by casting to NONE and then to the right character set (probably WIN1252).

Erik Groenakkers added a comment - 25/Sep/19 12:37 PM
@Mark Good point. Why does Firebird not reject characters that are not part of the column character set?
The data originates from (old) Delphi applications that use the default NONE connection character set to connect to Firebird. When read back in Delphi under NONE or WIN1251, there is no problem. I guess this goes wrong in .NET because .NET uses Unicode as its intermediate character set.

Mark Rotteveel added a comment - 25/Sep/19 04:21 PM
If you connect with character set NONE the byte values received are simply stored as is (this is a bit of an oversimplification though). Bytes themselves only have meaning in a character set, so it is impossible to reject them because byte 0xEB in WIN1252 is ë, while in WIN1251 it is л.
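The point about byte 0xEB can be checked outside Firebird. The snippet below is a small illustration only: it assumes Python's `cp1252` and `cp1251` codecs as stand-ins for the Firebird character sets WIN1252 and WIN1251, and it does not talk to a database.

```python
# One stored byte, two readings, depending on the character set applied to it.
# Assumption: Python's cp1252/cp1251 codecs stand in for WIN1252/WIN1251.
raw = bytes([0xEB])

as_win1252 = raw.decode("cp1252")  # Western European code page
as_win1251 = raw.decode("cp1251")  # Cyrillic code page

print(f"0xEB as WIN1252: {as_win1252!r}")
print(f"0xEB as WIN1251: {as_win1251!r}")
```

Both decodes succeed, which is why the byte alone gives the server nothing to reject.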

A possible solution (but test it on a copy of your database) is to do the following:

1. alter the character set of the column to NONE
2. update all rows, explicitly updating the column (i.e. UPDATE yourtable SET yourcolumn = yourcolumn)
3. alter the character set of the column to WIN1252

Note: don't change directly from WIN1251 to WIN1252, because that would yield transliteration errors on read. This is also why step 2 is necessary.
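Why the detour through NONE matters can be sketched with a rough model. This is not Firebird's actual implementation: it assumes that a direct WIN1251 to WIN1252 change converts character by character, while going through NONE leaves the bytes untouched and only changes how they are read. The sample value "Zoë" and the helper names `transliterate` and `reinterpret` are hypothetical.

```python
# Hypothetical sample: bytes written by a client that connected with NONE
# while working in Windows-1252. "Zo\u00eb" is "Zoë".
stored = "Zo\u00eb".encode("cp1252")


def transliterate(data: bytes, source: str, target: str) -> bytes:
    """Convert by character: decode in the source set, re-encode in the target set."""
    return data.decode(source).encode(target)


def reinterpret(data: bytes, target: str) -> str:
    """Keep the bytes as they are and only change the set used to read them."""
    return data.decode(target)


# Direct WIN1251 -> WIN1252: byte 0xEB is first read as a Cyrillic letter,
# which has no representation in Windows-1252, so the conversion fails.
try:
    transliterate(stored, "cp1251", "cp1252")
    direct_ok = True
except UnicodeEncodeError:
    direct_ok = False

# Via NONE: the bytes are carried over unchanged and then read as WIN1252.
repaired = reinterpret(stored, "cp1252")

print("direct conversion succeeded:", direct_ok)
print("repaired value:", repaired)
```

In this model the direct path breaks on exactly the bytes that were never valid WIN1251 text in the first place, while the byte-preserving path recovers the original string.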

Erik Groenakkers added a comment - 26/Sep/19 09:00 AM
@Mark Thanks! I was hoping to avoid having to modify the database, but it looks like I'll have to.