Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding differs when reading TEXT-Blobs in Windows/Linux [JDBC454] #493

Closed
firebird-automations opened this issue Sep 15, 2016 · 9 comments

Comments

@firebird-automations
Copy link

Submitted by: Benjamin Judas (benjamin.judas)

Attachments:
FBTest.java
BSAKTX.sql

- I am building a web-application to retrieve data from a Firebird 2.5 database.
- The Database is using ISO8859-1.
- The data is a Text-BLOB.
- The data is accessed via JPA (Eclipselink bundled with Glassfish 4.1.1).

The problem is related to special characters (Trademark-Sign, german Umlauts, ß). I've been trying for two days now but I wasn't able to find a solution. Now I tried to circle the problem and analyze the raw data retrieved from the database.

An Example: Imagine the following text:

EasyLan® Dualboot® Patchkabel RJ45 Kat.5 geschirmt 1:1 grau 2,00m

On a Glassfish running on Windows I get the following bytes:

45 61 73 79 4c 61 6e ffffffae 20 44 75 61 6c 62 6f 6f 74 ffffffae 20 50 61 74 63 68 6b 61 62 65 6c 20 52 4a 34 35 20 4b 61 74 2e 35 20 67 65 73 63 68 69 72 6d 74 20 31 3a 31 20 67 72 61 75 20 32 2c 30 30 6d

On a Glassfish running on Linux I get the following bytes:

45 61 73 79 4c 61 6e ffffffef ffffffbf ffffffbd 20 44 75 61 6c 62 6f 6f 74 ffffffef ffffffbf ffffffbd 20 50 61 74 63 68 6b 61 62 65 6c 20 52 4a 34 35 20 4b 61 74 2e 35 20 67 65 73 63 68 69 72 6d 74 20 31 3a 31 20 67 72 61 75 20 32 2c 30 30 6d

As you can see, the Registered Trademark (®) seems to be messed up.

My connection pools in both servers use the exact same configuration and I added a charset-parameter for both (ISO8859-1). I cannot update to Firebird 3 since the database in question is one of our ERP-System.

I tried the following connection charsets: ISO8859-1, UTF8, Windows1252, NONE

I also did another test and wrote a small program using a "plain" JDBCConnection: I get the same results as with Glassfish + JPA. I think it's a Jaybird issue in relation to the operating system it's being used on.

@firebird-automations
Copy link
Author

Commented by: @mrotteveel

Can you post all that is necessary to reproduce this. Preferably a simple executable that demonstrates storing the data and retrieval of data and a backup of a database with existing data that exhibits the problem

As I also commented on stackoverflow, this can happen when there is a mismatch between character sets when storing and retrieving data.

What I forgot to mention there is that blobs in Jaybird 2.2 and earlier are even more a special case, as we only use the connection character set to determine conversion, and not the actual character set of the blob. So if you stored data from windows with connection character set NONE (which is the default if nothing is specified!), than chance are high it was stored as Cp1252 (if you are in Western Europe; behavior depending on Windows and Java version), and when you retrieve under Linux with NONE it will usually be retrieved as UTF-8.

@firebird-automations
Copy link
Author

Commented by: Benjamin Judas (benjamin.judas)

Simple Test case

@firebird-automations
Copy link
Author

Modified by: Benjamin Judas (benjamin.judas)

Attachment: FBTest.java [ 13013 ]

@firebird-automations
Copy link
Author

Commented by: Benjamin Judas (benjamin.judas)

I added a simple testcase which demonstrates the problem. I cannot offer you the executable (proprietary, non free) for storing the data or a copy of the database (9GiB).

@firebird-automations
Copy link
Author

Commented by: Benjamin Judas (benjamin.judas)

DDL-Script for Table creation (Not my business - created by our ERP-provider)

@firebird-automations
Copy link
Author

Modified by: Benjamin Judas (benjamin.judas)

Attachment: BSAKTX.sql [ 13014 ]

@firebird-automations
Copy link
Author

Commented by: Benjamin Judas (benjamin.judas)

Your hints about the encoding of BLOBs and about how data is stored in them lead me to the solution. It's not a driver bug -- Shame on me!

The solution is a simple one-liner (example):
String text = new String(rawByteArrayFromBlob[], "CP1252");

This is of course no universal solution, but I will open a ticket for our ERP-Provider to check their user-inputs and probably transcode the input.

@firebird-automations
Copy link
Author

Commented by: @mrotteveel

Good to hear you managed to fix this. I will close the ticket.

@firebird-automations
Copy link
Author

Modified by: @mrotteveel

status: Open [ 1 ] => Closed [ 6 ]

resolution: Cannot Reproduce [ 5 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants