With UTF8, exceed size limit do not throw same Exception [JDBC354] #396

Closed
firebird-automations opened this issue May 16, 2014 · 5 comments


@firebird-automations

Submitted by: Chouteau Mathieu (chouteaum)

Attachments:
TestEncodingFB.java

Use a PreparedStatement with a parameter on a 5-character column.

When you execute the query (select, update, delete or insert), the result differs depending on whether the length of the parameter value is over 5 or over 20:
- Over 5 characters, you get an FBSQLException
- Over 20 characters, you get a DataTruncation

If the value contains 11 accented characters, the DataTruncation exception is thrown.

I have attached a JUnit test case.

@firebird-automations

Commented by: Chouteau Mathieu (chouteaum)

JUnit test case

@firebird-automations

Modified by: Chouteau Mathieu (chouteaum)

Attachment: TestEncodingFB.java [ 12521 ]

@firebird-automations

Commented by: Chouteau Mathieu (chouteaum)

To create the table which is used by the JUnit test case:

CREATE TABLE TEST
(
ID integer NOT NULL,
CODE varchar(5),
CONSTRAINT CONSTRAINT_NAME PRIMARY KEY (ID)
);

@firebird-automations

Commented by: @mrotteveel

The observed behavior is caused by a check that doesn't take the bytes per character into account. It only throws the DataTruncation exception when the value exceeds the storage length (number of characters * number of bytes per character) instead of the number of characters. I am not sure if I am going to fix this in 2.2, as this will probably be significantly rewritten in Jaybird 3.0.
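To make the arithmetic concrete, here is a minimal sketch of the two possible checks. It is not Jaybird's actual code: the class and method names are hypothetical, and it assumes the column uses Firebird's UTF8 character set with up to 4 bytes reserved per character, so a varchar(5) has a storage length of 20 bytes. Under that assumption, the reported thresholds (over 20 ASCII characters, or 11 accented characters at 2 bytes each) both land just past the storage length.

```java
import java.nio.charset.StandardCharsets;

public class StorageLengthCheck {
    // Assumption: Firebird UTF8 reserves up to 4 bytes per character.
    static final int MAX_BYTES_PER_CHAR = 4;

    // Hypothetical stand-in for the check described above: it compares the
    // encoded byte length against the storage length in bytes.
    static boolean exceedsStorageLength(String value, int declaredChars) {
        int encodedBytes = value.getBytes(StandardCharsets.UTF_8).length;
        return encodedBytes > declaredChars * MAX_BYTES_PER_CHAR;
    }

    // The check the reporter expected: compare against the declared
    // number of characters (counted as Unicode code points).
    static boolean exceedsDeclaredLength(String value, int declaredChars) {
        return value.codePointCount(0, value.length()) > declaredChars;
    }

    public static void main(String[] args) {
        String sixAscii = "abcdef";                  // 6 chars, 6 bytes
        String elevenAccented = "\u00E9".repeat(11); // 11 chars, 22 bytes
        System.out.println(exceedsStorageLength(sixAscii, 5));       // false
        System.out.println(exceedsDeclaredLength(sixAscii, 5));      // true
        System.out.println(exceedsStorageLength(elevenAccented, 5)); // true
    }
}
```

With a 6-character ASCII value, only the character-based check fires, which matches the case where the client-side check lets the value through and an FBSQLException is reported instead.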

@mrotteveel

mrotteveel commented Jan 14, 2023

The problem with solving this is that, for example for UTF8, simply counting (Java) characters won't do. For example, the string "abcd\uD83D\uDE03" is 6 Java chars long, the last two being a surrogate pair representing a single codepoint, but conversion to UTF8 will yield the byte representation of 5 characters/codepoints (the last being 😃).
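The difference between the three counts can be seen with plain JDK calls. This is only an illustration of the example string above; the class and method names are made up for the sketch.

```java
import java.nio.charset.StandardCharsets;

public class CodePointDemo {
    // Number of Java chars (UTF-16 code units).
    static int javaChars(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes after encoding to UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String value = "abcd\uD83D\uDE03";
        System.out.println(javaChars(value));  // 6
        System.out.println(codePoints(value)); // 5
        System.out.println(utf8Bytes(value));  // 8 (4 ASCII bytes + 4 for U+1F603)
    }
}
```

So a check based on `length()` alone would reject this value for a 5-character column even though it holds exactly 5 characters.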

In addition, older versions of Firebird did - intentionally - not perform character length checks for UNICODE_FSS, allowing you to store characters up to the storage size in bytes, not declared character size (though Jaybird will truncate this when selecting values).

I could maybe check if the encoding is UTF8 (and not UNICODE_FSS), and then, if the string length() is too long, check the codepoint count and if that exceeds the length, throw a DataTruncation error as well. However, this would then result in different behaviour compared to setBytes.
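A rough sketch of what such a check could look like follows. It is only an illustration of the idea, not Jaybird code: the class, the method, and the way the character set name is passed in are all hypothetical. It uses `length()` as a cheap first test, since the code point count can never exceed it, and only counts code points when that test fails.

```java
import java.sql.DataTruncation;

public class Utf8LengthGuard {
    // Hypothetical helper: firebirdCharset is assumed to be the Firebird
    // character set name of the column, e.g. "UTF8" or "UNICODE_FSS".
    static void checkLength(int parameterIndex, String value,
                            int declaredChars, String firebirdCharset)
            throws DataTruncation {
        // Only UTF8 is checked; UNICODE_FSS is deliberately left alone.
        if (!"UTF8".equals(firebirdCharset)) {
            return;
        }
        // Cheap test first: code point count <= length(), so a value that
        // fits by length() always fits by code points.
        if (value.length() <= declaredChars) {
            return;
        }
        int codePoints = value.codePointCount(0, value.length());
        if (codePoints > declaredChars) {
            // For illustration, sizes are passed as character counts here,
            // although DataTruncation documents them as byte counts.
            throw new DataTruncation(parameterIndex, true, false,
                    codePoints, declaredChars);
        }
    }

    // Convenience wrapper for trying the check without exception handling.
    static boolean wouldTruncate(String value, int declaredChars, String charset) {
        try {
            checkLength(1, value, declaredChars, charset);
            return false;
        } catch (DataTruncation e) {
            return true;
        }
    }
}
```

For example, `wouldTruncate("abcd\uD83D\uDE03", 5, "UTF8")` returns false because the value has 5 code points, while `wouldTruncate("abcdef", 5, "UTF8")` returns true. A byte array passed through setBytes would not go through this path, which is the inconsistency noted above.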
