You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I've encountered a problem using SIMILAR TO syntax on UTF-8 fields. I
searched through the tracker but couldn't find this issue.
What seems to be happening is that when using SIMILAR TO on UTF-8 fields
doesn't get the right pattern length.
I've made a testcase with a bit of data in it that you can get from: http://www.fantasycomic.com/t/testsimilarto.zip (both 2.5beta2 database
as gbak backup are in there).
In that testcase you'll find a table created with:
CREATE TABLE TEST (
ANSIFIELD Varchar(100),
UTF8FIELD Varchar(100) CHARACTER SET UTF8,
UNICODECIFIELD Varchar(100) CHARACTER SET UTF8 COLLATE UNICODE_CI
);
This table was then filled with a number of records, with the same value
going into each three fields.
The default character set for the database is ISO8859_1. I have done all
my tests using flamerobin 0.8.6.1652 Unicode, connection charset was
ISO8859_1.
Now on that table if you do
SELECT * FROM Test WHERE Utf8Field SIMILAR TO 'DELL %' ;
You'll get 16 results. Amongst those results are incorrect matches like
for example 'DE SINGEL'. All 16 matches start with 'DE'
Change the query to:
SELECT * FROM Test WHERE Utf8Field SIMILAR TO 'DE %' ;
and you'll get 39 results, all starting with "D" but some incorrect
matches like 'DHR. F. BERENDE'.
Now, these same queries on AnsieField (which has the same data but
different character set) will give 3 and 6 results respectively and all
correct.
If you change the query to use LIKE instead of SIMILAR TO on Utf8Field
then you get the correct results.
To me this looks like some of the substring matching in the regexp
engine is using the wrong character set to calculate string sizes, so
instead of 'DELL %' it sees 'DE%' (because the 5 character 'DELL '
literal string is seen as a 2-char one).
I've tested this issue in Firebird 2.5 Beta 1 and Beta 2 running as
superserver on windows vista, 32bit. If you wish I can run the same
tests on xp32 bit and xp 64 bit.
Submitted by: @asfernandes
Reported in fb-devel:
I've encountered a problem using SIMILAR TO syntax on UTF-8 fields. I
searched through the tracker but couldn't find this issue.
What seems to be happening is that when using SIMILAR TO on UTF-8 fields
doesn't get the right pattern length.
I've made a testcase with a bit of data in it that you can get from:
http://www.fantasycomic.com/t/testsimilarto.zip (both 2.5beta2 database
as gbak backup are in there).
In that testcase you'll find a table created with:
CREATE TABLE TEST (
ANSIFIELD Varchar(100),
UTF8FIELD Varchar(100) CHARACTER SET UTF8,
UNICODECIFIELD Varchar(100) CHARACTER SET UTF8 COLLATE UNICODE_CI
);
This table was then filled with a number of records, with the same value
going into each three fields.
The default character set for the database is ISO8859_1. I have done all
my tests using flamerobin 0.8.6.1652 Unicode, connection charset was
ISO8859_1.
Now on that table if you do
SELECT * FROM Test WHERE Utf8Field SIMILAR TO 'DELL %' ;
You'll get 16 results. Amongst those results are incorrect matches like
for example 'DE SINGEL'. All 16 matches start with 'DE'
Change the query to:
SELECT * FROM Test WHERE Utf8Field SIMILAR TO 'DE %' ;
and you'll get 39 results, all starting with "D" but some incorrect
matches like 'DHR. F. BERENDE'.
Now, these same queries on AnsieField (which has the same data but
different character set) will give 3 and 6 results respectively and all
correct.
If you change the query to use LIKE instead of SIMILAR TO on Utf8Field
then you get the correct results.
To me this looks like some of the substring matching in the regexp
engine is using the wrong character set to calculate string sizes, so
instead of 'DELL %' it sees 'DE%' (because the 5 character 'DELL '
literal string is seen as a 2-char one).
I've tested this issue in Firebird 2.5 Beta 1 and Beta 2 running as
superserver on windows vista, 32bit. If you wish I can run the same
tests on xp32 bit and xp 64 bit.
Commits: ba9231d c89d55b
====== Test Details ======
See also test for CORE2006 and list of other related tickets there.
The text was updated successfully, but these errors were encountered: