Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid result of comparing of the same character in different cases (upper/lower) [CORE6194] #6439

Closed
firebird-automations opened this issue Nov 21, 2019 · 12 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Vladimir Lomov (l_vv)

Maybe I misunderstand something, but it seems that comparing the same character in different cases does not work correctly.
The same example works OK in Oracle.

Test case:

select ascii_val('Z') as Z_Upper, -- = 90
ascii_val('z') as Z_Lower, -- = 122
ascii_char(90) as Z_Upper_num, -- = 'Z'
ascii_char(122) as Z_Lower_num, -- = 'z'
case
when 'Z' = 'z' then 1 -- Why do we get here?
else 2
end as char_compare,
case
when ascii_char(90) = ascii_char(122) then 1
else 2 -- We are here, it is correct
end as ascii_char_compare
from rdb$database;

Database creation script:
SET SQL DIALECT 3;

CREATE DATABASE 'Test.db'
USER SYSDBA
PAGE_SIZE 16384 DEFAULT CHARACTER SET UTF8 COLLATION UNICODE_CI_AI;

commit;

Should COLLATION UNICODE_CI_AI affect the comparison of characters in different registers, or only sorting order?

@firebird-automations
Copy link
Collaborator Author

Modified by: Vladimir Lomov (l_vv)

description: Maybe I misunderstand something, but it seems that comparing the same character in different cases does not work correctly.
The same example works OK in Oracle.

Test case:

select ascii_val('Z') as Z_Upper, -- = 90
ascii_val('z') as Z_Lower, -- = 122
ascii_char(90) as Z_Upper_num, -- = 'Z'
ascii_char(122) as Z_Lower_num, -- = 'z'
case
when 'Z' = 'z' then 1 -- Why do we get here?
else 2
end as char_compare,
case
when ascii_char(90) = ascii_char(122) then 1
else 2 -- We are here, it is correct
end as ascii_char_compare
from rdb$database;

=>

Maybe I misunderstand something, but it seems that comparing the same character in different cases does not work correctly.
The same example works OK in Oracle.

Test case:

select ascii_val('Z') as Z_Upper, -- = 90
ascii_val('z') as Z_Lower, -- = 122
ascii_char(90) as Z_Upper_num, -- = 'Z'
ascii_char(122) as Z_Lower_num, -- = 'z'
case
when 'Z' = 'z' then 1 -- Why do we get here?
else 2
end as char_compare,
case
when ascii_char(90) = ascii_char(122) then 1
else 2 -- We are here, it is correct
end as ascii_char_compare
from rdb$database;

Database creation script:
SET SQL DIALECT 3;

CREATE DATABASE 'Test.db'
USER SYSDBA
PAGE_SIZE 16384 DEFAULT CHARACTER SET UTF8 COLLATION UNICODE_CI_AI;

commit;

@firebird-automations
Copy link
Collaborator Author

Modified by: Vladimir Lomov (l_vv)

description: Maybe I misunderstand something, but it seems that comparing the same character in different cases does not work correctly.
The same example works OK in Oracle.

Test case:

select ascii_val('Z') as Z_Upper, -- = 90
ascii_val('z') as Z_Lower, -- = 122
ascii_char(90) as Z_Upper_num, -- = 'Z'
ascii_char(122) as Z_Lower_num, -- = 'z'
case
when 'Z' = 'z' then 1 -- Why do we get here?
else 2
end as char_compare,
case
when ascii_char(90) = ascii_char(122) then 1
else 2 -- We are here, it is correct
end as ascii_char_compare
from rdb$database;

Database creation script:
SET SQL DIALECT 3;

CREATE DATABASE 'Test.db'
USER SYSDBA
PAGE_SIZE 16384 DEFAULT CHARACTER SET UTF8 COLLATION UNICODE_CI_AI;

commit;

=>

Maybe I misunderstand something, but it seems that comparing the same character in different cases does not work correctly.
The same example works OK in Oracle.

Test case:

select ascii_val('Z') as Z_Upper, -- = 90
ascii_val('z') as Z_Lower, -- = 122
ascii_char(90) as Z_Upper_num, -- = 'Z'
ascii_char(122) as Z_Lower_num, -- = 'z'
case
when 'Z' = 'z' then 1 -- Why do we get here?
else 2
end as char_compare,
case
when ascii_char(90) = ascii_char(122) then 1
else 2 -- We are here, it is correct
end as ascii_char_compare
from rdb$database;

Database creation script:
SET SQL DIALECT 3;

CREATE DATABASE 'Test.db'
USER SYSDBA
PAGE_SIZE 16384 DEFAULT CHARACTER SET UTF8 COLLATION UNICODE_CI_AI;

commit;

Should COLLATION UNICODE_CI_AI affect the comparison of characters in different registers, or only sorting order?

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Did not reproduce.
The result for LI-T4.0.0.1650-dev Firebird 4.0 Beta 1 is:

Z_UPPER Z_LOWER Z_UPPER_NUM Z_LOWER_NUM CHAR_COMPARE ASCII_CHAR_COMPARE
======= ======= =========== =========== ============ ==================
90 122 Z z 2 2

(i.e. as expected).
Test.db was created with copy/paste of to your scipt.

@firebird-automations
Copy link
Collaborator Author

Commented by: Vladimir Lomov (l_vv)

Just run this test, will attach screentshot below:

isql -quiet -bail -input test.sql

Contents of test.sql:

SET NAMES UTF8;

connect localhost:Test.db user XXX password YYY ROLE DB_OWNER;

SET SQL DIALECT 3;

select ascii_val('Z') as Z_Upper, -- = 90
ascii_val('z') as Z_Lower, -- = 122
ascii_char(90) as Z_Upper_num, -- = 'Z'
ascii_char(122) as Z_Lower_num, -- = 'z'
case
when 'Z' = 'z' then 1 -- Why do we get here?
else 2
end as char_compare,
case
when ascii_char(90) = ascii_char(122) then 1
else 2 -- We are here, it is correct
end as ascii_char_compare
from rdb$database;

commit;

exit;

@firebird-automations
Copy link
Collaborator Author

Commented by: Vladimir Lomov (l_vv)

Sorry, can't attach screenshot.

Paste text output of isql:

Y:\home\data\projects\Test\sql\01_firebird\02_tables_create>99_test.bat Active code page: 1251 21.11.2019 17:50:12 Executing test.sql... Z_UPPER Z_LOWER Z_UPPER_NUM Z_LOWER_NUM CHAR_COMPARE ASCII_CHAR_COMPARE ======= ======= =========== =========== ============ ================== 90 122 Z z 1 2 21.11.2019 17:50:13 Done Active code page: 866

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

With
SET NAMES UTF8;
reproduced.

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

Your literals uses UNICODE_CI_AI (case insensitive), while ASCII_CHAR returns data in ASCII charset/collation (case sensitive) internally in the engine, as conversion to connection charset only happens when data is going to client.

So, there is no bug.

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Vladimir,

> Should COLLATION UNICODE_CI_AI affect the comparison of characters in different registers, or only sorting order?

From SQL Standard:

----------------
8.2 <comparison predicate>
...
3) The comparison of two character strings is determined as follows:

a) Syntax Rules of Subclause 9.13, "Collation determination", are applied with the declared types of the
two character strings as VALSET, yielding CS returned from COLL.

b) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is
effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to
the length of the longer string by concatenation on the right of one or more pad characters, where the
pad character is chosen based on CS. If CS has the NO PAD characteristic, then the pad character is
an implementation-dependent character different from any character in the character set of X and Y
that collates less than any string under CS. Otherwise, the pad character is a <space>.

c) The result of the comparison of X and Y is given by the collation CS.

d) Depending on the collation, two strings may compare as equal even if they are of different lengths or
contain different sequences of characters
...
----------------

You see - strings comparison is affected by collation

@firebird-automations
Copy link
Collaborator Author

Commented by: Vladimir Lomov (l_vv)

But what then, if in select you need sort row case-insensitive, but in PSQL you need compare data from same columns case-sensitive?
Using upper () / lower () in oreder by will reduce query performance :(

And perhaps this should be clarified in the documentation?

The Language Reference for version 2.5 (in English) does not say what COLLATION specifically applies to - only to sort order, or to compare characters, too:

1. Firebird 2.5 Language Reference (English), page 22

UNICODE_CI_AI Case-insensitive, accent-insensitive collation, works alphabetically without taking
character case or accents into account. Added in Firebird 2.5

2. The Language Reference for version 4.0 (in Russian) says that COLLATION applies to sort order:

Firebird 4.0 Beta 2 DBMS SQL Language Reference, page 52
UNICODE_CI_AI Sorting is case-insensitive, in alphabetical order.
(UNICODE_CI_AI Сортировка происходит без учёта регистра символа, в алфавитном порядке).

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

Based on comments from Adriano and Vlad, there is no issue to be resolved.

@firebird-automations
Copy link
Collaborator Author

Modified by: Sean Leyne (seanleyne)

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Won't Fix [ 2 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant