Sharp-S character treated incorrectly in UNICODE_CI_AI collation [CORE4136] #4463
Comments
Commented by: Stefan Heymann (stefanheymann) I should add that UNICODE_CI (without AI) works correctly on this character: the select will show a match for TEST_4, which is correct. |
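To check the UNICODE_CI behaviour described in this comment, the TEST_4 expression from the reproduction query can be rerun with UNICODE_CI in place of UNICODE_CI_AI. This is a minimal sketch that assumes a UTF8 database, as in the original report, and an `isql` session using `;` as the statement terminator. The column alias `test_4_ci` is a hypothetical name chosen here for illustration:

```sql
-- TEST_4 from the report, but using UNICODE_CI (case-insensitive,
-- accent-sensitive) instead of UNICODE_CI_AI. Run on a UTF8 database.
select
  case when 'Fußball' collate unicode_ci like 'fuß%' collate unicode_ci
  then '=' else '<>' end as test_4_ci
from rdb$database;
```

According to the comment above, this variant reports a match, while the UNICODE_CI_AI version reported a mismatch before the fix.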
Modified by: @asfernandes assignee: Adriano dos Santos Fernandes [ asfernandes ] |
Modified by: @asfernandes status: Open [ 1 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 2.5.3 [ 10461 ] Fix Version: 3.0 Alpha 2 [ 10560 ] |
Commented by: Stefan Heymann (stefanheymann) It's working now (tested 2013-07-08 with Firebird 2.5.3.26671). Thanks! |
Commented by: @pavel-zotov Can anyone who knows German explain the following results:
set names utf8;
recreate table test(text varchar(10) collate unicode_ci_ai, patt varchar(10) collate unicode_ci_ai);
insert into test values('ß','s');
set list on;
Output:
TEXT ß
TEXT ß
TEXT ss |
Modified by: @pavel-zotov status: Resolved [ 5 ] => Resolved [ 5 ] QA Status: Done successfully Test Details: Have a question about the different comparison results; see the comment of 29-May-2015 in this issue. |
Commented by: Stefan Heymann (stefanheymann) Historically, the sharp-s character (ß) is a typographical ligature of a "long" and a "round" lowercase s. Today it is a complete character of its own, so "ß" and "ss" are not the same (you can, however, treat them as the same in sorting). The same is true for "ß" and "s". There is no such thing as an uppercase sharp-S, so the Unicode code point U+1E9E LATIN CAPITAL LETTER SHARP S is completely pointless. |
Commented by: @pcisar Test created. |
Modified by: @pcisar status: Resolved [ 5 ] => Closed [ 6 ] |
Modified by: @pavel-zotov status: Closed [ 6 ] => Closed [ 6 ] Test Details: Have a question about the different comparison results; see the comment of 29-May-2015 in this issue. => Have a question about the different comparison results; see the comment of 29-May-2015 in this issue. |
Submitted by: Stefan Heymann (stefanheymann)
Is related to QA529
Votes: 1
The UNICODE_CI_AI collation treats the Sharp-s character (U+00DF) incorrectly.
This character (used in German text) is special in that it has only a lowercase form and no uppercase form, having derived from a ligature of a long and a round lowercase "s". (Forget about U+1E9E, which is an abstract invention by the Unicode consortium that has no practical use in German.)
To reproduce the bug, try this on a UTF8 database:
select
case when 'Übergeek' collate unicode_ci_ai like 'ÜB%' collate unicode_ci_ai
then '=' else '<>' end as test_1,
case when 'Übergeek' collate unicode_ci_ai like 'üb%' collate unicode_ci_ai
then '=' else '<>' end as test_2,
case when 'Fußball' collate unicode_ci_ai like 'fu%' collate unicode_ci_ai
then '=' else '<>' end as test_3,
case when 'Fußball' collate unicode_ci_ai like 'fuß%' collate unicode_ci_ai
then '=' else '<>' end as test_4,
case when upper ('Fußball') like upper ('fuß%')
then '=' else '<>' end as test_5
from rdb$database;
TEST_4 will show a mismatch where it should show a match.
Commits: 0e7302f fb41d66 FirebirdSQL/fbt-repository@9b97921 FirebirdSQL/fbt-repository@41851cd
====== Test Details ======
Have a question about the different comparison results; see the comment of 29-May-2015 in this issue.
Perhaps it is also related to CORE4739.
See also the sample in CORE857 (19/Apr/15 08:56 AM).