Accent-insensitive comparison: diacritical letters with DIAGONAL crossing stroke pass only the EQUALITY test against their non-accented forms [CORE4739]
#5044
The following letters:
Ø = U+00D8 // LATIN CAPITAL LETTER O WITH STROKE, used in the Danish and Norwegian alphabets
Ð = U+00D0 // LATIN CAPITAL LETTER ETH, Icelandic
Ŀ = U+013F // LATIN CAPITAL LETTER L WITH MIDDLE DOT, Catalan (Valencian)
Ł = U+0141 // LATIN CAPITAL LETTER L WITH STROKE, Polish
-- match their non-accented forms (result TRUE) only when compared using '=' or 'IS NOT DISTINCT FROM'.
All other kinds of comparison fail: STARTING WITH, LIKE, SIMILAR TO, and evaluation of the result of POSITION().
Test query:
-- Assumes a case- and accent-insensitive UTF8 collation named co_utf8_ci_ai already exists.
with recursive
d as (
    -- s: accented letters; t: expected non-accented counterparts, position by position
    select
        cast( 'ØÐ' || 'Ł' || 'Ŀ' || 'ĘĄĂÂÎŢŐŰĖÅĽĢÁÉÍÓÚÝÀÈÌÒÙÂÊÎÔÛÃÑÕÄËÏÖÜŸÇŠĄĘŹŻĂŞŢ' as varchar(80) character set utf8) s
       ,cast( 'OD' || 'L' || 'L' || 'EAAAITOUEALGAEIOUYAEIOUAEIOUANOAEIOUYCSAEZZAST' as varchar(80) character set utf8) t
    from rdb$database
)
-- r: row generator 1..100, used to index into the strings
,r as(select 1 i from rdb$database union all select r.i+1 from r where r.i < 100)
,e as(
    -- e: one row per character pair
    select
        substring(d.s from r.i for 1) c
       ,substring(d.t from r.i for 1) t
    from d join r on r.i <= char_length(d.s)
)
,f as (
    -- f: 1 = comparison succeeded, 0 = comparison failed
    select
        e.c as utf_char
       ,e.t as latin_char
       ,iif( e.c collate co_utf8_ci_ai = e.t, 1, 0 ) equal_test
       ,iif( position(e.t, e.c collate co_utf8_ci_ai) >0 , 1, 0 ) pos_test
       ,iif( e.c collate co_utf8_ci_ai starting with e.t, 1, 0 ) start_with_test
       ,iif( e.c collate co_utf8_ci_ai like e.t, 1, 0 ) like_test
       ,iif( e.c collate co_utf8_ci_ai similar to e.t, 1, 0 ) similar_to_letter_test
       ,iif( e.c collate co_utf8_ci_ai similar to '[[:ALPHA:]]', 1, 0 ) similar_to_alpha_test
    from e
)
select *
from f
order by equal_test + pos_test + start_with_test + like_test + similar_to_letter_test + similar_to_alpha_test
        ,utf_char
;
The result that I got on Windows and Linux can be seen in the attached screenshot.
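For a quicker check, the test can be reduced to a single letter. The sketch below is not part of the original report: the definition of co_utf8_ci_ai is an assumption (the report uses the collation but does not show how it was created), and the column names are chosen for illustration only. It also assumes the connection character set is UTF8 so that the literal 'Ł' arrives intact.

```sql
-- ASSUMPTION: the report does not show this DDL; this is one plausible
-- definition of a case- and accent-insensitive UTF8 collation.
create collation co_utf8_ci_ai
    for utf8
    from unicode
    case insensitive
    accent insensitive;
commit;

-- Compare the single letter 'Ł' (U+0141) with plain 'L' using each kind
-- of comparison named in the report. 1 = matched, 0 = did not match.
select
    iif( x.c collate co_utf8_ci_ai = 'L', 1, 0 )                    as equal_test
   ,iif( x.c collate co_utf8_ci_ai is not distinct from 'L', 1, 0 ) as not_distinct_test
   ,iif( position('L', x.c collate co_utf8_ci_ai) > 0, 1, 0 )       as pos_test
   ,iif( x.c collate co_utf8_ci_ai starting with 'L', 1, 0 )        as start_with_test
   ,iif( x.c collate co_utf8_ci_ai like 'L', 1, 0 )                 as like_test
   ,iif( x.c collate co_utf8_ci_ai similar to 'L', 1, 0 )           as similar_to_test
from (
    select cast('Ł' as varchar(1) character set utf8) as c
    from rdb$database
) x;
```

If the behaviour described in this report is present, only equal_test and not_distinct_test should come back as 1; on a build where all comparisons agree, every column should show the same value.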
Test Details: Perhaps this is also related to CORE4136 ("Sharp-S character treated incorrectly in UNICODE_CI_AI collation").
Submitted by: @pavel-zotov
Attachments:
diacritical-comparison-of-letters-with-diagonal-stokes.png.zip
Commits: aa70f4f FirebirdSQL/fbt-repository@38c40cf