Accent insensitive comparison: diacritical letters with DIAGONAL crossing stroke pass only test on EQUALITY to their non-accented forms [CORE4739] #5044

firebird-automations · 2015-04-07T02:03:22Z

Attachments:
diacritical-comparison-of-letters-with-diagonal-stokes.png.zip

The following letters:

Ø = U+00D8 // LATIN CAPITAL LETTER O WITH STROKE, used in the Danish and Norwegian alphabets;
Ð = U+00D0 // LATIN CAPITAL LETTER ETH, Icelandic;
Ŀ = U+013F // LATIN CAPITAL LETTER L WITH MIDDLE DOT, Catalan (Valencian);
Ł = U+0141 // LATIN CAPITAL LETTER L WITH STROKE, Polish

-- compare as TRUE against their non-accented forms only when using '=' or 'IS NOT DISTINCT FROM'.
Other kinds of comparison (STARTING WITH, LIKE, SIMILAR TO, and checking the result of POSITION()) fail.

Test query:

with recursive
d as (
    select
     cast( 'ØÐ' || 'Ł' || 'Ŀ' || 'ĘĄĂÂÎŢŐŰĖÅĽĢÁÉÍÓÚÝÀÈÌÒÙÂÊÎÔÛÃÑÕÄËÏÖÜŸÇŠĄĘŹŻĂŞŢ' as varchar(80) character set utf8) s
    ,cast( 'OD' || 'L' || 'L' || 'EAAAITOUEALGAEIOUYAEIOUAEIOUANOAEIOUYCSAEZZAST' as varchar(80) character set utf8) t
    from rdb$database
)
,r as(select 1 i from rdb$database union all select r.i+1 from r where r.i < 100)
,e as(
    select
         substring(d.s from r.i for 1) c
        ,substring(d.t from r.i for 1) t
    from d join r on r.i <= char_length(d.s)
)
,f as (
    select
         e.c as utf_char
        ,e.t as latin_char
        ,iif( e.c collate co_utf8_ci_ai = e.t, 1, 0 ) equal_test
        ,iif( position(e.t, e.c collate co_utf8_ci_ai) >0 , 1, 0 ) pos_test
        ,iif( e.c collate co_utf8_ci_ai starting with e.t, 1, 0 ) start_with_test
        ,iif( e.c collate co_utf8_ci_ai like e.t, 1, 0 ) like_test
        ,iif( e.c collate co_utf8_ci_ai similar to e.t, 1, 0 ) similar_to_letter_test
        ,iif( e.c collate co_utf8_ci_ai similar to '[[:ALPHA:]]', 1, 0 ) similar_to_alpha_test
    from e
)
select *
from f
order by equal_test + pos_test + start_with_test + like_test + similar_to_letter_test + similar_to_alpha_test
        ,utf_char
;
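
The query above depends on a collation named co_utf8_ci_ai, whose definition is not shown in this report. Below is a minimal sketch for checking a single letter. It assumes the collation is derived from UNICODE as case- and accent-insensitive (a guess, not taken from the report) and that the connection character set is UTF8:

```sql
-- ASSUMPTION: the report does not show how co_utf8_ci_ai was created.
-- This definition is only a plausible guess at a case- and
-- accent-insensitive UTF8 collation.
create collation co_utf8_ci_ai
    for utf8
    from unicode
    case insensitive
    accent insensitive;
commit;

-- Compare one of the reported letters (Ø) with its non-accented form (O)
-- using the same kinds of comparison as the full test query.
with c as (
    select cast('Ø' as varchar(1) character set utf8) as ch
    from rdb$database
)
select
     iif( c.ch collate co_utf8_ci_ai = 'O', 1, 0 ) as equal_test
    ,iif( position('O', c.ch collate co_utf8_ci_ai) > 0, 1, 0 ) as pos_test
    ,iif( c.ch collate co_utf8_ci_ai starting with 'O', 1, 0 ) as start_with_test
    ,iif( c.ch collate co_utf8_ci_ai like 'O', 1, 0 ) as like_test
    ,iif( c.ch collate co_utf8_ci_ai similar to 'O', 1, 0 ) as similar_to_letter_test
from c;
```

Going by the description above, an affected build would be expected to return 1 only in equal_test, while a build containing the fix should return 1 in every column.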

The result I got on Windows and Linux can be seen in the attached screenshot.

Commits: aa70f4f FirebirdSQL/fbt-repository@38c40cf

====== Test Details ======

Perhaps this is also related to CORE4136 ("Sharp-S character treated incorrectly in UNICODE_CI_AI collation").

firebird-automations · 2015-04-07T02:06:05Z

Modified by: @pavel-zotov

Attachment: diacritical-comparison-of-letters-with-diagonal-stokes.png.zip [ 12700 ]

firebird-automations · 2015-05-29T17:17:31Z

Modified by: @pavel-zotov

status: Open [ 1 ] => Open [ 1 ]

QA Status: No test

Test Details: Perhaps, it also related to CORE4736 ("Sharp-S character treated incorrectly in UNICODE_CI_AI collation").

firebird-automations · 2018-06-25T09:34:48Z

Modified by: @pavel-zotov

status: Open [ 1 ] => Open [ 1 ]

Test Details: Perhaps, it also related to CORE4736 ("Sharp-S character treated incorrectly in UNICODE_CI_AI collation"). => Perhaps, it also related to CORE4136 ("Sharp-S character treated incorrectly in UNICODE_CI_AI collation").

firebird-automations · 2019-10-10T18:46:22Z

Modified by: @asfernandes

assignee: Adriano dos Santos Fernandes [ asfernandes ]

firebird-automations · 2019-10-10T19:01:12Z

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 4.0 Beta 2 [ 10888 ]

firebird-automations · 2019-10-11T11:44:52Z

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Done successfully

firebird-automations · 2019-10-11T11:45:00Z

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]

firebird-automations closed this as completed Oct 11, 2019

firebird-automations added fix-version: 4.0 Beta 2 resolution: fixed priority: minor component: charsets/collation type: bug qa: done successfully labels Apr 25, 2021

firebird-automations assigned asfernandes Apr 25, 2021
