Issue Details (XML | Word | Printable)

Key: CORE-6094
Type: Bug Bug
Status: Open Open
Priority: Major Major
Assignee: Dmitry Yemanov
Reporter: Arioch
Votes: 0
Watchers: 5
Operations

If you were logged in you would be able to see more operations.
Firebird Core

Outer join tables, on text columns with different collates (perhaps charsets too), may propagate wrong LIKE predicate

Created: 03/Jul/19 12:59 PM   Updated: 03/Jul/19 03:58 PM
Component/s: None
Affects Version/s: 2.1.7, 3.0.3
Fix Version/s: None

Environment: Firebird 2.1.7, Firebird 3.unknown

QA Status: No test


 Description  « Hide
Initial report: https://stackoverflow.com/questions/56858452/why-did-my-where-clause-affect-my-left-join
Initial discussion: https://www.sql.ru/forum/1314381-a/fb3-join-vs-like-vs-case-sensitivity

The example is based upon mismatched collates but probably implementation behavior is the same with regard to charsets too.

DDL:

~~~~~~~~~~~~~~~~~
create table t_CI ( -- case-INSENSITIVE
  id VarChar(10) character set UTF8 primary key collate UNICODE_CI,
  payload integer
);

create table t_CS ( -- case-SENSITIVE
  id VarChar(10) character set UTF8 primary key collate UNICODE,
  payload integer
);


insert into t_CS values ( 'ABCD', -100 );

insert into t_CI values ( 'ABCD', +22 );
~~~~~~~~~~~~~~

We now have two tables with ID fields having different collates. However the ID value is the same within both tables and is within both collates.

When joining those two tables - the results should be either data from both tables or no data at all.

Situation, when upon join by ID data is fetched from only one of the two tables but not from another would be abnormal !!!



Abnormal query returning only half of the data, most probably due to propagation of `LIKE` predicate with wrong collation:
~~~~~~~~~~~
select * from T_CI
 left join T_CS on T_CS.id = T_CI.id
 where T_CI.id like '%bc%'
;

/*
Plan:
PLAN JOIN (T_CI NATURAL, T_CS INDEX (RDB$PRIMARY3))

Adapted plan:
PLAN JOIN (T_CI NATURAL, T_CS INDEX (INTEG_21))
*/
~~~~~~~~~~~
  
  
For comparison:


Correct queries returning data from BOTH tables:
~~~~~~~~
select * from T_CS, T_CI where T_CS.id = T_CI.id
;

select * from T_CS
 left join T_CI on T_CS.id = T_CI.id
;

select * from T_CI
 left join T_CS on T_CS.id = T_CI.id
;

select * from T_CI
 left join T_CS on T_CS.id = T_CI.id
 where T_CI.id like '%BC%'
;

select * from T_CS
 left join T_CI on T_CS.id = T_CI.id
 where T_CS.id like '%BC%'
;

select * from T_CI
 left join T_CS on T_CS.id = T_CI.id
where T_CI.id collate unicode_ci like '%bc%'
;
~~~~~~~~~~~~~~~


Correct queries returning no data at all:
~~~~~~~
select * from T_CS
 left join T_CI on T_CS.id = T_CI.id
where T_CS.id like '%bc%'
;

select * from T_CI
 left join T_CS on T_CS.id = T_CI.id
where T_CI.id collate unicode like '%bc%'
;
~~~~~~~


One more query, somewhat similar, different in engaging master-table index thus probably preventing LIKE-propagation
~~~~~~~~~~
select * from T_CI
 left join T_CS on T_CS.id = T_CI.id
 where T_CI.id starting with 'abc' ;
;

/*
Plan:
PLAN JOIN (T_CI INDEX (RDB$PRIMARY2), T_CS INDEX (RDB$PRIMARY3))
*/
~~~~~~~~~~~~~~~~~~~~~


 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Arioch added a comment - 03/Jul/19 01:47 PM - edited
One more jinxed example.

---------------
select * from T_CI
 left join T_CS on T_CS.id = T_CI.id
where T_CI.id collate unicode like _UTF8 '%bc%' --collate unicode
---------------

Like we saw above, "nailing down" the case-sensitive collation makes the query correctly return zero rows.
Now we move the collation enforcing to another side of the LIKE

---------------------
select * from T_CI
 left join T_CS on T_CS.id = T_CI.id
where T_CI.id /* collate unicode */ like _UTF8 '%bc%' collate unicode
--------------------

Suddenly the query outputs the data row, from both tables.

I can not explain this sudden deviation from the test results above, at all.