Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HASH/MERGE JOIN is not used for more than two streams if they are joined via USING/NATURAL clauses and join is based on DBKEY concatenations [CORE4809] #5107

Closed
firebird-automations opened this issue May 20, 2015 · 10 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: @pavel-zotov

Votes: 1

recreate table tn(x int primary key using index tn_x);
commit;
insert into tn select row_number()over() from rdb$types;
commit;

set planonly;

----------- test `traditional` join form -----------------

select count(*)
from (select rdb$db_key||'' a from tn) r
join (select rdb$db_key||'' a from tn) s on r.a = s.a;

PLAN HASH (S TN NATURAL, R TN NATURAL)

select count(*)
from (select rdb$db_key||'' a from tn) r
join (select rdb$db_key||'' a from tn) s on r.a = s.a
join (select rdb$db_key||'' a from tn) t on s.a = t.a;

PLAN HASH (HASH (T TN NATURAL, S TN NATURAL), R TN NATURAL)

select count(*)
from (select rdb$db_key||'' a from tn) r
join (select rdb$db_key||'' a from tn) s on r.a = s.a
join (select rdb$db_key||'' a from tn) t on s.a = t.a
join (select rdb$db_key||'' a from tn) u on t.a = u.a;

PLAN HASH (HASH (HASH (U TN NATURAL, T TN NATURAL), S TN NATURAL), R TN NATURAL)

Conclusion: ALL OK for this form of joins, optimizer DOES take in account ability to apply HJ regardless of data sources number.

----------- test join on named columns form -----------------

select count(*)
from (select rdb$db_key||'' a from tn) r
join (select rdb$db_key||'' a from tn) s using(a);

PLAN HASH (S TN NATURAL, R TN NATURAL)

select count(*)
from (select rdb$db_key||'' a from tn) r
join (select rdb$db_key||'' a from tn) s using(a)
join (select rdb$db_key||'' a from tn) t using(a);

PLAN JOIN (HASH (S TN NATURAL, R TN NATURAL), T TN NATURAL) ----------- NESTED LOOPS on outer phase

select count(*)
from (select rdb$db_key||'' a from tn) r
join (select rdb$db_key||'' a from tn) s using(a)
join (select rdb$db_key||'' a from tn) t using(a)
join (select rdb$db_key||'' a from tn) u using(a);

PLAN JOIN (HASH (S TN NATURAL, R TN NATURAL), T TN NATURAL, U TN NATURAL) ----------- NESTED LOOPS on outer phase

----------- test natural join form -----------------

select count(*)
from (select rdb$db_key||'' a from tn) r
natural join (select rdb$db_key||'' a from tn) s;

PLAN HASH (S TN NATURAL, R TN NATURAL)

select count(*)
from (select rdb$db_key||'' a from tn) r
natural join (select rdb$db_key||'' a from tn) s
natural join (select rdb$db_key||'' a from tn) t;

PLAN JOIN (HASH (S TN NATURAL, R TN NATURAL), T TN NATURAL) ----------- NESTED LOOPS on outer phase

select count(*)
from (select rdb$db_key||'' a from tn) r
natural join (select rdb$db_key||'' a from tn) s
natural join (select rdb$db_key||'' a from tn) t
natural join (select rdb$db_key||'' a from tn) u;

PLAN JOIN (HASH (S TN NATURAL, R TN NATURAL), T TN NATURAL, U TN NATURAL) ----------- NESTED LOOPS on outer phase

Commits: cb6ddcd 07263d6 FirebirdSQL/fbt-repository@f697ea0 FirebirdSQL/fbt-repository@f93b285

====== Test Details ======

Test verifies only 3.0. For 2.5.x see test for CORE4822.

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

It appears that USING is not SQL standard conformant -- it seems to be an MySQL'ism.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

> USING is not SQL standard conforman

Quote 5WD-02-Foundation-2003-09.pdf (SQL-2003), page 312:

7.7 <joined table>
Function
Specify a table derived from a Cartesian product, inner join, or outer join.

Format
<joined table> ::=
<cross join>
| <qualified join>
| <natural join>
<cross join> ::=

CROSS JOIN
::=
\[ \] JOIN
::=
NATURAL\[ \] JOIN
\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \[ 1 \] ::= \| ::= ON ::= USING \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- \[ 2 \] ::=

What about this ?

PS. I was sure that JOIN USING & NATURAL JOIN both are just 'syntax sugar' in FB... :-)

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

Sorry, I had done some googling and could not find any usage aside from MySQL, hence the comment.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Version: 2.5.4 [ 10585 ]

Version: 2.5.3 Update 1 [ 10650 ]

Version: 2.1.7 [ 10651 ]

Version: 3.0 Beta 1 [ 10332 ]

Version: 2.5.3 [ 10461 ]

Version: 2.1.6 [ 10460 ]

Version: 3.0 Alpha 2 [ 10560 ]

Version: 3.0 Alpha 1 [ 10331 ]

Version: 2.5.2 Update 1 [ 10521 ]

Version: 2.1.5 Update 1 [ 10522 ]

Version: 2.5.2 [ 10450 ]

Version: 2.5.1 [ 10333 ]

Version: 2.5.0 [ 10221 ]

Component: Engine [ 10000 ]

summary: Force optimizer to consider usage of HASH JOIN when number of data sources is three and more and they are joined on USING(<col>) or by NATURAL clauses => Force optimizer to consider usage of HASH/MERGE JOIN when number of data sources is three and more and they are joined on USING(<col>) or by NATURAL clauses

Version: 3.0 Beta 2 [ 10586 ] =>

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

If key for join is changed from RDB$DB_KEY to common field than all become OK in 3.0: optimizer DOES make plans with HASH join in all cases (checked on WI-T3.0.0.31846):

recreate table tn(id_key int primary key using index tn_id_key);
commit;
insert into tn(id_key)
with recursive r as (select 0 i from rdb$database union all select r.i+1 from r where r.i<99)
select r1.i*100+r0.i from r r1, r r0;
commit;
set statistics index tn_id_key;
commit;

set planonly;

----------- test `traditional` join form -----------------

select count(*)
from (select id_key||'' a from tn) r
join (select id_key||'' a from tn) s on r.a = s.a;

PLAN HASH (S TN NATURAL, R TN NATURAL)

select count(*)
from (select id_key||'' a from tn) r
join (select id_key||'' a from tn) s on r.a = s.a
join (select id_key||'' a from tn) t on s.a = t.a;

PLAN HASH (HASH (T TN NATURAL, S TN NATURAL), R TN NATURAL)

select count(*)
from (select id_key||'' a from tn) r
join (select id_key||'' a from tn) s on r.a = s.a
join (select id_key||'' a from tn) t on s.a = t.a
join (select id_key||'' a from tn) u on t.a = u.a;

PLAN HASH (HASH (HASH (U TN NATURAL, T TN NATURAL), S TN NATURAL), R TN NATURAL)

----------- test join on named columns form -----------------

select count(*)
from (select id_key||'' a from tn) r
join (select id_key||'' a from tn) s using(a);

PLAN HASH (S TN NATURAL, R TN NATURAL)

select count(*)
from (select id_key||'' a from tn) r
join (select id_key||'' a from tn) s using(a)
join (select id_key||'' a from tn) t using(a);

PLAN HASH (T TN NATURAL, HASH (S TN NATURAL, R TN NATURAL))

select count(*)
from (select id_key||'' a from tn) r
join (select id_key||'' a from tn) s using(a)
join (select id_key||'' a from tn) t using(a)
join (select id_key||'' a from tn) u using(a);

PLAN HASH (U TN NATURAL, HASH (T TN NATURAL, HASH (S TN NATURAL, R TN NATURAL)))

----------- test natural join form -----------------

select count(*)
from (select id_key||'' a from tn) r
natural join (select id_key||'' a from tn) s;

PLAN HASH (S TN NATURAL, R TN NATURAL)

select count(*)
from (select id_key||'' a from tn) r
natural join (select id_key||'' a from tn) s
natural join (select id_key||'' a from tn) t;

PLAN HASH (T TN NATURAL, HASH (S TN NATURAL, R TN NATURAL))

select count(*)
from (select id_key||'' a from tn) r
natural join (select id_key||'' a from tn) s
natural join (select id_key||'' a from tn) t
natural join (select id_key||'' a from tn) u;

PLAN HASH (U TN NATURAL, HASH (T TN NATURAL, HASH (S TN NATURAL, R TN NATURAL)))

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

issuetype: Improvement [ 4 ] => Bug [ 1 ]

summary: Force optimizer to consider usage of HASH/MERGE JOIN when number of data sources is three and more and they are joined on USING(<col>) or by NATURAL clauses => HASH/MERGE JOIN is not used for more than two streams if they are joined via USING/NATURAL clauses and join is based on DBKEY concatenations

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0 Beta 2 [ 10586 ]

Fix Version: 2.5.5 [ 10670 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: Done successfully

Test Details: Test verifies only 3.0. For 2.5.x see test for CORE4822.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment