Allow hash/merge joins for non-field (dbkey or derived expression) equalities [CORE4528] #4846

firebird-automations · 2014-08-25T12:08:27Z

Test case:

recreate table tn(x int);
commit;
set planonly;

var #⁠1
--------
SQL> select * from (select x a from tn) r join (select x b from tn) s on r.a = s.b;

PLAN HASH (S TN NATURAL, R TN NATURAL) -- OK

var #⁠2
--------
SQL> select * from (select rdb$db_key||'' a from tn) r join (select rdb$db_key||'' b from tn) s on r.a = s.b;

PLAN JOIN (R TN NATURAL, S TN NATURAL) -- hash join could be here

Commits: ee30e8e FirebirdSQL/fbt-repository@ee218fc

====== Test Details ======

As of build #⁠31840 (~20-may-2015), join of three and more sources which is specified by `USING` or `NATURAL` clauses will use nested loops.
See also: CORE4809

The text was updated successfully, but these errors were encountered:

firebird-automations · 2014-08-25T13:51:05Z

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

firebird-automations · 2014-08-25T14:32:53Z

Modified by: @dyemanov

Version: 2.5.3 [ 10461 ]

Version: 2.1.6 [ 10460 ]

Version: 3.0 Alpha 1 [ 10331 ]

Version: 2.5.2 Update 1 [ 10521 ]

Version: 2.1.5 Update 1 [ 10522 ]

Version: 2.5.2 [ 10450 ]

Version: 2.1.5 [ 10420 ]

Component: Engine [ 10000 ]

summary: Improve plan when self-join table using rdb$db_key => Allow hash/merge joins for non-field (dbkey or derived expression) equalities

firebird-automations · 2014-08-25T14:35:53Z

Modified by: @dyemanov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0 Beta 1 [ 10332 ]

firebird-automations · 2014-08-25T14:36:26Z

Commented by: @dyemanov

Please test with a next snapshot build.

firebird-automations · 2014-08-25T16:26:17Z

Modified by: @dyemanov

description: Test #⁠1

recreate table tn(x int);
commit;
set planonly;

var #⁠1
--------
SQL> select * from (select x a from tn) r join (select x b from tn) s on r.a = s.b;

PLAN HASH (S TN NATURAL, R TN NATURAL) -- OK

var #⁠2
--------
SQL> select * from (select rdb$db_key||'' a from tn) r join (select rdb$db_key||'' b from tn) s on r.a = s.b;

PLAN JOIN (R TN NATURAL, S TN NATURAL) -- hash join could be here

Test #⁠2

create sequence g; commit;
recreate table t( id int primary key, s varchar(36) ); commit;
insert into t
select i,s from
(
select gen_id(g,1) i, uuid_to_char(gen_uuid()) s
from rdb$types a,rdb$types b,(select 1 i from rdb$types rows 20) rows 10000
)
order by rand();
commit;

var #⁠1 (emulating left join with select only one field from driven table)
--------
select count(*) from (
select http://x.id, (select z.s from t z where z.rdb$db_key = x.k) s
from (select id, rdb$db_key k from t a order by id) x
)

-- works very fast:

Select Expression
-> Singularity Check
-> Filter
-> Table "T" as "Z" Access By ID
-> DBKEY
Select Expression
-> Aggregate
-> Materialize
-> Table "T" as "X A" Access By ID
-> Index "RDB$PRIMARY2" Full Scan
1 records fetched
31 ms, 50015 fetch(es)

Table Natural Index
****************************************************
T 20000

var #⁠2 (common left join using rdb$db_key):
--------
select count(*) from (
select http://x.id,z.s
from (select id, rdb$db_key k from t a order by id) x
left join t z on x.k = z.rdb$db_key
);

-- works very slow:

#⁠ LI-T3.0.0.31288:
Select Expression
-> Aggregate
-> Nested Loop Join (outer)
-> Table "T" as "X A" Access By ID
-> Index "RDB$PRIMARY2" Full Scan
-> Filter
-> Table "T" as "Z" Full Scan
1 records fetched
67248 ms, 227 read(s), 204300015 fetch(es)

Table Natural Index
****************************************************
T 100000000 10000

== vs 2.5: ==

#⁠ LI-V2.5.3.26790:
49146 ms, 204300015 fetch(es)

Table Natural Index Update
***************************************************************
T 100000000 10000

=>

Test case:

recreate table tn(x int);
commit;
set planonly;

var #⁠1
--------
SQL> select * from (select x a from tn) r join (select x b from tn) s on r.a = s.b;

PLAN HASH (S TN NATURAL, R TN NATURAL) -- OK

var #⁠2
--------
SQL> select * from (select rdb$db_key||'' a from tn) r join (select rdb$db_key||'' b from tn) s on r.a = s.b;

PLAN JOIN (R TN NATURAL, S TN NATURAL) -- hash join could be here

firebird-automations · 2014-08-25T17:43:04Z

Commented by: @pavel-zotov

Test again on LI-T3.0.0.31301.

Results:

SQL> recreate table tn(x int primary key using index tn_x); commit;
SQL> set plan on;
SQL> select * from (select x a from tn) r join (select x b from tn) s on r.a = s.b;

PLAN JOIN (R TN NATURAL, S TN INDEX (TN_X)) -- OK, because of index on `X` field

SQL> select * from (select x a from tn) r join (select x b from tn) s on r.a+0 = s.b+0;

PLAN HASH (S TN NATURAL, R TN NATURAL) -- OK: suppress index usage leads to HJ

--------------

Now the same but with RDB$DB_KEY :

SQL> select * from (select rdb$db_key a from tn) r join (select rdb$db_key b from tn) s on r.a = s.b;

PLAN JOIN (R TN NATURAL, S TN INDEX ()) -- OK, because we did not add ||'' (analog for "+0" for numeric fields in join cond.)

-- Now add this ||'':
SQL> select * from (select rdb$db_key a from tn) r join (select rdb$db_key b from tn) s on r.a||'' = s.b||'';

PLAN JOIN (S TN NATURAL, R TN INDEX ()) -- but can`t get HJ.

PS. As workaround: hash join will be in use only if we do like this:

SQL> select * from (select rdb$db_key||'' a from tn) r join (select rdb$db_key||'' b from tn) s on r.a = s.b;

PLAN HASH (S TN NATURAL, R TN NATURAL)

(i.e. move ||'' inside EVERY datasource from JOIN condition).

PPS. I'm not sure is this related to source problem of ticket or no. It's not hard to 'move' all such ||'' inside datasources from join expression but it will be look as something 'not-common'.

firebird-automations · 2014-08-25T19:28:15Z

Commented by: Sean Leyne (seanleyne)

Dmitry,

This case seems to be related to core-4530 which you just opened, are they?

firebird-automations · 2014-08-25T20:27:58Z

Commented by: @dyemanov

No, they're unrelated.

firebird-automations · 2014-08-26T11:56:23Z

Commented by: @dyemanov

Pavel, concatenation is somewhat specific operation for DBKEYs, there may be a different issue. I will investigate and open a separate ticket, if required.

firebird-automations · 2014-08-26T13:18:04Z

Commented by: @pavel-zotov

HASH join often wins vs NL (maybe you remember: I did some tests few months ago and sent results to you).

IMO, it will be worth-while if HJ will be avaliable when joining using rdb$db_key - for instance, to avoid huge memory consuming when doing SORT on wide record.
Of course, this will have sense only when OUTER hash joins will be implemented (to give us ability to specify order of join).

Currently NL using RDB$DB_KEY *wins* 'plain old' SORT if TempCacheLimit set to default (small) 64 M (I'm speaking about SS).

DDL:

recreate table t(id int primary key, x int, y int, s1 varchar(10000), s2 varchar(10000), s3 varchar(10000)); commit;
insert into t
select
row_number()over()
,rand()*100
,rand()*100
,rpad('',10000, uuid_to_char(gen_uuid()))
,rpad('',10000, uuid_to_char(gen_uuid()))
,rpad('',10000, uuid_to_char(gen_uuid()))
from rdb$types,rdb$types,(select 1 i from rdb$types) rows 100000;
commit;

Consider that we need to get such result: select x,s1,s2,s3 from t order by y;

Test-1: sort only 'narrow' source + Nested Loops join with rest fields using RDB$DB_KEY:

commit; select mon$stat_id si,mon$stat_group sg,mon$max_memory_used mmu,mon$max_memory_allocated mma from mon$memory_usage order by sg,si;
out /dev/null;
set stat on;

select y.x, y.s1, y.s2, y.s3
from(
select rdb$db_key k from t order by y
) x
left join t y on x.k=y.rdb$db_key;

set stat off;
out;
commit; select mon$stat_id si,mon$stat_group sg,mon$max_memory_used mmu,mon$max_memory_allocated mma from mon$memory_usage order by sg,si;

Test-2: 'plain old' way to SORT rows

commit; select mon$stat_id si,mon$stat_group sg,mon$max_memory_used mmu,mon$max_memory_allocated mma from mon$memory_usage order by sg,si;
out /dev/null;
set stat on;
select x,s1,s2,s3 from t order by y; -- no joins, but HUGE memory consumption (see below)
set stat off;
out;
commit; select mon$stat_id si,mon$stat_group sg,mon$max_memory_used mmu,mon$max_memory_allocated mma from mon$memory_usage order by sg,si;

RESULTS:
#⁠#⁠#⁠#⁠#⁠#⁠#⁠#⁠

Trace for TEST-1:

select y.x, y.s1, y.s2, y.s3
from(
select rdb$db_key k from t order by y
) x
left join t y on x.k=y.rdb$db_key
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Select Expression
-> Nested Loop Join (outer)
-> Sort (record length: 24, key length: 8)
-> Table "T" as "X T" Full Scan
-> Filter
-> Table "T" as "Y" Access By ID
-> DBKEY
100000 records fetched
4844 ms, 1001585 read(s), 1900056 fetch(es)

Table Natural Index
****************************************************
T 100000 100000

TempSpace usage for TEST-1: NO detected fb_sort_* files which exceeded 64M (command for watch: lsof -a +L1 /dev/shm)

Trace for TEST-2:

select x,s1,s2,s3 from t order by y
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Select Expression
-> Sort (record length: 30040, key length: 8)
-> Table "T" Full Scan
100000 records fetched
10328 ms, 497974 read(s), 1000056 fetch(es)

Table Natural Index
****************************************************
T 100000

TempSpace usage for TEST-2:

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
firebird 29585 firebird 6u REG 0,17 869212160 0 2199948 /dev/shm/fb_sort_VBA1RC (deleted)

Tue Aug 26 14:05:13 MSK 2014
firebird 29585 firebird 6u REG 0,17 1663041536 0 2199948 /dev/shm/fb_sort_VBA1RC (deleted)

Tue Aug 26 14:05:15 MSK 2014
firebird 29585 firebird 6u REG 0,17 2389704704 0 2199948 /dev/shm/fb_sort_VBA1RC (deleted)

Tue Aug 26 14:05:17 MSK 2014
firebird 29585 firebird 6u REG 0,17 3094347776 0 2199948 /dev/shm/fb_sort_VBA1RC (deleted)

As we can see, ~3 Gb of memory was consumed for this sort.

PS. Test with Hash Join show results worse than NL, but I had to run INCORRECT query: INNER join only can be used now.

select y.x, y.s1, y.s2, y.s3
from(
select rdb$db_key||'' k from t order by y
) x
join t y on x.k||'' = y.rdb$db_key||'' -- ::: NB ::: unfortunately, only INNER join can be used now for hash
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Select Expression
-> Filter
-> Hash Join (inner)
-> Table "T" as "Y" Full Scan
-> Record Buffer (record length: 25)
-> Sort (record length: 24, key length: 8)
-> Table "T" as "X T" Full Scan
100000 records fetched
6015 ms, 1500073 read(s), 2000112 fetch(es)

Table Natural Index Updat
************************************************************
T 200000

But here also NO detected fb_sort_* files which exceeded 64M.

firebird-automations · 2015-05-27T19:52:48Z

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: Done successfully

Test Details: As of build #⁠31840 (~20-may-2015), join of three and more sources which is specified by `USING` or `NATURAL` clauses will use nested loops.
See also: CORE4809

firebird-automations closed this as completed May 27, 2015

firebird-automations assigned dyemanov Apr 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow hash/merge joins for non-field (dbkey or derived expression) equalities [CORE4528] #4846

Allow hash/merge joins for non-field (dbkey or derived expression) equalities [CORE4528] #4846

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 25, 2014

description: Test #⁠1

Test #⁠2

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 26, 2014

firebird-automations commented Aug 26, 2014

firebird-automations commented May 27, 2015

Allow hash/merge joins for non-field (dbkey or derived expression) equalities [CORE4528] #4846

Allow hash/merge joins for non-field (dbkey or derived expression) equalities [CORE4528] #4846

Comments

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 25, 2014

description: Test #⁠1

Test #⁠2

firebird-automations commented Aug 25, 2014

Results:

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 25, 2014

firebird-automations commented Aug 26, 2014

firebird-automations commented Aug 26, 2014

DDL:

Test-1: sort only 'narrow' source + Nested Loops join with rest fields using RDB$DB_KEY:

Test-2: 'plain old' way to SORT rows

Trace for TEST-1:

Trace for TEST-2:

firebird-automations commented May 27, 2015