You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
--create sequence g;altersequence g restart with 0;
recreate table t1(x int, y int);
recreate table t2(x int);
insert into t1
select mod(i,3), i
from (
select gen_id(g,1) i
from rdb$types a, rdb$types b, (select1 i from rdb$types rows 20)
union allselect1from rdb$database where1=0-- "materialization": prevent from repeating evaluation gen_id() in the outer part of query
rows 1000000
);
insert into t2
select mod(i, 20)
from (
select gen_id(g,1) i
from rdb$types
union allselect1from rdb$database where1=0
rows 20
);
commit;
createindext1_x_yon t1(x, y);
createindext2_xon t2(x);
commit;
This script will produce table T1 with 1'000'000 rows and T2 with 20 rows, and their index statistics will be:
SQL>set list on; select rdb$relation_name,rdb$index_name,rdb$statistics from rdb$indices where rdb$relation_name in ('T1','T2');
RDB$RELATION_NAME T1
RDB$INDEX_NAME T1_X_Y
RDB$STATISTICS 9\.999999974752427e-07
RDB$RELATION_NAME T2
RDB$INDEX_NAME T2_X
RDB$STATISTICS 0.05000000074505806
Now run these queries:
RUN-1
selectcount(*)
from t1 a join t2 b ona.x=b.xwhereb.x=0;
RUN-2
selectcount(*)
from t2 b left join t1 a using(x)
where x=0; -- yes, unqualified (neither "a" nor "b" prefix here)
RUN-3
selectcount(*)
from t1 a join t2 b using(x)
where x =0; -- yes, unqualified (neither "a" nor "b" prefix here)
Then:
Compare plan of RUN-1 and RUN-2 & RUN-3: the 2nd and 3rd will use NATURAL reads which leads to poor performance
Compare performance statistics on 2.5 and 3.0: the former WINS
I got following data for mentioned queries:
RUN-1
WI-V2.5.5.26952:
selectcount(*)
from t1 a join t2 b ona.x=b.xwhereb.x=0
PLAN JOIN (B INDEX (T2_X), A INDEX (T1_X_Y))
1 records fetched
919 ms, 13500 read(s), 667169 fetch(es)
Table Natural Index
****************************************************
T1 333333
T2 1
WI-V3.0.0.32208:
selectcount(*)
from t1 a join t2 b ona.x=b.xwhereb.x=0Select Expression
-> Aggregate
-> Nested Loop Join (inner)
-> Filter
-> Table "T2"as"B" Access By ID
-> Bitmap
-> Index "T2_X" Range Scan (full match)
-> Filter
-> Table "T1"as"A" Access By ID
-> Bitmap
-> Index "T1_X_Y" Range Scan (partial match: 1/2)
1 records fetched
1164 ms, 6702 read(s), 666918 fetch(es) -- elapsed time more than by 25% exceeds 2.5. Why ?
Table Natural Index
****************************************************
T1 333333
T2 1
RUN-2
WI-V2.5.5.26952:
selectcount(*)
from t2 b left join t1 a using(x)
where x=0
PLAN JOIN (B NATURAL, A INDEX (T1_X_Y)) -- natural here is EXPECTED because of LEFT join1 records fetched
2779 ms, 40516 read(s), 2001616 fetch(es)
Table Natural Index
****************************************************
T1 1000000
T2 20
WI-V3.0.0.32208:
selectcount(*)
from t2 b left join t1 a using(x)
where x=0Select Expression
-> Aggregate
-> Filter
-> Nested Loop Join (outer)
-> Table "T2"as"B" Full Scan
-> Filter
-> Table "T1"as"A" Access By ID
-> Bitmap
-> Index "T1_X_Y" Range Scan (partial match: 1/2)
1 records fetched
3651 ms, 20120 read(s), 2000840 fetch(es) -- elapsed time more than by 30% exceeds 2.5. Why ?
Table Natural Index
****************************************************
T1 1000000
T2 20
RUN-3
WI-V2.5.5.26952:
selectcount(*)
from t1 a join t2 b using(x)
where x =0
PLAN JOIN (A NATURAL, B INDEX (T2_X)) -- why NATURAL here ??1 records fetched
7304 ms, 13001 read(s), 6025990 fetch(es)
Table Natural Index
****************************************************
T1 1000000
T2 1000000
WI-V3.0.0.32208:
selectcount(*)
from t1 a join t2 b using(x)
where x =0Select Expression
-> Aggregate
-> Nested Loop Join (inner)
-> Table "T1"as"A" Full Scan -- why NATURAL here ??-> Filter
-> Table "T2"as"B" Access By ID
-> Bitmap
-> Index "T2_X" Range Scan (full match)
1 records fetched
9895 ms, 4420 read(s), 6012908 fetch(es) -- elapsed time here exceeds 2.5 more than by 35%. Why ?
Table Natural Index
****************************************************
T1 1000000
T2 1000000
PS. Adding prefix from alias in RUN-3:
select count(*)
from t1 a join t2 b using(x)
where b.x = 0;
turn index usage ON and plan will be: PLAN JOIN (B INDEX (T2_X), A INDEX (T1_X_Y)) -- i.e. most efficient for this query
PPS. This query:
select mod(i,3), i
from (
select gen_id(g,1) i
from rdb$types a, rdb$types b, (select1 i from rdb$types rows 20)
union allselect1from rdb$database where1=0-- "materialization": prevent from repeating evaluation gen_id() in the outer part of query
rows 1000000
);
DOES require rdb$types be aliased (as "a", "b") in 2.5 (otherwise get "alias RDB$TYPES conflicts with an alias in the same statement"), but not in 3.0. Is this expected behaviour?
The text was updated successfully, but these errors were encountered:
summary: Performance degradation on nested loops when joining big table "A" and small "B" on indexed field 'X' which is STARTING part of compound index {"X", "Y"} of "A". Natural scans occur if "X" is not prefixed by alias on join with USING() clause. => Sub-optimal plan for nested loops: fullscan occurs if column name is not prefixed by alias when joining with USING() clause
I'm not sure this is a bug, more or less "as designed". An implicit join replaces X (unqualified) with COALESCE(A.X, B.X) -- this is a handy trick to workaround "ambigous column reference" error. But COALESCE(A.X, B.X) cannot be evaluated for a single stream (be it A or B), hence a fullscan. The optimizer is not aware of this trick and cannot detect that A.X is actually equvalent to B.X inside COALESCE. And I don't see how it can be improved.
Submitted by: @pavel-zotov
Consider following script:
This script will produce table T1 with 1'000'000 rows and T2 with 20 rows, and their index statistics will be:
Now run these queries:
RUN-1
RUN-2
RUN-3
Then:
I got following data for mentioned queries:
RUN-1
WI-V2.5.5.26952:
WI-V3.0.0.32208:
RUN-2
WI-V2.5.5.26952:
WI-V3.0.0.32208:
RUN-3
WI-V2.5.5.26952:
WI-V3.0.0.32208:
PS. Adding prefix from alias in RUN-3:
turn index usage ON and plan will be:
PLAN JOIN (B INDEX (T2_X), A INDEX (T1_X_Y)) -- i.e. most efficient for this query
PPS. This query:
DOES require rdb$types be aliased (as "a", "b") in 2.5 (otherwise get "alias RDB$TYPES conflicts with an alias in the same statement"), but not in 3.0. Is this expected behaviour?
The text was updated successfully, but these errors were encountered: