Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sub-optimal plan for nested loops: fullscan occurs if column name is not prefixed by alias when joining with USING() clause [CORE5040] #5327

Open
firebird-automations opened this issue Dec 9, 2015 · 2 comments

Comments

@firebird-automations
Copy link
Collaborator

firebird-automations commented Dec 9, 2015

Submitted by: @pavel-zotov

Consider following script:

--create sequence g;
alter sequence g restart with 0;
recreate table t1(x int, y int);
recreate table t2(x int);

insert into t1
select mod(i,3), i
from (
    select gen_id(g,1) i
    from rdb$types a, rdb$types b, (select 1 i from rdb$types rows 20)
    union all
    select 1 from rdb$database where 1=0 -- "materialization": prevent from repeating evaluation gen_id() in the outer part of query
    rows 1000000
);

insert into t2
select mod(i, 20)
from (
    select gen_id(g,1) i
    from rdb$types
    union all
    select 1 from rdb$database where 1=0
    rows 20
);
commit;

create index t1_x_y on t1(x, y);
create index t2_x on t2(x);
commit;

This script will produce table T1 with 1'000'000 rows and T2 with 20 rows, and their index statistics will be:

SQL> set list on; select rdb$relation_name,rdb$index_name,rdb$statistics from rdb$indices where rdb$relation_name in ('T1','T2');

RDB$RELATION_NAME               T1
RDB$INDEX_NAME                  T1_X_Y
RDB$STATISTICS                  9\.999999974752427e-07

RDB$RELATION_NAME               T2
RDB$INDEX_NAME                  T2_X
RDB$STATISTICS                  0.05000000074505806

Now run these queries:

RUN-1

select count(*)
from t1 a join t2 b on a.x = b.x
where b.x = 0;

RUN-2

select count(*)
from t2 b left join t1 a using(x)
where x=0; -- yes, unqualified (neither  "a" nor "b" prefix here)

RUN-3

select count(*)
from t1 a join t2 b using(x)
where x = 0; -- yes, unqualified (neither  "a" nor "b" prefix here)

Then:

  1. Compare plan of RUN-1 and RUN-2 & RUN-3: the 2nd and 3rd will use NATURAL reads which leads to poor performance
  2. Compare performance statistics on 2.5 and 3.0: the former WINS

I got following data for mentioned queries:

RUN-1

WI-V2.5.5.26952:

select count(*)
from t1 a join t2 b on a.x = b.x
where b.x = 0

PLAN JOIN (B INDEX (T2_X), A INDEX (T1_X_Y))
1 records fetched
    919 ms, 13500 read(s), 667169 fetch(es)

Table                             Natural     Index
****************************************************
T1                                           333333
T2                                                1

WI-V3.0.0.32208:

select count(*)
from t1 a join t2 b on a.x = b.x
where b.x = 0

Select Expression
    -> Aggregate
        ->  Nested Loop Join (inner)
            -> Filter
                -> Table "T2" as "B" Access By ID
                    -> Bitmap
                        -> Index "T2_X" Range Scan (full match)
            -> Filter
                -> Table "T1" as "A" Access By ID
                    -> Bitmap
                        -> Index "T1_X_Y" Range Scan (partial match: 1/2)
1 records fetched
   1164 ms, 6702 read(s), 666918 fetch(es) -- elapsed time more than by 25% exceeds 2.5. Why ?

Table                             Natural     Index
****************************************************
T1                                           333333
T2                                                1

RUN-2

WI-V2.5.5.26952:

select count(*)
from t2 b left join t1 a using(x)
where x=0

PLAN JOIN (B NATURAL, A INDEX (T1_X_Y)) -- natural here is EXPECTED because of LEFT join
1 records fetched
   2779 ms, 40516 read(s), 2001616 fetch(es)

Table                             Natural     Index
****************************************************
T1                                          1000000
T2                                     20

WI-V3.0.0.32208:

select count(*)
from t2 b left join t1 a using(x)
where x=0

Select Expression
    -> Aggregate
        -> Filter
            ->  Nested Loop Join (outer)
                -> Table "T2" as "B" Full Scan
                -> Filter
                    -> Table "T1" as "A" Access By ID
                        -> Bitmap
                            -> Index "T1_X_Y" Range Scan (partial match: 1/2)
1 records fetched
   3651 ms, 20120 read(s), 2000840 fetch(es) -- elapsed time more than by 30% exceeds 2.5. Why ?

Table                             Natural     Index
****************************************************
T1                                          1000000
T2                                     20

RUN-3

WI-V2.5.5.26952:

select count(*)
from t1 a join t2 b using(x)
where x = 0

PLAN JOIN (A NATURAL, B INDEX (T2_X)) -- why NATURAL here ??
1 records fetched
   7304 ms, 13001 read(s), 6025990 fetch(es)

Table                             Natural     Index
****************************************************
T1                                1000000
T2                                          1000000

WI-V3.0.0.32208:

select count(*)
from t1 a join t2 b using(x)
where x = 0

Select Expression
    -> Aggregate
        ->  Nested Loop Join (inner)
            -> Table "T1" as "A" Full Scan -- why NATURAL here ??
            -> Filter
                -> Table "T2" as "B" Access By ID
                    -> Bitmap
                        -> Index "T2_X" Range Scan (full match)
1 records fetched
   9895 ms, 4420 read(s), 6012908 fetch(es) -- elapsed time here exceeds 2.5 more than by 35%. Why ?

Table                             Natural     Index
****************************************************
T1                                1000000
T2                                          1000000

PS. Adding prefix from alias in RUN-3:

select count(*)
from t1 a join t2 b using(x)
where b.x = 0;

turn index usage ON and plan will be:
PLAN JOIN (B INDEX (T2_X), A INDEX (T1_X_Y)) -- i.e. most efficient for this query

PPS. This query:

select mod(i,3), i
from (
    select gen_id(g,1) i
    from rdb$types a, rdb$types b, (select 1 i from rdb$types rows 20)
    union all
    select 1 from rdb$database where 1=0 -- "materialization": prevent from repeating evaluation gen_id() in the outer part of query
    rows 1000000
);

DOES require rdb$types be aliased (as "a", "b") in 2.5 (otherwise get "alias RDB$TYPES conflicts with an alias in the same statement"), but not in 3.0. Is this expected behaviour?

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

summary: Performance degradation on nested loops when joining big table "A" and small "B" on indexed field 'X' which is STARTING part of compound index {"X", "Y"} of "A". Natural scans occur if "X" is not prefixed by alias on join with USING() clause. => Sub-optimal plan for nested loops: fullscan occurs if column name is not prefixed by alias when joining with USING() clause

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

I'm not sure this is a bug, more or less "as designed". An implicit join replaces X (unqualified) with COALESCE(A.X, B.X) -- this is a handy trick to workaround "ambigous column reference" error. But COALESCE(A.X, B.X) cannot be evaluated for a single stream (be it A or B), hence a fullscan. The optimizer is not aware of this trick and cannot detect that A.X is actually equvalent to B.X inside COALESCE. And I don't see how it can be improved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant