Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow inner join vs. fast left join [CORE4666] #4977

Closed
firebird-automations opened this issue Jan 13, 2015 · 12 comments
Closed

Slow inner join vs. fast left join [CORE4666] #4977

firebird-automations opened this issue Jan 13, 2015 · 12 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Valdir Stiebe Junior (ogecrom)

Is related to CORE3707
Is related to CORE1686

Attachments:
teste.7z

Shouldn't the execution plan on the following 4 queries be the same?
On queries 1 and 3, there are way more non indexed reads.

On real data some queries like 1 and 3 take 5 minutes or more to execute while queries like 2 and 4 execute on less than few seconds.
Queries 2 and 3 looks like an ugly hack to circumvent a bug. But I found a lot of queries in our software that were rewritten on this fashion for optimization purposes.

1)
select *
from
V_TESTE_ED T inner join
V_SECOND_TABLE_ED S on S.ID_TESTE = T.ID_TESTE

PLAN JOIN (S SECOND_TABLE NATURAL, T TESTE INDEX (PK_TESTE))

2)
select *
from
V_TESTE_ED T left join
V_SECOND_TABLE_ED S on S.ID_TESTE = T.ID_TESTE
where
S.ID_SECOND_TABLE is not null

PLAN JOIN (T TESTE NATURAL, S SECOND_TABLE INDEX (FK_TESTE))

3)
select *
from
TESTE T inner join
SECOND_TABLE S on S.ID_TESTE = T.ID_TESTE

PLAN JOIN (S NATURAL, T INDEX (PK_TESTE))

4)
select *
from
TESTE T left join
SECOND_TABLE S on S.ID_TESTE = T.ID_TESTE
where
S.ID_SECOND_TABLE is not null

PLAN JOIN (T NATURAL, S INDEX (FK_TESTE))

====== Test Details ======

Data in test DB are skewed:

SQL> select id_teste,count(*) from second_table group by 1; --- child table
ID_TESTE COUNT
============ ============
0 12000

SQL> select count(*) from teste; -- parent table

   COUNT

============
5000

No sense in test until histograms will be implemented.

@firebird-automations
Copy link
Collaborator Author

Commented by: Valdir Stiebe Junior (ogecrom)

Attached the test database used for queries 1 to 4.

@firebird-automations
Copy link
Collaborator Author

Modified by: Valdir Stiebe Junior (ogecrom)

Attachment: teste.7z [ 12663 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Valdir Stiebe Junior (ogecrom)

Sorry, the SECOND_TABLE FK index have selectivity 1, greather than the PK index. But the problem ocurrs for any value greater than the PK index (all FK).

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

You cannot expect really good plans from the uneven distribution. SECOND_TABLE have all the records matching a single record inside TESTE.

@firebird-automations
Copy link
Collaborator Author

Commented by: Valdir Stiebe Junior (ogecrom)

Ok, please mark this issue as duplicated of CORE3707.
Your comment "I suppose this situation could be improved by assuming that any FK index has selectivity equal to 1 / master_cardinality (i.e. the same as the master PK selectivity)." explains it all.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

They're not duplicates, although related. CORE3707 is about zero (probably outdated) selectivities, so that the engine needs to guess its real value. In your case, selectivities are correct, but value distribution is uneven so the engine's assumption that every inner retrieval gets (cardinality * selectivity) rows is wrong. I don't think it can be fixed without value distribution histograms.

@firebird-automations
Copy link
Collaborator Author

Commented by: Valdir Stiebe Junior (ogecrom)

Got it. I will watch for CORE1686, then.

Just a hint for anyone who read this comments.
Instead of using a left join with where key is not null, for us, the preferable method of killing an index is adding zero on the join column.
Like this:

select *
from
V_TESTE_ED T inner join
V_SECOND_TABLE_ED S on S.ID_TESTE = T.ID_TESTE + 0

As seen on http://www.firebirdfaq.org/faq158/

@firebird-automations
Copy link
Collaborator Author

Modified by: Sean Leyne (seanleyne)

Link: This issue is related to CORE3707 [ CORE3707 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: Sean Leyne (seanleyne)

Link: This issue is related to CORE1686 [ CORE1686 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Duplicate [ 3 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Closed [ 6 ] => Closed [ 6 ]

QA Status: Deferred

Test Details: Data in test DB are skewed:

SQL> select id_teste,count(*) from second_table group by 1; --- child table
ID_TESTE COUNT
============ ============
0 12000

SQL> select count(*) from teste; -- parent table

   COUNT

============
5000

No sense in test until histograms will be implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant