Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make optimizer to consider ORDER BY optimization when making decision about join order [CORE1482] #1898

Closed
firebird-automations opened this issue Sep 28, 2007 · 21 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: @livius2

Relate to CORE4428
Is related to QA628

Votes: 2

When only limited number of rows is requested (using FIRST/ROW) from ordered set, fetching records in index order is preferable to sort. Optimizer should prefer join order that will allow using navigational index.

Just an idea: Introducing fixed cardinality of result set forced by FIRST/ROWS into calculations may solve this issue naturally.

Original description:
i have problem with Firebird 1.5.4 and Select First

i do something like that

select
FIRST 1
*
FROM
HIST_DB H
INNER JOIN ADRESY_DB A ON A.ID=H.ID_OBJ
ORDER BY http://H.ID ASC

PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))

and HIST_DB have 1 000 000 records
ADRESY_DB have 100 000 records

this query run in 10 minutes!

plan should be like this
PLAN JOIN (H ORDER PK_HIST_DB__ID,A INDEX (PK_ADRESY_DB__ID))

Commits: 15113ae FirebirdSQL/fbt-repository@2f66134

====== Test Details ======

in 2.1 and 2.5: PLAN SORT (JOIN (M NATURAL, D INDEX (THORSES_COLOR_ID))) -- doesn`t see 'rows 1'.
in 3.0 plan should CHANGE and take in account 'rows N' limit.

@firebird-automations
Copy link
Collaborator Author

Modified by: @livius2

description: i have problem with Firebird 1.5.4 and Select First

i do something like that

SELECT FIRST 10 ID, NAME FROM TABLEX

and tableX have 1 000 000 records

this query run in 10 minutes!

definition of table

"ID" INTEGER CONSTRAINT "NK_HIST_DB__ID" NOT NULL,
"LICZNIK" VARCHAR(20),
"REKORD" VARCHAR(20),
"OBIEKT" VARCHAR(10),
"KOD" VARCHAR(10),
"SYSTEM" VARCHAR(5),
"KANAL" VARCHAR(2),
"DATAP" DATE,
"CZASP" VARCHAR(8),
"DATAZ" DATE,
"CZASZ" VARCHAR(8),
"KODELEM" VARCHAR(20),
"ELEMENT" VARCHAR(50),
"STAN" VARCHAR(50),
"ALARM" VARCHAR(15),
"ZRODLO" VARCHAR(10),
"PRIORYTET" VARCHAR(1),
"ZNACZNIK" VARCHAR(1),
"RAPORT" VARCHAR(20),
"RET1" VARCHAR(4),
"RET2" VARCHAR(4),
"RET3" VARCHAR(4),
"RET4" VARCHAR(4),
"RET5" VARCHAR(4),
"RODZAJ" VARCHAR(20),
"RODZAJSTAN" VARCHAR(20),
"ID_OBJ" INTEGER,
CONSTRAINT "PK_TABLE_X__ID" PRIMARY KEY ("ID")

=>

i have problem with Firebird 1.5.4 and Select First

i do something like that

select
FIRST 1
*
FROM
HIST_DB H
INNER JOIN ADRESY_DB A ON A.ID=H.ID_OBJ
ORDER BY http://H.ID ASC

PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))

and HIST_DB have 1 000 000 records
ADRESY_DB have 100 000 records

this query run in 10 minutes!

@firebird-automations
Copy link
Collaborator Author

Commented by: @livius2

I found better plan
query optimizer should return this plan

@firebird-automations
Copy link
Collaborator Author

Modified by: @livius2

description: i have problem with Firebird 1.5.4 and Select First

i do something like that

select
FIRST 1
*
FROM
HIST_DB H
INNER JOIN ADRESY_DB A ON A.ID=H.ID_OBJ
ORDER BY http://H.ID ASC

PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))

and HIST_DB have 1 000 000 records
ADRESY_DB have 100 000 records

this query run in 10 minutes!

=>

i have problem with Firebird 1.5.4 and Select First

i do something like that

select
FIRST 1
*
FROM
HIST_DB H
INNER JOIN ADRESY_DB A ON A.ID=H.ID_OBJ
ORDER BY http://H.ID ASC

PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))

and HIST_DB have 1 000 000 records
ADRESY_DB have 100 000 records

this query run in 10 minutes!

plan should be like this
PLAN JOIN (H ORDER PK_HIST_DB__ID,A INDEX (PK_ADRESY_DB__ID))

summary: SELECT FIRST and big table - geting few records take long time => SELECT FIRST and big table - geting few records take long time - wrong plan

@firebird-automations
Copy link
Collaborator Author

Commented by: @pcisar

There is nothing to improve in engine here. The fact that you want to get only one row is irrelevant, because you asked for row from ordered set, so engine has to process the whole result set from your poor query (inner join of big tables without constraints that has to be processed as natural scan of 100k rows joining X rows from 1m rows) and then sort it to find the row you wanted. It should be order of magnitude faster without ORDER BY.

The execution plan you wanted could be hardly created. Query execution is a waterfall-like process defined in trees, but to achieve your desired plan, optimizer has to internally merge different adjacent operations from different levels into one and re-optimize the result to see whether resulted plan is better. This would require one hell of complex optimizer that may get better plans in few special cases, but would otherwise just waste cycles.

Next time consult firebird-support or firebird-devel list before making pointless tracker entries, please.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

status: Open [ 1 ] => Closed [ 6 ]

resolution: Won't Fix [ 2 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @livius2

as you can see this plan PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))
is good only when reserved word First or in FB2 ROWS not occure in query

but when thises words occure optimizer should use anodher algoritm
becaouse of that i write this as issue

i try this query in MSSQL with Top clause and this run in percent of milisecond

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Pavel, you're wrong here. The master stream (H) can be easily navigated via the index and row-by-row joined with the dependent stream (A). Moreover, the optimizer always chooses such a plan if possible, even if the FIRST clause is not specified.

So, I really don't know why this is not happening here. Karol, is index PK_HIST_DB__ID active? Do you have an up-to-date index statistics? Did you try this query on FB2 (after restoring as an ODS11 database)?

Anyway, I doubt this can be fixed in the v1.5 series, as there are no plans to release v1.5.5.

@firebird-automations
Copy link
Collaborator Author

Commented by: @livius2

yes - before i post this heare
i update statistics for this indexes and both are active
of course both are 1.0 - becaouse this is primary key in both tables

i will try this on FB2 but not today

@firebird-automations
Copy link
Collaborator Author

Commented by: @pcisar

Dmitry, it seems that you didn't get it right either :-) ORDER BY is applied to final result, i.e. optimization via navigational index is considered after join order is chosen and checked against it. Table cardinalities (with absence of constraints) lead optimizer to choose A->H join order that doesn't allow to use navigational index on H.

What Karol is asking for is to add a heuristic to the optimizer on condition when FIRST/ROWS + ORDER BY is used (it's pointless without FIRST/ROWS) to use (if possible, of course) join order that would allow navigational index without using standard join order calculation (it's only way to get to the desired H->A join order in this case). While it could be relatively easy to implement to get better plans for simple joins like this, it could quickly turn into nightmare on more complex ones. I'm not against adding heuristic extensions to basically cost-based optimizer per se, but this one would be quite complex (with pushing information down the stream) to get great plan for very special case. I think that's not worth the effort. For cases like this we have PLAN clause to force the desired plan.

@firebird-automations
Copy link
Collaborator Author

Commented by: @livius2

Pavel Cisar wrote: "For cases like this we have PLAN clause to force the desired plan. "

I known that plan clause can by used-
but to many places are now in Firebird to do this

i post this only to do Firebird better - now is very good - but can be better
i known that this is many work to do - but may by in future this should be done?

if optimizer spent one more ms to check this i suppose this is acceptable?
executing and fetching time is more important - i suppose

in real world systems - plan can not by explicitly specified (not always) becaouse users add something to where clause
by search form - and permanent plan clause can hang query execution

@firebird-automations
Copy link
Collaborator Author

Commented by: @pcisar

Ok, I have reopened this entry to keep the idea around for consideration. But don't expect it re-evaluated sooner than after 3.0 release.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

status: Closed [ 6 ] => Reopened [ 4 ]

resolution: Won't Fix [ 2 ] =>

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

priority: Major [ 3 ] => Minor [ 4 ]

Version: 2.0.3 [ 10200 ]

Version: 2.1.0 [ 10041 ]

description: i have problem with Firebird 1.5.4 and Select First

i do something like that

select
FIRST 1
*
FROM
HIST_DB H
INNER JOIN ADRESY_DB A ON A.ID=H.ID_OBJ
ORDER BY http://H.ID ASC

PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))

and HIST_DB have 1 000 000 records
ADRESY_DB have 100 000 records

this query run in 10 minutes!

plan should be like this
PLAN JOIN (H ORDER PK_HIST_DB__ID,A INDEX (PK_ADRESY_DB__ID))

=>

When only limited number of rows is requested (using FIRST/ROW) from ordered set, fetching records in index order is preferable to sort. Optimizer should prefer join order that will allow using navigational index.

Just an idea: Introducing fixed cardinality of result set forced by FIRST/ROWS into calculations may solve this issue naturally.

Original description:
i have problem with Firebird 1.5.4 and Select First

i do something like that

select
FIRST 1
*
FROM
HIST_DB H
INNER JOIN ADRESY_DB A ON A.ID=H.ID_OBJ
ORDER BY http://H.ID ASC

PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))

and HIST_DB have 1 000 000 records
ADRESY_DB have 100 000 records

this query run in 10 minutes!

plan should be like this
PLAN JOIN (H ORDER PK_HIST_DB__ID,A INDEX (PK_ADRESY_DB__ID))

summary: SELECT FIRST and big table - geting few records take long time - wrong plan => Make optimizer to consider ORDER BY optimization when making decision about join order

environment: WINDOWS 2003 SERVER on Server side and WinXP Prof on Client side =>

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

Workflow: jira [ 13190 ] => Firebird [ 13975 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Reopened [ 4 ] => In Progress [ 3 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: In Progress [ 3 ] => Open [ 1 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0 Beta 1 [ 10332 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Link: This issue relate to CORE4428 [ CORE4428 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: Done successfully

Test Details: in 2.1 and 2.5: PLAN SORT (JOIN (M NATURAL, D INDEX (THORSES_COLOR_ID))) -- doesn`t see 'rows 1'.
in 3.0 plan should CHANGE and take in account 'rows N' limit.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

Link: This issue is related to QA628 [ QA628 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment