Issue Details (XML | Word | Printable)

Key: CORE-1482
Type: Improvement Improvement
Status: Resolved Resolved
Resolution: Fixed
Priority: Minor Minor
Assignee: Dmitry Yemanov
Reporter: Karol Bieniaszewski
Votes: 2
Watchers: 2
Operations

If you were logged in you would be able to see more operations.
Firebird Core

Make optimizer to consider ORDER BY optimization when making decision about join order

Created: 28/Sep/07 03:44 AM   Updated: 28/Sep/15 01:09 PM
Component/s: Engine
Affects Version/s: 1.5.4, 2.0.3, 2.1.0
Fix Version/s: 3.0 Beta 1

Issue Links:
Relate

QA Status: Done successfully
Test Details:
in 2.1 and 2.5: PLAN SORT (JOIN (M NATURAL, D INDEX (THORSES_COLOR_ID))) -- doesn`t see 'rows 1'.
in 3.0 plan should CHANGE and take in account 'rows N' limit.


 Description  « Hide
When only limited number of rows is requested (using FIRST/ROW) from ordered set, fetching records in index order is preferable to sort. Optimizer should prefer join order that will allow using navigational index.

Just an idea: Introducing fixed cardinality of result set forced by FIRST/ROWS into calculations may solve this issue naturally.

Original description:
i have problem with Firebird 1.5.4 and Select First

i do something like that

select
FIRST 1
*
FROM
HIST_DB H
INNER JOIN ADRESY_DB A ON A.ID=H.ID_OBJ
ORDER BY H.ID ASC

PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))

and HIST_DB have 1 000 000 records
ADRESY_DB have 100 000 records

this query run in 10 minutes!

plan should be like this
PLAN JOIN (H ORDER PK_HIST_DB__ID,A INDEX (PK_ADRESY_DB__ID))



 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Karol Bieniaszewski made changes - 28/Sep/07 06:30 AM
Field Original Value New Value
Description i have problem with Firebird 1.5.4 and Select First

i do something like that

SELECT FIRST 10 ID, NAME FROM TABLEX

and tableX have 1 000 000 records

this query run in 10 minutes!

definition of table

  "ID" INTEGER CONSTRAINT "NK_HIST_DB__ID" NOT NULL,
  "LICZNIK" VARCHAR(20),
  "REKORD" VARCHAR(20),
  "OBIEKT" VARCHAR(10),
  "KOD" VARCHAR(10),
  "SYSTEM" VARCHAR(5),
  "KANAL" VARCHAR(2),
  "DATAP" DATE,
  "CZASP" VARCHAR(8),
  "DATAZ" DATE,
  "CZASZ" VARCHAR(8),
  "KODELEM" VARCHAR(20),
  "ELEMENT" VARCHAR(50),
  "STAN" VARCHAR(50),
  "ALARM" VARCHAR(15),
  "ZRODLO" VARCHAR(10),
  "PRIORYTET" VARCHAR(1),
  "ZNACZNIK" VARCHAR(1),
  "RAPORT" VARCHAR(20),
  "RET1" VARCHAR(4),
  "RET2" VARCHAR(4),
  "RET3" VARCHAR(4),
  "RET4" VARCHAR(4),
  "RET5" VARCHAR(4),
  "RODZAJ" VARCHAR(20),
  "RODZAJSTAN" VARCHAR(20),
  "ID_OBJ" INTEGER,
CONSTRAINT "PK_TABLE_X__ID" PRIMARY KEY ("ID")
i have problem with Firebird 1.5.4 and Select First

i do something like that

select
FIRST 1
*
FROM
HIST_DB H
INNER JOIN ADRESY_DB A ON A.ID=H.ID_OBJ
ORDER BY H.ID ASC

PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))

and HIST_DB have 1 000 000 records
ADRESY_DB have 100 000 records

this query run in 10 minutes!
Karol Bieniaszewski added a comment - 28/Sep/07 07:41 AM
I found better plan
query optimizer should return this plan

Karol Bieniaszewski made changes - 28/Sep/07 07:41 AM
Summary SELECT FIRST and big table - geting few records take long time SELECT FIRST and big table - geting few records take long time - wrong plan
Description i have problem with Firebird 1.5.4 and Select First

i do something like that

select
FIRST 1
*
FROM
HIST_DB H
INNER JOIN ADRESY_DB A ON A.ID=H.ID_OBJ
ORDER BY H.ID ASC

PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))

and HIST_DB have 1 000 000 records
ADRESY_DB have 100 000 records

this query run in 10 minutes!
i have problem with Firebird 1.5.4 and Select First

i do something like that

select
FIRST 1
*
FROM
HIST_DB H
INNER JOIN ADRESY_DB A ON A.ID=H.ID_OBJ
ORDER BY H.ID ASC

PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))

and HIST_DB have 1 000 000 records
ADRESY_DB have 100 000 records

this query run in 10 minutes!

plan should be like this
PLAN JOIN (H ORDER PK_HIST_DB__ID,A INDEX (PK_ADRESY_DB__ID))

Pavel Cisar added a comment - 28/Sep/07 07:48 AM - edited
There is nothing to improve in engine here. The fact that you want to get only one row is irrelevant, because you asked for row from ordered set, so engine has to process the whole result set from your poor query (inner join of big tables without constraints that has to be processed as natural scan of 100k rows joining X rows from 1m rows) and then sort it to find the row you wanted. It should be order of magnitude faster without ORDER BY.

The execution plan you wanted could be hardly created. Query execution is a waterfall-like process defined in trees, but to achieve your desired plan, optimizer has to internally merge different adjacent operations from different levels into one and re-optimize the result to see whether resulted plan is better. This would require one hell of complex optimizer that may get better plans in few special cases, but would otherwise just waste cycles.

Next time consult firebird-support or firebird-devel list before making pointless tracker entries, please.

Pavel Cisar made changes - 28/Sep/07 07:48 AM
Status Open [ 1 ] Closed [ 6 ]
Resolution Won't Fix [ 2 ]
Karol Bieniaszewski added a comment - 28/Sep/07 08:13 AM
as you can see this plan PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))
is good only when reserved word First or in FB2 ROWS not occure in query

but when thises words occure optimizer should use anodher algoritm
becaouse of that i write this as issue

i try this query in MSSQL with Top clause and this run in percent of milisecond

Dmitry Yemanov added a comment - 28/Sep/07 08:28 AM
Pavel, you're wrong here. The master stream (H) can be easily navigated via the index and row-by-row joined with the dependent stream (A). Moreover, the optimizer always chooses such a plan if possible, even if the FIRST clause is not specified.

So, I really don't know why this is not happening here. Karol, is index PK_HIST_DB__ID active? Do you have an up-to-date index statistics? Did you try this query on FB2 (after restoring as an ODS11 database)?

Anyway, I doubt this can be fixed in the v1.5 series, as there are no plans to release v1.5.5.

Karol Bieniaszewski added a comment - 28/Sep/07 08:41 AM - edited
yes - before i post this heare
i update statistics for this indexes and both are active
of course both are 1.0 - becaouse this is primary key in both tables

i will try this on FB2 but not today

Pavel Cisar added a comment - 28/Sep/07 11:40 AM
Dmitry, it seems that you didn't get it right either :-) ORDER BY is applied to final result, i.e. optimization via navigational index is considered after join order is chosen and checked against it. Table cardinalities (with absence of constraints) lead optimizer to choose A->H join order that doesn't allow to use navigational index on H.

What Karol is asking for is to add a heuristic to the optimizer on condition when FIRST/ROWS + ORDER BY is used (it's pointless without FIRST/ROWS) to use (if possible, of course) join order that would allow navigational index without using standard join order calculation (it's only way to get to the desired H->A join order in this case). While it could be relatively easy to implement to get better plans for simple joins like this, it could quickly turn into nightmare on more complex ones. I'm not against adding heuristic extensions to basically cost-based optimizer per se, but this one would be quite complex (with pushing information down the stream) to get great plan for very special case. I think that's not worth the effort. For cases like this we have PLAN clause to force the desired plan.

Karol Bieniaszewski added a comment - 01/Oct/07 05:31 AM
Pavel Cisar wrote: "For cases like this we have PLAN clause to force the desired plan. "

I known that plan clause can by used-
but to many places are now in Firebird to do this

i post this only to do Firebird better - now is very good - but can be better
i known that this is many work to do - but may by in future this should be done?

if optimizer spent one more ms to check this i suppose this is acceptable?
executing and fetching time is more important - i suppose

in real world systems - plan can not by explicitly specified (not always) becaouse users add something to where clause
by search form - and permanent plan clause can hang query execution

Pavel Cisar added a comment - 01/Oct/07 08:16 AM
Ok, I have reopened this entry to keep the idea around for consideration. But don't expect it re-evaluated sooner than after 3.0 release.

Pavel Cisar made changes - 01/Oct/07 08:16 AM
Resolution Won't Fix [ 2 ]
Status Closed [ 6 ] Reopened [ 4 ]
Pavel Cisar made changes - 01/Oct/07 08:25 AM
Environment WINDOWS 2003 SERVER on Server side and WinXP Prof on Client side
Summary SELECT FIRST and big table - geting few records take long time - wrong plan Make optimizer to consider ORDER BY optimization when making decision about join order
Affects Version/s 2.0.3 [ 10200 ]
Affects Version/s 2.1.0 [ 10041 ]
Description i have problem with Firebird 1.5.4 and Select First

i do something like that

select
FIRST 1
*
FROM
HIST_DB H
INNER JOIN ADRESY_DB A ON A.ID=H.ID_OBJ
ORDER BY H.ID ASC

PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))

and HIST_DB have 1 000 000 records
ADRESY_DB have 100 000 records

this query run in 10 minutes!

plan should be like this
PLAN JOIN (H ORDER PK_HIST_DB__ID,A INDEX (PK_ADRESY_DB__ID))

When only limited number of rows is requested (using FIRST/ROW) from ordered set, fetching records in index order is preferable to sort. Optimizer should prefer join order that will allow using navigational index.

Just an idea: Introducing fixed cardinality of result set forced by FIRST/ROWS into calculations may solve this issue naturally.

Original description:
i have problem with Firebird 1.5.4 and Select First

i do something like that

select
FIRST 1
*
FROM
HIST_DB H
INNER JOIN ADRESY_DB A ON A.ID=H.ID_OBJ
ORDER BY H.ID ASC

PLAN SORT (JOIN (A NATURAL,H INDEX (FK_HIST_DB__ID_OBJ)))

and HIST_DB have 1 000 000 records
ADRESY_DB have 100 000 records

this query run in 10 minutes!

plan should be like this
PLAN JOIN (H ORDER PK_HIST_DB__ID,A INDEX (PK_ADRESY_DB__ID))

Priority Major [ 3 ] Minor [ 4 ]
Dmitry Yemanov made changes - 14/Oct/07 08:17 AM
Assignee Dmitry Yemanov [ dimitr ]
Pavel Cisar made changes - 28/Jan/08 01:17 PM
Workflow jira [ 13190 ] Firebird [ 13975 ]
Dmitry Yemanov made changes - 06/Dec/13 05:24 PM
Status Reopened [ 4 ] In Progress [ 3 ]
Dmitry Yemanov made changes - 05/Jan/14 07:37 PM
Status In Progress [ 3 ] Open [ 1 ]
Dmitry Yemanov made changes - 05/Jan/14 07:40 PM
Status Open [ 1 ] Resolved [ 5 ]
Fix Version/s 3.0 Beta 1 [ 10332 ]
Resolution Fixed [ 1 ]
Dmitry Yemanov made changes - 15/May/14 09:50 AM
Link This issue relate to CORE-4428 [ CORE-4428 ]
Dmitry Yemanov made changes - 15/May/14 09:51 AM
Comment [ it works for order by ... rows, but for some reason does not apply to the min / max

select min(t1.id)
from t1
join t2 on t1.id = t2.t1_id;


Select Expression
    -> Aggregate
        -> Nested Loop Join (inner)
            -> Table "T2" Full Scan
            -> Filter
                -> Table "T1" Access By ID
                    -> Bitmap
                        -> Index "PK_T1" Unique Scan

         MIN
============
           1

Current memory = 11621440
Delta memory = 547712
Max memory = 11672536
Elapsed time= 3.302 sec
Buffers = 2048
Reads = 11421
Writes = 0
Fetches = 6026958 <-- no good

vs

select min(t2.id)
from t1
join t2 on t1.id = t2.t1_id;


Select Expression
    -> Aggregate
        -> Nested Loop Join (inner)
            -> Table "T2" Access By ID
                -> Index "PK_T2" Full Scan
            -> Filter
                -> Table "T1" Access By ID
                    -> Bitmap
                        -> Index "PK_T1" Unique Scan

         MIN
============
           1

Current memory = 11840168
Delta memory = 1672
Max memory = 11872248
Elapsed time= 0.058 sec
Buffers = 2048
Reads = 0
Writes = 0
Fetches = 38 <-- good

vs

same as 1 query, but much faster

select t1.id
from t1
join t2 on t1.id = t2.t1_id
order by t1.id rows 1;


Select Expression
    -> First N Records
        -> Nested Loop Join (inner)
            -> Table "T1" Access By ID
                -> Index "PK_T1" Full Scan
            -> Filter
                -> Table "T2" Access By ID
                    -> Bitmap
                        -> Index "FK_T2_T1" Range Scan (full match)

          ID
============
           1

Current memory = 11933608
Delta memory = 52288
Max memory = 11933760
Elapsed time= 0.062 sec
Buffers = 2048
Reads = 0
Writes = 0
Fetches = 137 <-- good
]
Pavel Zotov made changes - 28/May/15 03:01 PM
Status Resolved [ 5 ] Resolved [ 5 ]
Test Details in 2.1 and 2.5: PLAN SORT (JOIN (M NATURAL, D INDEX (THORSES_COLOR_ID))) -- doesn`t see 'rows 1'.
in 3.0 plan should CHANGE and take in account 'rows N' limit.
QA Status Done successfully
Pavel Cisar made changes - 28/Sep/15 01:09 PM
Link This issue is related to QA-628 [ QA-628 ]