Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad optimization of queries with DB_KEY [CORE1295] #1716

Closed
firebird-automations opened this issue Jun 2, 2007 · 31 comments
Closed

Bad optimization of queries with DB_KEY [CORE1295] #1716

firebird-automations opened this issue Jun 2, 2007 · 31 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Alexander Tyutik (tut)

Is duplicated by CORE3997
Is related to QA509
Relate to CORE4530

To reproduce the problem you need any table from your DB and next procedure:

CREATE PROCEDURE GET_DB_KEY
RETURNS (
"DB_KEY" CHAR(8) CHARACTER SET OCTETS)
AS
BEGIN
SELECT RDB$DB_KEY FROM YOUR_TABLE WHERE ID = 3 INTO :DB_KEY;
SUSPEND;
END

Now good query with my test table called "Tree" (2 indexed reads):

----------------------------------------------------------------------------------------------------------------------

SELECT * FROM GET_DB_KEY G LEFT JOIN "Tree" T ON T.RDB$DB_KEY = G."DB_KEY"

PLAN JOIN (Tree INDEX (PK_Tree), T INDEX ())

----------------------------------------------------------------------------------------------------------------------

Bad queries:

----------------------------------------------------------------------------------------------------------------------

SELECT * FROM GET_DB_KEY G JOIN "Tree" T ON T.RDB$DB_KEY = G."DB_KEY"

PLAN JOIN (Tree INDEX (PK_Tree), T NATURAL)

----------------------------------------------------------------------------------------------------------------------

SELECT * FROM "Tree" WHERE RDB$DB_KEY = (SELECT "DB_KEY" FROM GET_DB_KEY)

PLAN (Tree INDEX (PK_Tree)) PLAN (Tree NATURAL)

----------------------------------------------------------------------------------------------------------------------

In both queries all records from table "Tree" were fetched

Commits: 134d923 57f2da4 bebc026 a1d9ca6 b07ef45 c7853fa FirebirdSQL/fbt-repository@a8ad99b

@firebird-automations
Copy link
Collaborator Author

Commented by: @pcisar

I can't comprehend what you're trying to do here, as you're mixing apples and oranges in a way that doesn't make sense.

1. Retrieval via RDB$DB_KEY can't use any index, as it's meaningless . DB_KEY is "direct pointer" to the row, so retrieval is mere calculation of the location (data page + slot) and off you go.

2. Join of two different tables via their DB_KEY is pointless as they can't match from obvious reason (similar to comparison of two memory pointers to different objects for equality).

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Won't Fix [ 2 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Alexander Tyutik (tut)

> I can't comprehend what you're trying to do here, as you're mixing apples and oranges in a way that doesn't make sense.

No. In my version of procedure get DB_KEY from table "Tree" too.

> Join of two different tables via their DB_KEY is pointless

I haven't join diffirent tables. Read above. Seems i explained problem not clear :-(

Here is my version of procedure:

CREATE PROCEDURE GET_DB_KEY
RETURNS (
"DB_KEY" CHAR(8) CHARACTER SET OCTETS)
AS
BEGIN
SELECT RDB$DB_KEY FROM "Tree" WHERE ID = 3 INTO :DB_KEY;
SUSPEND;
END

and i wanted to say that you should use any existing table from your database instead of YOUR_TABLE. And then i wrote that i used table "Tree". You may use any other table, but should recompile procedure vith valid table name and ID value (or what you use instead of id).

@firebird-automations
Copy link
Collaborator Author

Commented by: @pcisar

Ok, so you at least compare apples to apples, but that doesn't invalidate the first point, that retrieval via RDB$DB_KEY don't use indices by design, as DB_KEY is in fact a pointer. In fact, using index would be slower than direct calculation of the physical row location from DB_KEY.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Alexander Tyutik (tut)

Im my test sample there are 10 records in table "Tree".

Now compare next queries and plans:

=====================================================================

SELECT * FROM GET_DB_KEY G LEFT JOIN "Tree" T ON T.RDB$DB_KEY = G."DB_KEY";

PLAN JOIN (Tree INDEX (PK_Tree), T INDEX ())

2 index reads from "Tree" here

----------------------------------------------------------------------------------------------------------------------

SELECT * FROM GET_DB_KEY G JOIN "Tree" T ON T.RDB$DB_KEY = G."DB_KEY";

PLAN JOIN (Tree INDEX (PK_Tree), T NATURAL)

10 non index reads and 1 index read here

=====================================================================

EXECUTE BLOCK
RETURNS ( ID INTEGER )
AS
DECLARE DBKEY CHAR(8) CHARACTER SET OCTETS;
BEGIN
FOR SELECT "DB_KEY" FROM GET_DB_KEY INTO :DBKEY DO
BEGIN
SELECT ID FROM "Tree" T WHERE T.RDB$DB_KEY = :DBKEY INTO :ID;
SUSPEND;
END
END

PLAN (T INDEX ()) PLAN (Tree INDEX (PK_Tree))

2 index reads from "Tree" here

----------------------------------------------------------------------------------------------------------------------

SELECT ID FROM "Tree" WHERE RDB$DB_KEY = (SELECT "DB_KEY" FROM GET_DB_KEY)

PLAN (Tree INDEX (PK_Tree)) PLAN (Tree NATURAL)

10 non index and 1 index read frem "Tree" here

=====================================================================

You still don't see the problem?

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Pavel, access via DB_KEY always shown in execution statistics as indexed read, IIRC

@firebird-automations
Copy link
Collaborator Author

Commented by: @pcisar

Vlad, that makes sense :-) But as you can see, DB_KEY access method is reported inconsistently in human readable form of execution plan. It's either INDEX without index name or NATURAL (depends on the context). Don't know what would be an outcome of any attempt to specify explicit execution plan that involve access via DB_KEY in this situation. Maybe we can fill an improvement request to report DB_KEY access method via special keyword (DB_KEY for example). Anyway, it's a different matter than original improvement request.

@firebird-automations
Copy link
Collaborator Author

Commented by: Alexander Tyutik (tut)

Pavel, and what next? Bug marked as closed and no new requests appeared instead of it. And, in general, i didn't understand why don't you like original improvement request. It called "Bad optimization of queries with DB_KEY" and it is really so.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pcisar

Alexander, I closed your improvement request because there is nothing wrong with optimization of data retrieval via DB_KEY is it stands now. Current DB_KEY retrieval method is far much faster than retrieval with use of index (no index scan is needed at all, engine computes the exact row location from DB_KEY value). So from this point, your issue description and request is completely invalid, as there is simply no way how to make the DB_KEY retrieval faster or better.

If there is any issue with retrieval via DB_KEY, then it's in how this access method is reported in execution plan. But you must understand that it's rather a small glitch than real issue, as execution plan in "human readable form" is and always was an abbreviation of real plan used by engine. It shows only most important and common execution paths and methods in condensed form (some element like evals are stripped out).

So, if any improvement request should be made here, then against "RSB tree to text" conversion routine, not the optimizer itself or whatever. If you want to fill such improvement request, then feel free to do so, but I don't have the urge to do so myself.

@firebird-automations
Copy link
Collaborator Author

Commented by: Alexander Tyutik (tut)

> If there is any issue with retrieval via DB_KEY, then it's in how this access method is reported in execution plan.

Sorry, but it's not clear for me.

I asked server in my query to get me one record by one known db_key of that record. And server reported me that in some kind of queries it immediately fetched record by db_key, and in other queries it fetched all records of the table and after that it filtered each record comparing db_keys.

So, do you want to tell that server reported me incorrect amount of fetched records (it really fetched one record but shown 10 in statistics) or what?

It does not matter for me as for user what plan server generated. I don't like that it fetched all records in case when he should fetch only one. That's why for me fetching of all records of a table is a bad optimization.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Pavel, I do see the optimization problem. Access via DBKEY is *always* reported as INDEX() -- without an index name -- but it's never reported as NATURAL. 10 non-indexed reads clearly show that the inner stream is scanned sequentially instead of involving a fast DBKEY lookup.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

This is a regression as compared with v1.5.x.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Closed [ 6 ] => Reopened [ 4 ]

assignee: Dmitry Yemanov [ dimitr ]

resolution: Won't Fix [ 2 ] =>

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Version: 2.0.1 [ 10090 ]

Version: 2.0.0 [ 10091 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

Workflow: jira [ 12260 ] => Firebird [ 15596 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

issuetype: Improvement [ 4 ] => Bug [ 1 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Reopened [ 4 ] => In Progress [ 3 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Just to add a note: regression is only the last statement in the example (with a sub-query). And it looks fixed in v2.5. The second sample (inner join) works equally bad in the every FB version.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

One more example, slightly different this time:

select * from rdb$relations
where rdb$db_key = ? and rdb$relation_id = 0

select * from rdb$relations
where rdb$db_key = ? and rdb$relation_name = 'RDB$RELATIONS'

I'd expect to see the plan using DBKEY only, not an index (even primary) or a bitmap involving two predicates.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: In Progress [ 3 ] => Open [ 1 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0 Alpha 1 [ 10331 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Link: This issue is duplicated by CORE3997 [ CORE3997 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Backported into v2.5.3.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Fix Version: 2.5.3 [ 10461 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

Link: This issue is related to QA509 [ QA509 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Link: This issue relate to CORE4530 [ CORE4530 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @pcisar

Test created.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

QA Status: No test

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

QA Status: No test => Done successfully

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment