Issue Details (XML | Word | Printable)

Key: CORE-1295
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Dmitry Yemanov
Reporter: Alexander Tyutik
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Firebird Core

Bad optimization of queries with DB_KEY

Created: 02/Jun/07 06:29 AM   Updated: 19/Jan/16 05:48 AM
Component/s: None
Affects Version/s: 2.0.0, 2.0.1, 2.1 Alpha 1
Fix Version/s: 3.0 Alpha 1, 2.5.3

Issue Links:
Duplicate
 
Relate

QA Status: Done successfully


 Description  « Hide
To reproduce the problem you need any table from your DB and next procedure:

CREATE PROCEDURE GET_DB_KEY
RETURNS (
    "DB_KEY" CHAR(8) CHARACTER SET OCTETS)
AS
BEGIN
  SELECT RDB$DB_KEY FROM YOUR_TABLE WHERE ID = 3 INTO :DB_KEY;
  SUSPEND;
END



Now good query with my test table called "Tree" (2 indexed reads):

----------------------------------------------------------------------------------------------------------------------

SELECT * FROM GET_DB_KEY G LEFT JOIN "Tree" T ON T.RDB$DB_KEY = G."DB_KEY"

PLAN JOIN (Tree INDEX (PK_Tree), T INDEX ())

----------------------------------------------------------------------------------------------------------------------


Bad queries:

----------------------------------------------------------------------------------------------------------------------

SELECT * FROM GET_DB_KEY G JOIN "Tree" T ON T.RDB$DB_KEY = G."DB_KEY"

PLAN JOIN (Tree INDEX (PK_Tree), T NATURAL)

----------------------------------------------------------------------------------------------------------------------

SELECT * FROM "Tree" WHERE RDB$DB_KEY = (SELECT "DB_KEY" FROM GET_DB_KEY)

PLAN (Tree INDEX (PK_Tree)) PLAN (Tree NATURAL)

----------------------------------------------------------------------------------------------------------------------

In both queries all records from table "Tree" were fetched

 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Pavel Cisar added a comment - 02/Jun/07 06:46 AM
I can't comprehend what you're trying to do here, as you're mixing apples and oranges in a way that doesn't make sense.

1. Retrieval via RDB$DB_KEY can't use any index, as it's meaningless . DB_KEY is "direct pointer" to the row, so retrieval is mere calculation of the location (data page + slot) and off you go.

2. Join of two different tables via their DB_KEY is pointless as they can't match from obvious reason (similar to comparison of two memory pointers to different objects for equality).

Alexander Tyutik added a comment - 02/Jun/07 06:59 AM
> I can't comprehend what you're trying to do here, as you're mixing apples and oranges in a way that doesn't make sense.

No. In my version of procedure get DB_KEY from table "Tree" too.

> Join of two different tables via their DB_KEY is pointless

I haven't join diffirent tables. Read above. Seems i explained problem not clear :-(

Here is my version of procedure:

CREATE PROCEDURE GET_DB_KEY
RETURNS (
    "DB_KEY" CHAR(8) CHARACTER SET OCTETS)
AS
BEGIN
  SELECT RDB$DB_KEY FROM "Tree" WHERE ID = 3 INTO :DB_KEY;
  SUSPEND;
END

and i wanted to say that you should use any existing table from your database instead of YOUR_TABLE. And then i wrote that i used table "Tree". You may use any other table, but should recompile procedure vith valid table name and ID value (or what you use instead of id).

Pavel Cisar added a comment - 02/Jun/07 07:29 AM
Ok, so you at least compare apples to apples, but that doesn't invalidate the first point, that retrieval via RDB$DB_KEY don't use indices by design, as DB_KEY is in fact a pointer. In fact, using index would be slower than direct calculation of the physical row location from DB_KEY.

Alexander Tyutik added a comment - 02/Jun/07 08:09 AM
Im my test sample there are 10 records in table "Tree".

Now compare next queries and plans:

=====================================================================

SELECT * FROM GET_DB_KEY G LEFT JOIN "Tree" T ON T.RDB$DB_KEY = G."DB_KEY";

PLAN JOIN (Tree INDEX (PK_Tree), T INDEX ())

2 index reads from "Tree" here

----------------------------------------------------------------------------------------------------------------------

SELECT * FROM GET_DB_KEY G JOIN "Tree" T ON T.RDB$DB_KEY = G."DB_KEY";

PLAN JOIN (Tree INDEX (PK_Tree), T NATURAL)

10 non index reads and 1 index read here

=====================================================================

EXECUTE BLOCK
RETURNS ( ID INTEGER )
AS
  DECLARE DBKEY CHAR(8) CHARACTER SET OCTETS;
BEGIN
  FOR SELECT "DB_KEY" FROM GET_DB_KEY INTO :DBKEY DO
  BEGIN
    SELECT ID FROM "Tree" T WHERE T.RDB$DB_KEY = :DBKEY INTO :ID;
    SUSPEND;
  END
END

PLAN (T INDEX ()) PLAN (Tree INDEX (PK_Tree))

2 index reads from "Tree" here

----------------------------------------------------------------------------------------------------------------------

SELECT ID FROM "Tree" WHERE RDB$DB_KEY = (SELECT "DB_KEY" FROM GET_DB_KEY)

PLAN (Tree INDEX (PK_Tree)) PLAN (Tree NATURAL)

10 non index and 1 index read frem "Tree" here

=====================================================================


You still don't see the problem?

Vlad Khorsun added a comment - 02/Jun/07 08:31 AM
Pavel, access via DB_KEY always shown in execution statistics as indexed read, IIRC

Pavel Cisar added a comment - 02/Jun/07 09:18 AM
Vlad, that makes sense :-) But as you can see, DB_KEY access method is reported inconsistently in human readable form of execution plan. It's either INDEX without index name or NATURAL (depends on the context). Don't know what would be an outcome of any attempt to specify explicit execution plan that involve access via DB_KEY in this situation. Maybe we can fill an improvement request to report DB_KEY access method via special keyword (DB_KEY for example). Anyway, it's a different matter than original improvement request.

Alexander Tyutik added a comment - 03/Jun/07 04:39 AM
Pavel, and what next? Bug marked as closed and no new requests appeared instead of it. And, in general, i didn't understand why don't you like original improvement request. It called "Bad optimization of queries with DB_KEY" and it is really so.

Pavel Cisar added a comment - 03/Jun/07 05:23 AM
Alexander, I closed your improvement request because there is nothing wrong with optimization of data retrieval via DB_KEY is it stands now. Current DB_KEY retrieval method is far much faster than retrieval with use of index (no index scan is needed at all, engine computes the exact row location from DB_KEY value). So from this point, your issue description and request is completely invalid, as there is simply no way how to make the DB_KEY retrieval faster or better.

If there is any issue with retrieval via DB_KEY, then it's in how this access method is reported in execution plan. But you must understand that it's rather a small glitch than real issue, as execution plan in "human readable form" is and always was an abbreviation of real plan used by engine. It shows only most important and common execution paths and methods in condensed form (some element like evals are stripped out).

So, if any improvement request should be made here, then against "RSB tree to text" conversion routine, not the optimizer itself or whatever. If you want to fill such improvement request, then feel free to do so, but I don't have the urge to do so myself.

Alexander Tyutik added a comment - 03/Jun/07 02:31 PM
> If there is any issue with retrieval via DB_KEY, then it's in how this access method is reported in execution plan.

Sorry, but it's not clear for me.

I asked server in my query to get me one record by one known db_key of that record. And server reported me that in some kind of queries it immediately fetched record by db_key, and in other queries it fetched all records of the table and after that it filtered each record comparing db_keys.

So, do you want to tell that server reported me incorrect amount of fetched records (it really fetched one record but shown 10 in statistics) or what?

It does not matter for me as for user what plan server generated. I don't like that it fetched all records in case when he should fetch only one. That's why for me fetching of all records of a table is a bad optimization.

Dmitry Yemanov added a comment - 15/Jun/07 03:17 AM
Pavel, I do see the optimization problem. Access via DBKEY is *always* reported as INDEX() -- without an index name -- but it's never reported as NATURAL. 10 non-indexed reads clearly show that the inner stream is scanned sequentially instead of involving a fast DBKEY lookup.

Dmitry Yemanov added a comment - 16/Jun/07 10:50 AM
This is a regression as compared with v1.5.x.

Dmitry Yemanov added a comment - 19/Feb/10 11:35 AM
Just to add a note: regression is only the last statement in the example (with a sub-query). And it looks fixed in v2.5. The second sample (inner join) works equally bad in the every FB version.

Dmitry Yemanov added a comment - 19/Feb/10 05:35 PM
One more example, slightly different this time:

select * from rdb$relations
where rdb$db_key = ? and rdb$relation_id = 0

select * from rdb$relations
where rdb$db_key = ? and rdb$relation_name = 'RDB$RELATIONS'

I'd expect to see the plan using DBKEY only, not an index (even primary) or a bitmap involving two predicates.

Dmitry Yemanov added a comment - 27/Nov/12 05:37 PM
Backported into v2.5.3.

Pavel Cisar added a comment - 25/Sep/15 12:25 PM
Test created.