New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bad optimization of queries with DB_KEY [CORE1295] #1716
Comments
Commented by: @pcisar I can't comprehend what you're trying to do here, as you're mixing apples and oranges in a way that doesn't make sense. 1. Retrieval via RDB$DB_KEY can't use any index, as it's meaningless . DB_KEY is "direct pointer" to the row, so retrieval is mere calculation of the location (data page + slot) and off you go. 2. Join of two different tables via their DB_KEY is pointless as they can't match from obvious reason (similar to comparison of two memory pointers to different objects for equality). |
Commented by: Alexander Tyutik (tut) > I can't comprehend what you're trying to do here, as you're mixing apples and oranges in a way that doesn't make sense. No. In my version of procedure get DB_KEY from table "Tree" too. > Join of two different tables via their DB_KEY is pointless I haven't join diffirent tables. Read above. Seems i explained problem not clear :-( Here is my version of procedure: CREATE PROCEDURE GET_DB_KEY and i wanted to say that you should use any existing table from your database instead of YOUR_TABLE. And then i wrote that i used table "Tree". You may use any other table, but should recompile procedure vith valid table name and ID value (or what you use instead of id). |
Commented by: @pcisar Ok, so you at least compare apples to apples, but that doesn't invalidate the first point, that retrieval via RDB$DB_KEY don't use indices by design, as DB_KEY is in fact a pointer. In fact, using index would be slower than direct calculation of the physical row location from DB_KEY. |
Modified by: @pcisarstatus: Resolved [ 5 ] => Closed [ 6 ] |
Commented by: Alexander Tyutik (tut) Im my test sample there are 10 records in table "Tree". Now compare next queries and plans: ===================================================================== SELECT * FROM GET_DB_KEY G LEFT JOIN "Tree" T ON T.RDB$DB_KEY = G."DB_KEY"; PLAN JOIN (Tree INDEX (PK_Tree), T INDEX ()) 2 index reads from "Tree" here ---------------------------------------------------------------------------------------------------------------------- SELECT * FROM GET_DB_KEY G JOIN "Tree" T ON T.RDB$DB_KEY = G."DB_KEY"; PLAN JOIN (Tree INDEX (PK_Tree), T NATURAL) 10 non index reads and 1 index read here ===================================================================== EXECUTE BLOCK PLAN (T INDEX ()) PLAN (Tree INDEX (PK_Tree)) 2 index reads from "Tree" here ---------------------------------------------------------------------------------------------------------------------- SELECT ID FROM "Tree" WHERE RDB$DB_KEY = (SELECT "DB_KEY" FROM GET_DB_KEY) PLAN (Tree INDEX (PK_Tree)) PLAN (Tree NATURAL) 10 non index and 1 index read frem "Tree" here ===================================================================== You still don't see the problem? |
Commented by: @hvlad Pavel, access via DB_KEY always shown in execution statistics as indexed read, IIRC |
Commented by: @pcisar Vlad, that makes sense :-) But as you can see, DB_KEY access method is reported inconsistently in human readable form of execution plan. It's either INDEX without index name or NATURAL (depends on the context). Don't know what would be an outcome of any attempt to specify explicit execution plan that involve access via DB_KEY in this situation. Maybe we can fill an improvement request to report DB_KEY access method via special keyword (DB_KEY for example). Anyway, it's a different matter than original improvement request. |
Commented by: Alexander Tyutik (tut) Pavel, and what next? Bug marked as closed and no new requests appeared instead of it. And, in general, i didn't understand why don't you like original improvement request. It called "Bad optimization of queries with DB_KEY" and it is really so. |
Commented by: @pcisar Alexander, I closed your improvement request because there is nothing wrong with optimization of data retrieval via DB_KEY is it stands now. Current DB_KEY retrieval method is far much faster than retrieval with use of index (no index scan is needed at all, engine computes the exact row location from DB_KEY value). So from this point, your issue description and request is completely invalid, as there is simply no way how to make the DB_KEY retrieval faster or better. If there is any issue with retrieval via DB_KEY, then it's in how this access method is reported in execution plan. But you must understand that it's rather a small glitch than real issue, as execution plan in "human readable form" is and always was an abbreviation of real plan used by engine. It shows only most important and common execution paths and methods in condensed form (some element like evals are stripped out). So, if any improvement request should be made here, then against "RSB tree to text" conversion routine, not the optimizer itself or whatever. If you want to fill such improvement request, then feel free to do so, but I don't have the urge to do so myself. |
Commented by: Alexander Tyutik (tut) > If there is any issue with retrieval via DB_KEY, then it's in how this access method is reported in execution plan. Sorry, but it's not clear for me. I asked server in my query to get me one record by one known db_key of that record. And server reported me that in some kind of queries it immediately fetched record by db_key, and in other queries it fetched all records of the table and after that it filtered each record comparing db_keys. So, do you want to tell that server reported me incorrect amount of fetched records (it really fetched one record but shown 10 in statistics) or what? It does not matter for me as for user what plan server generated. I don't like that it fetched all records in case when he should fetch only one. That's why for me fetching of all records of a table is a bad optimization. |
Commented by: @dyemanov Pavel, I do see the optimization problem. Access via DBKEY is *always* reported as INDEX() -- without an index name -- but it's never reported as NATURAL. 10 non-indexed reads clearly show that the inner stream is scanned sequentially instead of involving a fast DBKEY lookup. |
Commented by: @dyemanov This is a regression as compared with v1.5.x. |
Modified by: @dyemanovstatus: Closed [ 6 ] => Reopened [ 4 ] assignee: Dmitry Yemanov [ dimitr ] resolution: Won't Fix [ 2 ] => |
Modified by: @pcisarWorkflow: jira [ 12260 ] => Firebird [ 15596 ] |
Modified by: @dyemanovissuetype: Improvement [ 4 ] => Bug [ 1 ] |
Modified by: @dyemanovstatus: Reopened [ 4 ] => In Progress [ 3 ] |
Commented by: @dyemanov Just to add a note: regression is only the last statement in the example (with a sub-query). And it looks fixed in v2.5. The second sample (inner join) works equally bad in the every FB version. |
Commented by: @dyemanov One more example, slightly different this time: select * from rdb$relations select * from rdb$relations I'd expect to see the plan using DBKEY only, not an index (even primary) or a bitmap involving two predicates. |
Modified by: @dyemanovstatus: In Progress [ 3 ] => Open [ 1 ] |
Modified by: @dyemanovstatus: Open [ 1 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 3.0 Alpha 1 [ 10331 ] |
Commented by: @dyemanov Backported into v2.5.3. |
Modified by: @dyemanovFix Version: 2.5.3 [ 10461 ] |
Commented by: @pcisar Test created. |
Modified by: @pcisarstatus: Resolved [ 5 ] => Closed [ 6 ] |
Modified by: @pavel-zotovQA Status: No test |
Modified by: @pavel-zotovQA Status: No test => Done successfully |
Submitted by: Alexander Tyutik (tut)
Is duplicated by CORE3997
Is related to QA509
Relate to CORE4530
To reproduce the problem you need any table from your DB and next procedure:
CREATE PROCEDURE GET_DB_KEY
RETURNS (
"DB_KEY" CHAR(8) CHARACTER SET OCTETS)
AS
BEGIN
SELECT RDB$DB_KEY FROM YOUR_TABLE WHERE ID = 3 INTO :DB_KEY;
SUSPEND;
END
Now good query with my test table called "Tree" (2 indexed reads):
----------------------------------------------------------------------------------------------------------------------
SELECT * FROM GET_DB_KEY G LEFT JOIN "Tree" T ON T.RDB$DB_KEY = G."DB_KEY"
PLAN JOIN (Tree INDEX (PK_Tree), T INDEX ())
----------------------------------------------------------------------------------------------------------------------
Bad queries:
----------------------------------------------------------------------------------------------------------------------
SELECT * FROM GET_DB_KEY G JOIN "Tree" T ON T.RDB$DB_KEY = G."DB_KEY"
PLAN JOIN (Tree INDEX (PK_Tree), T NATURAL)
----------------------------------------------------------------------------------------------------------------------
SELECT * FROM "Tree" WHERE RDB$DB_KEY = (SELECT "DB_KEY" FROM GET_DB_KEY)
PLAN (Tree INDEX (PK_Tree)) PLAN (Tree NATURAL)
----------------------------------------------------------------------------------------------------------------------
In both queries all records from table "Tree" were fetched
Commits: 134d923 57f2da4 bebc026 a1d9ca6 b07ef45 c7853fa FirebirdSQL/fbt-repository@a8ad99b
The text was updated successfully, but these errors were encountered: