Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve sorting performance when long VARCHARs are involved [CORE2650] #3057

Closed
firebird-automations opened this issue Sep 29, 2009 · 8 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: @pmakowski

Two tables :
CREATE TABLE TABLE1
(
ID INTEGER NOT NULL,
CHAMP1 VARCHAR( 32) COLLATE ISO8859_1,
TABLE2_ID INTEGER NOT NULL,
REF_REMPLACEMENT VARCHAR( 10000) COLLATE ISO8859_1,
CONSTRAINT PK_TABLE1 PRIMARY KEY (ID)
);
CREATE TABLE TABLE2
(
ID INTEGER NOT NULL,
CHAMP2 VARCHAR( 32) COLLATE ISO8859_1,
CONSTRAINT PK_TABLE2 PRIMARY KEY (ID)
);
CREATE ASC INDEX IDX_TABLE1_1 ON TABLE1 (CHAMP1);
CREATE ASC INDEX IDX_TABLE2_1 ON TABLE2 (CHAMP2);
ALTER TABLE TABLE1 ADD CONSTRAINT FK_TABLE1_1
FOREIGN KEY (TABLE2_ID) REFERENCES TABLE2
(ID)
ON DELETE NO ACTION
ON UPDATE CASCADE
;

The SP to fill tables :

CREATE PROCEDURE SP_GEN_DATA AS
declare i integer;
declare j integer;
begin
i=0;
j=0;
while(i<=15000)do
begin
while(j<=4) do
begin
j=j+1;
INSERT INTO table2 VALUES (:j,'TBL'||:j);

end
INSERT INTO Table1 VALUES (:i,'VAL'||:i,:j,NULL);

i=i+1;
end
end

and here the queries :

the first that is slow :

SELECT Table1.Champ1,Table1.REF_REMPLACEMENT,Table2.Champ2
FROM Table1 INNER JOIN Table2 ON Table1.Table2_ID=http://Table2.ID
ORDER BY Table1.Champ1

and two others that have good speed :

SELECT Table1.Champ1,Table2.Champ2
FROM Table1 INNER JOIN Table2 ON Table1.Table2_ID=http://Table2.ID
ORDER BY Table1.Champ1

WITH A (Champ1,Champ2,ID) AS
(SELECT Table1.Champ1, Table2.Champ2, http://Table1.ID
FROM Table1 INNER JOIN Table2 ON Table1.Table2_ID=http://Table2.ID
ORDER BY Table1.Champ1)
SELECT A.Champ1, Table1.REF_REMPLACEMENT,A.champ2
FROM A JOIN Table1 ON http://Table1.ID=A.ID

funny to see that the latest make twice more indexed reads on Table1 but is really faster that the initial one

Commits: 8e692d2

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

I'd say this is somewhat expected. Long varchar here is indirectly involved in the sorting -- we read all the necessary fields in a single pass (in the storage order) and then sort the records on disk. This is usually considered being cheaper than sorting only keys and then reading the whole table again (out of storage order) to fetch the remaining fields. But in fact, longer records in turn cause more sorting I/O and this may finally become more expensive than fetching all the records twice, as we see in this example.

I don't see any good solution here. If we'd implement two kinds of sorting (keys only and records), then it would be hard to estimate their costs in order to make a choice (much depends on whether the table is cached or not). Another option would be to compress the records in the sorting blocks thus reducing I/O but it would badly impact small (in-memory) sorts.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

issuetype: Bug [ 1 ] => Improvement [ 4 ]

summary: ORDER BY slowed down with a varchar not involved => Improve ORDER BY performance when long VARCHARs are involved

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

v4 will get this improved. If the optimizer sees a long sort record (usually due to long VARCHARs involved), it excludes all non-key fields from the sort. Once the sorting is done, records are re-fetched from corresponding tables. The threshold (what is considered long enough) is configurable.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

summary: Improve ORDER BY performance when long VARCHARs are involved => Improve sorting performance when long VARCHARs are involved

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 4.0 RC 1 [ 10930 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: Done successfully

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment