Improve sorting performance when long VARCHARs are involved [CORE2650] #3057
Comments
Commented by: @dyemanov I'd say this is somewhat expected. The long VARCHAR is indirectly involved in the sorting: we read all the necessary fields in a single pass (in storage order) and then sort the records on disk. This is usually cheaper than sorting only the keys and then reading the whole table again (out of storage order) to fetch the remaining fields. But longer records in turn cause more sorting I/O, and this may eventually become more expensive than fetching all the records twice, as we see in this example. I don't see any good solution here. If we implemented two kinds of sorting (keys only vs. whole records), it would be hard to estimate their costs in order to make a choice (much depends on whether the table is cached or not). Another option would be to compress the records in the sort blocks, thus reducing I/O, but that would badly impact small (in-memory) sorts.
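The trade-off Dmitry describes can be pictured with a toy model. This is a sketch in Python, not Firebird code; the record layout and function names are illustrative only. It contrasts sorting whole records (the long payload travels through the sort) with sorting keys plus row ids and refetching the payload afterwards:

```python
# Toy model of the two sort strategies discussed above.
# Illustrative only: real engines sort on disk with collation keys.

def sort_full_records(rows):
    """Sort whole records: each row drags its (possibly long) payload
    through the sort, inflating sort-buffer / temp-file volume."""
    return sorted(rows, key=lambda r: r["key"])

def sort_keys_then_refetch(rows):
    """Sort only (key, row_id) pairs, then refetch payloads by row id.
    The sort itself is tiny, but the refetch step reads the table
    out of storage order."""
    order = sorted(range(len(rows)), key=lambda i: rows[i]["key"])
    return [rows[i] for i in order]  # refetch by row id

rows = [{"key": k, "payload": "x" * 10000} for k in (3, 1, 2)]
assert sort_full_records(rows) == sort_keys_then_refetch(rows)
```

Both strategies produce the same ordering; they differ only in how many bytes pass through the sort versus how many out-of-order page reads the refetch costs, which is exactly why estimating the cheaper one is hard.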
Modified by: @dyemanov assignee: Dmitry Yemanov [ dimitr ]
Modified by: @dyemanov issuetype: Bug [ 1 ] => Improvement [ 4 ]; summary: ORDER BY slowed down with a varchar not involved => Improve ORDER BY performance when long VARCHARs are involved
Commented by: @dyemanov v4 will get this improved. If the optimizer sees a long sort record (usually due to long VARCHARs being involved), it excludes all non-key fields from the sort. Once the sorting is done, the records are re-fetched from the corresponding tables. The threshold (what is considered long enough) is configurable.
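The v4 behavior amounts to a threshold check. Here is a minimal sketch of that decision; the function name, the default value, and the configuration name (I believe the setting is `InlineSortThreshold` in firebird.conf, but that should be verified against the v4 sources) are assumptions, not the actual optimizer code:

```python
# Sketch of the v4 decision described above, under assumed names:
# if the materialized sort record would exceed a configurable
# threshold, drop non-key fields from it and refetch them after
# the sort completes.

SORT_RECORD_THRESHOLD = 1000  # hypothetical default, in bytes

def choose_sort_strategy(key_len, non_key_len,
                         threshold=SORT_RECORD_THRESHOLD):
    """Return which sort strategy a threshold-based optimizer picks."""
    if key_len + non_key_len > threshold:
        return "keys-only + refetch"
    return "whole-record"

# A VARCHAR(10000) payload pushes the record far past the threshold:
assert choose_sort_strategy(32, 10000) == "keys-only + refetch"
assert choose_sort_strategy(32, 100) == "whole-record"
```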
Modified by: @dyemanov summary: Improve ORDER BY performance when long VARCHARs are involved => Improve sorting performance when long VARCHARs are involved
Modified by: @dyemanov status: Open [ 1 ] => Resolved [ 5 ]; resolution: Fixed [ 1 ]; Fix Version: 4.0 RC 1 [ 10930 ]
Modified by: @pavel-zotov status: Resolved [ 5 ] => Closed [ 6 ]
Submitted by: @pmakowski
Two tables:
CREATE TABLE TABLE1
(
  ID INTEGER NOT NULL,
  CHAMP1 VARCHAR(32) COLLATE ISO8859_1,
  TABLE2_ID INTEGER NOT NULL,
  REF_REMPLACEMENT VARCHAR(10000) COLLATE ISO8859_1,
  CONSTRAINT PK_TABLE1 PRIMARY KEY (ID)
);
CREATE TABLE TABLE2
(
  ID INTEGER NOT NULL,
  CHAMP2 VARCHAR(32) COLLATE ISO8859_1,
  CONSTRAINT PK_TABLE2 PRIMARY KEY (ID)
);
CREATE ASC INDEX IDX_TABLE1_1 ON TABLE1 (CHAMP1);
CREATE ASC INDEX IDX_TABLE2_1 ON TABLE2 (CHAMP2);
ALTER TABLE TABLE1 ADD CONSTRAINT FK_TABLE1_1
  FOREIGN KEY (TABLE2_ID) REFERENCES TABLE2 (ID)
  ON DELETE NO ACTION
  ON UPDATE CASCADE;
The SP to fill the tables:
CREATE PROCEDURE SP_GEN_DATA AS
declare i integer;
declare j integer;
begin
  i = 0;
  j = 0;
  while (i <= 15000) do
  begin
    while (j <= 4) do
    begin
      j = j + 1;
      INSERT INTO Table2 VALUES (:j, 'TBL' || :j);
    end
    INSERT INTO Table1 VALUES (:i, 'VAL' || :i, :j, NULL);
    i = i + 1;
  end
end
And here are the queries. The first one is slow:
SELECT Table1.Champ1, Table1.REF_REMPLACEMENT, Table2.Champ2
FROM Table1 INNER JOIN Table2 ON Table1.Table2_ID = Table2.ID
ORDER BY Table1.Champ1
And two others that run at good speed:
SELECT Table1.Champ1, Table2.Champ2
FROM Table1 INNER JOIN Table2 ON Table1.Table2_ID = Table2.ID
ORDER BY Table1.Champ1
WITH A (Champ1, Champ2, ID) AS
(SELECT Table1.Champ1, Table2.Champ2, Table1.ID
 FROM Table1 INNER JOIN Table2 ON Table1.Table2_ID = Table2.ID
 ORDER BY Table1.Champ1)
SELECT A.Champ1, Table1.REF_REMPLACEMENT, A.Champ2
FROM A JOIN Table1 ON Table1.ID = A.ID
Funny to see that the last query performs twice as many indexed reads on Table1, yet is much faster than the first one.
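A rough back-of-the-envelope estimate shows why. The byte counts below are approximations (the real sort record also carries collation keys and bookkeeping), but the orders of magnitude are what matter:

```python
# Rough estimate of the data volume passing through the sort for the
# ~15,000 Table1 rows. Sizes approximate the declared column widths,
# not the exact Firebird sort-record layout.

rows = 15000
long_record = 32 + 10000 + 32   # CHAMP1 + REF_REMPLACEMENT + CHAMP2
short_record = 32 + 32 + 4      # CHAMP1 + CHAMP2 + a row id

print(rows * long_record // 2**20, "MiB vs",
      rows * short_record // 1024, "KiB")
```

Sorting the full records moves on the order of a hundred mebibytes through the sort module, while sorting keys plus a row id stays under a mebibyte, which easily pays for the second set of indexed reads.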
Commits: 8e692d2