Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple indexed join query slows terrible down with useage of 'like' [CORE2267] #2693

Open
firebird-automations opened this issue Jan 6, 2009 · 6 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Roy Damman (royd)

Problem discovered when trying to migrate a project from Firebird 1.5.x to 2.0 and after that to (the latest) version 2.1.1.
Note: All execution times are obtained by executing the involved queries twice and take the second time, so there would not be any caching issues.

2 Tables:
Table NUMMER:
select count(*) from nummer: > 14000000

Table NAAM:
select count(*) from naam: > 3700000

Query A:
select count(*) from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE LIKE '81%' and B.naam like 'SEARCHEDNAME%'
Result: 11
Execution time: 2.0188
Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

Query B:
select count(*) from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE LIKE '81%' and B.naam = 'SEARCHEDNAME'
or even:
select count(*) from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE LIKE '81%' and B.naam like 'SEARCHEDNAME'
Result: 6
Execution time: 0.0031
Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

In both cases the same plan is used. And I think both queries can use the same lookup method of the matching B.naam values because the start of the string ('SEARCHEDNAME') is the same. But the first query is about 650 times slower then the second query and has only a double count of records involved. This problem did not occur with Firebird 1.5.X but this behaviour makes Firebird 2.x not useable for this project. Also I tried * instead of count(*).

Results:

Query A:
Execution time: 4.0437
Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

Query B:
Execution time: 0.0079
Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

Again a huge difference: Query B is 512 times faster.
I also tried the search with different names. All had the same result in execution time difference.

The Tables:

CREATE TABLE "NUMMER"
(
"ID" INTEGER NOT NULL,
"NUMMER" VARCHAR(12),
"NAAMID" INTEGER,
"POSTCODE" CHAR(8),
"HUISNUMMER_N" INTEGER,
PRIMARY KEY ("ID")
);

CREATE INDEX "NUMMER_NAAMNUMMER" ON "NUMMER"("NAAMID", "NUMMER");
CREATE INDEX "NUMMER_NUMMER" ON "NUMMER"("NUMMER", "NAAMID");
CREATE INDEX "NUMMER_PCHUISNR" ON "NUMMER"("POSTCODE", "HUISNUMMER_N");

CREATE TABLE "NAAM"
(
"NAAMID" INTEGER NOT NULL,
"NAAM" VARCHAR(50),
PRIMARY KEY ("NAAMID")
);

CREATE INDEX "NAAM_NAAM" ON "NAAM"("NAAM");

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Show us full execution statistics, please
If possible, for both FB 1.5 and FB 2.1

@firebird-automations
Copy link
Collaborator Author

Commented by: Mercea Paul (m24paul)

If you always have LIKE '81%' you could use STARTING WITH '81'

select count(*) from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE STARTING WITH '81' and B.naam = 'SEARCHEDNAME'

@firebird-automations
Copy link
Collaborator Author

Commented by: Roy Damman (royd)

@ Vlad

Which execution statistics do you want and how do I obtain them?

Extra statistics for Firebird 2.1.1 with usage of * instead of count(*)

Query A:
select * from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE LIKE '81%' and B.naam like 'SEARCHEDNAME%'

Execution time: 00:00:04.0343
Prepare time: 00:00:00.0032
Starting Memory: 86508148
Current Memory: 86695300
Delta Memory: 187152
Number of Buffers: 20000
Reads: 1480
Writes: 31
Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))
Rows Affected: 11

Query B:
select * from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE LIKE '81%' and B.naam = 'SEARCHEDNAME'

Execution time: 00:00:00.0062
Prepare time: 00:00:00.0031
Starting Memory: 86508080
Current Memory: 86695192
Delta Memory: 187112
Number of Buffers: 20000
Reads: 1480
Writes: 39
Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))
Rows Affected: 6

@mercea
This is not the case. And it shouldn't make any difference if the optimizer was OK.

select * from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE STARTING WITH '81' and B.naam = 'SEARCHEDNAME'
Execution time: 00:00:00.0062
Prepare time: 00:00:00.0016
Rows Affected: 6
Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

select * from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE STARTING WITH '81' and B.naam LIKE 'SEARCHEDNAME%'
Execution time: 00:00:04.0359
Prepare time: 00:00:00.0031
Rows Affected: 11
Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

select * from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE STARTING WITH '81' and B.naam STARTING WITH 'SEARCHEDNAME'
Execution time: 00:00:04.0359 (this is right, exact the same)
Prepare time: 00:00:00.0032
Rows Affected: 11
Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

select * from nummer A JOIN NAAM B ON (B.NAAMID=A.NAAMID) where A.POSTCODE LIKE '81%' and B.naam STARTING WITH 'SEARCHEDNAME'
Execution time: 00:00:04.0375
Prepare time: 00:00:00.0031
Rows Affected: 11
Plan: PLAN JOIN (B INDEX (NAAM_NAAM), A INDEX (NUMMER_PCHUISNR, NUMMER_NAAMNUMMER))

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Please, show also results and statistics for the following queries :

SELECT COUNT(*) FROM NAAM WHERE naam LIKE 'SEARCHEDNAME%'
SELECT COUNT(*) FROM NAAM WHERE naam = 'SEARCHEDNAME'
SELECT COUNT(*) FROM nummer WHERE POSTCODE LIKE '81%' and NAAMID = <any id from the query above>

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Number of fetches would also be useful, I suppose.

@firebird-automations
Copy link
Collaborator Author

Commented by: Roy Damman (royd)

SELECT COUNT(*) FROM NAAM WHERE naam LIKE 'SEARCHEDNAME%'
COUNT: 91
Execution time: 00:00:00.0000
Prepare time: 00:00:00.0000
Starting Memory: 86328284
Current Memory: 86339364
Delta Memory: 11080
Number of Buffers: 20000
Reads: 313
Writes: 17
Plan: PLAN (NAAM INDEX (NAAM_NAAM))
Rows Affected: 1

SELECT COUNT(*) FROM NAAM WHERE naam = 'SEARCHEDNAME'
COUNT: 1
Execution time: 00:00:00.0000
Prepare time: 00:00:00.0000
Starting Memory: 86329632
Current Memory: 86333200
Delta Memory: 3568
Number of Buffers: 20000
Reads: 313
Writes: 29
Plan: PLAN (NAAM INDEX (NAAM_NAAM))
Rows Affected: 1

SELECT COUNT(*) FROM nummer WHERE POSTCODE LIKE '81%' and NAAMID = 11667 (NAAMID FROM QUERY 2)
COUNT: 6
Execution time: 00:00:00.0031
Prepare time: 00:00:00.0000
Starting Memory: 86344516
Current Memory: 86511124
Delta Memory: 166608
Number of Buffers: 20000
Reads: 992
Writes: 47
Plan: PLAN (NAAM INDEX (NAAM_NAAM))
Rows Affected: 1

I understand in which direction you are searching the explanation. So I changed SEARCHNAME in naam in UNIQUENAME so that the count of =UNIQUENAME and LIKE UNIQUENAME% would be the same. After that the statistics of QUERY A and QUERY B where in the same range. I will install FB 1.5.x once again and repeat the tests above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant