Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extremely slow executing a cross join of 3 tables in Firebird 2.X [CORE2200] #2628

Closed
firebird-automations opened this issue Nov 19, 2008 · 20 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Chau Chee Yang (cheeyang)

Relate to CORE3127

Attachments:
CORE.2200.zip

I have 3 tables:

1. Agent (0 row)
2. AR_PM (4,300+ rows)
3. AR_KnockOff (18,000+ rows)

I execute the following statement in Firebird 2.1.1:

SELECT A.Code FROM AGENT A, AR_PM B, AR_KnockOff C

It takes long times to get the first response back to my client software (IB Expert). However, this situation doesn't happen on Firebird 1.5 running the same query and same amount of data.

Case 1:

Firebird 2.1.1: 173 seconds
Firebird 1.5.X: 0 seconds

Case 2:

If I append one row into AGENT table, I get the response immediately after running the query:

Firebird 2.1.1: 0.093 seconds
Firebird 1.5.X: 0.031 seconds

I suspect there is some problem with Firebird 2.X. I also attach my database backup for both Firebird 1.5 and 2.1 for you to try out. Please pay attention on the time spent to run the query on Case 1.

Commits: 6e4c0cd 8120245 810f89b 84cd9b7 6220614 939d262 c77de9d

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Chau Chee Yang (cheeyang)

Firebird 2.X

1. Extract all files from the zip archive.
2. Restore db_2_0.fbk into firebird 2.X server
3. Execute statement: SELECT A.Code FROM AGENT A, AR_PM B, AR_KnockOff C
4. It takes more than 170 seconds to response

Firebird 1.5.X

1. Restore db_1_5.fbk into firebird 1.5 server
2. Execute statement: SELECT A.Code FROM AGENT A, AR_PM B, AR_KnockOff C
3. you get immediate response

@firebird-automations
Copy link
Collaborator Author

Modified by: Chau Chee Yang (cheeyang)

Attachment: CORE.2200.zip [ 11173 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

summary: Extremely slow executing a natural join of 3 tables in Firebird 2.X => Extremely slow executing a cross join of 3 tables in Firebird 2.X

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

The problem is slightly unusual. With a cross join without any predicates, it actually makes no difference which table to start from. FB 1.5 ignores this fact and always tries to start from a smaller table, as for a regular predicate-driven join. FB 2.0 pretends to be smarter and avoids the [sometimes expensive] cost calculation, starting from the any independent table (in the current implementation, this is the last one in the list). In fact, both solutions should deliver nearly the same performance in all cases but one: when any of the tables is empty. In this case, it would be ideal to put it into the first place in order to skip reading other tables at all. But I wouldn't say that processing empty tables in cross joins is a common real-world practice, so perhaps you're the only person affected :-)

So, I see three possibilities here:

1) Remove the clever "fast-route" code and rely on costs, as in FB 1.5. This is dumb but it handles the empty table case automagically.
2) Add a special check for empty tables into the "fast-route" code.
3) Do nothing and treat this as a "no issue".

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

My answer:

3) Do nothing and treat this as a "no issue"

Natural Join are the dumbest thing that a user/developer could do.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

priority: Critical [ 2 ] => Minor [ 4 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Chau Chee Yang (cheeyang)

The reason I wrote this natural join query is to retrieve a structure (metadata).

My statement may have 2, 3, 4, or 5 statements. It is costly if I perform checking on the row count in each table and re-arrange the join tables sequence.

I personally feel this is an issue as MSSQL or MYSQL doesn't has this problem at all. It should be a problem in Firebird 2.X engine that didn't handle this case well.

Please do consider to fix it. Thank you.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

I'm going to implement a fourth option, which should work fine regardless of the optimizer decision :-)

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => In Progress [ 3 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: In Progress [ 3 ] => Open [ 1 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.1.2 [ 10270 ]

Fix Version: 2.5 Beta 1 [ 10251 ]

Fix Version: 2.0.6 [ 10303 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

Dmirty, so what is the fourth option?

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Improve the cross join algorithm to stop as soon as any of the involved streams is detected as empty.

@firebird-automations
Copy link
Collaborator Author

Commented by: Chau Chee Yang (cheeyang)

Thank you very much

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Link: This issue relate to CORE3127 [ CORE3127 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

IMO, engine should not begin ANY kind of join (not only cross) when absence of rows in at least one of it's data sources forces final result to be empty.

Sample:

recreate table t0(id int);
recreate table t1(id int);
recreate table t2(id int);
recreate table t3(id int);
commit;
insert into t0 select rand()*100 from rdb$types,rdb$types rows 30000;
commit;

insert into t1 select * from t0;
insert into t2 select * from t0;
--------- ::: NB ::: we do NOT add any row to the table `t3`, it remains empty -----------
commit;

set stat on;
set echo on;

select 1
from t1
natural join t2
natural join t3;

select 1
from t2
natural join t3
natural join t1;

select 1
from t3
natural join t2
natural join t1;

Result:

select 1
from t1
natural join t2
natural join t3;
Current memory = 3191104
Delta memory = 38040
Max memory = 4863032
Elapsed time= 14.562 sec
Buffers = 256
Reads = 769
Writes = 8
Fetches = 121552

select 1
from t2
natural join t3
natural join t1;
Current memory = 3191280
Delta memory = 176
Max memory = 4863032
Elapsed time= 0.195 sec
Buffers = 256
Reads = 747
Writes = 0
Fetches = 121491

select 1
from t3
natural join t2
natural join t1;
Current memory = 3191280
Delta memory = 0
Max memory = 4863032
Elapsed time= 0.196 sec
Buffers = 256
Reads = 748
Writes = 0
Fetches = 121491

Trace:

Query-1:
-----------
select 1
from t1
natural join t2
natural join t3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PLAN HASH (HASH (T2 NATURAL, T1 NATURAL), T3 NATURAL)
0 records fetched
14558 ms, 744 read(s), 8 write(s), 121487 fetch(es)

Table Natural Index
******************************************************
T1 30000
T2 30000

Query-2:
-----------
select 1
from t2
natural join t3
natural join t1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PLAN HASH (T1 NATURAL, HASH (T2 NATURAL, T3 NATURAL)
0 records fetched
193 ms, 743 read(s), 121487 fetch(es)

Table Natural Index
****************************************************
T1 30000
T2 30000

Query-3:
------------
select 1
from t3
natural join t2
natural join t1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PLAN HASH (T1 NATURAL, HASH (T2 NATURAL, T3 NATURAL))
0 records fetched
192 ms, 744 read(s), 121487 fetch(es)

Table Natural Index
******************************************************
T1 30000
T2 30000

All scans of tables `t1` & `t2` looks like wasted labour...

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

QA Status: No test

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Closed [ 6 ] => Closed [ 6 ]

QA Status: No test => Done successfully

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment