Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OUTER JOIN INEFFICIENT if foreign key exists (especially with views) [CORE1940] #2379

Open
firebird-automations opened this issue Jun 18, 2008 · 11 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Tim Kelly (m00bh000)

Unless my query performance analyzer is crap or I am being a loony, I think there is scope for greater efficiency by ignoring redundant tables which are outer joined:

Consider the following:

CREATE TABLE_A
(pk INTEGER, FIELD_A integer)

CREATE TABLE_B
(pk INTEGER, FIELD_B integer)

ALTER TABLE TABLE_A ADD PRIMARY KEY (PK);
ALTER TABLE TABLE_B ADD PRIMARY KEY (PK);

ALTER TABLE TABLE_B ADD FOREIGN KEY (PK) REFERENCES TABLE_A(PK);

SELECT TABLE_A.PK, FIELD_A FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

//According to performance analysis TABLE_B's index is queried even though clearly no fields are required from it and because it's left joined it can't affect the rows returned from table_A. The weird thing is that if the index doesn't exist then TABLE_B is optimized out! You may think big deal serve the silly DBA for writing such a stupid query, but consider:

CREATE VIEW VIEW_B
AS
SELECT TABLE_A.PK as PK, FIELD_A, FIELD_B FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

SELECT FIELD_A from VIEW_B

Not so silly now, but inefficient as TABLE_B's index is unnecessarily fetched (according to my analyzer). Can't we detect that TABLE_B is not actually required at all and optimize it out?

@firebird-automations
Copy link
Collaborator Author

Modified by: Tim Kelly (m00bh000)

description: Unless my query performance analyzer is crap or I am being a loony, I think there is scope for greater efficiency by ignoring redundant tables which are outer joined:

Consider the following:

CREATE TABLE_A
(pk INTEGER, FIELD_A integer)

CREATE TABLE_B
(pk INTEGER, FIELD_B integer)

SELECT TABLE_A.PK, FIELD_A, FIELD_B FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK
//According to performance analysis TABLE_B is queried even though clearly no fields are required from it and because it's left joined it can't affect the rows returned from table_A. Can't we detect that it plays no role in the query and ignore it? When you start scaling this up I think it could have considerable performance implications?!

CREATE VIEW VIEW_A
AS
SELECT TABLE_A.PK, FIELD_A, FIELD_B FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

SELECT * FROM VIEW_A
//Same issue as above

CREATE VIEW VIEW_B
AS
SELECT TABLE_A.PK as PK, FIELD_A, FIELD_B FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

SELECT FIELD_A from VIEW_B
//Once again why do we need to go to the bother of linking TABLE_B it is redundant!?

=>

Unless my query performance analyzer is crap or I am being a loony, I think there is scope for greater efficiency by ignoring redundant tables which are outer joined:

Consider the following:

CREATE TABLE_A
(pk INTEGER, FIELD_A integer)

CREATE TABLE_B
(pk INTEGER, FIELD_B integer)

SELECT TABLE_A.PK, FIELD_A FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

//According to performance analysis TABLE_B is queried even though clearly no fields are required from it and because it's left joined it can't affect the rows returned from table_A. You may think big deal serve the silly DBA for writing such a stupid query, but consider:

CREATE VIEW VIEW_B
AS
SELECT TABLE_A.PK as PK, FIELD_A, FIELD_B FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

SELECT FIELD_A from VIEW_B

Not so silly now, but inefficient as TABLE_B is unnecessarily queried (according to my analyzer). Can't we detect that TABLE_B is not actually required at all and optimize it out?

@firebird-automations
Copy link
Collaborator Author

Commented by: Tim Kelly (m00bh000)

If the answer is no I suspect it has something to do with pre-compiled PLAN's ?!

@firebird-automations
Copy link
Collaborator Author

Modified by: Tim Kelly (m00bh000)

description: Unless my query performance analyzer is crap or I am being a loony, I think there is scope for greater efficiency by ignoring redundant tables which are outer joined:

Consider the following:

CREATE TABLE_A
(pk INTEGER, FIELD_A integer)

CREATE TABLE_B
(pk INTEGER, FIELD_B integer)

SELECT TABLE_A.PK, FIELD_A FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

//According to performance analysis TABLE_B is queried even though clearly no fields are required from it and because it's left joined it can't affect the rows returned from table_A. You may think big deal serve the silly DBA for writing such a stupid query, but consider:

CREATE VIEW VIEW_B
AS
SELECT TABLE_A.PK as PK, FIELD_A, FIELD_B FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

SELECT FIELD_A from VIEW_B

Not so silly now, but inefficient as TABLE_B is unnecessarily queried (according to my analyzer). Can't we detect that TABLE_B is not actually required at all and optimize it out?

=>

Unless my query performance analyzer is crap or I am being a loony, I think there is scope for greater efficiency by ignoring redundant tables which are outer joined:

Consider the following:

CREATE TABLE_A
(pk INTEGER, FIELD_A integer)

CREATE TABLE_B
(pk INTEGER, FIELD_B integer)

ALTER TABLE TABLE_A ADD PRIMARY KEY (PK);
ALTER TABLE TABLE_B ADD PRIMARY KEY (PK);

ALTER TABLE TABLE_B ADD FOREIGN KEY (PK) REFERENCES TABLE_A(PK);

SELECT TABLE_A.PK, FIELD_A FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

//According to performance analysis TABLE_B's index is queried even though clearly no fields are required from it and because it's left joined it can't affect the rows returned from table_A. The weird thing is that if the index doesn't exist then TABLE_B is optimized out! You may think big deal serve the silly DBA for writing such a stupid query, but consider:

CREATE VIEW VIEW_B
AS
SELECT TABLE_A.PK as PK, FIELD_A, FIELD_B FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

SELECT FIELD_A from VIEW_B

Not so silly now, but inefficient as TABLE_B is unnecessarily queried (according to my analyzer). Can't we detect that TABLE_B is not actually required at all and optimize it out?

summary: OUTER JOIN INEFFICIENT (especially with views) => OUTER JOIN INEFFICIENT if foreign key exists (especially with views)

@firebird-automations
Copy link
Collaborator Author

Modified by: Tim Kelly (m00bh000)

description: Unless my query performance analyzer is crap or I am being a loony, I think there is scope for greater efficiency by ignoring redundant tables which are outer joined:

Consider the following:

CREATE TABLE_A
(pk INTEGER, FIELD_A integer)

CREATE TABLE_B
(pk INTEGER, FIELD_B integer)

ALTER TABLE TABLE_A ADD PRIMARY KEY (PK);
ALTER TABLE TABLE_B ADD PRIMARY KEY (PK);

ALTER TABLE TABLE_B ADD FOREIGN KEY (PK) REFERENCES TABLE_A(PK);

SELECT TABLE_A.PK, FIELD_A FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

//According to performance analysis TABLE_B's index is queried even though clearly no fields are required from it and because it's left joined it can't affect the rows returned from table_A. The weird thing is that if the index doesn't exist then TABLE_B is optimized out! You may think big deal serve the silly DBA for writing such a stupid query, but consider:

CREATE VIEW VIEW_B
AS
SELECT TABLE_A.PK as PK, FIELD_A, FIELD_B FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

SELECT FIELD_A from VIEW_B

Not so silly now, but inefficient as TABLE_B is unnecessarily queried (according to my analyzer). Can't we detect that TABLE_B is not actually required at all and optimize it out?

=>

Unless my query performance analyzer is crap or I am being a loony, I think there is scope for greater efficiency by ignoring redundant tables which are outer joined:

Consider the following:

CREATE TABLE_A
(pk INTEGER, FIELD_A integer)

CREATE TABLE_B
(pk INTEGER, FIELD_B integer)

ALTER TABLE TABLE_A ADD PRIMARY KEY (PK);
ALTER TABLE TABLE_B ADD PRIMARY KEY (PK);

ALTER TABLE TABLE_B ADD FOREIGN KEY (PK) REFERENCES TABLE_A(PK);

SELECT TABLE_A.PK, FIELD_A FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

//According to performance analysis TABLE_B's index is queried even though clearly no fields are required from it and because it's left joined it can't affect the rows returned from table_A. The weird thing is that if the index doesn't exist then TABLE_B is optimized out! You may think big deal serve the silly DBA for writing such a stupid query, but consider:

CREATE VIEW VIEW_B
AS
SELECT TABLE_A.PK as PK, FIELD_A, FIELD_B FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

SELECT FIELD_A from VIEW_B

Not so silly now, but inefficient as TABLE_B's index is unnecessarily fetched (according to my analyzer). Can't we detect that TABLE_B is not actually required at all and optimize it out?

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

Tim,

1 - You have not provided details regarding the PLANs generated for the statements/Views. Without these details it is not possible to prove/disprove your concerns.

2 - The name of your analyzer would be also valuable, since the issue may be a known problem with the tool.

3 - I believe that you are confusing/misreading the PLAN details, since the statements provided require that Table B be accessed in order to to resolve FIELD_B in the VIEW_B.

4 - Did you perform COMMITs after the Primary Keys and Constraints were defined?

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Honestly, I doubt the optimizer should be that clever. Some people use JOINs in the meaning similar to IN/EXISTS, i.e. to ensure that there's a corresponding record in the linked table. They don't need any data from the linked table(s). With the suggested optimization, their queries won't work anymore, although they're pretty valid from the SQL standard point of view.

@firebird-automations
Copy link
Collaborator Author

Commented by: Smirnoff Serg (wildsery)

Moreover, this query may return more than one record on each record for TABLE_A.
Optimizer must be clever enough to find out that the join is on primary key or unique, to make sure that this query returns only one record.

@firebird-automations
Copy link
Collaborator Author

Commented by: Tim Kelly (m00bh000)

Thanks for your quick response. I am using a trial version of SQL Manager 2008. And the Create Index's were just to give you an
idea of table structure, in reality the tables & indexes were created long ago.

Please compare the two queries below. The two queries can only ever return the same result set, but as you can see
firebird chooses a less efficient plan for the second query, resulting in a lot more fetches.

In practise my queries are machine generated, at compile time so i don't know which fields i will need.
It would be nice to have firebird optimize out the tables I don't need!

=========================================================================

SELECT FIELD_A
FROM
TABLE_A

PLAN (TABLE_A NATURAL)

Query Time
------------------------------------------------
Prepare : 16 ms
Execute : 31 ms
Avg fetch time: 0.69 ms

Memory
------------------------------------------------
Current: 9668700
Max : 10502524
Buffers: 2048

Operations
------------------------------------------------
Reads : 0
Writes : 0
Fetches: 6530

Plan:
------------------------------------------------
PLAN (TABLE_A NATURAL)

Enchanced Info:
+--------------------------+-------+-----------+---------+---------+----------+
| Table Name | Index | Non-Index | Updated | Deleted | Inserted |
| | reads | reads | | | |
+--------------------------+-------+-----------+---------+---------+----------+
| RDB$FIELDS| 12 | 0 | 0 | 0 | 0 |
| RDB$INDEX_SEGMENTS| 1 | 0 | 0 | 0 | 0 |
| RDB$RELATION_FIELDS| 12 | 0 | 0 | 0 | 0 |
| RDB$RELATIONS| 2 | 0 | 0 | 0 | 0 |
| RDB$RELATION_CONSTRAINTS| 1 | 0 | 0 | 0 | 0 |
| RDB$PROCEDURES| 0 | 3 | 0 | 0 | 0 |
| TABLE_A| 0 | 3,002 | 0 | 0 | 0 |
+--------------------------+-------+-----------+---------+---------+----------+

===========================================================================

SELECT FIELD_A
FROM
TABLE_A LEFT JOIN TABLE_A ON
TABLE_A.PK = TABLE_B.PK

PLAN JOIN (TABLE_A NATURAL, TABLE_B INDEX (RDB$PRIMARY20))

Query Time
------------------------------------------------
Prepare : 0 ms
Execute : 47 ms
Avg fetch time: 1.04 ms

Memory
------------------------------------------------
Current: 9671824
Max : 10502524
Buffers: 2048

Operations
------------------------------------------------
Reads : 0
Writes : 0
Fetches: 24559

Plan:
------------------------------------------------
PLAN JOIN (TABLE_A NATURAL, TABLE_B INDEX (RDB$PRIMARY20)) [RDB$PRIMARY20 = PK on TABLE_B]

Enchanced Info:
+--------------------------+-------+-----------+---------+---------+----------+
| Table Name | Index | Non-Index | Updated | Deleted | Inserted |
| | reads | reads | | | |
+--------------------------+-------+-----------+---------+---------+----------+
| RDB$FIELDS| 12 | 0 | 0 | 0 | 0 |
| RDB$INDEX_SEGMENTS| 1 | 0 | 0 | 0 | 0 |
| RDB$RELATION_FIELDS| 12 | 0 | 0 | 0 | 0 |
| RDB$RELATIONS| 2 | 0 | 0 | 0 | 0 |
| RDB$RELATION_CONSTRAINTS| 1 | 0 | 0 | 0 | 0 |
| RDB$PROCEDURES| 0 | 3 | 0 | 0 | 0 |
| TABLE_B| 3,002 | 0 | 0 | 0 | 0 |
| TABLE_A| 0 | 3,002 | 0 | 0 | 0 |
+--------------------------+-------+-----------+---------+---------+----------+

@firebird-automations
Copy link
Collaborator Author

Commented by: Tim Kelly (m00bh000)

Dmitry this is a LEFT JOIN so it does not matter if there's a corresponding record in the linked table or not!

@firebird-automations
Copy link
Collaborator Author

Commented by: Tim Kelly (m00bh000)

Smirnoff - "Moreover, this query may return more than one record on each record for TABLE_A."

Good point, I hadn't thought of that although as you point out my keys prevent this from occuring, mayb it is too much to expect it to work it out!

To give you a bit of background i've got a number of tables which each add a level of inheritence to an objects properties, say:

ADDRESS <- PERSON <- EMPLOYEE <- SUPERVISOR

So a supervisor's properties are spread over the four tables. I just happened to notice that when I was only requesting properties stored in the SUPERVISOR table It just seemed a shame to keep querying the ADDRESS table when for example I only wanted to know the EMPLOYEE'S hard hat size!

@firebird-automations
Copy link
Collaborator Author

Commented by: Tim Kelly (m00bh000)

Actually having thought about it in the bath I don't think it would be rocket science for this to be optimized, although I have never dared even look at the fire bird source! For simple 1 column joins, pseudocode:

void OptimizeLeftJoin(JOIN)
if ((JOIN.type = LEFT) && JOIN.TABLE_LFT.JOIN_FIELD.IS_UNIQUE() && JOIN.TABLE_RGT.JOIN_FIELD.IS_UNIQUE() && (JOIN.TABLE_RGT.NO_FIELDS_INVOLVED_IN_REST_OF_QUERY() = 0))
{
WeDontNeed(JOIN.TAB_RGT);
}

...I suspect this pseudo-code might cause firebird to no longer compile if it was copy-and-pasted! But at least I hope I made some1 chuckle?!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant