Issue Details (XML | Word | Printable)

Key: CORE-1940
Type: Improvement Improvement
Status: Open Open
Priority: Major Major
Assignee: Unassigned
Reporter: Tim Kelly
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Firebird Core

OUTER JOIN INEFFICIENT if foreign key exists (especially with views)

Created: 18/Jun/08 05:26 PM   Updated: 19/Jun/08 12:15 PM
Component/s: Engine
Affects Version/s: 2.1.0
Fix Version/s: None


 Description  « Hide
Unless my query performance analyzer is crap or I am being a loony, I think there is scope for greater efficiency by ignoring redundant tables which are outer joined:

Consider the following:

CREATE TABLE_A
(pk INTEGER, FIELD_A integer)

CREATE TABLE_B
(pk INTEGER, FIELD_B integer)

ALTER TABLE TABLE_A ADD PRIMARY KEY (PK);
ALTER TABLE TABLE_B ADD PRIMARY KEY (PK);

ALTER TABLE TABLE_B ADD FOREIGN KEY (PK) REFERENCES TABLE_A(PK);

SELECT TABLE_A.PK, FIELD_A FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

//According to performance analysis TABLE_B's index is queried even though clearly no fields are required from it and because it's left joined it can't affect the rows returned from table_A. The weird thing is that if the index doesn't exist then TABLE_B is optimized out! You may think big deal serve the silly DBA for writing such a stupid query, but consider:

CREATE VIEW VIEW_B
AS
SELECT TABLE_A.PK as PK, FIELD_A, FIELD_B FROM TABLE_A LEFT JOIN TABLE_B ON TABLE_A.PK = TABLE_B.PK

SELECT FIELD_A from VIEW_B

Not so silly now, but inefficient as TABLE_B's index is unnecessarily fetched (according to my analyzer). Can't we detect that TABLE_B is not actually required at all and optimize it out?


 All   Comments   Change History   Subversion Commits      Sort Order: Descending order - Click to sort in ascending order
Tim Kelly added a comment - 19/Jun/08 12:06 PM - edited
Actually having thought about it in the bath I don't think it would be rocket science for this to be optimized, although I have never dared even look at the fire bird source! For simple 1 column joins, pseudocode:

void OptimizeLeftJoin(JOIN)
if ((JOIN.type = LEFT) && JOIN.TABLE_LFT.JOIN_FIELD.IS_UNIQUE() && JOIN.TABLE_RGT.JOIN_FIELD.IS_UNIQUE() && (JOIN.TABLE_RGT.NO_FIELDS_INVOLVED_IN_REST_OF_QUERY() = 0))
{
  WeDontNeed(JOIN.TAB_RGT);
}

...I suspect this pseudo-code might cause firebird to no longer compile if it was copy-and-pasted! But at least I hope I made some1 chuckle?!

Tim Kelly added a comment - 19/Jun/08 11:33 AM
Smirnoff - "Moreover, this query may return more than one record on each record for TABLE_A."

Good point, I hadn't thought of that although as you point out my keys prevent this from occuring, mayb it is too much to expect it to work it out!

To give you a bit of background i've got a number of tables which each add a level of inheritence to an objects properties, say:

ADDRESS <- PERSON <- EMPLOYEE <- SUPERVISOR

So a supervisor's properties are spread over the four tables. I just happened to notice that when I was only requesting properties stored in the SUPERVISOR table It just seemed a shame to keep querying the ADDRESS table when for example I only wanted to know the EMPLOYEE'S hard hat size!

Tim Kelly added a comment - 19/Jun/08 11:22 AM
Dmitry this is a LEFT JOIN so it does not matter if there's a corresponding record in the linked table or not!

Tim Kelly added a comment - 19/Jun/08 08:46 AM
Thanks for your quick response. I am using a trial version of SQL Manager 2008. And the Create Index's were just to give you an
idea of table structure, in reality the tables & indexes were created long ago.

Please compare the two queries below. The two queries can only ever return the same result set, but as you can see
firebird chooses a less efficient plan for the second query, resulting in a lot more fetches.

In practise my queries are machine generated, at compile time so i don't know which fields i will need.
It would be nice to have firebird optimize out the tables I don't need!


=========================================================================

SELECT FIELD_A
FROM
TABLE_A

PLAN (TABLE_A NATURAL)

Query Time
------------------------------------------------
Prepare : 16 ms
Execute : 31 ms
Avg fetch time: 0.69 ms

Memory
------------------------------------------------
Current: 9668700
Max : 10502524
Buffers: 2048

Operations
------------------------------------------------
Reads : 0
Writes : 0
Fetches: 6530

Plan:
------------------------------------------------
PLAN (TABLE_A NATURAL)

Enchanced Info:
+--------------------------+-------+-----------+---------+---------+----------+
| Table Name | Index | Non-Index | Updated | Deleted | Inserted |
| | reads | reads | | | |
+--------------------------+-------+-----------+---------+---------+----------+
| RDB$FIELDS| 12 | 0 | 0 | 0 | 0 |
| RDB$INDEX_SEGMENTS| 1 | 0 | 0 | 0 | 0 |
| RDB$RELATION_FIELDS| 12 | 0 | 0 | 0 | 0 |
| RDB$RELATIONS| 2 | 0 | 0 | 0 | 0 |
| RDB$RELATION_CONSTRAINTS| 1 | 0 | 0 | 0 | 0 |
| RDB$PROCEDURES| 0 | 3 | 0 | 0 | 0 |
| TABLE_A| 0 | 3,002 | 0 | 0 | 0 |
+--------------------------+-------+-----------+---------+---------+----------+

===========================================================================
===========================================================================

SELECT FIELD_A
FROM
TABLE_A LEFT JOIN TABLE_A ON
TABLE_A.PK = TABLE_B.PK

PLAN JOIN (TABLE_A NATURAL, TABLE_B INDEX (RDB$PRIMARY20))

Query Time
------------------------------------------------
Prepare : 0 ms
Execute : 47 ms
Avg fetch time: 1.04 ms

Memory
------------------------------------------------
Current: 9671824
Max : 10502524
Buffers: 2048

Operations
------------------------------------------------
Reads : 0
Writes : 0
Fetches: 24559

Plan:
------------------------------------------------
PLAN JOIN (TABLE_A NATURAL, TABLE_B INDEX (RDB$PRIMARY20)) [RDB$PRIMARY20 = PK on TABLE_B]

Enchanced Info:
+--------------------------+-------+-----------+---------+---------+----------+
| Table Name | Index | Non-Index | Updated | Deleted | Inserted |
| | reads | reads | | | |
+--------------------------+-------+-----------+---------+---------+----------+
| RDB$FIELDS| 12 | 0 | 0 | 0 | 0 |
| RDB$INDEX_SEGMENTS| 1 | 0 | 0 | 0 | 0 |
| RDB$RELATION_FIELDS| 12 | 0 | 0 | 0 | 0 |
| RDB$RELATIONS| 2 | 0 | 0 | 0 | 0 |
| RDB$RELATION_CONSTRAINTS| 1 | 0 | 0 | 0 | 0 |
| RDB$PROCEDURES| 0 | 3 | 0 | 0 | 0 |
| TABLE_B| 3,002 | 0 | 0 | 0 | 0 |
| TABLE_A| 0 | 3,002 | 0 | 0 | 0 |
+--------------------------+-------+-----------+---------+---------+----------+

Smirnoff Serg added a comment - 19/Jun/08 03:52 AM
Moreover, this query may return more than one record on each record for TABLE_A.
Optimizer must be clever enough to find out that the join is on primary key or unique, to make sure that this query returns only one record.

Dmitry Yemanov added a comment - 18/Jun/08 11:20 PM
Honestly, I doubt the optimizer should be that clever. Some people use JOINs in the meaning similar to IN/EXISTS, i.e. to ensure that there's a corresponding record in the linked table. They don't need any data from the linked table(s). With the suggested optimization, their queries won't work anymore, although they're pretty valid from the SQL standard point of view.

Sean Leyne added a comment - 18/Jun/08 08:45 PM
Tim,

1 - You have not provided details regarding the PLANs generated for the statements/Views. Without these details it is not possible to prove/disprove your concerns.

2 - The name of your analyzer would be also valuable, since the issue may be a known problem with the tool.

3 - I believe that you are confusing/misreading the PLAN details, since the statements provided require that Table B be accessed in order to to resolve FIELD_B in the VIEW_B.

4 - Did you perform COMMITs after the Primary Keys and Constraints were defined?

Tim Kelly added a comment - 18/Jun/08 05:35 PM
If the answer is no I suspect it has something to do with pre-compiled PLAN's ?!