Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A data source for JOIN that is result of aggregating must be used only one time [CORE3013] #3394

Open
firebird-automations opened this issue May 24, 2010 · 14 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: @pavel-zotov

Consider two variants of query against table EMPLOYEE of "standard" database that is supplied with any FB.

Both variant results the rows with EMP_NO and such SALARY that are greater than average salary of all employees.
Also we call generator emp_no_gen to determine the quontity of operations that occures during this action:
1) via NON-correlated subquery:
select e.emp_no, e.salary, gen_id(emp_no_gen,0) g
from employee e
where (e.salary > (select avg(ex.salary)+0*gen_id(emp_no_gen,1) avgval from employee ex))

Sequential running of this variant shows that EMP_NO_GEN increases by 1 at each time. So we can be sure that aggregate part (inside WHERE) is evaluated only once.

2) via JOIN:
select e.emp_no, e.salary, gen_id(emp_no_gen,0) g
from employee e
join (select avg(ex.salary)+0*gen_id(emp_no_gen,1) avgval from employee ex) ea on e.salary > ea.avgval

In this case the EMP_NO_GEN increases by 42 each time (i.e. by step that is equal select count(*) from emp).

I think it will be useful to make Firebird "understand" that such part as 2nd in this join must be evaluated only once, like in variant 1.

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

You shouldn't use GEN_ID as such and expect consistent results.

For the case you want, FB 3 has window functions, see it in my blog http://asfernandes.blogspot.com.

For now, one way to do that type of thing is with selectable procedures.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

Adriano,

I'm talking only about FB optimizer and how to make it much smarter.
GEN_ID has been used in these samples only for illustration that unnecessary work is done in case of JOIN.

I know that some heuristic rules are placed inside opt.cpp and propose that these rules can be improved.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

PS. OFFTOP. Just finished read your blog about windowed function. This is THE GREAT WORK! Especially running totals (sum(salary) over (order by salary)) and navigation via LAG / LEAD. Thank you in advance.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Version: 3.0 Initial [ 10301 ] =>

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => In Progress [ 3 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Fix Version: 3.0 Beta 2 [ 10586 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Fix Version: 3.0 RC 1 [ 10584 ]

Fix Version: 3.0 Beta 2 [ 10586 ] =>

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: In Progress [ 3 ] => Open [ 1 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Fix Version: 3.0.0 [ 10048 ]

Fix Version: 3.0 RC 1 [ 10584 ] =>

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

I suggest this ticket to be either rewritten or closed in favor of the new one, the current description does not match the test case.

Look at the plan:

Select Expression
-> Filter
-> Nested Loop Join (inner)
-> Aggregate
-> Table "EMPLOYEE" as "EA EX" Full Scan
-> Table "EMPLOYEE" as "E" Full Scan

Obviously, the aggregated derived table is composed and accessed just once. However, the join condition is evaluated for the every row inside the inner stream E. And the "+0*GEN_ID" part (as well as any other expression) is evaluated as many times too, hence the visible effect.

For the first test case, result of the singleton invariant subquery is cached after its first evaluation. Caching/materializing a scalar value or a result set are completely different things, they must not be compared with each other.

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

From what time FB start to do somewhat like 'cache' (or 'remember') results of aggregating ?
I just tried 2.5.0 (yes, release of october-2010) and got:

alter sequence emp_no_gen restart with 0;

select e.emp_no, e.salary, gen_id(emp_no_gen,0) g
from employee e
join
(
select avg(ex.salary + 0 * g ) avgval
from ( select salary, gen_id(emp_no_gen,1) g from employee ) ex ----------- CHANGED init. query here: moved gen_id() inside DT!
) ea on e.salary > ea.avgval
;

Output:

EMP_NO SALARY G
======= ===================== =====================
110 6000000.00 42
118 7480000.00 42
121 99000000.00 42

Trace:

PLAN JOIN (EA EX EMPLOYEE NATURAL, E NATURAL)
3 records fetched
0 ms, 2 read(s), 223 fetch(es), 42 mark(s)

Table Natural Index
****************************************************
EMPLOYEE 84

So, perhaps... no problem at all ?.. But I'm sure that at least on 2.5.x I did see many times inefficient statistics per tables...

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

> From what time FB start to do somewhat like 'cache' (or 'remember') results of aggregating ?

It never did. But the first stream of the nested loop join is always accessed once.

> So, perhaps... no problem at all ?

The problem (stated in the ticket title) exists and I know what exactly you had in mind. But this ticket is plain wrong, maybe just a badly though sample. I will create a proper one (or maybe change this one) a bit later.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Fix Version: 3.0 RC2 [ 10048 ] =>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants