Issue Details (XML | Word | Printable)

Key: CORE-3013
Type: Improvement Improvement
Status: Open Open
Priority: Major Major
Assignee: Dmitry Yemanov
Reporter: Pavel Zotov
Votes: 0
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
Firebird Core

A data source for JOIN that is result of aggregating must be used only one time

Created: 24/May/10 02:20 PM   Updated: 04/Mar/16 07:59 PM
Component/s: Engine
Affects Version/s: None
Fix Version/s: None

Environment: all of FB versions


 Description  « Hide
Consider two variants of query against table EMPLOYEE of "standard" database that is supplied with any FB.

Both variant results the rows with EMP_NO and such SALARY that are greater than average salary of all employees.
Also we call generator emp_no_gen to determine the quontity of operations that occures during this action:
1) via NON-correlated subquery:
select e.emp_no, e.salary, gen_id(emp_no_gen,0) g
from employee e
where (e.salary > (select avg(ex.salary)+0*gen_id(emp_no_gen,1) avgval from employee ex))

Sequential running of this variant shows that EMP_NO_GEN increases by 1 at each time. So we can be sure that aggregate part (inside WHERE) is evaluated only once.

2) via JOIN:
select e.emp_no, e.salary, gen_id(emp_no_gen,0) g
from employee e
join (select avg(ex.salary)+0*gen_id(emp_no_gen,1) avgval from employee ex) ea on e.salary > ea.avgval

In this case the EMP_NO_GEN increases by 42 each time (i.e. by step that is equal select count(*) from emp).

I think it will be useful to make Firebird "understand" that such part as 2nd in this join must be evaluated only once, like in variant 1.

 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Adriano dos Santos Fernandes added a comment - 24/May/10 02:33 PM
You shouldn't use GEN_ID as such and expect consistent results.

For the case you want, FB 3 has window functions, see it in my blog http://asfernandes.blogspot.com.

For now, one way to do that type of thing is with selectable procedures.

Pavel Zotov added a comment - 24/May/10 02:56 PM
Adriano,

I'm talking only about FB optimizer and how to make it much smarter.
GEN_ID has been used in these samples only for illustration that unnecessary work is done in case of JOIN.

I know that some heuristic rules are placed inside opt.cpp and propose that these rules can be improved.

Pavel Zotov added a comment - 24/May/10 03:09 PM
PS. OFFTOP. Just finished read your blog about windowed function. This is THE GREAT WORK! Especially running totals (sum(salary) over (order by salary)) and navigation via LAG / LEAD. Thank you in advance.

Dmitry Yemanov added a comment - 02/Feb/16 10:46 AM
I suggest this ticket to be either rewritten or closed in favor of the new one, the current description does not match the test case.

Look at the plan:

Select Expression
    -> Filter
        -> Nested Loop Join (inner)
            -> Aggregate
                -> Table "EMPLOYEE" as "EA EX" Full Scan
            -> Table "EMPLOYEE" as "E" Full Scan

Obviously, the aggregated derived table is composed and accessed just once. However, the join condition is evaluated for the every row inside the inner stream E. And the "+0*GEN_ID" part (as well as any other expression) is evaluated as many times too, hence the visible effect.

For the first test case, result of the singleton invariant subquery is cached after its first evaluation. Caching/materializing a scalar value or a result set are completely different things, they must not be compared with each other.

Pavel Zotov added a comment - 02/Feb/16 02:46 PM
From what time FB start to do somewhat like 'cache' (or 'remember') results of aggregating ?
I just tried 2.5.0 (yes, release of october-2010) and got:
===
alter sequence emp_no_gen restart with 0;

select e.emp_no, e.salary, gen_id(emp_no_gen,0) g
from employee e
join
    (
        select avg(ex.salary + 0 * g ) avgval
        from ( select salary, gen_id(emp_no_gen,1) g from employee ) ex ----------- CHANGED init. query here: moved gen_id() inside DT!
    ) ea on e.salary > ea.avgval
;
===
Output:

 EMP_NO SALARY G
======= ===================== =====================
    110 6000000.00 42
    118 7480000.00 42
    121 99000000.00 42

Trace:

PLAN JOIN (EA EX EMPLOYEE NATURAL, E NATURAL)
3 records fetched
      0 ms, 2 read(s), 223 fetch(es), 42 mark(s)

Table Natural Index
****************************************************
EMPLOYEE 84

So, perhaps... no problem at all ?.. But I'm sure that at least on 2.5.x I did see many times inefficient statistics per tables...

Dmitry Yemanov added a comment - 02/Feb/16 03:06 PM
> From what time FB start to do somewhat like 'cache' (or 'remember') results of aggregating ?

It never did. But the first stream of the nested loop join is always accessed once.

> So, perhaps... no problem at all ?

The problem (stated in the ticket title) exists and I know what exactly you had in mind. But this ticket is plain wrong, maybe just a badly though sample. I will create a proper one (or maybe change this one) a bit later.