Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Count ( DISTINCT ... ) is too slow [CORE214] #542

Closed
firebird-automations opened this issue Jan 17, 2003 · 7 comments
Closed

Count ( DISTINCT ... ) is too slow [CORE214] #542

firebird-automations opened this issue Jan 17, 2003 · 7 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: skopalik (skopalik)

SFID: 669770#⁠
Submitted By: skopalik

select distinct is about 10x to 100x faster then
select count(distinct ).
Tested version: FB 1.0.2.908 and FB 1.5.0.1761

test case:
create table a ( a int);

create procedure aset( cnt int) as
declare variable i int;
begin
i=0;
while(i<cnt)do begin
insert into a VALUES (:cnt/1000)
i=i+1;
end
end

execute procedure aset (10000)

PC: Notebokk Celeron 500/ 128 MB RAM
There are result captured from wisql:

select distinct * from a

PLAN SORT ((A NATURAL))

      A 

===========

     10 

Current memory = 0
Delta memory = 0
Max memory = 0
Elapsed time= 0.13 sec
Buffers = 75
Reads = 0
Writes 0
Fetches = 20072

select count (distinct a) from a

PLAN (A NATURAL)

  COUNT 

===========

      1 

Current memory = 0
Delta memory = 0
Max memory = 0
Elapsed time= 10.86 sec
Buffers = 75
Reads = 0
Writes 0
Fetches = 20071

Commits: 329fe9d 14ba585 FirebirdSQL/fbt-repository@3d35567 FirebirdSQL/fbt-repository@90e412b

====== Test Details ======

Multiple measures show that ratio in 2.5.5 and 3.0 is about 1.05 - 1.10 (this is how query "select count(*) from ( select distinct id from ... )" is SLOWER than "select count(distinct id) from ..."). In rare cases, when number of unique values is 100'000, this ratio was ~1.5.
Threshold was selected to be 1.8.

@firebird-automations
Copy link
Collaborator Author

Commented by: Alice F. Bird (firebirds)

Date: 2005-10-14 18:27
Sender: fuerchau
Logged In: YES
user_id=1300926

The time of "count(distinct xxx)" depends on the number of
different values in the database.
So i have now tested with a table of 30000 records.

If each record consists only 1 value, the time is ca. 10
seconds.
If i have perhaps 20 different values, the time is ca. 6
seconds.
if i have 4000 different values the result is immediatly
avalable !

Firebird 1.5.2.4731 and tested with IBExpert.

Because my application must work with millions of records
and i need this function, i would by happy to get a solution.

@firebird-automations
Copy link
Collaborator Author

Commented by: Alice F. Bird (firebirds)

Date: 2005-10-14 18:10
Sender: fuerchau
Logged In: YES
user_id=1300926

The time of "count(distinct xxx)" depends on the number of
different values in the database.
So i have now tested with a table of 30000 records.

If each record consists only 1 value, the time is ca. 10
seconds.
If i have perhaps 20 different values, the time is ca. 6
seconds.
if i have 4000 different values the result is immediatly
avalable !

Firebird 1.5.2.4731 and tested with IBExpert.

Because my application must work with millions of records
and i need this function, i would by happy to get a solution.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pcisar

Workflow: jira [ 10238 ] => Firebird [ 14466 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 3.0 Beta 2 [ 10586 ]

Fix Version: 2.5.5 [ 10670 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: Done successfully

Test Details: Multiple measures show that ratio in 2.5.5 and 3.0 is about 1.05 - 1.10 (this is how query "select count(*) from ( select distinct id from ... )" is SLOWER than "select count(distinct id) from ..."). In rare cases, when number of unique values is 100'000, this ratio was ~1.5.
Threshold was selected to be 1.8.

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants