Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LIST DISTINCT for BLOBs [CORE3253] #3621

Open
firebird-automations opened this issue Nov 20, 2010 · 4 comments
Open

LIST DISTINCT for BLOBs [CORE3253] #3621

firebird-automations opened this issue Nov 20, 2010 · 4 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: @paulvink

Duplicates CORE859

Currently, the DISTINCT specifier of the LIST function has no effect if the argument is a BLOB expression.
This is probably for the same reason as with SELECT, but a fix would be appreciated anyway.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

I do agree that it works wrongly. This is because the sorting operation (which is internally involved in DISTINCT and GROUP BY) handles BLOB IDs instead of the blob contents. But we have a problem here. Sorting blobs by their contents is somewhat horrible from the performance POV, and currently practically impossible considering out sorting implementation which uses fixed length keys and is also limited by 64K. But we cannot reject this either, as it would break a lot of applications (badly written, but who cares). See CORE859 which had to be rolled back because of queries like "select distinct *" where the select list includes blobs.

I believe some partial solutions could be possible, but so far I don't see any generic solution.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Link: This issue duplicates CORE859 [ CORE859 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Commented by Paul Vinkenoog:

OK, I've just documented DISTINCT (within LIST) as having no effect for BLOBs.

Still... MAX, MIN, MAXVALUE, MINVALUE, etc. do work on BLOBs, and consider the full length where necessary.

Performance-wise, I guess that LIST(DISTINCT) wouldn't be more costly than these:
- in most cases you can find the difference in the first few characters;
- sometimes it takes longer;
- sometimes you need to compare entire BLOBs (if they are equal, or if the only difference is at the tail end).

Please also read my new comment for CORE3252, as this is a different issue.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

All the functions you mention don't require sorting, while any DISTINCT operation does. And, in the current implementation, it's impossible to sort on varying length values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant