Issue Details (XML | Word | Printable)

Key: CORE-1345
Type: Bug Bug
Status: Reopened Reopened
Priority: Major Major
Assignee: Unassigned
Reporter: Rene Pijnacker
Votes: 1
Watchers: 6
Operations

If you were logged in you would be able to see more operations.
Firebird Core

DISTINCT and UNION no longer work for BLOB fields

Created: 05/Jul/07 01:35 AM   Updated: 16/Mar/10 07:56 PM
Component/s: None
Affects Version/s: 2.1 Beta 1
Fix Version/s: None

Environment: Windows XP


 Description  « Hide
I tried to run our application through BDE using firebird 2.1 beta
on opening of some screen I get error message No support for blob sorting.
This works on all other versions of firebird

 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Dmitry Yemanov added a comment - 05/Jul/07 02:18 AM
This is an intentional change. Blobs cannot be sorted by, including the GROUP BY and DISTINCT operations. Prior versions allowed that but produced unexpected results as in fact BLOB IDs were sorted.

Bruno Depero added a comment - 03/Sep/07 05:58 AM
I tried to restore few databases from 2.0.2 to 2.1 but gbak finished before completing due to blob field that with 2.1.x cannot be sorted.
When 2.1.x will be released I wish to migrate from 2.0.x to 2.1.
7 databases, very large ddl: there is a fast way to find what items do a sort on a blob field (view, stored, triggers, ...) ?

Thank you in advance !
Bruno

Nickolay Samofatov added a comment - 27/Nov/07 07:44 AM
We have encountered the issue with with pretty much all commercial Firebird applications that we tried to port to 2.1, and the needed fixes for applications are sometimes non-trivial or even impossible, and very difficult to explain to the customer. The issue had to be ranked as critical and the release blocker for us internally. CORE-1250 is another report on the same issue.

The access frameworks routinely generate queries like "select distinct * from <sometable>" and

"select ...many fields... from <sometable>
union
select ...many fields... from <sometable>"

The previous engine used to treat two non-null BLOBs as DISTINCT if they came from different tables or the different fields of the same table, which was correct behavior (meaning it was consistent and acceptable).

Changing DISTINCT and UNION behavior with BLOBs breaks a big percentage of applications for no logical reason.

We may forbid using BLOBs in ORDER BY clause, because it produces incorrect results, but breaking DISTINCT and UNION cannot be justified.

Dmitry Yemanov added a comment - 27/Nov/07 08:21 AM
create table blb1 (id int, col blob);
create table blb2 (id int, col blob);
commit;

insert into blb1 values (1, 'asd');
insert into blb1 values (1, 'asd');
insert into blb2 values (1, 'asd');
commit;

select distinct * from blb1;
-- 2 rows returned

select * from blb1
union
select * from blb2
-- 3 rows returned

Do you call it a correct result?

Nickolay Samofatov added a comment - 27/Nov/07 09:57 AM
And the result is:

>isql -i a_test.sql

          ID COL
============ =================

           1 80:0
           1 80:2
           1 81:0

This is not too bad result, IMO. Much better than returning a error and breaking existing programs.

Not many programs rely on ORDER BY for blob fields, but using DISTINCT and UNION with blobs is the standard practice, you like it or not.

Consider this example:

create database 'aa.fdb' user 'sysdba' password 'masterkey';

create table products (
  id integer not null primary key,
  dsc blob
);

create table orders (
  id integer not null primary key,
  product_id integer references products,
  qty integer
);

insert into products values(1, 'Red Database');
insert into orders values(1, 1, 1);

select distinct products.id, products.dsc
from products, orders
where products.id = orders.product_id;

How can you justify to the user that this query which has produced expected and consistent results before has to be fixed?

Old style and analytical queries very commonly join a whole bunch of tables (leveling many-many relationships), filter it in a WHERE clause and than use DISTINCT to figure out the rows that they needed. The change breaks this perfectly legal approach.

Adriano dos Santos Fernandes added a comment - 27/Nov/07 10:11 AM
As we discussed in fb-devel, others DBMS doesn't allow this query either.

But, we can make it work based on primary key/unique, i.e., if any primary key/unique of the same table of a blob is present (as in your example), this record will be always distinct.

Alexander Peshkov added a comment - 27/Nov/07 10:25 AM
Nickolay, your sample with:
>isql -i a_test.sql
shows that we have badly designed client tool (our isql), not that such queries are correct.

I agree with Adriano's suggestion - let's use PK (or any other unique constraint if present), this will let most of old queries run. But reanimating old _bugs_ (like one mentioned in your sample) for backward-compatibility reasons is not a good way to go.

Nickolay Samofatov added a comment - 27/Nov/07 11:37 PM
Guys, are we making the best engine in the world or a patchy mess?

1. Absence/presence/active status of constraints or indexes should not change SELECT statement validity or results (remember DDL dependencies and other stuff?)

2. Compatibility with existing installed base is hugely important, and if we are going to inflict migration costs, that has to be justified with something other than calling old predictable and consistent behavior "bugs". Either change the functionality properly or don't touch it. Breaking applications for the sake of breakage is not the way forward.

3. If we want to attack the issue seriously so it compares BLOB values in generic scenario correctly we have to implement SORT callbacks or something similar to compare BLOB values and a little bunch of optimizations to avoid performance regressions with old queries.

4. Regarding the other servers not supporting the feature, please come on, don't be silly:

mysql> create table a (b longtext);
Query OK, 0 rows affected (0.30 sec)

mysql> insert into a values('test');
Query OK, 1 row affected (0.44 sec)

mysql> insert into a values('test');
Query OK, 1 row affected (0.44 sec)

mysql> select distinct * from a;
+------+
| b |
+------+
| test |
+------+
1 row in set (0.20 sec)

========

postgres=# create table aa (b text);
CREATE TABLE
postgres=# insert into aa values('test');
INSERT 0 1
postgres=# insert into aa values('test');
INSERT 0 1
postgres=# select distinct * from aa;
  b
------
 test
(1 row)

Alexander Peshkov added a comment - 28/Nov/07 04:12 AM
If we can provide postgres-like behavior, I see no problems. But certainly this will affect performance - at least by reading of blob pages from disk. BTW, presence of active unique constraint my become a serious optimization here.

Philip Williams added a comment - 16/Mar/10 07:56 PM
Blobs aren't always stored on separate pages, if room exists to fit them on the main record page, right? I, for one, often use blobs when I'm not sure a large varchar will handle outlier cases, despite knowing that 99% of the time, the value will be very small and even a small varchar would be sufficient. Memo fields, for example, need to be large for rare cases, but most of the time users won't take the time to even write complete sentences, let alone 4kb of text. So there *may* be a performance cost in looking at the blob value, but you shouldn't assume it'll be catastrophic.

And for those few users who do want to run a distinct on blobs because they're trying to eliminate duplicate photos in their collections (!), I expect they know who they are and will be okay with exchanging speed for features.