Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial Indexes [CORE3384] #3750

Closed
firebird-automations opened this issue Mar 14, 2011 · 20 comments
Closed

Partial Indexes [CORE3384] #3750

firebird-automations opened this issue Mar 14, 2011 · 20 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Vladimir Arkhipov (arkinform)

Votes: 4

Example:

CREATE UNIQUE INDEX orders_number_idx ON orders (number)
WHERE number is not null

Description from http://www.postgresql.org/docs/8.0/static/indexes-partial.html

"A partial index is an index built over a subset of a table; the subset is defined by a conditional expression (called the predicate of the partial index). The index contains entries for only those table rows that satisfy the predicate."

"A major motivation for partial indexes is to avoid indexing common values. Since a query searching for a common value (one that accounts for more than a few percent of all the table rows) will not use the index anyway, there is no point in keeping those rows in the index at all. This reduces the size of the index, which will speed up queries that do use the index. It will also speed up many table update operations because the index does not need to be updated in all cases."

Very very useful !!!

@firebird-automations
Copy link
Collaborator Author

Modified by: Vladimir Arkhipov (arkinform)

description: Example:

CREATE UNIQUE INDEX orders_number_idx ON orders (number)
WHERE number is not null

Description from http://www.postgresql.org/docs/8.0/static/indexes-partial.html:

"A partial index is an index built over a subset of a table; the subset is defined by a conditional expression (called the predicate of the partial index). The index contains entries for only those table rows that satisfy the predicate."

"A major motivation for partial indexes is to avoid indexing common values. Since a query searching for a common value (one that accounts for more than a few percent of all the table rows) will not use the index anyway, there is no point in keeping those rows in the index at all. This reduces the size of the index, which will speed up queries that do use the index. It will also speed up many table update operations because the index does not need to be updated in all cases."

Very very useful !!!

=>

Example:

CREATE UNIQUE INDEX orders_number_idx ON orders (number)
WHERE number is not null

Description from http://www.postgresql.org/docs/8.0/static/indexes-partial.html

"A partial index is an index built over a subset of a table; the subset is defined by a conditional expression (called the predicate of the partial index). The index contains entries for only those table rows that satisfy the predicate."

"A major motivation for partial indexes is to avoid indexing common values. Since a query searching for a common value (one that accounts for more than a few percent of all the table rows) will not use the index anyway, there is no point in keeping those rows in the index at all. This reduces the size of the index, which will speed up queries that do use the index. It will also speed up many table update operations because the index does not need to be updated in all cases."

Very very useful !!!

@firebird-automations
Copy link
Collaborator Author

Commented by: @ibaseru

Have doubts about this feature since it was implemented in FoxPro.
I understand that it will be nice for the former FoxPro users, but...

Less space used by index - yes and good. Speedup queries? No. The amount of index pages being read will not be much less than in ordinary index. Histograms are better, mostly knowing that data may change, and such partial index must be rebuilt with new condition to be used in new conditions.

Well, maybe I'm wrong, but I'd like to see more theoretical examples (close to practics) with benefits of indices like this. Examples from PostgreSQL looks too childish.

@firebird-automations
Copy link
Collaborator Author

Commented by: Ann Harrison (awharrison)

Firebird differs from Postgres in a couple of ways that affect this
request. One is that Firebird indexes use prefix compression, so
a duplicate index entry has a data length of zero, so common values
are very compact.

The other is that Firebird does not keep multiple index entries for
the same value in the same record. In Postgres, each version of
a record has an entry in every index - so if you modify a non-indexed
field in a record with three indexes defined, you get three new index
entries for that record, each with the same values for the key as
the entries for the older record version. In Firebird, updating a non-
indexed field doesn't affect any indexes. Updating one of three
indexed fields affects at most one index.- if the new value already
existed in an older version, it won't be re-entered.

@firebird-automations
Copy link
Collaborator Author

Commented by: Dirk Rothe (drothe)

Real World:
We have a Field that ist mostly Null, but very selective if filled. A lot of queries filter the records by applying the where-condition "Field is Null". A minor number queries locate records by using the Index on this Field.

In query-plans for the the "Field is Null" case, the index of this Field is choosen (because of a higher selectivity) before other "better indexes". We work around this problem by trying to remember to disable the index with "Field+0 is Null" in these cases. This speeds up these queries by some order of magnitude. But from time to time somebody forgets this "trick" - so I would really like to fix this problem by excluding some Values from the Index.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Dirk, in this case a value distribution histogram is your friend, as with its existence the optimizer will not use an index for FIELD IS NULL while still be using an index for equality comparisons.

@firebird-automations
Copy link
Collaborator Author

Commented by: Vladimir Arkhipov (arkinform)

I see 2 applications of partial indexes:

1. Restrict unique values for not null fields. For example, number of case history for patient in medicine. The number may be not exist yet. But not null number must be unique. I don't know how to do it "gracefully" without partial indexes.

CREATE UNIQUE INDEX patients_histnum_idx ON patients (histnum)
WHERE histnum is not null

2. Large tables with fields where the most of values is null. Patient attribute, for example, allergy (or something like that) where 3-5% of values filled. We never need to find patients without allergy, but we often need to know about the presence of allergy for patients. I understand that Firebird store indexes very compact, but it is still unnecessary time and disk space.

P.S. What about speed of garbage collection for indexes where 99% of values is null? Is it problem or no?

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

As for your case (1), I'm not sure I get it. Firebird allows you to have any number of NULLs inside a uniquely constrained column. What is the problem?

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

Component: Engine [ 10000 ]

Version: 3.0 Alpha 1 [ 10331 ] =>

@firebird-automations
Copy link
Collaborator Author

Commented by: Vladimir Arkhipov (arkinform)

Unfortunately I did not know about this feature. Thank you :)

@firebird-automations
Copy link
Collaborator Author

Commented by: Vladimir Arkhipov (arkinform)

What about case (2) and speed of garbage collection? Does this affect garbage collection in principle?

@firebird-automations
Copy link
Collaborator Author

Commented by: Dirk Rothe (drothe)

Are "value distribution histograms" an expected 3.0 feature? I haven't that found that much information by googling for it.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Vladimir: yes, these other points seem to be valid, although the cost of garbage collection in duplicated chains decreased since ODS11.
Dirk: yes, it is scheduled for v3.0, see CORE1686.

@firebird-automations
Copy link
Collaborator Author

Commented by: Dirk Rothe (drothe)

Dmitry: Very nice, and a big thank you to all involved!

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

My reading of this case is that this issue can be closed, since there is no new action/change which is required. Correct?

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

I'm not sure. Some benefits of this feature still exist, the question is primarily whether they justify the development cost. But regardless, the priority should be decreased to "minor".

@firebird-automations
Copy link
Collaborator Author

Commented by: @livius2

If you still need real world sample:

table REPORT with column STATE (something like Firebird Tracker)

values of state:
closed - common value e.g. 1000000 records
open - e.g. 200 records
reported - e.g. 800 records

and you then can create partial index:

CREATE INDEX IXAP_REPORTS_STATE ON REPORT (STATE)
WHERE STATE<>'closed'

this index is small and is used by common queries

@firebird-automations
Copy link
Collaborator Author

Commented by: @livius2

If this feature will be done
and it can be then used in referential constraints, then benefit will be much bigger

Example

-----------------------------------------------------
instead of tables NOTES_CUSTOMER, NOTES_PERSON, NOTES_THING ...

-----------------------------------------------------
or one table with many ids

CRATE TABLE NOTES
(
ID BIGINT NOT NULL PRIMARY KEY

, ID_CUSTOMER BIGINT NOT NULL
, ID_PERSON BIGINT NOT NULL
, ID_THING BIGINT NOT NULL
... other refids ...

,REF_TYPE INTEGER NOT NULL
DESCRIPTION VARCHAR(200)
);

-----------------------------------------------------
we can do this:
CRATE TABLE NOTES
(
ID BIGINT NOT NULL PRIMARY KEY
, ID_REF BIGINT NOT NULL
,REF_TYPE INTEGER NOT NULL
DESCRIPTION VARCHAR(200)
);

ALTER TABLE NOTES ADD CONSTRAINT FK_NOTES__CUSTOMER FOREIGN KEY(ID_REF) WHERE REF_TYPE=1 REFERENCES CUSTOMER(ID) ON DELETE NO ACTION ON UPDATE CASCADE;
ALTER TABLE NOTES ADD CONSTRAINT FK_NOTES__PERSON FOREIGN KEY(ID_REF) WHERE REF_TYPE=2 REFERENCES PERSON(ID) ON DELETE NO ACTION ON UPDATE CASCADE;
ALTER TABLE NOTES ADD CONSTRAINT FK_NOTES__THING FOREIGN KEY(ID_REF) WHERE REF_TYPE=3 REFERENCES THING(ID) ON DELETE NO ACTION ON UPDATE CASCADE;

This is maybe not ideal design - but sometimes this will simplify many problems

@justin-fyfe
Copy link

+1 if this feature can be done. Not for performance but for unique enforcement. For example, consider the table:

CREATE TABLE FOO_TBL (
     ID    UUID NOT NULL,
     USER_ID UUID NOT NULL,
     UNIQUE_VALUE VARCHAR(256) NOT NULL, -- THE UNIQUE VALUE
     START_TS TIMESTAMP NOT NULL, -- THE TIME THE VALUE BECAME ACTIVE
     STOP_TS TIMESTAMP, -- THE TIME THAT THE VALUE WAS NO LONGER ACTIVE
     CONSTRAINT PK_FOO PRIMARY KEY (ID),
     CONSTRAINT FK_FOO_USER FOREIGN KEY (USER_ID) REFERENCES USER_TBL(USER_ID)
);

Basically a table which needs to track changes over time of a value (can be anything like an medical record number, etc.) but basically the business requirement is: value must be unique if it is active.

In PostgreSQL, Oracle, and even SQLite this is done via:

CREATE UNIQUE INDEX FOO_UQ_IDX ON FOO_TBL(UNIQUE_VALUE) WHERE (STOP_TS IS NULL);

i.e. Enforce the unique constraint on UNIQUE_VALUE if it has not been indicated as no longer valid.

If there is a way to do this other than indexes in Firebird I'd love an example - currently we've just removed unique checking on this value when users use Firebird, it would be nice if our Firebird provider could behave the same way as our Postgres, SQLite and Oracle providers :)

@hvlad
Copy link
Member

hvlad commented Jul 27, 2021

You may use expression index right now:

CREATE UNIQUE INDEX FOO_UQ_IDX ON FOO_TBL COMPUTED BY (CASE WHEN STOP_TS IS NULL THEN UNIQUE_VALUE END) 

@dyemanov dyemanov self-assigned this Aug 12, 2022
@dyemanov dyemanov changed the title Partial Indexes (not partitioning) [CORE3384] Partial Indexes [CORE3384] Oct 19, 2022
@pavel-zotov
Copy link

QA: see tests for #7257

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants