Issue Details (XML | Word | Printable)

Key: CORE-2201
Type: New Feature New Feature
Status: Open Open
Priority: Major Major
Assignee: Dmitry Yemanov
Reporter: Martijn Tonies
Votes: 2
Watchers: 5
Operations

If you were logged in you would be able to see more operations.
Firebird Core

Ability to specify length for index segments to be indexed on char-based columns

Created: 19/Nov/08 06:05 AM   Updated: 15/Nov/17 04:29 PM
Component/s: Engine
Affects Version/s: None
Fix Version/s: None


 Description  « Hide
When you have the ability to specify the length to be indexed for index segments, you can index very large char-based columns, yet not completely. Nevertheless, the index would help speeding things up when there's plenty of searches in the WHERE clause for smaller strings.

Example syntax (taken from MySQL);
CREATE [UNIQUE] INDEX index_name
    [index_type]
    ON tbl_name (index_col_name,...)
    [index_type]

index_col_name:
    col_name [(length)] [ASC | DESC]

Obviously, when you specify the length, it cannot be a unique index.

The difference between this index and an expression based index (eg: SUBSTRING(1, 10) to index the first 10 chars), is that with this index you don't have to use the same expression in the WHERE clause. For large char-based columns that cannot be indexed (eg: 10000 chars), this can help speeding queries up because it can actually use the data that exists in the table (and index).

RDB$INDEX_SEGMENTS should have the specified length for the segment and NULL otherwise? :-)

 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Sergey Mereutsa added a comment - 19/Nov/08 06:19 AM
Why just not use index by expression? You can create index by substring() function.

Dmitry Yemanov added a comment - 19/Nov/08 06:20 AM
I believe that such an index cannot be used for ordering, so it seems making no sense to specify ASC or DESC there...

Martijn Tonies added a comment - 19/Nov/08 06:27 AM - edited
Sergey, an expression based index will only be used if you use the same expression in the WHERE clause, as explained in the request.

Dmitry, that might be true, yes (syntax is just an example). But if no other indices exist, can't the index be used as the basis for sorting? Meaning that it would start out by arranging data in the temporary sort table according to the index and then sort according to table data? This might not make sense at all to you though ;-)

The reason for this functionality is:
- avoid having to create larger index keys (current largest is 1/4th of page size)
- ability to still index large char-based columns (like in, for example, Oracle)

Dmitry Yemanov added a comment - 19/Nov/08 06:44 AM
Should we throw an error if this kind of index is created for a non-string column? Or would you prefer other datatypes to be converted to strings and at the same time truncated up to the specified length? This could be useful for e.g. text blobs, but if they're of UTF8, then the truncation could fail due to a length limit crossing the character boundary...

Martijn Tonies added a comment - 19/Nov/08 06:59 AM
I think an error should be thrown in the case of non-char columns, yet, text blobs should be handled, I think. The truncation should be char based, not byte based?

So you think this could be a useful one?

Adriano dos Santos Fernandes added a comment - 19/Nov/08 07:07 AM
> The truncation should be char based

Agreed.

Dmitry Yemanov added a comment - 19/Nov/08 07:08 AM
You have almost convinced me :-) Actually, I had the same idea sometime in the past. But let's see what other guys think about it.

Vlad Khorsun added a comment - 19/Nov/08 07:47 AM
Looks good.

But in fact there is at least two feature requests - ability to create such index and ability for optimizer to use it ;)

As for syntax - i think <length> can be specified for variable length data types only (currently [var]char and blob with subtype TEXT) and must take into account number of chars of course.

Sean Leyne added a comment - 19/Nov/08 01:13 PM
Martijn et al,

1 - Martijn, it would have better for this case to not have been opened, but a post made to the devel list to get this type of discussion out of the way first.

2 - It strikes me to be a bit of a "solution looking for a problem"

3 - what would happen in the follow situation?

- A VarChar( 10000) column for which an index is defined for only the first 1000 chars,
- A row has text which reads "...Sean is the greatest lover in the world..." but the index cuts this text to read: "...Sean is the greate".
- a query is created "WHERE Column contains "greatest"

4 - for large text fields, isn't the real problem that users are looking for a full text search functionality?

Although I would suggest that a 'simple' "word index" (an index where the column is tokenized [based on user defined chars] into words and each word separately indexed) would also be appropriate, to allow for optimized containing searches. This would meet the needs in a large number of cases.


Martijn Tonies added a comment - 20/Nov/08 05:33 AM
Sean,

1) not subscribed currently, but isn't this what other users would do as well?

2) not sure what you mean by this?

3) Since when does CONTAINING use an index?

4) that defeats a WHERE column = '... literal value ...'

Karol Bieniaszewski added a comment - 11/Aug/10 09:02 AM - edited
I do not see difference between this and expression index
How this can optimize query?

is here any difference with this request and expression index use?

select
X.AAA, X.BBB
from
XXX X
WEHRE
SUBSTRING(X.AAA, 1, 100)='SOME_TEXT'
AND X.AAA='SOME_TEXT_LARGER_THEN_100..'

>>The difference between this index and an expression based index (eg: SUBSTRING(1, 10) to index the first 10 chars),
>>is that with this index you don't have to use the same expression in the WHERE clause

I must ask - you can not do something in this situation?
This block design?
Or you only do not need to put "SUBSTRING(X.AAA, 1, 100)='SOME_TEXT' " in where clause?

Dmitry Yemanov added a comment - 11/Aug/10 09:18 AM
The key point is that you're still writing the "unlimited" expression, e.g. WHERE AAA = 'SOME_TEXT' and it still uses the index even if it has the length limit specified.

Karol Bieniaszewski added a comment - 13/Aug/10 06:06 AM
>>The key point is that you're still writing the "unlimited" expression, e.g. WHERE AAA = 'SOME_TEXT' and it still uses the index even if it has the length limit specified.

du you mean
expression index on Substring(X.AAA, 1, 5)
and where clause
where substring(X.AAA, 1, 5)='1234' - this should use index
where substring(X.AAA, 1, 5)='123456' and this should not use index when length is specified?

if yes then what with this situation?
'123%4' - lenght is 5 but % exists ;-)

and i suppose user know length of query string '123456' in application before they put it in where clause?
In application they should only test
if length(value)>5 then
  result:= restult + 'X.AAA=' + value else
  result:= result + 'substring(X.AAA, 1, 5)=' + value





 

Dmitry Yemanov added a comment - 13/Aug/10 06:42 AM
I don't mean expressions indices or any function like substring at all. This ticket has nothing to do with them. Instead, this is a regular index but the one which truncates the keys up to the specified limit when you insert the value into the index or search for the value using the index. Of course, this means that the index lookup may return false matches (keys "abc[def]" and "abc[ghj]" are considered the same), but the full-length comparison is internally performed once the record is fetched anyway, so it's not an issue in practice.

create table tab (col varchar(10));
create index itab on tab (col) length 3;

insert into tab (col) values ('abcdef'); -- key: 'abc'
insert into tab (col) values ('abcqwe'); -- key: 'abc'
insert into tab (col) values ('defqwe'); -- key: 'def'
insert into tab (col) values ('qwerty'); -- key: 'qwe'

select * from tab where col = 'qqq' -- 0 rows returned, 0 indexed reads
select * from tab where col = 'abc' -- 0 rows returned, 2 indexed reads
select * from tab where col = 'qwe' -- 0 row returned, 1 indexed read
select * from tab where col = 'abcdef' -- 1 row returned, 2 indexed reads
select * from tab where col = 'abcqwe' -- 1 row returned, 2 indexed reads
select * from tab where col = 'defqwe' -- 1 row returned, 1 indexed read
etc

Karol Bieniaszewski added a comment - 13/Aug/10 07:26 AM - edited
>>I don't mean expressions indices or any function like substring at all. This ticket has nothing to do with them
I see that this ticket has much in common with expression index
and i see only 1 index read benefits replacing expression index by length in your sample ;-)
index read count i get from FlameRobin

create table tab (col varchar(10));
create index itab on tab COMPUTED BY(SUBSTRING(col FROM 1 FOR 3));

insert into tab (col) values ('abcdef'); -- key: 'abc'
insert into tab (col) values ('abcqwe'); -- key: 'abc'
insert into tab (col) values ('defqwe'); -- key: 'def'
insert into tab (col) values ('qwerty'); -- key: 'qwe'


select * from tab where SUBSTRING(COL FROM 1 FOR 3)=SUBSTRING('qqq' FROM 1 FOR 3) and COL='qqq' -- 0 rows returned, 1{0} indexed reads
select * from tab where SUBSTRING(COL FROM 1 FOR 3)=SUBSTRING('abc' FROM 1 FOR 3) and COL='abc' -- 0 rows returned, 3{2} indexed reads
select * from tab where SUBSTRING(COL FROM 1 FOR 3)=SUBSTRING('qwe' FROM 1 FOR 3) and COL='qwe' -- 0 row returned, 2{1} indexed read
select * from tab where SUBSTRING(COL FROM 1 FOR 3)=SUBSTRING('abcdef' FROM 1 FOR 3) and COL='abcdef' -- 1 row returned, 3{2} indexed reads
select * from tab where SUBSTRING(COL FROM 1 FOR 3)=SUBSTRING('abcqwe' FROM 1 FOR 3) and COL='abcqwe' -- 1 row returned, 3{2} indexed reads
select * from tab where SUBSTRING(COL FROM 1 FOR 3)=SUBSTRING('defqwe' FROM 1 FOR 3) and COL='defqwe' -- 1 row returned, 2{1} indexed read

of course - query look worse ;-)

Dmitry Yemanov added a comment - 13/Aug/10 08:09 AM
Isn't your "query look worse" point a good enough reason? ;-) Why adding a substring call to the every query if it's more elegant to use the natural form of the query?

Karol Bieniaszewski added a comment - 13/Aug/10 08:49 AM - edited
Of course :)

But i saw above discussion that expression index can not be used here for speed up query for strings 10000 length
like MySQL do and because of that this should be implemented
what is not true ;-)

Vrinda Nayak added a comment - 15/Nov/17 04:29 PM
Hello,
We're using Firebird 3.0.2 for the project and the solution for issue reported above is what I'm looking for in my project.

The 3 table columns which are being used for creating a unique index are of type VARCHAR(255) each. But I cannot proceed as I get the error

"Resource Exception. unsuccessful metadata update; key size exceeds implementation restriction for index [SQLState:42000, ISC error code:335544351]"

Our requirement is to use first 64 chars of each of these 3 columns for indexing;
something like : "alter table tablename add constraint indexname unique (col1(64), col2(64), col3(64));"
which allows to specify length of columns for indexing in MYSQL.

Looking from the comments trail, it is reported long back; is there any fix or workaround for this (without having to use substring etc.)?

Can someone please advise/help?