Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to specify length for index segments to be indexed on char-based columns [CORE2201] #2629

Open
firebird-automations opened this issue Nov 19, 2008 · 19 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Martijn Tonies (martijntonies)

Votes: 2

When you have the ability to specify the length to be indexed for index segments, you can index very large char-based columns, yet not completely. Nevertheless, the index would help speeding things up when there's plenty of searches in the WHERE clause for smaller strings.

Example syntax (taken from MySQL);
CREATE [UNIQUE] INDEX index_name
[index_type]
ON tbl_name (index_col_name,...)
[index_type]

index_col_name:
col_name [(length)] [ASC | DESC]

Obviously, when you specify the length, it cannot be a unique index.

The difference between this index and an expression based index (eg: SUBSTRING(1, 10) to index the first 10 chars), is that with this index you don't have to use the same expression in the WHERE clause. For large char-based columns that cannot be indexed (eg: 10000 chars), this can help speeding queries up because it can actually use the data that exists in the table (and index).

RDB$INDEX_SEGMENTS should have the specified length for the segment and NULL otherwise? :-)

@firebird-automations
Copy link
Collaborator Author

Commented by: Sergey Mereutsa (green_dq)

Why just not use index by expression? You can create index by substring() function.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

I believe that such an index cannot be used for ordering, so it seems making no sense to specify ASC or DESC there...

@firebird-automations
Copy link
Collaborator Author

Commented by: Martijn Tonies (martijntonies)

Sergey, an expression based index will only be used if you use the same expression in the WHERE clause, as explained in the request.

Dmitry, that might be true, yes (syntax is just an example). But if no other indices exist, can't the index be used as the basis for sorting? Meaning that it would start out by arranging data in the temporary sort table according to the index and then sort according to table data? This might not make sense at all to you though ;-)

The reason for this functionality is:
- avoid having to create larger index keys (current largest is 1/4th of page size)
- ability to still index large char-based columns (like in, for example, Oracle)

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Should we throw an error if this kind of index is created for a non-string column? Or would you prefer other datatypes to be converted to strings and at the same time truncated up to the specified length? This could be useful for e.g. text blobs, but if they're of UTF8, then the truncation could fail due to a length limit crossing the character boundary...

@firebird-automations
Copy link
Collaborator Author

Commented by: Martijn Tonies (martijntonies)

I think an error should be thrown in the case of non-char columns, yet, text blobs should be handled, I think. The truncation should be char based, not byte based?

So you think this could be a useful one?

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

> The truncation should be char based

Agreed.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

You have almost convinced me :-) Actually, I had the same idea sometime in the past. But let's see what other guys think about it.

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Looks good.

But in fact there is at least two feature requests - ability to create such index and ability for optimizer to use it ;)

As for syntax - i think <length> can be specified for variable length data types only (currently [var]char and blob with subtype TEXT) and must take into account number of chars of course.

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

Martijn et al,

1 - Martijn, it would have better for this case to not have been opened, but a post made to the devel list to get this type of discussion out of the way first.

2 - It strikes me to be a bit of a "solution looking for a problem"

3 - what would happen in the follow situation?

- A VarChar( 10000) column for which an index is defined for only the first 1000 chars,
- A row has text which reads "...Sean is the greatest lover in the world..." but the index cuts this text to read: "...Sean is the greate".
- a query is created "WHERE Column contains "greatest"

4 - for large text fields, isn't the real problem that users are looking for a full text search functionality?

Although I would suggest that a 'simple' "word index" (an index where the column is tokenized [based on user defined chars] into words and each word separately indexed) would also be appropriate, to allow for optimized containing searches. This would meet the needs in a large number of cases.

@firebird-automations
Copy link
Collaborator Author

Commented by: Martijn Tonies (martijntonies)

Sean,

1) not subscribed currently, but isn't this what other users would do as well?

2) not sure what you mean by this?

3) Since when does CONTAINING use an index?

4) that defeats a WHERE column = '... literal value ...'

@firebird-automations
Copy link
Collaborator Author

Commented by: @livius2

I do not see difference between this and expression index
How this can optimize query?

is here any difference with this request and expression index use?

select
X.AAA, X.BBB
from
XXX X
WEHRE
SUBSTRING(X.AAA, 1, 100)='SOME_TEXT'
AND X.AAA='SOME_TEXT_LARGER_THEN_100..'

>>The difference between this index and an expression based index (eg: SUBSTRING(1, 10) to index the first 10 chars),
>>is that with this index you don't have to use the same expression in the WHERE clause

I must ask - you can not do something in this situation?
This block design?
Or you only do not need to put "SUBSTRING(X.AAA, 1, 100)='SOME_TEXT' " in where clause?

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

The key point is that you're still writing the "unlimited" expression, e.g. WHERE AAA = 'SOME_TEXT' and it still uses the index even if it has the length limit specified.

@firebird-automations
Copy link
Collaborator Author

Commented by: @livius2

>>The key point is that you're still writing the "unlimited" expression, e.g. WHERE AAA = 'SOME_TEXT' and it still uses the index even if it has the length limit specified.

du you mean
expression index on Substring(X.AAA, 1, 5)
and where clause
where substring(X.AAA, 1, 5)='1234' - this should use index
where substring(X.AAA, 1, 5)='123456' and this should not use index when length is specified?

if yes then what with this situation?
'123%4' - lenght is 5 but % exists ;-)

and i suppose user know length of query string '123456' in application before they put it in where clause?
In application they should only test
if length(value)>5 then
result:= restult + 'X.AAA=' + value else
result:= result + 'substring(X.AAA, 1, 5)=' + value

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

I don't mean expressions indices or any function like substring at all. This ticket has nothing to do with them. Instead, this is a regular index but the one which truncates the keys up to the specified limit when you insert the value into the index or search for the value using the index. Of course, this means that the index lookup may return false matches (keys "abc[def]" and "abc[ghj]" are considered the same), but the full-length comparison is internally performed once the record is fetched anyway, so it's not an issue in practice.

create table tab (col varchar(10));
create index itab on tab (col) length 3;

insert into tab (col) values ('abcdef'); -- key: 'abc'
insert into tab (col) values ('abcqwe'); -- key: 'abc'
insert into tab (col) values ('defqwe'); -- key: 'def'
insert into tab (col) values ('qwerty'); -- key: 'qwe'

select * from tab where col = 'qqq' -- 0 rows returned, 0 indexed reads
select * from tab where col = 'abc' -- 0 rows returned, 2 indexed reads
select * from tab where col = 'qwe' -- 0 row returned, 1 indexed read
select * from tab where col = 'abcdef' -- 1 row returned, 2 indexed reads
select * from tab where col = 'abcqwe' -- 1 row returned, 2 indexed reads
select * from tab where col = 'defqwe' -- 1 row returned, 1 indexed read
etc

@firebird-automations
Copy link
Collaborator Author

Commented by: @livius2

>>I don't mean expressions indices or any function like substring at all. This ticket has nothing to do with them
I see that this ticket has much in common with expression index
and i see only 1 index read benefits replacing expression index by length in your sample ;-)
index read count i get from FlameRobin

create table tab (col varchar(10));
create index itab on tab COMPUTED BY(SUBSTRING(col FROM 1 FOR 3));

insert into tab (col) values ('abcdef'); -- key: 'abc'
insert into tab (col) values ('abcqwe'); -- key: 'abc'
insert into tab (col) values ('defqwe'); -- key: 'def'
insert into tab (col) values ('qwerty'); -- key: 'qwe'

select * from tab where SUBSTRING(COL FROM 1 FOR 3)=SUBSTRING('qqq' FROM 1 FOR 3) and COL='qqq' -- 0 rows returned, 1{0} indexed reads
select * from tab where SUBSTRING(COL FROM 1 FOR 3)=SUBSTRING('abc' FROM 1 FOR 3) and COL='abc' -- 0 rows returned, 3{2} indexed reads
select * from tab where SUBSTRING(COL FROM 1 FOR 3)=SUBSTRING('qwe' FROM 1 FOR 3) and COL='qwe' -- 0 row returned, 2{1} indexed read
select * from tab where SUBSTRING(COL FROM 1 FOR 3)=SUBSTRING('abcdef' FROM 1 FOR 3) and COL='abcdef' -- 1 row returned, 3{2} indexed reads
select * from tab where SUBSTRING(COL FROM 1 FOR 3)=SUBSTRING('abcqwe' FROM 1 FOR 3) and COL='abcqwe' -- 1 row returned, 3{2} indexed reads
select * from tab where SUBSTRING(COL FROM 1 FOR 3)=SUBSTRING('defqwe' FROM 1 FOR 3) and COL='defqwe' -- 1 row returned, 2{1} indexed read

of course - query look worse ;-)

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Isn't your "query look worse" point a good enough reason? ;-) Why adding a substring call to the every query if it's more elegant to use the natural form of the query?

@firebird-automations
Copy link
Collaborator Author

Commented by: @livius2

Of course :)

But i saw above discussion that expression index can not be used here for speed up query for strings 10000 length
like MySQL do and because of that this should be implemented
what is not true ;-)

@firebird-automations
Copy link
Collaborator Author

Commented by: Vrinda Nayak (vrinda.nayak)

Hello,
We're using Firebird 3.0.2 for the project and the solution for issue reported above is what I'm looking for in my project.

The 3 table columns which are being used for creating a unique index are of type VARCHAR(255) each. But I cannot proceed as I get the error

"Resource Exception. unsuccessful metadata update; key size exceeds implementation restriction for index [SQLState:42000, ISC error code:335544351]"

Our requirement is to use first 64 chars of each of these 3 columns for indexing;
something like : "alter table tablename add constraint indexname unique (col1(64), col2(64), col3(64));"
which allows to specify length of columns for indexing in MYSQL.

Looking from the comments trail, it is reported long back; is there any fix or workaround for this (without having to use substring etc.)?

Can someone please advise/help?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants