New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ability to specify length for index segments to be indexed on char-based columns [CORE2201] #2629
Comments
Commented by: Sergey Mereutsa (green_dq) Why just not use index by expression? You can create index by substring() function. |
Commented by: @dyemanov I believe that such an index cannot be used for ordering, so it seems making no sense to specify ASC or DESC there... |
Commented by: Martijn Tonies (martijntonies) Sergey, an expression based index will only be used if you use the same expression in the WHERE clause, as explained in the request. Dmitry, that might be true, yes (syntax is just an example). But if no other indices exist, can't the index be used as the basis for sorting? Meaning that it would start out by arranging data in the temporary sort table according to the index and then sort according to table data? This might not make sense at all to you though ;-) The reason for this functionality is: |
Commented by: @dyemanov Should we throw an error if this kind of index is created for a non-string column? Or would you prefer other datatypes to be converted to strings and at the same time truncated up to the specified length? This could be useful for e.g. text blobs, but if they're of UTF8, then the truncation could fail due to a length limit crossing the character boundary... |
Commented by: Martijn Tonies (martijntonies) I think an error should be thrown in the case of non-char columns, yet, text blobs should be handled, I think. The truncation should be char based, not byte based? So you think this could be a useful one? |
Commented by: @asfernandes > The truncation should be char based Agreed. |
Commented by: @dyemanov You have almost convinced me :-) Actually, I had the same idea sometime in the past. But let's see what other guys think about it. |
Modified by: @dyemanovassignee: Dmitry Yemanov [ dimitr ] |
Commented by: @hvlad Looks good. But in fact there is at least two feature requests - ability to create such index and ability for optimizer to use it ;) As for syntax - i think <length> can be specified for variable length data types only (currently [var]char and blob with subtype TEXT) and must take into account number of chars of course. |
Commented by: Sean Leyne (seanleyne) Martijn et al, 1 - Martijn, it would have better for this case to not have been opened, but a post made to the devel list to get this type of discussion out of the way first. 2 - It strikes me to be a bit of a "solution looking for a problem" 3 - what would happen in the follow situation? - A VarChar( 10000) column for which an index is defined for only the first 1000 chars, 4 - for large text fields, isn't the real problem that users are looking for a full text search functionality? Although I would suggest that a 'simple' "word index" (an index where the column is tokenized [based on user defined chars] into words and each word separately indexed) would also be appropriate, to allow for optimized containing searches. This would meet the needs in a large number of cases. |
Commented by: Martijn Tonies (martijntonies) Sean, 1) not subscribed currently, but isn't this what other users would do as well? 2) not sure what you mean by this? 3) Since when does CONTAINING use an index? 4) that defeats a WHERE column = '... literal value ...' |
Commented by: @livius2 I do not see difference between this and expression index is here any difference with this request and expression index use? select >>The difference between this index and an expression based index (eg: SUBSTRING(1, 10) to index the first 10 chars), I must ask - you can not do something in this situation? |
Commented by: @dyemanov The key point is that you're still writing the "unlimited" expression, e.g. WHERE AAA = 'SOME_TEXT' and it still uses the index even if it has the length limit specified. |
Commented by: @livius2 >>The key point is that you're still writing the "unlimited" expression, e.g. WHERE AAA = 'SOME_TEXT' and it still uses the index even if it has the length limit specified. du you mean if yes then what with this situation? and i suppose user know length of query string '123456' in application before they put it in where clause? |
Commented by: @dyemanov I don't mean expressions indices or any function like substring at all. This ticket has nothing to do with them. Instead, this is a regular index but the one which truncates the keys up to the specified limit when you insert the value into the index or search for the value using the index. Of course, this means that the index lookup may return false matches (keys "abc[def]" and "abc[ghj]" are considered the same), but the full-length comparison is internally performed once the record is fetched anyway, so it's not an issue in practice. create table tab (col varchar(10)); insert into tab (col) values ('abcdef'); -- key: 'abc' select * from tab where col = 'qqq' -- 0 rows returned, 0 indexed reads |
Commented by: @livius2 >>I don't mean expressions indices or any function like substring at all. This ticket has nothing to do with them create table tab (col varchar(10)); insert into tab (col) values ('abcdef'); -- key: 'abc' select * from tab where SUBSTRING(COL FROM 1 FOR 3)=SUBSTRING('qqq' FROM 1 FOR 3) and COL='qqq' -- 0 rows returned, 1{0} indexed reads of course - query look worse ;-) |
Commented by: @dyemanov Isn't your "query look worse" point a good enough reason? ;-) Why adding a substring call to the every query if it's more elegant to use the natural form of the query? |
Commented by: @livius2 Of course :) But i saw above discussion that expression index can not be used here for speed up query for strings 10000 length |
Commented by: Vrinda Nayak (vrinda.nayak) Hello, The 3 table columns which are being used for creating a unique index are of type VARCHAR(255) each. But I cannot proceed as I get the error "Resource Exception. unsuccessful metadata update; key size exceeds implementation restriction for index [SQLState:42000, ISC error code:335544351]" Our requirement is to use first 64 chars of each of these 3 columns for indexing; Looking from the comments trail, it is reported long back; is there any fix or workaround for this (without having to use substring etc.)? Can someone please advise/help? |
Submitted by: Martijn Tonies (martijntonies)
Votes: 2
When you have the ability to specify the length to be indexed for index segments, you can index very large char-based columns, yet not completely. Nevertheless, the index would help speeding things up when there's plenty of searches in the WHERE clause for smaller strings.
Example syntax (taken from MySQL);
CREATE [UNIQUE] INDEX index_name
[index_type]
ON tbl_name (index_col_name,...)
[index_type]
index_col_name:
col_name [(length)] [ASC | DESC]
Obviously, when you specify the length, it cannot be a unique index.
The difference between this index and an expression based index (eg: SUBSTRING(1, 10) to index the first 10 chars), is that with this index you don't have to use the same expression in the WHERE clause. For large char-based columns that cannot be indexed (eg: 10000 chars), this can help speeding queries up because it can actually use the data that exists in the table (and index).
RDB$INDEX_SEGMENTS should have the specified length for the segment and NULL otherwise? :-)
The text was updated successfully, but these errors were encountered: