New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"SuperIndex" for string serching [CORE4687] #4995
Comments
Commented by: @sim1984 It's too wasteful. Such an index would be very slow INSERT / UPDATE / DELETE. This wish can be reduced to one of two points: |
Commented by: @livius2 >>>It's too wasteful. Such an index would be very slow INSERT / UPDATE / DELETE. yes i know i write about this "change to table is also expensive because of many indexes" but as any index people should know when use it and when not in provided case it bring new life to query on this big database >>This wish can be reduced to one of two points: but how if not in this direction like above? If you have concept i'am very interested :) >> 2. Full-text search |
Commented by: @romansimakov I see the following: Does anybody know how such problems solved in other DBMS? Oracle? |
Commented by: Omacht András (aomacht) Not a DBMS, but a working solution from Apache. |
Commented by: Sean Leyne (seanleyne) Edited "Description" for brevity and readability. |
Modified by: Sean Leyne (seanleyne)description: This is big task but can optimize search operation on big tables and case studies: last year some customer with very big Firebird database near 1 bilion of records in biggest table ask for optimising search on field VarChar(127) FullText indexing words and you can ask like "starting with" but not "containing" because in this case there was only one field and short VarChar(127) create expression indexes like CREATE ASCENDING INDEX IXAE_SMS__BODY1 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 1)); and when we search then we do SELECT and this query answer in few milliseconds with 100% of records :) Will be good to see build in Firebird creation of something what i call SuperIndex and then we can and then we can do simple search: SELECT and then engine do for us work like above I do not know whether sufficient human resources are in FB team => This is big task but can optimize search operation on big tables and answer quality will be 100% of searched records for query "like '%something%'" Context: Last year some customer with very big Firebird database near 1 bilion of records in biggest table ask for optimising search on field VarChar(127) They used full text search engine and it work ok as fulltext but, as you know, how it work then answer quality is not 100%. Benefit was that it answer in few seconds instead of many seconds if we do search like UPPER(SMS_BODY COLLATE PXW_PLK) FROM 1) LIKE 'SOMETHING%' full table scan FullText indexing words and you can ask like "starting with" but not "containing" It find record with text like "case studies" if you search for "studi" but not find if you search for "ies" because in this case there was only one field and short VarChar(127), Then i decided to rethink the problem and I got idea to create a series of expression indexes like: CREATE ASCENDING INDEX IXAE_SMS__BODY1 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 1)); and when we search then we do SELECT This query ran in few milliseconds with 100% of records :) Of course it is storage expensive and any change to table is also expensive because of many indexes. But benefit is much more bigger for this customer and he was very surprised when he saw answer in few ms - and got 100% quality answer. Will be good to see build in Firebird creation of something what I call "SuperIndex" This can be created poorly internally with many virtual indexes like above - and visible outside as single index but mayby someone have concept how to store index info optimal without redundant information (i suppose that this can be patented then :) and then we can: CREATE SuperIndex IXAS_SMS__BODY ON SMS(BODY); and then we can do simple search: SELECT and then engine do for us work like above I do not know whether sufficient human resources are in FB team and of course if the same concept will be of interest |
Commented by: Sean Leyne (seanleyne) Karol, It seems that you are describing a "word" index, where each word would be indexed and the LIKE and CONTAINING operator would use the word index. Am I correct? |
Commented by: @livius2 not exactly "the best FIREBIRD" with only word index it can find record in case like "FIR% but not find any record if you looking for "%BIRD" It should find also ending text in word also inner text in word |
Commented by: Sean Leyne (seanleyne) Karol, The issue of finding inner and ending text is the result of the current "index prefix compression" logic used by the engine. This results in indexed values not stored as full values/strings so that the index values cannot be "walked"/scanned to determine which values meet the criteria. The index storage structure is a by-product of a long ago decision which focused on minimizing storage footprint at the "cost" of limited search functionality. Arguably with current abundance of storage space, it might be appropriate to add option to CREATE INDEX syntax to disable the compression and enable index use for all CONTAINING, LIKE and ENDING operations. |
Submitted by: @livius2
Votes: 1
This is big task but can optimize search operation on big tables and answer quality will be 100% of searched records for query "like '%something%'"
Context:
Last year some customer with very big Firebird database near 1 bilion of records in biggest table ask for optimising search on field VarChar(127)
They used full text search engine and it work ok as fulltext but, as you know, how it work then answer quality is not 100%.
Benefit was that it answer in few seconds instead of many seconds if we do search like UPPER(SMS_BODY COLLATE PXW_PLK) FROM 1) LIKE 'SOMETHING%' full table scan
FullText indexing words and you can ask like "starting with" but not "containing" It find record with text like "case studies" if you search for "studi" but not find if you search for "ies" because in this case there was only one field and short VarChar(127),
Then i decided to rethink the problem and I got idea to create a series of expression indexes like:
CREATE ASCENDING INDEX IXAE_SMS__BODY1 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 1));
CREATE ASCENDING INDEX IXAE_SMS__BODY2 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 2));
CREATE ASCENDING INDEX IXAE_SMS__BODY3 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 3));
CREATE ASCENDING INDEX IXAE_SMS__BODY4 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 4));
CREATE ASCENDING INDEX IXAE_SMS__BODY5 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 5));
... more
and when we search then we do
SELECT
...
from
...
WHERE
(
SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 1) LIKE 'SOMETHING%' OR
SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 2) LIKE 'SOMETHING%' OR
SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 3) LIKE 'SOMETHING%' OR
SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 4) LIKE 'SOMETHING%' OR
SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 5) LIKE 'SOMETHING%' OR
...
)
This query ran in few milliseconds with 100% of records :)
Of course it is storage expensive and any change to table is also expensive because of many indexes. But benefit is much more bigger for this customer and he was very surprised when he saw answer in few ms - and got 100% quality answer.
Will be good to see build in Firebird creation of something what I call "SuperIndex"
This can be created poorly internally with many virtual indexes like above - and visible outside as single index but mayby someone have concept how to store index info optimal without redundant information (i suppose that this can be patented then :)
and then we can:
CREATE SuperIndex IXAS_SMS__BODY ON SMS(BODY);
and then we can do simple search:
SELECT
...
from
...
WHERE
SMS_BODY LIKE '%SOMETHING%'
and then engine do for us work like above
I do not know whether sufficient human resources are in FB team and of course if the same concept will be of interest
The text was updated successfully, but these errors were encountered: