Optimize disk space consumption for BLOB concatenation operations [CORE3948] #4281
Comments
Commented by: @asfernandes If blob concatenation is moved to temporary space, then the cost of a concatenation followed by its write to the database will double. After commit/rollback, this wasted database space will be filled again.
Commented by: Sean Leyne (seanleyne) Based on Adriano's comments, it seems the current functionality is "as designed/won't fix". It is a temporary condition; the unneeded space/pages will be released to the engine for re-use for other purposes. So, this case should be closed. Objections?
Commented by: @asfernandes Well, there is room for improvement IMO. But this involves a kind of garbage collector for blobs in PSQL. It's far from easy!
Modified by: Sean Leyne (seanleyne) priority: Major [ 3 ] => Minor [ 4 ]; issuetype: Bug [ 1 ] => New Feature [ 2 ]; summary: Excessive disk space consumption on BLOB concatenation => Optimize disk space consumption for BLOB concatenation operations
Commented by: Sean Leyne (seanleyne) I have edited the case description to better reflect the nature of the case and the changes required, as well as to turn the case into a feature request.
Commented by: André Schwabe (a.schwabe) In the case where we noticed this, a customer database went from 2.2 GB to 128 GB until it crashed because there was no more space. Even adding 500 characters to a blob 5000 times increases the database size by 2 GB permanently. Surely there is room for improvement.
Commented by: @mrotteveel How can concatenating 4000 * 59 characters = 236,000 characters (best case 236 KB, worst case 944 KB) result in a 1 GB increase? More importantly, how can that kind of excessive allocation not be considered a bug?
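A back-of-the-envelope model, assuming (as later comments suggest) that every concatenation materializes a complete new copy of the blob rather than appending in place: iteration i then writes roughly i * 60 bytes, so the total written is quadratic in the iteration count.

```sql
-- If each of the 4000 concatenations copies the whole blob built so far,
-- the bytes written total 60 * (1 + 2 + ... + 4000)
-- = 60 * 4000 * 4001 / 2 = 480,120,000 (~458 MB) before page overhead,
-- which is the right order of magnitude for the observed ~1 GB growth.
SELECT 60 * CAST(4000 AS BIGINT) * 4001 / 2 AS BYTES_WRITTEN
FROM RDB$DATABASE;
```

Under this model the allocation is not proportional to the final blob size at all, which would explain the gap between the expected few hundred KB and the observed growth.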
Commented by: Hilmar Brodner (syn) I agree with Mark that this surely cannot be a feature request. What feature would be requested here? "Add characters to a blob without using 1000 times the necessary space"? Sorry, Sean, marking this not as a bug but as a feature request would need some more explanation. As I understand Adriano, this is also not "as designed", and we should see about "won't fix". I understand from Adriano that evidently all of the blob operations are done not in memory but on disc. One cannot do that in memory because of the huge size a blob could have. So the optimum solution, IMHO, would be to do it on disc like the temp files that are created when the sort buffer runs out of space. As I take it, changing that is a rather large chunk of work, so we are going to have to live a while longer with the current on-disc operations.
Commented by: @asfernandes I'll recheck it, as I implemented a solution for CORE1658, which is a related problem.
Modified by: @asfernandes
Commented by: @hvlad > Even adding 500 characters to a blob 5000 times increases the database size by 2 GB permanently The real solution, as Adriano said, includes a kind of garbage collector for blobs; it probably also includes reference counting for temporary blobs and the ability to append to the same instance of a temporary blob without creating a new one. PS Hilmar, your guesses are correct
Commented by: Hilmar Brodner (syn) We played around with this a bit more. In order to get around this problem we thought we might actually write the concatenation to disc. So instead of doing this in "memory" we are doing updates on a table (terribly slow, but at least without taking 120 GB of disc space). We however need to return more than one record, so we have the following constellation: CREATE TABLE BBLOB ( SET TERM ^ ; INSERT INTO BBLOB( ID, BLOBBY) VALUES ( 1,''); COUNTER = 3;
END GRANT SELECT,INSERT,DELETE,UPDATE ON BBLOB TO PROCEDURE P_TEST2; Now if you do a "SELECT * FROM P_TEST2" you get a "BLOB not found" error message. Why? Well, seemingly the blob is not in memory if you do a SELECT (we recall: it sort of works if we do this in "memory"). So we went a bit further and added a single line that shouldn't do anything: BLOB_TEXT = BLOB_TEXT || '';
ending up with SET TERM ^ ; INSERT INTO BBLOB( ID, BLOBBY) VALUES ( 1,''); COUNTER = 3;
SUSPEND; END GRANT SELECT,INSERT,DELETE,UPDATE ON BBLOB TO PROCEDURE P_TEST2; If we do our "SELECT * FROM P_TEST2" again, everything works fine. I'm unsure if I should open a new thread with this, as it might be a related issue. This should qualify as a bug, however, shouldn't it?
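The procedure listing above is truncated in this export. A hypothetical reconstruction of the shape being described, based on the surviving fragments and the INSERT/UPDATE/SUSPEND sequence discussed in the follow-up comments (loop body and texts are guesses, not the original code), might look like:

```sql
-- Hypothetical reconstruction, NOT the original listing: a row is
-- inserted, its blob column is rebuilt by repeated UPDATEs, and the
-- final blob id is returned to the client via SUSPEND. As reported,
-- the client then gets "BLOB not found" when opening the blob.
SET TERM ^ ;

CREATE OR ALTER PROCEDURE P_TEST2
RETURNS (BLOBBY BLOB SUB_TYPE 0 SEGMENT SIZE 80)
AS
DECLARE COUNTER INTEGER;
BEGIN
  INSERT INTO BBLOB (ID, BLOBBY) VALUES (1, '');
  COUNTER = 3;
  WHILE (COUNTER > 0) DO
  BEGIN
    UPDATE BBLOB SET BLOBBY = BLOBBY || 'some text' WHERE ID = 1;
    COUNTER = COUNTER - 1;
  END
  SELECT BLOBBY FROM BBLOB WHERE ID = 1 INTO :BLOBBY;
  SUSPEND;
END^

SET TERM ; ^
```

In this shape the blob id handed back through SUSPEND refers to a record version created mid-procedure, which connects to the undo-log explanation given later in the thread.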
Commented by: @hvlad > Now if you do a "SELECT * FROM P_TEST2" you get a "BLOB not found" error message. Why? This is definitely a support question. PS Blobs are not varchars; they were not designed for manipulation on the server side.
Commented by: Hilmar Brodner (syn) Sorry, but did you read my post? The problem is exactly the other way around: if you do the server-side manipulation it works, if you do not do it you get an error. The next iteration removes it only if you do NOT do this: BLOB_TEXT = BLOB_TEXT || ''; If you do this server-side manipulation it does work... The "Why?" was a rhetorical question, answered in the same line. The problem is the opposite of what your answer assumes, which points to this being a bug and not a support issue. "...not designed for manipulation at server side": well, http://www.firebirdsql.org/file/documentation/reference_manuals/reference_material/Firebird-2.5-LangRef-Update.pdf says that this has changed and all the operations we do above should work. Page 20: "
Commented by: @hvlad Hilmar, this is a *support* question, and yes, I've read your post. OK, one last try to explain how things work internally:
- the engine *can't* change blobs, it can only *create a new* blob; this is as designed, from many years ago
Back to your procedure:
- at the 1st UPDATE the engine creates a new permanent blob P2 and places P1 into the undo-log
- at the 2nd UPDATE the engine creates a new permanent blob P3 and immediately removes P2, as it is impossible to undo to the point where P2 exists
When the client application opens P2, it receives the "blob not found" error, obviously. When you add BLOB_TEXT = BLOB_TEXT || ''; before the SUSPEND, you completely change the game, look:
- at the INSERT the engine creates a permanent blob with a blob_id, say P1
- at the 1st UPDATE the engine creates a new permanent blob P2 and places P1 into the undo-log
- at the 2nd UPDATE the engine creates a new permanent blob P3 and immediately removes P2, as it is impossible to undo to the point where P2 exists
I hope now it is more clear.
Commented by: Hilmar Brodner (syn) Vlad, I understand how Firebird works in this respect perfectly well. My problem is with the way it works:
1. The Firebird documentation says blobs now support a lot of the operations that varchars can do. When you use them, you get strange results and space leaks (having a database go from 2 GB to 140 GB while working with blobs and then crashing after running out of space is NOT okay, and it is NOT a problem that can be solved by support, thus not for the support forum).
2. The inability to do a sort of garbage collection on blobs, so the database stays huge until a restore is done: that will not be fixed by support either. I also fail to see how you can argue that this is "as designed" and works perfectly well.
3. I understand why things work like they do when I use a concatenation. Nonetheless it is wrong. It is like having to use "iInteger = iInteger + 0;" in order to be able to suspend it. (Yes, I know it is not the same and you cannot compare it 100%, but I think you see my point.)
So if you can tell me how to join a few thousand lines into a blob without crashing a live production database and without increasing the database by a factor of 100 or more, I will gladly accept this as a support question. Otherwise I would appreciate this case not being closed as "as designed" or a "support issue". And maybe someone should think it over before claiming in the release notes that concatenation of blobs works like a charm.
Commented by: @hvlad Hilmar, > I understand how Firebird works in this respect perfectly well. ... > So if you can tell me how to join a few thousand lines to a Blob PS This is not a productive discussion, and I see no point in making arguments that are not taken up by the other side. PPS There is room for improvements in blob handling on the engine side, but it will never support crazy usage.
Commented by: André Luís de Oliveira Pereira (andreluisop) (originally in Portuguese) We have more than 200 clients using Firebird. At the moment, all of those using our Accounting, Revenue, HR and Sanitation systems are on versions 2.1 and 2.5, and they are running into disk space problems when executing procedures that manipulate BLOB fields, especially when concatenating them. There were several cases at the start of this year when calculating IPTU (property tax) in the Revenue system, in which a 60 GB database jumped to 135 GB, stopping Firebird and causing an enormous problem with one of our top 10 clients. I would like to know when this problem will be solved. We are trying to work around it by creating VARCHAR fields with a size of 32000, but they turned out very slow. Is there another solution? Why not use temporary space to perform these manipulations with procedures and BLOB fields, to be discarded afterwards? It may create large cache files, but they would be removed later.
Commented by: Sean Leyne (seanleyne) Edited the description for readability
Modified by: Sean Leyne (seanleyne) description: reformatted the reproduction steps for readability (current text in the description below)
Commented by: @hvlad Andre, use English, please
Commented by: Agris (agxagx) Today I encountered this issue while writing a procedure that had to generate and return a CSV file with approx. 5000 rows. As you are guessing, I made a FOR SELECT loop and was concatenating the selected field values into a blob variable. Example code: On small data amounts (~100 rows) it was working fast, in less than 100 ms. But running in the production database, on 5000 rows, I got an execution time of almost 4 minutes. The query itself was executed in less than 100 ms. I started googling and found this thread. Note that the LIST documentation specifies that "The ordering of the list values is undefined". In my case I didn't encounter any crazy data ordering.
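For the CSV case described above, a set-based alternative is the LIST() aggregate mentioned in the comment (available since Firebird 2.1), which builds the result in one pass instead of creating a new temporary blob per appended row. Table and column names here are hypothetical:

```sql
-- Concatenate one line per row into a single blob, separated by LF.
-- CSV_LINE and SOME_TABLE are placeholder names.
SELECT LIST(CSV_LINE, ASCII_CHAR(10)) AS CSV_BLOB
FROM SOME_TABLE;
```

As the comment notes, the ordering of LIST values is formally undefined, so this fits only when row order does not matter or can be restored by the consumer.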
Commented by: @livius2 I see a simple solution for this. CREATE OR ALTER PROCEDURE P_TEST ( WHILE (COUNTER < ITERATIONS) BLOBTEXT = VAR_TMP; SUSPEND; I suppose this can be simply implemented by the FB team. Or maybe even simpler: only SUSPEND should produce a new blob id and reallocate it; all other operations inside a stored proc should work without creating new temporary blobs over and over.
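The buffering idea hinted at above (the listing is truncated) can be written out as a client-side workaround: append into a VARCHAR and flush it to the blob only when the buffer is nearly full, so a new blob copy is materialized once per ~32 KB chunk rather than once per iteration. This is an illustrative sketch with made-up names, not engine behavior:

```sql
-- Workaround sketch: buffer appends in a VARCHAR, flush to the blob
-- only when the buffer would overflow. Each flush still creates a new
-- blob, but far fewer of them than one per concatenation.
SET TERM ^ ;

CREATE OR ALTER PROCEDURE P_TEST_CHUNKED (
  TEST VARCHAR(200),
  ITERATIONS INTEGER)
RETURNS (
  BLOBTEXT BLOB SUB_TYPE 0 SEGMENT SIZE 80)
AS
DECLARE COUNTER INTEGER;
DECLARE BUF VARCHAR(32000);
BEGIN
  COUNTER = 0;
  BUF = '';
  BLOBTEXT = '';
  WHILE (COUNTER < ITERATIONS) DO
  BEGIN
    IF (CHAR_LENGTH(BUF) + CHAR_LENGTH(TEST) + 1 > 32000) THEN
    BEGIN
      BLOBTEXT = BLOBTEXT || BUF;  -- one new blob per flushed chunk
      BUF = '';
    END
    BUF = BUF || ' ' || TEST;
    COUNTER = COUNTER + 1;
  END
  BLOBTEXT = BLOBTEXT || BUF;      -- flush the remainder
  SUSPEND;
END^

SET TERM ; ^
```

For the 4000 * 60-byte case in the description, this reduces the number of intermediate blob copies from 4000 to roughly eight.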
Commented by: @hvlad Karol, no, it is not as simple as we all wish, unfortunately. Else it would have been implemented years ago. So far, I can only repeat myself: > The real solution, as Adriano said, includes a kind of garbage collector for blobs; it probably also includes reference counting for temporary blobs BTW, the ticket type should be "Improvement", not "New Feature", IMO
Commented by: @asfernandes I also have a half-developed RDB$BLOB_UTIL system package which solves this problem in PSQL. However, it probably will not be in v4.0.
Submitted by: Jörg Fröber (j_froeber)
Relate to CORE1658
Relate to CORE3838
Relate to CORE1273
Votes: 7
When a blob is concatenated multiple times, the disk space for the database file grows massively. The only way to get the file size back to normal is a backup and restore of the database.
The issue can be reproduced as described below:
- Create an empty database
- Create the following procedure:
SET TERM ^ ;
CREATE OR ALTER PROCEDURE P_TEST (
TEST VARCHAR(200),
ITERATIONS INTEGER)
RETURNS (
BLOBTEXT BLOB SUB_TYPE 0 SEGMENT SIZE 80)
AS
DECLARE COUNTER INTEGER;
BEGIN
COUNTER = 0;
BLOBTEXT = '';
WHILE (COUNTER < ITERATIONS)
DO BEGIN
BLOBTEXT = BLOBTEXT || ' ' || TEST;
COUNTER = COUNTER + 1;
END
SUSPEND;
END^
SET TERM ; ^
GRANT EXECUTE ON PROCEDURE P_TEST TO "PUBLIC";
GRANT EXECUTE ON PROCEDURE P_TEST TO SYSDBA;
- run the procedure:
SELECT BLOBTEXT FROM P_TEST('This is a test sentence, which is going to be concatenated.', 4000);
After the procedure has been executed, the disk space used by the database file increases to over 1 GB. It makes no difference, if the transaction is committed or not.
When you run the procedure another time using the SQL command EXECUTE PROCEDURE P_TEST('This is a test sentence, which is going to be concatenated.', 4000); the file size increases again.