Indexing big table causes sort error: not enough memory [CORE2525] #2935
Submitted by: @ibaseru
Votes: 2

Creating an index on a huge table (3.7 billion records). After a few hours this error happens:

Statement failed, SQLCODE = -904
sort error: not enough memory

At that time there is a lot of free space in TEMP, and also 1.5 GB of RAM available. An index on a table with fewer records was created successfully, with a temp file 43 GB in size.

Comments
Modified by: @hvlad | Assignee: Vlad Khorsun [hvlad]
Commented by: Sean Leyne (seanleyne)

How big do you estimate that the temp file would be for the index? (i.e. how many rows, and what is the total size of the index fields?)
What is a "lot of free space"?
What are the sort memory settings in firebird.conf?
What version of FB are you running? Classic/SuperServer?
Commented by: @ibaseru

Vlad made a special build of FB 2.1.2 so that I could create this index. See the comments below.

> How big do you estimate that the temp file would be for the index?
> (i.e. how many rows and what is the total size of the index fields)
> What is a "lot of free space"?
> What are the sort memory settings in firebird.conf?
> What version of FB are you running? Classic/SuperServer?

I think Vlad can comment on this, because his source change allowed this index to be created, but it also slowed index creation by about 30%. So this temporary "fix" can't be considered a fix for the problem. Of course, that's if indexing such huge tables is a problem at all :-)
Commented by: @asfernandes

Hmm... 32-bit version?
Commented by: @ibaseru

Yes, 32-bit Firebird. Windows XP SP3, 32-bit too.
Commented by: @hvlad

Sorting has 3 main stages:

a) accumulate data from the data source (SORT_put)
b) build the merge tree over the accumulated runs (SORT_sort)
c) fetch the sorted data, merging the runs on the fly (SORT_get)

At stage (a), data is sorted in blocks (sort runs) and these runs are stored in temporary space. Initially, sort runs are 1 MB each. When 8 such runs have been created, they are merged into one bigger run of 8 MB. Again, when 8 runs of 8 MB have been created, they are merged into a bigger run of 64 MB. Therefore, after stage (a) we have a set of sorted blocks (runs) of size 1, 8, and 64 MB. A few key parameters of this algorithm are hardcoded constants:

    const USHORT RUN_GROUP = 8; // number of runs to merge

At stage (b), the merge tree is created for all runs produced at stage (a). For every tree item we need a buffer to read part of the corresponding run from disk. By default, these buffers are first allocated from the existing free sort memory used in our temporary space (and limited by the TempCacheLimit setting). The size of each buffer is 1 MB. For the runs which have no buffers, we allocate additional memory from the OS. If an allocation fails, we divide the buffer size by 2 and try again, as long as buffer size >= record_size * 2; below that point we throw an "out of memory" error.

At stage (c), the sorted data is read from the runs, merged on the fly, and returned to the caller.

Now look at the current issue. The temporary space after stage (a) was about 180 GB. That is nearly 180 GB / 64 MB = 2880 runs of level 2 (the maximum allowed run level). As expected, the additional memory allocation failed for such a big number of runs. Even if it had not failed, I doubt the fetching phase (c) would run quickly, as we would have very random IO: imagine 2880 IO requests at the start of phase (c) and throughout it.

So, all I did for KDV's private build was raise the MAX_MERGE_LEVEL constant to 4, allowing runs of size 512 MB and 4 GB. I also added some logging at phase (b) to report the number of runs (it was 56) and how much additional memory was allocated (0). As you know, this allowed the sort to complete successfully. But the additional merge levels introduce extra work for both IO and CPU; this is the reason why the whole process is slower than before.

Therefore we can't just raise the MAX_MERGE_LEVEL constant. We need a more adaptive, dynamic algorithm to determine the optimal number of runs and their sizes. This is a task for the next Firebird version.
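To make the mechanics above concrete, here is a minimal C++ sketch of the two pieces described: the run-size hierarchy of stage (a) and the buffer-allocation fallback of stage (b). It is an illustration only, not the actual jrd/sort.cpp code; `runSizeAtLevel()` and `allocateRunBuffer()` are hypothetical helper names.

```cpp
// Illustrative sketch only, simplified from the scheme described above;
// this is not the actual jrd/sort.cpp code, and both helpers below are
// hypothetical names.
#include <cstddef>
#include <cstdlib>
#include <stdexcept>

const unsigned RUN_GROUP = 8;              // runs merged into one bigger run
const std::size_t RUN_SIZE = 1024 * 1024;  // level-0 runs are 1 MB each

// Run sizes grow by a factor of RUN_GROUP per level:
// level 0 = 1 MB, level 1 = 8 MB, level 2 = 64 MB (MAX_MERGE_LEVEL = 2).
std::size_t runSizeAtLevel(unsigned level)
{
    std::size_t size = RUN_SIZE;
    for (unsigned i = 0; i < level; ++i)
        size *= RUN_GROUP;
    return size;
}

// Stage (b): every run in the merge tree needs a read buffer, starting at
// 1 MB. If the OS allocation fails, halve the size and retry, but never go
// below two records; past that point the merge cannot proceed.
void* allocateRunBuffer(std::size_t& bufferSize, std::size_t recordSize)
{
    while (bufferSize >= recordSize * 2)
    {
        if (void* buffer = std::malloc(bufferSize))
            return buffer;
        bufferSize /= 2;  // retry with a smaller buffer
    }
    throw std::runtime_error("sort error: not enough memory");
}
```

Plugging in this ticket's numbers suggests why the allocation failed: 2880 level-2 runs at 1 MB per buffer would ask for roughly 2.8 GB of address space, which a 32-bit process cannot provide, and the halving fallback bottoms out once buffers would drop below two records.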
Commented by: @asfernandes

Vlad, your explanation of how the sort works would be appreciated as a comment in sort.cpp (after the license header). Right now it is described only in per-function comments there.
Commented by: @AlexPeshkoff

Vlad, what do you think about merging not 8, but 64 runs into the next-level run?
Commented by: @hvlad

When I worked on SORT for v2.1, I tried changing RUN_GROUP to different values, but the results were not good/stable. I think that at the SORT_sort stage (when we know the amount of data to sort) we should compute RUN_GROUP and MAX_MERGE_LEVEL dynamically.
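For illustration, here is one possible shape of that dynamic calculation, as a hypothetical sketch: at SORT_sort time the total data size is known, so the number of merge levels can be derived from an IO/memory budget for the final merge. `levelsNeeded()` and its parameters are invented for this example and are not Firebird code.

```cpp
// Purely hypothetical sketch of a dynamic calculation; not Firebird code.
#include <cstddef>

unsigned levelsNeeded(std::size_t totalBytes,
                      std::size_t runBytes,       // level-0 run size, e.g. 1 MB
                      unsigned runGroup,          // merge fan-in, e.g. 8
                      std::size_t maxFinalRuns)   // IO/memory budget for the final merge
{
    // Number of level-0 runs produced at stage (a).
    std::size_t runs = (totalBytes + runBytes - 1) / runBytes;

    unsigned levels = 0;
    // Each merge level reduces the run count by a factor of runGroup.
    while (runs > maxFinalRuns)
    {
        runs = (runs + runGroup - 1) / runGroup;
        ++levels;
    }
    return levels;  // plays the role of MAX_MERGE_LEVEL for this sort
}
```

With this ticket's numbers (about 180 GB of data, 1 MB level-0 runs, fan-in 8) and an assumed final-merge budget of 64 runs, the sketch yields 4 levels, which matches the MAX_MERGE_LEVEL = 4 that made the private build succeed.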
Commented by: @AlexPeshkoff

I have nothing against a dynamic calculation, but it looks like we need to know these values before we know the amount of data to sort. We need to merge the first group of level-0 runs into a level-1 run right after fetching the first N records from the stream. How can we know the stream size at that point? (Imagine the stream is generated by an SP.)
Commented by: @hvlad

My point is that, except for the final merge, the current hard-coded values of RUN_GROUP and MAX_MERGE_LEVEL are OK in most cases. The problem is when, at the final stage, we have a big number of runs to merge. It can require additional memory to buffer disk reads for every run (as in this ticket's case) and also makes the IO very random and thus ineffective. I'd say that the maximum effective number of runs to merge depends on the random IO capacity of the device holding the temp files and on the amount of free memory.

Before SORT_sort I see no good reason to change the RUN_GROUP or MAX_MERGE_LEVEL values. Also, I think we could try defining different RUN_GROUP values for different merge levels. It seems good if lower-level runs are merged later (i.e. make RUN_GROUP bigger for low-level runs), as in that case most of them reside in memory and we have almost no IO penalty. It would also eliminate unneeded merges for small and medium sorts.
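A hypothetical sketch of that per-level idea follows; the fan-in numbers are invented purely for illustration (the comment above proposes the concept, not specific values).

```cpp
// Hypothetical illustration of per-level merge fan-in: bigger groups at low
// levels (small runs, mostly still cached, cheap to merge many at once) and
// smaller groups at high levels (big on-disk runs, where random IO dominates).
// The numbers are invented for illustration only.
const unsigned RUN_GROUP_BY_LEVEL[] = {
    16,  // level 0 -> 1: many small, memory-resident runs per merge
    8,   // level 1 -> 2
    4,   // level 2 -> 3: few large runs per merge, to limit random reads
};

unsigned runGroupFor(unsigned level)
{
    const unsigned last = sizeof(RUN_GROUP_BY_LEVEL) / sizeof(RUN_GROUP_BY_LEVEL[0]) - 1;
    return RUN_GROUP_BY_LEVEL[level < last ? level : last];
}
```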
Modified by: @dyemanov | Fix Version: 3.0 Alpha 1 [10331]
Modified by: @dyemanov | Fix Version: 3.0 RC2 [10048] =>