Indexing big table causes sort error: not enough memory [CORE2525] #2935

firebird-automations · 2009-06-25T11:01:18Z

Submitted by: @ibaseru

Votes: 2

Creating index on a huge table (3.7 billion records). After few hours error happens:

Statement failed, SQLCODE = -904
sort error: not enough memory

At that time there are lot of free space in TEMP, and also 1.5gb RAM available.

Index on a table with less records was created successfuly, with temp file 43gb size.

firebird-automations · 2009-06-25T11:39:04Z

Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]

firebird-automations · 2009-06-25T20:35:05Z

Commented by: Sean Leyne (seanleyne)

How big do you estimate that the temp file would be for the index? (i.e. how many rows and what is the total size of the index fields)

What is a "lot of free space"?

What are the sort memory settings in firebird.conf?

What version of FB are you running? Classic/SuperServer?

firebird-automations · 2009-06-25T21:28:49Z

Commented by: @ibaseru

Vlad made special build of FB 2.1.2, so I could create this index. See comments below.

>How big do you estimate that the temp file would be for the index?
it was 183 gigabytes. The error on released FB 2.1.2 happens I think near 100-120 gigabytes of temp file size (at 5 hours from the start). Unfortunately, I wasn't able to watch on TEMP directory to cach the problematic file size, and also, when error happens, temp file is being deleted immediately.
Successful index creation took 12 hours.

>(i.e. how many rows and what is the total size of the index fields)
3.7 billion. 4 fields - integer, smallint, smallint, smallint. It's a primary key.

>What is a "lot of free space"?
375 gigabytes at first try, and ~500 gigabytes at second try. After that I understood that the cause of the error is not TEMP space or RAM.

>What are the sort memory settings in firebird.conf?
Default. Only database cache pages was changed to 16384 (db has 16k page size), and TEMP. Nothing else.
Vlad said that this problem can't be fixed by changing firebird.conf parameters.

>What version of FB are you running? Classic/SuperServer?
Superserver.

I think Vlad can make comment on this, because his source change allowed this index to be created, but it slowered index creation itself for about ~30%. So, this temporary "fix" can't be considered as a fix for that problem. Of course, if indexing such a huge tables is the problem at all :-)

firebird-automations · 2009-06-25T21:50:14Z

Commented by: @asfernandes

Hum... 32 bits version?

firebird-automations · 2009-06-25T23:12:37Z

Commented by: @ibaseru

yes, 32 bit Firebird. Windows XP SP3, 32bit too.

firebird-automations · 2009-06-26T10:43:35Z

Commented by: @hvlad

Sorting have 3 main stages :

a) accumulate data from the data source (SORT_put)
b) prepare results for fetch (SORT_sort)
c) fetch sorted data (SORT_get)

At the stage (a) data is sorted in some blocks (sort run's) and this run's is stored in temporary space. Initially sort run's have size of 1MB each. When 8 such run's is created they merged into one bigger run of size 8MB. Again, when 8 run's of 8MB is created they merged into bigger run of 64MB. Therefore after the stage (a) we have a set of sorted blocks (run's) with size 1, 8, and 64MB. Few key parameters of this algorithm is hardcoded constants :

const USHORT RUN_GROUP = 8; // number of run's to merge
const USHORT MAX_MERGE_LEVEL = 2; // max size of run which could be merged

At the stage (b) the merge tree is created for all run's created at stage (a). For every tree item we need a buffer to read part of the corresponding run from disk. By default this buffers first allocated from the existing free sort memory used in our temporary space (and limited by TempCacheLimit setting). Size of each buffer is 1MB. For the run's which have no buffers we allocate additional memory from the OS. If allocation fails, we divide buffer size by 2 and try again. Until buffer size >= record_size * 2. At this point we throw out of memory error

At the stage (c) sorted data is read from run's, merged on the fly and returned to the caller

Look at current issue. Temporary space after stage (a) was about 180GB. This is near 180GB/64MB = 2880 runs of level 2 (maximum allowed run level). I see as expected that additional memory allocation failed for such big amount of run's. Even if it was not fail i doubt fetching phase (c) will run quickly, as we'll have very random IO - imagine 2880 IO requests at the start of the phase (c) and during it.

So, all i made for KDV's private build was to raise MAX_MERGE_LEVEL constant up to 4, allowing runs of size 512MB and 4GB. Also i added some logging at phase (b) to know number of run's (it was 56) and how many additional memory was allocated (0). As you know it allows sorting to proceed with success.

But allowing additional merge level introduced additional work fro both IO and CPU. This is the reason why whole process is slower than before. Therefore we can't just raise MAX_MERGE_LEVEL constant. We need more adaptive dynamic algorithm to determine optimal number of run's and its sizes. This is a task for the next Firebird version.

firebird-automations · 2009-06-26T13:22:28Z

Commented by: @asfernandes

Vlad, your explanation of how sort works would be appreciated as a comment in sort.cpp (after the license header). It's just described as per function comments there.

firebird-automations · 2009-07-06T15:05:33Z

Commented by: @AlexPeshkoff

Vlad, what do you think about merging not 8, but 64 runs into next level run?

firebird-automations · 2009-07-09T07:20:00Z

Commented by: @hvlad

When i worked on SORT for v2.1 i tried to change RUN_GROUP to different values but results was not good\stable.
Also think about IO - the more run's we have to merge, the more randomness we have.

I think at SORT_sort stage (when we know amount of data to sort) we should compute RUN_GROUP and MAX_MERGE_LEVEL dynamically.

firebird-automations · 2009-07-09T08:47:06Z

Commented by: @AlexPeshkoff

I have nothing against dynamic calculation, but looks like we need to know this values before we know the amount of data to sort. We need to merge first group of level0 runs into level1 run right after fetching first N records from the stream. How can we know that stream size? (Imagine stream is generated by SP).

firebird-automations · 2009-07-09T13:18:05Z

Commented by: @hvlad

My point is that except the final merge current hard-coded values of RUN_GROUP and MAX_MERGE_LEVEL is OK in most cases.

The problem is when at final stage we have a big amount of run's to merge. It can require additional memory to buffer disk reads for every run (as in this ticket case) and also make IO very random and thus not effective. I'd said that maximum effective value of number of run's to merge depends on random IO capacity of device with temp files and amout of free memory.

Before the SORT_sort i see no good reason to change RUN_GROUP or MAX_MERGE_LEVEL values.

Also, i think, we can try to define different RUN_GROUP values for different MERGE_LEVEL's. It seems good if lower-level run's will be merged later (i.e. make RUN_GROUP bigger for low-level run's) as in this case most of them reside in memory and we have almost no IO penalty. Also it allow to eliminate not needed merge's for small and medium sorts.

firebird-automations · 2009-10-24T18:11:57Z

Modified by: @dyemanov

Fix Version: 3.0 Alpha 1 [ 10331 ]

firebird-automations · 2012-10-23T10:21:44Z

Modified by: @dyemanov

Fix Version: 3.0 Beta 1 [ 10332 ]

Fix Version: 3.0 Alpha 1 [ 10331 ] =>

firebird-automations · 2014-04-24T15:15:34Z

Modified by: @dyemanov

Fix Version: 3.0 Beta 2 [ 10586 ]

Fix Version: 3.0 Beta 1 [ 10332 ] =>

firebird-automations · 2015-03-18T16:29:08Z

Modified by: @dyemanov

Fix Version: 3.0 RC 1 [ 10584 ]

Fix Version: 3.0 Beta 2 [ 10586 ] =>

firebird-automations · 2015-10-27T12:51:26Z

Modified by: @dyemanov

Fix Version: 3.0.0 [ 10048 ]

Fix Version: 3.0 RC 1 [ 10584 ] =>

firebird-automations · 2016-01-25T09:37:25Z

Modified by: @dyemanov

Fix Version: 3.0 RC2 [ 10048 ] =>

firebird-automations added affect-version: 2.1.2 priority: major component: engine type: bug labels Apr 25, 2021

firebird-automations assigned hvlad Apr 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Indexing big table causes sort error: not enough memory [CORE2525] #2935

Indexing big table causes sort error: not enough memory [CORE2525] #2935

firebird-automations commented Jun 25, 2009

firebird-automations commented Jun 25, 2009

firebird-automations commented Jun 25, 2009

firebird-automations commented Jun 25, 2009

firebird-automations commented Jun 25, 2009

firebird-automations commented Jun 25, 2009

firebird-automations commented Jun 26, 2009

firebird-automations commented Jun 26, 2009

firebird-automations commented Jul 6, 2009

firebird-automations commented Jul 9, 2009

firebird-automations commented Jul 9, 2009

firebird-automations commented Jul 9, 2009

firebird-automations commented Oct 24, 2009

firebird-automations commented Oct 23, 2012

firebird-automations commented Apr 24, 2014

firebird-automations commented Mar 18, 2015

firebird-automations commented Oct 27, 2015

firebird-automations commented Jan 25, 2016

Indexing big table causes sort error: not enough memory [CORE2525] #2935

Indexing big table causes sort error: not enough memory [CORE2525] #2935

Comments

firebird-automations commented Jun 25, 2009

firebird-automations commented Jun 25, 2009

firebird-automations commented Jun 25, 2009

firebird-automations commented Jun 25, 2009

firebird-automations commented Jun 25, 2009

firebird-automations commented Jun 25, 2009

firebird-automations commented Jun 26, 2009

firebird-automations commented Jun 26, 2009

firebird-automations commented Jul 6, 2009

firebird-automations commented Jul 9, 2009

firebird-automations commented Jul 9, 2009

firebird-automations commented Jul 9, 2009

firebird-automations commented Oct 24, 2009

firebird-automations commented Oct 23, 2012

firebird-automations commented Apr 24, 2014

firebird-automations commented Mar 18, 2015

firebird-automations commented Oct 27, 2015

firebird-automations commented Jan 25, 2016