Discussed in fb-devel (see "RFC: Data page allocation algorithm", 26 Dec 2013).
I propose allocating data pages not one by one (as is done currently) but in groups of sequentially ordered pages. Such a group of pages is often called an "extent".
I propose changing the page allocation algorithm for tables as follows:
- if the table is empty or small (has no full extent allocated), data pages are allocated one by one (as is done currently)
- if the table already has at least one full extent allocated, the next request for a new page allocates a whole extent of pages
- the size of an extent is 8 pages
- every extent is aligned on an 8-page boundary
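The rules above can be illustrated with a small sketch. This is a toy model only, not the engine's code: the names PageSpace, Table, alloc_single, alloc_extent and new_data_page are hypothetical, and the code assumes that a table counts as having "a full extent" once it holds at least 8 data pages, which is one possible reading of the rule.

```python
EXTENT_SIZE = 8  # pages per extent, as stated in the proposal


class PageSpace:
    """Toy model of a database file: tracks which page numbers are in use."""

    def __init__(self):
        self.used = set()

    def alloc_single(self):
        # Take the first free page number, wherever it happens to be.
        page = 0
        while page in self.used:
            page += 1
        self.used.add(page)
        return page

    def alloc_extent(self):
        # Take the first completely free run of EXTENT_SIZE pages that
        # starts on an EXTENT_SIZE-aligned boundary.
        start = 0
        while any(p in self.used for p in range(start, start + EXTENT_SIZE)):
            start += EXTENT_SIZE
        pages = list(range(start, start + EXTENT_SIZE))
        self.used.update(pages)
        return pages


class Table:
    """Hypothetical table that requests data pages from a PageSpace."""

    def __init__(self, space):
        self.space = space
        self.pages = []     # pages already holding data
        self.reserved = []  # pages of the current extent not yet handed out

    def new_data_page(self):
        if self.reserved:
            # An extent was allocated earlier; keep consuming it in order.
            page = self.reserved.pop(0)
        elif len(self.pages) < EXTENT_SIZE:
            # ASSUMPTION: "small" means fewer than EXTENT_SIZE pages.
            page = self.space.alloc_single()
        else:
            extent = self.space.alloc_extent()
            page, self.reserved = extent[0], extent[1:]
        self.pages.append(page)
        return page
```

In this model, two tables that grow at the same time get interleaved page numbers while both are small. Once each switches to extent allocation, its new pages stay adjacent and start on a multiple of 8.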
This new algorithm will:
- reduce page-level fragmentation (all pages in an extent are adjacent),
- allow OS-level prefetch to work more efficiently (it will read pages belonging to the same table rather than a bunch of pages of unrelated objects), and
- make it possible in the future to read and write in large chunks, making I/O more efficient.