Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Higher memory consumption for prepared statements [CORE5611] #5877

Closed
firebird-automations opened this issue Sep 12, 2017 · 18 comments
Closed

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: @reevespaul

Customer has reported about noticably higher memory consumption with FB 3.0 than they see with FB 2.5. This is somewhat expected, given the metadata cache is per-attachment in FB3 and their production database is complex. But then Paul Reeves and I were able to reproduce this with TPCC which database has just a few tables and procedures, so the metadata cache should not be a big issue.

This is what I've got looking at mon$memory_usage (values are avg/max):

v2.5 SS
-----------
database: 22MB
attachment: 150KB
transaction: 1.7KB / 18KB
statement: 25KB / 90KB

v3.0 SS
-----------
database: 160MB
attachment: 2.4MB
transaction: 1.1KB / 13KB
statement: 88KB / 154KB

Database counter is aggregated among all attachments, it doesnt show where the problem is, but it does show that the problem exists. Attachment counter clearly shows noticably higher memory usage, but we still cannot say for sure whether it's effect of the metadata caching or not. Transaction counter show only a little increase, it shouldn't be a problem. Finally, statement counter does indicate a problem. We see a 2x-3x more memory used for DSQL statements in v3.0 as compared to v2.5.

Quick debugging shows that even trivial statements like "execute procedure X (A, B, C)" consume 3x more memory: 10KB vs 32KB (with the same impure size). Here are examples from the memory pool dumps for the same statements:

v2.5
0x7ffff2b5e530 USED: size=80 allocated at ../src/jrd/par.cpp:686
0x7ffff2b5e5a0 USED: size=96 allocated at ../src/jrd/par.cpp:686
0x7ffff2b5e620 USED: size=80 allocated at ../src/jrd/par.cpp:686
0x7ffff2b5e690 USED: size=72 allocated at ../src/jrd/par.cpp:686
0x7ffff2b5e6f8 USED: size=80 allocated at ../src/jrd/par.cpp:686
0x7ffff2b5e768 USED: size=80 allocated at ../src/jrd/par.cpp:686
0x7ffff2b5e7d8 USED: size=80 allocated at ../src/jrd/par.cpp:686
0x7ffff2b5e848 USED: size=80 allocated at ../src/jrd/par.cpp:686
0x7ffff2b5e8b8 USED: size=72 allocated at ../src/jrd/par.cpp:686
0x7ffff2b5e920 USED: size=80 allocated at ../src/jrd/par.cpp:686

v3.0
USED 0x7fffe3405d20: size=56 allocated at /work/v3-release/src/jrd/../jrd/../jrd/../jrd/../dsql/Nodes.h:681
USED 0x7fffe3405d58: size=56 allocated at /work/v3-release/src/jrd/../jrd/../jrd/../jrd/../dsql/Nodes.h:681
USED 0x7fffe3405d90: size=56 allocated at /work/v3-release/src/jrd/../jrd/../common/classes/array.h:464
USED 0x7fffe3405dc8: size=232 allocated at /work/v3-release/src/jrd/par.cpp:1234
USED 0x7fffe3405eb0: size=56 allocated at /work/v3-release/src/jrd/../jrd/../jrd/../jrd/../dsql/Nodes.h:681
USED 0x7fffe3405ee8: size=56 allocated at /work/v3-release/src/jrd/../jrd/../jrd/../jrd/../dsql/Nodes.h:681
USED 0x7fffe3405f20: size=56 allocated at /work/v3-release/src/jrd/../jrd/../common/classes/array.h:464
USED 0x7fffe3405f58: size=232 allocated at /work/v3-release/src/dsql/ExprNodes.cpp:7783
USED 0x7fffe3406040: size=56 allocated at /work/v3-release/src/jrd/../jrd/../jrd/../jrd/../dsql/Nodes.h:681
USED 0x7fffe3406078: size=56 allocated at /work/v3-release/src/jrd/../jrd/../jrd/../jrd/../dsql/Nodes.h:681
USED 0x7fffe34060b0: size=56 allocated at /work/v3-release/src/jrd/../jrd/../common/classes/array.h:464
USED 0x7fffe34060e8: size=232 allocated at /work/v3-release/src/dsql/ExprNodes.cpp:7795
USED 0x7fffe34061d0: size=56 allocated at /work/v3-release/src/jrd/../jrd/../jrd/../jrd/../dsql/Nodes.h:681
USED 0x7fffe3406208: size=56 allocated at /work/v3-release/src/jrd/../jrd/../jrd/../jrd/../dsql/Nodes.h:681
USED 0x7fffe3406240: size=56 allocated at /work/v3-release/src/jrd/../jrd/../common/classes/array.h:464
USED 0x7fffe3406278: size=232 allocated at /work/v3-release/src/jrd/par.cpp:1230
USED 0x7fffe3406360: size=56 allocated at /work/v3-release/src/jrd/../jrd/../jrd/../jrd/../dsql/Nodes.h:681
USED 0x7fffe3406398: size=56 allocated at /work/v3-release/src/jrd/../jrd/../jrd/../jrd/../dsql/Nodes.h:681
USED 0x7fffe34063d0: size=56 allocated at /work/v3-release/src/jrd/../jrd/../common/classes/array.h:464
USED 0x7fffe3406408: size=232 allocated at /work/v3-release/src/jrd/par.cpp:1234

Besides the fact that the execution tree node itself is bigger in v3.0 (e.g. 232 vs 80 bytes), we also see 3 extra allocations by 56 bytes that didn't exist in v2.5. This makes a huge total difference. Given that the execution trees exist not only for DSQL statements but also for every cached PSQL object, this issue affects also the metadata cache thus making per-attachment memory usage even bigger.

I can see several reasons:
1) Every node now contains vtable with a dozen of virtual methods
2) Nodes now contain both DSQL and JRD parts, even if one of them is unused
3) Child nodes are now wrapped in NodeRefImpl which is way too fat for a pointer wrapper (56 bytes due to its vtable)

Item (1) is by design and supposedly cannot be fixed/improved.

Regarding item (2): perhaps nodes could be spitted into two separate categories of objects (e.g. Dsql*Node, Jrd*Node), at least as a temporary solution, but this a big refactoring work which I'm afraid is not really suitable for v3 point releases.

As for item (3), I'd suggest to rework this implementation. The current solution is designed with a type safety in mind, but the cost is just too high. Hopefully, some clever alternative could be invented instead. Or we'll have to live with a couple of unsafe casts.

Additionally, I don't see where the DSQL parser tree is destroyed after being compiled into BLR. It's allocated out of the DSQL statement pool and seems to remain alive until the DSQL statement is freed and delete-by-pool happens. Do we really need to preserve the DSQL part of the tree? Or does it occupy memory for nothing and doubles the undesired memory overhead? This is not the v3 regression, but if we'd implement a way to clear unnecessary DSQL resources early, perhaps it could somewhat compensate the increased "by-design" part of the memory consumption.

Adriano and others are welcome to share their opinions. We may change this ticket to "improvement" if necessary.

Commits: 9f834ab

@firebird-automations
Copy link
Collaborator Author

Modified by: @dyemanov

reporter: Dmitry Yemanov [ dimitr ] => Paul Reeves [ paul_reeves ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @aafemt

vtable is a static thing and is not allocated for each object separately (not sure about pseudo-vtable from CLOOP).

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

That was a quick analysis, someone needs to dig deeper. AFAIK, vtable costs sizeof(pointer) per object instance, but we clearly see a bigger overhead. Maybe something sub-optimal happens in arrays.

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

assignee: Adriano dos Santos Fernandes [ asfernandes ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

I'm doing some work here: https://github.com/FirebirdSQL/firebird/commits/work/nodes-memory-core-5611

There is also a pool reference in each node (it's inherited from PermanentStorage). it should be possible but a bit difficult to remove it.

I'll also investigate about the DSQL statement pool. I remember to be surprised in the past about it being live.

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

I'm now deleting the scratch pool. I did run tcs without any crash but did not reviewed whole code.

Could you please test and show the results with this branch?

@firebird-automations
Copy link
Collaborator Author

Commented by: @reevespaul

My original tests compared a snapshot of FB 3.0.3 with Fb 2.5.7. It would be easier ( better ?) if I could test a patch based on B3_0_Release rather than HEAD.

Alternatively, I could build this branch and compare against HEAD. But that would not necessarily prove that the original problem will be fixed, because there may be other issues introduced in HEAD.

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

I prefer to do major work first against master branch as it is more volatile codebase.

@firebird-automations
Copy link
Collaborator Author

Commented by: @reevespaul

I understand that, but I'm not sure whether my tests (using TPCC) will be useful. I will need to set up tests for HEAD and the nodes-memory-core branch but the original bug report was for 3.0.3.

My worry is that there are too many other variables that have changed and the tests will just be a waste of time because I won't be comparing like with like.

Anyway, I'll do the builds (tomorrow probably) and let you know the results.

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

Paul, I believe you can easily cherry-pick the commits onto your local 3.0 branch to test.

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

summary: Higher memory consumption for prepared statements in FB3 => Higher memory consumption for prepared statements

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 4.0 Beta 1 [ 10750 ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: No test => Not enough information

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @sim1984

Are there any test results? If they are positive, can this fix be applied to 3.0?

@firebird-automations
Copy link
Collaborator Author

Commented by: @reevespaul

I did some tests.

The results were OK, but not spectacular. Less memory was consumed but overall consumption was still significantly higher than in comparison to 2.5. However, the tests were on an alpha of v4 so it is not possible to be sure that something else was distorting the results.

I would also like this to be back-ported to v3, but I haven't had time to do anything about it. The required changes are important and need to be well tested before including in an official release.

@asfernandes
Copy link
Member

Backported to 3.0.8 in #6859.

@pavel-zotov
Copy link

Decided not to implement test for this issue after discussion with FB team, letter from dimitr, 11-oct-2021 09:12.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment