Server is Crashing after Building up Memory when One or More Clients are Connected Concurrently [CORE3385] #3751

Open
firebird-automations opened this issue Mar 15, 2011 · 24 comments

Comments

@firebird-automations
Collaborator

Submitted by: Andre van Zuydam (andrevanzuydam)

Attachments:
firebird.log

Votes: 1

We have a test system running on a 2.5.1 snapshot. About once a week (every 7 to 8 days) we have to manually restart / initialize the Firebird service, which crashes.

The logs point to a possible network problem, which we have had a look at. Unfortunately, I get INET/inet_error: read errno = 10054 on my localhost development machine every time I work.

One possible cause is perhaps the sweep that failed? The five clients were connected but could not query the server after such an incident, which also makes the problem strange.

The logs of the past week since the last crash and today are below.

@firebird-automations
Collaborator Author

Commented by: @dyemanov

I removed the overly long firebird.log contents; please attach the log as a separate file.

@firebird-automations
Collaborator Author

Modified by: @dyemanov

security: Developers [ 10012 ] =>

@firebird-automations
Collaborator Author

Commented by: @dyemanov

Which exact v2.5.1 build do you use? Also, which platform (win32 / win64)?

All the errors in the log are related to an out-of-memory condition: the server process is out of virtual memory. I suppose this is the 32-bit build and you are experiencing some kind of memory leak. Which FB version did you run before trying v2.5.1?

@firebird-automations
Collaborator Author

Commented by: Andre van Zuydam (andrevanzuydam)

Log file with the errors before a crash of Firebird

@firebird-automations
Collaborator Author

Modified by: Andre van Zuydam (andrevanzuydam)

Attachment: firebird.log [ 11918 ]

@firebird-automations
Collaborator Author

Modified by: Andre van Zuydam (andrevanzuydam)

description: We have a test system running on 2.5.1 snapshot, each week (plus minus 7 - 8 days) apart we have to manually restart / initialize the Firebird service which crashes out.

Looking into the logs points to a possible network which we have had a look at, unfortunatelly I get INET/inet_error: read errno = 10054 on localhost development machine every time I work.

One possible cause is perhaps the sweep that failed ? The five clients were connected but could not query the server after such an incident which also makes the problem strange.

The logs of the past week since the last crash and today are below.

=>

We have a test system running on 2.5.1 snapshot, each week (plus minus 7 - 8 days) apart we have to manually restart / initialize the Firebird service which crashes out.

Looking into the logs points to a possible network problem which we have had a look at, unfortunately I get INET/inet_error: read errno = 10054 on local host development machine every time I work.

One possible cause is perhaps the sweep that failed ? The five clients were connected but could not query the server after such an incident which also makes the problem strange.

The logs of the past week since the last crash and today are below.

environment: Windows 7 Professional => Windows 7 Professional, Firebird Snapshot 2.5.1.26208

@firebird-automations
Collaborator Author

Commented by: Andre van Zuydam (andrevanzuydam)

We have now tested with the Super Classic version of Firebird, and the memory on the system is being released correctly! How do we debug memory leaks on Firebird Super Server?

@firebird-automations
Collaborator Author

Commented by: @dyemanov

Have you looked into the MON$MEMORY_USAGE table?
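
For readers following along, here is a minimal sketch of how the MON$MEMORY_USAGE figures could be grouped once they have been fetched. It assumes the rows come from whatever Firebird driver is in use; the helper name `summarize_memory` and the sample rows are hypothetical and only show the shape of the result.

```python
from collections import defaultdict

# Query to run through any Firebird driver (the driver itself is not shown here).
MEMORY_USAGE_SQL = """
SELECT MON$STAT_GROUP, MON$MEMORY_USED, MON$MEMORY_ALLOCATED
FROM MON$MEMORY_USAGE
"""

# Meaning of MON$STAT_GROUP in the monitoring tables.
STAT_GROUPS = {0: "database", 1: "attachment", 2: "transaction", 3: "statement", 4: "call"}


def summarize_memory(rows):
    """Sum used/allocated bytes per stat group from (group, used, allocated) rows."""
    totals = defaultdict(lambda: [0, 0])
    for group, used, allocated in rows:
        name = STAT_GROUPS.get(group, f"unknown({group})")
        totals[name][0] += used
        totals[name][1] += allocated
    return {name: tuple(pair) for name, pair in totals.items()}


# Made-up sample rows, NOT taken from the reporter's server.
sample_rows = [
    (0, 4_000_000, 4_700_000),
    (1, 60_000, 65_536),
    (1, 58_000, 65_536),
    (2, 1_200, 0),
]
summary = summarize_memory(sample_rows)
print(summary)
```

Running the query at intervals and comparing the per-group totals should show which level (database, attachment, transaction, statement) keeps growing.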

@firebird-automations
Collaborator Author

Commented by: Andre van Zuydam (andrevanzuydam)

OK, we've done some extensive testing and can now duplicate the problem at will.

The problem happens when two client applications connect to the Super Server concurrently. As each client connects, a small increase in memory occurs on fbserver.exe in Task Manager. (Over a week this produces a crash.)

I'm sure this will not happen in standard circumstances, but we have a service which polls the database engine every 10 seconds, and it is this service that causes the build-up of memory once another connection happens. If the service is running by itself, it is happy indefinitely. If a local client to the service or a remote client connects, Firebird starts building up memory.

Assuming these clients stay connected for a long time or reconnect to the database, we find that the memory is freed, but about 100 - 200K always stays occupied. Only once all the client applications have disconnected does the Firebird server go back to its normal memory state (about 4600K on our system). As long as two of the clients remain connected the memory builds up; all clients must disconnect before Firebird memory goes back to normal.

If only one client is connected, the server is stable and memory does not increase. Does this sound like a shared memory problem? Super Classic does not have any of these drawbacks and behaves correctly.

What can I do to help debug this?

@firebird-automations
Collaborator Author

Modified by: Andre van Zuydam (andrevanzuydam)

summary: Server is Crashing after cannot start sweep thread (0) => Server is Crashing after Building up Memory when One or More Clients are Connected Concurrently

@firebird-automations
Collaborator Author

Commented by: @dyemanov

Thanks for the information, hopefully it will help us find this memory leak. I will report back if more input is required from your side.

@firebird-automations
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

@firebird-automations
Collaborator Author

Modified by: @dyemanov

status: Open [ 1 ] => In Progress [ 3 ]

@firebird-automations
Collaborator Author

Commented by: @dyemanov

While I'm searching for the possible memory leak, could you please re-try SuperServer and monitor the OST (oldest snapshot transaction) counter with gstat -h -- whether it gets stuck or not.

@firebird-automations
Collaborator Author

Commented by: Andre van Zuydam (andrevanzuydam)

Hi Dmitry, sorry for the delay in posting. Here is a sample of the gstat -h output.

What exactly should I be looking for here?

Database header page information:
Flags 0
Checksum 12345
Generation 59776
Page size 4096
ODS version 11.2
Oldest transaction 57559
Oldest active 57592
Oldest snapshot 57592
Next transaction 57597
Bumped transaction 1
Sequence number 0
Next attachment ID 2177
Implementation ID 16
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Apr 12, 2011 15:25:57
Attributes force write

Variable header data:
    Sweep interval:         20000
    *END*

I'm getting a lot of INET/inet_error: read errno = 10054 entries in my logs, which I do not think are network-hardware related. After this happens, the clients disconnect from the database and we have to restart the engine. Is this something that I can prevent, or is this a bug?

@firebird-automations
Collaborator Author

Commented by: Andre van Zuydam (andrevanzuydam)

Another log, from the Super Server version at another site. Memory is building up at a regular pace, about 100KB per transaction, and only occasionally seems to be freed. Two days later Firebird is using 245MB of RAM, with 6 clients connected permanently, 24x7.

Database "----.FDB"
Database header page information:
Flags 0
Checksum 12345
Generation 135053
Page size 4096
ODS version 11.2
Oldest transaction 78088
Oldest active 120709
Oldest snapshot 96910
Next transaction 121275
Bumped transaction 1
Sequence number 0
Next attachment ID 34393
Implementation ID 16
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date May 4, 2011 15:40:28
Attributes force write

Variable header data:
    Sweep interval:         20000
    *END*
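
As a rough aid for reading a header like the one above, the sketch below pulls the transaction counters out of gstat -h text and computes the gaps between them. The function names `parse_header` and `gaps` are hypothetical, and the comparison of OST minus OIT against the sweep interval is an assumption about when automatic sweep is triggered, not something stated in this thread.

```python
import re

# Counter lines copied from the gstat -h output above.
GSTAT_HEADER = """\
Oldest transaction 78088
Oldest active 120709
Oldest snapshot 96910
Next transaction 121275
    Sweep interval:         20000
"""

# gstat labels mapped to short keys (the short keys are made up for this sketch).
FIELDS = {
    "Oldest transaction": "oit",
    "Oldest active": "oat",
    "Oldest snapshot": "ost",
    "Next transaction": "next",
    "Sweep interval": "sweep_interval",
}


def parse_header(text):
    """Extract the integer counters listed in FIELDS from gstat -h text."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z ]+?):?\s+(\d+)\s*$", line)
        if match and match.group(1) in FIELDS:
            counters[FIELDS[match.group(1)]] = int(match.group(2))
    return counters


def gaps(c):
    """Distances between the counters; a large, growing gap suggests a stuck value."""
    return {
        "next_minus_ost": c["next"] - c["ost"],
        "next_minus_oat": c["next"] - c["oat"],
        "ost_minus_oit": c["ost"] - c["oit"],
    }


counters = parse_header(GSTAT_HEADER)
report = gaps(counters)
print(report)
```

For this header the oldest snapshot is 24365 transactions behind the next transaction while the oldest active is only 566 behind, and OST minus OIT (18822) is still below the sweep interval of 20000. Repeating the check over time would show whether the oldest snapshot advances or stays put.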

@firebird-automations
Collaborator Author

Commented by: @hvlad

Transaction management is far from perfect.
Do your transactions perform many INSERT, UPDATE or DELETE operations?
Do you see the same memory consumption if you set GCPolicy = cooperative?
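
Since a commented-out line in firebird.conf has no effect, a small check like the following can confirm that the GCPolicy setting is really active before restarting the server. This is only a sketch: the helper name `uncommented_values` and the sample file text are hypothetical, and matching the key case-insensitively is an assumption.

```python
def uncommented_values(conf_text, key):
    """Return every value assigned to `key` on lines that are not commented out."""
    values = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop '#' comments
        if "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip().lower() == key.lower():  # assumption: case-insensitive keys
            values.append(value.strip())
    return values


# Made-up excerpt of a firebird.conf, for illustration only.
sample_conf = """\
# GCPolicy = combined
#GCPolicy = background
GCPolicy = cooperative
"""

active = uncommented_values(sample_conf, "GCPolicy")
print(active)
```

An empty list would mean the server is still running with its default policy.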

@firebird-automations
Collaborator Author

Commented by: Andre van Zuydam (andrevanzuydam)

Hi Vlad

I've set the cooperative policy on, I suppose this is how Classic server runs? We do get more performance out of Super Server, though, and this is why we want to run it. We perform many inserts while operating, as our system is transactional in terms of how the data is stored; updates are very few, and delete operations are limited to archiving a single table to another database. We are using a stored proc to connect to the other database to send the data, could this be where the memory is leaking?

Another thing we have tested is that a normal RAM cleaner app will bring the memory use down on the Firebird server; this is not ideal.

I am also open to poor programming on my side. How can I test whether my transactions are really getting closed? I definitely call close transaction after I do a query and statement. Something did bother me on Firebird 2.5: some of the transactions I opened reported a 501 error of attempting to close an already closed cursor, which the same code / client did not report in 2.1. I changed my transaction closing method to use DSQL_UNPREPARE instead of a DSQL_DROP or DSQL_CLOSE parameter, which "seemed" to fix this problem.

The transactions which returned cursor errors were update or execute statements for stored procs, which in most cases do not return results. I had similar problems with update insert statements with returning values. Something definitely changed in the client after 2.1 which started this. Perhaps there is a simple explanation for these changes which will allow me to correct my code too?

Thank you for your help so far.

@firebird-automations
Collaborator Author

Commented by: Andre van Zuydam (andrevanzuydam)

The cooperative policy on Firebird Super Server does not seem to work as the memory is still building on Super Server. Classic server works perfectly I might add and is still stable.

@firebird-automations
Collaborator Author

Commented by: @hvlad

> I've set the cooperative policy on,

Did you restart Firebird after editing firebird.conf?

> I suppose this is how Classic server runs?

Not exactly. It disables background garbage collection and the corresponding in-memory structures. As your OST number is stuck, these in-memory structures are not cleaned up. The suggestion to switch to the cooperative GCPolicy was given to confirm this idea.

> We are using a stored proc to connect to the other database to send the data, could this be where the memory is leaking ?

I doubt it

> Some other things we have tested is that a normal RAM cleaner app will bring the memory use down on the Firebird server, this is not ideal.

A RAM cleaner on a database server machine? Is this a joke?

> I am also open to poor programming on my side, how can I test if my transactions are really getting closed ?

On the program side, you can ensure that the transaction handle becomes zero.
Or you could inspect the MON$ tables to see unreleased transactions and/or statements.
You could also try the Trace API and see all interesting events on the server side.
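
One way to apply the MON$ suggestion is sketched below: fetch the open transactions and flag those that have been open longer than expected. The query uses MON$TRANSACTIONS columns; the helper name `long_running`, the 10-minute threshold and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta

# Query to run through any Firebird driver; it lists transactions that are still open.
OPEN_TRANSACTIONS_SQL = """
SELECT MON$TRANSACTION_ID, MON$ATTACHMENT_ID, MON$TIMESTAMP
FROM MON$TRANSACTIONS
ORDER BY MON$TIMESTAMP
"""


def long_running(rows, now, max_age):
    """Return (transaction_id, attachment_id) for rows older than max_age."""
    return [(tx, att) for tx, att, started in rows if now - started > max_age]


# Made-up rows, NOT taken from the reporter's database.
now = datetime(2011, 5, 6, 12, 0, 0)
sample_rows = [
    (96910, 34001, datetime(2011, 5, 5, 9, 30, 0)),     # open for more than a day
    (121270, 34390, datetime(2011, 5, 6, 11, 59, 55)),  # opened five seconds ago
]
suspects = long_running(sample_rows, now, timedelta(minutes=10))
print(suspects)
```

Any transaction that stays in this list across several checks is a candidate for the one holding the counters back, and its attachment ID points to the client that owns it.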

> I definitely call close transaction after I do a query and statement, there is something that bothered me on Firebird 2.5, some of the transactions I opened reported a 501 error of attempting to close an already closed cursor which the same code / client did not report in 2.1, I changed my transaction closing method to use the DSQL_DROP which "seemed" to fix this problem.
>
> These transactions which returned cursor errors were update or execute statements for stored procs which in most cases do not return results, I had similar problems with update insert statements with returning values, something definitely changed in the client after 2.1 which started this. Perhaps there is a simple explanation for these changes which will allow me to correct my code too ?

Sure. In v2.5 it is not allowed to close a cursor when you have no cursor :) This is exactly your case: neither UPDATE nor EXECUTE PROCEDURE returns a cursor. But this shouldn't affect the transaction state, unless your code flow does not call commit (or rollback) after such an error.

BTW, DSQL_DROP is NOT a "transaction closing method". It is an option for closing a *statement*.
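
To illustrate the rule, here is a small Python model of the decision a client could make before calling isc_dsql_free_statement. It is not real client code: the constant values are the ones defined in Firebird's ibase.h, but the function `free_option` and its `done_with_statement` flag are hypothetical, and treating only the two SELECT statement types as cursor-producing is an assumption of this sketch.

```python
# Options for isc_dsql_free_statement, values as defined in ibase.h.
DSQL_CLOSE = 1      # close an open cursor, keep the prepared statement
DSQL_DROP = 2       # release the statement handle entirely
DSQL_UNPREPARE = 4  # drop the prepared statement, keep the handle (listed for reference)

# Statement types reported by isc_info_sql_stmt_type (subset).
STMT_SELECT = 1
STMT_UPDATE = 3
STMT_EXEC_PROCEDURE = 8
STMT_SELECT_FOR_UPD = 12

# Assumption: only these statement types open a cursor that must be closed.
CURSOR_TYPES = {STMT_SELECT, STMT_SELECT_FOR_UPD}


def free_option(stmt_type, done_with_statement):
    """Pick the free option, or None when there is nothing to close."""
    if done_with_statement:
        return DSQL_DROP
    if stmt_type in CURSOR_TYPES:
        return DSQL_CLOSE
    return None  # UPDATE, EXECUTE PROCEDURE etc. have no cursor to close


print(free_option(STMT_UPDATE, done_with_statement=False))
print(free_option(STMT_SELECT, done_with_statement=False))
```

Whatever option is chosen, the transaction still has to be ended separately with a commit or rollback.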

> The cooperative policy on Firebird Super Server does not seem to work as the memory is still building on Super Server.

Again, did you restart Firebird after editing firebird.conf?

> Classic server works perfectly I might add and is still stable.

@firebird-automations
Collaborator Author

Modified by: @dyemanov

status: In Progress [ 3 ] => Open [ 1 ]

@firebird-automations
Collaborator Author

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ] =>

@firebird-automations
Collaborator Author

Commented by: Andre van Zuydam (andrevanzuydam)

Hi Vlad

Definitely restarting Firebird on each config change; memory is still building (it is very small, 100K for each instance that gets run). After a week the server will crash or stop responding.

The RAM cleaner on the database server was not a joke, only a test to see if the memory was still being accessed. Unfortunately, the machines we deploy on are not dedicated servers; we may not have come across this problem on a dedicated server. I must add that I do not have this problem on a Linux server, so it must be a Windows thing?

Thank you for your replies with regard to the coding, and again, I always restart Firebird when making conf changes. I have also tried running Super Server without Guardian just as an extra test, to no avail.

My next resort is to build a standalone exe to replicate the problem. I think this will help troubleshoot it, so please wait for this, as I need to simulate what is happening, and then we can work from there.

@firebird-automations
Collaborator Author

Commented by: @dyemanov

Please test the next (tomorrow's) snapshot build. It will have CORE3533 fixed and it could be related to your case.
