Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto sweeper kills its own connection [CORE4745] #5050

Closed
firebird-automations opened this issue Apr 10, 2015 · 17 comments
Closed

Auto sweeper kills its own connection [CORE4745] #5050

firebird-automations opened this issue Apr 10, 2015 · 17 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Omacht András (aomacht)

On different databases at different time auto sweeper dies and restarts three or four times and finally complete the job.

Log:
xxx Thu Apr 9 15:51:31 2015
Sweep is started by SWEEPER
Database "/u1/firebird/db/yyy.fdb"
OIT 229037, OAT 229038, OST 229038, Next 263970

xxx Thu Apr 9 15:51:53 2015
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

xxx Thu Apr 9 15:51:54 2015
Error during sweep:
connection shutdown

xxx Thu Apr 9 15:52:55 2015
Sweep is started by SWEEPER
Database "/u1/firebird/db/yyy.fdb"
OIT 232570, OAT 253894, OST 253894, Next 263979

xxx Thu Apr 9 15:52:55 2015
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

xxx Thu Apr 9 15:52:55 2015
Error during sweep:
connection shutdown

xxx Thu Apr 9 15:53:16 2015
Sweep is started by SWEEPER
Database "/u1/firebird/db/yyy.fdb"
OIT 232570, OAT 253894, OST 253894, Next 263985

xxx Thu Apr 9 15:53:16 2015
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

xxx Thu Apr 9 15:53:16 2015
Error during sweep:
connection shutdown

xxx Thu Apr 9 15:53:23 2015
Sweep is started by SWEEPER
Database "/u1/firebird/db/yyy.fdb"
OIT 232570, OAT 253894, OST 253894, Next 263989

xxx Thu Apr 9 15:57:44 2015
Sweep is finished
Database "/u1/firebird/db/yyy.fdb"
OIT 253893, OAT 253894, OST 253894, Next 264042

After backup/restore randomly the same happen. Backup without -g works fine.

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Sweep can't shutdown own attachment. It could stop because of error, but we see no error message in firebird.log.
What we see looks like someone else kill's sweep attachment (using monitoring, for example)

@firebird-automations
Copy link
Collaborator Author

Commented by: Omacht András (aomacht)

Vlad, 100% percent the users are unable to kill the process with monitoring tables, because they have no rights to do it. And the administrator (my collegues and me) didn't do that too. Anyway you are right, there are no other messages in the log file.

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Could it be the Linux kernel who kills "too heavy" process ?

@firebird-automations
Copy link
Collaborator Author

Commented by: Omacht András (aomacht)

No, this is a very unused server at least 16 cores and 32 gb ram free (used for disk cache) all the time. This database size is ~1gb.

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

Was server started as classic or superclassic?

@firebird-automations
Copy link
Collaborator Author

Commented by: Omacht András (aomacht)

Alexander, as classic

@firebird-automations
Copy link
Collaborator Author

Commented by: @AlexPeshkoff

In that case the reason is almost obvious. After backup without garbage collection firebird detects that sweep pass is needed for database and starts it as a separate thread in one of the processes. But classic process gets terminated when the user detachs from database. At this moment we see a record in the log:

xxx Thu Apr 9 15:53:16 2015
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

This connection is sweep. And cause engine brings on terminate all connections to shutdown mode next what we see is:

xxx Thu Apr 9 15:53:16 2015
Error during sweep:
connection shutdown

This is not dangerous, but definitely annoying.

I see the following possible ways to fix an issue:
- let classic server process continue running sweep after user disconnect,
- or from the most beginning start sweep as separate process on classic.

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Alex,

> I see the following possible ways to fix an issue:
> - let classic server process continue running sweep after user disconnect,

This is how it should work and i believe i've tested it before 2.5.0 release - when made auto-sweep run non-syncronous in Classic...

- or from the most beginning start sweep as separate process on classic.

It can't help, as user attachment which triggered auto-sweep still act as a client and its termination will terminate sweep process too.

@firebird-automations
Copy link
Collaborator Author

Commented by: Omacht András (aomacht)

Alex, I can imagine this happend at the first time:

Starts: Thu Apr 9 15:51:31 2015
Killed: Thu Apr 9 15:51:53 2015,

but the 2nd and 3rd time between start and kill time is 0.

Thu Apr 9 15:52:55 2015 vs. Thu Apr 9 15:52:55 2015
Thu Apr 9 15:53:16 2015 vs. Thu Apr 9 15:53:16 2015

We made backup WITHOUT -g options (I know this is slower), so the gargabe collector is running, and the log shows everything is ok during backup time.

@firebird-automations
Copy link
Collaborator Author

Commented by: Attila Molnár (e_pluribus_unum)

Hi!

Same error in log here. I've started a long running process at 14 Apr 2015 07:34:38 +0200, which maintains the database schema and data by running a lot of DDL, DQL, DML commands.

Error in client side : "Database is probably already opened by another engine instance in another Windows session." at 14 Apr 2015 07:37:53 +0200

No concurrency in client side, calls are made from apps main thread.
Looks like sweep concurrency handling is flawed.

Logs in server side (2.5.4 Classic Server)

L3S-4 Tue Apr 14 07:35:45 2015
Sweep is started by SWEEPER
Database "D:\DATABASES\AMOLNAR_ELESKER.GDB3S"
OIT 207896, OAT 227897, OST 227897, Next 227898

L3S-4 Tue Apr 14 07:36:03 2015
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

L3S-4 Tue Apr 14 07:36:03 2015
Error during sweep:
connection shutdown

L3S-4 Tue Apr 14 07:36:03 2015
Sweep is started by SWEEPER
Database "D:\DATABASES\AMOLNAR_ELESKER.GDB3S"
OIT 207896, OAT 228222, OST 228222, Next 228227

L3S-4 Tue Apr 14 07:36:59 2015
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

L3S-4 Tue Apr 14 07:36:59 2015
Error during sweep:
connection shutdown

L3S-4 Tue Apr 14 07:36:59 2015
Sweep is started by SWEEPER
Database "D:\DATABASES\AMOLNAR_ELESKER.GDB3S"
OIT 207896, OAT 228232, OST 228232, Next 229499

L3S-4 Tue Apr 14 07:37:01 2015
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

L3S-4 Tue Apr 14 07:37:01 2015
Error during sweep:
connection shutdown

L3S-4 Tue Apr 14 07:37:01 2015
Sweep is started by SWEEPER
Database "D:\DATABASES\AMOLNAR_ELESKER.GDB3S"
OIT 207896, OAT 229502, OST 229502, Next 229504

L3S-4 Tue Apr 14 07:37:19 2015
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

L3S-4 Tue Apr 14 07:37:19 2015
Error during sweep:
connection shutdown

L3S-4 Tue Apr 14 07:37:19 2015
Sweep is started by SWEEPER
Database "D:\DATABASES\AMOLNAR_ELESKER.GDB3S"
OIT 207896, OAT 229508, OST 229508, Next 229511

L3S-4 Tue Apr 14 07:37:20 2015
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

L3S-4 Tue Apr 14 07:37:20 2015
Error during sweep:
connection shutdown

L3S-4 Tue Apr 14 07:37:20 2015
Sweep is started by SWEEPER
Database "D:\DATABASES\AMOLNAR_ELESKER.GDB3S"
OIT 207896, OAT 229537, OST 229537, Next 229539

L3S-4 Tue Apr 14 07:37:21 2015
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

L3S-4 Tue Apr 14 07:37:21 2015
Error during sweep:
connection shutdown

L3S-4 Tue Apr 14 07:37:21 2015
Sweep is started by SWEEPER
Database "D:\DATABASES\AMOLNAR_ELESKER.GDB3S"
OIT 207896, OAT 229671, OST 229671, Next 229673

L3S-4 Tue Apr 14 07:37:25 2015
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

L3S-4 Tue Apr 14 07:37:25 2015
Error during sweep:
connection shutdown

L3S-4 Tue Apr 14 07:37:25 2015
Sweep is started by SWEEPER
Database "D:\DATABASES\AMOLNAR_ELESKER.GDB3S"
OIT 207896, OAT 229990, OST 229990, Next 229994

L3S-4 Tue Apr 14 07:37:53 2015
Shutting down the server with 1 active connection(s) to 1 database(s), 0 active service(s)

L3S-4 Tue Apr 14 07:37:53 2015
Error during sweep:
connection shutdown

@firebird-automations
Copy link
Collaborator Author

Commented by: Attila Molnár (e_pluribus_unum)

Sweep starts when I connect to the DB. Looks like "Shutting down" messages came, when sweep is not finished and I disconnect from the DB.
In my case sweep was needed nearly 3 min to finish. But the process does connect/disconnect many times during the process.

L3S-4 Tue Apr 14 08:09:33 2015
Sweep is started by SWEEPER
Database "D:\DATABASES\AMOLNAR_ELESKER.GDB3S"
OIT 207896, OAT 230094, OST 230094, Next 230096

L3S-4 Tue Apr 14 08:12:14 2015
Sweep is finished
Database "D:\DATABASES\AMOLNAR_ELESKER.GDB3S"
OIT 230095, OAT 230104, OST 230104, Next 230105

@firebird-automations
Copy link
Collaborator Author

Modified by: @hvlad

assignee: Vlad Khorsun [ hvlad ]

@firebird-automations
Copy link
Collaborator Author

Commented by: Omacht András (aomacht)

Vlad, I can accept Alex's explanation. We have some maintainer process running ~0 secs (connect and disconnect in less than a sec), these processes can cause the problems. By log it seems if a longer connection starts the GC it can do its job well.
Anyway there is no error on client side, and the datasbase works well too.
So, please close this ticket.

(Also Attila's ticket is more interesting where we had error on client's side. CORE4751)

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

@vlad and Alex,

It seems wrong that the engine would trigger anything on Classic as the result of a backup process, since once the backup is complete the connection will be closed.

Is there anyway to suppress the sweep in that case?

(IMO the current error messages need to be trapped -- as they suggest a significant operational issue)

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

> Vlad, I can accept Alex's explanation.
I know that it is correct ;)

> Anyway there is no error on client side, and the datasbase works well too.
This is also known

> So, please close this ticket.
I need to investigate it a bit more (and CORE4751 too) and then will decide.
Anyway, thanks for the feedback

@firebird-automations
Copy link
Collaborator Author

Commented by: @hvlad

Sean,

> It seems wrong that the engine would trigger anything on Classic as the result of a backup process, since once the backup is complete the connection will be closed.
Backup is the same attachment as any other attachment. I see nothing wrong that it works as any other attachment.

> Is there anyway to suppress the sweep in that case?
Use -g switch. Attachments with no_garbage_collect flag does not run garbage collection and does not trigger sweep.

> (IMO the current error messages need to be trapped -- as they suggest a significant operational issue)
What significant operational issue do you see here ? Note, this behavior is not new, it was not visible before we started to log sweep runs.

@omachtandras
Copy link

We haven’t experienced this error in the last few years, so the issue can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants