possible memory leak ? INET/select_wait: select failed, errno = 10055 followed by SRVR_multi_thread/RECEIVE: error on main_port, shutting down [CORE3316] #3683

Open
firebird-automations opened this issue Jan 28, 2011 · 23 comments

Comments

@firebird-automations

Submitted by: vander clock stephane (arkadia)

Is duplicated by CORE3439

Attachments:
fullcpu.jpg
firebird.log
firebird.log

hello,

Today I received this error in the log file:

SERVER12 Fri Jan 28 01:31:16 2011
INET/select_wait: select failed, errno = 10055

followed by

SERVER12 Fri Jan 28 01:31:16 2011
SRVR_multi_thread/RECEIVE: error on main_port, shutting down

and the server became "frozen". It does not stop; it simply continues to accept new connections but never answers them, which makes all the client applications look frozen.

INET/select_wait: select failed, errno = 10055 means:
An operation on a socket or pipe was not performed because the system lacked sufficient buffer space or because a queue was full.
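
For readers decoding the log, here is a minimal lookup sketch for the two Winsock error codes that appear in this thread (10055 here, 10013 in a later comment). The table is hand-written, and `describe_winsock_error` is a hypothetical helper name, not part of Firebird or any library.

```python
# Hand-written table covering only the Winsock codes seen in this thread.
# The helper name is hypothetical; nothing here comes from Firebird itself.
WINSOCK_ERRORS = {
    10013: ("WSAEACCES", "Permission denied"),
    10055: ("WSAENOBUFS", "No buffer space available"),
}


def describe_winsock_error(code: int) -> str:
    """Return a short 'code NAME: text' description for a Winsock errno."""
    name, text = WINSOCK_ERRORS.get(code, ("UNKNOWN", "not in this table"))
    return f"{code} {name}: {text}"


print(describe_winsock_error(10055))
print(describe_winsock_error(10013))
```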

Is this generally caused by a memory leak somewhere?

Yesterday the server also crashed in the same way, but with nothing in firebird.log (frozen, not answering any connection, but not refusing them either).

This time I have kept the crash dump files from both yesterday's and today's crashes. They can probably show in which loop the server was stuck, neither shutting down nor answering clients.

stephane

@firebird-automations

Commented by: vander clock stephane (arkadia)

It crashed again just now... This time there is nothing in firebird.log and all the processors went to 100% CPU utilisation (see attached picture). I was forced to kill the fb_inet_server.exe process manually. I took a memory dump (4 GB in size) before killing the process.

@firebird-automations

Modified by: vander clock stephane (arkadia)

Attachment: fullcpu.jpg [ 11893 ]

@firebird-automations

Commented by: vander clock stephane (arkadia)

OK, I can now reproduce the bug. It looks like simply running:
select count(*) from pictures where name like '3#%';
(pictures has millions of rows) triggers it!

At the beginning, for the first 10 minutes, only one CPU is at 100% (that looks normal). After 10 minutes the CPU usage drops for a few seconds, and immediately afterwards all 8 CPUs go up to 100%, as in the picture!
No other connections to the server are possible :(

I will download the database to a local machine to do some tests, but it is a huge database (100+ GB).

@firebird-automations

Commented by: @dyemanov

What is the data type of pictures.name?

@firebird-automations

Commented by: vander clock stephane (arkadia)

VARCHAR(50) CHARACTER SET ISO8859_1 COLLATE ISO8859_1
There is no index on it, so select count(*) from pictures where name like '3#%' cannot use any index (so it cannot be a problem of index corruption).

@firebird-automations

Commented by: vander clock stephane (arkadia)

OK, the validation of the database has just finished (it took 24 hours).

The results are:

Number of record level errors: 1
Number of index page errors: 501
Number of database page errors: 80

Now, why there are so many errors in the database is another story...

@firebird-automations

Commented by: @hvlad

Stephane,

I'm sure most of the errors are not significant and were caused by killing the Firebird process.
Send me the firebird.log and I'll comment on it for you.

Also, attach the crash dump you have here.

About the 10055 errors: this will probably help: http://support.microsoft.com/default.aspx?scid=kb;EN-US;196271

@firebird-automations

Commented by: vander clock stephane (arkadia)

Dear Vlad,

> I'm sure most of the errors are not significant and were caused by killing the Firebird process.
> Send me the firebird.log and I'll comment on it for you.

Yes, that is possible... The firebird.log contains only

SERVER12 Fri Jan 28 01:31:16 2011
INET/select_wait: select failed, errno = 10055

followed by

SERVER12 Fri Jan 28 01:31:16 2011
SRVR_multi_thread/RECEIVE: error on main_port, shutting down

About the crash dump: it is 100 MB, and since it may contain sensitive data I would prefer to send it to you privately... Is that possible?

stephane

@firebird-automations

Commented by: @hvlad

I mean the firebird.log located on the computer where you ran the validation. It contains descriptions for every one of the (1+501+80) errors detected.

As for the crash dump, could you send me a download link (privately, of course)?

@firebird-automations

Commented by: vander clock stephane (arkadia)

dear vlad,

Regarding the firebird.log, I made a mistake: I ran gfix on a temporary machine and forgot to take the firebird.log at the end of the process. I still have the temporary machine, but it is in another office; I will fetch the log the next time I go there.

Anyway, I don't think it was caused by database corruption, because today, on a freshly backed-up and restored database, we encountered exactly the same error:

INET/select_wait: select failed, errno = 10055

followed by

SRVR_multi_thread/RECEIVE: error on main_port, shutting down

and the server became "frozen".

I carefully read http://support.microsoft.com/default.aspx?scid=kb;EN-US;196271 about error 10055, but I remain doubtful, because it says the default maximum number of ephemeral TCP ports is 5000. Does that mean I am using all 5000 TCP connections? How is that possible? I had at most 100 clients connected to Firebird at that time, and it is a dedicated Firebird server! And as far as I know, Firebird uses ephemeral TCP ports only for events?

Thanks in advance
stéphane

@firebird-automations

Commented by: vander clock stephane (arkadia)

Attached is the firebird.log file from the gfix run.

But since I also received the 10055 error yesterday, after a fresh backup/restore, I am sure that the "corruption" of the database is not connected to it...

Also, I still use the official release without the bug fix for the event problem. Is it possible that this is connected? I don't know, maybe some loop inside fb_inet_server.exe that tries to open all TCP ports? We use a lot of event connections (one per client).

Thanks in advance
stéphane

@firebird-automations

Modified by: vander clock stephane (arkadia)

Attachment: firebird.log [ 11904 ]

@firebird-automations

Commented by: vander clock stephane (arkadia)

Today the sweep of the database using gfix never returned :( It has been running for more than 12 hours (yesterday it took only 2 hours)... I am getting desperate :(

@firebird-automations

Commented by: @hvlad

> I carefully read http://support.microsoft.com/default.aspx?scid=kb;EN-US;196271 about error 10055, but I remain doubtful, because
> it says the default maximum number of ephemeral TCP ports is 5000. Does that mean I am using all 5000 TCP connections?
> How is that possible? I had at most 100 clients connected to Firebird at that time, and it is a dedicated Firebird server! And as far as I know, Firebird
> uses ephemeral TCP ports only for events?

Every TCP connection has two endpoints. The ports we are talking about belong to the "remote" endpoint, while the "local" endpoint is <server_ip>:<server_port> (localhost:gds_db, for example).

The port number is assigned by the OS and does not indicate the number of connections. Moreover, sockets are not reused by the OS immediately after disconnection, so new sockets allocated for new connections get increasing port numbers. If there are a lot of short-lived connections, it is easy to end up with "high" port numbers for new connections.
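
The two-endpoint point can be tried out on any machine with a small loopback sketch. This is only an illustration: the listener binds to an OS-chosen port rather than Firebird's gds_db port, and `local_and_remote_ports` is a hypothetical helper name.

```python
import socket


def local_and_remote_ports(n_clients: int = 3):
    """Open n_clients loopback connections to one listener.

    Returns (listener_port, server_side_ports, client_ports).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
    listener.listen(n_clients)
    listener_port = listener.getsockname()[1]
    clients, accepted = [], []
    try:
        for _ in range(n_clients):
            clients.append(socket.create_connection(("127.0.0.1", listener_port)))
            accepted.append(listener.accept()[0])
        # Client side: each connection gets its own OS-assigned port.
        client_ports = [c.getsockname()[1] for c in clients]
        # Server side: every accepted connection shares the listener's port.
        server_side_ports = {a.getsockname()[1] for a in accepted}
    finally:
        for s in clients + accepted + [listener]:
            s.close()
    return listener_port, server_side_ports, client_ports


port, server_ports, client_ports = local_and_remote_ports(3)
print("server port:", port, "server side:", server_ports, "clients:", client_ports)
```

All accepted sockets report the same server-side port, while each client gets a different OS-assigned port, which is why the client port numbers say nothing about how many connections are open.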

Could you check how many TCP connections exist in the system at the time the problem occurs? Use either netstat -p tcp -n or TcpView by SysInternals.

> Also, I still use the official release without the bug fix for the event problem. Is it possible that this is connected? I don't know, maybe some
> loop inside fb_inet_server.exe that tries to open all TCP ports? We use a lot of event connections (one per client).

It is possible, although I don't remember any issues with error 10055. Anyway, a few bugs related to events were fixed (e.g. CORE3119 and CORE3170), so using the current snapshot of 2.5.1 is highly desirable.

@firebird-automations

Commented by: @hvlad

Searching the tracker, I found one issue regarding error 10055: CORE1791.
In that case the cause of the error was the NOD32 antivirus, which made it impossible to call "select" (the Winsock function) on more than 64 sockets.
So make sure you have no antivirus or firewall installed on your server host.
Note that even disabling NOD32 is not enough; only a full uninstall helps.

Hope this helps.

@firebird-automations

Commented by: vander clock stephane (arkadia)

Right now (while I am not having any problem), netstat -p tcp -n shows me this:

TCP 61.213.12.116:3389 95.214.25.89:3472 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:49289 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:50434 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:50593 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:50594 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:50595 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:50770 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:50774 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:51162 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:51209 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:52608 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:52683 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:52822 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:53104 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:53106 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:53108 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:53110 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:53355 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:53734 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:53735 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:53736 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:53737 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:53738 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:53872 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:53881 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:54705 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:54706 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:54707 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:54708 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:54709 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:54710 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:54711 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:54955 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:55090 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:56571 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:57162 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:57538 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:59206 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:59207 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:60513 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:61875 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:63813 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:63814 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:64150 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:64257 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:64645 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:64788 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.118:54269 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.118:56409 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.118:58752 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.118:58867 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.118:61701 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.118:63073 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.118:63128 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.118:63137 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.118:63199 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.118:63384 ESTABLISHED
TCP 61.213.12.120:49626 61.213.12.114:53105 ESTABLISHED
TCP 61.213.12.120:49627 61.213.12.114:53107 ESTABLISHED
TCP 61.213.12.120:49628 61.213.12.114:53109 ESTABLISHED
TCP 61.213.12.120:49629 61.213.12.114:53111 ESTABLISHED
TCP 61.213.12.120:49662 61.213.12.118:56410 ESTABLISHED
TCP 61.213.12.120:49693 61.213.12.118:63074 ESTABLISHED
TCP 61.213.12.120:49695 61.213.12.118:63138 ESTABLISHED
TCP 61.213.12.120:49707 61.213.12.118:61702 ESTABLISHED

That does not look like too many?
Capturing it at the time of the problem is harder, because most of the time it happens in the middle of the night, when I am not in front of the server :(
I can probably write a script that runs it in a loop and stores the result in a log file... I will try that.
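
A minimal sketch of such a logging loop, assuming Python is available on the server. The file name, the interval, and the helper names `snapshot` and `log_snapshots` are illustrative choices, not anything prescribed by Firebird.

```python
import subprocess
import time
from datetime import datetime


def snapshot(run=subprocess.run) -> str:
    """Run 'netstat -p tcp -n' once and return its text output."""
    result = run(["netstat", "-p", "tcp", "-n"],
                 capture_output=True, text=True, check=False)
    return result.stdout


def log_snapshots(path, interval_s=60.0, count=None,
                  take=snapshot, sleep=time.sleep) -> int:
    """Append timestamped snapshots to path.

    count=None keeps running until the process is interrupted.
    Returns the number of snapshots written.
    """
    taken = 0
    while count is None or taken < count:
        with open(path, "a", encoding="utf-8") as log:
            stamp = datetime.now().isoformat(timespec="seconds")
            log.write(f"=== {stamp} ===\n")
            log.write(take())
            log.write("\n")
        taken += 1
        if count is None or taken < count:
            sleep(interval_s)
    return taken

# Example (runs until interrupted):
# log_snapshots("netstat_history.log", interval_s=60.0)
```

The file is reopened for every snapshot, so the last entries written before a freeze should already be on disk when the log is inspected the next morning.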

Anyway, tonight I will also deploy the current snapshot!

@firebird-automations

Commented by: vander clock stephane (arkadia)

> Searching the tracker, I found one issue regarding error 10055: CORE1791.
> In that case the cause of the error was the NOD32 antivirus, which made it impossible to call "select" (the Winsock function) on more than 64 sockets.
> So make sure you have no antivirus or firewall installed on your server host.
> Note that even disabling NOD32 is not enough; only a full uninstall helps.

Of course there is no antivirus (it is a dedicated Firebird server running Windows 2008 R2), and the Windows firewall has an exception for the fb_inet_server.exe process...

@firebird-automations

Commented by: @hvlad

I counted 56 connections at 61.213.12.120:3050.
Could you verify that more than 63 TCP connections are possible on your server?
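
Counting by hand is error-prone, so here is a small sketch that tallies ESTABLISHED connections per local endpoint. It assumes the four-column layout of the `netstat -p tcp -n` lines pasted above; `count_established` is a hypothetical helper name, and the sample holds only a few of those lines.

```python
from collections import Counter


def count_established(netstat_output: str) -> Counter:
    """Count ESTABLISHED TCP connections per local endpoint (ip:port)."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected shape: TCP <local address> <remote address> <state>
        if (len(fields) >= 4 and fields[0].upper() == "TCP"
                and fields[3] == "ESTABLISHED"):
            counts[fields[1]] += 1
    return counts


# A few lines copied from the netstat output earlier in this thread.
sample = """\
TCP 61.213.12.116:3389 95.214.25.89:3472 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:49289 ESTABLISHED
TCP 61.213.12.120:3050 61.213.12.114:50434 ESTABLISHED
TCP 61.213.12.120:49626 61.213.12.114:53105 ESTABLISHED
"""

for endpoint, n in count_established(sample).most_common():
    print(endpoint, n)
```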

@firebird-automations

Commented by: vander clock stephane (arkadia)

61.213.12.120 => the IP of our Firebird Server
61.213.12.114 => the IP of our web frontal server

61.213.12.120 is a default installation of Windows 2008 R2 64-bit Standard, so I think it should allow many more than 63 TCP connections by default?
On 61.213.12.114 (same hardware and installation as 61.213.12.120), netstat -p tcp -n shows thousands of TCP connections (it is a web server).

How can I check that I can have more than 63 TCP connections on 61.213.12.120? I am sure the answer is yes, though...

Thanks in advance
stephane
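
One rough way to check is to open more than 63 connections at once and see whether they all succeed. The sketch below does this over loopback with a blocking accept loop; the helper name and the count of 80 are arbitrary. Note the limitation: it only shows that the TCP stack allows that many simultaneous connections. It does not go through select(), so it cannot reproduce a limit on the number of sockets passed to select() (Winsock's default FD_SETSIZE is 64).

```python
import socket


def open_simultaneous_connections(n: int, host: str = "127.0.0.1") -> int:
    """Open n TCP connections to a local listener and keep them all open.

    Returns the number of connections accepted while all were still open.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0: the OS picks a free port
    listener.listen(n)
    port = listener.getsockname()[1]
    clients, accepted = [], []
    try:
        for _ in range(n):
            clients.append(socket.create_connection((host, port), timeout=5))
            conn, _addr = listener.accept()
            accepted.append(conn)
        return len(accepted)
    finally:
        for s in clients + accepted + [listener]:
            s.close()


print(open_simultaneous_connections(80))
```

A closer test would open the same number of real client connections against the Firebird port from another machine, which is what the later "100 connections" experiment in this thread does.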

@firebird-automations

Modified by: @dyemanov

Link: This issue is duplicated by CORE3439 [ CORE3439 ]

@firebird-automations

Commented by: Roman Vanicek (roman)

I confirm this (or very similar) bug.

My setup is: Web server (Linux+Apache+PHP) and database server (Windows 2000 server SP4 + Firebird 2.5 SuperServer).

We used Firebird 1.5 and 2.0 SuperServer in the same setup and this error did not appear in the logs. Now the server runs fine for about two weeks and then "freezes". The client gets the message "Unable to complete network request", and this pair of messages appears in the server log:

INET/select_wait: select failed, errno = 10013
and
SRVR_multi_thread/RECEIVE: error on main_port, shutting down

But the server does not shut down; it stays in a frozen state in which it neither accepts connections nor shuts down.

Restarting the Firebird service helps, and it immediately runs fine again.

I have tested opening 100 connections at the same time and running a select on each one, and this works fine.

I will attach my log file shortly.

@firebird-automations

Modified by: Roman Vanicek (roman)

Attachment: firebird.log [ 11951 ]

@firebird-automations

Commented by: Nikolay Ponomarenko (pnv82)

It seems we have the same issue:
Win 2008 R2 SP1 64-bit
FB 2.1.0.17798
pooled Zeos connector

When I stop the frozen server, the log contains additional information that points at the 63-connection limit:
GROUND02 (Server) Mon Jun 04 12:19:35 2012
INET/select_wait: select failed, errno = 10055
GROUND02 (Server) Mon Jun 04 12:19:35 2012
SRVR_multi_thread/RECEIVE: error on main_port, shutting down

GROUND02 (Server) Mon Jun 04 12:21:47 2012
Shutting down the Firebird service with 63 active connection(s) to 1 database(s)
GROUND02 (Server) Mon Jun 04 12:21:47 2012
The database C:\MORFIK\DB\DATA_SERVICE.FDB was being accessed when the server was shutdown

The server computer has no antivirus or firewall and is used as a web server.
