|
[
Permalink
| « Hide
]
Dmitry Yemanov added a comment - 15/Mar/11 04:59 PM
This is the second report of such a problem already. It seems we have a regression.
What v2.5.1 build have you used priorly, that didn't have this issue?
Hard to say, in the 2.5 original i also have this kind of issue (but much much more often, but not sure it's the same). I try too little time the beta release between to 2.5 original and the 2.5 version 2630 to say with don't have this issue.
what i can say is that this issue in the 2630 don't appear so often as in the 2.5 original (but on the other way the FB 2.5 original crash close to every days and the 26230 never crash so hard to know) i don't know, if you want i can try to attach a debuger on the fb_inet_server.exe to see what's happen ? I forget, just say me if i can kill the "frozen" clients or if you need them to do some tests ...
the same error again ....
today the behavior is little different one client was running several day with 2 connections on the FB server 1 waiting some EVENT; 1 doing some queries; yesterday the client become frozen. when i look in the $mon table in the server side i see only the EVENT connection. so it's mean that the client was disconnected (but why? as the event is still connected it's not look like a network problem) but the client not detect it and continue to wait undefinitivelly the answer of the server this bug become very important because it's block all the time our application :( do you think i can take a newer version than the FB 26230 ? today a little different,
i have on the server Thu Mar 24 11:07:35 2011 INET/inet_error: read errno = 10054 and on the client Unable to complete network request to host "xxx". Error reading data from the connection. Unsuccessful execution caused by a system error that precludes successful execution of subsequent statements Error Code: 401 BUT i have several other application connecting to the FB server on the same client machine that are still working correctly ! this error (10054 = A connection was forcibly closed by a peer.) look like morea bug than a network problem What the status of this bug ? it's appear in average one time a week but it's very strong because when it's happen it's "froze" the client Application :(
more info (probably it's repeat but can not be bad in case i forget something) : We have one client application that run at full time in background. This client open 2 connections on the database, one for query and one for select/update Every second this client application do a query on the database (select + insert/update). so the connection is never "idle" more than 1 second. but after days of work (1, 2, 7, etc..) then the connection freeze waiting the answer of the server that never arrive. on the server side (when it's freeze), i can only see (in the $MON table) the connection of the EVENT, but not the second connection that do Select/update. The $Mon table say that the connection of the event it's open from the 3/29/2011 in the log of the client application i see that the second connection is frozen from the 04-04-11 14:43:19 in the Server side, in the firebird.log i can see this : FBSERVER Mon Apr 04 14:43:48 2011 INET/inet_error: read errno = 10054 any idea ? did need to update the current version of the FBServer and FB client DLL ? actually client = legacy 2.5 and server = FB 26230 Different solutions are being tested on the customer site who has it more or less reproducible. I will report here as soon as we have something concrete found.
thanks dmitry, as soon you have something, you can come also to me to test it ...
The fix is committed (into v2.5 source tree), please try the next snapshot build
thanks Vlad !
but in the meantime Dmitry send me 3 differents versions to test. I actually test them now, and the 1rt version still have the bug, i m actually testing the second... did i meed to stop my test and use instead the next snapshot build ? stephane The issue should be resolved in the snapshot build, so it's worth trying it now. If it still locks up, you can return to my builds :-)
bad new, unfortunatly it's seam to not be resolved, i just have now a client that is waiting for already 2 hours the answer of the server and by this way enter in a "frozen" state :( on the server side i can see one EVENT connection with the client, but no any other connection (i look in the $Mon table).
on the server i just have something like this at the time the client start to wait the server answer : Mon Apr 25 17:04:25 2011 INET/inet_error: read errno = 10054 FB version 26263 ... any idea ? i confirm, now 24 hours passed the the client is still frozen :(
What do you mean by the "EVENT connection"? Do you set up a separate connection to wait for events and this connection is not used for any client queries?
at the begining, when the client is launch 2 connections are open to the server :
1 Event connection to listen to any database server event (mostly not used, not too much event are raised) ; 1 Query connection to select, update, etc... ; here we see that the query connection is lost (not present anymore in the $Mon table) BUT the Event connection is still show in the $mon table and still aleave ! this mostly show that the probleme look like not a "hardware network" probleme. it's not really hard to reproduce this bug (on our database), it's appear after 1/2 days of running if you want i can make a dump of the client to know what he is waiting for ? I'm not sure the dump would tell us something useful, but if it's possible for you then please proceed and send it to me by email (or make available for download and send me the link).
BTW, have you looked what netstat reports on both endpoints of the connection? Are the network connections alive or not? I make a little mistake, when i try to launch the debuger to make the dump i close the client by mistake :(
doesn't matter, i run it now again and wait for the next mistake and i will see the netstat at this time Hello everyone,
Now i have the confirmation that this bug is still present :( FB version 26263 One of my client application if waiting the serverr answer for now 2 days. this time i setup more log in my client application and i can say that * The client open 2 connections, one for event and one for queries * the client work now for several days (between 1 and 7) without any problem * and The client Start a new transaction (OK) * The client call just after DSQLExecuteImmediate <= here the problem is, server never answer, client wait for unlimited time the exact SQL Query was [05-07-11 18:47:42][ThreadId: 2136] Insert Into transac_comment (id_transac, Comment_Type, Language, Comment) Values ('VBOQ-T1885',3,'FRA','Ce bel appartement de 2 chambres se trouve sur la première Étage d''un immeuble bien entretenu avec un total de 3 unités d''habitation et la maison idéale pour une seule nouvelle ou une belle paire de tous les âges. <br /><br /> L''appartement ensoleillé est réparti sur 65 mètres carrés et fonctionne grâce à la nouvelle de nombreux ...') NB: this SQL is valid, just to show to you in case on the server side i can see in the $Mon table the Event connection, but not the Queries connection doing netstat on the client computer show this : Active Connections Proto Local Address Foreign Address State PID [WinRETranslator.exe] TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132 [WinRETranslator.exe] TCP 94.210.76.118:49166 94.210.76.120:49261 ESTABLISHED 2132 [WinRETranslator.exe] TCP 94.210.76.118:49167 94.210.76.120:3050 ESTABLISHED 2132 [WinRETranslator.exe] TCP 94.210.76.118:50791 94.210.76.120:3050 ESTABLISHED 3588 NB: i remove here the connection not related to WinRETranslator.exe (pur client application) in this report doint netstat on the server , i can found only thes row connected to our client application: Active Connections Proto Local Address Foreign Address State TCP 94.210.76.120:49261 94.210.76.118:49166 ESTABLISHED TCP 94.210.76.120:3050 94.210.76.118:49167 ESTABLISHED TCP 94.210.76.120:3050 94.210.76.118:50791 ESTABLISHED so i can not found on the server the line for TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132 thanks for you help ! sorry i make a litte error in pasting the netstat result
doing netstat on the client computer show this : Active Connections Proto Local Address Foreign Address State PID TCP 94.210.76.118:49164 94.210.76.120:3050 ESTABLISHED 2132 TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132 TCP 94.210.76.118:49166 94.210.76.120:49261 ESTABLISHED 2132 TCP 94.210.76.118:49167 94.210.76.120:3050 ESTABLISHED 2132 NB: i remove here the connection not related to WinRETranslator.exe (pur client application) in this report doint netstat on the server , i can found these row connected to our client application: Active Connections Proto Local Address Foreign Address State TCP 94.210.76.120:3050 94.210.76.118:49164 ESTABLISHED TCP 94.210.76.120:49261 94.210.76.118:49166 ESTABLISHED TCP 94.210.76.120:3050 94.210.76.118:49167 ESTABLISHED so i can not found on the server the line for TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132 stephane Stephane, please explain :
- you said "The client open 2 connections, one for event and one for queries". It means client should have 3 sockets opened : 2 with remote port 3050 for database connection and one with different port - for events processing - but you show 4 sokets opened by client : TCP 94.210.76.118:49164 94.210.76.120:3050 ESTABLISHED 2132 TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132 TCP 94.210.76.118:49166 94.210.76.120:49261 ESTABLISHED 2132 TCP 94.210.76.118:49167 94.210.76.120:3050 ESTABLISHED 2132 and 3 of them is for regular database connection. hmm, i look the code and it's possible that more than 2 connections are open for the queries. in fact another connection can be open for "log" that simply store in a table all the queries executed by the first connection.
ok, 5 days passed and the client is still waiting the server answer :(
net stat still show this in the client : Proto Local Address Foreign Address State PID TCP 94.210.76.118:49164 94.210.76.120:3050 ESTABLISHED 2132 TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132 TCP 94.210.76.118:49166 94.210.76.120:49261 ESTABLISHED 2132 TCP 94.210.76.118:49167 94.210.76.120:3050 ESTABLISHED 2132 and in the server still the same Proto Local Address Foreign Address State TCP 94.210.76.120:3050 94.210.76.118:49164 ESTABLISHED TCP 94.210.76.120:49261 94.210.76.118:49166 ESTABLISHED TCP 94.210.76.120:3050 94.210.76.118:49167 ESTABLISHED so it's really strange that the client show TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132 as established but on the server side their is nothing about it :( i was waiting few days hopping a time out or something like this will close the connection but no :( what about the memory dump for the client app? Also, does number of fb_inet_server processes correspond to select count(*) from mon$attachments?
dammed ... i just close today the client app before reading your message :(
but i thing i have another client application that is in the same situation, i will try to see if i can do a memory dump of it > Also, does number of fb_inet_server processes correspond to select count(*) from mon$attachments? i m in Firebird super classic, how to see that ? Ah, sorry, I thought you were using Classic.
ok, i have another client application that is frozen for 8 days too in the same situation.
so i just create a dump, and put it in the same server i send you the last time in private message. if you don't find it, i can send the login password again to you.. thanks ! Actually i m testing the new version of FBClient, and it's look fine for now (but need one or two week to know for sure)
but i just want to say also that 4 days ago, i was force to hardly kill some clients (because of server overload). i just today see that their connection is still present in the $MON table but the client is dead for several days now ! Some of this connection have even an active status... in the $MON table i can see 221 connections, and doing NetStat i can see around 286 established connections So just to say that the trouble seam to not be only in the fbclient.dll side but also in the server side ... The new version of FBclient that dmitry send me look like it's correct the bug !
from the time i install them (more than 2 weeks ago now) i don't have any more any frozen client ! congratulation dmitry and many thanks ! i thing we can close this bug as resolved now ! also when the modification will be commited in the daily snapshot ? thanks again stéphane Thanks for the good news. The fix will be committed today and thus will be available in the tomorrow's snapshot build.
While fixing
one thing, is the Gbak use the Fbclient.dll or it's own connection protocol ? because i also have one backup that is frozen for several days ... but i guess that gbak not use fbclient.dll but that the bug in gbak is simillar to the bug in the fbclient.dll ?
GBAK is a regular client application, i.e. it uses fbclient.dll. Perhaps this is a different issue.
sorry for late answer, was in travel..
hmm, that not sound good because it's pretty like the same issue :( on the client where gtat run net stat show : TCP 94.210.76.118:53154 94.210.76.120:3050 ESTABLISHED [gbak.exe] on the server nada :( i can not find any connection on the port 53154 :( in the $Mon table also, nothing about the gbak the gbak is frozen from the 24 may ... when it froze it's was exactly at this state : gbak:25900000 records written gbak:25920000 records written i just make a dump of the gbak and put it in the ftp in case you need... thanks ! another remark:
on the firebird server i can see 4 EVENT connections alive that was open by client that are gone serveral week ago ! i see theses connections in the $Mon table, i don't know how to recognise them in the net stat table These connections was not open with client that have the new FBClient.dll (don't know if matter, because here it's on the server side) the server version is still the 26230 hello,
I have now often the GBAK that freeze (one time a week). but except the GBAK i have much much (much) lower connection frozen bug than before !! do you thing that using the fb release canditate 1 can resolve this issue ? i still use the the server version is 2.5.1.26263 gbak is still connected now (from several days) if you want i can also do a memory dump (or something else?) thanks ! It's worth trying v2.5.1 RC1 regardless. A memory dump (ideally, for both frozen GBAK and the server it's supposedly connected to) is appreciated, as well as the network stats (also, for both server and client) for the moment.
ok, in more details :
on the Gbak computer the netstat show : TCP 93.122.12.118:52939 93.122.12.120:3050 ESTABLISHED [GBAK.EXE] on the firebird server computer ... nothing ! the gbak client is not in the $MON table nor in the netstat ! so it's simply look like that GBAK don't detect the deconnection from the server and stay connected forever :( i put the copy of the Gbak Dump file n the ftp server with i send you the credential by email the last time ... thanks ! |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||