"AV - The code attempted to access a virtual address without privilege to do so" using aggregate distinct [CORE2900] #3284
Comments
Commented by: @AlexPeshkoff Is 98 a Windows or a Linux error code?
Commented by: @hvlad Alex, I believe this is a Linux error code.
Commented by: @AlexPeshkoff Address in use. This is caused by a hung SuperServer combined with a peculiar Linux feature: the kernel keeps a socket open for some time after the end of a process that did not close it explicitly. Please set BugcheckAbort=1; this will most likely let you get a core file for the crash.
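The "address in use" situation can be illustrated at the socket level. On Linux, errno 98 is EADDRINUSE, which bind() returns while the requested address and port are still held. The C sketch below is not Firebird's INET code: the helpers open_listener() and listener_port() are hypothetical, and it assumes a POSIX/Linux system. It sets SO_REUSEADDR before bind(), which lets a restarted listener rebind a port whose old connections only linger in TIME_WAIT, but bind() still fails with EADDRINUSE while another socket is actively listening on that port, as a hung server process would be.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper (not Firebird's INET code): opens a TCP listener on
   127.0.0.1:port. SO_REUSEADDR is set before bind() so that a restarted
   server can rebind a port whose old connections linger in TIME_WAIT.
   Returns the socket fd, or -1 with errno preserved from the failing call
   (EADDRINUSE, which is 98 on Linux, if the port is still actively held). */
int open_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        int saved = errno; /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: returns the local port a socket is bound to, or -1. */
int listener_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

Passing port 0 lets the kernel choose a free port, which listener_port() then reports. A second open_listener() on that same port, while the first socket is still listening, fails with errno set to EADDRINUSE; SO_REUSEADDR only helps once the old process has really released the port.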
Commented by: Veselin Pavlov (pavlov_v) These are the server logs:
Windows:
NEXT (Client) Mon Feb 22 15:48:19 2010
Linux:
xxxxxx (Client) Tue Mar 2 18:07:57 2010
xxxxxx (Client) Tue Mar 2 18:07:57 2010
xxxxxx (Server) Tue Mar 2 20:35:13 2010
Let me explain why we are using a 2.1.4 snapshot and not 2.1.3: with 2.1.3 and earlier we experience a major memory leak. After 20 hours of work the server crashes because of the huge amount of memory used. Our investigation found that the cause is the temporary (or not exactly temporary) blobs generated by the LIST() function; we have a lot of them. We also use a lot of substrings. With 2.1.4 the memory leak is gone, and for a couple of weeks everything worked fine. Then the error 'Can not read data from the connection' suddenly began occurring. Thank you.
Commented by: @hvlad Please follow the instructions for Windows and for Linux, and provide us with a crash dump or core dump to analyze.
Commented by: Veselin Pavlov (pavlov_v) I am attaching the core file and the crash dump from Dr Watson.
Modified by: Veselin Pavlov (pavlov_v) Attachment: drWatson.zip [ 11593 ]
Commented by: @hvlad Veselin, the call stack is:
> fbserver.exe!compute_agg_distinct(Jrd::thread_db * tdbb=0x00000000, Jrd::jrd_nod * node=0x08d3e1d8) Line 3072 C++
The AV is at CVT_get_int64() at line
You have some stored procedure that computes LIST(DISTINCT ...). Perhaps you can tell us something about it. BTW, I see a custom UDF library, FAUfile.DLL. Are you sure it is 100% correct? Is it used in the expression mentioned above?
Commented by: Veselin Pavlov (pavlov_v) It took more time for the problem to occur. I couldn't reproduce it reliably.
Modified by: Veselin Pavlov (pavlov_v) Attachment: drWatson2.zip [ 11595 ]
Commented by: Veselin Pavlov (pavlov_v) ... and one more.
Modified by: Veselin Pavlov (pavlov_v) Attachment: drWatson3.zip [ 11596 ]
Commented by: @hvlad I've asked for a FULL dump. I can't say anything new from your second dump :(
Commented by: Veselin Pavlov (pavlov_v) Sorry, we copied the bin directory from the PDB version of Firebird over the standard one, and I thought that this was enough. In our database there are a lot of 'LIST(DISTINCT )' calls, both in stored procedures and in dynamic SQL. We have 2 custom UDFs, but they are totally out of scope. There is a general setting that deactivates them, and I guarantee that they are not invoked at all.
Commented by: @hvlad Run drwtsn32 without parameters. When I spoke about a "numeric expression", I meant any expression (even just a field reference) of a numeric type (integer, numeric, decimal, double precision, etc.). Could you try to run such "suspect" queries in the hope of catching the AV? Of course, using a copy of the database on a non-production server.
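The comments above tie the AV to a DISTINCT aggregate (compute_agg_distinct) over a numeric expression (CVT_get_int64). To make that concrete: LIST(DISTINCT expr) collects the distinct values of expr into one comma-separated string, and a common way to compute a DISTINCT aggregate is to sort the values and skip adjacent duplicates. The C sketch below shows that sort-and-dedupe technique for a 64-bit integer expression only. It is not Firebird's compute_agg_distinct() code, the function name list_distinct_int64() is hypothetical, and the ascending output order is a property of this sketch alone, since LIST does not guarantee any order.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* qsort comparator for int64_t; avoids overflow from subtracting values. */
static int cmp_int64(const void *a, const void *b)
{
    int64_t x = *(const int64_t *)a;
    int64_t y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical sketch of LIST(DISTINCT expr) for an integer expression:
   sorts vals in place, skips adjacent duplicates and writes the remaining
   values to out separated by commas. Returns the number of distinct values,
   or -1 if out is too small to hold the whole list. */
int list_distinct_int64(int64_t *vals, size_t n, char *out, size_t out_size)
{
    size_t used = 0;
    int count = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';
    if (n > 0)
        qsort(vals, n, sizeof vals[0], cmp_int64);

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && vals[i] == vals[i - 1])
            continue; /* equal to the previous value after sorting: not distinct */
        int w = snprintf(out + used, out_size - used, "%s%" PRId64,
                         count > 0 ? "," : "", vals[i]);
        if (w < 0 || (size_t)w >= out_size - used)
            return -1; /* output would be truncated */
        used += (size_t)w;
        count++;
    }
    return count;
}
```

For example, the input {3, 1, 3, 2, 1} yields 3 and the string "1,2,3".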
Commented by: Veselin Pavlov (pavlov_v) I hope these log files will help. (We suspect that the problem also affects FB 2.1.3. We have error logs but no crash dump from 2.1.3.) The log file is huge; the archive is larger than 10 MB.
Modified by: Veselin Pavlov (pavlov_v) Attachment: link.txt [ 11597 ]
Commented by: @hvlad Thanks, it's much better now. The failed query is:
select case
where 0=0 and (((T.ID_TASK_CLASS in (1,3))))
and the procedure that failed is TASK_GET_INFO. Could you try to reproduce the crash with this query?
Modified by: Veselin Pavlov (pavlov_v) Attachment: TASK_GET_INFO.sp [ 11598 ]
Commented by: Veselin Pavlov (pavlov_v) Reproducing the crash is a very hard task. Sometimes our clients work for a whole week before the crash happens; sometimes it happens a couple of times a day. 1. Set i=1000; execution crashed at around 600 because of exhausted memory. I reproduced the crash 3 times, and before or during all of them the database was running other 'general' queries. I will continue testing... some hints would be helpful.
Modified by: Veselin Pavlov (pavlov_v) Version: 2.1.3 [ 10302 ] summary: Can not read data from the connection. In the server log there is "INET/inet_error: bind errno = 98" => AV - The code attempted to access a virtual address without privilege to do so.
Commented by: @hvlad Veselin, have you made any progress with testing?
Modified by: Veselin Pavlov (pavlov_v) Version: 2.5 RC2 [ 10372 ]
Commented by: Veselin Pavlov (pavlov_v) Not really. Two of our clients suffer a lot from this problem. We can simulate the problem, but not very predictably. I can produce more crash dump files if that would help. Here is a part of the firebird.log from 2.5 showing the AV:
NEXT (Server) Wed May 12 10:18:41 2010
NEXT (Server) Wed May 12 10:18:41 2010
NEXT (Server) Wed May 12 10:18:41 2010
NEXT (Server) Wed May 12 10:18:41 2010
NEXT (Server) Wed May 12 10:18:41 2010
NEXT (Client) Wed May 12 10:18:42 2010
NEXT (Client) Wed May 12 10:18:43 2010
Commented by: Neil Pickles (npickles) I am seeing something very similar to this. We have a client running 3 or 4 Windows XP PCs in a network, with Firebird SS v2.1.3.18185 on the server using the Guardian. All machines at all sites run the same application software (in this case our own Epos System software) and the same version of Firebird, and I have made sure that they are all using the correct version of GDS32.DLL. We recently upgraded the client from Firebird v1.5 because we were experiencing many issues with it, and the advice seemed to be to upgrade to v2.1. All databases have been backed up and restored to ODS v11.1, and most of the time Firebird v2.1 is proving much more robust, faster and more reliable than v1.5. However, everything can be running along quite happily, then all of a sudden, on a seemingly random basis, the clients can't access the Firebird database any more. A quick reset of Firebird on the server sorts it out. This can happen several times a day at some sites; most of their 100 sites don't see any issues at all, or experience it only once or twice a month. In a sample of the Firebird logs I can see the following:
SVRA5505 (Server) Tue Jun 08 17:15:41 2010
SVRA5505 (Client) Tue Jun 08 17:15:41 2010
SVRA5505 (Client) Tue Jun 08 17:15:41 2010
SVRA5505 (Client) Tue Jun 08 17:15:41 2010
I am also seeing a lot of the 10054 and 10061 errors. Are these indicative of some other network problem, or is it the Firebird server process going down that causes them?
It is a bit unusual for me to see the Access Violation error message as above; more often than not I only get this in the fbserver log:
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:01 2010
SVRA0000 (Client) Sun May 23 13:09:03 2010
SVRA0000 (Client) Sun May 23 13:19:00 2010
What additional info do you need from me to investigate this further, or do you already know what the problem is from the original reporter of the issue?
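The numeric codes in this thread come from two different sources: 98 is a Linux errno value (EADDRINUSE, address already in use), while 10054 and 10061 are Winsock codes (WSAECONNRESET, connection reset by peer, and WSAECONNREFUSED, connection refused). A refused connection typically means nothing is listening on the target port, and a reset means the peer aborted an established connection, so both are consistent with, but do not prove, the server process going down. The lookup below is a hypothetical helper for annotating such logs; the name lookup_net_error() is an assumption, the codes are hard-coded as literals so it compiles on any platform, and the descriptions are paraphrases rather than the exact text the operating systems print.

```c
#include <stddef.h>

/* Hypothetical table of the network error codes mentioned in this thread.
   Values are hard-coded so the table builds on any platform:
   98 is the Linux errno value, 10054 and 10061 are Winsock codes. */
struct net_error {
    int code;
    const char *name;
    const char *meaning;
};

static const struct net_error known_errors[] = {
    {98,    "EADDRINUSE",      "address already in use (Linux errno)"},
    {10054, "WSAECONNRESET",   "connection reset by peer (Winsock)"},
    {10061, "WSAECONNREFUSED", "connection refused (Winsock)"},
};

/* Returns the table entry for code, or NULL if the code is not listed. */
const struct net_error *lookup_net_error(int code)
{
    size_t n = sizeof known_errors / sizeof known_errors[0];
    for (size_t i = 0; i < n; i++) {
        if (known_errors[i].code == code)
            return &known_errors[i];
    }
    return NULL;
}
```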
Commented by: @hvlad Please provide a crash dump.
Commented by: @asfernandes Veselin, what are TASK_SOURCE and TASK_SUPPLY? Are they tables, views or procedures? If they're not tables, please show us their source.
Commented by: Veselin Pavlov (pavlov_v) TASK_SOURCE and TASK_SUPPLY are tables.
CREATE TABLE TASK_SUPPLY (
Commented by: Neil Pickles (npickles) After waiting a week for the issue to recur at one of their sites that was falling over several times a day, I finally got a system to fall over, after I had installed the debug version of Firebird and configured Dr Watson at 3 other sites of theirs. I have attached a 7z file, leedscrashdump.7z, containing the crash dump file and the Dr Watson log file from a failure this morning. I had to use 7-Zip to get it below the 10 MB upload limit.
Modified by: Neil Pickles (npickles) Attachment: leedscrashdump.7z [ 11649 ]
Commented by: @asfernandes Please test the fix.
Modified by: @asfernandes Version: 2.5 RC1 [ 10362 ] Version: 3.0 Initial [ 10301 ] Version: 2.5 Beta 2 [ 10300 ] Version: 2.5 Beta 1 [ 10251 ] Version: 2.1.2 [ 10270 ] Version: 2.0.5 [ 10222 ] Version: 2.1.1 [ 10223 ] Version: 2.5 Alpha 1 [ 10224 ] Version: 2.0.4 [ 10211 ] Version: 2.1.0 [ 10041 ] Version: 2.0.3 [ 10200 ] Version: 2.0.2 [ 10130 ] Version: 2.0.1 [ 10090 ] Version: 2.0.0 [ 10091 ] Version: 2.1.4 [ 10361 ] =>
Modified by: @pcisar status: Resolved [ 5 ] => Closed [ 6 ]
Modified by: @pavel-zotov QA Status: No test
Submitted by: Veselin Pavlov (pavlov_v)
Attachments:
drWatson2.zip
drWatson.zip
drWatson3.zip
link.txt
TASK_GET_INFO.sp
leedscrashdump.7z
Votes: 2
After several hours of normal operation, some client machines receive the error "Unable to complete network request to host "xxx". Error reading data from the connection." Other clients continue working normally. Looking in firebird.log, I found many instances of "INET/inet_error: bind errno = 98".
After a Firebird service restart the problem goes away, but after a couple of hours it is back again.
Commits: 701e14c b9c41df f70830b