Key: CORE-2900
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Adriano dos Santos Fernandes
Reporter: Veselin Pavlov
Votes: 2
Watchers: 4
Firebird Core

"AV - The code attempted to access a virtual address without privilege to do so" using aggregate distinct

Created: 03/Mar/10 11:37 AM   Updated: 04/Feb/11 11:53 AM
Component/s: Engine
Affects Version/s: 2.0.0, 2.0.1, 2.0.2, 2.0.3, 2.1.0, 2.0.4, 2.5 Alpha 1, 2.1.1, 2.0.5, 2.1.2, 2.5 Beta 1, 2.5 Beta 2, 2.1.3, 3.0 Initial, 2.5 RC1, 2.5 RC2
Fix Version/s: 2.5 RC3, 2.1.4, 2.0.7, 3.0 Alpha 1

Time Tracking:
Not Specified

File Attachments: 1. Zip Archive drWatson.zip (34 kB)
2. Zip Archive drWatson2.zip (38 kB)
3. Zip Archive drWatson3.zip (52 kB)
4. File leedscrashdump.7z (7.67 MB)
5. Text File link.txt (0.0 kB)
6. File TASK_GET_INFO.sp (2 kB)

Environment: using binary snapshot build of 2.1.4. Tested on Windows 2003 server and Linux-Gentoo.

Planning Status: Unspecified


Description
After several hours of normal operation, some of the client machines receive the error "Unable to complete network request to host "xxx". Error reading data from the connection." Other clients continue working normally. Looking in firebird.log, I found a lot of "INET/inet_error: bind errno = 98" entries.
After restarting the Firebird service the problem goes away, but after a couple of hours it comes back.

Alexander Peshkov added a comment - 03/Mar/10 12:03 PM
Is 98 a Windows or a Linux error code?

Vlad Khorsun added a comment - 03/Mar/10 10:15 PM
Alex, I believe this is a Linux error code.

Alexander Peshkov added a comment - 04/Mar/10 11:30 AM
Address in use: this is caused by a hung superserver combined with a quirk of Linux, which keeps a socket open for some time after the process that owned it has ended without closing it explicitly. Please set BugcheckAbort=1; this will most likely let you get a core file for the crash.
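
For reference, the socket error numbers quoted in the logs of this ticket can be told apart by range: Winsock codes start at 10000, so a value such as 98 can only be a Unix errno. The lookup below is a minimal sketch, not Firebird code; the function name `describe_socket_errno` is made up for illustration, and the Linux values assume the numbering used on x86 (some other architectures number errno differently).

```cpp
#include <string>

// Hypothetical helper, not part of Firebird: names the socket error
// codes that appear in the firebird.log excerpts in this ticket.
// Linux values follow the x86 errno numbering; Windows values are
// Winsock (WSA*) codes, which start at 10000.
std::string describe_socket_errno(int code)
{
    switch (code)
    {
        case 98:    return "EADDRINUSE (Linux): address already in use";
        case 104:   return "ECONNRESET (Linux): connection reset by peer";
        case 10054: return "WSAECONNRESET (Windows): connection reset by peer";
        case 10061: return "WSAECONNREFUSED (Windows): connection refused";
        case 10093: return "WSANOTINITIALISED (Windows): Winsock not initialised";
        default:
            // Anything else is only classified by range.
            return code >= 10000 ? "unlisted Winsock code" : "unlisted errno value";
    }
}
```

Read this way, the reset and refused codes are what the surviving side of a connection sees when the server process disappears, so they are consistent with being a consequence of the crash rather than a separate network fault.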

Veselin Pavlov added a comment - 08/Mar/10 07:54 AM - edited
These are the server logs:

Windows
---------------
NEXT (Server) Mon Feb 22 15:48:19 2010
     Access violation.
        The code attempted to access a virtual
        address without privilege to do so.
    This exception will cause the Firebird server
    to terminate abnormally.

NEXT (Client) Mon Feb 22 15:48:19 2010
    "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe": terminated abnormally (4294967295)


Linux
-----------------
xxxxxx (Server) Tue Mar 2 17:00:22 2010
        INET/inet_error: read errno = 104

xxxxxx (Client) Tue Mar 2 18:07:57 2010
        /opt/fb21ss/bin/fbguard: /opt/fb21ss/bin/fbserver terminated abnormally (-1)

xxxxxx (Client) Tue Mar 2 18:07:57 2010
        /opt/fb21ss/bin/fbguard: guardian starting bin/fbserver

xxxxxx (Server) Tue Mar 2 20:35:13 2010
        INET/inet_error: read errno = 104


Let me explain why we are using a 2.1.4 snapshot and not 2.1.3: with 2.1.3 and earlier we experience a major memory leak. After 20 hours of work the server crashes because of the huge amount of memory used. After investigating, we found that the cause is the temporary (or not exactly temporary) blobs generated by the list() operator; we have a lot of them. We also have a lot of substrings. With 2.1.4 the memory leak is gone, and for a couple of weeks everything worked fine. Then the error 'Can not read data from the connection' suddenly began occurring.
What can we do to investigate further?

Thank you.

Vlad Khorsun added a comment - 08/Mar/10 10:14 AM

Veselin Pavlov added a comment - 10/Mar/10 07:34 AM
I am attaching the core and crash dump from drWatson

Vlad Khorsun added a comment - 10/Mar/10 09:30 AM
Veselin,
could you create a *full* crash dump? You provided a minidump, and I can't see some important data needed to better understand the issue.

The call stack is :

  fbserver.exe!CVT_get_int64(const dsc * desc=0x08d99c60, short scale=0x0000, void (int, <no type>)* err=0x0042a350) Line 968 C++
  fbserver.exe!CVT_move(const dsc * from=0x0bb30008, dsc * to=0x08acf0dc, void (int, <no type>)* err=0x0042a350) Line 1694 + 0x11 bytes C++
  fbserver.exe!integer_to_text(const dsc * from=0x00000000, dsc * to=0x08acf178, void (int, <no type>)* err=0x00000000) Line 2261 C++
  fbserver.exe!CVT_move(const dsc * from=0x0bb30008, dsc * to=0x08acf178, void (int, <no type>)* err=0x0042a350) Line 1634 + 0xc bytes C++
  fbserver.exe!CVT2_make_string2(const dsc * desc=0x08d99c60, unsigned short to_interp=0x0002, unsigned char * * address=0x08acf228, Firebird::HalfStaticArray<unsigned char,256> & temp={...}, void (int, <no type>)* err=0x0042a350) Line 713 C++
  fbserver.exe!MOV_make_string2(Jrd::thread_db * tdbb=0x08acf9a4, const dsc * desc=0x08d99c60, unsigned short ttype=0x0002, unsigned char * * address=0x08acf228, Firebird::HalfStaticArray<unsigned char,256> & buffer={...}, bool limit=false) Line 584 + 0x12 bytes C++
> fbserver.exe!compute_agg_distinct(Jrd::thread_db * tdbb=0x00000000, Jrd::jrd_nod * node=0x08d3e1d8) Line 3072 C++
  fbserver.exe!EVL_group(Jrd::thread_db * tdbb=0x08acf9a4, Jrd::RecordSource * rsb=0x08d99bd4, Jrd::jrd_nod * const node=0x08d3d2d0, unsigned short state=0x0002) Line 1792 C++
  fbserver.exe!get_record(Jrd::thread_db * tdbb=0x08acf9a4, Jrd::RecordSource * rsb=0x08d98dac, Jrd::RecordSource * parent_rsb=0x00000000, Jrd::rse_get_mode mode=RSE_get_forward) Line 2450 + 0x17 bytes C++
  fbserver.exe!RSE_get_record(Jrd::thread_db * tdbb=0x08acf9a4, Jrd::RecordSource * rsb=0x08d98dac, Jrd::rse_get_mode mode=RSE_get_forward) Line 316 + 0x29 bytes C++
  fbserver.exe!looper(Jrd::thread_db * tdbb=0x08acf9a4, Jrd::jrd_req * request=0x08d955f0, Jrd::jrd_nod * in_node=0x00ceb8dc) Line 1967 + 0xf bytes C++
  fbserver.exe!execute_looper(Jrd::thread_db * tdbb=0x00000000, Jrd::jrd_req * request=0x00000000, Jrd::jrd_tra * transaction=0x0aec92a4, Jrd::jrd_req::req_s next_state=req_sync) Line 1461 + 0x1f bytes C++
  fbserver.exe!EXE_receive(Jrd::thread_db * tdbb=0x08acf9a4, Jrd::jrd_req * request=0x08d955f0, unsigned short msg=0x0001, unsigned short length=0x04da, unsigned char * buffer=0x0acf1884, bool top_level=false) Line 749 + 0xc bytes C++
  fbserver.exe!get_procedure(Jrd::thread_db * tdbb=0x08acf9a4, Jrd::RecordSource * rsb=0x0acf89a4, Jrd::irsb_procedure * impure=0x0acfc86c, Jrd::record_param * rpb=0x0acf0ea0) Line 1802 C++
  fbserver.exe!get_record(Jrd::thread_db * tdbb=0x08acf9a4, Jrd::RecordSource * rsb=0x0acf89a4, Jrd::RecordSource * parent_rsb=0x0acf8950, Jrd::rse_get_mode mode=RSE_get_forward) Line 2344 + 0xd bytes C++
  fbserver.exe!get_record(Jrd::thread_db * tdbb=0x08acf900, Jrd::RecordSource * rsb=0x0acf8950, Jrd::RecordSource * parent_rsb=0x00000000, Jrd::rse_get_mode mode=RSE_get_forward) Line 2219 + 0x1a bytes C++
  fbserver.exe!fetch_left(Jrd::thread_db * tdbb=0x00000000, Jrd::RecordSource * rsb=0x00000000, Jrd::irsb * impure=0x00000000) Line 1033 + 0xe bytes C++
  fbserver.exe!get_record(Jrd::thread_db * tdbb=0x08acf9a4, Jrd::RecordSource * rsb=0x0acf9658, Jrd::RecordSource * parent_rsb=0x00000000, Jrd::rse_get_mode mode=RSE_get_forward) Line 2481 + 0xb bytes C++
  fbserver.exe!fetch_left(Jrd::thread_db * tdbb=0x00000000, Jrd::RecordSource * rsb=0x00000000, Jrd::irsb * impure=0x00000000) Line 1006 + 0xe bytes C++
  fbserver.exe!get_record(Jrd::thread_db * tdbb=0x08acf9a4, Jrd::RecordSource * rsb=0x0acfd554, Jrd::RecordSource * parent_rsb=0x00000000, Jrd::rse_get_mode mode=RSE_get_forward) Line 2481 + 0xb bytes C++
  fbserver.exe!fetch_left(Jrd::thread_db * tdbb=0x00000000, Jrd::RecordSource * rsb=0x00000000, Jrd::irsb * impure=0x00000000) Line 1006 + 0xe bytes C++
  fbserver.exe!get_record(Jrd::thread_db * tdbb=0x08acf9a4, Jrd::RecordSource * rsb=0x0acf93f8, Jrd::RecordSource * parent_rsb=0x0acfe210, Jrd::rse_get_mode mode=RSE_get_forward) Line 2481 + 0xb bytes C++
  fbserver.exe!get_record(Jrd::thread_db * tdbb=0x08acf900, Jrd::RecordSource * rsb=0x0acfe210, Jrd::RecordSource * parent_rsb=0x00000000, Jrd::rse_get_mode mode=RSE_get_forward) Line 2219 + 0x1a bytes C++
  fbserver.exe!open_sort(Jrd::thread_db * tdbb=0x08acf9a4, Jrd::RecordSource * rsb=0x0acfe438, Jrd::irsb_sort * impure=0x0acfc8d0, unsigned __int64 max_records=0x0000000000000000) Line 2981 + 0xf bytes C++
  fbserver.exe!RSE_open(Jrd::thread_db * tdbb=0x0acfaa40, Jrd::RecordSource * rsb=0x0acfe438) Line 479 + 0x18 bytes C++
  fbserver.exe!looper(Jrd::thread_db * tdbb=0x08acf9a4, Jrd::jrd_req * request=0x0acfaa40, Jrd::jrd_nod * in_node=0x00d1d120) Line 1955 + 0xd bytes C++
  fbserver.exe!EXE_start(Jrd::thread_db * tdbb=0x08acf9a4, Jrd::jrd_req * request=0x0acfaa40, Jrd::jrd_tra * transaction=0x0aec92a4) Line 1113 + 0xe bytes C++
  fbserver.exe!jrd8_start_request(int * user_status=0x08acfd90, Jrd::jrd_req * * req_handle=0x0a685658, Jrd::jrd_tra * * tra_handle=0x0a686eb4, short level=0x0000) Line 3851 + 0xb bytes C++

The AV is in CVT_get_int64() at the line

case dtype_long:
    value = *((SLONG *) p);   <-- here
    break;

You have some stored procedure which computes LIST(DISTINCT ...).
The argument of LIST is some numeric expression.

Probably you can tell us something about it.

BTW, I see a custom UDF library, FAUfile.DLL. Are you sure it is 100% correct? Is it used in the expression mentioned above?
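
To make the faulting line concrete: the conversion routine reads the value through an address stored in the descriptor, so the read can only be as valid as that address. The following is a simplified sketch, not the engine source; `SketchDesc`, `SketchType` and `sketch_get_int64` are illustrative names, and the real descriptor carries more fields than shown here.

```cpp
#include <cstdint>
#include <cstring>

// Illustrative stand-in for a value descriptor: a type tag plus the
// address where the value currently lives.
enum SketchType { sketch_short, sketch_long, sketch_int64 };

struct SketchDesc
{
    SketchType type;
    const unsigned char* address;
};

// Converts the described value to a 64-bit integer. Nothing here can
// tell whether desc.address still points at live memory.
std::int64_t sketch_get_int64(const SketchDesc& desc)
{
    const unsigned char* p = desc.address;
    switch (desc.type)
    {
        case sketch_short:
        {
            std::int16_t v;
            std::memcpy(&v, p, sizeof v);
            return v;
        }
        case sketch_long:
        {
            std::int32_t v;               // the counterpart of the SLONG read
            std::memcpy(&v, p, sizeof v);
            return v;
        }
        case sketch_int64:
        {
            std::int64_t v;
            std::memcpy(&v, p, sizeof v);
            return v;
        }
    }
    return 0;   // not reached for the three types above
}
```

If `address` points into memory that has been released or reused, the read itself is where the process faults, which is how an access violation can land on a line that looks harmless.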

Veselin Pavlov added a comment - 16/Mar/10 07:38 AM
It took more time for the problem to occur. I couldn't reproduce it effectively.

Veselin Pavlov added a comment - 16/Mar/10 08:23 AM
... and one more

Vlad Khorsun added a comment - 16/Mar/10 08:35 AM
I asked for a FULL dump. I can't say anything new from your second dump :(
Also, you did not answer my questions about the procedure with LIST(DISTINCT ) and about your custom UDF.

Veselin Pavlov added a comment - 16/Mar/10 09:59 AM - edited
Sorry, we copied the bin directory from the pdb version of Firebird over the standard one, and I thought that this was enough.
This is my first experience with collecting crash dumps.

In our database there are a lot of 'LIST(DISTINCT )' calls, in the stored procedures and in dynamic SQL.
There is only one 'LIST(DISTINCT <expression>)':
  select substring(cast(list(distinct iif(DI.ID_FISCAL_INVENT is null, null, S.CODE)) as varchar(1500)) from 1 for 500),
         substring(cast(list(distinct S.CODE) as varchar(1500)) from 1 for 500)
  from LOG_UNSUCCESSFULL_INVENTORY UI
         join INVENT I on I.ID_INVENTORY_PHYSICAL = UI.ID_INVENTORY
         left outer join OTHER_INVENT DI on DI.ID_FISCAL_INVENT = I.ID_INVENT
         left outer join STOCK S on S.ID_STOCK = UI.ID_STOCK

We have 2 custom UDFs, but they are totally out of scope. There is a general setting which deactivates them, and I guarantee that they are not invoked at all.

Vlad Khorsun added a comment - 16/Mar/10 10:34 AM
Run drwtsn32 without parameters.
In its GUI, select the option "Full" in the 'Crash dump type' radio group, then press OK.

When I said "numeric expression" I meant any expression (even just a field reference) with a numeric type (integer, numeric, decimal, double precision, etc.).

Could you try to run such "suspect" queries in the hope of catching the AV? Of course, use a copy of the database on a non-production server.

Veselin Pavlov added a comment - 18/Mar/10 10:04 AM
Hope these log files will help.

(We suspect that the problem affects FB 2.1.3 as well. We have error logs but no crash dump from 2.1.3.)

The log file is huge; the archive is larger than 10 MB.
download it from the following link : http://uploading.com/files/764m12d3/drWatson.zip/

Vlad Khorsun added a comment - 18/Mar/10 12:45 PM
Thanks, it's much better now.

The failed query is

select case
         when RR.STATUS = 2 then SLissue.ACTION_DATETIME
         else SLcreate.ACTION_DATETIME
       end as DT_STATUS,
       coalesce(T.EXEC_DATE_END, T.PLAN_DATE_END) as FINISH_DATE,
       coalesce(T.EXEC_DATE_START, T.PLAN_DATE_START) as START_DATE,
       T.*,
       iif(T.EXEC_DATE_END is null, 1, 0) as FINISH_DATE_IS_PLAN,
       iif(T.EXEC_DATE_START is null, 1, 0) as START_DATE_IS_PLAN,
       T.ID_TASK as ID_DETAIL,
       'head' as ID_DETAIL_CONTEXT,
       S.NAME as TASK_NAME,
       S.CODE as TASK_CODE,
       RR.REV_NO,
       RR.STATUS,
       C.NAME as CONTRAGENT_NAME,
       RRN.STATUS as DRAFT_STATUS,
       ERN.STATUS as EXEC_STATUS,
       TI.*,
       TC.EXEC_TYPE,
       T.ID_TASK as ID_DETAIL
from TASK T
     /*
     join TMP_IDS IDS on IDS.ID = T.ID_TASK and IDS.ID_SESSION = current_transaction
     */
     join STOCK S on S.ID_STOCK = T.ID_STOCK
     join TASK_REVISION RRN on RRN.ID_TASK_REVISION = T.ID_NEXT_REVISION
      
      
      
      
     left outer join TASK_REVISION RR on RR.ID_TASK_REVISION = T.ID_CURRENT_REVISION
     left outer join TASK_GET_EXPENDORDER(T.ID_TASK) TGEO on 1=1
     left outer join EXPEND_ORDER EO on EO.ID_EXPEND_ORDER = TGEO.ID_EXPEND_ORDER
     left outer join CONTRAGENT C on C.ID_CONTRAGENT = T.ID_CONTRAGENT
     left outer join SYSTEM_LOG SLcreate on SLcreate.ID_SYSTEM_LOG = RR.ID_LOG_CREATE
     left outer join SYSTEM_LOG SLissue on SLissue.ID_SYSTEM_LOG = RR.ID_LOG_ISSUE
     left outer join TASK_GET_INFO(T.ID_TASK) TI on 1=1
     left outer join TASK_CLASS TC on TC.ID_TASK_CLASS = T.ID_TASK_CLASS
     left outer join TASK_REVISION ERN on ERN.ID_TASK_REVISION = T.ID_EXEC_REVISION
where 0=0
   
   
   
   
   
   
   
   
   
   
   and (((T.ID_TASK_CLASS in (1,3))))
order by 1 desc


and the procedure which failed is TASK_GET_INFO.


Could you try to reproduce the crash with this query?
It would also be interesting to see the query plan and the procedure source.

Veselin Pavlov added a comment - 19/Mar/10 02:59 PM
Reproducing the crash is a very hard task. Sometimes our clients work for a whole week before the crash happens; sometimes it happens a couple of times a day.
To simulate the crash I created a procedure SYS_CRASH_SQL which runs the QUERY and fetches all records; there are almost 20000 of them.
Next I created another procedure, SYS_CRASH_TEST, which runs the previous one i times.

1. Set i=1000; execution crashed at around 600 because memory was exhausted.
2. Set i=10 and created a script which commits the transaction after each execution, that is, after every 10th execution of the QUERY.
3. Used 2 computers for parallel execution against one database.

I reproduced the crash 3 times. Before or during each of them the database was running other 'general' queries.
For example, the following scenario NEVER reproduced the crash:
    1. restart the server
    2. start the testing script
but this one reproduced it 3 times:
    1. some SQL activity, from the application or from metadata modification
    2. start the testing script

I will continue testing ... some hints would be helpful.

Vlad Khorsun added a comment - 04/May/10 09:39 PM
Veselin,

do you have any progress with testing ?

Veselin Pavlov added a comment - 12/May/10 05:34 PM
Not really,

  Two of our clients suffer a lot from this problem.
  We tried installing FB 2.5 (latest snapshot) but the same problem persists. There is a small difference: with 2.1, when a crash happens it repeats a couple of times almost immediately. With 2.5 it happens only once.
  
  We can simulate the problem, but not very predictably.
  As I explained earlier, the simulation involves a script which executes the "big" SQL from at least 2 machines while something else is being done from our application at the same time.
  
  I can produce more crash dump files if this will help.

Here is a part of the firebird.log from 2.5 showing the AV:

NEXT (Server) Wed May 12 10:18:41 2010
Access violation.
The code attempted to access a virtual
address without privilege to do so.
This exception will cause the Firebird server
to terminate abnormally.


NEXT (Server) Wed May 12 10:18:41 2010
INET/inet_error: read errno = 10093


NEXT (Server) Wed May 12 10:18:41 2010
INET/select_wait: select failed, errno = 10093


NEXT (Server) Wed May 12 10:18:41 2010
INET/inet_error: send errno = 10093


NEXT (Server) Wed May 12 10:18:41 2010
SRVR_multi_thread/RECEIVE: error on main_port, shutting down


NEXT (Client) Wed May 12 10:18:42 2010
"C:\Program Files\Firebird\Firebird_2_5\bin\fbserver.exe": terminated abnormally (4294967295)



NEXT (Client) Wed May 12 10:18:43 2010
Guardian starting: "C:\Program Files\Firebird\Firebird_2_5\bin\fbserver.exe"

Neil Pickles added a comment - 10/Jun/10 03:14 PM
I am seeing something very similar to this. We have a client running 3 or 4 Windows XP PC's in a network, with Firebird SS v2.1.3.18185 on the server using the Guardian. All machines at all sites are running the same application software, in this case it is our own Epos System software, and the same version of Firebird, I have made sure that they are all using the correct version of GDS32.DLL

We recently upgraded the client from Firebird v1.5 because we were experiencing many issues with it and the advice seemed to be to upgrade to v2.1. All databases have been backed up and restored to ODS v11.1 and most of the time Firebird v2.1 is proving much more robust, faster and more reliable than v1.5.

However, everything can be running along quite happily, then all of a sudden, on a seemingly random basis, the clients can't access the Firebird database any more. A quick reset of Firebird on the server sorts it out.

This can happen several times a day at some sites; most of their 100 sites don't see any issues at all, or only experience it once or twice a month.

In a sample of the Firebird logs I can see the following:-

SVRA5505 (Server) Tue Jun 08 17:15:41 2010
Access violation.
The code attempted to access a virtual
address without privilege to do so.
This exception will cause the Firebird server
to terminate abnormally.

SVRA5505 (Client) Tue Jun 08 17:15:41 2010
"C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe": terminated abnormally (4294967295)

SVRA5505 (Client) Tue Jun 08 17:15:41 2010
INET/inet_error: read errno = 10054

SVRA5505 (Client) Tue Jun 08 17:15:41 2010
INET/inet_error: read errno = 10054

I am also seeing a lot of the 10054 & 10061 errors. Are these indicative of some other network problem or is it the firebird server process going down that causes them ?

It is a bit unusual for me to see the Access Violation error message as above, more often than not I only get this in the fbserver log:-

SVRA0000 (Client) Sun May 23 13:09:01 2010
"C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe": terminated abnormally (4294967295)

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: read errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: read errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: read errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
REMOTE INTERFACE/gds__detach: Unsuccesful detach from database.
Uncommitted work may have been lost

SVRA0000 (Client) Sun May 23 13:09:01 2010
REMOTE INTERFACE/gds__detach: Unsuccesful detach from database.
Uncommitted work may have been lost

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: send errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: read errno = 10054

SVRA0000 (Client) Sun May 23 13:09:01 2010
INET/inet_error: read errno = 10054

SVRA0000 (Client) Sun May 23 13:09:03 2010
Guardian starting: "C:\Program Files\Firebird\Firebird_2_1\bin\fbserver.exe"

SVRA0000 (Client) Sun May 23 13:19:00 2010
INET/inet_error: read errno = 10054

What additional info do you need from me to further investigate this or do you already know what the problem is from the original reporter of the issue ?

Vlad Khorsun added a comment - 11/Jun/10 07:52 AM
Provide crash dump, please

Adriano dos Santos Fernandes added a comment - 12/Jun/10 10:03 PM
Veselin, what are TASK_SOURCE and TASK_SUPPLY? Are they tables, views or procedures?

If they're not tables, please show us their source.

Veselin Pavlov added a comment - 16/Jun/10 06:43 AM
TASK_SOURCE and TASK_SUPPLY are tables.
CREATE TABLE TASK_SOURCE (
    ID_TASK_SOURCE ID_TYPE NOT NULL /* ID_TYPE = INTEGER */,
    ID_TASK ID_TYPE NOT NULL /* ID_TYPE = INTEGER */,
    ID_STOCK ID_TYPE NOT NULL /* ID_TYPE = INTEGER */,
    QTY TQUANTITY NOT NULL /* TQUANTITY = NUMERIC(15,4) */,
    MEASURE TSTOCK_MEASSURE NOT NULL /* TSTOCK_MEASSURE = VARCHAR(10) */,
    ORDER_TAG INT_VALUE /* INT_VALUE = SMALLINT */,
    ORDER_TAG2 INT_VALUE /* INT_VALUE = SMALLINT */,
    ID_EXPEND_DETAIL ID_TYPE /* ID_TYPE = INTEGER */,
    ID_LIMIT_CARD ID_TYPE /* ID_TYPE = INTEGER */
);

CREATE TABLE TASK_SUPPLY (
    ID_TASK ID_TYPE NOT NULL /* ID_TYPE = INTEGER */,
    ID_SUPPLY ID_TYPE NOT NULL /* ID_TYPE = INTEGER */
);

Neil Pickles added a comment - 16/Jun/10 09:19 AM - edited
After waiting for a week for the issue to reoccur at one of their sites that was falling over several times a day, I finally got a system to fall over after I installed the debug version of Firebird and configured Dr Watson at 3 other sites of theirs.

I have attached a 7z file, leedscrashdump.7z, containing the crashdump file and Dr Watson log file from a failure this morning. I had to use 7zip to get it below the 10Mb upload limit.

Vlad Khorsun added a comment - 16/Jun/10 10:05 AM
Neil,

your issue is completely different.

Firebird crashes in isc_start_transaction() because the attachment handle passed to it contains garbage.
The call stack is

> fbserver.exe!check_database(Jrd::thread_db * tdbb=0x010df910, Jrd::Attachment * attachment=0x037cc03c, int * user_status=0x010dfd8c) Line 4915 + 0x8 bytes C++
  fbserver.exe!jrd8_start_multiple(int * user_status=0x010dfd8c, Jrd::jrd_tra * * tra_handle=0x010dfc08, unsigned short count=1, Jrd::teb * vector=0x010dfa38) Line 3882 + 0xf bytes C++
  fbserver.exe!jrd8_start_transaction(int * user_status=0x010dfd8c, Jrd::jrd_tra * * tra_handle=0x010dfc08, short count=1, ...) Line 3993 C++
  fbserver.exe!isc_start_multiple(int * user_status=0x00000000, void * * tra_handle=0x010dfd88, short count=1, void * vec=0x00000000) Line 5001 + 0x31 bytes C++
  fbserver.exe!isc_start_transaction(int * user_status=0x010dfd8c, void * * tra_handle=0x010dfd88, short count=1, ...) Line 5080 C++
  fbserver.exe!rem_port::start_transaction(P_OP operation=op_release_lock, p_sttr * stuff=0x005ae5b2, packet * sendL=0x0000001d) Line 5245 + 0x1e bytes C++

And the fault is

// Make sure blocks look and feel kosher
Database* dbb;
if (!attachment ||
    (MemoryPool::blk_type(attachment) != type_att) ||   // <-- HERE
    !(dbb = attachment->att_database) ||
    MemoryPool::blk_type(dbb) != type_dbb)
{
    return handle_error(user_status, isc_bad_db_handle, tdbb);
}


Another thread is performing engine shutdown (because the service is being stopped) and has just released the attachment above:

  fbserver.exe!ISC_event_wait(short count=1, event_t * * events=0x0416f040, long * values=0x0416f048, long micro_seconds=0, void (void *)* timeout_handler=0x00000000, void * handler_arg=0x00000000) Line 1292 C++
  fbserver.exe!stall(thread * thread=0x00000078) Line 966 C++
  fbserver.exe!SCH_enter() Line 445 C++
  fbserver.exe!subsystem_enter() Line 6156 C++
  fbserver.exe!isc_detach_database(int * user_status=0x00698c90, void * * handle=0x00698ce0) Line 1903 C++
  fbserver.exe!SecurityDatabase::fini() Line 240 + 0xa bytes C++
  fbserver.exe!purge_attachment(Jrd::thread_db * tdbb=0x0416f250, int * user_status=0x0416f234, Jrd::Attachment * attachment=0x037cc03c, const bool force_flag=true) Line 7167 + 0x7 bytes C++
  fbserver.exe!shutdown_dbb(Jrd::thread_db * tdbb=0x0416f250, Jrd::Database * dbb=0x00000000, Jrd::Attachment * * released=0x00000000) Line 6606 C++
  fbserver.exe!shutdown_all() Line 6652 + 0xb bytes C++
  fbserver.exe!JRD_shutdown_all(bool asyncMode=false) Line 6927 + 0x5 bytes C++
> fbserver.exe!cleanup_thread(void * __formal=0x00000000) Line 342 C++

Note - attachment in purge_attachment() is the same as in check_database() : 0x037cc03c
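
A check like the one quoted above has to dereference the handle in order to read its block type, so it cannot protect against a handle whose memory has just been released by another thread. One common way around that, shown here only as a sketch and not as what Firebird does, is to validate handles against a registry of live objects under a lock instead of inspecting the pointed-to memory. `HandleRegistry` and its methods are hypothetical names.

```cpp
#include <mutex>
#include <unordered_set>

// Hypothetical registry of live handles. Validation is a set lookup,
// so a stale pointer is never dereferenced.
class HandleRegistry
{
public:
    // Called when an object is created.
    void add(const void* handle)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        live_.insert(handle);
    }

    // Called just before an object is released.
    void remove(const void* handle)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        live_.erase(handle);
    }

    // True only while the handle is registered.
    bool is_live(const void* handle) const
    {
        std::lock_guard<std::mutex> guard(mutex_);
        return live_.count(handle) != 0;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_set<const void*> live_;
};
```

The lookup alone only closes the gap if the caller also keeps the object alive while using it, for example by holding a lock or a reference count across the call; otherwise the release can still slip in between the check and the use.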

Neil Pickles added a comment - 16/Jun/10 10:10 AM
I have another occurrence of this, but the zipped crashdump file is 80 MB. How can I get it to you?

Vlad Khorsun added a comment - 16/Jun/10 10:24 AM
Could you put it somewhere for download ?

Neil Pickles added a comment - 16/Jun/10 10:30 AM
I'll get it back from the site and let you have the URL to download it from.

Adriano dos Santos Fernandes added a comment - 16/Jun/10 11:26 AM
I'm assigning the original bug to myself, based on the stack trace from the dump as described by Vlad.

I suspected the asb_desc usage (already fixed in HEAD) and was able to reproduce the problem under the debugger, although with a completely different test case.

Test case:

create or alter procedure p1
as
  declare x blob;
begin
  select
      list(distinct rdb$relation_id,
        (select first 1 rdb$relation_id from rdb$relations b
           where b.rdb$relation_id <= a.rdb$relation_id
           order by rdb$relation_id desc))
    from rdb$relations a into x;
end!

Then use debugger tricks to freeze threads while executing P1 in two different attachments under superserver.
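
The kind of hazard this test exercises can be illustrated in isolation. The sketch below assumes, as a simplification, that a compiled statement node is shared by every execution while each execution has its own data; it is not the engine code, and `SharedNode`, `Execution` and both functions are hypothetical names. The interleaving that freezing a thread forces is replayed sequentially here so that the outcome is deterministic.

```cpp
#include <cstdint>

// One compiled node, shared by all executions (think of two attachments
// running the same procedure under superserver).
struct SharedNode
{
    const std::int32_t* value_address;   // mutable runtime state: the hazard
};

// Data private to one execution.
struct Execution
{
    std::int32_t current_value;
    const std::int32_t* value_address;   // per-execution state: the safe place
};

// Unsafe ordering: A stores its pointer in the shared node, B overwrites
// it while A is frozen, and A then reads through B's pointer.
std::int32_t read_by_a_with_shared_state(std::int32_t a_value, std::int32_t b_value)
{
    SharedNode node{nullptr};
    Execution a{a_value, nullptr};
    Execution b{b_value, nullptr};
    node.value_address = &a.current_value;   // A prepares
    node.value_address = &b.current_value;   // B runs while A is frozen
    return *node.value_address;              // A resumes and reads
}

// Same interleaving, but each execution keeps the pointer in its own state.
std::int32_t read_by_a_with_private_state(std::int32_t a_value, std::int32_t b_value)
{
    Execution a{a_value, nullptr};
    Execution b{b_value, nullptr};
    a.value_address = &a.current_value;      // A prepares
    b.value_address = &b.current_value;      // B runs while A is frozen
    return *a.value_address;                 // A resumes and reads its own data
}
```

In this replay A merely gets B's value. In a real server B's buffer may already have been released by the time A reads it, which is when a wrong value turns into an access violation.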

Neil Pickles added a comment - 16/Jun/10 11:39 AM - edited
You can now download the 7z file from http://news.csy.co.uk/leedscrashdump2.7z, it's 96 Meg.

Is this the same error as last time or is it related to the original posted issue ?

Adriano dos Santos Fernandes added a comment - 16/Jun/10 11:49 AM
Neil, please open another ticket for your issue; as Vlad indicates, it's a completely different thing.

Adriano dos Santos Fernandes added a comment - 17/Jun/10 02:00 AM
Please test the fix.