Issue Details (XML | Word | Printable)

Key: CORE-4553
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Alexander Peshkov
Reporter: Pavel Zotov
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Firebird Core

Error stack can be broken after several minutes under heavy workload

Created: 14/Sep/14 09:55 AM   Updated: 23/Sep/15 11:29 AM
Component/s: None
Affects Version/s: 3.0 Alpha 2
Fix Version/s: 3.0 Beta 1

QA Status: Cannot be tested
Test Details:
Reproduced only under high concurrent load (OLTP-EMUL).
Letters to / from Alex: 15-oct-2014, subj: "better fix for damaged strings in error messages".
Issue in trunk/ChangeLog:
 2014-10-13 09:30 alexpeshkoff
   M src/yvalve/MasterImplementation.cpp
Fixed search for free circullar buffer when using LWP number as thread id


 Description  « Hide
Consider the following DDL:

recreate table ttt(id bigint primary key, x int); commit;
insert into ttt select i, 1 from (select row_number()over() i from rdb$types rows 100) order by rand();
commit;
create descending index ttt_id_desc on ttt(id);
commit;

set term ^;

create or alter procedure p_15(a_rnd double precision) returns (dbkey dm_dbkey) as
  declare c cursor for (select rdb$db_key dbkey from ttt where id >= :a_rnd order by id rows 1 FOR UPDATE WITH LOCK);
begin
  execute statement ('select rdb$db_key dbkey from ttt where id >= :r order by id rows 1 FOR UPDATE WITH LOCK')
                    ( r := a_rnd)
                    into dbkey;
  suspend;
end
^
create or alter procedure p_14(a_rnd double precision) returns (dbkey dm_dbkey) as
begin
                              execute statement ('select p.dbkey from p_15( :r ) p')
                    ( r := a_rnd)
                    into dbkey;
  suspend;
end
^

 --- . . . so on downto p_00 . . .

create or alter procedure p_00(a_rnd double precision) returns (dbkey dm_dbkey) as
begin
  execute statement ('select p.dbkey from p_01( :r ) p')
                    ( r := a_rnd)
                    into dbkey; suspend;
end
^
set term ;^
commit;

(full text see in attach, file broken_stack_ddl.sql ).

So, we can call any of SPs from: { p_00, p_01, ... p_14 } - and they all ultimately will call p_15 (most deepest) which attempts to lock first row with ID >= :some_selected_id in table TTT.

There is huge .sql which is run by ~50...70 attaches and does the follow:

set transaction no wait no auto undo;
set term ^;
execute block returns(dts char(12), id bigint, elapsed_ms int) as
  declare id_min double precision;
  declare id_max double precision;
  declare id_rnd double precision;
  declare dbkey char(8) character set octets;
  declare t0 timestamp;
begin
  select min(id)-0.5 from ttt into id_min;
  select max(id)+0.5 from ttt into id_max;
  id_rnd = id_min+rand()*(id_max-id_min);
  t0='now';
  select dbkey from p_03( :id_rnd ) into dbkey; -- ### NB ###

  update ttt t set t.x = rand()*1000 -- (select count(*) from rdb$types,rdb$types)
  where t.rdb$db_key = :dbkey
  returning id into id;

  dts=substring(cast(cast('now' as timestamp) as varchar(25)) from 12 for 12);
  elapsed_ms = datediff(millisecond from t0 to cast('now' as timestamp));
  suspend;
end
^
set term ;^
commit;

-- and this EB repeats many times (full test see in attach, file broken_stack_run.sql ).

We can change the name of called SP from p_01 upto p_14.
But error can be reproduced only if 'p_00' . . . 'p_03' is specified in EB (at least in my env.).

Batch file to launch isqls:

setlocal enabledelayedexpansion enableextensions
set winq=%1
for /l %%i in (1, 1, %winq%) do (
  set str=isql 192.168.0.220/3333:oltp30 -n -i broken_stack_run.sql 1^>nul 2^>nul
  start /min cmd /c !str!
)

If we run this batch like this: broken_stack_run.bat 70 - it will launch 70 isql windows.

Trace config:

enabled = true
log_errors = true
time_threshold = 0
print_plan = true
explain_plan = true
print_perf = true
max_sql_length = 16384
max_log_size = 9999999999

After run trace and batch: broken_stack_run.bat 70 - we need to wait a few minutes.

Then we can periodically check log of trace like this (file `get-broken-lines.sh` in attach) :

grep " : " $1 | grep -v "deadlock\|concurrent\|At" | head -20

(where $1 - the name of trace log).

The following 'broken' messages will be displayed after some minutes:

335544842 : 1
335544842 : 7
335544842 : 5
335544842 : 7
335544842 : 5
335544842 : 5
335544842 : t procedure 'P_09' line: 4, col: 21
335544842 : procedure 'P_04' line: 4, col: 11
335544842 : procedure 'P_04' line: 4, col: 11
335544842 : procedure 'P_05' line: 4, col: 13
335544842 : procedure 'P_11' line: 4, col: 25
335544842 : procedure 'P_11' line: 4, col: 25
335544842 : procedure 'P_11' line: 4, col: 25
335544842 : 5
335544842 : 5
335544842 : procedure 'P_03' line: 4, col: 9
335544842 : t procedure 'P_08' line: 4, col: 19
335544842 : t procedure 'P_08' line: 4, col: 19
335544842 : t procedure 'P_05' line: 4, col: 13
335544842 : t procedure 'P_08' line: 4, col: 19

The full test case please see in attach.
Correction in `broken_stack_run.bat` is required for host name + port + database alias.

 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Pavel Zotov added a comment - 14/Sep/14 09:58 AM
Sorry, couldn`t attach .zip

Please download it from here: https://yadi.sk/d/P1UVFnIdbTyMa

(link is permanent)