Issue Details (XML | Word | Printable)

Key: CORE-4896
Type: Bug Bug
Status: Open Open
Priority: Critical Critical
Assignee: Unassigned
Reporter: Pavel Zotov
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Firebird Core

FBSVCMGR with `action_trace_stop` does not verify that trace log is ENTIRELY written on disk before returning control to outer environment

Created: 01/Aug/15 03:25 PM   Updated: 27/Oct/15 09:27 AM
Component/s: SVCMGR
Affects Version/s: 2.5.4, 3.0 RC1
Fix Version/s: None

File Attachments: 1. Zip Archive fbsvcmgr-do_NOT-waiting-for-its-log-will-be-completely-written-on-disk_-_batches-and-logs.zip (8 kB)


QA Status: No test


 Description  « Hide
It looks like very unpleasant "feature" that FBSVCMGR with action_trace_stop can return control to outer environment (batch or fbtest) without checking that OS really has finished writing of log.
This effect can be watched at least on Windows 32-bit, I've checked FB build WI-V3.0.0.31948.

Consider following batch (be sure that you have standard utility for comparing text files - 'fc.exe' on Windows):

=== start of batch `fbsvc_check.bat` ===
@echo off
setlocal enabledelayedexpansion enableextensions

cls

set fbhome=C:\1INSTALL\FIREBIRD\fb30sS
set fbhost=localhost
set fbport=3333
set dbtest=C:\FBTESTING\qa\fbt-repo\tmp\e30.fdb
set dbbkup=C:\FBTESTING\qa\fbt-repo\tmp\e30.fbk
set fbsrun=%fbhome%\fbsvcmgr %fbhost%/%fbport%:service_mgr user SYSDBA password masterkey

set fbscfg=%~n0.cfg.tmp
set fbslst=%~n0.lst.tmp
set fbslog=%~n0.trc.log
set ownlog=%~n0.own.log

del %fbscfg% 2>nul
del %fbslst% 2>nul
del %fbslog% 2>nul
del %ownlog% 2>nul

echo services>> %fbscfg%
echo {>> %fbscfg%
echo enabled = true>> %fbscfg%
echo log_services = true>> %fbscfg%
echo exclude_filter = "%%((List Trace Sessions)|(Start Trace Session))%%">> %fbscfg%
echo }>> %fbscfg%


set estlog=%~n0.est.log
del %estlog% 2>nul

set actlog=%~n0.act.log
del %actlog% 2>nul

echo %dbtest% %dbbkup% >>%estlog%
echo -REPLACE_DATABASE %dbbkup% %dbtest% >>%estlog%
echo %dbtest% -SYSTEM >>%estlog%
echo %dbtest% -HEADER >>%estlog%
echo %dbtest% -WRITE ASYNC >>%estlog%
echo %dbtest% -BUFFERS 3210 >>%estlog%
echo %dbtest% -HOUSEKEEPING 33333 >>%estlog%
echo %dbtest% -SQL_DIALECT 1 >>%estlog%
echo %dbtest% -MODE READ_ONLY >>%estlog%
echo %dbtest% -MODE READ_WRITE >>%estlog%

set str=%fbsrun% action_trace_start trc_cfg %fbscfg% 1^>%fbslog% 2^>^&1
echo Starting trace session using FBSVCMGR.
echo Command: %str%
start /min cmd /c !str!

ping -n 2 127.0.0.1>nul

@rem Do some actions which should be written into trace log:

@echo --- BACKUP ---
%fbsrun% action_backup dbname %dbtest% bkp_file %dbbkup%

@echo --- RESTORE ---
%fbsrun% action_restore bkp_file %dbbkup% dbname %dbtest% res_replace

@echo --- STATISTICS of sys relations ---
%fbsrun% action_db_stats dbname %dbtest% sts_sys_relations 1>nul

@echo --- STATISTICS of header ---
%fbsrun% action_db_stats dbname %dbtest% sts_hdr_pages 1>nul

@echo --- PROPERTIES: update FW ---
%fbsrun% action_properties dbname %dbtest% prp_write_mode prp_wm_async

@echo --- PROPERTIES: update BUFFERS number ---
%fbsrun% action_properties dbname %dbtest% prp_page_buffers 3210

@echo --- PROPERTIES: update SWEEP interval ---
%fbsrun% action_properties dbname %dbtest% prp_sweep_interval 33333

@echo --- PROPERTIES: change SQL DIALECT ---
%fbsrun% action_properties dbname %dbtest% prp_set_sql_dialect 1

@echo --- PROPERTIES: make database READ-ONLY ---
%fbsrun% action_properties dbname %dbtest% prp_access_mode prp_am_readonly

@echo --- PROPERTIES: return database to READ-WRITE ---
%fbsrun% action_properties dbname %dbtest% prp_access_mode prp_am_readwrite


%fbsrun% action_trace_list 1>%fbslst% 2>&1

for /f "tokens=3" %%i in ('findstr "Session" %fbslst%') do (
  echo %time% %~nx0: request to stop trace session with id = %%i 1>>%ownlog%
  %fbsrun% action_trace_stop trc_id %%i 1>>%ownlog%
  echo %time% %~nx0: control from FBSVCMGR returned to this batch.>>%ownlog%
  copy %fbslog% copy-of-%fbslog% 1>nul
  echo %time% %~nx0: copy of trace log has been created.>>%ownlog%
)

@rem ping -n 4 127.0.0.1>nul ----------------------------- [ ########### 1 ########### ]

cls
echo Estimated to be seen in trace log (see %estlog%^):
echo --------------------------------------------------------------------
type %estlog%
echo --------------------------------------------------------------------

findstr /c:"%dbtest%" %fbslog%>%actlog%
echo.
echo Actual content of trace log (%fbslog%^) related to actions:
echo --------------------------------------------------------------------
type %actlog%
echo --------------------------------------------------------------------

echo.
echo Difference between estimated and actual content, w/o leading or trailing spaces and tabs:
echo -----------------------------------------------------------------------------------------
fc.exe /c /n /w %actlog% %estlog%
@rem echo %errorlevel%

echo Content of copy-of-%fbslog% related to actions:
findstr /c:"%dbtest%" copy-of-%fbslog%
=== end of batch `fbsvc_check.bat` ===

Correct following env. variables in it:

set fbhome=C:\1INSTALL\FIREBIRD\fb30sS
set fbhost=localhost
set fbport=3333
set dbtest=C:\FBTESTING\qa\fbt-repo\tmp\e30.fdb
set dbbkup=C:\FBTESTING\qa\fbt-repo\tmp\e30.fbk

-- and run several times.

You will be able to see following messages about comparing files using fc.exe:

. . .
Comparing files fbsvc_check.act.log and FBSVC_CHECK.EST.LOG
***** fbsvc_check.act.log
***** FBSVC_CHECK.EST.LOG
    9: C:\FBTESTING\qa\fbt-repo\tmp\e30.fdb -MODE READ_ONLY
   10: C:\FBTESTING\qa\fbt-repo\tmp\e30.fdb -MODE READ_WRITE
*****
. . .

***** fbsvc_check.act.log
***** FBSVC_CHECK.EST.LOG
   10: C:\FBTESTING\qa\fbt-repo\tmp\e30.fdb -MODE READ_WRITE
*****
. . .

***** fbsvc_check.act.log
***** FBSVC_CHECK.EST.LOG
    7: C:\FBTESTING\qa\fbt-repo\tmp\e30.fdb -HOUSEKEEPING 33333
    8: C:\FBTESTING\qa\fbt-repo\tmp\e30.fdb -SQL_DIALECT 1
    9: C:\FBTESTING\qa\fbt-repo\tmp\e30.fdb -MODE READ_ONLY
   10: C:\FBTESTING\qa\fbt-repo\tmp\e30.fdb -MODE READ_WRITE
*****
. . .

***** fbsvc_check.act.log
***** FBSVC_CHECK.EST.LOG
   10: C:\FBTESTING\qa\fbt-repo\tmp\e30.fdb -MODE READ_WRITE
*****
. . .

***** fbsvc_check.act.log
***** FBSVC_CHECK.EST.LOG
    8: C:\FBTESTING\qa\fbt-repo\tmp\e30.fdb -SQL_DIALECT 1
    9: C:\FBTESTING\qa\fbt-repo\tmp\e30.fdb -MODE READ_ONLY
   10: C:\FBTESTING\qa\fbt-repo\tmp\e30.fdb -MODE READ_WRITE
*****

etc.

Is is unlikely that you will see message about no difference between estimated and actual output of trace - untill you UNCOMMENT line:

@rem ping -n 4 127.0.0.1>nul ----------------------------- [ ########### 1 ########### ]

(and maybe you need to change number of seconds from '4' to +/- (1..2) here)

Without knowledge about this 'feature' (that one need to wait several seconds after fbsvcmgr is stopped) it is very easy to get wrong results if analysis of trace log starts immediately after trace finish. And if trace log is huge I suppose that one need to wait this log to be flushed on disk much more - dozen seconds ?

 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Pavel Zotov added a comment - 01/Aug/15 10:56 PM
I've encountered that issue about lost trainling part of FBSVCMGR also can be applied to FB 2.5.5.

Furthermore, FBSVCMGR can 'loose' trailing part of its log even when delay between action_trace_stop and start of parsing its log is really big -- say, 20 (yes, TWENTY) seconds.

What's more "wonderful" is that probability of this event (loosing of several last rows from log) also depends on... how VERBOSE is batch, i.e. how many messages are directed to console by ECHO commands (!!)

May be it's hard to believe but batch that was shown above will produce MORE messages about mismatch between expected and actual trace content if we commented its "echo" commands.

I've changed this batch (added setting "verbose_output" = 1 | 0 and also adjusted it to be run on both FB 2.5 and 3.0 ), and created auxiliary .bat that calls this 'main' one in infinite loop.
In every case there were 50 (fifty) calls of 'main' batch and frequency of mismatches significantly differs.

Please take these two batches ('main' and 'outer' for loop) for running in your environment:

1) main batch, name = 'fbsvc_check.bat'
=== begin of fbsvc_check.bat ===
@echo off
setlocal enabledelayedexpansion enableextensions

set ping_delay=%1
if .%1.==.. (
  set delay_after_stop_trc=0
) else (
  set /a delay_after_stop_trc=ping_delay+1
)

set verbose_output=1

set fbhome=C:\MIX\firebird\fb25\bin
set fbhost=localhost
set fbport=3255
set fbsrun=%fbhome%\fbsvcmgr %fbhost%/%fbport%:service_mgr user SYSDBA password masterkey

set engine=30
%fbsrun% info_server_version >>%~n0.own.log
findstr /i /c:"Firebird 2.5" %~n0.own.log>nul
if not errorlevel 1 set engine=25

set dbtest=%~dp0e%engine%.fdb
set dbbkup=%~dp0e%engine%.fbk

set fbscfg=%~n0.cfg.tmp
set fbslst=%~n0.lst.tmp
set fbslog=%~n0.trc.log
set ownlog=%~n0.own.log
set estlog=%~n0.est.log
set actlog=%~n0.act.log

del %fbscfg% 2>nul
del %fbslst% 2>nul
del %fbslog% 2>nul
del %ownlog% 2>nul
del %estlog% 2>nul
del %actlog% 2>nul

if .%engine%.==.25. (
  echo ^<services^> >> %fbscfg%
  echo enabled true>> %fbscfg%
  echo log_services true>> %fbscfg%
  echo exclude_filter "%%((List Trace Sessions)|(Start Trace Session))%%">> %fbscfg%
  echo ^</services^> >> %fbscfg%

  echo -TRUSTED_SVC SYSDBA %dbtest% %dbbkup% >>%estlog%
  echo -TRUSTED_SVC SYSDBA -REPLACE_DATABASE %dbbkup% %dbtest% >>%estlog%
  echo -TRUSTED_SVC SYSDBA %dbtest% -SYSTEM >>%estlog%
  echo -TRUSTED_SVC SYSDBA %dbtest% -HEADER >>%estlog%
  echo -TRUSTED_SVC SYSDBA %dbtest% -write async >>%estlog%
  echo -TRUSTED_SVC SYSDBA %dbtest% -buffers 3210 >>%estlog%
  echo -TRUSTED_SVC SYSDBA %dbtest% -housekeeping 33333 >>%estlog%
  echo -TRUSTED_SVC SYSDBA %dbtest% -sql_dialect 1 >>%estlog%
  echo -TRUSTED_SVC SYSDBA %dbtest% -mode read_only >>%estlog%
  echo -TRUSTED_SVC SYSDBA %dbtest% -mode read_write >>%estlog%
) else (
  echo services>> %fbscfg%
  echo {>> %fbscfg%
  echo enabled = true>> %fbscfg%
  echo log_services = true>> %fbscfg%
  echo exclude_filter = "%%((List Trace Sessions)|(Start Trace Session))%%">> %fbscfg%
  echo }>> %fbscfg%

  echo %dbtest% %dbbkup% >>%estlog%
  echo -REPLACE_DATABASE %dbbkup% %dbtest% >>%estlog%
  echo %dbtest% -SYSTEM >>%estlog%
  echo %dbtest% -HEADER >>%estlog%
  echo %dbtest% -WRITE ASYNC >>%estlog%
  echo %dbtest% -BUFFERS 3210 >>%estlog%
  echo %dbtest% -HOUSEKEEPING 33333 >>%estlog%
  echo %dbtest% -SQL_DIALECT 1 >>%estlog%
  echo %dbtest% -MODE READ_ONLY >>%estlog%
  echo %dbtest% -MODE READ_WRITE >>%estlog%
)

set str=%fbsrun% action_trace_start trc_cfg %fbscfg% 1^>%fbslog% 2^>^&1
if .%verbose_output%.==.1. (
  echo Starting trace session using FBSVCMGR.
  echo Command: %str%
)
start /min cmd /c !str!

ping -n 2 127.0.0.1>nul

@rem Do some actions which should be written into trace log:

if .%verbose_output%.==.1. @echo --- BACKUP ---
%fbsrun% action_backup dbname %dbtest% bkp_file %dbbkup%

if .%verbose_output%.==.1. @echo --- RESTORE ---
%fbsrun% action_restore bkp_file %dbbkup% dbname %dbtest% res_replace

if .%verbose_output%.==.1. @echo --- STATISTICS of sys relations ---
%fbsrun% action_db_stats dbname %dbtest% sts_sys_relations 1>nul

if .%verbose_output%.==.1. @echo --- STATISTICS of header ---
%fbsrun% action_db_stats dbname %dbtest% sts_hdr_pages 1>nul

if .%verbose_output%.==.1. @echo --- PROPERTIES: update FW ---
%fbsrun% action_properties dbname %dbtest% prp_write_mode prp_wm_async

if .%verbose_output%.==.1. @echo --- PROPERTIES: update BUFFERS number ---
%fbsrun% action_properties dbname %dbtest% prp_page_buffers 3210

if .%verbose_output%.==.1. @echo --- PROPERTIES: update SWEEP interval ---
%fbsrun% action_properties dbname %dbtest% prp_sweep_interval 33333

if .%verbose_output%.==.1. @echo --- PROPERTIES: change SQL DIALECT ---
%fbsrun% action_properties dbname %dbtest% prp_set_sql_dialect 1

if .%verbose_output%.==.1. @echo --- PROPERTIES: make database READ-ONLY ---
%fbsrun% action_properties dbname %dbtest% prp_access_mode prp_am_readonly

if .%verbose_output%.==.1. @echo --- PROPERTIES: return database to READ-WRITE ---
%fbsrun% action_properties dbname %dbtest% prp_access_mode prp_am_readwrite

%fbsrun% action_trace_list 1>%fbslst% 2>&1

if .%verbose_output%.==.1. (
  echo %time% %~nx0: number of active trace sessions now (output should be 1^):
  type %fbslst% | find /i /c "Session"
)

for /f "tokens=3" %%i in ('findstr "Session" %fbslst%') do (
  if .!verbose_output!.==.1. (
    set msg=%time% %~nx0: request to stop trace session with id = %%i
    echo !msg!
    echo !msg!>>%ownlog%
  )

  %fbsrun% action_trace_stop trc_id %%i 1>>%ownlog%

  if .!verbose_output!.==.1. (
    set msg=%time% %~nx0: control from FBSVCMGR returned to this batch.
    echo !msg!
    echo !msg!>>%ownlog%
  )
  copy %fbslog% copy-of-%fbslog% 1>nul
)

if .%verbose_output%.==.1. (
  echo %time% %~nx0: ensure that there is no active trace sessions (output should be 0^):
  %fbsrun% action_trace_list | find /i /c "Session"
)

if .%verbose_output%.==.1. (
  if .%delay_after_stop_trc%.==.0. (
    set msg=%time%: NO pause after action_trace_stop, go on.
    echo !msg!
    echo !msg!>>%ownlog%
  ) else (
    set msg=%time%: take pause %ping_delay% seconds.
    echo !msg!
    echo !msg!>>%ownlog%
  
    ping -n %delay_after_stop_trc% 127.0.0.1>nul
  
    set msg=%time%: go on after pause.
    echo !msg!
    echo !msg!>>%ownlog%
  )
) else (
  ping -n %delay_after_stop_trc% 127.0.0.1>nul
)

findstr /c:"%dbtest%" %fbslog%>%actlog%

if .%verbose_output%.==.1. (
  cls
  echo Estimated to be seen in trace log (see %estlog%^):
  echo --------------------------------------------------------------------
  type %estlog%
  echo --------------------------------------------------------------------
  echo.
  echo Actual content of trace log (%fbslog%^) related to actions:
  echo --------------------------------------------------------------------
  type %actlog%
  echo --------------------------------------------------------------------
  echo.
  echo Difference between estimated and actual content, w/o leading or trailing spaces and tabs:
  echo -----------------------------------------------------------------------------------------
)
fc.exe /c /n /w %actlog% %estlog%

if .%verbose_output%.==.1. (
  echo Content of copy-of-%fbslog% related to actions:
  findstr /c:"%dbtest%" copy-of-%fbslog%
)
=== end of fbsvc_check.bat ===

2) auxiliary batch for infinite loop, name = 'fbsvc_loop.bat':
===
@echo off
set delay=%1
set k=1
:m1
  echo %time%: iter # %k%
  call fbsvc_check.bat %delay%
  echo Wait for a while - here you can interrupt this batch. . .
  ping -n 3 127.0.0.1>nul
  set /a k=%k%+1
goto m1
===

For starting test type:

fbsvc_loop.bat NN

-- where NN is number of seconds during which 'main' batch will stay in pause after FBSVCMGR has been stopped.

So, I've launched fbsvc_loop.bat two times and forced it run during 50 times. Both time console output was duplicated in log with the help of MTEE utility.

First time I've set in main batch ('fbsvc_check.bat') verbose_output=0 and 2nd time -- verbose_output=1.

Then I counted number of iterations when file comparer (FC.EXE) found mismatches - i.e. when FBSVCMGR did not completed its log to be entirely written on disk.

RESULT:
1) number 'looses' when verbose_output=0 is 18 ( result of find /c "***** fbsvc_check.act.log" fbsvc_loop.20s.verb_0.log // .log file see in attach)
2) number 'looses' when verbose_output=1 is 4 (result of find /c "***** fbsvc_check.act.log" fbsvc_loop.20s.verb_1.log )

So, the more batch "speaks"... the less FBSVCMGR looses! Looks like fantasy...