Build process does not produce consistent results [CORE5548] #5816

Open
firebird-automations opened this issue May 22, 2017 · 36 comments
Comments

@firebird-automations
Collaborator

Submitted by: Bernhard M. Wiedemann (bmwiedemann)

Votes: 2

When working on reproducible builds for openSUSE,
I found that the firebird package produced different results on every build.

The differences come from two sources:

1. g++ orders functions in libEngine12.so and fbintl depending on the random order of files in the build system's filesystem.
2. Additionally, help.fdb and other sample .fdb files contain some bytes that differ between builds.
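
To illustrate the first source: a directory listing comes back in whatever order the filesystem chooses, so any build step that passes it straight to the compiler or linker inherits that order. The following is a minimal Python sketch of the usual remedy, sorting the list before use. The function name `stable_sources` and the file names are hypothetical; the real build uses makefiles, not Python.

```python
import os
import tempfile

def stable_sources(directory, suffix=".cpp"):
    """Return matching file names in sorted order.

    os.listdir() makes no promise about ordering, so the sort is what
    makes the result independent of the filesystem.
    """
    return sorted(name for name in os.listdir(directory) if name.endswith(suffix))

# Demonstration with throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("vio.cpp", "btr.cpp", "pag.cpp", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    print(stable_sources(tmp))
```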

See https://reproducible-builds.org/ for why this matters.

Commits: 3278b68

@firebird-automations

Commented by: Bernhard M. Wiedemann (bmwiedemann)

The fix for the first issue is #92.

@firebird-automations

Commented by: @AlexPeshkoff

I can hardly see how the second issue can be fixed. Having the creation datetime in the database header is an old, useful feature, and I have no idea how we can avoid these irreproducible bytes in the build.

@firebird-automations

Commented by: Bernhard M. Wiedemann (bmwiedemann)

There is one nicely designed way to do this: use the value of the SOURCE_DATE_EPOCH environment variable as the 'current' time integer whenever it is set during the package build.
See, for example:
thezbyg/gpick#138
michaelrsweet/mxml@b79d3e0
https://gitlab.kitware.com/cmake/cmake/merge_requests/432
rpm-software-management/rpm#144

Some programs, such as TeX, whose developers were concerned that this might trigger accidentally during normal operation, added an extra variable (e.g. SOURCE_DATE_EPOCH_TEX_PRIMITIVES there) to be sure.
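
As a minimal sketch of the SOURCE_DATE_EPOCH convention (in Python for brevity; `build_time` is a hypothetical helper, not Firebird code), the tool uses the variable when it is set and falls back to the real clock otherwise:

```python
import os
import time

def build_time(environ=os.environ):
    """Return the timestamp to embed: SOURCE_DATE_EPOCH if set, else now."""
    value = environ.get("SOURCE_DATE_EPOCH")
    if value:
        return int(value)  # seconds since 1970-01-01 00:00:00 UTC
    return int(time.time())

# Arbitrary example value; a package build would export the real one.
print(build_time({"SOURCE_DATE_EPOCH": "1495411200"}))
```

Passing the environment as a parameter keeps the helper easy to test without touching the process environment.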

Can you give me a hint on where in the code that creation datetime is set?
The help.fdb diff looks like this:

00000000 01 00 00 00 12 00 00 00 00 00 00 00 00 00 00 00
00000010 00 10 0c 80 03 00 00 00 00 00 00 00 06 00 00 00
-00000020 07 00 00 00 07 00 00 00 00 00 28 00 12 e2 00 00
-00000030 58 bd 0c 33 06 00 00 00 00 00 00 00 01 01 01 00
+00000020 07 00 00 00 07 00 00 00 00 00 28 00 a4 e3 00 00
+00000030 64 f4 03 1c 06 00 00 00 00 00 00 00 01 01 01 00

So the change is spread over 8 bytes.
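
Those 8 bytes can be decoded to check that they really are a timestamp. The sketch below assumes the usual ISC_TIMESTAMP layout, which is not spelled out in this thread: a little-endian 32-bit day number counted from 17 November 1858, followed by a 32-bit time of day in units of 1/10000 second. The function name is hypothetical.

```python
import struct
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)  # day 0 of the assumed day numbering
# Assumption: the time part counts 1/10000 s, i.e. 100 microseconds per tick.
MICROSECONDS_PER_TICK = 100

def decode_isc_timestamp(raw):
    """Decode 8 little-endian bytes: int32 day number, uint32 time of day."""
    days, ticks = struct.unpack("<iI", raw)
    return MJD_EPOCH + timedelta(days=days, microseconds=ticks * MICROSECONDS_PER_TICK)

# The differing bytes from the two help.fdb builds in the diff above.
old = bytes.fromhex("12e20000" "58bd0c33")
new = bytes.fromhex("a4e30000" "64f4031c")
print(decode_isc_timestamp(old))
print(decode_isc_timestamp(new))
```

Under that assumption both values decode to ordinary calendar dates and times, which supports the reading that the difference is the creation timestamp.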

@firebird-automations

Commented by: @mkubecek

I believe it's this line in PAG_format_header():

    *(ISC_TIMESTAMP*) header->hdr_creation_date = TimeStamp::getCurrentTimeStamp().value();

See NoThrowTimeStamp::getCurrentTimeStamp() to get an idea of how to convert time values.
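
For orientation, the conversion from a Unix epoch value (such as SOURCE_DATE_EPOCH) to the two timestamp fields can be sketched as follows. This is illustrative Python, not the Firebird implementation; it assumes the date part is a day number counted from 17 November 1858 and the time part counts 1/10000 second, and all names are hypothetical.

```python
import struct

UNIX_EPOCH_DAY = 40587    # 1970-01-01 in days since 1858-11-17
TICKS_PER_SECOND = 10000  # assumption: time part is in 1/10000 s

def epoch_to_isc_timestamp(epoch_seconds):
    """Split Unix seconds (UTC) into (day number, ticks since midnight)."""
    days, seconds = divmod(epoch_seconds, 86400)
    return UNIX_EPOCH_DAY + days, seconds * TICKS_PER_SECOND

def pack_isc_timestamp(epoch_seconds):
    """Pack the two fields as little-endian int32 + uint32 (8 bytes)."""
    date, time_ticks = epoch_to_isc_timestamp(epoch_seconds)
    return struct.pack("<iI", date, time_ticks)

print(epoch_to_isc_timestamp(1495411200))
```

The sketch ignores time zones; the engine works with local time in some places, so a real patch would have to decide which zone the fixed timestamp represents.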

@firebird-automations

Commented by: Bernhard M. Wiedemann (bmwiedemann)

I wrote #93 to take care of the timestamps.

Yet there are still diffs left in the .fdb files, usually just a single bit.

e.g. for help.fdb I got

000bbfd0 1a 00 10 00 00 00 00 00 01 00 00 00 1a 00 00 00
000bbfe0 03 00 01 00 1a 00 01 01 03 06 53 59 53 44 42 41
000bbff0 00 02 01 06 03 04 07 09 08 00 01 00 02 04 00 00
-000bc000 05 00 00 00 05 00 00 00 00 00 00 00 bc 00 00 00
+000bc000 05 00 00 00 04 00 00 00 00 00 00 00 bc 00 00 00
000bc010 0c 00 00 00 09 00 24 00 d8 0f 26 00 ac 0f 2c 00
000bc020 84 0f 26 00 58 0f 2c 00 30 0f 26 00 04 0f 2c 00
000bc030 dc 0e 26 00 b0 0e 2c 00 88 0e 26 00 5c 0e 2c 00
@@ -14473,7 +14473,7 @@
000bcfd0 09 fd 00 02 80 0a f6 00 00 00 00 00 00 00 00 00
000bcfe0 00 00 00 00 00 01 fc fd 00 07 53 51 4c 24 34 30
000bcff0 37 e8 20 fb 00 01 09 fd 00 02 7f 0a f6 00 00 00
-000bd000 05 10 00 00 05 00 00 00 00 00 00 00 bd 00 00 00
+000bd000 05 10 00 00 04 00 00 00 00 00 00 00 bd 00 00 00
000bd010 0d 00 00 00 09 00 1b 00 c8 0f 38 00 90 0f 38 00
000bd020 58 0f 38 00 20 0f 38 00 e8 0e 38 00 b8 0e 30 00
000bd030 88 0e 30 00 58 0e 30 00 28 0e 30 00 f8 0d 30 00

Maybe something that depends on the timing of individual operations?

@firebird-automations

Commented by: @hvlad

1. Why is this ticket classified as a bug? This is a feature request, imo.

2. Why are "reproducible" database files required? Why are "reproducible" binaries not enough?

@firebird-automations

Commented by: Bernhard M. Wiedemann (bmwiedemann)

1) the separating line between bugs and features can be vague. E.g. when working on reproducible builds I found https://savannah.gnu.org/support/index.php?109234 and https://bugs.launchpad.net/intltool/+bug/1687644 which are pretty definitely bugs. Feel free to change the classification.

2) for some reason, the help.fdb and sample .fdb files get built and shipped as part of our openSUSE packages and we try to make all our packages build reproducibly.
Will firebird work without those? Then we could just omit them and be done.

@firebird-automations

Commented by: @hvlad

1. The race conditions you found in those projects during the build could be real bugs, yes, but I see no relation to the current ticket, sorry.

2. The only database file required to run Firebird v3 is security3.fdb.

@firebird-automations

Modified by: @hvlad

issuetype: Bug [ 1 ] => Improvement [ 4 ]

@firebird-automations

Commented by: @hvlad

As for

---------------------------------------------
yet, there are still diffs left in .fdb files. Usually just a single bit

e.g. for help.fdb I got

000bbfd0 1a 00 10 00 00 00 00 00 01 00 00 00 1a 00 00 00
000bbfe0 03 00 01 00 1a 00 01 01 03 06 53 59 53 44 42 41
000bbff0 00 02 01 06 03 04 07 09 08 00 01 00 02 04 00 00
-000bc000 05 00 00 00 05 00 00 00 00 00 00 00 bc 00 00 00
+000bc000 05 00 00 00 04 00 00 00 00 00 00 00 bc 00 00 00
000bc010 0c 00 00 00 09 00 24 00 d8 0f 26 00 ac 0f 2c 00
000bc020 84 0f 26 00 58 0f 2c 00 30 0f 26 00 04 0f 2c 00
000bc030 dc 0e 26 00 b0 0e 2c 00 88 0e 26 00 5c 0e 2c 00
@@ -14473,7 +14473,7 @@
000bcfd0 09 fd 00 02 80 0a f6 00 00 00 00 00 00 00 00 00
000bcfe0 00 00 00 00 00 01 fc fd 00 07 53 51 4c 24 34 30
000bcff0 37 e8 20 fb 00 01 09 fd 00 02 7f 0a f6 00 00 00
-000bd000 05 10 00 00 05 00 00 00 00 00 00 00 bd 00 00 00
+000bd000 05 10 00 00 04 00 00 00 00 00 00 00 bd 00 00 00
000bd010 0d 00 00 00 09 00 1b 00 c8 0f 38 00 90 0f 38 00
000bd020 58 0f 38 00 20 0f 38 00 e8 0e 38 00 b8 0e 30 00
000bd030 88 0e 30 00 58 0e 30 00 28 0e 30 00 f8 0d 30 00

maybe something that depends on timing of individual operations?
---------------------------------------------

The difference is in the pag::pag_generation field, which changes every time a page is written to disk.
It could be related to background garbage collector activity.
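
The changed lines of the dump can be read as 16-byte page headers. The sketch below is illustrative Python; the field order and widths are an assumption read off the dump (1-byte type, 1-byte flags, 2-byte reserved, then three 32-bit values), and of the names used only pag_generation appears in this comment; the others are assumed.

```python
import struct
from collections import namedtuple

# Assumed layout of the common page header, little-endian, 16 bytes.
PageHeader = namedtuple(
    "PageHeader",
    "pag_type pag_flags pag_reserved pag_generation pag_scn pag_pageno",
)
HEADER = struct.Struct("<BBHIII")

def parse_page_header(raw):
    """Parse the first 16 bytes of a page into named fields."""
    return PageHeader(*HEADER.unpack(raw[:HEADER.size]))

# The two versions of the line at offset 000bc000 in the diff.
before = bytes.fromhex("05000000" "05000000" "00000000" "bc000000")
after = bytes.fromhex("05000000" "04000000" "00000000" "bc000000")
print(parse_page_header(before))
print(parse_page_header(after))
```

Read this way, the only field that differs between the two builds is the second 32-bit word, which matches the pag_generation explanation. The last word, 0xbc, also agrees with the file offset 0xbc000 if the page size is 4096 bytes.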

@firebird-automations

Commented by: @mkubecek

I must admit I'm not really sure what help.fdb is used for. As for employee.fdb, while firebird can run without it, it's often used for examples (e.g. in documentation), so I believe completely omitting it would be a mistake (currently it's in the firebird-examples subpackage). If worst came to worst, it could probably be generated from a post-install script, but I would prefer to avoid such dirty tricks. While I can see the value in reproducible build results, I see it as a "nice to have" rather than a "must have" kind of feature.

@firebird-automations

Commented by: @AlexPeshkoff

Just FYI: help.fdb is the help file for QLI. If you type 'help <command>' in QLI, help.fdb is used.

@firebird-automations

Commented by: @asfernandes

I would not be surprised if the next person asks "what is QLI?" :)

@firebird-automations

Commented by: Sean Leyne (seanleyne)

Edited Summary for better context

@firebird-automations

Modified by: Sean Leyne (seanleyne)

summary: firebird does not build reproducibly => Build process does not produce consistent results

@firebird-automations

Commented by: @hvlad

Sean,

The build process does produce consistent results.
It creates binaries of the same version from the same sources, and that is consistent on the whole.
Otherwise I don't know what consistent results of a build process would be.

The original summary was better and clearer, imo.

@firebird-automations

Commented by: Bernhard M. Wiedemann (bmwiedemann)

I just got an idea for a nicer way to be able to build firebird twice with identical results:

At the end of the build we can run something like
find -type f -mtime -1 -name \*.fdb | xargs fdb-strip-nondeterminism

and that fdb-strip program can then normalize the hdr_creation_date and pag::pag_generation field values.

IMHO, it would make sense to maintain such a tool as part of firebird, because the data structure definitions and functions are available and guaranteed to be current.

@firebird-automations

Commented by: @hvlad

Bernhard,

If you absolutely need this feature, I see no problem with an additional build-time utility.

Do you need any help implementing it?

@firebird-automations

Commented by: Bernhard M. Wiedemann (bmwiedemann)

I'm busy fixing other packages and do not know the firebird codebase, so I would not get to it soon...
Could you write a simple version of this fdb-strip tool?

@firebird-automations

Commented by: @hvlad

Bernhard,

I'll look at it a bit later.

@firebird-automations

Commented by: Damyan Ivanov (dam)

My (failed) attempt at stripping random stuff from .fdb files is at https://anonscm.debian.org/cgit/pkg-firebird/3.0.git/tree/debian/fdb-r15y-prune.cpp

It does two things:
1) sets the creation date
2) wipes unused page space

It seems to work (used in the Debian package builds) and the resulting employee.fdb seems correct (isql -x does not complain, tested during the package build, on all Debian architectures).

However, there are still varying bits - perhaps index trees or something like that.

Just FYI, to help you not reinvent the wheel, maybe.

@firebird-automations

Commented by: @hvlad

Damyan,

thanks for your efforts.

Could you show a few samples of those varying bits?

To check database validity, use gfix -validate -full.

@firebird-automations

Commented by: Damyan Ivanov (dam)

Hello, Vlad, sorry about the delay in responding.

All the data about reproducibility testing of firebird3.0 is at https://tests.reproducible-builds.org/debian/rb-pkg/firebird3.0.html . Just pick architecture/version from the left and follow the "differences" link.

Here are a few samples of differences, as detected by the reproducible-builds effort (long URLs).

Comparisons are made after running the fdb-r15y-prune tool. Validation using gfix -validate -full passes.

https://tests.reproducible-builds.org/debian/dbd/stretch/amd64/firebird3.0_3.0.1.32609.ds4-14.diffoscope.html#firebird-.--examples_-.-.-.-----.ds----_all.deb-data.tar.xz-data.tar-.-usr-share-doc-firebird-.--common-doc-examples-empbuild-employee.fdb.gz-employee.fdb

https://tests.reproducible-builds.org/debian/dbd/unstable/amd64/firebird3.0_3.0.2.32703.ds4-4.diffoscope.html#firebird-.--examples_-.-.-.-----.ds---_all.deb-data.tar.xz-data.tar-.-usr-share-doc-firebird-.--common-doc-examples-empbuild-employee.fdb.gz

https://tests.reproducible-builds.org/debian/dbd/stretch/arm64/firebird3.0_3.0.1.32609.ds4-14.diffoscope.html#firebird-.--examples_-.-.-.-----.ds----_all.deb-data.tar.xz-data.tar-.-usr-share-doc-firebird-.--common-doc-examples-empbuild-employee.fdb.gz-employee.fdb

https://tests.reproducible-builds.org/debian/dbd/unstable/arm64/firebird3.0_3.0.2.32703.ds4-4.diffoscope.html#firebird-.--examples_-.-.-.-----.ds---_all.deb-data.tar.xz-data.tar-.-usr-share-doc-firebird-.--common-doc-examples-empbuild-employee.fdb.gz-employee.fdb

Sometimes there are differences in the security database too:

https://tests.reproducible-builds.org/debian/dbd/unstable/arm64/firebird3.0_3.0.2.32703.ds4-4.diffoscope.html#firebird-.--server_-.-.-.-----.ds---_arm--.deb-data.tar.xz-data.tar-.-var-lib-firebird--.--system-default-security-.fdb

https://tests.reproducible-builds.org/debian/dbd/buster/amd64/firebird3.0_3.0.2.32703.ds4-4.diffoscope.html#firebird-.--server_-.-.-.-----.ds---_amd--.deb-data.tar.xz-data.tar-.-var-lib-firebird--.--system-default-security-.fdb

@firebird-automations

Commented by: Damyan Ivanov (dam)

Re-reading fdb-r15y-prune.cpp, it also resets pag_generation to '1' and pag_reserved to '0', so the remaining differences are elsewhere.
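
The header reset described here can be sketched in a few lines of Python. This is an illustration of the idea only, not the code of fdb-r15y-prune.cpp. It assumes that every page starts with a header in which pag_reserved is a 16-bit field at byte offset 2 and pag_generation is a 32-bit field at byte offset 4, and that the page size is known; the function name is hypothetical.

```python
import struct

def normalize_page_headers(image, page_size):
    """Return a copy with pag_reserved = 0 and pag_generation = 1 on every page."""
    out = bytearray(image)
    # Visit each complete page; a trailing partial page is left untouched.
    for offset in range(0, len(out) - page_size + 1, page_size):
        struct.pack_into("<H", out, offset + 2, 0)  # pag_reserved (assumed offset)
        struct.pack_into("<I", out, offset + 4, 1)  # pag_generation (assumed offset)
    return bytes(out)
```

Two database images that differ only in those two fields become identical after this pass, which is why any differences that remain must come from somewhere else.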

@firebird-automations

Commented by: @hvlad

Damyan,

I see that some pages differ in their swept state. I think it is worth adding a sweep step before pruning.

@firebird-automations

Commented by: Damyan Ivanov (dam)

Here's a diff after adding a sweep before pruning:

--- employee-1.fdb
+++ employee-2.fdb
@@ -150944,103 +150944,103 @@
0024d9f0: 5500 3000 0000 0000 0200 0000 7d00 0000 U.0.........}...
0024da00: 0100 0000 0101 0100 1100 2800 0500 0000 ..........(.....
0024da10: 5500 0000 0300 1100 0000 0000 0000 0000 U...............
0024da20: 0100 0000 0100 0000 0500 0000 0800 456e ..............En
0024da30: 676c 6973 680a 0000 0000 0000 0007 0046 glish..........F
0024da40: 7265 6e63 680a 0a0a 0000 0000 0000 0800 rench...........
0024da50: 5370 616e 6973 680a 0000 0000 0000 0001 Spanish.........
-0024da60: 000a 7461 6c69 616e 0a00 0000 90a7 c088 ..talian........
+0024da60: 000a 7461 6c69 616e 0a00 0000 9067 6669 ..talian.....gfi
0024da70: 0100 0a00 0000 0000 0000 0000 0000 0000 ................
0024da80: 0000 0000 0000 0000 0000 0000 5500 3000 ............U.0.
0024da90: 0000 0000 0200 0000 7d00 0000 0100 0000 ........}.......
0024daa0: 0101 0100 1100 2800 0500 0000 5500 0000 ......(.....U...
0024dab0: 0300 1100 0000 0000 0000 0000 0100 0000 ................
0024dac0: 0100 0000 0500 0000 0800 4974 616c 6961 ..........Italia
0024dad0: 6e0a 0000 0000 0000 0007 0047 6572 6d61 n..........Germa
0024dae0: 6e0a 0a0a 0000 0000 0000 0700 4672 656e n...........Fren
0024daf0: 6368 0a00 0000 0000 0000 0001 000a 7461 ch............ta
-0024db00: 6c69 616e 0a00 0000 90a7 c088 0100 0a00 lian............
+0024db00: 6c69 616e 0a00 0000 9067 6669 0100 0a00 lian.....gfi....
0024db10: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0024db20: 0000 0000 0000 0000 5500 3000 0000 0000 ........U.0.....
0024db30: 0200 0000 7d00 0000 0100 0000 0101 0100 ....}...........
0024db40: 1100 2800 0500 0000 5500 0000 0300 1100 ..(.....U.......
0024db50: 0000 0000 0000 0000 0100 0000 0100 0000 ................
0024db60: 0500 0000 0900 4a61 7061 6e65 7365 0a00 ......Japanese..
0024db70: 0000 0000 0008 0045 6e67 6c69 7368 0a0a .......English..
0024db80: 0000 0000 0000 0100 0a00 0000 0000 0000 ................
0024db90: 0000 0000 0000 0001 000a 7461 6c69 616e ..........talian
-0024dba0: 0a00 0000 90a7 c088 0100 0a00 0000 0000 ................
+0024dba0: 0a00 0000 9067 6669 0100 0a00 0000 0000 .....gfi........
0024dbb0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0024dbc0: 0000 0000 5500 3000 0000 0000 0200 0000 ....U.0.........
0024dbd0: 7d00 0000 0100 0000 0101 0100 1100 2800 }.............(.
0024dbe0: 0500 0000 5500 0000 0300 1100 0000 0000 ....U...........
0024dbf0: 0000 0000 0100 0000 0100 0000 0500 0000 ................
0024dc00: 0700 4765 726d 616e 0a00 0000 0000 0000 ..German........
0024dc10: 0007 0046 7265 6e63 680a 0a0a 0000 0000 ...French.......
0024dc20: 0000 0800 456e 676c 6973 680a 0000 0000 ....English.....
0024dc30: 0000 0008 0049 7461 6c69 616e 0a00 0000 .....Italian....
-0024dc40: 90a7 c088 0100 0a00 0000 0000 0000 0000 ................
+0024dc40: 9067 6669 0100 0a00 0000 0000 0000 0000 .gfi............
0024dc50: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0024dc60: 5500 3000 0000 0000 0200 0000 7d00 0000 U.0.........}...
0024dc70: 0100 0000 0101 0100 1100 2800 0500 0000 ..........(.....
0024dc80: 5500 0000 0300 1100 0000 0000 0000 0000 U...............
0024dc90: 0100 0000 0100 0000 0500 0000 0800 456e ..............En
0024dca0: 676c 6973 680a 0000 0000 0000 0007 0046 glish..........F
0024dcb0: 7265 6e63 680a 0a0a 0000 0000 0000 0100 rench...........
0024dcc0: 0a00 0000 0000 0000 0000 0000 0000 0001 ................
-0024dcd0: 000a 0000 0000 0000 0000 0000 90a7 c088 ................
+0024dcd0: 000a 0000 0000 0000 0000 0000 9067 6669 .............gfi
0024dce0: 0100 0a00 0000 0000 0000 0000 0000 0000 ................
0024dcf0: 0000 0000 0000 0000 0000 0000 5500 3000 ............U.0.
0024dd00: 0000 0000 0200 0000 7d00 0000 0100 0000 ........}.......
0024dd10: 0101 0100 1100 2800 0500 0000 5500 0000 ......(.....U...
0024dd20: 0300 1100 0000 0000 0000 0000 0100 0000 ................
0024dd30: 0100 0000 0500 0000 0800 456e 676c 6973 ..........Englis
0024dd40: 680a 0000 0000 0000 0007 0047 6572 6d61 h..........Germa
0024dd50: 6e0a 0a0a 0000 0000 0000 0700 4672 656e n...........Fren
0024dd60: 6368 0a00 0000 0000 0000 0001 000a 0000 ch..............
-0024dd70: 0000 0000 0000 0000 90a7 c088 0100 0a00 ................
+0024dd70: 0000 0000 0000 0000 9067 6669 0100 0a00 .........gfi....
0024dd80: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0024dd90: 0000 0000 0000 0000 5500 3000 0000 0000 ........U.0.....
0024dda0: 0200 0000 7d00 0000 0100 0000 0101 0100 ....}...........
0024ddb0: 1100 2800 0500 0000 5500 0000 0300 1100 ..(.....U.......
0024ddc0: 0000 0000 0000 0000 0100 0000 0100 0000 ................
0024ddd0: 0500 0000 0800 456e 676c 6973 680a 0000 ......English...
0024dde0: 0000 0000 0008 0053 7061 6e69 7368 0a0a .......Spanish..
0024ddf0: 0000 0000 0000 0100 0a00 0000 0000 0000 ................
0024de00: 0000 0000 0000 0001 000a 0000 0000 0000 ................
-0024de10: 0000 0000 90a7 c088 0100 0a00 0000 0000 ................
+0024de10: 0000 0000 9067 6669 0100 0a00 0000 0000 .....gfi........
0024de20: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0024de30: 0000 0000 5500 3000 0000 0000 0200 0000 ....U.0.........
0024de40: 7d00 0000 0100 0000 0101 0100 1100 2800 }.............(.
0024de50: 0500 0000 5500 0000 0300 1100 0000 0000 ....U...........
0024de60: 0000 0000 0100 0000 0100 0000 0500 0000 ................
0024de70: 0800 456e 676c 6973 680a 0000 0000 0000 ..English.......
0024de80: 0007 0047 6572 6d61 6e0a 6e0a 0000 0000 ...German.n.....
0024de90: 0000 0700 4672 656e 6368 0a00 0000 0000 ....French......
0024dea0: 0000 0001 000a 0000 0000 0000 0000 0000 ................
-0024deb0: 90a7 c088 0100 0a00 0000 0000 0000 0000 ................
+0024deb0: 9067 6669 0100 0a00 0000 0000 0000 0000 .gfi............
0024dec0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0024ded0: 5500 3000 0000 0000 0200 0000 7d00 0000 U.0.........}...
0024dee0: 0100 0000 0101 0100 1100 2800 0500 0000 ..........(.....
0024def0: 5500 0000 0300 1100 0000 0000 0000 0000 U...............
0024df00: 0100 0000 0100 0000 0500 0000 0800 456e ..............En
0024df10: 676c 6973 680a 0000 0000 0000 0007 0047 glish..........G
0024df20: 6572 6d61 6e0a 6e0a 0000 0000 0000 0700 erman.n.........
0024df30: 4672 656e 6368 0a00 0000 0000 0000 0001 French..........
-0024df40: 000a 0000 0000 0000 0000 0000 90a7 c088 ................
+0024df40: 000a 0000 0000 0000 0000 0000 9067 6669 .............gfi
0024df50: 0100 0a00 0000 0000 0000 0000 0000 0000 ................
0024df60: 0000 0000 0000 0000 0000 0000 5500 3000 ............U.0.
0024df70: 0000 0000 0200 0000 7d00 0000 0100 0000 ........}.......
0024df80: 0101 0100 1100 2800 0500 0000 5500 0000 ......(.....U...
0024df90: 0300 1100 0000 0000 0000 0000 0100 0000 ................
0024dfa0: 0100 0000 0500 0000 0900 4a61 7061 6e65 ..........Japane
0024dfb0: 7365 0a00 0000 0000 0009 004d 616e 6461 se.........Manda
0024dfc0: 7269 6e0a 0000 0000 0000 0800 456e 676c rin.........Engl
0024dfd0: 6973 680a 0000 0000 0000 0001 000a 0000 ish.............
-0024dfe0: 0000 0000 0000 0000 90a7 c088 0100 0a00 ................
+0024dfe0: 0000 0000 0000 0000 9067 6669 0100 0a00 .........gfi....
0024dff0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0024e000: 0510 0000 0100 0000 0000 0000 2701 0000 ............'...
0024e010: 0100 0000 8500 0600 841f 7900 e81e 9900 ..........y.....
0024e020: 701e 7500 081e 6800 901d 7600 0c1d 8100 p.u...h...v.....
0024e030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0024e040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0024e050: 0000 0000 0000 0000 0000 0000 0000 0000 ................

@firebird-automations

Commented by: Damyan Ivanov (dam)

I tried to find out what actual data makes the differences. Looking at the first difference:

--- employee-1.fdb
+++ employee-2.fdb
0024d9e0: 65 6d 65 6e 74 73 2e 0a 00 00 00 00 00 00 00 00 ements..........
0024d9f0: 5500 3000 0000 0000 0200 0000 7d00 0000 U.0.........}...
0024da00: 0100 0000 0101 0100 1100 2800 0500 0000 ..........(.....
0024da10: 5500 0000 0300 1100 0000 0000 0000 0000 U...............
0024da20: 0100 0000 0100 0000 0500 0000 0800 456e ..............En
0024da30: 676c 6973 680a 0000 0000 0000 0007 0046 glish..........F
0024da40: 7265 6e63 680a 0a0a 0000 0000 0000 0800 rench...........
0024da50: 5370 616e 6973 680a 0000 0000 0000 0001 Spanish.........
-0024da60: 000a 7461 6c69 616e 0a00 0000 90a7 c088 ..talian........
+0024da60: 000a 7461 6c69 616e 0a00 0000 9067 6669 ..talian.....gfi
0024da70: 01 00 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0024da80: 00 00 00 00 00 00 00 00 00 00 00 00 55 00 30 00 ............U.0.

This is a data page, containing record data. The record is spread from 0x24d9e8 to 0x24da80. It is flagged as stream BLOB (flags 0x30), but appears to contain ARRAY data. The array has 5 (6?) elements of type VARCHAR(15) - "English\n", "French\n", "Spanish\n" and "\n". The storage of the values takes 17 bytes - 2 for the length of the VARCHAR and 15 for the actual data. The actual data is padded so that it takes 15 bytes, and this padding is where the difference is.

I'll see if I can make fdb-r15y-prune wipe these unused padding bytes, but that would require some extra intelligence, like linking record data to the table structure. Currently it is relatively simple: it examines pages one by one and wipes unused ranges based only on the page header.
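
To make the padding problem concrete, here is a minimal Python sketch that zeroes the unused tail of a single VARCHAR(15) slot, using the 17-byte layout described above (2-byte length, then 15 data bytes). The function name is hypothetical, and the sketch deliberately ignores the hard part, which is locating such slots inside a record.

```python
import struct

def wipe_varchar_padding(slot, max_len=15):
    """Zero the bytes after the declared length in one VARCHAR slot.

    Assumed layout: 2-byte little-endian length, then max_len data bytes.
    """
    (length,) = struct.unpack_from("<H", slot, 0)
    if len(slot) != 2 + max_len or length > max_len:
        raise ValueError("slot does not match the assumed layout")
    return slot[:2 + length] + bytes(max_len - length)

# The "\n" element from the two builds: length 1, then 14 bytes of stale data.
build1 = bytes.fromhex("0100" "0a" "74616c69616e0a000000" "90a7c088")
build2 = bytes.fromhex("0100" "0a" "74616c69616e0a000000" "90676669")
print(wipe_varchar_padding(build1) == wipe_varchar_padding(build2))
```

In both builds the padding also carries the leftover text "talian\n", so the padding is not just uninitialized at the very end but holds remnants of earlier values.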

@firebird-automations

Commented by: Damyan Ivanov (dam)

The more I delve into the details, the more I think that making the engine produce byte-for-byte equal results when given equal input is not the way to go. There are too many moving parts in the engine, and there is more than one way to describe the same data in the database file.

Instead, I'd rather defer the creation of the security database and all others (employee, qli's, help) to installation time.

This is easily done for security3.fdb, which is created using simple SQL script.

employee.fdb is mostly populated with SQL, but array and blob data is added via empbuild.e. Firebird 3 already supports feeding text blobs via SQL, but arrays are still a problem (See CORE710).

My solution for the Debian package will be to 1) remove employee.fdb; and 2) create security3.fdb upon package installation. This way no .fdb files are part of the packages, making them deterministic.

(qli is not part of the Debian packages since 2011)

I am not happy to drop employee.fdb from the packages since it is a nice sample database using many features. I think that the ideal fix would be to rework employee.fdb a bit to avoid usage of arrays, and create it entirely out of plain SQL.

@firebird-automations

Commented by: @hvlad

You could try using ServerMode = Classic while building the databases.
My guess is that some non-determinism is introduced by the background garbage collector.

@firebird-automations

Commented by: @dyemanov

You may package employee.fbk instead of .fdb, its layout should be more consistent. Those who need examples will restore this backup themselves.

@firebird-automations

Commented by: @AlexPeshkoff

Or an appropriate gbak command to restore the database may be added to the package install script.

As for security.db: the SQL script that creates it is just a legacy way to install the legacy plugin. The SRP plugin creates all required data structures itself on first run, and this is the recommended way to go for new plugins. I.e. if you do not use legacy auth, security.db may be created empty at install time.

@firebird-automations

Commented by: Damyan Ivanov (dam)

Thanks for the replies,

gbak may be the way to go (backup during build, restore during install), but still the backups are different every time:

$ gbak -user sysdba -b emp.fdb emp.fbk1
$ gbak -user sysdba -b emp.fdb emp.fbk2
$ gbak -user sysdba -b emp.fdb emp.fbk3
$ sha1sum emp.fbk*
6d98a1cd20a24919e4a953dc44cb447f082147e5 emp.fbk1
b5b9bdec406cf54edd48c3c5603f6b1edea73ddb emp.fbk2
08fb37bc241b50874d96e1d5b7b221966e987eff emp.fbk3

$ diffoscope emp.fbk1 emp.fbk2
--- emp.fbk1
+++ emp.fbk2
@@ -1,11 +1,11 @@
00000000: 0002 040a 0000 0004 0401 0000 0005 0401 ................
00000010: 0000 0006 0400 0004 0008 0401 0000 0007 ................
00000020: 0765 6d70 2e66 6462 0118 5765 6420 4a75 .emp.fdb..Wed Ju
-00000030: 6c20 3236 2031 393a 3135 3a34 3120 3230 l 26 19:15:41 20
+00000030: 6c20 3236 2031 393a 3135 3a34 3320 3230 l 26 19:15:43 20
00000040: 3137 000e 0e04 0300 0000 0504 0020 0000 17........... ..
00000050: 0804 204e 0000 0c04 0100 0000 0107 656d .. N..........em
00000060: 702e 6664 6200 0107 0753 514c 2433 3633 p.fdb....SQL$363
00000070: 0b04 4e4f 4e45 0002 0109 4649 5253 544e ..NONE....FIRSTN
00000080: 414d 4508 0425 0000 000a 040f 0000 0009 AME..%..........
00000090: 0400 0000 000b 0400 0000 0029 040f 0000 ...........)....
000000a0: 002a 0400 0000 002b 0400 0000 0019 0753 .*.....+.......S
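
When the outputs have the same length, as with a changed timestamp string, the differing byte ranges can be located without reading a hex dump. The following Python sketch is a hypothetical helper, not part of gbak or diffoscope:

```python
def differing_ranges(a, b):
    """Return [(start, end)] half-open ranges where equal-length inputs differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    ranges, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y and start is None:
            start = i                  # a differing run begins
        elif x == y and start is not None:
            ranges.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, len(a)))
    return ranges

# The two timestamp strings embedded in the backups shown above.
first = b"Wed Jul 26 19:15:41 2017"
second = b"Wed Jul 26 19:15:43 2017"
print(differing_ranges(first, second))
```

A small number of short ranges suggests a timestamp or counter; long or numerous ranges suggest that the content itself is laid out differently.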

Would you accept a patch replacing the arrays in employee.fdb with detail tables (making it rebuildable via SQL)? I'd rather not spend more time on binary formats.

@firebird-automations

Commented by: @AlexPeshkoff

I dislike such a patch. Arrays are not an actively supported feature, but they are not even deprecated. As for the difference you see: if it's the only difference, it's not too big a problem. gbak stores the current datetime, and of course it changes. Maybe adding a switch that sets that time to the day when the 'official' set of packages is built would be a good fix?

@firebird-automations

Commented by: Damyan Ivanov (dam)

Alex,

Yes, I would try such a switch if it existed. But maybe it is not necessary after all. I discovered the 'faketime' utility which can fool programs (via a library loaded using LD_PRELOAD) about the current time. I'll report back if it works with gbak.

@firebird-automations

Commented by: @AlexPeshkoff

Damyan, re

> I would try such a switch if it existed

I meant that if the missing switch turns out to be the only remaining problem, it's not too hard to add.

@firebird-automations

Commented by: Damyan Ivanov (dam)

It seems the date is not the only source of differences. See https://tests.reproducible-builds.org/debian/dbdtxt/unstable/amd64/firebird3.0_3.0.2.32703.ds4-8.diffoscope.txt

The major difference here is the contents of the backup file. This is probably because the data in the database is not guaranteed to have any particular order. I doubt this is solvable or even worth solving.

To me it seems that the only solution to this is to use plain SQL in the package. Today this means not filling the array columns; the other "solution" is to not ship the employee example database at all.
