UNICODE collations does not work with ICU 49 [CORE3946] #4279

Closed
firebird-automations opened this issue Oct 4, 2012 · 33 comments

Comments

@firebird-automations

Submitted by: @mkubecek

Attachments:
collation-gdb.txt

On a system with ICU 4.9, the command

create collation test_1 for UTF8 from UNICODE;

fails with

Statement failed, SQLSTATE = 42000
unsuccessful metadata update
-Invalid collation attributes

With ICU 4.4, the same command succeeds. With ICU 4.9, it fails with UTF8 and any collation but succeeds with ISO8859_1 or ISO8859_2 charset (tested all ISO8859_1 and about half of ISO8859_2 collations).

Commits: 36dcd8e 8ce4b58

@firebird-automations

Modified by: @asfernandes

assignee: Adriano dos Santos Fernandes [ asfernandes ]

@firebird-automations

Commented by: @asfernandes

Does the built-in UNICODE collation work?

What is the Linux distro?

@firebird-automations

Commented by: @mkubecek

It doesn't seem to work:

SQL> create database 'localhost:test' default character set UTF8;
SQL> create table TBL(S varchar(32) collate UNICODE);
Statement failed, SQLSTATE = 22021
unsuccessful metadata update
-TBL
-COLLATION UNICODE for CHARACTER SET UTF8 is not installed

Distribution is OpenSuSE 12.2. Tested with distribution package (2.5) and 3.0 package from

http://download.opensuse.org/repositories/home:/mkubecek:/firebird30/openSUSE_12.2/

Successful tests were on OpenSuSE 11.1 and 11.4 with 2.5 packages from

http://download.opensuse.org/repositories/home:/mkubecek:/firebird25/

@firebird-automations

Commented by: @asfernandes

Is there anything in firebird.log?

@firebird-automations

Commented by: @mkubecek

Nothing at all, neither for "create collation" nor for "create table".

@firebird-automations

Commented by: @asfernandes

Are you using 32 or 64 bit version?

Please paste the result of:
find /usr/ /lib* -name 'libicu*'

@firebird-automations

Commented by: @mkubecek

It is 64-bit version.

unicorn:~ # find /usr/ /lib* -name 'libicu*'
/usr/share/susehelp/meta/Development/Libraries/libicu-doc.desktop
/usr/share/susehelp/meta/Development/Libraries/libicu17.desktop
/usr/share/susehelp/meta/Development/Libraries/libicu-devel.desktop
/usr/lib64/libicule.so.49.1
/usr/lib64/libicui18n.so
/usr/lib64/libicutu.so.49
/usr/lib64/libiculx.so.49
/usr/lib64/libicudata.so
/usr/lib64/libiculx.so
/usr/lib64/libicuuc.so.49
/usr/lib64/libicule.so.49
/usr/lib64/libicuuc.so.49.1
/usr/lib64/libicui18n.so.49.1
/usr/lib64/libicuio.so.49
/usr/lib64/libicudata.so.49
/usr/lib64/libicutest.so.49
/usr/lib64/libicudata.so.49.1
/usr/lib64/libicuio.so.49.1
/usr/lib64/libicutu.so.49.1
/usr/lib64/libicutu.so
/usr/lib64/libiculx.so.49.1
/usr/lib64/libicule.so
/usr/lib64/libicui18n.so.49
/usr/lib64/libicuuc.so
/usr/lib64/libicutest.so
/usr/lib64/libicutest.so.49.1
/usr/lib64/libicuio.so

@firebird-automations

Commented by: @asfernandes

What's the result of the commands below?

objdump -T /usr/lib64/libicuuc.so.49 |grep 'u_init\|u_versionToString\|uloc_countAvailable\|uloc_getAvailable\|uset_close\|uset_getItem\|uset_getItemCount\|uset_open'
objdump -T /usr/lib64/libicuuc.so.49 |grep 'ucnv_open\|ucnv_close\|ucnv_fromUChars\|u_tolower\|u_toupper\|u_strCompare\|u_countChar32\|utf8_nextCharSafeBody\|UCNV_FROM_U_CALLBACK_STOP\|UCNV_TO_U_CALLBACK_STOP\|ucnv_fromUnicode'
objdump -T /usr/lib64/libicuuc.so.49 |grep 'ucnv_toUnicode\|ucnv_getInvalidChars\|ucnv_getMaxCharSize\|ucnv_getMinCharSize\|ucnv_setFromUCallBack\|ucnv_setToUCallBack'
objdump -T /usr/lib64/libicui18n.so.49 |grep 'ucol_close\|ucol_getContractions\|ucol_getSortKey\|ucol_open\|ucol_setAttribute\|ucol_strcoll\|ucol_getVersion\|utrans_open\|utrans_close\|utrans_transUChars'

@firebird-automations

Commented by: @mkubecek

mike@unicorn:~> objdump -T /usr/lib64/libicuuc.so.49 |grep 'u_init\|u_versionToString\|uloc_countAvailable\|uloc_getAvailable\|uset_close\|uset_getItem\|uset_getItemCount\|uset_open'
000000000005b3d0 g DF .text 0000000000000155 Base u_versionToString_49
00000000000e22b0 g DF .text 0000000000000012 Base uset_close_49
000000000009aae0 g DF .text 000000000000004f Base uloc_getAvailable_49
00000000000debf0 g DF .text 0000000000000094 Base uset_openPattern_49
00000000000df030 g DF .text 0000000000000005 Base uset_closeOver_49
00000000000e2200 g DF .text 000000000000004d Base uset_openEmpty_49
000000000009ab30 g DF .text 0000000000000034 Base uloc_countAvailable_49
00000000000dec90 g DF .text 00000000000000bc Base uset_openPatternOptions_49
000000000005c840 g DF .text 000000000000006f Base u_init_49
00000000000e2250 g DF .text 0000000000000060 Base uset_open_49
00000000000e26d0 g DF .text 000000000000013a Base uset_getItem_49
00000000000e2690 g DF .text 0000000000000035 Base uset_getItemCount_49

mike@unicorn:~> objdump -T /usr/lib64/libicuuc.so.49 |grep 'ucnv_open\|ucnv_close\|ucnv_fromUChars\|u_tolower\|u_toupper\|u_strCompare\|u_countChar32\|utf8_nextCharSafeBody\|UCNV_FROM_U_CALLBACK_STOP\|UCNV_TO_U_CALLBACK_STOP\|ucnv_fromUnicode'
00000000000667c0 g DF .text 000000000000014e Base ucnv_close_49
0000000000070dd0 g DF .text 0000000000000284 Base ucnv_fromUnicode_UTF8_49
0000000000066250 g DF .text 00000000000000b1 Base ucnv_openU_49
00000000000ab280 g DF .text 00000000000000d3 Base u_countChar32_49
000000000006d860 g DF .text 0000000000000002 Base UCNV_FROM_U_CALLBACK_STOP_49
0000000000066310 g DF .text 0000000000000084 Base ucnv_openCCSID_49
00000000000aa900 g DF .text 0000000000000029 Base u_strCompare_49
0000000000066240 g DF .text 0000000000000005 Base ucnv_openPackage_49
00000000000a9830 g DF .text 000000000000021a Base utf8_nextCharSafeBody_49
000000000006cc60 g DF .text 00000000000000d2 Base ucnv_openAllNames_49
00000000000caaf0 g DF .text 000000000000000e Base u_toupper_49
000000000006c060 g DF .text 0000000000000122 Base ucnv_openStandardNames_49
00000000000aa2f0 g DF .text 0000000000000158 Base u_strCompareIter_49
000000000006d870 g DF .text 0000000000000002 Base UCNV_TO_U_CALLBACK_STOP_49
0000000000066210 g DF .text 0000000000000023 Base ucnv_open_49
00000000000670e0 g DF .text 000000000000021a Base ucnv_fromUChars_49
00000000000caae0 g DF .text 000000000000000e Base u_tolower_49
0000000000066cc0 g DF .text 00000000000001d3 Base ucnv_fromUnicode_49
0000000000071060 g DF .text 0000000000000367 Base ucnv_fromUnicode_UTF8_OFFSETS_LOGIC_49

mike@unicorn:~> objdump -T /usr/lib64/libicuuc.so.49 |grep 'ucnv_toUnicode\|ucnv_getInvalidChars\|ucnv_getMaxCharSize\|ucnv_getMinCharSize\|ucnv_setFromUCallBack\|ucnv_setToUCallBack'
0000000000066ea0 g DF .text 0000000000000231 Base ucnv_toUnicode_49
0000000000068cb0 g DF .text 0000000000000057 Base ucnv_getInvalidChars_49
0000000000066aa0 g DF .text 000000000000000d Base ucnv_getMinCharSize_49
0000000000066c90 g DF .text 000000000000002f Base ucnv_setFromUCallBack_49
0000000000066c50 g DF .text 0000000000000031 Base ucnv_setToUCallBack_49
0000000000066a90 g DF .text 0000000000000005 Base ucnv_getMaxCharSize_49

mike@unicorn:~> objdump -T /usr/lib64/libicui18n.so.49 |grep 'ucol_close\|ucol_getContractions\|ucol_getSortKey\|ucol_open\|ucol_setAttribute\|ucol_strcoll\|ucol_getVersion\|utrans_open\|utrans_close\|utrans_transUChars'
00000000001191a0 g DF .text 0000000000000449 Base ucol_setAttribute_49
000000000011e090 g DF .text 0000000000000022 Base ucol_openRules_49
000000000010a6f0 g DF .text 00000000000000d4 Base ucol_openElements_49
000000000010a7d0 g DF .text 0000000000000072 Base ucol_closeElements_49
000000000011e110 g DF .text 0000000000000648 Base ucol_open_internal_49
0000000000115510 g DF .text 00000000000000dc Base ucol_getSortKey_49
0000000000138e30 g DF .text 0000000000000012 Base utrans_close_49
0000000000138dd0 g DF .text 0000000000000015 Base utrans_openInverse_49
000000000011b1a0 g DF .text 000000000000006d Base ucol_getVersion_49
00000000001227c0 g DF .text 0000000000000019 Base ucol_getContractions_49
0000000000121d40 g DF .text 0000000000000215 Base ucol_openFromShortString_49
0000000000118f60 g DF .text 000000000000000a Base ucol_openBinary_49
000000000011e760 g DF .text 0000000000000049 Base ucol_open_49
000000000011cc90 g DF .text 0000000000000034 Base ucol_openAvailableLocales_49
000000000011db90 g DF .text 00000000000004f9 Base ucol_openRulesForImport_49
000000000011b6e0 g DF .text 00000000000008dd Base ucol_strcoll_49
00000000001390c0 g DF .text 00000000000000a5 Base utrans_openIDs_49
0000000000138d00 g DF .text 00000000000000c1 Base utrans_open_49
000000000011b2e0 g DF .text 00000000000003f7 Base ucol_strcollIter_49
00000000001392d0 g DF .text 000000000000012e Base utrans_transUChars_49
000000000010d630 g DF .text 000000000000015a Base ucol_close_49
0000000000117f70 g DF .text 00000000000000fb Base ucol_getSortKeyWithAllocation_49
0000000000138ba0 g DF .text 0000000000000160 Base utrans_openU_49
0000000000122660 g DF .text 000000000000015c Base ucol_getContractionsAndExpansions_49
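
The grep patterns above match substrings, so `ucol_open` also lists `ucol_openRules_49`, `ucol_open_internal_49` and so on, and the required entry points have to be picked out by eye. A stricter check could be sketched as follows. This is only an illustration: the helper names `symbolOf` and `missingSymbols` are hypothetical, and the `_49` suffix is assumed from the output shown above.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Return the last whitespace-separated field of an `objdump -T` line,
// which is the exported symbol name (empty string for a blank line).
std::string symbolOf(const std::string& line)
{
    std::istringstream in(line);
    std::string field, last;
    while (in >> field)
        last = field;
    return last;
}

// List every required base name whose versioned form (base + "_" + suffix)
// does not appear as an exact symbol in the objdump output.
std::vector<std::string> missingSymbols(const std::vector<std::string>& objdumpLines,
                                        const std::vector<std::string>& required,
                                        const std::string& suffix)
{
    std::set<std::string> exported;
    for (const std::string& line : objdumpLines)
        exported.insert(symbolOf(line));

    std::vector<std::string> missing;
    for (const std::string& base : required)
    {
        if (exported.count(base + "_" + suffix) == 0)
            missing.push_back(base);
    }
    return missing;
}
```

Fed with the lines of the output above, the required base names and the suffix `49`, an empty result would mean every entry point is exported under the expected name.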

@firebird-automations

Commented by: @asfernandes

I do not see anything problematic.

I need you to send a backtrace of the problem, specifying the exact version/build number you're using (I'd prefer it be done with 3.0).

gdb -args isql -ch utf8
(gdb) run
create database 'test.fdb';
<ctrl c>
(gdb) catch throw
(gdb) cont
select 1 from rdb$database where 'a' = 'a' collate unicode;
-- gdb must catch an exception
(gdb) bt

@firebird-automations

Commented by: @mkubecek

This output was created with 3.0.0.30084 (svn revision 57178).

The exception is caught in src/jrd/intl.cpp, line 398, CharSetContainer::lookupCollation():

if (!lookup_texttype(tt, &info))
{
    delete tt;
    ERR_post(Arg::Gds(isc_collation_not_installed) << Arg::Str(info.collationName) <<
        Arg::Str(info.charsetName));
}

Value of info can be found in the attachment.

@firebird-automations

Modified by: @mkubecek

Attachment: collation-gdb.txt [ 12238 ]

@firebird-automations

Commented by: @asfernandes

Please run in ISQL:

show collation unicode;

@firebird-automations

Commented by: @mkubecek

SQL> show collation unicode;
UNICODE, CHARACTER SET UTF8, PAD SPACE, SYSTEM

I also played a bit more with gdb and got to this stack:

#0 Jrd::UnicodeUtil::Utf16Collation::loadICU src/common/unicode_util.cpp:1463
#1 Jrd::UnicodeUtil::Utf16Collation::create src/common/unicode_util.cpp:1143
#2 Firebird::IntlUtil::initUnicodeCollation src/common/IntlUtil.cpp:528
#3 ttype_unicode8_init src/jrd/intl_builtin.cpp:1081
#4 Jrd::IntlManager::lookupCollation src/jrd/IntlManager.cpp:636
#5 lookup_texttype src/jrd/intl.cpp:497
#6 CharSetContainer::lookupCollation src/jrd/intl.cpp:394
...

where loadICU("41.128.4.4", "", "icu_versions=default") fails

@firebird-automations

Commented by: @asfernandes

Don't know why, but the collation on your database is initialized incorrectly.

Please locate fbintl.conf and set icu_versions to 4.9:
icu_versions 4.9

Then retry. Create a new database, show the collation and test to see what happens.

@firebird-automations

Commented by: @mkubecek

Now it works (with newly created database) and output of 'show collation' is different:

SQL> create database '/srv/firebird/test3.fdb';
SQL> show collation UNICODE;
UNICODE, CHARACTER SET UTF8, PAD SPACE, 'COLL-VERSION=58.0.6.49', SYSTEM

SQL> select 1 from rdb$database where 'a' = 'a' collate unicode;

CONSTANT 

============
1

@firebird-automations

Commented by: @asfernandes

Was the test with "icu_versions default" done with a fresh new database too?

@firebird-automations

Commented by: @mkubecek

Yes (I checked again now to be sure). Could the problem be caused by some part of ICU (or something else) missing during the build?

@firebird-automations

Commented by: @mkubecek

I did the same test on OpenSuSE 12.1 with ICU 4.6 and 4.8.1 (used both for build and test) and the same version of Firebird. In both cases the collation works even with 'icu_versions = default'. So it looks like some incompatibility was introduced between ICU 4.8 and 4.9.

@firebird-automations

Commented by: @asfernandes

"default" means the version present at build time. It looks like you have a dev package installed without the actual runtime package, or there is some problem in the build include path.

Locate these lines (here 887) in src/common/unicode_util.cpp:

string version = icuVersion.isEmpty() ? versions[0] : icuVersion;
if (version == "default")
	version.printf("%d.%d", U_ICU_VERSION_MAJOR_NUM, U_ICU_VERSION_MINOR_NUM);

for (ObjectsArray<string>::const_iterator i(versions.begin()); i != versions.end(); ++i)

put a breakpoint on the last (for) line in the gdb prompt:
(gdb) b unicode_util.cpp:887

Once the breakpoint is reached, print version:
(gdb) print version.stringBuffer

Or experiment at compile time and check where U_ICU_VERSION_MAJOR_NUM and U_ICU_VERSION_MINOR_NUM are coming from and what their values are.
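
To see what the quoted lines produce for a given pair of macro values without attaching a debugger, the "default" substitution can be reproduced in isolation. The following is a minimal sketch, not the Firebird code: `resolveIcuVersion` is a hypothetical helper, `std::string` stands in for Firebird's own string class, and the macro values are passed as plain integers so that it compiles without the ICU headers.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper mirroring the quoted logic: a configured value of
// "default" is replaced by "<major>.<minor>", built from the ICU version
// numbers that were visible at compile time. Any other value is kept as-is.
std::string resolveIcuVersion(const std::string& configured, int buildMajor, int buildMinor)
{
    if (configured != "default")
        return configured;

    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%d.%d", buildMajor, buildMinor);
    return buffer;
}
```

Calling it with the values found in the installed ICU headers shows which version string the loader would go on to use.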

@firebird-automations

Commented by: @mkubecek

I get majorVersion = 49, minorVersion = 1, which after

filename.printf(ucTemplate, majorVersion, minorVersion);

gives filename = "libicuuc.so.491". This fails to load as the name should probably be "libicuuc.so.49". I checked libicu header files and indeed, version 49 (4.9) defines U_ICU_VERSION_MAJOR_NUM=49, U_ICU_VERSION_MINOR_NUM=1 while version 48 (4.8) defined U_ICU_VERSION_MAJOR_NUM=4, U_ICU_VERSION_MINOR_NUM=8.

Looking at the version macros defined by 49 (4.9) and 48 (4.8), it seems U_ICU_VERSION_SHORT might be the right one but I'm not sure it will work correctly with older versions as well. Or maybe we could just distinguish cases U_ICU_VERSION_MAJOR_NUM>4 and U_ICU_VERSION_MAJOR_NUM<=4 (and hope they won't change the scheme again).
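
The second option (branching on the major number) could look roughly like the sketch below. It is only an illustration of the idea, not the fix that was eventually committed: `icuLibraryName` is a hypothetical helper, and the rule that the minor number is dropped once the major number exceeds 4 is an assumption based on the two header versions compared above.

```cpp
#include <string>

// Hypothetical helper: build the versioned file name of libicuuc.
// Assumption from the thread: up to ICU 4.8 the macros hold e.g. major=4,
// minor=8 and the file is libicuuc.so.48; from ICU 49 on, the major number
// alone forms the suffix and the minor number is not part of the name.
std::string icuLibraryName(int major, int minor)
{
    std::string suffix = std::to_string(major);
    if (major <= 4)
        suffix += std::to_string(minor);
    return "libicuuc.so." + suffix;
}
```

With major=49 and minor=1 this yields "libicuuc.so.49" rather than "libicuuc.so.491", while major=4 and minor=8 still gives "libicuuc.so.48".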

@firebird-automations

Commented by: @asfernandes

The problem is that this is not ICU 4.9; it's really ICU 49, and they changed how the version is encoded in the filename.

Looks like these people have nothing else to do!

@firebird-automations

Modified by: @asfernandes

summary: create collation for UTF8 from UNICODE fails with ICU 4.9 => UNICODE collations does not work with ICU 49

@firebird-automations

Commented by: @asfernandes

I committed a fix for FB 3.0, without testing with ICU 49. Please test it.

@firebird-automations

Commented by: @pmakowski

I hope you will backport it to 2.5, since Mageia, Fedora and certainly other distributions are using 49 in their upcoming releases.

@firebird-automations

Commented by: @mkubecek

I confirm that both standard UNICODE collation and custom collation created from it work as expected now. Thank you.

@firebird-automations

Commented by: @asfernandes

Committed to 2.5 branch. Please test it.

@firebird-automations

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.5.2 [ 10450 ]

Fix Version: 3.0 Alpha 1 [ 10331 ]

@firebird-automations

Commented by: @mkubecek

Current 2.5 from subversion works for me. Thank you.

@firebird-automations

Commented by: @pmakowski

This is marked as fixed in 2.5.2, but that is not the case.

@firebird-automations

Modified by: @pmakowski

Fix Version: 2.5.3 [ 10461 ]

Fix Version: 2.5.2 [ 10450 ] =>

@firebird-automations

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations

Modified by: @pavel-zotov

status: Closed [ 6 ] => Closed [ 6 ]

QA Status: No test
