New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Malformed string when trying to cast text containing national (cyryllic) characters to UTF8 if field was created in charset = NONE [CORE3830] #4172
Comments
Commented by: @asfernandes You want to insert invalid data and Firebird told it to you. What's wrong? |
Commented by: @pavel-zotov First question appeared due to my misunderstanding. And what about RDB$EXCEPTION.RDB$MESSAGE ? |
Commented by: @dyemanov The only valid point here is whether RDB$MESSAGE should be Unicode in ODS12 and if not, why. The rest of the ticket is useless. |
Commented by: @asfernandes And the RDB$MESSAGE was discussed already, but no agreement so far AFAIR. Not a matter for the tracker. |
Modified by: @pcisarstatus: Resolved [ 5 ] => Closed [ 6 ] |
Commented by: @mrotteveel Firebird can convert the NONE to WIN1251 because the text in you query was most likely a WIN1251 text and was inserted as is. When you then try to convert NONE to UTF8 it fails because there are invalid byte combinations for UTF8 encoding (ie first byte for a character is > 127). |
Commented by: Paul Gardner (cantak) Why? The É character is 144 or C9. If I have CHARACTER SET NONE and I cast as UTF8 it throws an error. Why can't this be converted to UTF8 as C3 89? How can someone perform a cast on data? |
Commented by: @dyemanov It *can* be converted if the source value is defined as e.g. WIN1250 (or whatever charset defining É as 0xC9). But it *cannot* be converted if the source value is defined as NONE, because the engine has no clue what real (OS level, application level, etc) charset was used when populating that value. It's simply impossible to convert from "unknown" to anything else. The bytes are treated as UTF8 and validation fails because the input is not a well-formed UTF8 string. |
Commented by: Paul Gardner (cantak) I understand about the NONE character set. However I tried an ASCII field and it also fails to convert to UTF8. |
Submitted by: @pavel-zotov
C:\MIX\firebird\misc>isql
Use CONNECT or CREATE DATABASE to specify a database
SQL> create database 'tcs2.fdb'; commit;
SQL> show database;
Database: tcs2.fdb
Owner: SYSDBA
PAGE_SIZE 4096
Number of DB pages allocated = 196
Sweep interval = 20000
Forced Writes are ON
Transaction - oldest = 3
Transaction - oldest active = 4
Transaction - oldest snapshot = 4
Transaction - Next = 5
ODS = 11.2
Default Character set: NONE
SQL> select a.mon$character_set_id cset_id,c.rdb$character_set_name cset_name
CON> from mon$attachments a
CON> left join rdb$character_sets c on a.mon$character_set_id=c.rdb$character_set_id
CON> where a.mon$attachment_id=current_connection;
CSET_ID CSET_NAME
======= ===============================================================================
0 NONE
SQL> recreate table ttt_none(msg varchar(60)); commit;
SQL> insert into ttt_none values('some message'); commit;
SQL> select cast(msg as varchar(60) character set utf8) msg from ttt_none;
MSG
===============================================================================
some message
SQL> insert into ttt_none values('некоторое сообщение'); commit; -- some text in russian here
SQL> select cast(msg as varchar(60) character set utf8) msg from ttt_none;
MSG
===============================================================================
some message
Statement failed, SQLSTATE = 22000
Malformed string
SQL>
But:
-----
SQL> select cast(msg as varchar(60) character set win1251) msg from ttt_none;
MSG
some message
некоторое сообщение -- is extracted OK and seems readable.
------------
PS.
SQL> show version;
ISQL Version: WI-V2.5.1.26351 Firebird 2.5
Server version:
Firebird/x86/Windows NT (access method), version "WI-V2.5.1.26351 Firebird 2.5"
Firebird/x86/Windows NT (remote server), version "WI-V2.5.1.26351 Firebird 2.5/XNet (CSPROG)/P12"
Firebird/x86/Windows NT (remote interface), version "WI-V2.5.1.26351 Firebird 2.5/XNet (CSPROG)/P12"
on disk structure version 11.2
PPS.
The RDB$EXCEPTION table contains field RDB$MESSAGE that is created in charset = NONE.
Is it correct ?
The text was updated successfully, but these errors were encountered: