wrong order for Chinese characters with CHARACTER SET GBK [CORE1335] #1754

Open
firebird-automations opened this issue Jul 2, 2007 · 12 comments

Comments

@firebird-automations

Submitted by: henryxu (henryxu)

Attachments:
TESTGBK.FDB
InsertSQL.sql

ORDER BY returns Chinese characters in the wrong order when the database uses CHARACTER SET GBK.

Test case:
1. Create Database
SET SQL DIALECT 3;
CREATE DATABASE 'D:\TestGBK.fdb'
USER 'SYSDBA' PASSWORD 'masterkey'
PAGE_SIZE 4096
DEFAULT CHARACTER SET GBK;

2. Create Table
CREATE TABLE TB_CHINAPORT (
FID BIGINT NOT NULL,
FPORTCH VARCHAR(20),
FPORTEN VARCHAR(20));

3. Insert Record
INSERT INTO TB_CHINAPORT (FID, FPORTCH, FPORTEN) VALUES (1, '上海', 'SHANGHAI');
INSERT INTO TB_CHINAPORT (FID, FPORTCH, FPORTEN) VALUES (2, '宁波', 'NINGBO');
INSERT INTO TB_CHINAPORT (FID, FPORTCH, FPORTEN) VALUES (3, '深圳', 'SHENZHEN');
INSERT INTO TB_CHINAPORT (FID, FPORTCH, FPORTEN) VALUES (4, '温州', 'WENZHOU');
INSERT INTO TB_CHINAPORT (FID, FPORTCH, FPORTEN) VALUES (5, '厦门', 'XIAMEN');
INSERT INTO TB_CHINAPORT (FID, FPORTCH, FPORTEN) VALUES (6, '湛江', 'ZHANJIANG');
INSERT INTO TB_CHINAPORT (FID, FPORTCH, FPORTEN) VALUES (7, '大连', 'DALIAN');
INSERT INTO TB_CHINAPORT (FID, FPORTCH, FPORTEN) VALUES (8, '青岛', 'QINGDAO');
INSERT INTO TB_CHINAPORT (FID, FPORTCH, FPORTEN) VALUES (9, '天津', 'TIANJIN');
INSERT INTO TB_CHINAPORT (FID, FPORTCH, FPORTEN) VALUES (10, '汕头', 'SHANTOU');
INSERT INTO TB_CHINAPORT (FID, FPORTCH, FPORTEN) VALUES (11, '张家港', 'ZHANGJIAGANG');
INSERT INTO TB_CHINAPORT (FID, FPORTCH, FPORTEN) VALUES (12, '广州', 'GUANGZHOU');

P.S.
Field "FPORTCH": the port name in Chinese characters.
Field "FPORTEN": the pinyin spelling of field "FPORTCH".

4. Test "ORDER BY"
SELECT FPORTCH, FPORTEN
FROM TB_CHINAPORT
ORDER BY 1;

The result is in the wrong order:

FPORTCH FPORTEN

上海 SHANGHAI
厦门 XIAMEN
大连 DALIAN
天津 TIANJIN
宁波 NINGBO
广州 GUANGZHOU
张家港 ZHANGJIAGANG
汕头 SHANTOU
深圳 SHENZHEN
温州 WENZHOU
湛江 ZHANJIANG
青岛 QINGDAO

The correct order (the same as ordering by FPORTEN):

FPORTCH FPORTEN

大连 DALIAN
广州 GUANGZHOU
宁波 NINGBO
青岛 QINGDAO
上海 SHANGHAI
汕头 SHANTOU
深圳 SHENZHEN
天津 TIANJIN
温州 WENZHOU
厦门 XIAMEN
张家港 ZHANGJIAGANG
湛江 ZHANJIANG

P.S.:
If the CHARACTER SET is GB2312, the result is correct.

@firebird-automations

Commented by: henryxu (henryxu)

1. CHARACTER SET is GBK
2. Created by Firebird 2.1 beta1 16038

@firebird-automations

Modified by: henryxu (henryxu)

Attachment: TESTGBK.FDB [ 10450 ]

@firebird-automations

Modified by: henryxu (henryxu)

environment: System: Winxp Sp2
CPU: P4 2.6MHZ
GUI tool: IBExpert 2007.06.05
FB Version: Firebird 2.1 beta1(15999-16038)

=>

System: Winxp Sp2
CPU: P4 2.6GHZ
GUI tool: IBExpert 2007.06.05
FB Version: Firebird 2.1 beta1(15999-16038)

@firebird-automations

Commented by: henryxu (henryxu)

Sorry, the database file is large, so I have uploaded the INSERT script instead.

@firebird-automations

Modified by: henryxu (henryxu)

Attachment: InsertSQL.sql [ 10451 ]

@firebird-automations

Commented by: @pcisar

Zip it and use "Attach file".

@firebird-automations

Commented by: morgan mo (morgan)

It looks like the rows are ordered by UCS (Unicode) code point.
The Chinese characters in GBK fall into two parts: the GB2312 part (GBK/2: B0A1-F7FE) and the GB 13000.1 extension characters.

The GB2312 part contains 6763 Chinese characters, the same as GB2312, so these characters are ordered by pinyin and by strokes, as in GB2312.
The GB 13000.1 extension characters are divided into two ranges. GBK/3 (8140-A0FE) contains 6080 CJK characters, ordered by UCS. GBK/4 (AA40-FEA0) contains 8160 CJK characters plus other extension characters; the CJK part is ordered by UCS and the other extension characters follow the order of the Kangxi Dictionary.

The simplest way to handle the ordering of GBK Chinese characters may be to add a collation based on GB2312.
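
The "ordered by UCS" observation can be checked outside Firebird. The following is a minimal Python sketch, not Firebird's collation code: it sorts the twelve test rows once by Unicode code point and once by their GBK byte encoding. The Chinese port names are an assumption, reconstructed from the pinyin in FPORTEN because the original text appears as '??' in the report; `PORTS`, `order_by_code_point` and `order_by_gbk_bytes` are illustrative names.

```python
# Port names reconstructed from the pinyin in FPORTEN (an assumption:
# the original Chinese text was lost and shows as '??' in the report).
PORTS = [
    ("上海", "SHANGHAI"), ("宁波", "NINGBO"), ("深圳", "SHENZHEN"),
    ("温州", "WENZHOU"), ("厦门", "XIAMEN"), ("湛江", "ZHANJIANG"),
    ("大连", "DALIAN"), ("青岛", "QINGDAO"), ("天津", "TIANJIN"),
    ("汕头", "SHANTOU"), ("张家港", "ZHANGJIAGANG"), ("广州", "GUANGZHOU"),
]

# The order reported as wrong in the issue, identified by FPORTEN.
REPORTED_WRONG_ORDER = [
    "SHANGHAI", "XIAMEN", "DALIAN", "TIANJIN", "NINGBO", "GUANGZHOU",
    "ZHANGJIAGANG", "SHANTOU", "SHENZHEN", "WENZHOU", "ZHANJIANG", "QINGDAO",
]


def order_by_code_point(rows):
    """Sort by the Chinese name; Python compares str by Unicode code point."""
    return [en for ch, en in sorted(rows, key=lambda row: row[0])]


def order_by_gbk_bytes(rows):
    """Sort by the raw GBK byte sequence of the Chinese name."""
    return [en for ch, en in sorted(rows, key=lambda row: row[0].encode("gbk"))]


by_code_point = order_by_code_point(PORTS)
by_gbk_bytes = order_by_gbk_bytes(PORTS)

print("code point order:", by_code_point)
print("GBK byte order:  ", by_gbk_bytes)
print("matches reported order:", by_code_point == REPORTED_WRONG_ORDER)
```

With the reconstructed names, the code point sort reproduces the order reported as wrong, which supports the UCS explanation. The GBK byte sort is much closer to the pinyin order but need not be identical to sorting by FPORTEN, because the encoding groups characters by the pinyin of a single character rather than by the full spelling of the name.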

@firebird-automations

Commented by: @asfernandes

Yes, it sorts in binary encoding order.
What about the GBK_UNICODE collation? Isn't it sufficient?

@firebird-automations

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

@firebird-automations

Commented by: @asfernandes

Argh... I didn't want to resolve it.

@firebird-automations

Modified by: @asfernandes

status: Resolved [ 5 ] => Reopened [ 4 ]

resolution: Fixed [ 1 ] =>

@firebird-automations

Modified by: @pcisar

Workflow: jira [ 12480 ] => Firebird [ 15241 ]
