Japanese character set CP943C [CORE1324] #1743

Closed
firebird-automations opened this issue Jun 15, 2007 · 17 comments

@firebird-automations
Collaborator

Submitted by: KIMURA, Meiji (meijik)

In Firebird 2.0 and later, the character set conversion method has changed, so the "Windows-31J" extension characters
can no longer be used in FB 2.0 or later environments. (Details are in the quoted mail below.)

This is a severe problem for Japanese users. A typical developer uses Delphi with FB 1.0 or 1.5 on a Windows server and
relies on the "Windows-31J" extension; the same code does not work on FB 2.0 or later. As a result, many Japanese users cannot migrate
from FB 1.x to 2.x.

Please add the character set 'cp932' to FB 2.1.
# I will help test it.

Fortunately, the ICU routines include 'Windows-31j', so it can be used to support 'cp932'.

Regards,
KIMURA, Meiji(FAMILY, Given)

//--> Quoted mail below
[Firebird-devel] Firebird 2.x cannot handle some Japanese characters in SJIS_0208 environment.

KIMURA, Meiji wrote:
> In Firebird 1.x and InterBase 6.x or later, 'SJIS_0208' *IS* Shift_JIS in IANA terms.
> But when the client and server use the same character set 'SJIS_0208',
> no character set conversion takes place. As a result, the 'Windows-31J' extension can be used
> without error.
>
In previous versions there was a direct (special) converter from SJIS to
something else. This converter was removed, and the conversion is now done
through Unicode.

> But in a Firebird 2.0 environment, even if the same character set 'SJIS_0208' is used between
> client and server, Unicode is used as a pivot character set. As a result,
> we cannot use the "Windows-31J" extension.
>
I've already heard this, maybe from Daiju.

> It seems that the same problem occurs in MySQL 4.1.
> In the case of MySQL, there is no conversion in version 4.0 or earlier, but
> in version 4.1 or later Unicode is used as a pivot character set, so
> the same problem occurs.
>
> MySQL supports the character set 'cp932' as a workaround for this problem.
> cp932 means 'Windows Codepage 932'. cp932 *IS* Windows-31J in IANA terms.
>
> I suppose that if Firebird 2.0 supports the character set 'cp932', we can avoid this problem.
> # When using the ICU routines, use 'windows-31j' instead of 'shift-jis'.
>
This seems to be the way to go.

Adriano
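
The effect described in the quoted mail can be reproduced outside Firebird. The following is a minimal sketch in Python, assuming its built-in `shift_jis` and `cp932` codecs are acceptable stand-ins for the IANA Shift_JIS and Windows-31J mappings. They are not the converters Firebird or ICU use, and the helper names are made up for illustration.

```python
# Illustration only: Python's built-in codecs stand in for the
# Shift_JIS and Windows-31J mappings; Firebird uses its own converters.

# 0x8740 is the circled digit one (U+2460) in Windows-31J. It lives in the
# vendor extension area and is not part of JIS X 0208.
WIN31J_BYTES = b"\x87\x40"


def via_unicode_pivot(data: bytes, charset: str) -> bytes:
    """Decode to Unicode and encode back, as a pivot-based converter would."""
    return data.decode(charset).encode(charset)


def survives_pivot(data: bytes, charset: str) -> bool:
    """Return True if the bytes come back unchanged after the pivot."""
    try:
        return via_unicode_pivot(data, charset) == data
    except UnicodeError:
        # The charset has no mapping for these bytes (or for the character).
        return False


if __name__ == "__main__":
    for charset in ("shift_jis", "cp932"):
        print(charset, survives_pivot(WIN31J_BYTES, charset))
```

With no conversion at all (the FB 1.x behaviour described above) the bytes would simply pass through, which is why the extension characters appeared to work under 'SJIS_0208'.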

Commits: 5d06ef3 f044f67

@firebird-automations
Collaborator Author

Modified by: @asfernandes

assignee: Adriano dos Santos Fernandes [ asfernandes ]

@firebird-automations
Collaborator Author

Commented by: @asfernandes

ICU has CP932 too.
Is it different from Windows-31j or an alias?

@firebird-automations
Collaborator Author

Commented by: KIMURA, Meiji (meijik)

I think there are three candidates for handling the shift_jis extension.

Converter Explorer
http://demo.icu-project.org/icu-bin/convexp
(1) ibm-942_P12A-1999
(2) ibm-943_P15A-2003
(3) ibm-943_P130-1999

At this time, I think (2) is the best candidate. But some kanji characters are mapped to multiple codes, so it may not be a good fit for Firebird 2.0, which uses Unicode as a pivot.

Please wait a week or so; I will test and clarify these differences.
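
The concern about characters mapped to multiple codes can be checked with a short round-trip test. The sketch below assumes Python's built-in `cp932` codec as a stand-in; the ICU converters listed above may resolve duplicates differently, and the sample codes are chosen for illustration rather than taken from this report.

```python
# Illustration only: Python's cp932 codec, not an ICU ibm-94x converter.

def round_trip(code: bytes, charset: str = "cp932") -> bytes:
    """Convert bytes to Unicode and back, as a Unicode pivot would."""
    return code.decode(charset).encode(charset)


def changed_by_pivot(codes, charset: str = "cp932"):
    """Return the (original, result) pairs whose bytes change on the way back."""
    changed = []
    for code in codes:
        result = round_trip(code, charset)
        if result != code:
            changed.append((code, result))
    return changed


# Hypothetical samples: two codes from the vendor extension rows and one
# ordinary JIS X 0208 kanji for comparison.
SAMPLES = [b"\x87\x90", b"\xed\x40", b"\x88\x9f"]

if __name__ == "__main__":
    for original, result in changed_by_pivot(SAMPLES):
        print(original.hex(), "->", result.hex())
```

When two byte sequences decode to the same Unicode character, only one of them can come back unchanged, so data stored with the other code is silently rewritten on every pass through the pivot.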

@firebird-automations
Collaborator Author

Commented by: KIMURA, Meiji (meijik)

I want to write a program to check the conversion specification.

Which Unicode 'Internal Converter Name' is used as the pivot in Firebird 2.0?

@firebird-automations
Collaborator Author

Commented by: @asfernandes

Please try a snapshot build >= 16169.
It has CP932, but I need you to edit intl/fbintl.conf and try the other ICU charsets to see which is better.
You need to leave exactly one "collation CP932" line uncommented in each try, and restart the server after editing the file:
<charset CP932>
intl_module fbintl
collation CP932 ibm-942_P12A-1999
# collation CP932 ibm-943_P130-1999
# collation CP932 ibm-943_P15A-2003
collation CP932_UNICODE
</charset>

Please report here.
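
Switching between the candidate converters by hand is easy to get wrong, so the edit can be scripted. The sketch below is a hypothetical helper, not part of Firebird: it assumes `#` starts a comment in fbintl.conf, as in the snippet above, and it works on the file's text only. Reading and writing the file and restarting the server are left to the caller.

```python
# Hypothetical helper, not part of Firebird: rewrites fbintl.conf text so that
# exactly one "collation <charset> <converter>" line is left uncommented.
import re


def select_converter(conf_text: str, charset: str, converter: str) -> str:
    """Uncomment the chosen converter line and comment out its alternatives."""
    # Matches e.g. "collation CP932 ibm-943_P15A-2003", optionally commented.
    # "collation CP932_UNICODE" does not match because no space follows CP932.
    pattern = re.compile(
        r"^(\s*)#?\s*(collation\s+" + re.escape(charset) + r"\s+(\S+))\s*$"
    )
    out = []
    for line in conf_text.splitlines():
        match = pattern.match(line)
        if match is None:
            out.append(line)  # unrelated line, keep as-is
            continue
        indent, body, name = match.groups()
        prefix = "" if name == converter else "# "
        out.append(f"{indent}{prefix}{body}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "<charset CP932>\n"
        "intl_module fbintl\n"
        "collation CP932 ibm-942_P12A-1999\n"
        "# collation CP932 ibm-943_P130-1999\n"
        "# collation CP932 ibm-943_P15A-2003\n"
        "collation CP932_UNICODE\n"
        "</charset>\n"
    )
    print(select_converter(sample, "CP932", "ibm-943_P15A-2003"))
```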

@firebird-automations
Collaborator Author

Commented by: Dimitrios Chr. Ioannidis (dchri)

Adriano,

actually, (1.)16169 is the revision of the writeBuildNum.sh file. The HEAD branch build number increased to 16012 as a result of your commits, so he must try a snapshot build >= 16012.

regards,

@firebird-automations
Collaborator Author

Commented by: @asfernandes

Meiji, did you test it?

@firebird-automations
Collaborator Author

Commented by: KIMURA, Meiji (meijik)

Sorry, not yet.

I tried this on Firebird 2.1 Beta, but it failed. These are the steps I followed:

(1) Add CP932 definition to intl/fbintl.conf
<charset CP932>
intl_module fbintl
collation CP932 ibm-942_P12A-1999
collation CP932_UNICODE
</charset>

(2) Restart the Firebird server
(3) Run the SQL script intl.sql in misc/
(4) SQL> execute procedure sp_register_character_set('CP932',4);

But the error message said that CHARACTER SET CP932 is not installed.

Do I have to use a build newer than FB 2.1 Beta, or is there something else I need to do?

@firebird-automations
Collaborator Author

Commented by: KIMURA, Meiji (meijik)

Today I will try overwriting the FB 2.1 Beta installation with the latest FB 2.1 snapshot.

@firebird-automations
Collaborator Author

Commented by: KIMURA, Meiji (meijik)

It works. It seems that 'collation CP932 ibm-943_P130-1999' is good for this purpose. I will test it in detail over the next 2 or 3 days.

@firebird-automations
Collaborator Author

Commented by: KIMURA, Meiji (meijik)

I tested it. Please implement this feature as follows.

(i) charset name 'CP943C'
(ii) use ibm-943_P15A-2003

(i)
Strictly speaking, ICU doesn't have CP932. CP943C is a superset of CP932.
If we use the name 'CP932', it will confuse Japanese FB users.
So we have to use the name 'CP943C'.

(ii)
It seems that the specification of 'ibm-943_P15A-2003' is good for ordinary Japanese FB users.

So these are good choices for this issue. This feature will help a lot of Japanese users.

<charset CP943C>
intl_module fbintl
collation CP943C ibm-943_P15A-2003
collation CP943C_UNICODE
</charset>

@firebird-automations
Collaborator Author

Modified by: @asfernandes

summary: Please Support japanese characters cp932 => Japanese character set CP943C

@firebird-automations
Collaborator Author

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 2.1 Beta 2 [ 10190 ]

@firebird-automations
Collaborator Author

Commented by: Minoru Yoshida (timeful2)

Hi,

Thanks for adding the new spec.
I tested the CP943C character set and wrote up a report:
http://timeful.co.jp/fbmap/

Connections that use the same character set on both sides work fine.
There are some problems with connections that use different character sets.
(They are marked in red in the report.)

1. CP943C to UTF8(and UNICODE_FSS)

The following characters are wrong.

- 0x8790 - 879C (9 chars)
- 0xED40 - EEFC (374 chars)

2. CP943C to SJIS_0208

The following characters are wrong.

- 0x7E
- 0x815F - 81CA (7 chars)
- 0x8740 - 879C (83 chars)
- 0xED40 - EEFC (374 chars)

Note:
ibm-943_P15A-2003
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&s=ALL

the bytes 81
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&b=81&s=ALL#layout

the bytes 87
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&b=87&s=ALL#layout

the bytes ED
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&b=ED&s=ALL#layout

the bytes EE
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&b=EE&s=ALL#layout

Regards,
Minoru
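
A scan like the one in this report can be approximated with a short script. The sketch below assumes Python's `cp932` and `shift_jis` codecs as stand-ins for CP943C and SJIS_0208; the ICU ibm-943_P15A-2003 tables that Firebird uses may differ, so the counts it prints need not match the figures above. The function names are made up for illustration.

```python
# Illustration only: Python's cp932 and shift_jis codecs stand in for
# CP943C and SJIS_0208; the ICU ibm-943_P15A-2003 tables may differ.

# Trail bytes of a double-byte Shift_JIS code: 0x40-0xFC except 0x7F.
TRAIL_BYTES = [b for b in range(0x40, 0xFD) if b != 0x7F]


def codes_in_range(first: int, last: int):
    """Yield every well-formed two-byte code between first and last.

    Assumes both bounds have a valid double-byte lead byte.
    """
    for lead in range(first >> 8, (last >> 8) + 1):
        for trail in TRAIL_BYTES:
            value = (lead << 8) | trail
            if first <= value <= last:
                yield bytes([lead, trail])


def mismatches(first: int, last: int, source: str, target: str):
    """List codes whose bytes differ, or are lost, when converted via Unicode."""
    bad = []
    for code in codes_in_range(first, last):
        try:
            text = code.decode(source)
        except UnicodeDecodeError:
            continue  # unassigned in the source charset, nothing to convert
        try:
            converted = text.encode(target)
        except UnicodeEncodeError:
            converted = None  # character does not exist in the target charset
        if converted != code:
            bad.append((code, converted))
    return bad


if __name__ == "__main__":
    for first, last in [(0x8740, 0x879C), (0xED40, 0xEEFC)]:
        found = mismatches(first, last, "cp932", "shift_jis")
        print(f"0x{first:04X}-0x{last:04X}: {len(found)} codes differ")
```

Passing the same charset as both source and target turns this into a round-trip check through the Unicode pivot.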

@firebird-automations
Collaborator Author

Commented by: Minoru Yoshida (timeful2)

I made a mistake. This issue was already fixed...
And the handling of the 0x7E character is (maybe) fine in isql.
I will retest and open a new thread.

Regards,
Minoru

@firebird-automations
Collaborator Author

Modified by: @pcisar

status: Resolved [ 5 ] => Closed [ 6 ]

@firebird-automations
Collaborator Author

Modified by: @pcisar

Workflow: jira [ 12332 ] => Firebird [ 15468 ]
