Key: CORE-1324
Type: New Feature
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Adriano dos Santos Fernandes
Reporter: KIMURA, Meiji
Votes: 0
Watchers: 1
Firebird Core

Japanese character set CP943C

Created: 15/Jun/07 07:42 PM   Updated: 26/Oct/07 11:46 AM
Component/s: Charsets/Collation
Affects Version/s: 2.0.0, 2.0.1, 2.1 Initial, 2.1 Alpha 1
Fix Version/s: 2.1 Beta 2

Environment: Firebird 2.0 or later.


Description
In Firebird 2.0 and later, the character set conversion method changed, so the "Windows-31J" extension
characters can no longer be used. (Details are in the quoted mail below.)

This is a severe problem for Japanese users. A typical developer uses Delphi with FB 1.0 or 1.5 on a Windows server and
relies on the "Windows-31J" extension; the same code doesn't work on FB 2.0 or later. As a result, many Japanese users
cannot migrate from FB 1.x to 2.x.

Please add character set 'cp932' to FB 2.1.
# I will help to test it.

Fortunately, ICU has a 'windows-31j' converter, so it can be used to support 'cp932'.

Regards,
KIMURA, Meiji(FAMILY, Given)



//--> Quoted mail below
 [Firebird-devel] Firebird 2.x cannot handle some Japanese characters in SJIS_0208 environment.

KIMURA, Meiji wrote:
> In Firebird 1.x and InterBase 6.x or later, 'SJIS_0208' *IS* Shift_JIS in IANA.
> But when client and server use the same character set 'SJIS_0208',
> no character set conversion takes place. As a result, the 'Windows-31J' extension can be used
> with no error.
>
In previous versions there was a direct (special) converter from SJIS to
something else; this converter was removed, and the conversion is now done
through Unicode.

> But in a Firebird 2.0 environment, even if the same character set 'SJIS_0208' is used between
> client and server, Unicode is used as a pivot character set. As a result,
> we cannot use the "Windows-31J" extension.
>
I've already heard this, maybe from Daiju.

> It seems that the same problem occurs in MySQL 4.1.
> In the case of MySQL, there is no conversion in version 4.0 or before, but in
> version 4.1 or later Unicode is used as a pivot character set, so
> the same problem occurs.
>
> MySQL supports character set 'cp932' as a measure for this problem.
> cp932 means 'Windows Codepage 932'. cp932 *IS* Windows-31J in IANA.
>
> I suppose that if Firebird 2.0 supports character set 'cp932', we can avoid this problem.
> # When using the ICU routines, use 'windows-31j' instead of 'shift-jis'.
>
This seems to be the way to go.


Adriano
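
The effect described in the mail can be reproduced outside Firebird. The following is a minimal sketch, assuming Python's built-in 'shift_jis' and 'cp932' codecs as stand-ins for a strict Shift_JIS converter and a Windows-31J converter; they are not Firebird's or ICU's tables, and the helper names are hypothetical. It sends a Windows-31J extension character through Unicode as a pivot, once per character set.

```python
# Illustration only: Python's codecs stand in for the converters discussed
# in the mail. They are not the tables used by Firebird or ICU.

def convert_via_unicode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between character sets using Unicode as the pivot."""
    return data.decode(source).encode(target)

# 0x8740 is the circled digit one, a Windows-31J extension character
# that lies outside JIS X 0208.
EXTENSION_CHAR = b"\x87\x40"

def survives(data: bytes, charset: str) -> bool:
    """Report whether data survives a same-charset trip through Unicode."""
    try:
        return convert_via_unicode(data, charset, charset) == data
    except UnicodeError:
        # The strict converter has no mapping for the extension character.
        return False

if __name__ == "__main__":
    for charset in ("shift_jis", "cp932"):
        print(charset, survives(EXTENSION_CHAR, charset))
```

Without a pivot the bytes would simply be passed through unchanged, which is why the extension characters appeared to work before the conversion method changed.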

Adriano dos Santos Fernandes added a comment - 16/Jun/07 10:48 PM
ICU has CP932 too.
Is it different from Windows-31j or an alias?

KIMURA, Meiji added a comment - 18/Jun/07 10:28 AM
I think that there are three candidates for handling the shift_jis extension.

Converter Explorer
http://demo.icu-project.org/icu-bin/convexp
(1) ibm-942_P12A-1999
(2) ibm-943_P15A-2003
(3) ibm-943_P130-1999

At this time, I think (2) is the best candidate. But some kanji characters are mapped to multiple codes, so it may not be good for Firebird 2.0, which uses Unicode as the pivot.

Please wait a week or so; I will test and make these differences clear.
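
The concern about characters mapped to multiple codes can be sketched as follows. This assumes Python's built-in 'cp932' codec (Microsoft's mapping) purely as an illustration; the ICU ibm-942/ibm-943 tables listed above may differ in detail, and the helper names are hypothetical. Two different byte sequences decode to the same Unicode character, so after a trip through the Unicode pivot at most one of them can come back unchanged.

```python
# Illustration only: Python's 'cp932' codec, not the ICU converters above.

def round_trip(data: bytes, charset: str = "cp932") -> bytes:
    """Decode to Unicode (the pivot) and encode back to the same charset."""
    return data.decode(charset).encode(charset)

# Both codes decode to U+2252 (APPROXIMATELY EQUAL TO OR THE IMAGE OF):
# 0x81E0 comes from JIS X 0208, 0x8790 from the NEC special characters row.
DUPLICATES = (b"\x81\xe0", b"\x87\x90")

def report(codes, charset: str = "cp932"):
    """Return (input hex, Unicode code point, output hex) for each code."""
    rows = []
    for code in codes:
        char = code.decode(charset)
        rows.append((code.hex(), f"U+{ord(char):04X}", round_trip(code, charset).hex()))
    return rows

if __name__ == "__main__":
    for row in report(DUPLICATES):
        print(*row)
```

Comparing the first and last columns of the output shows which of the duplicate codes is normalized to the other by the pivot.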

KIMURA, Meiji added a comment - 21/Jun/07 08:18 AM
I want to write a program to check the conversion specification.

Which Unicode 'Internal Converter Name' is used for the pivot encoding in Firebird 2.0?

Adriano dos Santos Fernandes added a comment - 21/Jun/07 10:04 PM
Please try a snapshot build >= 16169.
It has CP932, but I need you to edit intl/fbintl.conf, trying the other ICU charsets to see what is better.
You need to leave one "collation CP932" uncommented in each try, and restart the server after editing the file:
<charset CP932>
intl_module fbintl
collation CP932 ibm-942_P12A-1999
# collation CP932 ibm-943_P130-1999
# collation CP932 ibm-943_P15A-2003
collation CP932_UNICODE
</charset>

Please report here.

Dimitrios Chr. Ioannidis added a comment - 22/Jun/07 06:04 AM - edited
Adriano,

 Actually, (1.)16169 is the revision of the writeBuildNum.sh file. As a result of your commits the HEAD branch build number increased to 16012, so he must try a snapshot build >= 16012.

regards,

Adriano dos Santos Fernandes added a comment - 11/Jul/07 09:45 PM
Meiji, did you test it?

KIMURA, Meiji added a comment - 12/Jul/07 08:18 PM
Sorry, not yet.

I tried this on Firebird 2.1 Beta, but it failed. I tried the following.

(1) Add CP932 definition to intl/fbintl.conf
<charset CP932>
intl_module fbintl
collation CP932 ibm-942_P12A-1999
collation CP932_UNICODE
</charset>

(2) restart firebird server
(3) run SQL Script intl.sql in misc/
(4) SQL> execute procedure sp_register_character_set('CP932',4);

But the error message said CHARACTER SET CP932 is not installed.

Do I have to use something newer than FB 2.1 Beta, or is there something else to do?

KIMURA, Meiji added a comment - 12/Jul/07 08:43 PM
Today I tried overwriting the FB 2.1 Beta installation with the latest FB 2.1 snapshot.

KIMURA, Meiji added a comment - 12/Jul/07 08:58 PM
It works. It seems that 'collation CP932 ibm-943_P130-1999' is good for this purpose; I will test it in detail over the next 2 or 3 days.

KIMURA, Meiji added a comment - 17/Jul/07 06:24 PM
I tested it. Please implement this function as below.

(i) charset name 'CP943C'
(ii) use ibm-943_P15A-2003

(i)
 Strictly speaking, ICU doesn't have CP932. CP943C is a superset of CP932.
If we use the name 'CP932', it will throw Japanese FB users into confusion.
So we have to use the name 'CP943C'.

(ii)
 It seems that the specification of 'ibm-943_P15A-2003' is good for ordinary Japanese FB users.

As a result, these are good choices for this issue. This function will save a lot of Japanese users.

<charset CP943C>
intl_module fbintl
collation CP943C ibm-943_P15A-2003
collation CP943C_UNICODE
</charset>

Minoru Yoshida added a comment - 15/Oct/07 05:14 AM
Hi,

Thanks for the addition of the new spec.
I tested the CP943C character set and am making reports.
http://timeful.co.jp/fbmap/

Connections that use the same character set on both sides work fine.
There are some problems with connections between different character sets.
(They are marked in red font in the report.)

1. CP943C to UTF8(and UNICODE_FSS)

The following characters are wrong.

- 0x8790 - 879C (9 chars)
- 0xED40 - EEFC (374 chars)

2. CP943C to SJIS_0208

The following characters are wrong.

- 0x7E
- 0x815F - 81CA(7 chars)
- 0x8740 - 879C(83 chars)
- 0xED40 - EEFC(374 chars)

Note:
ibm-943_P15A-2003
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&s=ALL

the bytes 81
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&b=81&s=ALL#layout

the bytes 87
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&b=87&s=ALL#layout

the bytes ED
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&b=ED&s=ALL#layout

the bytes EE
http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&b=EE&s=ALL#layout

Regards,
Minoru
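
A scan of the kind reported above can be sketched outside Firebird. This is a minimal sketch, assuming Python's 'cp932' and 'shift_jis' codecs as stand-ins for CP943C and SJIS_0208; they are not the ICU ibm-943_P15A-2003 table or Firebird's converters, so the codes found may differ from the report. The function name is hypothetical. It lists double-byte codes in a range that decode in the source character set but have no encoding in the target.

```python
# Illustration only: Python codecs stand in for CP943C and SJIS_0208.

def lost_in_conversion(first: int, last: int,
                       source: str = "cp932", target: str = "shift_jis"):
    """Return double-byte codes in [first, last] that exist in source
    but cannot be encoded in target after decoding to Unicode."""
    lost = []
    for code in range(first, last + 1):
        lead, trail = code >> 8, code & 0xFF
        # Skip values that are not valid Shift_JIS trail bytes.
        if trail < 0x40 or trail > 0xFC or trail == 0x7F:
            continue
        raw = bytes([lead, trail])
        try:
            text = raw.decode(source)
        except UnicodeDecodeError:
            continue  # unassigned in the source character set
        try:
            text.encode(target)
        except UnicodeEncodeError:
            lost.append(code)
    return lost

if __name__ == "__main__":
    for first, last in ((0x8740, 0x879C), (0xED40, 0xEEFC)):
        codes = lost_in_conversion(first, last)
        print(f"0x{first:04X} - 0x{last:04X}: {len(codes)} codes without a target mapping")
```

Codes that decode to a character which the target encodes under a different byte value are not listed by this sketch; it only reports characters with no target encoding at all.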


Minoru Yoshida added a comment - 16/Oct/07 05:47 PM
I made mistakes. This thread was fixed.....
And the handling of the 0x7E character is good (maybe) in isql.
I will retest and make a new thread.

Regards,
Minoru