Restore parameter to convert one byte character set to UTF-8 [CORE5963] #6217

Open
firebird-automations opened this issue Nov 13, 2018 · 3 comments

Comments

@firebird-automations
Collaborator

Submitted by: Tomasz Kujalow (tkujalow)

Jira_subtask_inward CORE4661

I think the most common scenario is converting a single-byte character set to UTF-8.
So maybe it would be possible to add a parameter that forces replacement of a single-byte character set (named by the parameter) with UTF-8 (which is always possible).

For example, we have multiple databases using WIN1250. If we pass a parameter to gbak, for example -force_convert=WIN1250, it would convert all fields in the metadata (tables, procedures, etc.) and the data in those fields to UTF8.
It could be very useful functionality.
Is it possible?
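
To illustrate the data side of the request only, here is a minimal Python sketch of re-encoding WIN1250 bytes as UTF-8. It assumes Python's `cp1250` codec is an acceptable stand-in for Firebird's WIN1250; the function name and the sample text are made up for illustration, and this is not how gbak works internally.

```python
def win1250_to_utf8(raw: bytes) -> bytes:
    """Re-encode WIN1250 (Python codec 'cp1250') bytes as UTF-8."""
    # Decoding raises UnicodeDecodeError for bytes the codec does not define.
    return raw.decode("cp1250").encode("utf-8")


# Hypothetical sample value, as it might be stored in a WIN1250 field.
sample = "zażółć gęślą jaźń"
legacy = sample.encode("cp1250")
converted = win1250_to_utf8(legacy)

# Each accented letter takes one byte in WIN1250 but two bytes in UTF-8,
# so the converted value is longer in bytes while the character count is unchanged.
print(len(legacy), len(converted))
```

The text itself converts cleanly; the byte length grows, which is one reason the field definitions have to change along with the data.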

@firebird-automations
Collaborator Author

Commented by: @asfernandes

It's not "always possible".

There are stored routines, which may use character sets and collations in their bodies.

There may be code doing "where my_field collate xxx = 'y'", and then collation xxx is not compatible with UTF8.

This seems like a task for recreating the metadata, editing the routines, and then pumping the data.
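
As a rough illustration of the "recreate metadata" step, the sketch below rewrites character set clauses in an extracted DDL script. It is a plain text substitution under stated assumptions: the function name `retarget_charset` is hypothetical, the sample column definition and its `PXW_PLK` collation are invented, and the regular expression does not understand SQL, so the same text inside routine bodies, string literals, or a `CREATE DATABASE` statement would be rewritten as well.

```python
import re


def retarget_charset(ddl: str, source: str = "WIN1250",
                     collation: str = "UCS_BASIC") -> str:
    """Replace 'CHARACTER SET <source> [COLLATE x]' with a UTF8 clause.

    Plain text substitution: it knows nothing about SQL syntax, so the
    result still has to be reviewed by hand before it is executed.
    """
    pattern = re.compile(
        rf"CHARACTER\s+SET\s+{re.escape(source)}\b(\s+COLLATE\s+\w+)?",
        re.IGNORECASE,
    )
    replacement = f"CHARACTER SET UTF8 COLLATE {collation}"
    # A callable replacement avoids backslash interpretation in the new text.
    return pattern.sub(lambda _match: replacement, ddl)


# Hypothetical column definition taken from an extracted metadata script.
column = "NAME VARCHAR(40) CHARACTER SET WIN1250 COLLATE PXW_PLK NOT NULL"
print(retarget_charset(column))
```

The routines still need manual editing afterwards, which is the part a substitution like this cannot do safely.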

@firebird-automations
Collaborator Author

Commented by: Tomasz Kujalow (tkujalow)

OK. But changing collations in routines (procedures, triggers, packages, etc.) is simple.
The most difficult part is converting the fields' character set from single-byte to UTF-8, especially for big databases (1k tables, metadata size = 35 MB).
And, importantly, any errors will occur after the restore (when the database is running) and can be repaired easily (by changing stored procedures, views, triggers).

We generally have a big problem migrating from WIN1250 (95% of all string fields) to UTF-8 for more than about 300 databases installed on our customers' computers (some of them are not connected to the internet, so we have no access to them). So we have to prepare a program that automatically converts a database from WIN1250 to UTF-8. Calling gbak in such a scenario (backup/restore) is simple and reliable. But gbak has no option to replace the character set of fields, which is the most difficult part of the whole migration.

Maybe a parameter like this:
-CONV_SC_FROM_TO_UTF8=WIN1250,UCS_BASIC
First value (WIN1250): the character set to convert from, to UTF-8.
Second value (UCS_BASIC): the collation to set on the destination UTF-8 field.

Even if it worked only for table fields, it would be a big convenience.

@firebird-automations
Collaborator Author

Commented by: @asfernandes

The problematic expressions may be embedded anywhere (expression indexes, constraints, etc.).

So it's not "reliable" to put functionality into a built-in tool when there are lots of situations in which it would not work correctly.
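
To make the concern concrete, a migration script could at least list explicit COLLATE clauses for manual review before converting anything. The sketch below is an assumption-laden illustration, not a Firebird tool: the function name is hypothetical, the set of collations treated as UTF8-compatible is supplied by the caller (the default here is only a guess at a typical list), and the scan is a line-based regular expression that will also match text inside comments and string literals.

```python
import re

COLLATE_RE = re.compile(r"\bCOLLATE\s+(\w+)", re.IGNORECASE)

# Assumed list of collations usable with UTF8; adjust for the actual server.
DEFAULT_UTF8_COLLATIONS = frozenset(
    {"UTF8", "UCS_BASIC", "UNICODE", "UNICODE_CI", "UNICODE_CI_AI"}
)


def find_explicit_collations(sql, utf8_collations=DEFAULT_UTF8_COLLATIONS):
    """Return (line_number, collation) pairs that need manual review."""
    flagged = []
    for lineno, line in enumerate(sql.splitlines(), start=1):
        for match in COLLATE_RE.finditer(line):
            name = match.group(1).upper()
            if name not in utf8_collations:
                flagged.append((lineno, name))
    return flagged


# Hypothetical routine body mixing a WIN1250 collation and a UTF8 one.
body = (
    "select * from t\n"
    "where my_field collate PXW_PLK = 'y'\n"
    "order by other_field collate UNICODE"
)
print(find_explicit_collations(body))
```

A report like this only finds the places to look at; deciding what each expression should become remains manual work.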
