HASH function returns different values for the same string value due to character sets [CORE5030] #5317

Open
firebird-automations opened this issue Nov 24, 2015 · 1 comment

@firebird-automations

Submitted by: Mark Jones (mjnz)

Using a different character set changes the resulting HASH value for the same string. To reproduce:

CREATE TABLE TEST (
    TESTSTR1 VARCHAR(20) CHARACTER SET UTF8,
    TESTSTR2 VARCHAR(20) CHARACTER SET WIN1252
);

INSERT INTO TEST (TESTSTR1, TESTSTR2) VALUES ('€URO', '€URO');

SELECT
    IIF(TESTSTR1 = TESTSTR2, 'TRUE', 'FALSE') ISMATCH,
    HASH(TESTSTR1) HASH1,
    HASH(TESTSTR2) HASH2
FROM TEST;

This produces the following output: the two columns compare as equal, but their hashes differ.

ISMATCH  HASH1      HASH2
-------------------------------------
TRUE     246225519  547439

The same behaviour appears when a static value is evaluated under different client connection character sets, e.g.
SELECT HASH('€URO') FROM RDB$DATABASE;

isql -ch WIN1252....
HASH = 547439

isql -ch UTF8 ...
HASH = 246225519

I guess the solution would be for HASH to always convert text strings (or blob strings) to UTF8 before evaluating, although that would likely break existing systems that depend on the current behaviour.
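
The proposal above can be sketched outside the database as follows. This is only an illustration of the idea in Python, not Firebird code: `normalise_to_utf8` and `charset_independent_hash` are hypothetical helper names, and `zlib.crc32` is a stand-in for whatever hash is actually used.

```python
import zlib


def normalise_to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode bytes from their declared character set and re-encode as UTF-8."""
    return raw.decode(charset).encode("utf-8")


def charset_independent_hash(raw: bytes, charset: str) -> int:
    """Hash the UTF-8 form of the text, so the storage character set no longer matters.

    zlib.crc32 is only a stand-in hash for this sketch.
    """
    return zlib.crc32(normalise_to_utf8(raw, charset))


# The same text as it would be stored in each column of the test table.
stored_utf8 = "€URO".encode("utf-8")      # 6 bytes
stored_win1252 = "€URO".encode("cp1252")  # 4 bytes

# The raw bytes differ, so any byte-wise hash of them differs too.
print(stored_utf8 != stored_win1252)

# After normalising to UTF-8 first, both columns hash to the same value.
print(
    charset_independent_hash(stored_utf8, "utf-8")
    == charset_independent_hash(stored_win1252, "cp1252")
)
```

Both comparisons print `True`. The trade-off is the one already noted: any value hashed and stored under the old behaviour would no longer match for non-UTF8 columns.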

@firebird-automations

Commented by: Sean Leyne (seanleyne)

The character set changes how the characters are represented and stored as bytes, so a different hash is reasonable and expected.
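
To make this concrete, here is a small Python sketch that hashes the raw bytes of '€URO' in each encoding. The shift-and-add (PJW-style) routine is an assumption about how HASH behaves, not something stated in this report; it was chosen because it reproduces both values reported above. `shift_add_hash` is a hypothetical name.

```python
def shift_add_hash(data: bytes) -> int:
    """PJW-style hash over raw bytes, kept within 64 bits.

    Assumed for illustration; not confirmed as Firebird's HASH implementation.
    """
    h = 0
    for b in data:
        h = (h << 4) + b
        top = h & 0xF000000000000000
        if top:
            # Fold the top four bits back into the low end of the value.
            h ^= top >> 56
        # Clear the top four bits and anything beyond 64 bits.
        h &= ~top & 0xFFFFFFFFFFFFFFFF
    return h


text = "€URO"

win1252_bytes = text.encode("cp1252")  # b'\x80URO': the euro sign is one byte
utf8_bytes = text.encode("utf-8")      # b'\xe2\x82\xacURO': the euro sign is three bytes

print(shift_add_hash(win1252_bytes))   # 547439
print(shift_add_hash(utf8_bytes))      # 246225519
```

The text is the same, but the euro sign is the single byte 0x80 in WIN1252 and the three bytes 0xE2 0x82 0xAC in UTF8, so a hash computed over the stored bytes differs.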
