Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for different hash algorithms in HASH system function [CORE4436] #4756

Closed
firebird-automations opened this issue May 26, 2014 · 18 comments

Comments

@firebird-automations
Copy link
Collaborator

Submitted by: Joaquim Pais (joaquim.pais.alidata)

Votes: 3

Hi,

Different string's give the same hash result ist's a bug?

Witch max length to parameter hash function?

Example

select hash('4-20100433-01775-LOTES')
from rdb$database
union
select hash('1-20100433-01765-LOTES')
from rdb$database

Commits: 55c35b7

====== Test Details ======

Test verifies only:
1) ability to use syntax: hash(<string> using <algo>)
2) non-equality of hash results for sha1, sha256 and sha512 using _TRIVIAL_ sample from ticket.

@firebird-automations
Copy link
Collaborator Author

Commented by: @aafemt

Any hash function has collisions. It is unavoidable when dimension of values is shrinked.

@firebird-automations
Copy link
Collaborator Author

Commented by: @reevespaul

Interesting. This seems to produce the correct result:

select 1, hash('1-20100433-01765-LOTES')
from rdb$database
union
select 2, hash('2-20100433-01765-LOTES')
from rdb$database
union
select 3, hash('3-20100433-01765-LOTES')
from rdb$database
union
select 4, hash('4-20100433-01765-LOTES')
from rdb$database
union
select 5, hash('5-20100433-01775-LOTES')
from rdb$database;

but this doesn't:

select 1,hash( '1-20100433-01765-LOTES')
from rdb$database
union
select 4, hash('4-20100433-01775-LOTES')
from rdb$database;

@firebird-automations
Copy link
Collaborator Author

Commented by: Omacht András (aomacht)

Your selects are different:
First:
...
select 4, hash('4-20100433-01765-LOTES')
from rdb$database
...

vs. secound:
...
select 4, hash('4-20100433-01775-LOTES')
from rdb$database;

01765 vs 01775

If you try with 01775 in the first select the result will be the same.

(It is possible to two different string have the same hash code. See: http://en.wikipedia.org/wiki/Hash_function)

@firebird-automations
Copy link
Collaborator Author

Commented by: @reevespaul

Ah, yes. In fact this does demonstrate it correctly:

select 1, hash('1-20100433-01765-LOTES')
from rdb$database
union
select 2, hash('2-20100433-01765-LOTES')
from rdb$database
union
select 3, hash('3-20100433-01765-LOTES')
from rdb$database
union
select 4, hash('4-20100433-01775-LOTES')
from rdb$database
union
select 5, hash('5-20100433-01765-LOTES')
from rdb$database;

CONSTANT

============ =====================
1 533117221586865043
2 533117221318429587
3 533117222123735955
4 533117221586865043
5 533117221872077715

@firebird-automations
Copy link
Collaborator Author

Commented by: @reevespaul

The documentation for hash() just says "Returns a hash value for the input string. This function fully supports text BLOBs of any length and character set."

I suppose one resolution of this bug report could be to amend the documentation to indicate that collisions are possible and must be tested for. No assumption can be made about the uniqueness of a string based on its hash value. It is then left to the user to compare the original strings when two identical hash values are returned.

Perhaps the real question is "Does that make this implementation of hash() useless?".

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

FB's HASH is a function (and algorithm) ported from Yaffil. The algorithm is not a good hash function, but collisions are possible in any algorithm.

This function should be extended to support standard and well known algorithms (MD5, SHA-*) via a second parameter.

@firebird-automations
Copy link
Collaborator Author

Commented by: Sean Leyne (seanleyne)

@dimitry Sibiryakov,

While collisions have been found in MD5 and SHA-1 hash functions, making them unsuitable for the current purposes.

SHA-1 ad SHA-2 hash functions only have a **theoretical** chance of hash collision.

Further, SHA-3 hash functions don't even have a theoretical hash collision. (http://en.wikipedia.org/wiki/Cryptographic_hash_function)

So, there is a modern hash algorithm that would address this issue.

@firebird-automations
Copy link
Collaborator Author

Commented by: @dyemanov

Sean, every hash function (SHA-3 included) has collisions, this is by design. It's impossible to map longer string into a shorter one without collisions.

@firebird-automations
Copy link
Collaborator Author

Commented by: Evandro Amparo (evandroamparo)

I deduced Firebird hash algorithm while playing with some examples:

- Reverse the string
- Consider the first position p = 1
- For each char, shift it's ASCII code to the left by 4 * (p - 1) bits and add up with the previous
- The resulting sum is the hash

That's why longer strings have longer hashes, and that's why it's so weak!

It would be a great if other algorithms were added.

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

assignee: Adriano dos Santos Fernandes [ asfernandes ]

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

issuetype: Bug [ 1 ] => Improvement [ 4 ]

Component: Engine [ 10000 ]

summary: Hash Function => Support for different hash algorithms in HASH system function

Component: API / Client Library [ 10040 ] =>

@firebird-automations
Copy link
Collaborator Author

Modified by: @asfernandes

status: Open [ 1 ] => Resolved [ 5 ]

resolution: Fixed [ 1 ]

Fix Version: 4.0 Alpha 1 [ 10731 ]

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

After reading https://stackoverflow.com/questions/3475648/sha1-collision-demo-example I decided to make trivial test:

set list on;
set blob all;

recreate table test(s1 blob, s2 blob);
commit;
--insert into test(s) values('4-20100433-01775-LOTES');
--insert into test(s) values('1-20100433-01765-LOTES');

-- https://stackoverflow.com/questions/3475648/sha1-collision-demo-example
insert into test(s1, s2) values('20177527e04e05d5e7b448c1ab2b872f86831d0b', '20177527e04e05d5e7b448c1ab2b872f86831d0b');
commit;

select
s1,s2
,hash(s1 using sha1) s1_hash_using_sha1
,hash(s2 using sha1) s2_hash_using_sha1
,iif( hash(s1 using sha1) = hash(s2 using sha1), '#⁠#⁠ ACHTUNG #⁠#⁠', 'Passed, all fine.') as collision_sha1_check
from test;

select
s1,s2
,hash(s1 using sha256) s1_hash_using_sha256
,hash(s2 using sha256) s2_hash_using_sha256
,iif( hash(s1 using sha256) = hash(s2 using sha256), '#⁠#⁠ ACHTUNG #⁠#⁠', 'Passed, all fine.') as collision_sha256_check
from test;

select
s1,s2
,hash(s1 using sha512) s1_hash_using_sha512
,hash(s2 using sha512) s2_hash_using_sha512
,iif( hash(s1 using sha512) = hash(s2 using sha512), '#⁠#⁠ ACHTUNG #⁠#⁠', 'Passed, all fine.') as collision_sha512_check
from test;

quit;

Its output on build 713 is:

S1 9b:0
20177527e04e05d5e7b448c1ab2b872f86831d0b
S2 9b:1
20177527e04e05d5e7b448c1ab2b872f86831d0b
S1_HASH_USING_SHA1 8C4E783C453F8059ADCF8B52C4E57FAE93F75861
S2_HASH_USING_SHA1 8C4E783C453F8059ADCF8B52C4E57FAE93F75861
COLLISION_SHA1_CHECK #⁠#⁠ ACHTUNG #⁠#⁠

S1 9b:0
20177527e04e05d5e7b448c1ab2b872f86831d0b
S2 9b:1
20177527e04e05d5e7b448c1ab2b872f86831d0b
S1_HASH_USING_SHA256 2C15B17DF7EA260B5654E85A2BFF7CF3AF5E40C0D9FC1FD758EFBA61BB88CC53
S2_HASH_USING_SHA256 2C15B17DF7EA260B5654E85A2BFF7CF3AF5E40C0D9FC1FD758EFBA61BB88CC53
COLLISION_SHA256_CHECK #⁠#⁠ ACHTUNG #⁠#⁠

S1 9b:0
20177527e04e05d5e7b448c1ab2b872f86831d0b
S2 9b:1
20177527e04e05d5e7b448c1ab2b872f86831d0b
S1_HASH_USING_SHA512 C0CB5278996574D9CCBF952AF67B5DFF38CDAC394AE713928A759A884222985DDBC92C15A74278236EB9599E8603EDFC9A7613AD475085D5ADF93174DC63536A
S2_HASH_USING_SHA512 C0CB5278996574D9CCBF952AF67B5DFF38CDAC394AE713928A759A884222985DDBC92C15A74278236EB9599E8603EDFC9A7613AD475085D5ADF93174DC63536A
COLLISION_SHA512_CHECK #⁠#⁠ ACHTUNG #⁠#⁠

Is it expected that current implementation of SHA256 and SHA512 in FB 4.0.0 can fail as SHA1 ?

@firebird-automations
Copy link
Collaborator Author

Modified by: @pavel-zotov

status: Resolved [ 5 ] => Resolved [ 5 ]

QA Status: Done with caveats

Test Details: Test verifies only:
1) ability to use syntax: hash(<string> using <algo>)
2) non-equality of hash results for sha1, sha256 and sha512 using _TRIVIAL_ sample from ticket.

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

> Is it expected that current implementation of SHA256 and SHA512 in FB 4.0.0 can fail as SHA1 ?

Can fail as what? What do you mean?

@firebird-automations
Copy link
Collaborator Author

Commented by: @pavel-zotov

I mean that two different strings with not big length give the same hash on SHA-256 and even in SHA-512.
It this the feature of SHA algorithm per se or this is because of concrete implementation in FB ?

@firebird-automations
Copy link
Collaborator Author

Commented by: @asfernandes

A SHA-256 / SHA-512 will return the same hash for the same string in every implementation. If not, the code is bugged.

It's out of our scope to judge quality of these algorithms and minimal length for clashes.

@firebird-automations
Copy link
Collaborator Author

Commented by: John Franck (bozzy)

>I mean that two different strings with not big length give the same hash on SHA-256 and even in SHA-512.

Your tests are hashing two identical strings with various hashing algorithm. The resulting couples of hashes are expected to be identical, given the input is identical.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants