Support for different hash algorithms in HASH system function [CORE4436] #4756
Comments
Commented by: @aafemt Any hash function has collisions. This is unavoidable whenever values are mapped into a smaller value space. |
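Here is a minimal sketch of why this is unavoidable. It assumes a deliberately tiny 8-bit hash: the first byte of SHA-256, chosen only for illustration. With 256 possible outputs, any 257 distinct inputs must contain at least two that share a hash value (the pigeonhole principle).

```python
import hashlib

def tiny_hash(s: str) -> int:
    """Toy 8-bit hash for illustration: the first byte of SHA-256 (256 possible values)."""
    return hashlib.sha256(s.encode("utf-8")).digest()[0]

def find_collision(n_inputs: int = 257):
    """Hash n distinct strings and return the first pair that shares a hash value."""
    seen = {}  # hash value -> first string that produced it
    for i in range(n_inputs):
        s = f"value-{i}"
        h = tiny_hash(s)
        if h in seen:
            return seen[h], s, h
        seen[h] = s
    return None  # only possible when n_inputs <= 256

print(find_collision())
```

The same argument applies to any hash with a fixed output size, however large. Wider outputs only make a collision less likely to show up in practice.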
Commented by: @reevespaul Interesting. This seems to produce the correct result: select 1, hash('1-20100433-01765-LOTES') but this doesn't: select 1,hash( '1-20100433-01765-LOTES') |
Commented by: Omacht András (aomacht) Your selects are different: 01765 vs. 01775. If you try with 01775 in the first select, the result will be the same. (It is possible for two different strings to have the same hash code. See: http://en.wikipedia.org/wiki/Hash_function) |
Commented by: @reevespaul Ah, yes. In fact this does demonstrate it correctly: select 1, hash('1-20100433-01765-LOTES')
|
Commented by: @reevespaul The documentation for hash() just says "Returns a hash value for the input string. This function fully supports text BLOBs of any length and character set." I suppose one resolution of this bug report could be to amend the documentation to indicate that collisions are possible and must be tested for. No assumption can be made about the uniqueness of a string based on its hash value. It is then left to the user to compare the original strings when two identical hash values are returned. Perhaps the real question is "Does that make this implementation of hash() useless?". |
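The resolution suggested here can be sketched as a small index: equal hashes are treated only as candidates, and the original strings are compared to decide. `HashIndex` is a hypothetical helper, not part of Firebird. The key function is pluggable, so the example can use a deliberately bad one (string length) to force the two strings from the ticket into the same bucket.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Hashable

def sha256_key(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

class HashIndex:
    """Hypothetical helper: buckets strings by a hash key, confirms by full comparison."""

    def __init__(self, key_func: Callable[[str], Hashable] = sha256_key):
        self._key = key_func
        self._buckets = defaultdict(list)  # key -> strings that produced it

    def add(self, s: str) -> None:
        self._buckets[self._key(s)].append(s)

    def contains(self, s: str) -> bool:
        # Equal keys only make a match possible; comparing the originals decides.
        return any(candidate == s for candidate in self._buckets.get(self._key(s), []))

# Deliberately weak key (string length) so the ticket's two strings collide.
weak = HashIndex(key_func=len)
weak.add("4-20100433-01775-LOTES")
print(len("4-20100433-01775-LOTES") == len("1-20100433-01765-LOTES"))  # True: same key
print(weak.contains("1-20100433-01765-LOTES"))  # False: the strings differ
print(weak.contains("4-20100433-01775-LOTES"))  # True
```

With a weak hash, the index stays correct but more candidates must be compared. With a strong one, the comparison almost never rejects a candidate, but it still has to run.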
Commented by: @asfernandes FB's HASH is a function (and algorithm) ported from Yaffil. The algorithm is not a good hash function, but collisions are possible in any algorithm. This function should be extended to support standard, well-known algorithms (MD5, SHA-*) via a second parameter. |
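Here is a sketch of the proposed two-argument form. It assumes the algorithm names MD5, SHA1, SHA256 and SHA512 and UTF-8 encoding of the input. `hash_with` is a hypothetical name used only here. It does not show Firebird's syntax, only Python's `hashlib.new` dispatching on an algorithm name.

```python
import hashlib

# Hypothetical mapping from the names suggested above to hashlib algorithm names.
SUPPORTED = {"MD5": "md5", "SHA1": "sha1", "SHA256": "sha256", "SHA512": "sha512"}

def hash_with(value: str, algorithm: str = "SHA256", encoding: str = "utf-8") -> str:
    """Return the hex digest of value using the named algorithm."""
    name = SUPPORTED.get(algorithm.upper())
    if name is None:
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    return hashlib.new(name, value.encode(encoding)).hexdigest()

for algo in SUPPORTED:
    print(algo, hash_with("1-20100433-01765-LOTES", algo))
```

The encoding parameter matters: a database hashes the bytes of the value in its character set, so the same text stored in different character sets can produce different digests.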
Commented by: Sean Leyne (seanleyne) @dimitry Sibiryakov, collisions have been found in the MD5 and SHA-1 hash functions, which makes them unsuitable for the current purposes. SHA-1 and SHA-2 hash functions only have a **theoretical** chance of hash collision. Further, SHA-3 hash functions don't even have a theoretical hash collision. (http://en.wikipedia.org/wiki/Cryptographic_hash_function) So, there is a modern hash algorithm that would address this issue. |
Commented by: @dyemanov Sean, every hash function (SHA-3 included) has collisions; this is by design. It's impossible to map longer strings into shorter ones without collisions. |
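The birthday bound turns "theoretical chance" into numbers. It assumes the hash behaves like a uniform random function. For n inputs hashed into 2^b values, the probability of at least one collision is approximately P = 1 - exp(-n(n-1) / 2^(b+1)). The helper below is an illustrative sketch of that formula, not a property of any particular implementation.

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound approximation: chance that n uniformly random values
    drawn from 2**bits possibilities contain at least one repeat."""
    exponent = n * (n - 1) / (2.0 * 2.0 ** bits)
    return -math.expm1(-exponent)  # 1 - exp(-x), accurate for tiny x

for bits in (32, 64, 256):
    p = collision_probability(100_000, bits)
    print(f"{bits:>3}-bit hash, 100000 inputs: P(collision) ~ {p:.3g}")
# The 32-bit case comes out at about 0.69; the 256-bit case is around 4e-68.
```

Both comments are consistent with this: collisions always exist, but for a 256-bit digest they are far too improbable to find by chance.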
Commented by: Evandro Amparo (evandroamparo) I deduced the Firebird hash algorithm while playing with some examples:
- Reverse the string
That's why longer strings have longer hashes, and that's why it's so weak! It would be great if other algorithms were added. |
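The deduced steps are not fully preserved above, so the following is only a toy shift-and-add hash written for illustration. It is an assumption, not Firebird's code. It shows the two weaknesses mentioned. First, for short strings the value grows with input length. Second, unrelated short inputs collide easily, because lowering one byte by 1 is cancelled by raising the next byte by 16.

```python
def shift_add_hash(s: str) -> int:
    """Toy shift-and-add hash for illustration only (not Firebird's HASH code)."""
    h = 0
    for byte in s.encode("latin-1"):
        h = ((h << 4) + byte) & 0xFFFFFFFFFFFFFFFF  # shift 4 bits, add the byte, stay in 64 bits
    return h

# Short inputs: each extra character makes the value roughly 16 times larger.
print([shift_add_hash("a" * k) for k in range(1, 4)])  # [97, 1649, 26481]
# Collision: byte 1 lowered by 1, byte 2 raised by 16 -> same total.
print(shift_add_hash("ab"), shift_add_hash("`r"))  # 1650 1650
```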
Modified by: @asfernandes assignee: Adriano dos Santos Fernandes [ asfernandes ] |
Modified by: @asfernandes issuetype: Bug [ 1 ] => Improvement [ 4 ] Component: Engine [ 10000 ] summary: Hash Function => Support for different hash algorithms in HASH system function Component: API / Client Library [ 10040 ] => |
Modified by: @asfernandes status: Open [ 1 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 4.0 Alpha 1 [ 10731 ] |
Commented by: @pavel-zotov After reading https://stackoverflow.com/questions/3475648/sha1-collision-demo-example I decided to make a trivial test:
set list on;
recreate table test(s1 blob, s2 blob);
-- https://stackoverflow.com/questions/3475648/sha1-collision-demo-example
select
select
select
quit;
Its output on build 713 is:
S1 9b:0
S1 9b:0
S1 9b:0
|
Modified by: @pavel-zotov status: Resolved [ 5 ] => Resolved [ 5 ] QA Status: Done with caveats Test Details: Test verifies only: |
Commented by: @asfernandes > Is it expected that current implementation of SHA256 and SHA512 in FB 4.0.0 can fail as SHA1? Can fail as what? What do you mean? |
Commented by: @pavel-zotov I mean that two different strings with not big length give the same hash on SHA-256 and even in SHA-512. |
Commented by: @asfernandes A SHA-256 / SHA-512 will return the same hash for the same string in every implementation. If not, the code is bugged. It's out of our scope to judge the quality of these algorithms or the minimal length for clashes. |
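One way to check that an implementation is not "bugged" in this sense is to compare it with published test vectors. The sketch below uses Python's `hashlib` and the well-known digests of the message "abc": the RFC 1321 test suite for MD5 and the FIPS 180 examples for SHA-1 and SHA-256. The same comparison could be run against any other implementation's output for the same bytes.

```python
import hashlib

# Published digests of the three-byte message "abc":
# RFC 1321 test suite (MD5) and FIPS 180 examples (SHA-1, SHA-256).
KNOWN_VECTORS = {
    "md5": "900150983cd24fb0d6963f7d28e17f72",
    "sha1": "a9993e364706816aba3e25717850c26c9cd0d89d",
    "sha256": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
}

def check_implementation(message: bytes = b"abc") -> dict:
    """Map each algorithm name to True if hashlib reproduces the published digest."""
    return {name: hashlib.new(name, message).hexdigest() == expected
            for name, expected in KNOWN_VECTORS.items()}

print(check_implementation())  # {'md5': True, 'sha1': True, 'sha256': True}
```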
Commented by: John Franck (bozzy) >I mean that two different strings with not big length give the same hash on SHA-256 and even in SHA-512. Your tests are hashing two identical strings with various hashing algorithms. The resulting pairs of hashes are expected to be identical, given that the input is identical. |
Submitted by: Joaquim Pais (joaquim.pais.alidata)
Votes: 3
Hi,
Different strings give the same hash result. Is it a bug?
What is the maximum length of the hash function parameter?
Example
select hash('4-20100433-01775-LOTES')
from rdb$database
union
select hash('1-20100433-01765-LOTES')
from rdb$database
Commits: 55c35b7
====== Test Details ======
Test verifies only:
1) ability to use syntax: hash(<string> using <algo>)
2) non-equality of hash results for sha1, sha256 and sha512 using the _TRIVIAL_ sample from the ticket.
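Here is a rough Python equivalent of check 2). It assumes the two strings from the ticket's example are encoded as UTF-8. Passing it only shows that these particular inputs do not collide. As discussed above, it cannot prove that collisions are impossible.

```python
import hashlib

TICKET_SAMPLES = ("4-20100433-01775-LOTES", "1-20100433-01765-LOTES")
ALGORITHMS = ("sha1", "sha256", "sha512")

def digests(value: str) -> dict:
    """Hex digest of value (UTF-8 encoded) for each algorithm under test."""
    data = value.encode("utf-8")
    return {algo: hashlib.new(algo, data).hexdigest() for algo in ALGORITHMS}

first, second = (digests(s) for s in TICKET_SAMPLES)
for algo in ALGORITHMS:
    assert first[algo] != second[algo], f"{algo}: ticket samples collide"
    assert digests(TICKET_SAMPLES[0])[algo] == first[algo], f"{algo}: not deterministic"
print("distinct digests for both ticket samples:", ", ".join(ALGORITHMS))
```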