Discussion:
how to allow integer overflow for calculating hash code of a string?
(too old to reply)
Haifeng Liu
2012-09-20 08:55:46 UTC
Permalink
I want to write a hash function which acts as String.hashCode() in java: hash = hash * 31 + s.charAt(i)... but I got integer out of range error. How can I avoid this? I saw java do not care overflow of int, it just make the result negative.
--
Sent via pgsql-admin mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Craig James
2012-09-20 14:34:29 UTC
Permalink
Post by Haifeng Liu
hash = hash * 31 + s.charAt(i)... but I got integer out of range error. How
can I avoid this? I saw java do not care overflow of int, it just make the
result negative.
Use the bitwise AND operator to mask the hash value with 0x3FFFFFF before
each iteration:

hash = (hash & 67108863) * 31 + s.charAt(i);

Craig
Haifeng Liu
2012-09-21 02:56:10 UTC
Permalink
Post by Haifeng Liu
I want to write a hash function which acts as String.hashCode() in java: hash = hash * 31 + s.charAt(i)... but I got integer out of range error. How can I avoid this? I saw java do not care overflow of int, it just make the result negative.
hash = (hash & 67108863) * 31 + s.charAt(i);
Craig
Thank you, I believe your solution is OK for a hash function, but I am aiming to create a hash function that is consistent with the one applications use. I know postgresql 9.1 has a hash function called hashtext, but I don't know what algorithm it use, and I also see that it's not recommended to relay on it. So I am trying to create a hash function which behaves exactly the same as java.lang.String.hashCode(). The later one may generate negative hash value. I guess when the number is overflowing, the part out of range will be ignored, and if the highest bit get 1, the hash value turn to negative value.
Craig James
2012-09-21 15:21:04 UTC
Permalink
Post by Craig James
Post by Haifeng Liu
hash = hash * 31 + s.charAt(i)... but I got integer out of range error. How
can I avoid this? I saw java do not care overflow of int, it just make the
result negative.
hash = (hash & 67108863) * 31 + s.charAt(i);
Craig
Thank you, I believe your solution is OK for a hash function, but I am
aiming to create a hash function that is consistent with the one
applications use. I know postgresql 9.1 has a hash function called
hashtext, but I don't know what algorithm it use, and I also see that it's
not recommended to relay on it. So I am trying to create a hash function
which behaves exactly the same as java.lang.String.hashCode(). The later
one may generate negative hash value. I guess when the number is
overflowing, the part out of range will be ignored, and if the highest bit
get 1, the hash value turn to negative value.
You are probably doing something where you want the application and the
database to implement the exact same function, but if you stick to the Java
built-in function, you will only have control over one implementation of
that function. What happens if someone working on Java changes the how the
Java internals work?

A better solution would be to implement your own hash function in Postgres,
and then once you know exactly how it will work, re-implement it in Java
with your own code. That's the only way you can ensure consistency between
the two.

Craig
Haifeng Liu
2012-10-29 11:07:51 UTC
Permalink
I got a way which works fine: use bigint first, and then convert it to bit(32), and convert it to int4 at last.

declare i integer := 0;
declare h bigint := 0;
begin
for i in 1..length(str) loop
h = (h * 31 + ascii(substring(str, i, 1))) & 4294967295;
end loop;
return cast(cast(h as bit(32)) as int4);
end;

I did some tests which include both positive and negative results, seems all Ok.
Post by Haifeng Liu
Post by Haifeng Liu
I want to write a hash function which acts as String.hashCode() in java: hash = hash * 31 + s.charAt(i)... but I got integer out of range error. How can I avoid this? I saw java do not care overflow of int, it just make the result negative.
hash = (hash & 67108863) * 31 + s.charAt(i);
Craig
Thank you, I believe your solution is OK for a hash function, but I am aiming to create a hash function that is consistent with the one applications use. I know postgresql 9.1 has a hash function called hashtext, but I don't know what algorithm it use, and I also see that it's not recommended to relay on it. So I am trying to create a hash function which behaves exactly the same as java.lang.String.hashCode(). The later one may generate negative hash value. I guess when the number is overflowing, the part out of range will be ignored, and if the highest bit get 1, the hash value turn to negative value.
You are probably doing something where you want the application and the database to implement the exact same function, but if you stick to the Java built-in function, you will only have control over one implementation of that function. What happens if someone working on Java changes the how the Java internals work?
That's not the trouble, just create a hash tool which copies the code of java.lang.String.hashCode() and use that tool instead will resolve this. The key is, I know and I can reimplement the algorithm.
Post by Haifeng Liu
A better solution would be to implement your own hash function in Postgres, and then once you know exactly how it will work, re-implement it in Java with your own code. That's the only way you can ensure consistency between the two.
Craig
Haifeng Liu
2012-10-30 11:48:23 UTC
Permalink
I got a way which works fine: use bigint first, and then convert it to bit(32), and convert it to int4 at last.

declare i integer := 0;
declare h bigint := 0;
begin
for i in 1..length(str) loop
h = (h * 31 + ascii(substring(str, i, 1))) & 4294967295;
end loop;
return cast(cast(h as bit(32)) as int4);
end;

I did some tests which include both positive and negative results, seems all Ok.
Post by Haifeng Liu
Post by Haifeng Liu
I want to write a hash function which acts as String.hashCode() in java: hash = hash * 31 + s.charAt(i)... but I got integer out of range error. How can I avoid this? I saw java do not care overflow of int, it just make the result negative.
hash = (hash & 67108863) * 31 + s.charAt(i);
Craig
Thank you, I believe your solution is OK for a hash function, but I am aiming to create a hash function that is consistent with the one applications use. I know postgresql 9.1 has a hash function called hashtext, but I don't know what algorithm it use, and I also see that it's not recommended to relay on it. So I am trying to create a hash function which behaves exactly the same as java.lang.String.hashCode(). The later one may generate negative hash value. I guess when the number is overflowing, the part out of range will be ignored, and if the highest bit get 1, the hash value turn to negative value.
You are probably doing something where you want the application and the database to implement the exact same function, but if you stick to the Java built-in function, you will only have control over one implementation of that function. What happens if someone working on Java changes the how the Java internals work?
That's not the trouble, just create a hash tool which copies the code of java.lang.String.hashCode() and use that tool instead will resolve this. The key is, I know and I can reimplement the algorithm.
Post by Haifeng Liu
A better solution would be to implement your own hash function in Postgres, and then once you know exactly how it will work, re-implement it in Java with your own code. That's the only way you can ensure consistency between the two.
Craig
Loading...