Hi everyone! I was wondering is it possible to rep...
# announcements
w
Hi everyone! I was wondering is it possible to replicate hashFnv32a function that is used in GrowthBook SDKs in SQL, for example in PostgreSQL? I'm thinking of querying against tables with ab tests results. Thank you.
f
in theory, it should be
w
Some magic from ChatGPT. Please keep in mind that I didn't test it thoroughly, just a little bit by comparing with similar function from python SDK.
Copy code
CREATE OR REPLACE FUNCTION fnv1a32(str TEXT) RETURNS BIGINT AS $$
DECLARE
  hval BIGINT := 2166136261;
  prime BIGINT := 16777619;
  uint32_max BIGINT := 2 ^ 32;
BEGIN
  FOR i IN 1..length(str) LOOP
    hval := (hval # ASCII(SUBSTR(str, i, 1))) * prime % uint32_max;
  END LOOP;
  RETURN hval::BIGINT;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
It's Postgre SQL
f
going to put us all out of work 🙂
w
🙂
f
do you have a good way to test if that works?
it would be cool to share with the community
w
Well, I can't be sure as I don't understand the logic completely. I tested it just by generating 10000 random users in SQL and python. Not sure if this is enough.
Copy code
%%sql
with t1 as (
    select random_between(a, a*50000) n
    from unnest(array(select * from generate_series(10000,20000))) a
)
select n, fnv1a32(n::TEXT) h from t1
and then by
Copy code
df['hp']=df['n'].apply(lambda x: fnv1a32(str(x)))
df['h'].equals(df['hp'])
After a bit of research I came to the conclusion that the hash function probably can be replicated in SQL if the particular SQL dialect supports recursion. For example Postgre SQL does support it, but Clickhouse doesn't. The reason for that is how the FNV-1a hash algorithm works. You of course can always send the variation data explicitly and not bother with hashes in SQL at all, I just thought of it as a useful bonus of the deterministic nature of the function.
n
cc @colossal-petabyte-61345
this is exactly what we were looking to do! but the idea to have ChatGPT do the heavy lifting did not occur to me 🙂
💡 1
c
This could help on pre-identify which customer is going into which variant, but it won't help us targeting a specified user segment defined internally, no ?
w
I believe you typically define segments for targeting based on some attributes that come from your internal data, like, for example, those who spent more than 1000$ as "vip customers" for attribute "customer_type" . Variation assignment comes later.