Hi everyone I was wondering is it possible to replicate hash GrowthBook Users #announcements

Hi everyone! I was wondering is it possible to rep...

wooden-country-60054

03/04/2023, 2:27 AM

Hi everyone! I was wondering is it possible to replicate hashFnv32a function that is used in GrowthBook SDKs in SQL, for example in PostgreSQL? I'm thinking of querying against tables with ab tests results. Thank you.

fresh-football-47124

03/04/2023, 2:27 AM

in theory, it should be

wooden-country-60054

03/04/2023, 4:22 AM

Some magic from ChatGPT. Please keep in mind that I didn't test it thoroughly, just a little bit by comparing with similar function from python SDK.

Copy code

CREATE OR REPLACE FUNCTION fnv1a32(str TEXT) RETURNS BIGINT AS $$
DECLARE
  hval BIGINT := 2166136261;
  prime BIGINT := 16777619;
  uint32_max BIGINT := 2 ^ 32;
BEGIN
  FOR i IN 1..length(str) LOOP
    hval := (hval # ASCII(SUBSTR(str, i, 1))) * prime % uint32_max;
  END LOOP;
  RETURN hval::BIGINT;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

wooden-country-60054

03/04/2023, 4:23 AM

It's Postgre SQL

fresh-football-47124

03/04/2023, 4:23 AM

going to put us all out of work 🙂

wooden-country-60054

03/04/2023, 4:23 AM

🙂

fresh-football-47124

03/04/2023, 4:23 AM

do you have a good way to test if that works?

fresh-football-47124

03/04/2023, 4:24 AM

it would be cool to share with the community

wooden-country-60054

03/04/2023, 4:29 AM

Well, I can't be sure as I don't understand the logic completely. I tested it just by generating 10000 random users in SQL and python. Not sure if this is enough.

Copy code

%%sql
with t1 as (
    select random_between(a, a*50000) n
    from unnest(array(select * from generate_series(10000,20000))) a
)
select n, fnv1a32(n::TEXT) h from t1

and then by

Copy code

df['hp']=df['n'].apply(lambda x: fnv1a32(str(x)))
df['h'].equals(df['hp'])

wooden-country-60054

03/05/2023, 7:12 AM

After a bit of research I came to the conclusion that the hash function probably can be replicated in SQL if the particular SQL dialect supports recursion. For example Postgre SQL does support it, but Clickhouse doesn't. The reason for that is how the FNV-1a hash algorithm works. You of course can always send the variation data explicitly and not bother with hashes in SQL at all, I just thought of it as a useful bonus of the deterministic nature of the function.

narrow-tomato-61607

03/05/2023, 7:24 PM

cc @colossal-petabyte-61345

narrow-tomato-61607

03/05/2023, 7:25 PM

this is exactly what we were looking to do! but the idea to have ChatGPT do the heavy lifting did not occur to me 🙂

💡 1

colossal-petabyte-61345

03/05/2023, 7:53 PM

This could help on pre-identify which customer is going into which variant, but it won't help us targeting a specified user segment defined internally, no ?

wooden-country-60054

03/06/2023, 3:56 AM

I believe you typically define segments for targeting based on some attributes that come from your internal data, like, for example, those who spent more than 1000$ as "vip customers" for attribute "customer_type" . Variation assignment comes later.

148 Views

Open in Slack

Previous Next