Hello, I’m currently implementing GrowthBook at my...
# announcements
n
Hello, I’m currently implementing GrowthBook at my organization. So far I’m liking the experiment analysis side of it, and we’re now starting to look at creating new experiments using Feature Flagging. I need some help analyzing these feature-flag-based experiments. In our custom-code experiments we stored an AB flag in our users table, so it was easy to create the Experiment Assignment Queries. We had hoped that once we started using GrowthBook Feature Flagging we wouldn’t need to store the variant assignments in our database, and could instead redo the hashing of the unit identifier in a query that also calculates the attributes. We’ve done this by hand in the past, e.g. in Snowflake, applying the same MD5 hash to the user_id in the query that the backend applies at feature assignment time:
SELECT
      user_id,
      'experiment_id' as experiment_id,
      case
          when (to_number(left(md5(concat(user_id, 'hc_channel_v3')), 8),'XXXXXXXX')::bigint::float / 4294967295::float <= 0.5) then 'variant_B'
          else 'variant_A'
      end as variation_id
FROM
     users as u
I understand the recommended approach is to send an event with the assignment, but is re-doing the hashing on the query something that is supported / recommended as well?
f
It's technically possible to recreate our hashing algorithm in SQL, but it would only work for extremely simple use cases and will likely introduce a lot of variance into your test results. The odds of making a mistake are very high.

The main issue is that you usually only want to include a user in the analysis if they actually saw the experiment. If you only run an experiment on a few pages, or a user doesn't visit your site while the experiment is running, you will add a lot of noise by including them in your analysis. Also, you usually only want to include metric conversions that happen AFTER viewing the experiment. To do that, you need not just the hash value but also the date they saw the experiment.

The other issue is how the hash value maps to a variation. If you always run 50/50 tests on 100% of traffic it's easy (hash<0.5 is control), but if you ever run 3-way tests or use a different traffic split, it will be difficult to model in SQL.

That's why we recommend firing a dedicated event when someone views an experiment, with a timestamp and the variation they saw. It's much more reliable.
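As a rough sketch of the query side of that recommendation: if the dedicated exposure events land in a warehouse table, the Experiment Assignment Query could simply select from it rather than re-deriving the hash. The table name `experiment_viewed` and the column `viewed_at` are hypothetical and only for illustration. The four output columns follow the shape of the hand-written query above, plus the timestamp of the exposure.

```sql
-- Assumption: each exposure event was written to a (hypothetical) table
-- experiment_viewed with one row per user per experiment view.
SELECT
      user_id,
      viewed_at as timestamp,  -- when the user saw the experiment
      experiment_id,           -- tracked with the event, not hard-coded
      variation_id             -- the variation actually served
FROM
     experiment_viewed
```

Because the variation and the exposure time are recorded when the user actually sees the experiment, the query no longer needs to know the hashing scheme, the traffic split, or the number of variations.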