# ask-questions
s
If we use this library on the client side for A/B testing, how does the percentage-wise distribution occur? Client state is not shared between users, so whoever loads the website will try to assign the percentage distribution from scratch, regardless of what experiments other users are assigned, which would be an incorrect way of doing A/B testing.
f
We use deterministic hashing. So you would pass a unique identifier for each user into the SDK (usually stored in a cookie). We hash that together with the experiment id to get a number from 0 to 1. Then, each variation is assigned a range (e.g. 0 to 0.5 and 0.5 to 1) and the user gets whichever range their hash falls into
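A minimal TypeScript sketch of that idea (not the SDK's actual code; the key format, the 32-bit FNV-1a variant, and the function names here are assumptions for illustration):
```typescript
// Sketch of deterministic bucketing. Assumed details: FNV-1a 32-bit,
// a "userId_experimentId" key format, and these helper names.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;                // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime (32-bit multiply)
  }
  return hash >>> 0;                    // unsigned 32-bit result
}

// Map the hash to a number in [0, 1).
function hashToUnit(userId: string, experimentId: string): number {
  return fnv1a(`${userId}_${experimentId}`) / 0x100000000;
}

// Pick a variation by checking which range the hash falls into.
// weights = [0.5, 0.5] means a 50/50 split.
function assignVariation(userId: string, experimentId: string, weights: number[]): number {
  const n = hashToUnit(userId, experimentId);
  let cumulative = 0;
  for (let i = 0; i < weights.length; i++) {
    cumulative += weights[i];
    if (n < cumulative) return i;
  }
  return weights.length - 1; // guard against floating-point rounding
}

// The same user + experiment always yields the same variation, on any page load.
console.log(assignVariation("userA", "experimentA", [0.5, 0.5]));
```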
s
Okay, but every time the site is loaded by a different user for the first time, that hashing will occur from scratch, right? Since client-side state is not shared between users, every first-time visitor will run the process you mentioned from scratch.
f
yes, the hashing is run every single time we assign someone a variation. As long as each user has a different unique identifier, then their hash values will be evenly distributed between 0 and 1, so each variation will get the correct percent of random users assigned
s
User A visits for the first time and gets exp A; user A visits again and still gets exp A. But other users' hashing algo doesn't know how many users have already got exp A. It will run the assignment process from scratch regardless of what other users have got, right? Since there is no server-side state management.
How would a 50% split test work here?
I hope you understand what I am trying to say. The hashing algo is running client side, not server side. If it ran server side, the hashing algo's state would be shared between users, but since it is running client side, every user has their own version of the hashing algo's state.
I am not sure my understanding is right though.
f
Each user id will be an independent hashing event. So for example, this might happen:
hash("userA_experimentA") = 0.2123
hash("userB_experimentA") = 0.7853
hash("userC_experimentA") = 0.4867
hash("userD_experimentA") = 0.6543
The hash values are deterministic, but randomly distributed. So the same user id always produces the same output, but over many user ids, the distribution will be uniform. If one of the variation ranges is 0 to 0.5, then 50% of people will fall into that range (userA and userC in this example).
s
I get this part, and it works if we are doing the hashing server side, but what I am saying is: how is the hashing algo's state shared from user to user if we run it client side? If user A loads the library for the first time on the client side, the hashing algo will start hashing from scratch, right? Now user B loads the library for the first time, and this hashing algo is not context aware. It will still think it is running for the first time, even though user A has run it in the past, since there is no way to connect the hashing of A with the hashing of B; they are each running in their own browser.
f
The hashing algorithm doesn't need any context to work correctly; it always "starts from scratch".
We're not using a random number generator with a seed or anything, just a fast hashing algorithm (FNV) that always produces the same output given the same input, with no context or persistence required.
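To see the statelessness concretely, here is a small sketch (assuming a 32-bit FNV-1a variant; the actual SDK's hash may differ): two completely independent calls, as if made on two separate page loads in two different browsers, give the same value for the same input.
```typescript
// FNV-1a 32-bit: a pure function, no state carried between calls.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Two independent invocations (think: two separate visits, two separate
// browsers) produce identical results for the same input.
const firstVisit  = fnv1a("userA_experimentA") / 0x100000000;
const repeatVisit = fnv1a("userA_experimentA") / 0x100000000;
console.log(firstVisit === repeatVisit); // true, with no shared state anywhere
```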
s
Ok, thanks. One more question. You mentioned "over many user ids, the distribution will be uniform". If the hashing algo doesn't know how many users it has handled so far (which will be the case when running this hashing client-side), how does this uniform distribution occur?
f
That's just part of the FNV algorithm. Just like other one-way hashing algorithms such as MD5 and SHA, different inputs map to different outputs with a uniform distribution. So if you pass in 1000 different user ids to the hash, you will get 1000 different numbers roughly evenly distributed between 0 and 1. It's not perfect; you might end up with 49.9% vs 50.1%, but the more users you have, the closer it will get to a true 50/50 split.
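A quick way to convince yourself is a simulation (a sketch only; the random ids and the FNV-1a variant here are illustrative, not the SDK's code):
```typescript
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Simulate 1000 first-time visitors, each hashed completely independently,
// with no counter or shared state between them.
let inVariationA = 0;
for (let i = 0; i < 1000; i++) {
  const userId = Math.random().toString(36).slice(2); // stand-in for a cookie id
  const n = fnv1a(`${userId}_experimentA`) / 0x100000000;
  if (n < 0.5) inVariationA++;                        // range 0–0.5 = variation A
}
console.log(`${inVariationA} of 1000 users landed in variation A`); // roughly 500
```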
s
But in the client-side case we are not passing 1000 user ids to the same hash function instance; we pass one user id to the one hash function instance that loads when a user visits the site. Another user loads another hash function instance and passes only their own user id. So for 1000 users, one user id will be passed to each of 1000 different hash function instances; 1000 user ids will not be passed to the same hash function instance in the client-side case. How will uniform distribution occur in this case?
f
The hash function is stateless. It doesn't matter if it's the same instance or 1000 different instances.
s
I don't understand. Without knowing how many user ids have been processed, how can it determine a uniform distribution? A 50% split in the case of 1000 users will be 500, but if there are 10000 users then it will be 5000.
f
It doesn't guarantee an exact 50/50 split. Each user has a 50% chance of being in either A or B. You could get 100 users in a row all getting assigned A, but it would be very unlikely. Over time it will trend towards a 50/50 split.
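A sketch of that "trends towards 50/50" behaviour as traffic grows (illustration only; same assumed FNV-1a variant as above, random stand-in user ids):
```typescript
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// As the number of independent visitors grows, the share in variation A
// drifts towards 50%, even though nothing is counting assignments anywhere.
for (const total of [100, 1000, 10000, 100000]) {
  let inA = 0;
  for (let i = 0; i < total; i++) {
    const userId = `user-${Math.random().toString(36).slice(2)}`;
    if (fnv1a(`${userId}_experimentA`) / 0x100000000 < 0.5) inA++;
  }
  console.log(`${total} users: ${(100 * inA / total).toFixed(1)}% in A`);
}
```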
s
That's fine if it is a little bit inaccurate, but without being context aware, I still don't get how it can do the split client-side. Anyways, thanks.