# ask-questions
q
Hi, question on assignment. We’re seeing a sample ratio mismatch and I’m suspicious it has to do with having changed the value served for the control arm from ‘Control’ to ‘Baseline’ before the experiment started. The feature flag existed for around a week, but it was not in our code and we did not request any assignments for it. But the SRM seems consistent with cached assignments being used for some users. Could this be the case? Are any assignments cached in your backend (i.e. when a feature flag exists and assignments are requested for an id for other features)?
f
what kind of experiment is it?
visual, url, or feature flag based?
if you made the flag, there is no pre-assignment happening. So if it was not in use, no one was being assigned. We use deterministic hashing to do the assignment, which means as long as the hashing key (the experiment tracking key) is the same and the assignment attribute is the same, they'll get assigned the same variant. You can also have sticky bucketing enabled which means that even if the split percentage changes, they'll still get the original assigned variant.
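to make the idea concrete, here's a toy sketch of deterministic bucketing (illustrative only, not the SDK's exact hashing internals; same key and attribute in, same variant out):

import Foundation

// Toy illustration of deterministic bucketing. The same (experimentKey, attribute)
// pair always maps to the same bucket, on any device, with no server-side state.
func bucket(experimentKey: String, attribute: String, weights: [Double]) -> Int? {
    // FNV-1a 32-bit hash of the combined key (illustrative choice of hash).
    var hash: UInt32 = 2_166_136_261
    for byte in (attribute + experimentKey).utf8 {
        hash ^= UInt32(byte)
        hash = hash &* 16_777_619
    }
    // Map the hash to a uniform number in [0, 1).
    let n = Double(hash % 1000) / 1000.0
    // Walk the cumulative weights to pick a variation index.
    var cumulative = 0.0
    for (index, weight) in weights.enumerated() {
        cumulative += weight
        if n < cumulative { return index }
    }
    return nil // attribute falls outside experiment coverage
}

// bucket(experimentKey: "my-experiment", attribute: "device-123",
//        weights: [0.334, 0.333, 0.333]) returns the same index on every call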
q
It’s feature flag based. We’ve got 28.7% in control and ~36.7% in each of the two variants, and it seems to be happening quite consistently. Absolute numbers are 31k, 39k and 39k, so it’s a big enough experiment. We haven’t changed the experiment, just the feature flag, and that was before the experiment started; the feature flag is also tied to a minimum app version, which wasn’t active until the experiment launched, so nothing could have been cached locally. So I’m a bit stumped.
f
what is the split set to?
q
33.4/33.3/33.3
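(for reference, plugging our rounded counts into a quick chi-squared check against that split; a back-of-envelope sketch in Swift:)

// Back-of-envelope SRM check using the rounded counts from above.
let observed: [Double] = [31_000, 39_000, 39_000]
let weights: [Double] = [0.334, 0.333, 0.333]
let total = observed.reduce(0, +)

// Chi-squared statistic: sum over arms of (observed - expected)^2 / expected.
var chiSquare = 0.0
for (count, weight) in zip(observed, weights) {
    let expected = weight * total
    chiSquare += (count - expected) * (count - expected) / expected
}
print(chiSquare) // ≈ 1205 on 2 degrees of freedom; p-value effectively 0, so the mismatch is real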
f
ah, ya, that’s likely a bug
can you share your SDK implementation?
q
ah ok cheers, I’ll need to ask our mobile dev as that’s not my side of things
do you need to see the code, or just want to know the SDK version?
f
one error we see sometimes is not using the trackingCallback correctly
engineers sometimes fire the tracking event outside of it, or manually on every flag usage
and that causes problems
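roughly this shape (a hypothetical sketch: the Analytics shim and flag key are made-up names, and isOn is the usual flag check, though check your SDK version):

import GrowthBook // iOS SDK module; name may differ depending on install method

// Hypothetical analytics shim, standing in for whatever tracker is in use.
enum Analytics {
    static func track(_ event: String, properties: [String: String]) {
        print(event, properties)
    }
}

// Anti-pattern: firing the exposure event by hand wherever the flag is read.
func showCheckout(growthbook: GrowthBookSDK) {
    if growthbook.isOn(feature: "new-checkout") {
        // WRONG: fires on every evaluation and bypasses the SDK's own
        // assignment logic, so exposure counts get skewed across variants.
        Analytics.track("Experiment Joined", properties: ["experiment": "new-checkout"])
    }
}

// Correct: only track exposures inside the trackingCallback passed to
// GrowthBookBuilder, which the SDK invokes when a user is actually assigned.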
q
ok, will ask and get back to you tomorrow, cheers
One thing we wanted to rule out first, actually: we are using a device id for assignment, but what we store in our database is usually a user id (we do this so we can use Mixpanel’s identity merge feature and include non-logged-in users). When we query the database, we use an identity table to join all the ids associated with a particular user together. So there could be a case of some users seeing multiple variants, but I’d expect that not to be a problem: those users are filtered out of the results, and each arm of the test should be affected equally. In this case there are only around 500 such users, so it wouldn’t explain the SRM anyway. Does that sound alright?
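(roughly how that filtering works, as a sketch; the type and field names here are made up, not our actual pipeline:)

// Sketch of the filtering step described above: drop any (merged) user
// whose identities were exposed to more than one variant.
struct Exposure {
    let userId: String   // canonical id after the identity-table join
    let variant: String
}

func filterMixedExposures(_ exposures: [Exposure]) -> [Exposure] {
    // Collect the set of variants each merged user has seen.
    let variantsByUser = Dictionary(grouping: exposures, by: { $0.userId })
        .mapValues { Set($0.map { $0.variant }) }
    // Keep only users who ever saw exactly one variant.
    return exposures.filter { variantsByUser[$0.userId]?.count == 1 }
}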
Here is the relevant Swift code snippet:
growthbook = GrowthBookBuilder(
    apiHost: Environment.current.growthbook.apiHost,
    clientKey: Environment.current.growthbook.clientKey,
    attributes: Self.getAttributes(from: user),
    trackingCallback: { experiment, experimentResult in
        // Ensure the experiment is active and the user is part of it.
        // Although the documentation states this method is "Called whenever
        // someone is put into an experiment", this makes it safer in case
        // that's not always true.
        guard experiment.isActive, experimentResult.inExperiment else {
            return
        }

        Logger.shared.print("In experiment", mode: .local, info: ["Experiment": experiment.key, "Value": experimentResult.value.rawValue])

        Self.trackExperiment(experiment: .init(key: experiment.key), withResult: experimentResult)
    },
    refreshHandler: { cacheRefreshed in
        currentValueSubject.send(cacheRefreshed) // mapped to `growthBookRefreshed`
    }
).initializer()
We’ve noticed that there are a lot more "Experiment Joined" events sent to Mixpanel for the two variants in this experiment than for the baseline (per unique user). We’d expect that, because the code will be hit more often for users who are not in the baseline. Could that be related to the issues you mentioned?
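(one way we could rule out double-counting on our side would be deduping the exposure event per user/experiment before it's sent; a hypothetical sketch, not our current code:)

import Foundation

// Hypothetical helper: fire "Experiment Joined" at most once per
// user/experiment/variant, no matter how often the flag is evaluated.
final class ExposureDeduper {
    private var seen = Set<String>()
    private let queue = DispatchQueue(label: "exposure.dedupe")

    // Returns true only the first time this combination is seen.
    func shouldTrack(userId: String, experimentKey: String, variationId: Int) -> Bool {
        let key = "\(userId)|\(experimentKey)|\(variationId)"
        return queue.sync { seen.insert(key).inserted }
    }
}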