# ask-questions
j
Good morning. We're getting "Multiple Exposures" and "Sample Ratio Mismatch (SRM) detected" warnings on multiple experiments. We are using GA4/BigQuery. Nothing has changed in our implementation; the experiments we run have been implemented in the same fashion since we started doing them back in September. We turned on streaming a month ago, but this only started in the last week. Any ideas on what we could look at outside of what's in the docs? I don't see any issues with how we run the experiments.
r
Hi Erik, I haven't forgotten about you! We had a huge influx of support requests over the past 2 days and I'm still getting caught up.
It does seem odd to me that these warnings would pop up now, after the Experiment has been running for many months. I'm wondering if the experiment duration has simply run long enough to "collect" lots of minor issues over time, which are now being surfaced. I'm going to ask our Data Scientist to chime in here during business hours on Friday.
For the Sample Ratio Mismatch (SRM), it's possible that 1.3% is a statistically probable difference and nothing to worry about. Our Data Scientist can help you with this.
As for the Multiple Exposures warning, this is generally an issue with the implementation. Could you send me a screenshot or code snippet of each of the following?
1. The GrowthBook-related code, including the `trackingCallback`.
2. The Experiment Assignment Query, found in Left Navbar --> Metrics and Data --> Data Sources --> Click on the relevant one --> Experiment Assignment Query
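(For intuition on whether a small imbalance like 1.3% is worrying, a quick one-degree-of-freedom chi-squared check can be sketched like this. The counts are hypothetical, not from these experiments, and this is not a GrowthBook API, just the standard SRM test done by hand:)

```javascript
// Minimal sketch: chi-squared SRM check for an intended 50/50 split.
// Pass the "total users" per variation from the experiment page.
function srmChiSquared(controlN, treatmentN) {
  const total = controlN + treatmentN;
  const expected = total / 2; // assumes the experiment targets a 50/50 split
  const stat =
    (controlN - expected) ** 2 / expected +
    (treatmentN - expected) ** 2 / expected;
  // 3.84 is the 95th percentile of chi-squared with 1 df; a statistic
  // above it suggests a real mismatch rather than random noise.
  return { stat, srmLikely: stat > 3.84 };
}
```

Note how sample size matters: a 48.5/51.5 split on ~1,000 users is well within noise, while the same split on ~10,000 users is a clear mismatch.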
j
Hi @brief-honey-45610 Some of the experiments have only run for days or a week or two. I can get more specifics as needed.
b
Ah, OK, so the GrowthBook-related code in your application hasn't changed, but you've launched new experiments and some of them are getting these warnings. It would be helpful to have a screenshot of the Experiment detail page for each experiment that is surfacing the SRM or ME Warnings.
j
I'll get you some screenshots today.
Here is the experiment assignment query, with our table names removed:
```sql
SELECT
  user_pseudo_id AS anonymous_id,
  TIMESTAMP_MICROS(event_timestamp) AS timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id,
  geo.country AS country,
  traffic_source.source AS source,
  traffic_source.medium AS medium,
  device.category AS device,
  device.web_info.browser AS browser,
  device.operating_system AS os
FROM
  `xxxxxxx`.`xxxxxxx`.`events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param
WHERE
  (_TABLE_SUFFIX BETWEEN '{{date startDateISO "yyyyMMdd"}}' AND '{{date endDateISO "yyyyMMdd"}}'
   OR _TABLE_SUFFIX BETWEEN 'intraday_{{date startDateISO "yyyyMMdd"}}' AND 'intraday_{{date endDateISO "yyyyMMdd"}}')
  AND event_name = 'experiment_viewed'
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND user_pseudo_id IS NOT NULL
```
Here is our `initGrowthBook` method that has the callback; I took out identifiers. The server-side implementation is for our cart. We do not use 3rd-party JS on our checkout, so the callback sends to GA4 on the server side via an AJAX call.
```javascript
import { GrowthBook } from "@growthbook/growthbook";

export async function initGrowthBook(serverside = false) {
    let gbClientKey = "";
    let gbEnableDev = true;
    if (PRODUCTION) {
        gbClientKey = "";
        gbEnableDev = false;
    }

    // Custom generated ID for user
    let gbuuid = getUUID();

    const growthbook = new GrowthBook({
        apiHost: "",
        clientKey: gbClientKey,
        enableDevMode: gbEnableDev,
        // Targeting attributes
        attributes: {
            id: gbuuid,
        },
        trackingCallback: (experiment, result) => {
            if (serverside) {
                DOI.Ajax.post(
                    '/checkout/experiment.json',
                    {
                        data: {
                            experiment_id: experiment.key || "",
                            variation_id: result.key || "",
                            gb_user_id: gbuuid || "",
                        },
                        dataType: 'json',
                        success: () => {
                        },
                        error: (json) => {
                            if (!PRODUCTION) {
                                console.log("GrowthBook error");
                                console.log(json);
                            }
                        },
                    }
                );
            } else {
                window.jxEventBus.push({
                    event: "experiment",
                    experiment_id: experiment.key,
                    variation_id: result.key,
                    gb_user_id: gbuuid,
                });
            }

            if (!PRODUCTION) {
                console.log("Experiment Viewed", {
                    experimentId: experiment.key,
                    variationId: result.key,
                    variationValue: result.value,
                });
            }
        },
    });
    await growthbook.loadFeatures({ autoRefresh: true });

    return growthbook;
}

// Attach to the window object
if ("undefined" !== typeof window) {
    window.initGrowthBook = initGrowthBook;
}

// This is taken from GrowthBook
// https://docs.growthbook.io/guide/GA4-google-analytics#generating-your-own-id
const getUUID = () => {
    const COOKIE_NAME = "gbuuid";
    const COOKIE_DAYS = 400; // 400 days is the max cookie duration for Chrome

    // use the browser's crypto.randomUUID if available
    const genUUID = () => {
        if (window?.crypto?.randomUUID) return window.crypto.randomUUID();
        return ([1e7] + -1e3 + -4e3 + -8e3 + -1e11).replace(/[018]/g, c =>
            (c ^ crypto.getRandomValues(new Uint8Array(1))[0] & 15 >> c / 4).toString(16)
        );
    };
    const getCookie = (name) => {
        let value = `; ${document.cookie}`;
        let parts = value.split(`; ${name}=`);
        if (parts.length === 2) return parts.pop().split(';').shift();
    };
    const setCookie = (name, value) => {
        var d = new Date();
        d.setTime(d.getTime() + 24 * 60 * 60 * 1000 * COOKIE_DAYS);
        document.cookie = name + "=" + value + ";path=/;expires=" + d.toUTCString();
    };

    // get the existing UUID from the cookie if set, otherwise create one and store it in the cookie
    if (getCookie(COOKIE_NAME)) return getCookie(COOKIE_NAME);

    const uuid = genUUID();
    setCookie(COOKIE_NAME, uuid);
    return uuid;
};
```
I'll give you a recent experiment here:
Screenshot 2024-01-12 at 10.10.01 AM.png
This is a "show/hide" A/B test. The content is either hidden or displayed.
relevant JS:
```javascript
document.addEventListener("DOMContentLoaded", async () => {
    if ("function" !== typeof window.initGrowthBook) {
        return;
    }

    const growthbook = await window.initGrowthBook();
    const featureName = "items-to-consider";
    const feature = growthbook.isOn(featureName);

    if (feature) {
        const items = document.getElementById("jx-abt-items-to-consider");
        // Guard in case the element isn't on this page
        if (items) {
            items.style.display = "block";
        }
    }
});
```
b
Thanks for all of this info, Erik, it makes troubleshooting so much easier to have all the info up front!
My colleague on the Data Science team should be coming online in the next hour or two, and I've asked him to take a look
j
No problem. Most of the experiments run very similarly and we haven't had any real issues until recently.
Appreciate it!
h
Hey Erik. Nothing is jumping out at me from what you've shared, unfortunately. Give me a little bit of time because I want to make sure I fully understand what's going on here to help you. A couple of questions:
1. Is it always the case that the control is lower than the treatment across experiments?
2. Can you also share with me your Experiment Assignment Query for the Identifier Type you're using for this experiment? This is defined on your Data Source page.
   a. Follow-up: is there a column specifically for the `gb_user_id`, and are you using it, or are you using some other identifier type in that query?
j
@helpful-application-7107 Here is another experiment. Very similar split.
This one only has multiple exposures.
I'll get you the other thing, one sec
h
Yeah, that's making me a bit nervous. Pretty strange.
Can you show me the split on that one with multiple exposures only?
My guess is it will be similar, but probably just too small N to have stat sig SRM test
j
The split, meaning traffic split?
h
Yeah, you can click that little arrow next to "total users" to see the table
j
Also we're using "Anonymous Visitors". Identifier: `anonymous_id`. Dimension Columns: `country`, `source`, `medium`, `device`, `browser`, `os`.
ok, one sec
Screenshot 2024-01-12 at 12.22.27 PM.png
I posted the assignment query above before you joined, I can share that again if you need me to.
h
Oh shoot, I missed it, thanks.
Yeah, as you can see, that split is again 48.5/51.5.
j
I left off our table names with XXXXX
h
yeah ofc no problem
j
yeah, the split is strange that it's lower on the baseline/control
h
Ok thanks. I'll try and get back to you more in a little bit, but it seems like what's happening is that for some reason some of your control users are showing up in your data as treatment users a few times. This could be happening if the `user_pseudo_id` coming out of GA4 is getting out of sync with the `gbuuid` that you're using for actual hashing. Basically, if there were a way to always use `gbuuid` throughout the system (in your experiment assignment queries, in your metrics), this problem would go away, but that solution isn't always quite so simple. Let me discuss with my team, because this happens sometimes with GA4 and we need to ensure we have a better playbook for diagnosing and improving this situation.
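(One hedged way to move toward "using `gbuuid` throughout" without rewriting every metric query: if gtag.js is on the page, the GrowthBook id can be set as a GA4 user property so it rides along on every event. This is a sketch, not part of the setup Erik shared; it assumes gtag.js is loaded and that `gb_user_id` is registered as a user-scoped custom dimension in the GA4 property:)

```javascript
// Hypothetical sketch: attach the GrowthBook UUID as a GA4 user property
// so every subsequent event carries it, making gbuuid joinable with
// user_pseudo_id in the BigQuery export. Assumes gtag.js is present.
function tagGa4WithGbUuid(gbuuid) {
  if (typeof window.gtag !== "function") return; // gtag.js not loaded
  window.gtag("set", "user_properties", {
    gb_user_id: gbuuid, // must also be registered as a custom dimension in GA4
  });
}
```

Calling this once right after `getUUID()` would mean the id mismatch could at least be measured on every event, not just `experiment_viewed`.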
j
Gotcha. I appreciate the help. I've done most of the implementation on our end (our frontend/JS), but some of this is new and/or foreign to me. I did not set up our actual GrowthBook or GA4.
h
Hey Erik. I'm seeing a lot of these are in later phases (e.g. phase 1, phase 2, etc.). Did you re-randomize when changing phases? what's the history of the phases here?
j
So we're using phases to start the test again. Maybe there is a better way to do this, but what I do is start an experiment while in development. Once it's deployed to production, which could be weeks in some cases, we start a new phase for production.
Our development has its own GA4 and is not attached as a data source for GrowthBook. So no data is shared; it's just for me to test to make sure everything fires correctly.
h
I see. And users aren't getting assigned/exposed, right?
j
Maybe one other dev would get assigned but that data would go to our dev GA4, not production. The trackingCallback for the specific experiment wouldn't fire until deployed to production... if that all makes sense.
h
Yeah, that makes sense.
It's not about where the data goes; it's about whether users exposed in phase 1 cause carryover bias in phase 2 by being more likely to return if they were in the variation. But that's not a problem given your set up where phase 1 is just devs testing.
j
Ok gotcha, that makes sense. Now, there were some phases added in production.
Example, had a bug and we used a new phase when the bug was fixed.
Based on what you just said, that could cause an issue?
h
Yeah, so that can cause carryover bias. We just launched a new flow where we recommend everyone re-randomize when they create a new phase to help mitigate those issues.
j
Can you re-randomize from the GrowthBook dashboard?
h
There's now a flow to create a new phase that goes like so on the page of a running experiment: Make Changes -> Start a new Phase -> New Phase, re-randomize.
j
Gotcha. We're hosting ourselves. I'll have our OPS update.
h
But I'm not convinced that's causing all of your problems here, since I don't think you did this "new phase after a bug" flow in every experiment? It seems to me the crux of the issue remains that your `gbuuid` is getting out of sync with the `user_pseudo_id` that you are actually using in the queries to check which experiment people are members of. There is normally a little bit of error in this process, and it's worth it to make sure you have fast hashing ("build your own `gbuuid` rather than waiting for GA4 to load the pseudo id") but consistent ids for metric tracking ("using the `user_pseudo_id` from GA4"). They can get out of sync when you aren't storing the `gbuuid` in a cookie that has a long life, akin to the `user_pseudo_id` (which I don't think is your problem), or if there's something else preventing the `gbuuid` from being set consistently for a given `user_pseudo_id`.
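(One small hardening step for cookie longevity, offered as a sketch rather than a GrowthBook recommendation: the `getUUID` helper shared earlier only sets the cookie's expiry when the cookie is first created, so the 400-day clock runs from the first visit. Re-writing the cookie on every page load keeps the window rolling from the user's latest visit. Cookie name and duration mirror that snippet:)

```javascript
// Sketch: refresh the gbuuid cookie's expiry on every page load so the
// 400-day window is measured from the last visit, not the first one.
// COOKIE_NAME / COOKIE_DAYS mirror the getUUID helper shared above.
function refreshGbUuidCookie(cookieString, now = new Date()) {
  const COOKIE_NAME = "gbuuid";
  const COOKIE_DAYS = 400;
  const parts = `; ${cookieString}`.split(`; ${COOKIE_NAME}=`);
  if (parts.length !== 2) return null; // no existing cookie to refresh
  const value = parts.pop().split(";").shift();
  const expires = new Date(now.getTime() + COOKIE_DAYS * 24 * 60 * 60 * 1000);
  // Caller writes the returned string to document.cookie.
  return `${COOKIE_NAME}=${value};path=/;expires=${expires.toUTCString()}`;
}
```

Calling this with `document.cookie` once per load (and writing the non-null result back) helps returning visitors keep their id. It won't help if the browser is capping script-written cookie lifetimes (Safari's ITP caps them at roughly 7 days), which is worth ruling out separately since it would reset `gbuuid` while GA4 ids set elsewhere might persist differently.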
j
I am using a version of the gbuuid code from the docs and passing that
```javascript
attributes: {
    id: gbuuid,
},
```
h
I think if you ran a query like this, you could see if there was indeed some pattern to the `gbuuid` to `user_pseudo_id` mapping:
```sql
SELECT
  user_pseudo_id,
  gb_user_id_param.value.string_value AS gb_user_id,
  TIMESTAMP_MICROS(event_timestamp) AS timestamp,
  experiment_id_param.value.string_value AS experiment_id,
  variation_id_param.value.int_value AS variation_id
FROM
  `xxxxxxx`.`xxxxxxx`.`events_*`,
  UNNEST(event_params) AS experiment_id_param,
  UNNEST(event_params) AS variation_id_param,
  UNNEST(event_params) AS gb_user_id_param
WHERE
  _TABLE_SUFFIX BETWEEN '20230101' AND '20230112'
  AND event_name = 'experiment_viewed'
  AND experiment_id_param.key = 'experiment_id'
  AND variation_id_param.key = 'variation_id'
  AND gb_user_id_param.key = 'gb_user_id'
  AND user_pseudo_id IS NOT NULL
```
j
I got nothing back from that query
h
Yes, but the issue is that in your tracking callback you are storing that `id` as `gb_user_id`, but in the actual analysis query you are using `user_pseudo_id`. This is what most people do, because it ends up being much easier to ensure that you have `user_pseudo_id` everywhere in your GA4 data, since GA4 always appends it; if you wanted to use `gb_user_id`, you'd have to make sure it gets tracked along with every one of your metrics. So your setup looks normal and typical, but something is going on that is causing certain users (as defined by some given `user_pseudo_id`) to get re-assigned new `gbuuid` values, and therefore get hashed again.
j
nm, the dates are 2023.
h
oh yeah, mb
j
ha no worries, I still am in 2023
ok, so some of this was definitely not understanding everything from the get go.
h
Yeah, this is definitely way in the weeds.
j
ok, so you're saying we store the id as `gb_user_id` but don't use it?
h
It is getting used for hashing, but not for analysis. It's hard to use it for analysis because, to join experiment exposures to metrics, you would have to ensure it gets tracked any time you do any kind of GA4 tracking.
Normally using `user_pseudo_id` is good enough because you always have it for everything from GA4, and it normally lines up well enough with a custom `gbuuid`. Many people get some multiple exposures (a small percentage) when they mismatch, but it normally doesn't trigger SRM.
j
Gotcha.
I had to add in the intraday tables to get data back from that query
h
What you should see is that the `user_pseudo_id` to `gb_user_id` mapping is not 1:1.
j
Small subset of data
h
What might be interesting is setting that to a CTE like `users` and doing something like:
```sql
SELECT
  user_pseudo_id,
  experiment_id,
  COUNT(DISTINCT variation_id) AS n_variations,
  COUNT(DISTINCT gb_user_id) AS n_gb_uids
FROM
  users
GROUP BY 1, 2
ORDER BY 4 DESC
```
Then you'll see how many `gb_user_id`s are being generated for some `user_pseudo_id`.
j
How do I select from just users from GA4/Big Query?
h
It would be something like:
```sql
WITH users AS (
  <...the query that returned the data you pasted...>
)

SELECT
  user_pseudo_id,
  experiment_id,
  COUNT(DISTINCT variation_id) AS n_variations,
  COUNT(DISTINCT gb_user_id) AS n_gb_uids
FROM
  users
GROUP BY 1, 2
ORDER BY 4 DESC
```
j
ahh gotcha
h
But at the end of the day, I'm just demonstrating how to see/confirm that the problem is that the `gb_user_id` keeps getting reset within a `user_pseudo_id`. I really don't know the best way, in your setup, to make sure that the gb cookie is as consistent as possible with however GA4 is getting your `user_pseudo_id`.
j
Screenshot 2024-01-12 at 4.18.32 PM.png
Gotcha. This type of data analysis is extremely new to me and some is way over my head.
h
No worries. But you can see that, for some reason, that first `user_pseudo_id` is getting 4 different `gb_user_id`s associated with it within just one experiment.
So for some reason that cookie isn't persisting for that user_pseudo_id.
j
huh, ok that makes more sense to me
So gotta figure out the root cause to why the cookie doesn't persist.
h
This normally happens at some small percentage, like I've said, but yours seems bigger and more problematic than usual. You might want to talk to someone on your end who set up GA4, now that you have a clearer understanding of what's going on. I don't think anything in GrowthBook should have changed recently to affect this. If you weren't seeing this in October/November with similar experiment sample sizes, then maybe something changed in how you are managing cookies, or how GA4 `user_pseudo_id`s are persisting, or something like that.
Yeah, the root issue is that the `gbuuid` isn't persisting within a `user_pseudo_id`.
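(As a hedged starting point for that investigation, a tiny client-side check like the one below could quantify the problem before anyone digs into cookie settings. It is illustrative only: `_ga` is the cookie gtag.js uses for the GA4 client id, and any logging destination would be something you'd add yourself:)

```javascript
// Sketch: detect visitors who have a GA4 client-id cookie (_ga) but no
// gbuuid cookie. A high rate of this pattern would confirm that the gbuuid
// cookie is being dropped while the GA4 identity persists.
function hasCookie(cookieString, name) {
  return `; ${cookieString}`.includes(`; ${name}=`);
}

function gbUuidDropped(cookieString) {
  // true => GA4 remembers this browser, but GrowthBook's id was lost
  return hasCookie(cookieString, "_ga") && !hasCookie(cookieString, "gbuuid");
}
```

Logging `gbUuidDropped(document.cookie)` on page load (to the console, or to a hypothetical internal endpoint) would show what fraction of returning GA4 clients arrive without a `gbuuid`, which is exactly the population producing the multiple exposures.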
j
Ok, yeah I'll bring this up next week in our stand up. I appreciate the help.
h
Sure, sorry there's no easy resolution right now! Sometimes a problem sticks out from the info you shared with me but nothing did.
j
No worries. This points me in a direction to try. Have a great weekend!