# ask-questions
p
Hi team, I have run A/A tests a few times now and keep running into the issue that they become statistically significant (lost or won). As far as I understand, this should not happen, right? I set them up using the Visual Editor so that control and variation each log a different message to the console. Everything else is equal. Here's the JSON object loaded by the website:
```json
{
  "status": 200,
  "features": {},
  "experiments": [
    {
      "key": "test_aa_v2",
      "status": "running",
      "variations": [
        {
          "css": "",
          "js": "console.log(\"Version0\");",
          "domMutations": []
        },
        {
          "css": "",
          "js": "console.log(\"Version1\");",
          "domMutations": []
        }
      ],
      "hashVersion": 2,
      "hashAttribute": "deviceId",
      "urlPatterns": [
        {
          "include": true,
          "type": "regex",
          "pattern": ".*\\/products\\/.*"
        }
      ],
      "weights": [
        0.5,
        0.5
      ],
      "filters": [],
      "seed": "test_aa_v2",
      "phase": "0",
      "coverage": 1,
      "meta": [
        {
          "key": "0"
        },
        {
          "key": "1"
        }
      ]
    }
  ],
  "dateUpdated": "2023-09-18T16:50:28.162Z"
}
```
This test runs on PDPs (product detail pages) and I just compare how often the ATC (add-to-cart) button is clicked, which statistically should be very even across a high number of events. Any ideas why the results lean to one side?
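As an aside for anyone reproducing this setup: assuming the GrowthBook JS SDK's inline-experiment run() API (a sketch only, not the exact Visual Editor code path, and with a hypothetical deviceId), the assignment implied by that payload is deterministic per device:

```ts
// Minimal sketch, assuming the GrowthBook JS SDK; the deviceId is hypothetical and
// the experiment fields mirror the payload above (Visual Editor experiments are
// normally applied automatically, so this is only for illustration).
import { GrowthBook } from "@growthbook/growthbook";

const gb = new GrowthBook({
  attributes: { deviceId: "device-123" },
});

const result = gb.run({
  key: "test_aa_v2",
  variations: ["Version0", "Version1"],
  weights: [0.5, 0.5],
  coverage: 1,
  hashAttribute: "deviceId",
  hashVersion: 2,
  seed: "test_aa_v2",
});

// The same deviceId always hashes to the same 50/50 bucket for this key/seed.
console.log(result.inExperiment, result.variationId, result.value);
```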
f
Hi Sasha
sorry for the delay
Let me follow up with the team, @helpful-application-7107, do you have thoughts?
👍 1
h
Hi, there's a long discussion of A/A tests here: https://linen.growthbook.io/t/13142527/hi-growthbook-team-having-the-same-issue-as-i-described-here#76332a3f-3c02-4ee5-b01a-b3de79fdc260 I gave answers as helpful-application-7107 in that thread. Please take a look and let me know if you have follow-up questions.
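For readers who don't follow the link: the core point in that thread is that a fraction of perfectly healthy A/A tests will still cross a significance threshold by chance. A minimal, self-contained simulation of that effect is sketched below; it uses a plain two-proportion z-test with a single look, not GrowthBook's actual statistics engine, so the numbers are only illustrative:

```ts
// Simulate many A/A tests with identical click rates and count how often a
// naive two-proportion z-test calls them "significant" at alpha = 0.05.
// (Illustrative sketch only; not GrowthBook's statistics engine.)

// Abramowitz-Stegun approximation of the error function (|error| < 1.5e-7).
function erf(x: number): number {
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t -
      0.284496736) * t + 0.254829592) * t;
  return 1 - poly * Math.exp(-x * x);
}

// Two-sided p-value for the difference between two observed proportions.
function zTestPValue(c1: number, n1: number, c2: number, n2: number): number {
  const pooled = (c1 + c2) / (n1 + n2);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
  const z = (c1 / n1 - c2 / n2) / se;
  const cdf = 0.5 * (1 + erf(Math.abs(z) / Math.SQRT2));
  return 2 * (1 - cdf);
}

const trueRate = 0.1;       // same add-to-cart rate in both arms
const usersPerArm = 10_000;
const runs = 1_000;
let falsePositives = 0;

for (let i = 0; i < runs; i++) {
  let c1 = 0;
  let c2 = 0;
  for (let u = 0; u < usersPerArm; u++) {
    if (Math.random() < trueRate) c1++;
    if (Math.random() < trueRate) c2++;
  }
  if (zTestPValue(c1, usersPerArm, c2, usersPerArm) < 0.05) falsePositives++;
}

// Expect roughly 5% of these A/A tests to look "significant" with a single look;
// peeking at the results repeatedly over time pushes that rate higher.
console.log(`false positive rate ~ ${(falsePositives / runs).toFixed(3)}`);
```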
p
Thanks for the link to the discussion. I understand the general idea that A/A tests can become significant, but it still feels "off". I ran another A/A test for the same scenario (A/A, just with console.log()) and it shows the same results. I'll let it keep running to see what happens, but -20% at 99.7% significance on the losing side feels quite strong.
h
Hmm, having two tests with results this strong, both trending in the same direction, does make me a little suspicious as well. Let me check in with the team that owns the Visual Editor.
👍 1
What happened in the v1 that you ran?
p
It had multiple-exposure issues, so I removed it
I restarted the test as v3 in a new phase (I deleted the previous phase). It looks better, but still extreme:
[screenshot of results]
h
Did you choose to re-randomize? New phases can suffer from carry-over bias if you don't re-randomize.
p
Sorry, what do you mean by re-randomize?
h
Hmm, maybe that isn't an option in the Visual Editor at the moment. Did you start the new phase by clicking the three dots in the far top right and editing the phases on the Experiment page?
p
Yes, I deleted phase 1, which reset the experiment to "draft"
h
I see.
So that might not give you a fully new test, as it will not re-randomize devices, and past user behavior could influence who enters the new test. The safest thing is to re-run the A/A test with a new experiment key. If you want to set that up while I get some more input from our side, that could be informative.
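To make the "new experiment key" suggestion concrete, here is a small sketch, assuming the GrowthBook JS SDK and a hypothetical deviceId: with the same key (and therefore the same default hash seed) a device always lands in the same bucket, so a new phase reuses the old split, while a new key gives an independent draw.

```ts
// Minimal sketch, assuming the GrowthBook JS SDK; deviceId and keys are hypothetical.
import { GrowthBook } from "@growthbook/growthbook";

const gb = new GrowthBook({ attributes: { deviceId: "device-123" } });

// Same key (and therefore same default seed): the device hashes to the same
// bucket every time, no matter how many phases the experiment goes through.
console.log(
  gb.run({ key: "test_aa_v2", variations: [0, 1], hashAttribute: "deviceId" }).variationId,
  gb.run({ key: "test_aa_v2", variations: [0, 1], hashAttribute: "deviceId" }).variationId
);

// A new experiment key re-hashes the device: a genuinely fresh randomization.
console.log(
  gb.run({ key: "test_aa_v3", variations: [0, 1], hashAttribute: "deviceId" }).variationId
);
```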
p
Sure, I'll set that up.
OK, recreated the experiment from scratch - will send you an update tomorrow
h
Any update here?
p
Yes, it looks much better now. My takeaway is to create new experiments rather than deleting phases and starting over with existing experiments. Thanks for the help, Luke!
h
Yeah, a new phase does not re-randomize unless you explicitly choose to do so (only available in some flows), so it will be just as "unlucky" as the previous draw for A/A experiments, and in real experiments it can cause carry-over bias.
b
Ah, I wonder if that's why I am seeing weird values for my experiments? I have an existing A/B experiment, which I modified to use a namespace, and I created an A/A experiment in the other half of the namespace. About half of the users are in the A/B experiment and they ALL have the associated feature set to 'true'. The other half are in the A/A experiment and their associated feature is 'false'.
I'm using the SDK, NOT the visual editor.
If it is this issue, @helpful-application-7107, do I need to create new features as well, or only new experiments using the existing features? Thank you!
h
> I have an existing A/B experiment, and I modified it to use a namespace
This is an unsafe operation, and it can be difficult to make guarantees about how the user experience will change over time. If we don't already, we should make it clear in the app that there isn't a safe way to do this without creating new features/re-randomizing.
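For anyone untangling the same setup: a namespace in the SDK is an [id, rangeStart, rangeEnd] tuple, and two experiments that share the id but cover disjoint ranges split the traffic between them. Below is a hedged sketch, assuming the GrowthBook JS SDK with hypothetical keys and deviceId; the takeaway is that adding a namespace to an experiment that is already running changes which devices are even eligible for it, which is why it calls for new features or re-randomization.

```ts
// Minimal sketch, assuming the GrowthBook JS SDK; keys and deviceId are hypothetical.
import { GrowthBook } from "@growthbook/growthbook";

const gb = new GrowthBook({ attributes: { deviceId: "device-123" } });

// The original A/B experiment, now restricted to the first half of the namespace.
const ab = gb.run({
  key: "existing_ab_test",
  variations: [false, true],
  hashAttribute: "deviceId",
  namespace: ["checkout", 0, 0.5],
});

// The new A/A experiment in the other half of the same namespace.
const aa = gb.run({
  key: "new_aa_test",
  variations: [false, false],
  hashAttribute: "deviceId",
  namespace: ["checkout", 0.5, 1],
});

// A given device can be in at most one of the two experiments; devices that
// were in the A/B test before the namespace was added may now fall outside it.
console.log(ab.inExperiment, aa.inExperiment);
```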