Hi team I am setting up AWS Athena as the data source for my GrowthBook Users #ask-questions

calm-pilot-30179

04/03/2024, 3:53 AM

Hi team. I am setting up AWS Athena as the data source for my metrics in GrowthBook. I noticed that there is no explicit requirement on the schema of the SQL results returned from Athena. How do GrowthBook metrics associate event records with ongoing experiments? Is there any extra setup needed?

fresh-football-47124

04/03/2024, 3:57 AM

The SQL is adjustable

fresh-football-47124

04/03/2024, 3:57 AM

Once you connect to the data source, you'll be able to edit the query that pulls the assignment information: which users were exposed to experiments, and which variation they got.

calm-pilot-30179

04/03/2024, 4:06 AM

So I will need to write the experiments and variations assigned to the user, at the time of the event, into Athena?

fresh-football-47124

04/03/2024, 4:07 AM

Yes, one way or another you need to be able to pull that information. On the assignment side, we have the 'trackingCallback', which is where you would record this info.

calm-pilot-30179

04/03/2024, 4:28 AM

Does that mean we have to merge the events table in Athena with another data source that is populated from tracking callbacks? Would it do any harm if we recorded all the ongoing experiments and variations, along with the user_id, the timestamp, and everything else, in Athena?

calm-pilot-30179

04/03/2024, 4:28 AM

I am still a little confused about how to join the user experiment assignments with our events data in Athena.

calm-pilot-30179

04/03/2024, 4:29 AM

Can you provide an example of a SQL query that has all the information needed to display A/B test results in metrics?

fresh-football-47124

04/03/2024, 4:29 AM

What the trackingCallback does with that info is up to you to define. Usually people use their existing event tracking to record it.
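
As an illustration of the point above, here is a minimal sketch of a tracking callback that turns each exposure into a row with the four fields an assignment query needs. The `ExperimentInfo` and `ResultInfo` interfaces are simplified stand-ins for the objects the SDK passes to the callback, and `ExposureRow`, `Sink`, and `makeTrackingCallback` are hypothetical names introduced only for this example. How the row reaches Athena (your event tracker, a queue, and so on) is left to the sink.

```typescript
// Simplified stand-ins for the objects passed to the callback (assumption:
// only the fields used here are modelled).
interface ExperimentInfo { key: string }
interface ResultInfo { key: string }

// One row of a hypothetical exposure table that the assignment query reads.
interface ExposureRow {
  user_id: string;       // the randomization unit
  timestamp: string;     // ISO 8601, UTC
  experiment_id: string;
  variation_id: string;
}

// Hypothetical sink: in practice this would forward the row to your event tracker.
type Sink = (row: ExposureRow) => void;

// Builds a callback that records one exposure row per call.
function makeTrackingCallback(
  userId: string,
  sink: Sink,
  now: () => Date = () => new Date()
) {
  return (experiment: ExperimentInfo, result: ResultInfo): void => {
    sink({
      user_id: userId,
      timestamp: now().toISOString(),
      experiment_id: experiment.key,
      variation_id: result.key,
    });
  };
}

// Usage: collect rows in memory with a fixed clock so the output is predictable.
const rows: ExposureRow[] = [];
const callback = makeTrackingCallback(
  "u_123",
  (row) => { rows.push(row); },
  () => new Date("2024-04-03T04:30:00Z")
);
callback({ key: "new-checkout" }, { key: "1" });
```

Recording exposures through the same pipeline as your other events keeps both tables in the same data source, which matters because the assignment and metric queries have to run against one source.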

fresh-football-47124

04/03/2024, 4:30 AM

So there are two kinds of queries you need.

fresh-football-47124

04/03/2024, 4:31 AM

One is the assignment query, which identifies which users were exposed to which experiments (it should return the randomization unit, experiment ID, variation ID, and the time the exposure happened).

fresh-football-47124

04/03/2024, 4:32 AM

Then you add metric queries, which can be anything, but each one needs to return the same randomization unit that was used for exposure, along with the value for the metric.

fresh-football-47124

04/03/2024, 4:32 AM

GrowthBook will then do all the joining and statistical analysis.

fresh-football-47124

04/03/2024, 4:32 AM

Here are some examples:

fresh-football-47124

04/03/2024, 4:34 AM

SELECT
  userid as user_id,
  timestamp as timestamp,
  experimentid as experiment_id,
  variationid as variation_id,
  browser,
  country,
  date_trunc('week', timestamp) as weeknumber
FROM
  sample.experiment_viewed

(This is an example from Segment. It returns a few extra columns that can be used as dimensions for experiment analysis.)

fresh-football-47124

04/03/2024, 4:35 AM

Then a metric, for example orders per user, might look something like this:

SELECT
  userId as user_id,
  anonymousId as anonymous_id,
  timestamp as timestamp,
  1 as value
FROM
  sample.orders
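
To make the join between the two queries concrete, here is a small in-memory sketch of the idea. It is an assumption about the general shape of the analysis, not a copy of the SQL GrowthBook generates: metric rows are matched to exposure rows on the randomization unit, only values recorded at or after a user's first exposure are counted, and the per-user totals are averaged per variation. The `Exposure` and `MetricRow` interfaces and the `meanPerVariation` function are hypothetical names for this example.

```typescript
interface Exposure { user_id: string; timestamp: string; variation_id: string }
interface MetricRow { user_id: string; timestamp: string; value: number }

// Timestamps are ISO 8601 strings in UTC, so plain string comparison orders them.
function meanPerVariation(
  exposures: Exposure[],
  metrics: MetricRow[]
): Record<string, number> {
  // Keep each user's earliest exposure.
  const first: Record<string, Exposure> = {};
  for (const e of exposures) {
    const seen = first[e.user_id];
    if (!seen || e.timestamp < seen.timestamp) first[e.user_id] = e;
  }

  // Start every exposed user at 0 so users without metric rows still count.
  const totals: Record<string, number> = {};
  for (const id of Object.keys(first)) totals[id] = 0;

  // Join on user_id and ignore rows from unexposed users or from before exposure.
  for (const m of metrics) {
    const e = first[m.user_id];
    if (e && m.timestamp >= e.timestamp) {
      totals[m.user_id] = (totals[m.user_id] || 0) + m.value;
    }
  }

  // Sum the per-user totals and count users within each variation.
  const sum: Record<string, number> = {};
  const users: Record<string, number> = {};
  for (const id of Object.keys(first)) {
    const v = first[id].variation_id;
    sum[v] = (sum[v] || 0) + totals[id];
    users[v] = (users[v] || 0) + 1;
  }

  const mean: Record<string, number> = {};
  for (const v of Object.keys(users)) mean[v] = sum[v] / users[v];
  return mean;
}

// Usage: rows shaped like the results of the two example queries.
const exposureRows: Exposure[] = [
  { user_id: "u1", timestamp: "2024-04-01T10:00:00Z", variation_id: "0" },
  { user_id: "u5", timestamp: "2024-04-01T10:30:00Z", variation_id: "0" },
  { user_id: "u2", timestamp: "2024-04-01T11:00:00Z", variation_id: "1" },
  { user_id: "u3", timestamp: "2024-04-01T12:00:00Z", variation_id: "1" },
];
const orderRows: MetricRow[] = [
  { user_id: "u1", timestamp: "2024-04-01T09:00:00Z", value: 1 }, // before exposure
  { user_id: "u1", timestamp: "2024-04-02T09:00:00Z", value: 1 },
  { user_id: "u2", timestamp: "2024-04-02T09:00:00Z", value: 1 },
  { user_id: "u2", timestamp: "2024-04-03T09:00:00Z", value: 1 },
  { user_id: "u4", timestamp: "2024-04-02T09:00:00Z", value: 1 }, // never exposed
];
const means = meanPerVariation(exposureRows, orderRows);
```

With this data, variation "0" averages 0.5 orders per user and variation "1" averages 1. The sketch also shows why both queries must return the same randomization unit: without a shared `user_id`, there is nothing to join on.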

calm-pilot-30179

04/03/2024, 4:38 AM

I see, that makes more sense. Let me play with it a bit. Thanks a lot for the clear explanation.

calm-pilot-30179

04/04/2024, 11:24 PM

Hi @fresh-football-47124, is it possible to join the user experiment assignment query with metrics when the two come from 2 different data sources?

brainy-address-27020

05/17/2024, 2:14 PM

I have a follow-up to this: can we add the experiment ID and variation ID to all of our event metrics to remove the need for the join?

fresh-football-47124

05/17/2024, 2:59 PM

Zhengyuan, currently that is not possible, as we cannot run a join across data sources. But if the same assignment data is available in both data sources, you can add a second experiment report for the second data source. Not ideal, but it would work. Can I ask what the use case is here?

fresh-football-47124

05/17/2024, 3:02 PM

Noel: That's interesting. I don't think we have that ability at the moment. The queries we create expect to join between the metric and assignment queries. You could open a GitHub issue to ask if we can support denormalized metric+assignment data.

brainy-address-27020

05/17/2024, 4:45 PM

I'm assuming that for the queries we create ourselves, if we have our own data warehouse on Athena or Redshift, we can avoid the join?

fresh-football-47124

05/17/2024, 7:59 PM

I don't think so, as currently coded.
