Hi team. I am setting up AWS Athena as the data so...
# ask-questions
c
Hi team. I am setting up AWS Athena as the data source for my metrics in GrowthBook. I noticed that there is no explicit requirement of the schema of the resulting SQL from athena. How does GB metrics associate event records with ongoing experiments? Is there any extra setup needed?
f
The SQL is adjustable
once you connect to the data source, you'll be able to edit the query to pull the assignment information - which users were exposed to experiments, and what variation they got
c
so I will need to write the experiments and variations assigned to the user, at the time of the event, in Athena?
f
yes, in some way, there needs to be a way to pull that information. On the assignment side, we have the 'trackingCallback' which is where you would record this info
c
Does that means we have to merge the events in the table in Athena with another data source that is populated from tracking Callbacks. Would it do any harm if we were to record all the ongoing experiments and variations, along with the user_id and timestamp, and all other into Athena?
I am still a little confused how to join the user experiments assignments and our events data in Athena.
can you provide an example of a SQL query that has all the information to display AB test results in metrics?
f
the trackingCallback is up to you to define what to do with that info. Usually people use their same event tracking to record that information
so there are two queries you need
One is the assignment query, which is which users were exposed to which experiments (and should return the randomization unit, experiment ID, variation ID, and time it happened)
then you add metric queries, which can be anything, but needs to return the same randomization unit as used for exposure, and the value for the metric
then GrowthBook will do all the joining and statistical analysis
Here are some examples:
Copy code
SELECT
  userid as user_id,
  timestamp as timestamp,
  experimentid as experiment_id,
  variationid as variation_id,
  browser,
  country,
  date_trunc('week', timestamp) as weeknumber
FROM
  sample.experiment_viewed
(this is an example from segment, that has a few other columns returned which can be used as dimensions for experiment analysis)
then a metric might be something like this: Orders per user:
Copy code
SELECT
  userId as user_id,
  anonymousId as anonymous_id,
  timestamp as timestamp,
  1 as value
FROM
  sample.orders
c
i see. that makes more sense. Let me play with it a bit. Thanks a lot for the clear explanation
Hi @fresh-football-47124, is it possible to join the user expeirment assignment query with the metrics from 2 different data sources?
b
i have a follow up to this, can we put the experiment id and variation id to all of our event metrics to remove the need to do the join?
f
Zhengyuan, currently that is not possible as we cannot run a join across data sources - but if the same data for assignments is available on both data sources, you can add a second experiment report for that second data source. Not ideal, but it would work - can I ask the use case here?
Noel: That's interesting - I don't think we have that ability at the moment- the queries we create expect to join between the metric and assignment queries. You could open a GitHub issue to ask if we can support a denormalized metric+assignment data
b
im assuming on the queries we create if we have our own data warehouse using athena or redshift, we can avoid the join?
f
I dont think so as currently coded