I am trying to see how accurate firebase predictions are. I have exported firebase predictions and analytics events to BigQuery.
In order to test the performance of predictions, I wrote a query which harvests predictions for the churn
event, then joined that with app_remove
events from analytics.
The problem was that for many, many users, the app_remove
event had been logged before the prediction date! As if google waited for the user to churn, then place the prediction of churn in the table. Am I missing something?
here is my code:
SELECT
b.day AS prediction_day,
b.user_id,
c.day AS churn_day
FROM
(
SELECT
MIN(day) AS day,
user_id
FROM
(
SELECT
CAST(u.prediction_time AS DATE) AS day,
u.user_id,
p.id,
p.probability,
p.observed_label
FROM
`Predictions_table` AS u,
u.predictions AS p
)a
WHERE
a.probability > 0.2
AND a.id = 'churn'
AND NOT (a.user_id IS NULL OR a.user_id = '')
GROUP BY
user_id
)b
LEFT JOIN
(
SELECT
MAX(day) AS day,
user_id
FROM
(
SELECT
CAST(TIMESTAMP_MICROS(event_timestamp) AS DATE) AS day,
user_id
FROM
`Analytics_table`
WHERE
event_name = 'app_remove'
AND NOT (user_id IS NULL OR user_id = '')
)e
GROUP BY
user_id
)c
ON b.user_id = c.user_id