BigQuery conversion rate is calculated incorrectly

Time: 2018-12-04 15:35:45

Tags: google-analytics google-bigquery standard-sql

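I am trying to build a benchmark dataset in BigQuery. In this dataset I want metrics such as sessions, bounce rate, and new users, but most importantly the conversion rate. These do not seem to be calculated correctly. The conversion rate comes out mostly as NULL for rows where it should not be NULL, and unfortunately the non-NULL values are wrong. I have searched for answers on how to calculate bounce rate and similar metrics, and as far as I can tell the conversion rate calculation should work the way I have written it.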

I also tried different formulas for the conversion rate, but in the same format as the code below.

Edit: there must be an error in the session calculation, because it returns fewer sessions than users.

I am using the following code:

SELECT
  actiontimestamp,
  medium,
  source,
  users,
  newUsers,
  sessions,
  ROUND(SAFE_DIVIDE(pageviews, sessions), 0) AS pages_per_session,
  CASE
    WHEN sessions = 0 THEN 0
    ELSE ROUND(SAFE_DIVIDE(bounces, sessions), 2)
  END AS bounce_rate,
  ROUND(avgTimeOnSite, 2)
  transactions,
  (SAFE_DIVIDE(transactions, sessions)*100) AS conversion_rate

FROM (
  SELECT
    actiontimestamp,
    medium,
    source,
    COUNT(fullVisitorId) AS users,
    COUNT(DISTINCT fullVisitorId) AS newUsers,
    COUNT(transaction) AS transactions,
    COUNT(pageviews) AS pageviews,
    SUM(bounces) AS bounces,
    SUM(sessions) AS sessions,
    AVG(avgTimeOnSite) AS avgTimeOnSite
  FROM (
    SELECT
      fullVisitorId,
      visitStartTime,
      pageviews,
      actiontimestamp,
      avgTimeOnSite,
      transaction,
      medium,
      source,
      CASE
        WHEN hitNumber = first_interaction THEN bounces
        ELSE 0
      END AS bounces,
      CASE
        WHEN hitNumber = first_hit THEN visits
        ELSE 0
      END AS sessions
    FROM (
      SELECT
        fullVisitorId,
        visitStartTime,
        IFNULL(totals.pageviews,
          0) AS pageviews,
        totals.bounces,
        totals.visits,
        hits.hitNumber,
        MIN(IF(hits.isInteraction IS NOT NULL,
            hits.hitNumber,
            0)) OVER (PARTITION BY fullVisitorId, visitStartTime) AS first_interaction,
        MIN(hits.hitNumber) OVER (PARTITION BY fullVisitorId, visitStartTime) AS first_hit,
        FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP_SECONDS(SAFE_CAST(visitStartTime AS INT64)), "Europe/London") AS actiontimestamp,
        totals.timeOnSite AS avgTimeOnSite,
        hits.transaction.transactionId AS transaction,
        trafficSource.medium AS medium,
        trafficSource.source AS source
      FROM
        `ga_table_id.ga_sessions_*`,
        UNNEST(hits) AS hits
      WHERE
        _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', '2018-11-01')
        AND FORMAT_DATE('%Y%m%d', '2018-11-30')))
  GROUP BY
    actiontimestamp,
    medium,
    source)
ORDER BY
  actiontimestamp DESC
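
In the query above, `users` is `COUNT(fullVisitorId)` evaluated after `UNNEST(hits)`, so it counts hit rows rather than visitors, while `sessions` is counted at most once per session. That alone is enough to produce fewer sessions than "users". A minimal sketch of the effect, using made-up inline data in place of the real `ga_sessions_*` export (the CTE `demo_sessions` and its values are hypothetical):

```sql
-- Hypothetical data: two visitors with one session each, 3 and 2 hits.
WITH demo_sessions AS (
  SELECT 'A' AS fullVisitorId, 1 AS visits, [1, 2, 3] AS hits
  UNION ALL
  SELECT 'B', 1, [1, 2]
)
SELECT
  COUNT(fullVisitorId) AS rows_after_unnest,           -- 5: one per hit
  COUNT(DISTINCT fullVisitorId) AS distinct_visitors   -- 2: one per visitor
FROM
  demo_sessions,
  UNNEST(hits) AS hit
```

Any aggregate that is meant to be per session or per visitor needs either `COUNT(DISTINCT ...)` or a source that has not been unnested to hit level.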

2 answers:

Answer 0: (score: 0)

Should the variables be the other way around?

SAFE_DIVIDE(transactions, sessions)
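
For reference, a small sketch of how the argument order and a zero denominator behave, using literal numbers rather than the question's data. `SAFE_DIVIDE(x, y)` computes x / y and returns NULL when y is 0, so transactions as the first argument gives a conversion rate, and a group with zero sessions yields NULL rather than an error:

```sql
SELECT
  SAFE_DIVIDE(50, 200) * 100 AS conversion_rate,        -- 25.0 (transactions / sessions)
  SAFE_DIVIDE(200, 50) AS sessions_per_transaction,     -- 4.0 (swapped arguments)
  SAFE_DIVIDE(50, 0) * 100 AS rate_with_zero_sessions   -- NULL (division by zero)
```

Note that the question's `bounce_rate` guards the `sessions = 0` case with a `CASE` expression, while `conversion_rate` does not, so any group whose `sessions` sums to 0 would show a NULL conversion rate.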

Answer 1: (score: 0)

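The code does not work because sessions are defined at different levels. However, when I built two separate tables and joined them, it worked well. The two tables use different routes to obtain sessions, which makes the calculations come out correctly.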

SELECT
  actiontimestamp,
  medium,
  source,
  sessions,
  ROUND(SAFE_DIVIDE(pageviews,
      sessions), 0) AS pages_per_session,
  CASE
    WHEN sessions = 0 THEN 0
    ELSE ROUND(SAFE_DIVIDE(bounces,
      sessions), 2)
  END AS bounce_rate,
  ROUND(avgTimeOnSite, 2) AS avgTimeOnSite
FROM (
  SELECT
    actiontimestamp,
    medium,
    source,
    AVG(pageviews) AS pageviews,
    SUM(bounces) AS bounces,
    SUM(sessions) AS sessions,
    AVG(avgTimeOnSite) AS avgTimeOnSite
  FROM (
    SELECT
      fullVisitorId,
      pageviews,
      actiontimestamp,
      avgTimeOnSite,
      medium,
      source,
      CASE
        WHEN hitNumber = first_interaction THEN bounces
        ELSE 0
      END AS bounces,
      CASE
        WHEN hitNumber = first_hit THEN visits
        ELSE 0
      END AS sessions
    FROM (
      SELECT
        fullVisitorId,
        visitStartTime,
        IFNULL(totals.pageviews,
          0) AS pageviews,
        totals.bounces,
        totals.visits,
        totals.newVisits AS newVisits,
        hits.hitNumber,
        MIN(IF(hits.isInteraction IS NOT NULL,
            hits.hitNumber,
            0)) OVER (PARTITION BY fullVisitorId, visitStartTime) AS first_interaction,
        MIN(hits.hitNumber) OVER (PARTITION BY fullVisitorId, visitStartTime) AS first_hit,
        FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP_SECONDS(SAFE_CAST(visitStartTime AS INT64)), "Europe/London") AS actiontimestamp,
        totals.timeOnSite AS avgTimeOnSite,
        trafficSource.medium AS medium,
        trafficSource.source AS source
      FROM
        `gatable.ga_sessions_*`,
        UNNEST(hits) AS hits
      WHERE
        _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', '2018-11-01')
        AND FORMAT_DATE('%Y%m%d', '2018-11-30')))
  GROUP BY
    actiontimestamp,
    medium,
    source)
ORDER BY
  actiontimestamp DESC

The second table would then be:

SELECT
  actiontimestamp,
  medium,
  source,
  users,
  newUsers,
  sessions,
  transactions,
  ROUND((SAFE_DIVIDE(transactions,
      sessions)*100), 2) AS conversion_rate
FROM (
  SELECT
    FORMAT_TIMESTAMP("%Y-%m-%d", TIMESTAMP_SECONDS(SAFE_CAST(visitStartTime AS INT64)), "Europe/London") AS actiontimestamp,
    SUM(totals.transactions) AS transactions,
    COUNT(DISTINCT fullVisitorId) AS users,
    SUM(totals.visits) AS sessions,
    COUNT(totals.newVisits) AS newUsers,
    trafficSource.medium AS medium,
    trafficSource.source AS source
  FROM
    `91775944.ga_sessions_*`

  WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', '2018-11-01')
    AND FORMAT_DATE('%Y%m%d', '2018-11-30')
  GROUP BY
    actiontimestamp,
    medium,
    source
   )

Joining these tables on actiontimestamp, medium, and source then gave me the results I wanted.
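
The join itself is not shown in the answer. A minimal sketch of what it might look like, assuming the two queries above were saved as tables (the names `my_dataset.engagement_metrics` and `my_dataset.conversion_metrics` are hypothetical):

```sql
-- Hypothetical table names: the first query saved as engagement_metrics,
-- the second as conversion_metrics.
SELECT
  actiontimestamp,
  medium,
  source,
  c.users,
  c.newUsers,
  c.sessions,            -- session count taken from the session-level table
  e.pages_per_session,
  e.bounce_rate,
  e.avgTimeOnSite,
  c.transactions,
  c.conversion_rate
FROM
  `my_dataset.engagement_metrics` AS e
JOIN
  `my_dataset.conversion_metrics` AS c
USING
  (actiontimestamp, medium, source)
ORDER BY
  actiontimestamp DESC
```

Because this is an inner join on equality, rows where `medium` or `source` is NULL in either table would not match and would be dropped. Whether that matters depends on the data.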