REGEXP_REPLACE and NOT

Date: 2019-04-18 08:28:35

Tags: google-bigquery

I wonder whether someone may be able to help me.

I'm trying to put together a REGEXP_REPLACE query that replaces data, but only if the string meets certain criteria.

This is the query I've come up with:

SELECT
  #select all fields excluding those under the hits record
  * EXCEPT (hits),
  #start array - this rebuilds the hit record
  ARRAY(
  SELECT
    #unnest the hit field, select each field excluding those under the page record
    AS STRUCT * EXCEPT (page),
    (
    SELECT
      #select all page fields excluding pagePath
      AS STRUCT page.* EXCEPT (pagePath),
      #remove the query parameter from the pagePath fields
      REGEXP_REPLACE(page.pagePath, r'\/invitations\/([a-zA-Z0-9]{8})\/', '/invitations/([a-zA-Z0-9]{8})/redacted') AS pagePath) AS page
  WHERE
    AND NOT page.pagePath= (r'\/invitations\/[a-zA-Z0-9]{8}\/(ltd|limited|co|business')
  FROM
    UNNEST(hits) ) AS hits
FROM
  `Test.Test.ga_sessions_20190401`

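The query doesn't work, and the part I'm struggling with is the WHERE NOT. Besides the version above, I also tried AND NOT REGEXP_MATCH, but I couldn't get that to work either.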

I just wondered whether someone could take a look at this and offer some guidance on how to solve it?

Many thanks and kind regards

Chris
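Note on the question: in BigQuery standard SQL the regex test is REGEXP_CONTAINS, and REGEXP_MATCH exists only in legacy SQL. The third argument of REGEXP_REPLACE is a plain replacement string in which only backslash sequences such as \1 are special. This means '/invitations/([a-zA-Z0-9]{8})/redacted' would be inserted literally, brackets and all. The sketch below does the conditional replacement inside the expression rather than in WHERE. A path that matches the ltd/limited/co/business pattern is returned unchanged, and in any other path the 8-character invitation code is masked. The function name RedactPagePath and the mask text '/invitations/redacted/' are hypothetical, because the question does not say what the redacted path should look like.

```sql
# Hypothetical helper (name and mask text are assumptions, not from the question).
CREATE TEMP FUNCTION RedactPagePath(path STRING) AS (
  IF(
    # Paths matching this pattern are left as they are.
    REGEXP_CONTAINS(path, r'/invitations/[a-zA-Z0-9]{8}/(ltd|limited|co|business)'),
    path,
    # All other paths: replace the invitation code with an assumed mask.
    REGEXP_REPLACE(path, r'/invitations/[a-zA-Z0-9]{8}/', '/invitations/redacted/')
  )
);

# Try it on two sample paths.
SELECT
  path,
  RedactPagePath(path) AS redacted_path
FROM UNNEST([
  '/invitations/AbC12345/ltd/details',
  '/invitations/AbC12345/personal'
]) AS path;
```

To use it on the GA table, RedactPagePath(page.pagePath) AS pagePath could stand in for the REGEXP_REPLACE(...) expression in the query above, with the WHERE clause removed so that no hits are dropped. REGEXP_CONTAINS looks for a partial match, so "co" also matches paths such as /invitations/AbC12345/contact. Appending (/|$) after the group would restrict the match to whole path segments, if that is the intent.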

2 Answers

Answer 0 (score: 1)

  

> I just wondered whether someone could take a look at this and offer some guidance on how to solve it?

There are two problems in your code:

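  1. FROM placed after WHERE (FROM has to come before WHERE)
  2. AND directly after WHERE (there is no earlier condition for it to join)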

Here is the corrected SQL:

SELECT
  #select all fields excluding those under the hits record
  * EXCEPT (hits),
  #start array - this rebuilds the hit record
  ARRAY(
  SELECT
    #unnest the hit field, select each field excluding those under the page record
    AS STRUCT * EXCEPT (page),
    (
    SELECT
      #select all page fields excluding pagePath
      AS STRUCT page.* EXCEPT (pagePath),
      #remove the query parameter from the pagePath fields
      REGEXP_REPLACE(page.pagePath, r'\/invitations\/([a-zA-Z0-9]{8})\/', '/invitations/([a-zA-Z0-9]{8})/redacted') AS pagePath) AS page
  FROM
    UNNEST(hits) AS hits
  WHERE 
    NOT page.pagePath= (r'\/invitations\/[a-zA-Z0-9]{8}\/(ltd|limited|co|business')
    ) AS hits
FROM
  `Test.Test.ga_sessions_20190401`
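The WHERE clause above is carried over from the question and uses =. That tests whether pagePath is character-for-character equal to the literal text r'\/invitations\/...' and does not evaluate it as a regular expression. The literal is also missing the closing ) of the (ltd|limited|co|business) group. Here is a minimal sketch of the intended filter written with REGEXP_CONTAINS. It runs against a hypothetical inline array of paths rather than the GA table.

```sql
# Hypothetical sample data standing in for the pagePath values of the hits.
SELECT ARRAY(
  SELECT p
  FROM UNNEST([
    '/invitations/AbC12345/ltd/details',
    '/invitations/AbC12345/personal'
  ]) AS p
  # Regex test instead of "=": keep only paths that do NOT match the pattern.
  WHERE NOT REGEXP_CONTAINS(p, r'/invitations/[a-zA-Z0-9]{8}/(ltd|limited|co|business)')
) AS kept_paths;
```

A WHERE inside ARRAY(...) removes the matching hits from the rebuilt array entirely. It does not just skip the replacement for them. If those hits should be kept unredacted, the condition belongs in the replacement expression (for example an IF around REGEXP_REPLACE) rather than in WHERE.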

Answer 1 (score: 1)

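The following is for BigQuery Standard SQL. The advantage of the solution below is that it does not change the structure of the underlying table. The output keeps the same columns in the same order, and only the values that need replacing are replaced.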

#standardSQL
SELECT * REPLACE(
  ARRAY(
    SELECT AS STRUCT * REPLACE(
      (SELECT
        AS STRUCT page.* REPLACE(
        REGEXP_REPLACE(page.pagePath, r'\/invitations\/([a-zA-Z0-9]{8})\/', '/invitations/([a-zA-Z0-9]{8})/redacted') AS pagePath)
      ) AS page)
    FROM UNNEST(hits) AS hits
    WHERE NOT page.pagePath= (r'\/invitations\/[a-zA-Z0-9]{8}\/(ltd|limited|co|business')
  ) AS hits)
FROM `Test.Test.ga_sessions_20190401`   

Note the use of SELECT * REPLACE instead of SELECT * EXCEPT.
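The next example shows why SELECT * REPLACE keeps the structure. REPLACE substitutes a column's value in place, so the column keeps its position. By contrast, SELECT * EXCEPT (col) followed by a re-added col moves that column to the end. This is a minimal sketch on a hypothetical one-row table, and TO_JSON_STRING is used only to make the column order visible.

```sql
# Hypothetical one-row table with columns id, pagePath, x.
# TO_JSON_STRING(r) serialises each result row so the column order can be seen.
SELECT
  # REPLACE: pagePath keeps its original (second) position.
  (SELECT TO_JSON_STRING(r)
   FROM (SELECT * REPLACE (UPPER(pagePath) AS pagePath)
         FROM (SELECT 1 AS id, 'a' AS pagePath, 2 AS x)) AS r) AS with_replace,
  # EXCEPT plus a re-added column: pagePath moves to the end.
  (SELECT TO_JSON_STRING(r)
   FROM (SELECT * EXCEPT (pagePath), UPPER(pagePath) AS pagePath
         FROM (SELECT 1 AS id, 'a' AS pagePath, 2 AS x)) AS r) AS with_except;
```

With nested records such as hits and page this matters more, because the rebuilt page and hits fields stay where they were in the ga_sessions schema.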