Querying a repeated field with a negated "contains" condition

Time: 2017-03-06 11:20:04

Tags: sql google-bigquery

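I am trying to query a repeated string field using a negated "contains" (regex) condition.

Here is the query, where nickname is an array of strings (a repeated field):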

 SELECT
    name
  FROM
    [mytable]
  WHERE
     (NOT  REGEXP_MATCH (nickname, '(query)'))

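The problem is that when a user has at least two values under nickname, that user is still returned when I query with NOT.

For example, with NOT REGEXP_MATCH(nickname, '(jonny)') and this data: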

name     nickname 

john    [johhny,jonny]
jon     [jonny]

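the query returns john, which it should not.

2 answers:

Answer 0: (score: 0)

This kind of logic is easier to express in standard SQL, using NOT EXISTS or an ARRAY subquery. For example,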

#standardSQL
WITH Names AS (
  SELECT 'john' AS name, ['johnny', 'jonny'] AS nicknames UNION ALL
  SELECT 'jon' AS name, ['jonny'] AS nicknames
)
SELECT
  name
FROM Names
WHERE NOT EXISTS (
  SELECT 1 FROM UNNEST(nicknames) AS nickname
  WHERE nickname LIKE '%johnny%'
);

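As another example, you may want to keep only the nicknames that do not match the substring, and drop the rows that have no nicknames left: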

#standardSQL
WITH Names AS (
  SELECT 'john' AS name, ['johnny', 'jonny'] AS nicknames UNION ALL
  SELECT 'jon' AS name, ['jonny'] AS nicknames
)
SELECT *
FROM (
  SELECT
    name,
    ARRAY(SELECT nickname FROM UNNEST(nicknames) AS nickname
          WHERE nickname NOT LIKE '%johnny%') AS nicknames
  FROM Names
)
WHERE ARRAY_LENGTH(nicknames) > 0;
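
Since the question uses a regular expression rather than LIKE, the NOT EXISTS form can probably be adapted with REGEXP_CONTAINS, the standard SQL counterpart of legacy REGEXP_MATCH. This is only a minimal sketch under the same assumptions as the examples above: the Names data and the nicknames column are illustrative, and the pattern 'johnny' is a stand-in for whatever regex you need.

```sql
#standardSQL
-- Illustrative sample data, same shape as in the examples above.
WITH Names AS (
  SELECT 'john' AS name, ['johnny', 'jonny'] AS nicknames UNION ALL
  SELECT 'jon' AS name, ['jonny'] AS nicknames
)
SELECT
  name
FROM Names
-- Keep a row only when no element of the array matches the pattern.
WHERE NOT EXISTS (
  SELECT 1 FROM UNNEST(nicknames) AS nickname
  WHERE REGEXP_CONTAINS(nickname, r'johnny')
);
```

With this sample data only jon has no nickname matching the pattern, so john is filtered out. The point of the subquery is that the negation applies to the array as a whole, not to each element separately.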

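Answer 1: (score: 0)

If you are still using BigQuery Legacy SQL, the corresponding solution is below. It counts the matching values within each record and keeps only the records with no match: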

  
#legacySQL
SELECT name FROM (
  SELECT
    name, SUM(nickname LIKE '%johnny%') WITHIN RECORD AS matches
  FROM [mytable]
)
WHERE matches = 0