Question

I am trying to get the records that exist in only one table, i.e. in A but not in B.

Case 1:

 select count(distinct t.col1),count(distinct t.col2)
    from `table1` e
    right join
    (
    select distinct col1,col2
    from `table2_*`
    where _table_suffix between '20180101' and '20181231'
    )t
    on e.col1=t.col1
    where date(timestamp_seconds(ts))>='2018-01-01'
    and e.col1 is null
    ;

Case 2:

select count(distinct col1)
from `table2_*`
where _table_suffix between '20180101' and '20181231'
and col1 not in (
select distinct col1 from `table1`
where date(timestamp_seconds(ts))>='2018-01-01'
)

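Of the two queries, case 2 works, whereas case 1 returns 0. I also tried case 1 as a left join with the tables swapped, but the result is the same 0 rows. I am new to BigQuery and standard SQL, and I am not sure why this happens.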

Answer 1

If you use NOT IN, you must never allow NULL to be one of the values "in the list":

SELECT count(DISTINCT t.col1)
FROM `table2_*` AS t
WHERE t._table_suffix BETWEEN '20180101' AND '20181231'
 AND col1 NOT IN (
  SELECT DISTINCT e.col1
  FROM `table1` AS e
  WHERE DATE (timestamp_seconds(e.ts)) >= '2018-01-01'
   AND e.col1 IS NOT NULL
  );

Personally, I prefer to use NOT EXISTS:

SELECT count(DISTINCT t.col1)
FROM `table2_*` AS t
WHERE t._table_suffix BETWEEN '20180101' AND '20181231'
 AND NOT EXISTS (
  SELECT NULL
  FROM `table1` AS e
  WHERE DATE (timestamp_seconds(e.ts)) >= '2018-01-01'
   AND e.col1 = t.col1
  );

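Note that the subquery's SELECT clause here does not need to return any values, so SELECT NULL, SELECT 1, and SELECT * are all valid. With EXISTS or NOT EXISTS, what matters is the subquery's FROM and WHERE clauses.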
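To illustrate the point about NOT EXISTS, here is a minimal sketch in BigQuery standard SQL. It reads no real tables: the CTEs a and b and their values are hypothetical inline data, with b deliberately containing a NULL.

```sql
-- Hypothetical inline data; a and b stand in for the two tables.
WITH a AS (
  SELECT 1 AS col1 UNION ALL
  SELECT 2 UNION ALL
  SELECT 3
),
b AS (
  SELECT 2 AS col1 UNION ALL
  SELECT CAST(NULL AS INT64)   -- a NULL on the "not in" side
)
SELECT a.col1
FROM a
WHERE NOT EXISTS (
  SELECT 1                     -- SELECT NULL or SELECT * behaves the same
  FROM b
  WHERE b.col1 = a.col1        -- only the correlation decides the result
)
ORDER BY a.col1;
-- returns the rows 1 and 3
```

The NULL row in b never satisfies b.col1 = a.col1, so it simply never counts as a match, and no extra IS NOT NULL filter is needed.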

Answer 2

"Of the two queries, case 2 works, whereas case 1 returns 0."

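This is because NOT IN never returns TRUE when the list contains a NULL: for any value without a match it evaluates to NULL, so the WHERE clause filters the row out. If you do not want this behavior, exclude the NULL values: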

select count(distinct col1)
from `table2_*`
where _table_suffix between '20180101' and '20181231'
and col1 not in (
select distinct col1 from `table1`
where date(timestamp_seconds(ts))>='2018-01-01'
and col1 IS NOT NULL
)
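The three-valued logic behind this can be checked with a small sketch in BigQuery standard SQL. The literal values and column aliases below are illustrative only and do not come from the tables in the question.

```sql
-- Scalar checks; no tables are read.
SELECT
  1 NOT IN (2, 3)                    AS no_null_in_list,  -- TRUE
  1 NOT IN (2, CAST(NULL AS INT64))  AS null_in_list,     -- NULL
  2 NOT IN (2, CAST(NULL AS INT64))  AS matched_value;    -- FALSE
```

A WHERE clause keeps a row only when its condition is TRUE, so the NULL in the second column is treated like FALSE and the row is dropped.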

Two queries with the same tables and join logic but different results
