Update values based on values from another record in the same table

Time: 2020-09-26 20:48:29

Tags: sql sql-update amazon-redshift

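Here is a sample table of website visitors. As you can see, visitors sometimes do not provide their email, and they may switch to a different email address over time.

**Original table:**

[image: original table]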

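I want to update this table according to the following requirements:

  1. The first time a visitor provides an email, all of his past visits are tagged with that email.
  2. All of his later visits are tagged with that email until he switches to another email.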
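
The two rules above can be sketched in plain Python to make the expected result concrete. The original screenshots are not available, so the sample rows, the `fill_emails` helper, and the column order are hypothetical and used only for illustration.

```python
from typing import List, Optional, Tuple

# (visitor_id, activity_date, email); hypothetical layout for illustration
Visit = Tuple[int, str, Optional[str]]


def fill_emails(visits: List[Visit]) -> List[Visit]:
    """Back-fill visits before the first email, then carry each email forward."""
    by_visitor = {}
    for visit in sorted(visits, key=lambda v: (v[0], v[1])):
        by_visitor.setdefault(visit[0], []).append(visit)

    result = []
    for rows in by_visitor.values():
        # Rule 1: visits before the first known email get that first email.
        current = next((email for _, _, email in rows if email is not None), None)
        for visitor_id, activity_date, email in rows:
            # Rule 2: a newly provided email replaces the one carried forward.
            if email is not None:
                current = email
            result.append((visitor_id, activity_date, current))
    return result


# Hypothetical sample data, not taken from the original question.
SAMPLE_VISITS = [
    (1, "2020-01-01", None),
    (1, "2020-01-02", "a@example.com"),
    (1, "2020-01-03", None),
    (1, "2020-01-04", "b@example.com"),
    (1, "2020-01-05", None),
    (2, "2020-01-01", None),
]

filled = fill_emails(SAMPLE_VISITS)
for row in filled:
    print(row)
```

A visitor who never provides an email (visitor 2 in the sample) keeps a null email, since there is nothing to fill from.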

**Expected table after the update:**

[image: expected table after the update]

Is it possible to do this in Redshift or T-SQL?

Thanks, everyone!

3 Answers:

Answer 0 (score: 0)

If we assume the table is named Visits and its primary key consists of the columns Visitor_id and Activity_Date, then you can do the following in T-SQL:

  • Using correlated subqueries:
update a
set a.Email = coalesce(
  -- select the email used previously
  (
    select top 1 Email from Visits
    where Email is not null and Activity_Date < a.Activity_Date and Visitor_id = a.Visitor_id
    order by Activity_Date desc
  ),
  -- if there was no email used previously then select the email used next
  (
    select top 1 Email from Visits
    where Email is not null and Activity_Date > a.Activity_Date and Visitor_id = a.Visitor_id
    order by Activity_Date
  )
)
from Visits a
where a.Email is null;
  • Using a window function to provide the ordering:
update v
set Email = vv.Email
from Visits v
  join (
    select
      v.Visitor_id,
      coalesce(a.Email, b.Email) as Email,
      v.Activity_Date,
      row_number() over (partition by v.Visitor_id, v.Activity_Date
                         order by a.Activity_Date desc, b.Activity_Date) as Row_num
    from Visits v
      -- previous visits with email
      left join Visits a
        on a.Visitor_id = v.Visitor_id
        and a.Email is not null
        and a.Activity_Date < v.Activity_Date
      -- next visits with email if there are no previous visits
      left join Visits b
        on b.Visitor_id = v.Visitor_id
        and b.Email is not null
        and b.Activity_Date > v.Activity_Date
        and a.Visitor_id is null
    where v.Email is null
  ) vv
    on vv.Visitor_id = v.Visitor_id
    and vv.Activity_Date = v.Activity_Date
where
  vv.Row_num = 1;

Answer 1 (score: 0)

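For each visitor_id, you can update null email values with the previous non-null value. If there is none, use the next non-null value. You can get these values as follows: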

select 
    v.*, v_prev.email prev_email, v_next.email next_email
from
    visits v
    left join visits v_prev on v.visitor_id = v_prev.visitor_id 
        and v_prev.activity_date = (select max(v2.activity_date) from visits v2 where v2.visitor_id = v.visitor_id and v2.activity_date < v.activity_date and v2.email is not null)
    left join visits v_next on v.visitor_id = v_next.visitor_id 
        and v_next.activity_date = (select min(v2.activity_date) from visits v2 where v2.visitor_id = v.visitor_id and v2.activity_date > v.activity_date and v2.email is not null)
where 
    v.email is null
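
The query above only selects the candidate values. One way to turn the same previous/next lookup into an update is sketched below, run through Python's `sqlite3` module so it can be tried locally. The SQL here is SQLite syntax, not T-SQL or Redshift, and the sample rows are hypothetical; it also assumes (visitor_id, activity_date) is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table visits (visitor_id integer, activity_date text, email text)"
)
# Hypothetical sample data, not taken from the original question.
conn.executemany(
    "insert into visits values (?, ?, ?)",
    [
        (1, "2020-01-01", None),
        (1, "2020-01-02", "a@example.com"),
        (1, "2020-01-03", None),
        (1, "2020-01-04", "b@example.com"),
        (1, "2020-01-05", None),
        (2, "2020-01-01", None),
    ],
)

# Prefer the closest earlier non-null email; fall back to the closest later one.
conn.execute(
    """
    update visits
    set email = coalesce(
        (select v_prev.email
         from visits v_prev
         where v_prev.visitor_id = visits.visitor_id
           and v_prev.activity_date = (
               select max(v2.activity_date) from visits v2
               where v2.visitor_id = visits.visitor_id
                 and v2.activity_date < visits.activity_date
                 and v2.email is not null)),
        (select v_next.email
         from visits v_next
         where v_next.visitor_id = visits.visitor_id
           and v_next.activity_date = (
               select min(v2.activity_date) from visits v2
               where v2.visitor_id = visits.visitor_id
                 and v2.activity_date > visits.activity_date
                 and v2.email is not null))
    )
    where email is null
    """
)

updated = conn.execute(
    "select visitor_id, activity_date, email from visits "
    "order by visitor_id, activity_date"
).fetchall()
for row in updated:
    print(row)
```

Rows for a visitor who never supplied an email stay null, because both lookups return nothing.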

Answer 2 (score: 0)

In SQL Server or Redshift, you can use a subquery to compute the email:

select t.*,
       coalesce(email,
                -- carry the most recent email forward within its group
                max(email) over (partition by visitor_id, grp),
                -- rows before the first email fall back to that first email
                max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
                ) as imputed_email
from (select t.*,
             min(case when email is not null then activity_date end) over 
                  (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as first_email_date,
             -- running count of non-null emails: increases each time an email is provided
             count(email) over (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as grp
      from t
     ) t;

You can then use it in an update:

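update t
    set email = tt.imputed_email
    from (select t.*,
                 coalesce(email,
                          max(email) over (partition by visitor_id, grp),
                          max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
                         ) as imputed_email
          from (select t.*,
                       min(case when email is not null then activity_date end) over
                           (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as first_email_date,
                       count(email) over (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as grp
                from t
               ) t
         ) tt
    where tt.visitor_id = t.visitor_id and tt.activity_date = t.activity_date and
          t.email is null;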
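
To see how the running count of non-null emails (`grp`) groups each email with the null rows that follow it, the select from this answer can be run against a small table. This is a sketch using Python's `sqlite3` module, which needs SQLite 3.25 or later for window functions. The `visits` table name and the sample rows are hypothetical, and SQLite stands in for SQL Server or Redshift here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table visits (visitor_id integer, activity_date text, email text)"
)
# Hypothetical sample data, not taken from the original question.
conn.executemany(
    "insert into visits values (?, ?, ?)",
    [
        (1, "2020-01-01", None),
        (1, "2020-01-02", "a@example.com"),
        (1, "2020-01-03", None),
        (1, "2020-01-04", "b@example.com"),
        (1, "2020-01-05", None),
        (2, "2020-01-01", None),
    ],
)

imputed = conn.execute(
    """
    select s.visitor_id, s.activity_date, s.grp,
           coalesce(s.email,
                    -- each email and the nulls after it share one grp value
                    max(s.email) over (partition by s.visitor_id, s.grp),
                    -- rows before the first email (grp = 0) use the first email
                    max(case when s.activity_date = s.first_email_date
                             then s.email end)
                        over (partition by s.visitor_id)) as imputed_email
    from (select v.*,
                 min(case when v.email is not null then v.activity_date end) over
                     (partition by v.visitor_id order by v.activity_date
                      rows between unbounded preceding and current row)
                     as first_email_date,
                 count(v.email) over
                     (partition by v.visitor_id order by v.activity_date
                      rows between unbounded preceding and current row) as grp
          from visits v) s
    order by s.visitor_id, s.activity_date
    """
).fetchall()
for row in imputed:
    print(row)
```

In the sample, the rows before the first email have `grp = 0` and no email in their group, so they fall through to the third `coalesce` argument.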
