Counting consecutive visits

Posted: 2011-11-07 19:54:09

Tags: php mysql sql

Every time a logged-in user visits the website, a row with their userId and the date is put into a table (zero or one row per user per day):

   444631 2011-11-07
   444631 2011-11-06
   444631 2011-11-05
   444631 2011-11-04
   444631 2011-11-02
   444631 2011-11-01

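When I fetch a user's data from the main user table, I need the number of consecutive visits to be available. For this user it would be 4 (2011-11-04 through 2011-11-07).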
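To pin down what "consecutive visits" means here, this is a small Python sketch (not SQL, and not the asker's code) that counts the run of consecutive calendar days ending at the user's most recent visit. Counting back from the latest visit rather than from "today" is an assumption, and `current_streak` is a hypothetical helper name.

```python
from datetime import date, timedelta


def current_streak(visit_dates):
    """Length of the run of consecutive days ending at the most recent visit.

    Assumes at most one entry per day, as in the one-row-per-user-per-day
    table; duplicates would be collapsed by the set anyway.
    """
    days = set(visit_dates)
    if not days:
        return 0
    day = max(days)
    streak = 0
    # Walk backwards one calendar day at a time until a day is missing.
    while day in days:
        streak += 1
        day -= timedelta(days=1)
    return streak


visits = [date(2011, 11, d) for d in (7, 6, 5, 4, 2, 1)]
print(current_streak(visits))  # 4
```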

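Currently I do this with a denormalized `consecutivevisits` counter in the main user table, but for unknown reasons it sometimes resets. I'd like to try an approach that uses only the data in the table above.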

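What is the best SQL query to get that number (4 in the example above)? Some users have hundreds of visits, and we have millions of registered users and hits per day.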

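Edit: Following the comments below, here is the code I currently use for this. However, it sometimes resets for no apparent reason, and it also resets everyone over the weekend, most likely because of the DST change.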

// Called every page load for logged in users
public static function OnVisit($user)
{
    $lastVisit = $user->GetLastVisit(); /* Timestamp; db server is on the same timezone as www server */
    if(!$lastVisit)
        $delta = 2;
    else
    {
        $today = date('Y/m/d');

        if(date('Y/m/d', $lastVisit) == $today)
            $delta = 0;
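        // NOTE: adding a fixed 86400 s is not DST-safe. If $lastVisit is local
        // midnight (lastvisit is set from CURDATE()), on a 25-hour fall-back day
        // midnight + 86400 s is still the same date, so $delta ends up as 2.
        else if(date('Y/m/d', $lastVisit + (24 * 60 * 60)) == $today)
            $delta = 1;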
        else
            $delta = 2;
    }

    if(!$delta)
        return;

    $visits = $user->GetConsecutiveVisits();
    $userId = $user->GetId();

    /* NOTE: t_dailyvisit is the table I pasted above. The table is unused;
     * I added it only to ensure that the counter sometimes really resets
     * even if the user visits the website, and I could confirm that. */
    q_Query("INSERT IGNORE INTO `t_dailyvisit` (`user`, `date`) VALUES ($userId, CURDATE())", DB_DATABASE_COMMON);

    /* User skipped 1 or more days.. */
    if($delta > 1)
        $visits = 1;
    else if($delta == 1)
        $visits += 1;

    q_Query("UPDATE `t_user` SET `consecutivevisits` = $visits, `lastvisit` = CURDATE(), `nvotesday` = 0 WHERE `id` = $userId", DB_DATABASE_COMMON);
    $user->ForceCacheExpire();
}
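The weekend resets are consistent with the `$lastVisit + (24 * 60 * 60)` check: a local day containing a DST change is 23 or 25 hours long, so adding a fixed number of seconds can land on the wrong calendar date. The usual remedy is to do the "was it yesterday?" test with calendar-date arithmetic instead, for example by comparing `lastvisit` with `CURDATE() - INTERVAL 1 DAY` in MySQL or with `date('Y/m/d', strtotime('-1 day'))` in PHP. The Python sketch below shows the same idea; `visit_delta` is a hypothetical name that mirrors the question's `$delta` values (0, 1, 2).

```python
from datetime import date, timedelta
from typing import Optional


def visit_delta(last_visit: Optional[date], today: date) -> int:
    """0 = already visited today, 1 = last visit was yesterday,
    2 = first visit or at least one day skipped (mirrors $delta)."""
    if last_visit is None:
        return 2
    if last_visit == today:
        return 0
    # Subtracting one calendar day is unaffected by 23- or 25-hour DST days,
    # unlike adding a fixed 24 * 60 * 60 seconds to a timestamp.
    if last_visit == today - timedelta(days=1):
        return 1
    return 2


# 2011-11-06 was a DST change day in the US; date arithmetic doesn't care.
print(visit_delta(date(2011, 11, 6), date(2011, 11, 7)))  # 1
```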

2 Answers

Answer 0 (score: 3)

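I missed the mysql tag and wrote this solution. Unfortunately, it doesn't work in MySQL, because MySQL doesn't support window functions (they were only added in MySQL 8.0, years after this answer was written).

I'm posting it anyway because I put some effort into it. Tested with PostgreSQL. It works similarly in Oracle or SQL Server (or any other decent RDBMS that supports window functions).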

Test setup

CREATE TEMP TABLE v(id int, visit date);
INSERT INTO v VALUES
 (444631, '2011-11-07')
,(444631, '2011-11-06')
,(444631, '2011-11-05')
,(444631, '2011-11-04')
,(444631, '2011-11-02')
,(444631, '2011-11-01')
,(444632, '2011-12-02')
,(444632, '2011-12-03')
,(444632, '2011-12-05');

Simple version

-- add 1 to "difference" to get number of days of the longest period
SELECT id, max(dur) + 1 as max_consecutive_days
FROM (

   -- calculate date difference of min and max in the group
   SELECT id, grp, max(visit) - min(visit) as dur
   FROM (

      -- consecutive days end up in a group
      SELECT *, sum(step) OVER (ORDER BY id, rn) AS grp
      FROM   (

         -- step up at the start of a new group of days
         SELECT id
               ,row_number() OVER w AS rn
               ,visit
               ,CASE WHEN COALESCE(visit - lag(visit) OVER w, 1) = 1
                THEN 0 ELSE 1 END AS step
         FROM   v
         WINDOW w AS (PARTITION BY id ORDER BY visit)
         ORDER  BY 1,2
         ) x
      ) y
      GROUP BY 1,2
   ) z
GROUP  BY 1
ORDER  BY 1
LIMIT  1;

Output:

   id   | max_consecutive_days
--------+----------------------
 444631 |                    4
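To trace the simple version step by step, here is a rough Python translation of its logic (an illustration of the same "step, then running sum" idea, not a replacement for the query). The `rows` list reuses the test data and `max_consecutive_days` is a hypothetical name. Two differences from the SQL: it returns every id instead of applying `LIMIT 1`, and it restarts the running sum for each id, which the query doesn't need to do because it groups by `(id, grp)` afterwards.

```python
from datetime import date
from itertools import groupby

# Same rows as the test setup: (id, visit)
rows = [
    (444631, date(2011, 11, 7)), (444631, date(2011, 11, 6)),
    (444631, date(2011, 11, 5)), (444631, date(2011, 11, 4)),
    (444631, date(2011, 11, 2)), (444631, date(2011, 11, 1)),
    (444632, date(2011, 12, 2)), (444632, date(2011, 12, 3)),
    (444632, date(2011, 12, 5)),
]


def max_consecutive_days(rows):
    result = {}
    # Equivalent of PARTITION BY id ORDER BY visit.
    for user_id, group in groupby(sorted(rows), key=lambda r: r[0]):
        grp, prev = 0, None
        spans = {}  # grp -> (min visit, max visit)
        for _, visit in group:
            # step is 1 when the gap to the previous visit (lag) is not one day;
            # the first row counts as "no gap", like COALESCE(..., 1).
            step = 0 if prev is None or (visit - prev).days == 1 else 1
            grp += step  # running sum(step) gives the group number
            lo, hi = spans.get(grp, (visit, visit))
            spans[grp] = (min(lo, visit), max(hi, visit))
            prev = visit
        # max(visit) - min(visit) + 1 per group, then the largest per id.
        result[user_id] = max((hi - lo).days + 1 for lo, hi in spans.values())
    return result


print(max_consecutive_days(rows))  # {444631: 4, 444632: 2}
```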

Faster / shorter

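I later found a better way. The `grp` numbers are not consecutive (but they only ever increase). That doesn't matter, since they are just a means to an end: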

SELECT id, max(dur) + 1 AS max_consecutive_days
FROM (
    SELECT id, grp, max(visit) - min(visit) AS dur
    FROM (
      -- subtract an integer representing the number of day from the row_number()
      -- creates a "group number" (grp) for consecutive days
      SELECT id
            ,EXTRACT(epoch from visit)::int / 86400
           - row_number() OVER (PARTITION BY id ORDER BY visit) AS grp
            ,visit
      FROM   v
      ORDER  BY 1,2
      ) x
    GROUP BY 1,2
    ) y
GROUP  BY 1
ORDER  BY 1
LIMIT  1;
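The trick in this version can be checked outside the database. In this Python sketch, `date.toordinal()` stands in for `EXTRACT(epoch from visit)::int / 86400` (both yield one integer per calendar day) and `enumerate(..., start=1)` stands in for `row_number()`; the helper names are hypothetical. It assumes one row per user per day, as the question guarantees, so each group's size equals `max(visit) - min(visit) + 1`.

```python
from collections import Counter
from datetime import date


def island_keys(visits):
    """Pair each visit with a group key: day number minus row number.

    Within a run of consecutive days, both the day number and the row
    number go up by 1 per row, so their difference stays constant.
    """
    keyed = []
    for rn, visit in enumerate(sorted(visits), start=1):  # row_number() starts at 1
        day_number = visit.toordinal()  # one integer per calendar day
        keyed.append((visit, day_number - rn))
    return keyed


def longest_streak(visits):
    # Rows sharing a key form one run of consecutive days.
    counts = Counter(grp for _, grp in island_keys(visits))
    return max(counts.values(), default=0)


visits = [date(2011, 11, d) for d in (7, 6, 5, 4, 2, 1)]
print(longest_streak(visits))  # 4
```

For the sample visits the keys take two distinct values, one shared by November 1-2 and one by November 4-7; a longer gap would make the key jump by more than 1, which is why the numbers are not consecutive.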

SQL Fiddle for both.


Answer 1 (score: 1)

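If you don't need a log of every day the user logged into the site, and you only want to know how many consecutive days they logged in, I'd do it like this:

Use 3 columns: LastVisit (Date), ConsecutiveDays (int) and User.

On login, check the user's entry. If the last visit was "today - 1", add 1 to the ConsecutiveDays column and store "today" in the LastVisit column. If the last visit is older than "today - 1", store 1 in ConsecutiveDays (and "today" in LastVisit). If the last visit is already today, leave the row unchanged.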
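A minimal Python sketch of that update rule, assuming the `(LastVisit, ConsecutiveDays)` pair stands in for the user's row; `on_login` is a hypothetical name. As with any version of this counter, the "yesterday" test is done on calendar dates rather than by adding seconds.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def on_login(last_visit: Optional[date], consecutive_days: int,
             today: date) -> Tuple[date, int]:
    """Return the new (LastVisit, ConsecutiveDays) pair for one login."""
    if last_visit == today:
        # Already counted today: nothing changes.
        return last_visit, consecutive_days
    if last_visit == today - timedelta(days=1):
        # Visited yesterday: extend the streak.
        return today, consecutive_days + 1
    # First visit ever, or at least one day skipped: start over at 1.
    return today, 1


# Replay the visits from the question in date order.
state = (None, 0)
for d in (1, 2, 4, 5, 6, 7):
    state = on_login(*state, date(2011, 11, d))
print(state)  # (datetime.date(2011, 11, 7), 4)
```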

HTH