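Get 100 rows, with at most 10 rows per group

Time: 2015-10-09 16:29:32

Tags: mysql database select

I have the following query. I want to get 100 items from the database, but the same host_id appears many times in the urls table, and I want to get at most 10 unique rows from that table for each host_id.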

select *
from urls
join hosts using(host_id)
where
(
    last_run_date is null
    or last_run_date <= date_sub(curdate(), interval 30 day)
)
and ignore_url != 1
limit 100

So, I want:

  • Maximum total results = 100
  • Maximum rows per host = 10

I'm not sure what I need to do to accomplish this. Is there a subquery that can do this?

Hosts table

CREATE TABLE `hosts` (
    `host_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `host` VARCHAR(50) NOT NULL,
    `last_fetched` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `ignore_host` TINYINT(1) UNSIGNED NOT NULL,
    PRIMARY KEY (`host_id`),
    UNIQUE INDEX `host` (`host`)
)

URLs table

CREATE TABLE `urls` (
    `url_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `parent_url_id` INT(10) UNSIGNED NOT NULL,
    `scheme` VARCHAR(5) NOT NULL,
    `host_id` INT(10) UNSIGNED NOT NULL,
    `path` VARCHAR(500) NOT NULL,
    `query` VARCHAR(500) NOT NULL,
    `date_found` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `last_run_date` DATETIME NULL DEFAULT NULL,
    `ignore_url` TINYINT(1) UNSIGNED NOT NULL,
    PRIMARY KEY (`url_id`),
    UNIQUE INDEX `host_path_query` (`host_id`, `path`, `query`)
)

1 answer:

Answer 0 (score: 1)

That's it (I hope).

I can't test it on my side because I don't have your data. Please test it and ping me.

SELECT *
  FROM (
    SELECT
      @nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
      u.*,
      @lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost  
      FROM
        urls u,
         ( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
      WHERE (
            last_run_date IS NULL
            OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
      )
      AND ignore_url != 1
      ORDER BY host_id, last_run_date
  ) AS t
  LEFT JOIN hosts USING(host_id)
  WHERE t.nr < 11
  LIMIT 100;

OK, here is how it works, step by step:

First

I only select the rows matching your query and order them by host_id and time (last_run_date).

SELECT
      u.*
      FROM
        urls u
      WHERE (
            last_run_date IS NULL
            OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
      )
      AND ignore_url != 1
      ORDER BY host_id, last_run_date
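The filter and sort order of this first step can be mirrored in plain Python, which makes the 30-day boundary explicit. This is a minimal sketch: the helper names `is_due` and `sort_key` are hypothetical, and rows are assumed to be dicts keyed by column name.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional


def is_due(last_run_date: Optional[datetime], ignore_url: int, today: date) -> bool:
    """Mirror of the step-1 WHERE clause for a single row (hypothetical helper)."""
    # AND ignore_url != 1
    if ignore_url == 1:
        return False
    # last_run_date IS NULL: the URL has never been fetched
    if last_run_date is None:
        return True
    # date_sub(curdate(), INTERVAL 30 DAY) is a DATE; compared with a DATETIME
    # it behaves like midnight of that day.
    cutoff = datetime.combine(today - timedelta(days=30), time.min)
    return last_run_date <= cutoff


def sort_key(row: dict) -> tuple:
    """ORDER BY host_id, last_run_date; MySQL puts NULLs first in ascending order."""
    last = row["last_run_date"]
    return (row["host_id"], last is not None, last if last is not None else datetime.min)
```

With `today = date(2015, 10, 9)` the cutoff is 2015-09-09 00:00:00, so a URL last run exactly at that moment is still selected, while one run a minute later is not.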

Second

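I add the variables nr and lasthost and set them in the SELECT. Now I count every row and reset the count to 1 whenever host_id changes, so I get a list of row numbers from 1 to n for each host_id.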

SELECT
      @nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
      u.*,
      @lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
      FROM
        urls u,
         ( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
      WHERE (
            last_run_date IS NULL
            OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
      )
      AND ignore_url != 1
      ORDER BY host_id, last_run_date
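The counting logic of this second step can be replayed outside MySQL to see how the two variables interact. This is a sketch in plain Python, assuming the host_id values arrive already sorted as in the ORDER BY; `number_per_host` is a hypothetical helper name. It also shows why the starting value `@nr:=4` is harmless: `@lasthost` starts at -1, which an UNSIGNED host_id can never equal, so the first row always resets the counter to 1.

```python
from typing import Iterable, List, Tuple


def number_per_host(host_ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Replay @nr / @lasthost over host_ids that are already sorted by host_id.

    Returns (nr, host_id) pairs; nr restarts at 1 whenever host_id changes.
    """
    nr = 4          # ( SELECT @nr:=4, ... ): overwritten by the first row anyway
    lasthost = -1   # @lasthost:=-1 never equals an UNSIGNED host_id
    numbered = []
    for host_id in host_ids:
        # @nr:=IF(@lasthost = host_id, @nr+1, 1)
        nr = nr + 1 if lasthost == host_id else 1
        # @lasthost:=IF(@lasthost = host_id, @lasthost, host_id) always ends up as host_id
        lasthost = host_id
        numbered.append((nr, host_id))
    return numbered


print(number_per_host([7, 7, 7, 9, 9, 12]))
# -> [(1, 7), (2, 7), (3, 7), (1, 9), (2, 9), (1, 12)]
```

Keeping only the pairs with nr < 11 gives at most 10 rows per host.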

Third

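I put this query inside a new SELECT so that I can join your second table, keep only the rows whose number is less than 11, and limit the result to 100 rows.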

SELECT *
  FROM (
    SELECT
      @nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
      u.*,
      @lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost  
      FROM
        urls u,
         ( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
      WHERE (
            last_run_date IS NULL
            OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
      )
      AND ignore_url != 1
      ORDER BY host_id, last_run_date
  ) AS t
  LEFT JOIN hosts USING(host_id)
  WHERE t.nr < 11
  LIMIT 100;
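MySQL documents the evaluation order of expressions that assign user variables as undefined, and setting user variables inside expressions is deprecated as of MySQL 8.0.13. On MySQL 8.0 or later the same per-host numbering can instead be expressed with the window function ROW_NUMBER(). The sketch below runs that pattern through Python's built-in sqlite3 module (assuming an SQLite build with window-function support, 3.25 or later) so it can be tried without a MySQL server. The tables are trimmed to the columns the query uses, the data is hypothetical, and the bound parameter `:cutoff` stands in for `date_sub(curdate(), INTERVAL 30 DAY)`; in MySQL you would keep the original expression.

```python
import sqlite3
from typing import List, Tuple

# ROW_NUMBER() numbers the due URLs within each host_id; the outer query keeps
# the first 10 per host and then at most 100 rows overall.
# :cutoff stands in for date_sub(curdate(), INTERVAL 30 DAY) (assumption).
QUERY = """
SELECT t.url_id, t.host_id, t.path, t.nr, h.host
FROM (
    SELECT u.*,
           ROW_NUMBER() OVER (PARTITION BY u.host_id
                              ORDER BY u.last_run_date) AS nr
    FROM urls u
    WHERE (u.last_run_date IS NULL OR u.last_run_date <= :cutoff)
      AND u.ignore_url != 1
) AS t
JOIN hosts h USING (host_id)
WHERE t.nr <= 10
ORDER BY t.host_id, t.nr
LIMIT 100
"""


def demo() -> List[Tuple]:
    """Run QUERY against a small in-memory copy of the two tables (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE hosts (host_id INTEGER PRIMARY KEY, host TEXT NOT NULL UNIQUE);
        CREATE TABLE urls (
            url_id INTEGER PRIMARY KEY,
            host_id INTEGER NOT NULL,
            path TEXT NOT NULL,
            last_run_date TEXT NULL,
            ignore_url INTEGER NOT NULL
        );
    """)
    insert_url = ("INSERT INTO urls (host_id, path, last_run_date, ignore_url) "
                  "VALUES (?, ?, ?, ?)")
    # 15 hosts x 12 never-run URLs = 180 candidate rows.
    for h in range(1, 16):
        con.execute("INSERT INTO hosts VALUES (?, ?)", (h, f"host{h}.example"))
        for p in range(12):
            con.execute(insert_url, (h, f"/page{p}", None, 0))
    # Two rows the WHERE clause must drop: an ignored URL and a recently run one.
    con.execute(insert_url, (1, "/ignored", None, 1))
    con.execute(insert_url, (1, "/recent", "2015-10-01 12:00:00", 0))
    rows = con.execute(QUERY, {"cutoff": "2015-09-09 00:00:00"}).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in demo()[:12]:
        print(row)
```

The `nr <= 10` filter has to live in the outer query: window functions are computed after WHERE, so they cannot be filtered at the same query level.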

That's it.