Question

Tables

One table has 250K rows (IP ranges, inserted in ascending order of from_ip).
The other table has 50 rows.

Noticeably slow (2.687 s):


This by itself is fast (0.031 s):

ip2country

So, in essence, the question boils down to being able to use LIMIT in a joined table. Can this be done, and what would it look like? (MySQL 5.7.24)

Answer 1

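Here is a similar example:

I have a table with 100 IPs (32-bit integers) and a table with 1M IP ranges. (See the schema and sample data below.)

The following query is similar to yours: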

select *
from ips i join ip_ranges r
  on i.ip between r.ip_from and r.ip_to

Returning the 100 IPs with their matching ranges takes 9.6 seconds. That is roughly 100 ms per IP. If I search for only one IP

select *
from ip_ranges r
where 555555555 between ip_from and ip_to

it takes about 100 ms (as expected). Note that for IP = 1 I get the result in "zero" time, but for IP = 999,999,999 I wait 200 ms. So the average is 100 ms.

Adding LIMIT 1 alone doesn't help here. But combined with ORDER BY ip_from DESC, I get the result in "zero" time.
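Why the descending order matters can be illustrated outside the database. The following is only a Python analogy, not what MySQL literally executes: it assumes non-overlapping ranges kept sorted by ip_from (the same shape as the test data below, scaled down), and `find_range` is a hypothetical helper name. With ORDER BY ip_from DESC LIMIT 1, the server can presumably jump to the last range whose ip_from is not greater than the IP and stop there, much like a binary search for a predecessor, instead of walking through every range that starts before the IP.

```python
from bisect import bisect_right

# Same shape as the ip_ranges test data, scaled down: 1,000 ranges of 1,000 addresses.
ranges = [((n - 1) * 1000, n * 1000 - 1) for n in range(1, 1001)]
starts = [ip_from for ip_from, _ in ranges]  # sorted, like an index on ip_from


def find_range(ip):
    """Return the (ip_from, ip_to) pair containing ip, or None."""
    # Position of the last range with ip_from <= ip, i.e. the row that
    # ORDER BY ip_from DESC LIMIT 1 would reach first.
    pos = bisect_right(starts, ip) - 1
    if pos < 0:
        return None
    ip_from, ip_to = ranges[pos]
    # The upper bound still has to be checked: the IP may lie beyond ip_to.
    return (ip_from, ip_to) if ip <= ip_to else None


print(find_range(555555))
```

The lookup cost no longer depends on where the IP sits in the table, which matches the observation that the ascending scan is fast for IP = 1 and slow for IP = 999,999,999.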

Now I can try to run a LIMIT 1 subquery for each IP:

select i.ip
, (
    select ip_from
    from ip_ranges r
    where i.ip between r.ip_from and r.ip_to
    order by r.ip_from desc
    limit 1
) as ip_from
from ips i

But MySQL (5.6 in my case) does a poor job here, and the execution takes 13 seconds.

So all we can do is fetch all IPs and execute one query per IP. That will at least be faster than 10 seconds.
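A minimal sketch of that per-IP approach from the application side. It uses Python's built-in sqlite3 module with a small in-memory copy of the schema purely to keep the example self-contained; that is an assumption and not the MySQL setup measured above. With MySQL you would use a MySQL driver and its own placeholder style instead. `lookup_ranges` is a hypothetical helper name.

```python
import sqlite3

# In-memory stand-in for the MySQL tables (assumption: same column names).
conn = sqlite3.connect(":memory:")
conn.execute("create table ips (ip integer primary key)")
conn.execute(
    "create table ip_ranges ("
    " ip_from integer not null, ip_to integer not null,"
    " primary key (ip_from, ip_to))"
)
conn.executemany("insert into ips values (?)", [(1234,), (555555,), (999999,)])
conn.executemany(
    "insert into ip_ranges values (?, ?)",
    [((n - 1) * 1000, n * 1000 - 1) for n in range(1, 1001)],
)

# The cheap single-row lookup, executed once per IP.
RANGE_SQL = """
    select ip_from, ip_to
    from ip_ranges
    where ? between ip_from and ip_to
    order by ip_from desc
    limit 1
"""


def lookup_ranges(conn):
    """Map every IP in ips to its (ip_from, ip_to) range, or None if unmatched."""
    ips = [row[0] for row in conn.execute("select ip from ips order by ip")]
    return {ip: conn.execute(RANGE_SQL, (ip,)).fetchone() for ip in ips}


result = lookup_ranges(conn)
print(result)
```

The price is one round trip per IP, so this only pays off while each single lookup stays cheap.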

Another way is to generate a UNION ALL query with one subquery per IP. You can do this in your application, or directly in SQL with a dynamically prepared statement:

set @subquery = '(
    select {ip} as ip, r.*
    from ip_ranges r
    where {ip} between ip_from and ip_to
    order by ip_from desc
    limit 1
)';

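-- group_concat truncates its result at 1024 bytes by default, so raise the limit first
set session group_concat_max_len = 1000000000;

-- build one "(select ... limit 1)" block per IP and join the blocks with UNION ALL
set @sql = (select group_concat(replace(@subquery, '{ip}', ip) separator ' union all ') from ips);

prepare stmt from @sql;
execute stmt;
deallocate prepare stmt;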

This query executes in less than 1 millisecond.
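If you would rather build the statement in the application, the string assembly could look like the following Python sketch. `build_union_query` is a hypothetical helper; it only produces the SQL text, in the same shape as the prepared statement above, and leaves execution to whatever MySQL driver you use. The int() conversion is there so that only integer literals are ever interpolated into the statement.

```python
SUBQUERY = """(
    select {ip} as ip, r.*
    from ip_ranges r
    where {ip} between ip_from and ip_to
    order by ip_from desc
    limit 1
)"""


def build_union_query(ips):
    """Return one UNION ALL statement with a LIMIT 1 subquery per IP."""
    if not ips:
        raise ValueError("need at least one IP")
    # int() makes sure nothing but an integer literal ends up in the SQL text.
    parts = [SUBQUERY.format(ip=int(ip)) for ip in ips]
    return "\nunion all\n".join(parts)


sql = build_union_query([1234, 555555555])
print(sql)
```

For 100 IPs this yields a single statement with 100 parenthesized subqueries, so the whole lookup is one round trip to the server.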

Test schema and data

create table ips(
    ip int unsigned primary key
);

insert into ips(ip)
    select floor(rand(1) * pow(10, 9))
    from seq1m s
    limit 100
;


create table ip_ranges(
    ip_from int unsigned not null,
    ip_to   int unsigned not null,
    primary key (ip_from, ip_to)
);

insert into ip_ranges
    select (s.seq - 1) * 1000 as ip_from
         , s.seq * 1000 - 1   as ip_to
    from seq1m s
    limit 1000000
;

seq1m is a table with 1M sequence numbers. You can create it with:

create table seq1m (seq int auto_increment primary key);
insert into seq1m (seq)
    select null
    from information_schema.COLUMNS a
       , information_schema.COLUMNS b
    limit 1000000;

LIMIT 1 in a joined table?
