Select N random records per group

Date: 2011-02-06 14:14:12

Tags: mysql random

Hello everyone, and happy Sunday. I need to select N random records from each group.

Starting from Quassnoi's query for selecting X random records

http://explainextended.com/2009/03/01/selecting-random-rows/

I wrote this stored procedure:

delimiter //
drop procedure if exists casualiPerGruppo //
create procedure casualiPerGruppo(in tabella varchar(50),in campo varchar(50),in numPerGruppo int)
comment 'Selezione di N record casuali per gruppo'
begin
-- only valore and finite are real locals: the statement texts and the column
-- list live in user variables (@query1..@query4, @elenco_campi), because
-- PREPARE takes its statement from a string literal or a user variable
declare valore int;
declare finite int default 0;
declare cur_gruppi cursor for select gruppo from tmp_view;
declare continue handler for not found set finite = 1;

-- temporary, so a permanent table with the same name can never be dropped by mistake
drop temporary table if exists tmp_casuali;
set @query1 = concat('create temporary table tmp_casuali like ', tabella);
prepare stmt from @query1;
execute stmt;
deallocate prepare stmt;

set @query2 = concat('create or replace view tmp_view as select ',campo,' as gruppo from ',tabella,' group by ',campo);
prepare stmt from @query2;
execute stmt;
deallocate prepare stmt;

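-- the column list is the same for every group, so read it once, before the loop;
-- order by ordinal_position keeps it in the column order of tmp_casuali
set @query3 = concat("select group_concat(column_name order by ordinal_position) into @elenco_campi
              from information_schema.columns
                      where table_name = '",tabella,"' and table_schema = database()");
prepare stmt from @query3;
execute stmt;
deallocate prepare stmt;

open cur_gruppi;
mio_loop:loop
fetch cur_gruppi into valore;
    if finite = 1 then
        leave mio_loop;
    end if;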

-- test the group first: on a full table scan the @cnt decrement would
-- otherwise also run for rows that belong to other groups
set @query4 = concat('insert into tmp_casuali select ',
             @elenco_campi,' from (
                     select  @cnt := count(*) + 1,
                     @lim :=', numPerGruppo,
                         ' from ',tabella,
                     ' where ',campo,' = ', valore,
                     ' ) vars
                     straight_join
                    (
                    select  r.*,
                    @lim := @lim - 1
                    from ', tabella, ' r
                    where   ', campo, ' = ', valore, '
                    and (@cnt := @cnt - 1)
                    and rand() < @lim / @cnt
                    ) i');

prepare stmt from @query4;
execute stmt;
deallocate prepare stmt;

end loop;
close cur_gruppi;
select * from tmp_casuali;
end //
delimiter ;

To give you an idea, I call it like this:

create table prova (
id int not null auto_increment primary key,
id_gruppo int,
altro varchar(10)
) engine = myisam;


insert into prova (id_gruppo,altro) values 
(1,'aaa'),(2,'bbb'),(3,'ccc'),(1,'ddd'),(1,'eee'),(2,'fff'),
(2,'ggg'),(2,'hhh'),(3,'iii'),(3,'jjj'),(3,'kkk'),(1,'lll'),(4,'mmm');

call casualiPerGruppo('prova','id_gruppo',2);
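The heart of Quassnoi's query is the condition rand() < @lim / @cnt. Each row is kept with probability (rows still needed) / (rows not yet examined), and one pass returns exactly min(N, group size) rows. This is the selection-sampling technique (Knuth's Algorithm S). Below is a minimal Python sketch of that logic for a single group, outside MySQL. The function name select_n_sequential is hypothetical.

```python
import random


def select_n_sequential(rows, n, rng=random):
    """Return min(n, len(rows)) rows chosen uniformly at random, keeping their order.

    Hypothetical helper illustrating the `rand() < @lim / @cnt` test:
    `cnt` counts rows not yet examined, `lim` counts rows still needed.
    """
    cnt = len(rows)                 # plays the role of @cnt
    lim = max(0, min(n, cnt))       # plays the role of @lim
    chosen = []
    for row in rows:
        if lim == 0:
            break                   # enough rows picked
        # keep the row with probability lim / cnt; when lim == cnt this is
        # always true, so exactly min(n, len(rows)) rows are returned
        if rng.random() < lim / cnt:
            chosen.append(row)
            lim -= 1                # like @lim := @lim - 1
        cnt -= 1                    # like @cnt := @cnt - 1 (the SQL decrements first,
                                    # starting from count(*) + 1, which is equivalent)
    return chosen


# Group 1 of the sample table in the question: rows aaa, ddd, eee, lll
print(select_n_sequential(["aaa", "ddd", "eee", "lll"], 2))
```

The procedure runs this scan once per group, so the table is read again for every group.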

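My problem is that Quassnoi's query, although it performs very well, still takes about 1 second on a large recordset. So when my stored procedure runs it many times (once per group), the total time grows a lot.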
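The per-group row counts can be computed in a single GROUP BY pass. With those counts available, the same probability test can be applied to every group during one scan, instead of one scan per group. Below is a Python sketch of that idea under stated assumptions: the table is a hypothetical list of dicts, and n_random_per_group is an illustrative name. Porting it to MySQL would need a pair of counters per group, which this sketch does not attempt.

```python
import random
from collections import Counter


def n_random_per_group(rows, key, n, rng=random):
    """Pick min(n, group size) random rows from every group.

    Hypothetical helper: `rows` is a list of dicts standing in for the table
    and `key` names the grouping column (e.g. "id_gruppo").
    """
    # Pass 1: row count per group, like SELECT key, COUNT(*) ... GROUP BY key
    remaining = Counter(row[key] for row in rows)
    needed = {g: max(0, min(n, c)) for g, c in remaining.items()}
    picked = []
    # Pass 2: a single scan; each row is kept with probability
    # needed / remaining for its own group (selection sampling per group)
    for row in rows:
        g = row[key]
        if needed[g] > 0 and rng.random() < needed[g] / remaining[g]:
            picked.append(row)
            needed[g] -= 1
        remaining[g] -= 1
    return picked


# The sample data from the question: (id_gruppo, altro) pairs
sample = [{"id_gruppo": g, "altro": a} for g, a in [
    (1, "aaa"), (2, "bbb"), (3, "ccc"), (1, "ddd"), (1, "eee"), (2, "fff"),
    (2, "ggg"), (2, "hhh"), (3, "iii"), (3, "jjj"), (3, "kkk"), (1, "lll"),
    (4, "mmm")]]
result = n_random_per_group(sample, "id_gruppo", 2)
print(sorted(Counter(r["id_gruppo"] for r in result).items()))
# prints [(1, 2), (2, 2), (3, 2), (4, 1)]
```

Which rows come back is random, but the per-group counts are fixed. Group 4 has a single row, so it contributes one.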

Can you suggest a better way to solve my problem? Thanks in advance.

EDIT: a larger test table, filled with one million random rows (id_gruppo between 1 and 100):

create table `prova` (
  `id` int(11) not null auto_increment,
  `id_gruppo` int(11) default null,
  `prog` int(11) default null,
  primary key (`id`)
) engine=myisam charset=latin1;

delimiter //
drop procedure if exists inserisci //
create procedure inserisci(in quanti int)
begin
declare i int default 0;
while i < quanti do
insert into prova (id_gruppo,prog) values (
                        (floor(1 + (rand() * 100))),
                        (floor(1 + (rand() * 30)))
                       );
set i = i + 1;
end while;
end //

delimiter ;

call inserisci(1000000);

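@Clodoaldo: my stored procedure

call casualipergruppo('prova','id_gruppo',2);

returns 200 records and takes about 23 seconds. Your stored procedure keeps failing with Error Code: 1473. Too high level of nesting for select, even after I increase the varchar sizes to 20000. I don't know whether there is a limit on the number of unions a query can contain.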

2 answers:

Answer 0 (score: 2)

I removed the tabella and campo parameters from the procedure to make it easier to follow. I'm sure you can add them back.

delimiter //
drop procedure if exists casualiPerGruppo //
create procedure casualiPerGruppo(in numPerGruppo int)
begin
declare valore int;
declare finite int default 0;
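-- the statement text is built in the user variables @query_part and
-- @query_union, not in local variables, so no varchar declarations are needed
declare cur_gruppi cursor for select distinct id_gruppo from prova;
declare continue handler for not found set finite = 1;

-- clear a table left over by an earlier call that failed halfway
drop temporary table if exists resultset;
create temporary table resultset (id int, id_gruppo int, altro varchar(10));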

set @query_part = 'select id, id_gruppo, altro from (select id, id_gruppo, altro from prova where id_gruppo = @id_gruppo order by rand() limit @numPerGruppo) ss@id_gruppo';
set @query_part = replace(@query_part, '@numPerGruppo', numPerGruppo);
set @query_union = '';

open cur_gruppi;
mio_loop:loop
fetch cur_gruppi into valore;
    if finite = 1 then
        leave mio_loop;
    end if;

set @query_union = concat(@query_union, concat(' union ', @query_part));
set @query_union = replace(@query_union, '@id_gruppo', valore);

end loop;
close cur_gruppi;

-- strip the leading ' union ' (7 characters) so the statement starts with select
set @query_union = substr(@query_union, 8);
set @query_union = concat('insert into resultset ', @query_union);

prepare stmt from @query_union;
execute stmt;
deallocate prepare stmt;
select * from resultset order by id_gruppo, altro;
drop temporary table resultset;

end //
delimiter ;
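To see what statement this procedure hands to PREPARE, here is a Python sketch that assembles the same text from a list of group ids. build_union_query is a hypothetical name. Joining the parts with ' union ' does the job of the concat loop plus the substr(@query_union, 8) step.

```python
def build_union_query(group_ids, n):
    """Assemble the INSERT ... SELECT ... UNION text built by the procedure above.

    Hypothetical helper; int() makes sure only integers reach the SQL text.
    """
    if not group_ids:
        raise ValueError("need at least one group id")
    part = ("select id, id_gruppo, altro from "
            "(select id, id_gruppo, altro from prova where id_gruppo = {g} "
            "order by rand() limit {n}) ss{g}")
    # ' union '.join(...) replaces the concat loop plus substr(@query_union, 8)
    body = " union ".join(part.format(g=int(g), n=int(n)) for g in group_ids)
    return "insert into resultset " + body


print(build_union_query([1, 2], 2))
```

With the test data from the question's EDIT (id_gruppo between 1 and 100), the statement combines up to 100 derived tables with UNION.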

Answer 1 (score: 1)

Wow. That is a very complicated way to do something simple. Try this:

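Assuming your ids are contiguous (otherwise the randomly chosen id may not exist and you get no row):

create view random_prova as
select p.*
from prova p
join (select min(id) + floor(rand() * (max(id) - min(id) + 1)) as rnd_id
      from prova) r on p.id = r.rnd_id;

This gives you 1 random row. (A view with a subquery in its FROM clause requires MySQL 5.7.7 or later; on older versions, run the SELECT directly.)

To get several rows, loop in a stored procedure or in your application code until you have enough rows, or programmatically build a query that uses UNION. For example, this gives you up to 3 random rows:

select * from random_prova
union
select * from random_prova
union
select * from random_prova;

Note that the random id is computed only once, in a one-row derived table. If RAND() appears directly in the WHERE clause (where id = min + floor(rand() * ...)), MySQL evaluates it again for every row, so the query can return zero rows or several. Also avoid a constant seed such as RAND(0): the seed is set when the statement is prepared, so RAND(0) produces the same sequence on every execution and the "random" row never changes.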

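Using UNION has a drawback: the same row can be drawn twice by chance. UNION then collapses the duplicates, so you get fewer rows than you asked for. It is safer to call the single-row query programmatically until you have enough rows.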
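The loop of repeated single-row lookups can be sketched in Python. lookup is a hypothetical callable standing in for a query such as select * from prova where id = %s. It returns None when the id does not exist, so the sketch also tolerates gaps by simply drawing again. The function name and the max_tries cap are illustrative assumptions.

```python
import random


def pick_random_rows(lookup, lo, hi, k, rng=random, max_tries=10_000):
    """Collect up to k distinct rows by trying random ids in [lo, hi].

    Hypothetical helper: `lookup(id)` stands in for a single-row query such as
    "select * from prova where id = %s" and returns None for a missing id.
    """
    found = {}                      # id -> row, so no row is counted twice
    for _ in range(max_tries):      # cap the attempts in case k rows do not exist
        if len(found) >= k:
            break
        candidate = rng.randint(lo, hi)   # randint includes both lo and hi
        if candidate in found:
            continue                # drew the same id again: just draw another one
        row = lookup(candidate)
        if row is not None:         # None means a gap in the ids
            found[candidate] = row
    return list(found.values())


# A toy "table" with gaps: ids 3, 5 and 6 do not exist
table = {1: "aaa", 2: "bbb", 4: "ddd", 7: "ggg"}
print(pick_random_rows(table.get, 1, 7, 3))
```

Discarding repeated or missing ids keeps every existing row equally likely. With many gaps, each miss still costs a round trip, though.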

For better performance, pick the random ids in something like Java and then run a simple query such as

select * from prova where id in (...)

and let Java (or Perl, or whatever) fill the list with random ids. That also avoids the cost of fetching the id range every time.
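Below is a minimal Python sketch of the application-side variant. It assumes contiguous ids, and that lo and hi come from a single select min(id), max(id) from prova whose result the application caches. random_id_query is a hypothetical helper. It returns %s placeholders, the style accepted by common Python MySQL drivers such as PyMySQL and mysqlclient, together with the list of ids to bind.

```python
import random


def random_id_query(lo, hi, k, rng=random):
    """Pick k distinct ids in [lo, hi] and build a parameterized IN query.

    Hypothetical helper; assumes contiguous ids, so every id in range exists.
    """
    k = min(k, hi - lo + 1)         # cannot pick more ids than the range holds
    if k <= 0:
        raise ValueError("need a non-empty id range and k >= 1")
    ids = rng.sample(range(lo, hi + 1), k)      # distinct ids, hi included
    placeholders = ", ".join(["%s"] * k)
    return "select * from prova where id in (" + placeholders + ")", ids


sql, ids = random_id_query(1, 1000000, 3)
print(sql)  # prints select * from prova where id in (%s, %s, %s)
```

With a DB-API cursor, the pair would be run as cursor.execute(sql, ids). Because random.sample draws without replacement, the same id is never requested twice.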

If your ids are not contiguous, let me know: there is an efficient way, but it takes a long explanation.