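I have a PostgreSQL table with a column that holds arrays of strings. Some rows contain only unique strings in the array, others contain duplicate strings. I want to remove the duplicate strings from each row where they exist.
I have tried a few queries, but could not get it to work.
Here is the table: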
veh_id | vehicle_types
--------+----------------------------------------
1 | {"byd_tang","volt","viper","laferrari"}
2 | {"volt","viper"}
3 | {"byd_tang","sonata","jaguarxf"}
4 | {"swift","teslax","mirai"}
5 | {"volt","viper"}
6 | {"viper","ferrariff","bmwi8","viper"}
7 | {"ferrariff","viper","viper","volt"}
I expect the following output:
veh_id | vehicle_types
--------+----------------------------------------
1 | {"byd_tang","volt","viper","laferrari"}
2 | {"volt","viper"}
3 | {"byd_tang","sonata","jaguarxf"}
4 | {"swift","teslax","mirai"}
5 | {"volt","viper"}
6 | {"viper","ferrariff","bmwi8"}
7 | {"ferrariff","viper","volt"}
Answer 0 (score: 1)
Since the array in each row is independent, a plain correlated subquery with an ARRAY constructor does the job:
SELECT *, ARRAY(SELECT DISTINCT unnest (vehicle_types)) AS vehicle_types_uni
FROM vehicle;
Note that NULL is converted to an empty array ('{}'). We would need to special-case it, but the UPDATE below excludes it anyway.
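To make the NULL behavior concrete, here is a minimal, self-contained sketch. The inline VALUES list and the column aliases (original, without_guard, with_guard) are illustrative assumptions and not part of the question's table; the CASE guard is just one possible way to special-case NULL.

```sql
-- Sketch: how the ARRAY constructor treats a NULL array, and one way to guard it.
SELECT s.a AS original
     , ARRAY(SELECT DISTINCT unnest(s.a)) AS without_guard   -- NULL input becomes '{}'
     , CASE WHEN s.a IS NOT NULL
            THEN ARRAY(SELECT DISTINCT unnest(s.a))
       END AS with_guard                                     -- NULL input stays NULL
FROM  (VALUES (NULL::text[])
            , ('{volt,viper,volt}'::text[])) AS s(a);        -- hypothetical sample rows
```

For the NULL row, without_guard is an empty array while with_guard remains NULL.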
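Fast and simple. But do not use it. You did not say so, but typically you will want to preserve the original order of the array elements. Your sample data suggests as much. Use WITH ORDINALITY in the correlated subquery, which makes the query a bit more involved: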
SELECT *, ARRAY (SELECT v
FROM unnest(vehicle_types) WITH ORDINALITY t(v,ord)
GROUP BY 1
ORDER BY min(ord)
) AS vehicle_types_uni
FROM vehicle;
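The two variants can be compared side by side on a single array. This is a sketch on an inline literal taken from row 6 of the question; the aliases s, distinct_only and order_preserved are made up for illustration.

```sql
-- Sketch: DISTINCT alone vs. de-duplication that keeps the first occurrence.
SELECT s.a AS original
     , ARRAY(SELECT DISTINCT unnest(s.a)) AS distinct_only   -- element order is not guaranteed
     , ARRAY(SELECT v
             FROM   unnest(s.a) WITH ORDINALITY AS t(v, ord)
             GROUP  BY v
             ORDER  BY min(ord)) AS order_preserved          -- ordered by first position
FROM  (VALUES ('{viper,ferrariff,bmwi8,viper}'::text[])) AS s(a);
```

The order_preserved column yields {viper,ferrariff,bmwi8}, because each element is placed at the position of its first occurrence.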
UPDATE to actually remove the duplicates:
UPDATE vehicle
SET vehicle_types = ARRAY (
SELECT v
FROM unnest(vehicle_types) WITH ORDINALITY t(v,ord)
GROUP BY 1
ORDER BY min(ord)
)
WHERE cardinality(vehicle_types) > 1 -- optional
AND vehicle_types <> ARRAY (
SELECT v
FROM unnest(vehicle_types) WITH ORDINALITY t(v,ord)
GROUP BY 1
ORDER BY min(ord)
); -- suppress empty updates (optional)
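Both added WHERE conditions are optional and only there to improve performance. The first one is logically redundant, since the second already covers it. Each condition also excludes the NULL case. The second one suppresses all empty updates.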
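If you try this without preserving the original order, you will probably update most rows needlessly, simply because the order of elements changed even where there were no duplicates.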
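Before running the UPDATE, it may help to preview which rows actually contain duplicates. The following is only a sketch: the CTE stands in for the real vehicle table with three rows copied from the question, and the check assumes the arrays contain no NULL elements (count(DISTINCT ...) ignores NULLs).

```sql
-- Sketch: list only rows whose array has at least one duplicate element.
WITH vehicle(veh_id, vehicle_types) AS (     -- stand-in for the real table
   VALUES (5, '{volt,viper}'::text[])
        , (6, '{viper,ferrariff,bmwi8,viper}')
        , (7, '{ferrariff,viper,viper,volt}')
   )
SELECT veh_id, vehicle_types
FROM   vehicle
WHERE  cardinality(vehicle_types)
    <> (SELECT count(DISTINCT v) FROM unnest(vehicle_types) AS v)
ORDER  BY veh_id;
```

With this sample data only rows 6 and 7 are returned; row 5 has no duplicates and would not need an update.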
Requires Postgres 9.4 or later.
Answer 1 (score: 0)
I am not claiming it is efficient, but something like this might work:
with expanded as (
select veh_id, unnest (vehicle_types) as vehicle_type
from vehicles
)
select veh_id, array_agg (distinct vehicle_type)
from expanded
group by veh_id
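One caveat with this form: array_agg(DISTINCT ...) does not keep the original element order. If order matters, a variant along these lines might work. It is a sketch under assumptions: the CTE stands in for the vehicles table with two rows from the question, and the names u, d and first_pos are illustrative.

```sql
-- Sketch: unnest with positions, keep the first position per value, re-aggregate in that order.
WITH vehicles(veh_id, vehicle_types) AS (    -- stand-in for the real table
   VALUES (6, '{viper,ferrariff,bmwi8,viper}'::text[])
        , (7, '{ferrariff,viper,viper,volt}')
   )
SELECT veh_id, array_agg(vehicle_type ORDER BY first_pos) AS vehicle_types
FROM  (
   SELECT v.veh_id, u.vehicle_type, min(u.ord) AS first_pos
   FROM   vehicles v, unnest(v.vehicle_types) WITH ORDINALITY AS u(vehicle_type, ord)
   GROUP  BY v.veh_id, u.vehicle_type
   ) AS d
GROUP  BY veh_id
ORDER  BY veh_id;
```

Note that rows with a NULL or empty array produce no unnested rows and therefore drop out of the result.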
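If you really want to get fancy and walk each array in a single pass that keeps the first occurrence of every element, you can write a custom function. (Strictly speaking this is not O(n) in the worst case: each = ANY check scans the output array, so an array of n distinct elements costs O(n²) comparisons.)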
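create or replace function unique_array(input_array text[])
returns text[] as $$
DECLARE
  output_array text[];
  i integer;
BEGIN
  -- start from an empty array and append each element the first time it is seen
  output_array := array[]::text[];
  -- assumes a one-dimensional array with the default lower bound of 1
  for i in 1..cardinality(input_array) loop
    -- NULL elements are skipped: the comparison yields NULL, so the test is not true
    if not (input_array[i] = any (output_array)) then
      output_array := output_array || input_array[i];
    end if;
  end loop;
  return output_array;
END;
$$
language plpgsql immutable strict;  -- strict: a NULL array returns NULL instead of failing on the loop bound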
Usage example:
select veh_id, unique_array(vehicle_types)
from vehicles