Building a tuple of sets with no duplicate sets

Asked: 2019-03-07 04:42:01

Tags: python performance set tuples

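Using Python, I want to build a tuple of sets. However, I only want to add a set to the tuple if that set is not already in it. Each set is a pair. I am using sets because the order within a pair does not matter. I am using a tuple because I will be processing over 1.5 rows of data, and searching a tuple is faster than searching a list. I believe I will still need some kind of list comprehension, but that is part of my question. My first question: my code is broken, how do I fix it? My second question: how can I make the code more efficient?

I have simplified this example down to the essentials. Each new set is received from a data source and then goes through this processing.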

my_tuple = ({"a", "b"}, {"c", "d"}, {"c", "e"})  # Existing tuple

new_set = {"b", "c"} # Get a set from data source

set_exists = any(new_set in a_set for a_set in my_tuple)
if not set_exists:
    my_tuple += (new_set,)

print(my_tuple)

({'a', 'b'}, {'c', 'd'}, {'c', 'e'}, {'b', 'c'})

That is good. The set was not in the tuple, so it was added.

new_set = {"b", "a"} # Get a set from data source

set_exists = any(new_set in a_set for a_set in my_tuple)
if not set_exists:
    my_tuple += (new_set,)

print(my_tuple)

({'a', 'b'}, {'c', 'd'}, {'c', 'e'}, {'b', 'c'}, {'a', 'b'})

Not good. The set already exists in the tuple, so it should not have been added.

Any help is much appreciated.

2 answers:

Answer 0 (score: 3)

The condition you need to check is much simpler than you think:

set_exists = new_set in my_tuple

Your code should work with this change.
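To see why the original check never fired, here is a small sketch comparing the three tests side by side. The variable names `element_check`, `equality_check` and `membership_check` are mine, not from the question. The original expression asks whether `new_set` is an element of each pair, and since the pairs only contain strings, that is always false.

```python
my_tuple = ({"a", "b"}, {"c", "d"}, {"c", "e"})
new_set = {"b", "a"}

# Original check: is new_set an *element* of each pair?
# The pairs contain strings, never sets, so this is always False.
element_check = any(new_set in a_set for a_set in my_tuple)

# Intended check: is any pair *equal* to new_set?
equality_check = any(new_set == a_set for a_set in my_tuple)

# Tuple membership compares items with ==, so this is equivalent.
membership_check = new_set in my_tuple

print(element_check, equality_check, membership_check)  # False True True
```

Set equality ignores element order, which is why `{"b", "a"}` matches the stored `{"a", "b"}`.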

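In any case, appending to a tuple is slow, because each `+=` builds a new tuple and copies every existing element. If you are after performance, your approach is certainly not the best. One improvement is to use a list, which has a very fast append operation, but, like a tuple, its membership test is slow. In fact, contrary to what you think, list and tuple are about equally slow to search, since both are scanned item by item.

The solution is to use a set of frozensets: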

my_tuple = ({"a", "b"}, {"c", "d"}, {"c", "e"})

# convert to set, it's way faster!
# (this is a one-time operation, if possible, have your data in this format beforehand)
my_set = set(frozenset(s) for s in my_tuple)

# Again, if possible, get your data in the form of a frozenset so conversion is not needed
new_set = frozenset(("b", "c"))

if new_set not in my_set: # very fast!
    my_set.add(new_set)

new_set = frozenset(("a", "b"))

my_set.add(new_set) # the check is actually unneeded for sets

print(my_set)
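If the pairs arrive one at a time from a data source, the same idea could be wrapped in a small helper. This is only a sketch: `unique_pairs` and `incoming` are hypothetical names, and I am assuming each item can be passed to `frozenset()` directly.

```python
def unique_pairs(pairs):
    """Collect order-insensitive pairs, silently dropping duplicates."""
    seen = set()
    for pair in pairs:
        # frozenset is hashable, and ("b", "a") hashes the same as ("a", "b")
        seen.add(frozenset(pair))
    return seen


# Hypothetical stand-in for the real data source
incoming = [("a", "b"), ("c", "d"), ("c", "e"), ("b", "c"), ("b", "a")]
result = unique_pairs(incoming)

print(len(result))  # 4, because ("b", "a") duplicates ("a", "b")
```

No explicit membership test is needed here, since `set.add` does nothing when the element is already present.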

Speed demonstration:

l = list(range(10 ** 6))
t = tuple(range(10 ** 6))
s = set(range(10 ** 6))

# Appending to tuple is slow!
%timeit global t; t += (1,)
11.4 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Appending to list is fast!
%timeit l.append(1)
107 ns ± 6.43 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

# List and tuple membership tests are slow!
%timeit 500000 in l
5.9 ms ± 83.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit 500000 in t
6.62 ms ± 281 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# These operations are trivial for sets...
%timeit 500000 in s
73 ns ± 6.91 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
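The `%timeit` lines above only work in IPython or Jupyter. If you want to reproduce the membership comparison in a plain script, something like the following should do, using the standard `timeit` module. I have reduced the size to `10 ** 5` and the repeat count to 100 so that it finishes quickly; those numbers are my choice, and your absolute timings will differ from the ones shown above.

```python
import timeit

n = 10 ** 5  # smaller than the 10 ** 6 used above, to keep the run short
l = list(range(n))
t = tuple(range(n))
s = set(range(n))
target = n // 2
repeats = 100

# Total time for `repeats` membership tests on each container
list_time = timeit.timeit(lambda: target in l, number=repeats)
tuple_time = timeit.timeit(lambda: target in t, number=repeats)
set_time = timeit.timeit(lambda: target in s, number=repeats)

print(f"list:  {list_time / repeats:.2e} s per lookup")
print(f"tuple: {tuple_time / repeats:.2e} s per lookup")
print(f"set:   {set_time / repeats:.2e} s per lookup")
```

The list and tuple lookups scan up to the middle element every time, while the set lookup is a single hash probe, so the gap grows with the size of the container.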

Answer 1 (score: 1)

You should just use a set of sets, or to be exact a set of frozensets, because plain sets are not a hashable type:

my_set = {frozenset(["a", "b"]), frozenset(["c", "d"]), frozenset(["c", "e"])}
my_set.add(frozenset(["b", "a"]))
print(my_set)
# >>> set([frozenset(['c', 'e']), frozenset(['a', 'b']), frozenset(['c', 'd'])])
my_set.add(frozenset(["b", "z"]))
print(my_set)
# >>> set([frozenset(['c', 'e']), frozenset(['a', 'b']), frozenset(['b', 'z']), frozenset(['c', 'd'])])
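As a quick illustration of the hashability point, here is a short sketch (the names `raised`, `first` and `second` are just for the example). Putting a plain set inside a set fails with a `TypeError`, while two frozensets built from the same elements in a different order compare and hash as equal, which is what makes the de-duplication work.

```python
# A plain set cannot be an element of another set
raised = False
try:
    {{"a", "b"}}
except TypeError:
    raised = True

# frozensets are hashable, and element order does not matter
first = frozenset(["a", "b"])
second = frozenset(["b", "a"])

print(raised)                        # True
print(first == second)               # True
print(hash(first) == hash(second))   # True
print(len({first, second}))          # 1
```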