Question

I have a waveform object, defined as follows:

class wfm:
    """Class defining a waveform characterized by:
        - A name
        - An electrode configuration
        - An amplitude (mA)
        - A pulse width (microseconds)"""

    def __init__(self, name, config, amp, width=300):
        self.name = name
        self.config = config
        self.amp = amp
        self.width = width

    def __eq__(self, other):
        return type(other) is self.__class__ and other.name == self.name and other.config == self.config and other.amp == self.amp and other.width == self.width

    def __ne__(self, other):
        return not self.__eq__(other)

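By parsing, I get a list called waveforms that contains 770 wfm instances. There are many duplicates, and I need to remove them.

My idea was to collect the IDs (list indices) of equivalent objects, store the largest ID in a list, and then loop over all the waveforms from the end, popping each duplicate.

Code: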

duplicate_ID = []
for i in range(len(waveforms)):
    for j in range(i+1, len(waveforms)):
        if waveforms[i] == waveforms[j]:
            duplicate_ID.append(waveforms.index(waveforms[j]))
            print ('{} eq {}'.format(i, j))

duplicate_ID = list(set(duplicate_ID)) # If I don't do that; 17k IDs

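It turns out (judging by the printout) that I have duplicates that do not appear in the ID list. For example, 750 is a duplicate of 763 (the print says so, and a test confirms it), but neither of these two IDs appears in my list of duplicates.

I'm pretty sure there is a better solution than this method (which does not work yet), and I'd be happy to hear it. Thanks for your help!

EDIT: a more complex case

I have a more complex scenario. I have 2 classes, wfm (see above) and stim: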

class stim:
    """Class defining the waveform used for a stimulation by:
        - Duration (milliseconds)
        - Frequency (Hz)
        - Pattern of waveforms"""

    def __init__(self, dur, f, pattern):
        self.duration = dur
        self.fq = f
        self.pattern = pattern

    def __eq__(self, other):
        return type(other) is self.__class__ and other.duration == self.duration and other.fq == self.fq and other.pattern == self.pattern

    def __ne__(self, other):
        return not self.__eq__(other)

I parse my files to fill a dict named paradigm. It looks like this:

paradigm[file name STR] = (list of waveforms, list of stimulations)

# example:
paradigm['myfile.xml'] = ([wfm1, ..., wfm10], [stim1, ..., stim5])

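Again, I want to remove duplicates, i.e. I want to keep only one copy of the data when:

the waveforms are the same
and the stimulations are the same

Example: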

file1 has 10 waveforms and file2 has the same 10 waveforms.
file1 has stim1 and stim2; file2 has stim3, stim4 and stim5.

stim1 and stim3 are the same; so since the waveforms are also the same, I want to keep:
file1: 10 waveforms and stim1 and stim2
file2: 10 waveforms and stim4 and stim5

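This relationship is a bit muddled in my head, so I'm having trouble finding a suitable way to store the waveforms and stimulations so that they are easy to compare. If you have any ideas, I'd be glad to hear them. Thanks!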

Answer 1

The problem

The .index method uses your overloaded .__eq__ method. So

waveforms.index(waveforms[j])

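will always find the first instance in the list of a waveform that has the same attributes as waveforms[j], which is not necessarily the item at index j.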

w1 = wfm('a', {'test_param': 4}, 3, 2.0)
w2 = wfm('b', {'test_param': 4}, 3, 2.0)
w3 = wfm('a', {'test_param': 4}, 3, 2.0)

w1 == w3  # True
w2 == w3  # False

waveforms = [w1, w2, w3]
waveforms.index(waveforms[2]) == waveforms.index(waveforms[0]) == 0  # True
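
Applied to the loop from the question, this suggests recording j itself rather than the result of .index(). Below is a minimal sketch of that fix which keeps the original double loop. The helper names duplicate_indices and remove_duplicates_in_place are made up for illustration, and the wfm class is a shortened stand-in with an equivalent __eq__:

```python
class wfm:
    """Shortened stand-in for the question's class (equivalent __eq__)."""

    def __init__(self, name, config, amp, width=300):
        self.name = name
        self.config = config
        self.amp = amp
        self.width = width

    def __eq__(self, other):
        return type(other) is self.__class__ and vars(other) == vars(self)


def duplicate_indices(waveforms):
    """Return the sorted indices of items that equal an earlier item."""
    duplicate_ids = set()
    for i in range(len(waveforms)):
        for j in range(i + 1, len(waveforms)):
            if waveforms[i] == waveforms[j]:
                # Record j directly; waveforms.index(waveforms[j]) would
                # return the index of the first equal item instead.
                duplicate_ids.add(j)
    return sorted(duplicate_ids)


def remove_duplicates_in_place(waveforms):
    """Pop duplicates from the end so that earlier indices stay valid."""
    for idx in reversed(duplicate_indices(waveforms)):
        waveforms.pop(idx)


w1 = wfm('a', {'test_param': 4}, 3, 2.0)
w2 = wfm('b', {'test_param': 4}, 3, 2.0)
w3 = wfm('a', {'test_param': 4}, 3, 2.0)
waveforms = [w1, w2, w3]

print(duplicate_indices(waveforms))  # [2]
remove_duplicates_in_place(waveforms)
print(len(waveforms))  # 2
```

This keeps the O(n^2) comparison count of the original loop; it only corrects which index gets recorded.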

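The solution

Immutable

If you do this immutably (building a new list instead of modifying the original one), you don't need to store list indices: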

key = lambda w: hash(str(vars(w)))
dupes = set()
unique = [dupes.add(key(w)) or w for w in waveforms if key(w) not in dupes]

unique == [w1, w2]  # True
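
One caveat with this key: str(vars(w)) compares string representations, so two waveforms that are equal under __eq__ can still produce different keys, for example when config is a dict whose entries were inserted in a different order. If that matters, a fallback that relies only on __eq__ might look like the sketch below. It is O(n^2), which is still cheap for 770 items. The function name unique_by_eq and the minimal wfm stand-in are assumptions for illustration:

```python
class wfm:
    """Minimal stand-in for the question's class."""

    def __init__(self, name, config, amp, width=300):
        self.name, self.config, self.amp, self.width = name, config, amp, width

    def __eq__(self, other):
        return type(other) is self.__class__ and vars(other) == vars(self)


def unique_by_eq(items):
    """Keep the first occurrence of each item, comparing with == only."""
    unique = []
    for item in items:
        if item not in unique:  # list membership falls back to __eq__
            unique.append(item)
    return unique


# Equal configs whose keys were inserted in a different order.
w1 = wfm('a', {'x': 1, 'y': 2}, 3)
w2 = wfm('a', {'y': 2, 'x': 1}, 3)
w3 = wfm('b', {'x': 1, 'y': 2}, 3)

print(w1 == w2)                         # True
print(str(vars(w1)) == str(vars(w2)))   # False on Python 3.7+
print(len(unique_by_eq([w1, w2, w3])))  # 2
```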

Mutable

key = lambda w: hash(str(vars(w)))
seen = set()
idxs = [i if key(w) in seen else seen.add(key(w)) for i, w in enumerate(waveforms)]

for idx in filter(None, idxs[::-1]):
    waveforms.pop(idx)

waveforms == [w1, w2]  # True
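
A possible variation on the in-place version, as a sketch: build the list of kept items first and then assign it back through a slice, which mutates the same list object without calling pop repeatedly. The helper name dedupe_in_place is hypothetical, and SimpleNamespace objects stand in for wfm instances here:

```python
from types import SimpleNamespace


def dedupe_in_place(items, key):
    """Remove later duplicates from `items` itself, keeping first occurrences."""
    seen = set()
    kept = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            kept.append(item)
    items[:] = kept  # slice assignment keeps the same list object


def key(w):
    """Same key as in the answer above."""
    return hash(str(vars(w)))


# SimpleNamespace objects are stand-ins for wfm instances.
w1 = SimpleNamespace(name='a', amp=3)
w2 = SimpleNamespace(name='b', amp=3)
w3 = SimpleNamespace(name='a', amp=3)

waveforms = [w1, w2, w3]
alias = waveforms          # a second reference to the same list
dedupe_in_place(waveforms, key)

print(alias is waveforms)  # True
print(len(alias))          # 2
```

Any other reference to the list (alias above) sees the deduplicated contents, which is the main reason to prefer mutation over building a new list.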

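Big-O analysis

It is a good habit to consider big-O complexity when writing algorithms (although optimization should come at the cost of readability only when it is needed). In this case, these solutions are both more readable and more efficient.

Because of the double loop, your initial solution is O(n^2).

Both solutions provided make a single pass over the list with O(1) average-time set lookups, so the immutable one is O(n). In the mutable one, each pop(idx) also has to shift the elements after idx, so removing many duplicates can cost more than O(n) in the worst case.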
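
The analysis above covers only the single list of waveforms. For the edit in the question (stimulations duplicated across files), one possible reading is that a stimulation should be dropped when an earlier file already holds an equal stimulation together with an equal list of waveforms. A rough sketch under that assumption follows. The name dedupe_stims is made up, plain strings stand in for wfm and stim objects, and the comparison relies only on ==, so it is quadratic rather than O(n):

```python
def dedupe_stims(paradigm):
    """Drop a stim if an earlier file had an equal stim with equal waveforms.

    `paradigm` maps file name -> (list of waveforms, list of stimulations).
    Files are visited in dict insertion order (Python 3.7+).
    """
    seen = []  # (waveforms, stim) pairs that have been kept so far
    result = {}
    for fname, (waveforms, stims) in paradigm.items():
        kept = []
        for s in stims:
            already_seen = any(waveforms == w and s == st for w, st in seen)
            if not already_seen:
                seen.append((waveforms, s))
                kept.append(s)
        result[fname] = (waveforms, kept)
    return result


# Strings stand in for wfm / stim instances; only == is needed.
wfms = ['wfm%d' % i for i in range(1, 11)]
paradigm = {
    'file1': (wfms, ['stimA', 'stimB']),
    'file2': (list(wfms), ['stimA', 'stimC', 'stimD']),
}

cleaned = dedupe_stims(paradigm)
print(cleaned['file1'][1])  # ['stimA', 'stimB']
print(cleaned['file2'][1])  # ['stimC', 'stimD']
```

Because the pairs are compared with ==, this works with the __eq__ methods already defined on wfm and stim and needs no __hash__.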
