How to randomly remove a percentage of items from a list

Date: 2016-09-13 13:44:40

Tags: python list

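I have two lists of equal length: one is a data series and the other is a time series. Together they represent simulated values measured over time.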

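I want to write a function that randomly removes a set percentage or fraction of the items from both lists. That is, if my fraction is 0.2, I want to randomly remove 20% of the items from both lists, but the removed items must be the same ones (the same indices in each list).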

For example, let n = 0.2 (remove 20%):

a = [0,1,2,3,4,5,6,7,8,9]
b = [0,1,4,9,16,25,36,49,64,81]

After randomly removing 20% (in this case, the items at indices 2 and 7), they become:

a_new = [0,1,3,4,5,6,8,9]
b_new = [0,1,9,16,25,36,64,81]

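The real relationship is not as simple as in the example, so I can't just do this on one list and compute the second from it; both already exist as separate lists. They must also keep their original order.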

Thanks!

6 answers:

Answer 0 (score: 7)

import random

a = [0,1,2,3,4,5,6,7,8,9]
b = [0,1,4,9,16,25,36,49,64,81]

frac = 0.2  # how much of a/b do you want to exclude

# generate a list of indices to exclude. Turn it into a set for O(1) lookup time
inds = set(random.sample(list(range(len(a))), int(frac*len(a))))

# use `enumerate` to get list indices as well as elements. 
# Filter by index, but take only the elements
new_a = [n for i,n in enumerate(a) if i not in inds]
new_b = [n for i,n in enumerate(b) if i not in inds]
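The question asks for a function. As a minimal sketch, the index-set approach above can be wrapped in a helper that accepts any number of equal-length lists and filters all of them with the same set of dropped indices. The name drop_fraction and its rng parameter are hypothetical, not part of the original answer.

```python
import random


def drop_fraction(frac, *lists, rng=random):
    """Randomly remove the same share `frac` of positions from every list.

    Hypothetical helper: all lists must have equal length, and the
    surviving elements keep their original order.
    """
    if not lists:
        return ()
    n = len(lists[0])
    if any(len(lst) != n for lst in lists):
        raise ValueError("all lists must have the same length")
    if not 0 <= frac <= 1:
        raise ValueError("frac must be between 0 and 1")
    # choose which indices to drop; a set gives O(1) membership tests
    drop = set(rng.sample(range(n), int(frac * n)))
    # filter every list with the same index set, preserving order
    return tuple([x for i, x in enumerate(lst) if i not in drop] for lst in lists)


a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
new_a, new_b = drop_fraction(0.2, a, b)
print(new_a)
print(new_b)
```

Passing rng=random.Random(seed) makes the removal reproducible, which can be handy when re-running a simulation.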

Answer 1 (score: 1)

import random

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

percentage = 0.3  # fraction of items to remove

# pick distinct indices to keep (randint could repeat an index),
# sorted so that the original order is preserved
g = sorted(random.sample(range(len(a)), int(len(a) * (1 - percentage))))

c, d = [], []
for i in g:
    c.append(a[i])
    d.append(b[i])

a, b = c, d

print(a)
print(b)

Answer 2 (score: 0)

If a and b are not too large, you can just use zip:

import random

a = [0,1,2,3,4,5,6,7,8,9]
b = [0,1,4,9,16,25,36,49,64,81]

frac = 0.2  # how much of a/b do you want to exclude
ab = list(zip(a,b))  # a list of tuples where the first element is from `a` and the second is from `b`

new_ab = random.sample(ab, int(len(a)*(1-frac)))  # sample those tuples
new_a, new_b = zip(*new_ab)  # unzip the tuples to get `a` and `b` back

Note that this does not preserve the original order of a and b.
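If you want to keep the zip-and-sample approach but also preserve order, one possible sketch (an assumption, not part of the original answer) is to tag each pair with its position using enumerate before sampling, then sort on that tag afterwards:

```python
import random

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
frac = 0.2  # fraction to remove

# tag each (a, b) pair with its original position
indexed = list(enumerate(zip(a, b)))
n_keep = len(indexed) - int(frac * len(indexed))
# sample whole tagged pairs, then sort by the position tag to restore order
kept = sorted(random.sample(indexed, n_keep), key=lambda t: t[0])
new_a, new_b = (list(t) for t in zip(*(pair for _, pair in kept)))
print(new_a)
print(new_b)
```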

Answer 3 (score: 0)

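You can also operate on the zipped a and b sequences: take a random sample of indices, sort it to keep the items in their original order, and unzip the result back into a_new and b_new, which you can then print: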

import random


a = [0,1,2,3,4,5,6,7,8,9]
b = [0,1,4,9,16,25,36,49,64,81]

frac = 0.2

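c = list(zip(a, b))  # zip() returns an iterator on Python 3, so materialize it
n_keep = len(c) - int(frac * len(c))  # frac is the share to remove; sample() needs an int
indices = random.sample(range(len(c)), n_keep)
# sort the indices (not the pairs) so the original order survives
a_new, b_new = zip(*(c[i] for i in sorted(indices)))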

print(a_new)
print(b_new)

Answer 4 (score: 0)


Answer 5 (score: 0)

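import random

# a, b and n (the fraction to drop) are as defined in the question
l = len(a)
n_drop = int(l * n)
n_keep = l - n_drop
# one keep/drop flag per index, shuffled so the dropped positions are random
ind = [1] * n_keep + [0] * n_drop
random.shuffle(ind)
# keep an element only where its flag is 1; the original order is preserved
new_a = [e for e, i in zip(a, ind) if i]
new_b = [e for e, i in zip(b, ind) if i]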
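The same shuffled-mask idea can also be written with itertools.compress from the standard library, which yields the items whose mask entry is true. This is a minimal sketch that reuses the example lists from the question. The name mask is illustrative.

```python
import itertools
import random

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
n = 0.2  # fraction to drop, as in the question

n_drop = int(len(a) * n)
# shuffled keep/drop mask: True keeps the element at that position
mask = [True] * (len(a) - n_drop) + [False] * n_drop
random.shuffle(mask)
# compress() yields the items whose mask entry is truthy, in original order
new_a = list(itertools.compress(a, mask))
new_b = list(itertools.compress(b, mask))
print(new_a)
print(new_b)
```

Because one mask is shared by both calls, the same positions are removed from a and b.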