Randomly select values from a list with a character-length constraint

Time: 2018-06-16 03:07:01

Tags: python

I have two lists of strings, as follows:

test1 = ["abc", "abcdef", "abcedfhi"]

test2 = ["The", "silver", "proposes", "the", "blushing", "number", "burst", "explores", "the", "fast", "iron", "impossible"]

The second list is longer, so I want to downsample it to the length of the first list by random sampling.

import random

def downsample(data):
    # Randomly sample every list down to the length of the shortest one
    min_len = min(len(x) for x in data)
    return [random.sample(x, min_len) for x in data]

downsample([test1, test2])

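However, I want to add a constraint: the words selected from the second list must match the length distribution of the first list. So the first randomly selected word must have the same length as the first word of the shorter list, and so on. The catch is that sampling with replacement is not allowed either.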

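How can I randomly select n elements (n being the length of the shorter list) from test2 that match the character-length distribution of test1? Thanks, Jack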

2 answers:

Answer 0: (score: 7)

Setup

from collections import defaultdict
import random
dct = defaultdict(list)
l1 = ["abc", "abcdef", "abcedfhi"]
l2 = ["The", "silver", "proposes", "the", "blushing", "number", "burst", "explores", "the", "fast", "iron", "impossible"]

First, use collections.defaultdict to build a dictionary keyed by word length:

for word in l2:
  dct[len(word)].append(word)

# Result
defaultdict(<class 'list'>, {3: ['The', 'the', 'the'], 6: ['silver', 'number'], 8: ['proposes', 'blushing', 'explores'], 5: ['burst'], 4: ['fast', 'iron'], 10: ['impossible']})

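Then you can use a simple list comprehension together with random.choice to pick a random word matching the length of each element in the first list. If a word length is not found in the dictionary, -1 is filled in instead.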

final = [random.choice(dct.get(len(w), [-1])) for w in l1]

# Output
['The', 'silver', 'blushing']
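
One caveat with this version: random.choice draws independently for each word in l1, so the same word from l2 can come back more than once when l1 contains repeated lengths. A minimal sketch of that behaviour, using small hypothetical inputs (not the lists from the question) chosen so that the outcome is deterministic:

```python
import random
from collections import defaultdict

# Hypothetical inputs, chosen to make the repetition obvious
l1 = ["abc", "xyz"]      # two words of length 3
l2 = ["The", "silver"]   # only one candidate of length 3

dct = defaultdict(list)
for word in l2:
    dct[len(word)].append(word)

# Each draw is independent, so "The" is picked for both positions
final = [random.choice(dct.get(len(w), [-1])) for w in l1]
print(final)
```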

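Edit based on the clarified requirements
As long as list 2 contains no duplicates, this approach satisfies the requirement that repeats are not allowed: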

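dct = defaultdict(list)  # start from an empty dictionary so words are not added twice
for word in l2:
    dct[len(word)].append(word)

# Shuffle each bucket so that pop() removes a random word
for k in dct:
    random.shuffle(dct[k])

final = [dct[len(w)].pop() for w in l1]
# ['The', 'silver', 'proposes']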

If the second list does not have enough words to satisfy the distribution, this approach raises an IndexError.
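
To fail with a clearer message than a bare IndexError, the availability of each length can be checked before any word is popped. The following is only a sketch under that assumption; the function name sample_matching_lengths and the choice of ValueError are illustrative and not part of the original answer:

```python
import random
from collections import Counter, defaultdict

def sample_matching_lengths(targets, candidates):
    """Pick one word from candidates per word in targets, matching lengths, without reuse."""
    pools = defaultdict(list)
    for word in candidates:
        pools[len(word)].append(word)

    # Verify up front that every required length has enough candidates
    needed = Counter(len(w) for w in targets)
    for length, count in needed.items():
        available = len(pools.get(length, []))
        if available < count:
            raise ValueError(
                f"need {count} word(s) of length {length}, only {available} available"
            )

    # Shuffle each pool so that pop() removes a random word
    for pool in pools.values():
        random.shuffle(pool)
    return [pools[len(w)].pop() for w in targets]

l1 = ["abc", "abcdef", "abcedfhi"]
l2 = ["The", "silver", "proposes", "the", "blushing", "number", "burst",
      "explores", "the", "fast", "iron", "impossible"]
result = sample_matching_lengths(l1, l2)
print(result)
```

Checking before shuffling means the function either returns a complete sample or raises without having produced a partial one.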

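Answer 1: (score: 1)

One way is to create a list of the lengths of the items in test1. Then use it to build another list containing sublists of the items from test2 that have those lengths. Finally, pop randomly from the list of lists (following a similar answer), so that an item is removed once it has been selected for the sample.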

from random import randrange

test1 = ["abc", "abcdef", "abcedfhi"]
test2 = ["The", "silver", "proposes", "the", "blushing", "number", "burst", "explores", "the", "fast", "iron", "impossible"]

sizes = [len(i) for i in test1]
# results: [3, 6, 8]

sublists = [[item for item in test2 if len(item) == i] for i in sizes ]
# results for sublists: [['The', 'the', 'the'], ['silver', 'number'], ['proposes', 'blushing', 'explores']]

# randomly pop from the list for samples 
samples = [i.pop(randrange(len(i)))  for i in sublists]

print('Samples: ',samples)

Result:

Samples:  ['the', 'number', 'blushing']
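
Note that this builds a separate sublist for every word in test1, so if test1 contained two words of the same length, each would pop from its own copy and the same item of test2 could be drawn twice. A possible variation, sketched here with a hypothetical test1 that repeats a length, keeps one shared pool per distinct length instead:

```python
from random import randrange

test1 = ["abc", "xyz", "abcdef"]  # hypothetical: two words of length 3
test2 = ["The", "silver", "proposes", "the", "blushing", "number", "burst",
         "explores", "the", "fast", "iron", "impossible"]

# One shared pool per distinct length instead of one sublist per word in test1
pools = {size: [item for item in test2 if len(item) == size]
         for size in set(len(w) for w in test1)}

samples = []
for word in test1:
    pool = pools[len(word)]
    # pop() removes the chosen item, so later draws of the same length cannot reuse it
    samples.append(pool.pop(randrange(len(pool))))

print('Samples: ', samples)
```

As in the original, randrange raises a ValueError if a pool is empty, so this still assumes test2 holds enough words of each length.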