Question

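Before you decide this is a duplicate (there are many questions asking how to split a long string without breaking words), please keep in mind that my question is a little different. The order does not matter, and I should arrange the words so that each line is used as fully as possible.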

I have an unordered set of words, and I want to combine them without exceeding 253 characters.

def compose(words):
    result = " ".join(words)
    if len(result) > 253:
        pass # this should not happen!
    return result

My problem is that I want to fill each line as much as possible. For example:

words = "a bc def ghil mno pq r st uv"
limit = 5 # max 5 characters

# This is good because it's the shortest possible list,
#   but I don't know how I could get it
# Note: order is not important
good = ["a def", "bc pq", "ghil", "mno r", "st uv"]

# This is bad because len(bad) > len(good)
#   even if the limit of 5 characters is respected
# This is equivalent to:
#   bad  = ["a bc", "def", "ghil", "mno", "pq r", "st uv"]
import textwrap
bad = textwrap.wrap(words, limit)
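To make "shortest possible list" concrete, the true minimum for a handful of words can be found by exhaustive search. The sketch below is a generic branch-and-bound search written only for illustration. `min_lines` is a hypothetical name and does not come from the question. Its running time grows exponentially with the number of words, so it is only useful for checking small cases like the example above, not for real inputs.

```python
def min_lines(words, limit, sep=" "):
    """Exact minimum-line packing via exhaustive branch-and-bound search.

    Exponential time: only suitable for a handful of words.
    """
    if any(len(w) > limit for w in words):
        raise ValueError("limit is too small")
    # "w1 w2 ... wk" fits iff sum(len(w) + len(sep)) <= limit + len(sep),
    # so every word is an item of size len(w) + len(sep) in a bin of that capacity.
    capacity = limit + len(sep)
    items = sorted(words, key=len, reverse=True)  # long words first: prunes sooner
    best = [[w] for w in items]  # trivial packing: one word per line
    lines, loads = [], []

    def place(i):
        nonlocal best
        if len(lines) >= len(best):
            return  # this branch cannot beat the best packing found so far
        if i == len(items):
            best = [list(line) for line in lines]
            return
        word = items[i]
        size = len(word) + len(sep)
        # Try the word on every open line that still has room...
        for j in range(len(lines)):
            if loads[j] + size <= capacity:
                lines[j].append(word)
                loads[j] += size
                place(i + 1)
                lines[j].pop()
                loads[j] -= size
        # ...and also on a new line of its own.
        lines.append([word])
        loads.append(size)
        place(i + 1)
        lines.pop()
        loads.pop()

    place(0)
    return [sep.join(line) for line in best]


words = "a bc def ghil mno pq r st uv".split()
print(min_lines(words, 5))
```

For the example words with a limit of 5, this returns five lines, the same count as `good`. The exact contents of each line depend on the search order.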

What should I do?

Answer 1

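This is the bin packing problem. Solving it optimally is NP-hard, although non-optimal heuristic algorithms exist, chiefly first-fit decreasing and best-fit decreasing. For an implementation, see https://github.com/type/Bin-Packing.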
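Here is a minimal sketch of the two heuristics named above, adapted to lines of text. It is not the code from the linked repository, and `pack_decreasing` and its `best_fit` flag are hypothetical names. Text needs one adaptation, for the separator. A line of k words fits within `limit` exactly when the sum of len(word) + len(sep) over its words is at most limit + len(sep). So each word is treated as an item of that size, placed in a bin of that capacity.

```python
def pack_decreasing(words, limit, sep=" ", best_fit=False):
    """Pack words into lines of at most `limit` characters.

    best_fit=False: first-fit decreasing (first line that still has room).
    best_fit=True:  best-fit decreasing (fullest line that still has room).
    """
    if any(len(w) > limit for w in words):
        raise ValueError("limit is too small")
    # A line "w1 w2 ... wk" fits iff sum(len(w) + len(sep)) <= limit + len(sep).
    capacity = limit + len(sep)
    lines, loads = [], []  # words on each line, and the capacity each line uses
    for word in sorted(words, key=len, reverse=True):  # longest words first
        size = len(word) + len(sep)
        candidates = [i for i, load in enumerate(loads) if load + size <= capacity]
        if not candidates:
            lines.append([word])  # no line has room: open a new one
            loads.append(size)
            continue
        if best_fit:
            i = max(candidates, key=lambda j: loads[j])  # least room left over
        else:
            i = candidates[0]  # first line with room
        lines[i].append(word)
        loads[i] += size
    return [sep.join(line) for line in lines]


words = "a bc def ghil mno pq r st uv".split()
print(pack_decreasing(words, 5))  # ['ghil', 'def a', 'mno r', 'bc pq', 'st uv']
```

On the question's example, first-fit decreasing already reaches five lines, the same count as `good`. In general, both heuristics can use more lines than the optimum. For the real use case, call `pack_decreasing(words, 253)`.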

Answer 2

Non-optimal, offline, fast 1D bin-packing algorithm in Python

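def binPackingFast(words, limit, sep=" "):
    """Greedy next-fit packing of the words sorted longest-first (not optimal)."""
    if not words:
        return []
    if max(map(len, words)) > limit:
        raise ValueError("limit is too small")
    # sorted() returns a new list, so the caller's list is not reordered
    words = sorted(words, key=len, reverse=True)
    res, part = [], words[0]
    for word in words[1:]:
        # Start a new line when the separator plus the word no longer fits
        if len(sep) + len(word) > limit - len(part):
            res.append(part)
            part = word
        else:
            part += sep + word
    res.append(part)  # flush the last line
    return res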

Performance

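Tested on /usr/share/dict/words (provided by words-3.0-20.fc18.noarch), it handles half a million words in a second on my slow dual-core laptop, with an efficiency of at least 90% using these parameters: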

limit = max(map(len, words))
sep = ""

With limit *= 1.5 I get 92%, and with limit *= 2 I get 96% (with the same execution time).

The optimal (theoretical) value is calculated with: math.ceil(len(sep.join(words))/limit)

No efficient bin-packing algorithm can be guaranteed to do better.

Source: http://mathworld.wolfram.com/Bin-PackingProblem.html
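The answer does not show how the efficiency percentages were computed. One plausible reading, which is an assumption and not the author's stated code, is the theoretical value divided by the number of lines actually produced. The sketch below computes that ratio, using the hypothetical helpers `lower_bound` and `efficiency`. With sep = "" (the test setting above), `lower_bound` reduces to the answer's math.ceil(len(sep.join(words))/limit). For a non-empty separator, it divides by limit + len(sep) instead, because dividing len(sep.join(words)) by limit can overshoot. On the question's example, that naive division gives ceil(28 / 5) = 6, while `good` fits in 5 lines.

```python
import math

def lower_bound(words, limit, sep=""):
    """Minimum number of lines that any packing could possibly use."""
    # A line of k words needs sum(len(w)) + (k - 1) * len(sep) <= limit,
    # i.e. sum(len(w) + len(sep)) <= limit + len(sep); divide the total by that.
    total = sum(len(w) + len(sep) for w in words)
    return math.ceil(total / (limit + len(sep)))


def efficiency(lines, words, limit, sep=""):
    """Lower bound divided by the lines actually used (1.0 = provably optimal)."""
    return lower_bound(words, limit, sep) / len(lines)


words = "a bc def ghil mno pq r st uv".split()
good = ["a def", "bc pq", "ghil", "mno r", "st uv"]
bad = ["a bc", "def", "ghil", "mno", "pq r", "st uv"]
print(lower_bound(words, 5, sep=" "))       # 5
print(efficiency(good, words, 5, sep=" "))  # 1.0
print(efficiency(bad, words, 5, sep=" "))   # about 0.83 (5 / 6)
```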

Moral of the story

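While finding the optimal solution is interesting, I think that in most cases it is better to use this algorithm for 1D offline bin-packing problems.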

Resources

Notes

I didn't use textwrap for my implementation because it is slower than my simple Python code. Maybe it's related to: Why are textwrap.wrap() and textwrap.fill() so slow?
It seems to work perfectly even if the sort is not reversed.

Split a long string without breaking words, filling the lines
