Question

Answer 1

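Update

I couldn't resist trying to come up with my own solution to the first question, even though it doesn't do any compression. Here is a Python solution that uses a third-party factorization library called pyecm.

This solution is probably several orders of magnitude more efficient than Yevgeny's. For reasonable values of y, the computation takes seconds rather than hours or even weeks/years. For x = 2^32-1 and y = 256, it took 1.68 seconds on my 1.2 GHz Core Duo.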

>>> import time
>>> def test():
...     before = time.time()
...     print factor(2**32-1, 256)
...     print time.time()-before
...
>>> test()
[254, 232, 215, 113, 3, 15]
1.68499994278
>>> 254*232*215*113*3+15
4294967295L

Here is the code:

def factor(x, y):
    # y should be smaller than x. If x=y then {y, 1, 0} is the best solution
    assert(x > y)

    best_output = []

    # try all possible remainders from 0 to y 
    for remainder in xrange(y+1):
        output = []
        composite = x - remainder
        factors = getFactors(composite)

        # check if any factor is larger than y
        bad_remainder = False
        for n in factors.iterkeys():
            if n > y: 
                bad_remainder = True
                break
        if bad_remainder: continue

        # make the best factors
        while True:
            results = largestFactors(factors, y)
            if results == None: break
            output += [results[0]]
            factors = results[1]

        # store the best output
        output = output + [remainder]
        if len(best_output) == 0 or len(output) < len(best_output):
            best_output = output

    return best_output

# Heuristic
# The bigger the number the better. 8 is more compact than 2,2,2 etc...

# Find the most factors you can have below or equal to y
# output the number and unused factors that can be reinserted in this function
def largestFactors(factors, y):
    assert(y > 1)
    # iterate from y to 2 and see if the factors are present.
    for i in xrange(y, 1, -1):
        try_another_number = False
        factors_below_y = getFactors(i)
        for number, copies in factors_below_y.iteritems():
            if number in factors:
                if factors[number] < copies:
                    try_another_number = True
                    continue # not enough factors
            else:
                try_another_number = True
                continue # a factor is not present

        # Do we want to try another number, or was a solution found?
        if try_another_number == True:
            continue
        else:
            output = 1
            for number, copies in factors_below_y.items():
                remaining = factors[number] - copies
                if remaining > 0:
                    factors[number] = remaining
                else:
                    del factors[number]
                output *= number ** copies

            return (output, factors)

    return None # failed




# Find prime factors. You can use any formula you want for this.
# I am using elliptic curve factorization from http://sourceforge.net/projects/pyecm
import pyecm, collections, copy

getFactors_cache = {}
def getFactors(n):
    assert(n != 0)
    # attempt to retrieve from cache. Returns a copy
    try:
        return copy.copy(getFactors_cache[n])
    except KeyError:
        pass

    output = collections.defaultdict(int)
    for factor in pyecm.factors(n, False, True, 10, 1):
        output[factor] += 1

    # cache result
    getFactors_cache[n] = output

    return copy.copy(output)

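Answer to the first question

You say you want to compress numbers, but in your examples the sequences are longer than the undecomposed numbers. Without more details about the system that you left out (sequence probabilities / is there a programmable client?), there is no way to compress these numbers. Could you elaborate?

Here is a mathematical explanation of why the current answers to the first part of the problem will never solve your second problem. It has nothing to do with the knapsack problem.

$H(X) = -\sum_{i=0}^{n} p(X_i) \log_2 p(X_i)$

This is Shannon's entropy formula. It tells you the theoretical minimum number of bits needed to represent a token of the sequence {X0, X1, X2, ..., Xn-1, Xn}, where p(Xi) is the probability of seeing token Xi.

Suppose X0 through Xn span 0 to 4294967295 (the range of an integer). From what you have described, each number is as likely to appear as any other. Therefore the probability of each element is 1/4294967296.

When we plug this into Shannon's formula, it tells us the minimum number of bits required to represent the stream.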

import math

def entropy():
    num = 2**32
    probability = 1./num
    return -(num) * probability * math.log(probability, 2)
    # the (num) * probability cancels out

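The entropy is, unsurprisingly, 32. We need 32 bits to represent an integer when every number is equally likely. The only way to reduce this number is to increase the probability of some numbers and decrease the probability of others. You should explain the stream in more detail.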
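To illustrate that last point, here is a minimal sketch of the same calculation for an arbitrary distribution. The function name `entropy_bits` and the two example distributions are made up for illustration; they are not taken from the question.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits per symbol for a discrete distribution."""
    # Terms with p == 0 contribute nothing, so skip them to avoid log(0)
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Uniform over 4 symbols: every symbol costs exactly 2 bits
uniform = [0.25, 0.25, 0.25, 0.25]

# Hypothetical skewed stream: one symbol dominates
skewed = [0.7, 0.1, 0.1, 0.1]

print(entropy_bits(uniform))  # 2.0
print(entropy_bits(skewed))   # about 1.36
```

Only the skewed stream leaves room for compression; a uniform stream is already at its minimum.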

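Answer to the second question

The right way to do this is to use base64 when communicating over HTTP. Apparently Java doesn't have this in its standard library, but I found a link to a free implementation:

http://iharder.sourceforge.net/current/java/base64/

Here is the "pseudocode", which works perfectly in Python and shouldn't be hard to convert to Java (my Java is rusty):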

def longTo64(num):
    mapping = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
    output = ""

    # special case for 0
    if num == 0:
        return mapping[0]

    while num != 0:
        output = mapping[num % 64] + output
        num //= 64  # integer division, so num stays an int

    return output

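If you control both the web server and the web client, and can parse the entire HTTP request without problems, you can upgrade to base85. According to Wikipedia, URL encoding allows for up to 85 characters. Otherwise, you may need to remove some characters from the mapping.

Here is another code example in Python: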

def longTo85(num):
    mapping = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~!*'();:@&=+$,/?%#[]"
    output = ""
    base = len(mapping)

    # special case for 0
    if num == 0:
        return mapping[0]

    while num != 0:
        output = mapping[num % base] + output
        num //= base  # integer division, so num stays an int

    return output

Here is the inverse operation:

def stringToLong(string):
    mapping = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~!*'();:@&=+$,/?%#[]"
    output = 0
    base = len(mapping)

    place = 0
    # check each digit from the lowest place
    for digit in reversed(string):
        # look up the digit's value in the mapping, then multiply by base^place
        output += mapping.find(digit) * (base ** place)
        place += 1

    return output

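Here is a graph of Shannon's formula applied to different bases.

As you can see, the higher the base, the fewer symbols are needed to represent a number. In base64, ~11 symbols are needed to represent a long. In base85, it becomes ~10 symbols.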
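The symbol counts can be checked with a few lines of Python. This is only a sketch; `symbols_needed` is a name invented here, and it assumes an unsigned 64-bit long.

```python
def symbols_needed(bits, base):
    """Smallest number of base-`base` digits that can hold any `bits`-bit value."""
    largest = 2 ** bits - 1
    count = 1
    # Integer arithmetic avoids floating-point rounding near exact powers
    while base ** count <= largest:
        count += 1
    return count

for base in (10, 16, 64, 85):
    print(base, symbols_needed(64, base))
# 10 20
# 16 16
# 64 11
# 85 10
```

Going from base64 to base85 saves a single symbol per long, which is why the simpler base64 is usually good enough.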

Answer 2

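Edit after the final explanation:

I think base64 is the best solution, since there are standard functions that handle it, and variants of this idea don't bring much improvement. Others here have answered this in more detail.

Regarding the original question: although the code works, it is not guaranteed to run in any reasonable time, as LFSR Consulting's answer and the comments on this question point out.

Original answer:

Do you mean something like this?

Edit - corrected after comments.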

shortest_output = {}

for (int R = 0; R <= X; R++) {
    // iteration over possible remainders
    // check if the rest of X can be decomposed into multipliers
    newX = X - R;
    output = {};

    // keep dividing until nothing is left; a leftover <= Y is itself a multiplier
    while (newX > 1) {
        int i;
        for (i = Y; i > 1; i--) {
            if (newX % i == 0) { // found a divider
                output.append(i);
                newX = newX / i;
                break;
            }
        }

        if (i == 1) { // no dividers <= Y
            break;
        }
    }
    if (newX != 1) {
        // couldn't find dividers with no remainder
        output.clear();
    }
    else {
        output.append(R);
        // an empty shortest_output means nothing has been stored yet
        if (shortest_output.length() == 0 || output.length() < shortest_output.length()) {
            shortest_output = output;
        }
    }
}

Answer 3

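It sounds as though you want to compress random data, which is impossible for information-theoretic reasons. (See http://www.faqs.org/faqs/compression-faq/part1/preamble.html question 9.) Use Base64 on the concatenated binary representations of your numbers and be done with it.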
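A minimal sketch of that suggestion in Python, assuming every ID fits in an unsigned 32-bit integer (the function names `pack_ids` and `unpack_ids` are invented for this example):

```python
import base64
import struct

def pack_ids(ids):
    """Concatenate unsigned 32-bit big-endian integers and Base64-encode them."""
    raw = struct.pack(">%dI" % len(ids), *ids)
    # URL-safe alphabet; '=' padding is dropped here and restored when decoding
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def unpack_ids(text):
    """Inverse of pack_ids."""
    padded = text + "=" * (-len(text) % 4)
    raw = base64.urlsafe_b64decode(padded)
    return list(struct.unpack(">%dI" % (len(raw) // 4), raw))

ids = [854183415, 1270335149, 228790978]
encoded = pack_ids(ids)
print(encoded, len(encoded))  # 16 characters for 12 bytes
print(unpack_ids(encoded) == ids)
```

Because the integers have a fixed width, no separator is needed between them.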

Answer 4

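The problem you're trying to solve (you're dealing with a subset of the problem, given your restriction on y) is called Integer Factorization, and it cannot be done efficiently with any known algorithm:

In number theory, integer factorization is the breaking down of a composite number into smaller non-trivial divisors, which when multiplied together equal the original integer.

This problem is what makes a number of cryptographic functions possible (namely RSA, which uses 128-bit keys - the length is half of that.) The wiki page has some good resources that should point you in the right direction with your problem.

So, your brain teaser is indeed a brain teaser... and if you solve it efficiently we can elevate your math skills to above average!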

Answer 5

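Update after the full story

Base64 is most likely your best option. If you want a custom solution, you can try implementing a Base 65+ system. Just remember that because 10000 can be written as "10^4" doesn't mean that everything can be written as 10^n where n is an integer. Different base systems are the simplest way to write numbers, and the higher the base, the fewer digits the number requires. Plus, most framework libraries contain algorithms for Base64 encoding. (What language are you using?)

One way to further pack the URLs is the one you mentioned, but in Base64.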

int[] IDs;
IDs.sort() // So IDs[i] is always smaller or equal to IDs[i-1].

string url = Base64Encode(IDs[0]);

for (int i = 1; i < IDs.length; i++) {
  url += "," + Base64Encode(IDs[i-1] - IDs[i]);
}

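Note that you need some separator, since the initial ID can be arbitrarily large and the difference between two IDs can be more than 63, in which case one Base64 digit is not enough.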
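The same idea as a runnable Python sketch. It assumes a non-empty list of non-negative IDs and writes each number with a 64-symbol URL-safe alphabet; the helper names and the choice of "," as separator are assumptions for illustration.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

def to_base64_digits(num):
    """Write a non-negative integer using the 64-symbol alphabet above."""
    if num == 0:
        return ALPHABET[0]
    digits = ""
    while num:
        num, rem = divmod(num, 64)
        digits = ALPHABET[rem] + digits
    return digits

def from_base64_digits(text):
    """Inverse of to_base64_digits."""
    value = 0
    for symbol in text:
        value = value * 64 + ALPHABET.index(symbol)
    return value

def pack_deltas(ids):
    """Sort descending, then store the first ID followed by the gaps."""
    ordered = sorted(ids, reverse=True)
    parts = [to_base64_digits(ordered[0])]
    for prev, cur in zip(ordered, ordered[1:]):
        parts.append(to_base64_digits(prev - cur))
    return ",".join(parts)  # "," is not in ALPHABET, so it is a safe separator

def unpack_deltas(text):
    """Rebuild the descending ID list from the first ID and the gaps."""
    parts = [from_base64_digits(p) for p in text.split(",")]
    ids = [parts[0]]
    for gap in parts[1:]:
        ids.append(ids[-1] - gap)
    return ids

print(pack_deltas([1000, 1003, 1100]))  # RM,Bh,D
print(unpack_deltas("RM,Bh,D"))         # [1100, 1003, 1000]
```

The saving depends entirely on the IDs being close together; for widely scattered IDs the gaps are as long as the IDs themselves.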

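Update

To restate: the problem is unsolvable. For Y = 64, you can't write 87681 as multipliers + remainder where each is below 64. In other words, you can't write any of the numbers 87617..87681 with multipliers below 64. Each of those numbers has a prime factor above 64. 87616 can be written with prime factors below 64, but then you would need + 65, so the remainder would be over 64.

So, if this was just a brain teaser, it is unsolvable. Is there some practical purpose for this that could be achieved in some way other than using multiplication and a remainder?

And yes, this really should be a comment, but I lost my ability to comment at some point. :P

I believe the solution that comes closest is Yevgeny's. It is also easy to extend Yevgeny's solution to remove the limit on the remainder, in which case it would be able to find a solution where the multipliers are smaller than Y and the remainder is as small as possible, even if it is greater than Y.

Old answer:

If you require every number in the array to be below y, then there is no solution for this. Given a large enough x and a small enough y, you will end up in an impossible situation. As an example, with y of 2 and x of 12, you get 2 * 2 * 2 + 4, since 2 * 2 * 2 * 2 would be 16. Even if you allow negative numbers with abs(n) below y, that wouldn't work, because you would need 2 * 2 * 2 * 2 - 4 in the above example.

I think the problem is NP-Complete, even if you limit the problem to inputs that are known to have an answer where the last term is less than y. It sounds quite a lot like the [knapsack problem][1]. Of course I might be wrong there.

Edit:

Without a more accurate description of the problem it is hard to solve, but one variant could work in the following way: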

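1. Set current = x.
2. Break current into its terms.
3. If one of the terms is greater than y, the current number cannot be described with terms no greater than y. Subtract one from current and repeat from 2.
4. The current number can be expressed with terms no greater than y.
5. Calculate the remainder.
6. Combine as many of the terms as possible.

(Yevgeny Doctor has a more concise (and working) implementation of this, so to prevent confusion I have skipped the implementation.)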
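For readers who want to experiment, the steps can be sketched in Python as follows. This is not the implementation referred to above: it assumes x >= 1 and y >= 2, uses plain trial division (only suitable for small numbers), places no limit on the remainder, and merges terms with a simple first-fit pass that is not guaranteed to give the shortest list.

```python
def prime_factors(n):
    """Prime factorization by trial division (fine for small n only)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def decompose(x, y):
    """Return (terms, remainder): every term <= y, product(terms) + remainder == x."""
    current = x                                   # step 1
    # Steps 2-3: walk down until every prime factor fits under y.
    # current == 1 has no prime factors, so the loop always terminates.
    while any(p > y for p in prime_factors(current)):
        current -= 1
    remainder = x - current                       # step 5
    terms = []                                    # step 6: first-fit merge
    for p in sorted(prime_factors(current), reverse=True):
        for i, t in enumerate(terms):
            if t * p <= y:
                terms[i] = t * p
                break
        else:
            terms.append(p)
    return terms, remainder

print(decompose(100, 10))  # ([10, 10], 0)
print(decompose(23, 5))    # ([5, 4], 3)
```

Note that the remainder can end up larger than y, which is exactly the situation described for 87681 above.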

Answer 6

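The OP wrote:

My initial goal was to come up with a concrete method to pack 1..n large integers (aka longs) together so that their string representation is notably shorter than writing out the actual numbers. Think multiples of ten: 10^6 and 1 000 000 are the same number, but the length of the representation in characters isn't.

I have been down that path before, and as fun as it was to learn all the math, to save you time I'll just point you to: http://en.wikipedia.org/wiki/Kolmogorov_complexity

In short, some strings can be easily compressed by changing the notation: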

10^9 (4 characters) = 1000000000 (10 characters)

Others can't:

7829203478 = some random number...

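This is a gross simplification of the article I linked to above, so I recommend you read it rather than taking my explanation at face value.

Edit: If you are trying to create RESTful URLs for some unique data, why not use a hash, such as MD5? Then include the hash as part of the URL, and look up the data based on the hash. Or am I missing something obvious?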
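A rough sketch of the hash idea in Python. The dictionary `store` is a hypothetical stand-in for whatever server-side table would hold the data, and `put`/`get` are names invented for this example. A hash cannot be reversed, so this only works if the server keeps the original data.

```python
import hashlib

store = {}  # hypothetical in-memory stand-in for a server-side lookup table

def put(ids):
    """Store the ID list and return a fixed-length key to embed in the URL."""
    payload = ",".join(str(i) for i in ids).encode("ascii")
    key = hashlib.md5(payload).hexdigest()
    store[key] = list(ids)
    return key

def get(key):
    """Look the ID list up again from the key found in the URL."""
    return store[key]

key = put([854183415, 1270335149, 228790978])
print(len(key))  # 32 hex characters, regardless of how many IDs there are
print(get(key))
```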

Answer 7

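Update: I didn't get everything at first, so I rewrote the whole thing in a more Java-style fashion. I hadn't thought of the case where a prime number is bigger than the divisor. This is fixed now. I have left the original code in place to show the idea.

Update 2: I now handle the case of a big prime number in another way. This way a result is obtained either way.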

public final class PrimeNumberException extends Exception {

    private final long primeNumber;

    public PrimeNumberException(long x) {
        primeNumber = x;
    }

    public long getPrimeNumber() {
        return primeNumber;
    }
}

public static Long[] decompose(long x, long y) {
    // Declared outside the try block so the catch clause and the code after it can use them
    final ArrayList<Long> operands = new ArrayList<Long>(1000);
    final long rest = x % y;
    // Subtract the remainder so that what is left is divisible by y
    final long newX = x - rest;
    try {
        // Go into recursion, actually it's a tail recursion
        recDivide(newX, y, operands);
    } catch (PrimeNumberException e) {
        // return new Long[0];
        // or do whatever you like, for example
        operands.add(e.getPrimeNumber());
    }
    // Add the remainder to the array
    operands.add(rest);
    return operands.toArray(new Long[operands.size()]);
}

// The recursive method
private static void recDivide(long x, long y, ArrayList<Long> operands)
    throws PrimeNumberException {
    while ((x > y) && (y != 1)) {
        if (x % y == 0) {
            final long rest = x / y;
            // Since y is a divisor add it to the list of operands
            operands.add(y);
            if (rest <= y) {
                // the rest is smaller than y, we're finished
                operands.add(rest);
            }
            // go in recursion
            x = rest;
        } else {
            // if the value x isn't divisible by y decrement y so you'll find a 
            // divisor eventually
            if (--y == 1) {
                throw new PrimeNumberException(x);
            }
        }
    }
}

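Original: here is some recursive code I came up with. I would have preferred to write it in some functional language, but it was required in Java. I didn't bother to convert the numbers to integers, but that shouldn't be that hard (yes, I'm lazy ;)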

public static Long[] decompose(long x, long y) {
    final ArrayList<Long> operands = new ArrayList<Long>();
    final long rest = x % y;
    // Subtract the remainder so that what is left is divisible by y
    final long newX = x - rest;
    // Go into recursion, actually it's a tail recursion
    recDivide(newX, y, operands);
    // Add the remainder to the array
    operands.add(rest);
    return operands.toArray(new Long[operands.size()]);
}

// The recursive method
private static void recDivide(long newX, long y, ArrayList<Long> operands) {
    long x = newX;
    if (x % y == 0) {
        final long rest = x / y;
        // Since y is a divisor add it to the list of operands
        operands.add(y);
        if (rest <= y) {
            // the rest is smaller than y, we're finished
            operands.add(rest);
        } else {
            // the rest can still be divided, go one level deeper in recursion
            recDivide(rest, y, operands);
        }
    } else {
        // if the value x isn't divisible by y decrement y so you'll find a divisor    
        // eventually
        recDivide(x, y-1, operands);
    }
}

Answer 8

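Your original method of choosing (a * b + c * d + e) makes it very hard to find optimal solutions, simply because the search space is so large. You could factorize the number, but it's the "+ e" that complicates things, since you need to factorize not just that number but quite a few of the numbers immediately below it.

Two methods for compression spring immediately to mind, both of which give you a much-better-than-10% space saving over the numeric representation.

A 64-bit number ranges from (unsigned):

                         0 to
18,446,744,073,709,551,615

or (signed):

-9,223,372,036,854,775,808 to
 9,223,372,036,854,775,807

In both cases, you need to reduce the 20 characters taken (without the commas) to something smaller.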

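The first is to simply BCD-ify the number and then base64 encode it (actually a slightly modified base64, since "/" would not be kosher in a URL - you should use one of the acceptable characters such as "_").

Turning it into BCD stores two digits (or a sign and a digit) in one byte, giving you an immediate 50% reduction in space (10 bytes). Encoding it as base64 (which turns every 3 bytes into 4 base64 characters) turns the first 9 bytes into 12 characters and the tenth byte into 2 characters, for a total of 14 characters - a 30% saving.

The only better method is to just base64 encode the binary representation. This is better because BCD has a small amount of wastage (each digit only needs about 3.32 bits to store [log2(10)], but BCD uses 4).

Working on the binary representation, we only need to base64 encode the 64-bit number (8 bytes). That needs 8 characters for the first 6 bytes and 3 characters for the final 2 bytes. That's 11 characters of base64, a saving of 45%.

If you want maximum compression, there are 73 characters available for URL encoding:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789$-_.+!*'(),

so technically you could probably encode base-73 which, from rough calculations, would still take up 11 characters, but with more complex code, which isn't worth it in my opinion.

Of course, that's the maximum compression, because it is based on the maximum values. At the other end of the scale (1 digit), this encoding actually results in more data (expansion rather than compression). The improvements only start for numbers over 999, where 4 digits can be turned into 3 base64 characters.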
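The 14-character and 11-character figures can be reproduced with Python's standard library. This is only a sketch under a few assumptions: the value is unsigned, the BCD form is zero-padded to 20 digits, the URL-safe base64 alphabet is used, and the trailing "=" padding is dropped. The function names are invented for this example.

```python
import base64
import struct

def b64_len_binary(value):
    """Base64 length of an unsigned 64-bit value packed as 8 raw bytes."""
    raw = struct.pack(">Q", value)
    return len(base64.urlsafe_b64encode(raw).rstrip(b"="))

def b64_len_bcd(value):
    """Base64 length of the value stored as packed BCD (two digits per byte)."""
    digits = "%020d" % value  # 20 decimal digits, zero-padded
    # Reading each digit pair as hex puts one decimal digit in each nibble
    raw = bytes(int(digits[i:i + 2], 16) for i in range(0, 20, 2))
    return len(base64.urlsafe_b64encode(raw).rstrip(b"="))

largest = 2 ** 64 - 1
print(b64_len_bcd(largest))     # 14
print(b64_len_binary(largest))  # 11
```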

Answer 9

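Are you married to using Java? Python has an entire package dedicated to this exact purpose. It will even sanitize the encoding for you so that it is URL-safe.

Native Python solution

The standard module I'm recommending is base64, which converts arbitrary strings of chars into a sanitized base64 format. You can use it in conjunction with the pickle module, which handles the conversion from lists of longs (actually of arbitrary size) to a compact string representation.

The following code should work on any vanilla installation of Python: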

import base64
import pickle

# get some long list of numbers
a = (854183415,1270335149,228790978,1610119503,1785730631,2084495271,
    1180819741,1200564070,1594464081,1312769708,491733762,243961400,
    655643948,1950847733,492757139,1373886707,336679529,591953597,
    2007045617,1653638786)

# this gets you the url-safe string
str64 = base64.urlsafe_b64encode(pickle.dumps(a,-1))
print str64
>>> gAIoSvfN6TJKrca3S0rCEqMNSk95-F9KRxZwakqn3z58Sh3hYUZKZiePR0pRlwlfSqxGP05KAkNPHUo4jooOSixVFCdK9ZJHdEqT4F4dSvPY41FKaVIRFEq9fkgjSvEVoXdKgoaQYnRxAC4=

# this unwinds it
a64 = pickle.loads(base64.urlsafe_b64decode(str64))
print a64
>>> (854183415, 1270335149, 228790978, 1610119503, 1785730631, 2084495271, 1180819741, 1200564070, 1594464081, 1312769708, 491733762, 243961400, 655643948, 1950847733, 492757139, 1373886707, 336679529, 591953597, 2007045617, 1653638786)

Hope that helps. Using Python is probably the closest you will get to a one-line solution.

Answer 10

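Original algorithm request: Is there a limit on the size of the last number (other than that it must be stored in a 32-bit int)? (The original request is all I'm able to tackle.)

The one that produces the shortest list is: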

bool negative=(n<1)?true:false;
int j=n%y;
if(n==0 || n==1)
{
list.append(n);
return;
}
while((long64)(n-j*y)>MAX_INT && y>1) //R has to be stored in int32
{
y--;
j=n%y;
}
if(y<=1)
fail //Number has no suitable candidate factors. This shouldn't happen

int i=0;
for(;i<j;i++)
{
list.append(y);
}
list.append(n-y*j);
if(negative)
list[0]*=-1;
return;

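A little simplistic compared to most of the answers given so far, but it achieves the functionality the original post asked for... It's a little on the dirty side, but hopefully useful :)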

Answer 11

Isn't this just modulus?

Let / be integer division (whole numbers) and % be modulo.

int result[3];

result[0] = y;
result[1] = x / y;
result[2] = x % y;

Answer 12

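As in my comment above, I'm not sure I understand the question exactly. But assuming integers (n and a given y), this should work for the cases you have stated: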

multipliers[0] = n / y;
multipliers[1] = y;
addedNumber = n % y;
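In Python the same quotient-and-remainder split is a single `divmod` call. A small sketch follows; the function name `as_multipliers` is invented here. Nothing bounds the quotient by y, so for large n the first multiplier can be far bigger than y.

```python
def as_multipliers(n, y):
    """Write n as (n // y) * y + (n % y)."""
    quotient, remainder = divmod(n, y)
    return [quotient, y], remainder

multipliers, added = as_multipliers(1000, 64)
print(multipliers, added)  # [15, 64] 40
print(multipliers[0] * multipliers[1] + added)  # 1000
```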

Answer 13

Set x := x / n, where n is the largest number that is less than both x and y. When you end up with x <= y, that is the last number in the sequence.

Representing an integer as a series of multipliers

13 answers:
