Creating a UUID from a string with no dashes

Date: 2013-09-24 16:08:54

Tags: java string clojure uuid

How do I create a java.util.UUID from a string with no dashes?

"5231b533ba17478798a3f2df37de2aD7" => #uuid "5231b533-ba17-4787-98a3-f2df37de2aD7"

11 answers:

Answer 0 (score: 41)

tl;dr

java.util.UUID.fromString(
    "5231b533ba17478798a3f2df37de2aD7"
    .replaceFirst( 
        "(\\p{XDigit}{8})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}+)", "$1-$2-$3-$4-$5" 
    )
).toString()
  

5231b533-ba17-4787-98a3-f2df37de2ad7

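Bits, not text

A UUID is a 128-bit value. A UUID is not actually made up of letters and digits; it is made up of bits. You can think of it as describing a very, very large number.

We could display those bits as one hundred and twenty-eight 0 and 1 characters.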

  

0111 0100 1101 0010 0101 0001 0101 0110   0110 0000 1110 0110 0100 0100 0100 1100   1010 0001 0111 0111 1010 1001 0110 1110   0110 0111 1110 1100 1111 1100 0101 1111

Humans do not read bits easily, so for convenience we usually represent the 128-bit value as a hexadecimal string made up of letters and digits.

  

74d25156-60e6-444c-a177-a96e67ecfc5f

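Such a hex string is not the UUID itself, only a human-friendly representation. The hyphens are added per the UUID spec as the canonical formatting, but they are optional.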

  

74d2515660e6444ca177a96e67ecfc5f
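
To make the "bits, not text" point concrete, here is a minimal Java sketch that prints the example UUID three ways: as 128 binary digits, as canonical hex, and as hex without hyphens. The class name UuidBits and the helper toBitString are made up for this illustration; only java.util.UUID and Long.toBinaryString come from the JDK.

```java
import java.util.UUID;

class UuidBits {
    // Renders the 128 bits of a UUID as a string of '0' and '1' characters.
    static String toBitString(UUID uuid) {
        return pad64(uuid.getMostSignificantBits()) + pad64(uuid.getLeastSignificantBits());
    }

    // Long.toBinaryString drops leading zeros, so left-pad to 64 digits.
    private static String pad64(long value) {
        String bits = Long.toBinaryString(value);
        StringBuilder sb = new StringBuilder(64);
        for (int i = bits.length(); i < 64; i++) {
            sb.append('0');
        }
        return sb.append(bits).toString();
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("74d25156-60e6-444c-a177-a96e67ecfc5f");
        System.out.println(toBitString(uuid));                 // 128 binary digits
        System.out.println(uuid);                              // canonical hex with hyphens
        System.out.println(uuid.toString().replace("-", ""));  // same hex, hyphens removed
    }
}
```

All three lines describe the same 128-bit value; only the textual rendering differs.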

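By the way, the UUID spec clearly states that lowercase letters must be used when generating the hex string, while uppercase should be tolerated as input. Unfortunately, many implementations violate that lowercase-generation rule, including those from Apple, Microsoft, and others. See my blog post.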


The following refers to Java, not Clojure.

In Java 7 (and earlier), you can use the java.util.UUID class to instantiate a UUID from a hex string with hyphens as input. For example:

java.util.UUID uuidFromHyphens = java.util.UUID.fromString("6f34f25e-0b0d-4426-8ece-a8b3f27f4b63");
System.out.println( "UUID from string with hyphens: " + uuidFromHyphens );

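However, the UUID class fails when given a hex string without hyphens. This failure is unfortunate, as the UUID spec does not require the hyphens in a hex string representation. This fails: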

java.util.UUID uuidFromNoHyphens = java.util.UUID.fromString("6f34f25e0b0d44268ecea8b3f27f4b63");

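Regex

One workaround is to format the hex string by adding the canonical hyphens. Here is my attempt at using a regex to format the hex string. Beware: this code works, but I am no regex expert. You should make this code more robust, for example by checking that the string is 32 characters long before formatting and 36 after.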

// -----|  With Hyphens  |----------------------
java.util.UUID uuidFromHyphens = java.util.UUID.fromString( "6f34f25e-0b0d-4426-8ece-a8b3f27f4b63" );
System.out.println( "UUID from string with hyphens: " + uuidFromHyphens );
System.out.println();

// -----|  Without Hyphens  |----------------------
String hexStringWithoutHyphens = "6f34f25e0b0d44268ecea8b3f27f4b63";
// Use regex to format the hex string by inserting hyphens in the canonical format: 8-4-4-4-12
String hexStringWithInsertedHyphens =  hexStringWithoutHyphens.replaceFirst( "([0-9a-fA-F]{8})([0-9a-fA-F]{4})([0-9a-fA-F]{4})([0-9a-fA-F]{4})([0-9a-fA-F]+)", "$1-$2-$3-$4-$5" );
System.out.println( "hexStringWithInsertedHyphens: " + hexStringWithInsertedHyphens );
java.util.UUID myUuid = java.util.UUID.fromString( hexStringWithInsertedHyphens );
System.out.println( "myUuid: " + myUuid );
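
The robustness checks suggested above (32 characters before formatting, 36 after) could look roughly like the following. This is only a sketch: the class name HyphenlessUuid and the method parse are invented for illustration, and it tightens the last group to exactly 12 hex digits, which the snippet above does not do.

```java
import java.util.UUID;
import java.util.regex.Pattern;

class HyphenlessUuid {
    // Exactly 32 hex digits, grouped 8-4-4-4-12. Compiled once for reuse.
    private static final Pattern HEX_32 = Pattern.compile(
            "(\\p{XDigit}{8})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{12})");

    static UUID parse(String hex) {
        // Check before formatting: the input must be 32 characters long.
        if (hex == null || hex.length() != 32) {
            throw new IllegalArgumentException("Expected 32 hex digits without hyphens");
        }
        String withHyphens = HEX_32.matcher(hex).replaceFirst("$1-$2-$3-$4-$5");
        // Check after formatting: if the pattern did not match (non-hex input),
        // the string is returned unchanged and is still 32 characters long.
        if (withHyphens.length() != 36) {
            throw new IllegalArgumentException("Input contains non-hex characters: " + hex);
        }
        return UUID.fromString(withHyphens);
    }

    public static void main(String[] args) {
        System.out.println(parse("6f34f25e0b0d44268ecea8b3f27f4b63"));
    }
}
```

Throwing IllegalArgumentException mirrors what UUID.fromString itself does for malformed input, but that choice is an assumption of this sketch, not something the original code prescribes.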

Posix notation

You might find this alternative syntax more readable. It uses Posix notation within the regex, where \\p{XDigit} takes the place of [0-9a-fA-F] (see the Pattern doc):

String hexStringWithInsertedHyphens =  hexStringWithoutHyphens.replaceFirst( "(\\p{XDigit}{8})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}+)", "$1-$2-$3-$4-$5" );

Complete example.

java.util.UUID uuid =
        java.util.UUID.fromString (
                "5231b533ba17478798a3f2df37de2aD7"
                        .replaceFirst (
                                "(\\p{XDigit}{8})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}+)",
                                "$1-$2-$3-$4-$5"
                        )
        );

System.out.println ( "uuid.toString(): " + uuid );
  

uuid.toString(): 5231b533-ba17-4787-98a3-f2df37de2ad7

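Answer 1 (score: 17)

Clojure's #uuid tagged literal is a pass-through to java.util.UUID/fromString. And fromString splits the string on "-" and converts it into two Long values. (The UUID format is standardized as 8-4-4-4-12 hex digits, but the "-" are really only there for validation and visual identification.)

The straightforward solution is to reinsert the "-" and use java.util.UUID/fromString: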

(defn uuid-from-string [data]
  (java.util.UUID/fromString
   (clojure.string/replace data
                           #"(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})"
                           "$1-$2-$3-$4-$5")))

If you want something without regexes, you can use a ByteBuffer and DatatypeConverter:

(defn uuid-from-string [data]
  (let [buffer (java.nio.ByteBuffer/wrap 
                 (javax.xml.bind.DatatypeConverter/parseHexBinary data))]
    (java.util.UUID. (.getLong buffer) (.getLong buffer))))
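
For Java readers, the same ByteBuffer idea can be sketched without DatatypeConverter, which matters on JDK 11 and later, where javax.xml.bind is no longer bundled with the JDK. The class UuidFromHexBytes and method fromHex below are hypothetical names; the hex decoding is hand-rolled with Character.digit rather than taken from any library.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidFromHexBytes {
    static UUID fromHex(String hex) {
        if (hex == null || hex.length() != 32) {
            throw new IllegalArgumentException("Expected 32 hex digits without hyphens");
        }
        // Decode each pair of hex digits into one byte (16 bytes in total).
        byte[] bytes = new byte[16];
        for (int i = 0; i < 16; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            // Character.digit returns -1 for characters that are not digits in radix 16.
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Non-hex character near index " + (2 * i));
            }
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        // ByteBuffer is big-endian by default: the first long holds the
        // most significant 64 bits, the second the least significant.
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return new UUID(buffer.getLong(), buffer.getLong());
    }

    public static void main(String[] args) {
        System.out.println(fromHex("5231b533ba17478798a3f2df37de2aD7"));
    }
}
```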

Answer 2 (score: 10)

You could do a goofy regular expression replacement:

String digits = "5231b533ba17478798a3f2df37de2aD7";                         
String uuid = digits.replaceAll(                                            
    "(\\w{8})(\\w{4})(\\w{4})(\\w{4})(\\w{12})",                            
    "$1-$2-$3-$4-$5");                                                      
System.out.println(uuid); // => 5231b533-ba17-4787-98a3-f2df37de2aD7

Answer 3 (score: 6)

The regexp solution is probably faster, but you can also look at this :)

String withoutDashes = "44e128a5-ac7a-4c9a-be4c-224b6bf81b20".replaceAll("-", "");      
BigInteger bi1 = new BigInteger(withoutDashes.substring(0, 16), 16);                
BigInteger bi2 = new BigInteger(withoutDashes.substring(16, 32), 16);
UUID uuid = new UUID(bi1.longValue(), bi2.longValue());
String withDashes = uuid.toString();

By the way, converting from 16 binary bytes to a UUID:

  InputStream is = ...; // binary input
  byte[] bytes = IOUtils.toByteArray(is);
  ByteBuffer bb = ByteBuffer.wrap(bytes);
  UUID uuidWithDashesObj = new UUID(bb.getLong(), bb.getLong());
  String uuidWithDashes = uuidWithDashesObj.toString();
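
As a self-contained variant of the byte conversion above, the following sketch goes both ways between a UUID and its 16-byte form using only the JDK, with no IOUtils. The names UuidBytes, toBytes and fromBytes are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidBytes {
    // Writes the UUID as 16 bytes, most significant long first (big-endian).
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Reads 16 bytes back into a UUID, in the same order as toBytes.
    static UUID fromBytes(byte[] bytes) {
        if (bytes == null || bytes.length != 16) {
            throw new IllegalArgumentException("A UUID needs exactly 16 bytes");
        }
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        return new UUID(bb.getLong(), bb.getLong());
    }

    public static void main(String[] args) {
        UUID original = UUID.fromString("44e128a5-ac7a-4c9a-be4c-224b6bf81b20");
        UUID roundTripped = fromBytes(toBytes(original));
        System.out.println(original.equals(roundTripped));
    }
}
```

The byte order used here is simply the one implied by the two getLong calls in the snippet above; other systems may lay out UUID bytes differently.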

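Answer 4 (score: 6)

A much faster (~900%) solution, compared to using regexes and string manipulation, is to parse the hex string into 2 longs and create the UUID instance from those: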

(defn uuid-from-string
  "Converts a 32digit hex string into java.util.UUID"
  [hex]
  (java.util.UUID.
    (Long/parseUnsignedLong (subs hex 0 16) 16)
    (Long/parseUnsignedLong (subs hex 16) 16)))
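
A Java rendering of the same two-longs idea might look like the sketch below. It assumes Java 8 or later, since that is when Long.parseUnsignedLong was added; the class name UuidFromLongs, the method fromHex and the length check are additions for illustration and are not part of the Clojure version.

```java
import java.util.UUID;

class UuidFromLongs {
    static UUID fromHex(String hex) {
        if (hex == null || hex.length() != 32) {
            throw new IllegalArgumentException("Expected 32 hex digits without hyphens");
        }
        // Each half is an unsigned 64-bit value, so the signed Long.parseLong
        // would overflow whenever the leading hex digit is 8 or higher.
        long mostSigBits = Long.parseUnsignedLong(hex.substring(0, 16), 16);
        long leastSigBits = Long.parseUnsignedLong(hex.substring(16), 16);
        return new UUID(mostSigBits, leastSigBits);
    }

    public static void main(String[] args) {
        System.out.println(fromHex("5231b533ba17478798a3f2df37de2aD7"));
    }
}
```

Non-hex input surfaces as a NumberFormatException from parseUnsignedLong, so callers who want a single exception type would need to wrap it.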

Answer 5 (score: 5)

public static String addUUIDDashes(String idNoDashes) {
    StringBuffer idBuff = new StringBuffer(idNoDashes);
    idBuff.insert(20, '-');
    idBuff.insert(16, '-');
    idBuff.insert(12, '-');
    idBuff.insert(8, '-');
    return idBuff.toString();
}

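Maybe someone else can comment on the computational efficiency of this approach. (It was not a concern for my application.)

Answer 6 (score: 5)

An optimized version of @maerics's answer: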

    String[] digitsList= {
            "daa70a7ffa904841bf9a81a67bdfdb45",
            "529737c950e6428f80c0bac104668b54",
            "5673c26e2e8f4c129906c74ec634b807",
            "dd5a5ee3a3c44e4fb53d2e947eceeda5",
            "faacc25d264d4e9498ade7a994dc612e",
            "9a1d322dc70349c996dc1d5b76b44a0a",
            "5fcfa683af5148a99c1bd900f57ea69c",
            "fd9eae8272394dfd8fd42d2bc2933579",
            "4b14d571dd4a4c9690796da318fc0c3a",
            "d0c88286f24147f4a5d38e6198ee2d18"
    };

    //Use compiled pattern to improve performance of bulk operations
    Pattern pattern = Pattern.compile("(\\w{8})(\\w{4})(\\w{4})(\\w{4})(\\w{12})");

    for (int i = 0; i < digitsList.length; i++)
    {
        String uuid = pattern.matcher(digitsList[i]).replaceAll("$1-$2-$3-$4-$5");
        System.out.println(uuid);
    }

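Answer 7 (score: 2)

Another solution, similar to Pawel's but without creating new Strings, and solving only the problem asked in the question. If performance is a concern, avoid regex/split/replaceAll and UUID.fromString like the plague.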

String hyphenlessUuid = in.nextString();
BigInteger bigInteger = new BigInteger(hyphenlessUuid, 16);
UUID uuid = new UUID(bigInteger.shiftRight(64).longValue(), bigInteger.longValue());

Answer 8 (score: 1)

Here is an example that is faster because it does not use regexp.

import java.util.UUID;

public class Example1 {
    /**
     * Get a UUID with hyphens from 32 char hexadecimal.
     * 
     * @param string a hexadecimal string
     * @return a UUID string
     */
    public static String toUuidString(String string) {

        if (string == null || string.length() != 32) {
            throw new IllegalArgumentException("invalid input string!");
        }

        char[] input = string.toCharArray();
        char[] output = new char[36];

        System.arraycopy(input, 0, output, 0, 8);
        System.arraycopy(input, 8, output, 9, 4);
        System.arraycopy(input, 12, output, 14, 4);
        System.arraycopy(input, 16, output, 19, 4);
        System.arraycopy(input, 20, output, 24, 12);

        output[8] = '-';
        output[13] = '-';
        output[18] = '-';
        output[23] = '-';

        return new String(output);
    }

    public static void main(String[] args) {
        String example = "daa70a7ffa904841bf9a81a67bdfdb45";
        String canonical = toUuidString(example);
        UUID uuid = UUID.fromString(canonical);
    }
}

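Answer 9 (score: 0)

I believe the following is the fastest in terms of performance. It is even slightly faster than the Long.parseUnsignedLong version. The code is slightly altered from java-uuid-generator.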

public static UUID from32(String id) {
    if (id == null) {
        throw new NullPointerException();
    }
    if (id.length() != 32) {
        throw new NumberFormatException("UUID has to be 32 char with no hyphens");
    }

    long lo, hi;
    lo = hi = 0;

    for (int i = 0, j = 0; i < 32; ++j) {
        int curr;
        char c = id.charAt(i);

        if (c >= '0' && c <= '9') {
            curr = (c - '0');
        }
        else if (c >= 'a' && c <= 'f') {
            curr = (c - 'a' + 10);
        }
        else if (c >= 'A' && c <= 'F') {
            curr = (c - 'A' + 10);
        }
        else {
            throw new NumberFormatException(
                    "Non-hex character at #" + i + ": '" + c + "' (value 0x" + Integer.toHexString(c) + ")");
        }
        curr = (curr << 4);

        c = id.charAt(++i);

        if (c >= '0' && c <= '9') {
            curr |= (c - '0');
        }
        else if (c >= 'a' && c <= 'f') {
            curr |= (c - 'a' + 10);
        }
        else if (c >= 'A' && c <= 'F') {
            curr |= (c - 'A' + 10);
        }
        else {
            throw new NumberFormatException(
                    "Non-hex character at #" + i + ": '" + c + "' (value 0x" + Integer.toHexString(c) + ")");
        }
        if (j < 8) {
            hi = (hi << 8) | curr;
        }
        else {
            lo = (lo << 8) | curr;
        }
        ++i;
    }
    return new UUID(hi, lo);
}

Answer 10 (score: -1)

Maybe this:

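String digits = "5231b533ba17478798a3f2df37de2aD7";
// Relies on Java 8+, where split("") yields exactly one element per character.
String uuid = String.format("%s%s%s%s%s%s%s%s-%s%s%s%s-%s%s%s%s-%s%s%s%s-%s%s%s%s%s%s%s%s%s%s%s%s", (Object[]) digits.split(""));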