Question

I receive a response from the server as a string, for example:

V1YYZZ0x0000010x0D0x00112050x0C152031962061900x0D410240x0E152031962061900x0F410240x1021TATADOCOMOINTERNET101

I then convert it to a byte array, because I need to read the values byte by byte.

I tried using

Arrays.copyOfRange(original,
                        from , to);

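but it works on indices rather than on bytes.

I also tried the following solution, but it too truncates the String based on length (if I use a String instead of a byte[]).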

public static String truncateWhenUTF8(String s, int maxBytes) {
    int b = 0;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);

        // ranges from http://en.wikipedia.org/wiki/UTF-8
        int skip = 0;
        int more;
        if (c <= 0x007f) {
            more = 1;
        } else if (c <= 0x07FF) {
            more = 2;
        } else if (c <= 0xd7ff) {
            more = 3;
        } else if (c <= 0xDFFF) {
            // surrogate area, consume next char as well
            more = 4;
            skip = 1;
        } else {
            more = 3;
        }

        if (b + more > maxBytes) {
            return s.substring(0, i);
        }
        b += more;
        i += skip;
    }
    return s;
}

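I know how to compute the length of a string in bytes, but that only gives the byte length of the whole string.

Below is how I need to extract the packet on a byte basis.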

[Image: byte-level layout of the packet; not reproduced here]

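The code and parameters above are only examples. I need to read from the string/byte array byte by byte.

I have searched a lot but have not found any solution or link I can refer to. I know the byte length and value of each parameter, but I cannot work out how to split the string by byte length.

Please give me any reference or hint.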

Answer 1

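Determining what counts as one byte in the string is not trivial. Your String contains bytes written as hexadecimal text, such as 0x0D (one byte, equal to 13), but it also contains values as substrings. For example, 1024 can be interpreted as an integer, in which case it fits in 2 bytes, but it can also be interpreted as text made up of 4 characters, which comes to 8 bytes when stored as Java's 16-bit chars.

Either way, I would split the string with a regular expression and then split each part further into a length and a value: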
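To make the ambiguity concrete, here is a minimal sketch that measures the same text under different interpretations. The class and method names (`ByteWidths`, `hexByte`, `encodedLength`) are made up for illustration, and which interpretation the server actually intends is not known from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteWidths {
    // Parses hex text such as "0x0D" into the value of the single byte it denotes.
    static int hexByte(String token) {
        return Integer.parseInt(token.substring(2), 16);
    }

    // Number of bytes the text occupies once encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        System.out.println(hexByte("0x0D"));                                  // 13
        // The four characters "0x0D" are themselves 4 bytes of ASCII text.
        System.out.println(encodedLength("0x0D", StandardCharsets.US_ASCII)); // 4
        // "1024" as text: 4 bytes in ASCII, 8 bytes as 16-bit chars.
        System.out.println(encodedLength("1024", StandardCharsets.US_ASCII)); // 4
        System.out.println(encodedLength("1024", StandardCharsets.UTF_16BE)); // 8
        // "1024" as a number fits in a 2-byte short.
        short asNumber = Short.parseShort("1024");
        System.out.println(asNumber + " fits in " + Short.BYTES + " bytes");
    }
}
```

So the "byte length" of a field depends entirely on whether it is read as hex text, as ASCII text, or as a binary number.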

String message = "V1YYZZ0x0000010x0D0x00112050x0C152031962061900x0D41024"+
    "0x0E152031962061900x0F410240x1021TATADOCOMOINTERNET101";
String regex = "(0)(x)(\\w\\w)";
String[] parts = message.split(regex);
Log.d(TAG,"HEADER = "+parts[0]);
for (int i=1; i<parts.length; i++) {
    String s = parts[i];
    // Only process if it has length > 0
    if (s.length()>0) {
        String len, val;
        // String s is now in format LVVVV where L is the length, V is the value.
        // The longest case is checked first so that every branch is reachable.
        if (s.length() > 101) {
            // 3 characters indicate length (up to 999), the rest is the value
            len = s.substring(0, 3);
            val = s.substring(3);
        } else if (s.length() > 10) {
            // 2 characters indicate length (up to 99), the rest is the value
            len = s.substring(0, 2);
            val = s.substring(2);
        } else {
            // 1 character indicates length (up to 9), the rest is the value
            len = s.substring(0, 1);
            val = s.substring(1);
        }
        Log.d(TAG, "Length: " + len + " Value: " + val);
    }
}

This produces the following output:

D/Activity: HEADER = V1YYZZ
D/Activity: Length: 0 Value: 001
D/Activity: Length: 1 Value: 1205
D/Activity: Length: 15 Value: 203196206190
D/Activity: Length: 4 Value: 1024
D/Activity: Length: 15 Value: 203196206190
D/Activity: Length: 4 Value: 1024
D/Activity: Length: 21 Value: TATADOCOMOINTERNET101

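You can then go through the packets (the first two belong to the header and are not needed) and convert each string to whatever you need, for example Integer.parseInt(val).

If you explain the structure of the header (V1YYZZ0x0000010x0D0x0011205), I can improve my answer to find the message count.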
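One limitation of split() is that it throws away the delimiters, so the 0x.. byte in front of each value is lost. A rough sketch that keeps each tag together with its payload, using Pattern and Matcher, could look like this. The class name `TaggedFields` is hypothetical, and the pattern assumes the tag is always "0x" plus exactly two hex digits and that no payload itself contains such a sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TaggedFields {
    // "0x" + two hex digits, then (lazily) everything up to the next tag or end of input.
    private static final Pattern FIELD =
            Pattern.compile("0x([0-9A-Fa-f]{2})(.*?)(?=0x[0-9A-Fa-f]{2}|$)");

    // Returns {tag, payload} pairs in the order they appear in the message.
    static List<String[]> parse(String message) {
        List<String[]> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(message);
        while (m.find()) {
            fields.add(new String[] { m.group(1), m.group(2) });
        }
        return fields;
    }

    public static void main(String[] args) {
        String message = "V1YYZZ0x0000010x0D0x00112050x0C152031962061900x0D41024"
                + "0x0E152031962061900x0F410240x1021TATADOCOMOINTERNET101";
        for (String[] field : parse(message)) {
            int tag = Integer.parseInt(field[0], 16); // numeric value of the tag byte
            System.out.println("tag=" + tag + " payload=" + field[1]);
        }
    }
}
```

Each payload still starts with its length digits, so the length/value split shown earlier applies to it unchanged.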

Answer 2

I think you could use a Scanner:

import java.util.Scanner;

public class Library {

public static void main(String[] args) {
  String s = "V1YYZZ0x0000010x0D0x001120"
      + "50x0C152031962061900x0D410240x0E152031962061900x0F410240x1"
      + "021TATADOCOMOINTERNET101";

  // Skip the header. I'm not sure how its 9(?) bytes are defined,
  // so I just assumed it is 26 chars long.
  s = s.substring(26);
  System.out.println(s);
  Scanner scanner = new Scanner(s);
  // Use the byte markers (e.g. 0xDC, 0x00) as the delimiter.
  // A stricter pattern would be 0x[\\da-fA-F]{2}.
  // If you need to know which byte it was, use just 0x as the
  // delimiter and take the first 2 chars of each token afterwards.
  scanner.useDelimiter("0x\\w{2}");
  // The first token is the number of parameters
  int numberOfParams = scanner.nextInt();
  for (int i = 0; i < numberOfParams; i++) {
      String extracted = scanner.next();
      // Estimate the length of the value from the token length
      int l = extracted.length();
      boolean c = getLength(l) == getLength(l - getLength(l));
      l -= getLength(l);
      // If the digit counts differ, the length prefix is one digit
      // shorter than assumed, so the value is one char longer
      l = c ? l : l+1;

      System.out.println("length=" 
              + extracted.substring(0, extracted.length()-l));
      System.out.println("message=" 
              + extracted.substring(extracted.length()-l, extracted.length()));
  }
  // close the scanner
  scanner.close();
}
// Counting digits assuming number is decimal
private static int getLength(int l) {
    int length = (int) (Math.log10(l) + 1);
    System.out.println("counted length = " + length);
    return length;
}
}

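We definitely need more information about the rules by which the string is formed and about what exactly you need to do. This code may be good enough. Without the comments it is really short.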

Answer 3

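This is not an answer about accessing a byte array byte by byte, but an answer to the situation you find yourself in.

Your explanation and description suggest confusion about what you are actually getting from the server (for example, it is hard to see "V1YYZZ0x0000010x0D0x001120" as a 9-byte field (note that it may end at the 2 rather than the 0)). Either you are using the wrong method to get it from the server, or you are not getting it as the intended data type.

Your code suggests you believe you are getting a UTF-8 string. The data shown in your question does not appear to be intended to be in that format.

When doing something like this, keep in mind that another programmer had to create the structure of the data you are seeing. They had to define it somewhere, with the intent that the expected recipient could decode it. Unless there are other considerations (security, minimal bandwidth, etc.), such formats are usually defined in a way that is easy to encode and decode.

The presence of multiple "0x" ASCII-encoded hexadecimal numbers, particularly the single byte that denotes the parameter (called "varam" in the figure), strongly suggests that this data is meant to be interpreted as an ASCII-encoded string. That may not be the case, but it should be kept in mind when looking at the problem from a broader perspective.

You are having to spend far too much effort decoding what you get from the server. It should probably be relatively easy, unless someone had a reason to make it deliberately hard.

All of this suggests that the real problem lies in an area you have given us no information about.

Take a step back: think about questions like these. How are you receiving this from the server (which function/interface)? In the call that requests the information from the server, is there a way to specify the encoding type as bytes, an ASCII string, or some other format that is easier to handle than UTF-8? At the very least, it seems clear that the data is not intended to be handled as a UTF-8 string. There should be a way to get it without the conversion to UTF-8.

Also, you should try to find the actual specification of the data format. You have not explained much about the source, so you may be reverse engineering this without access to a specification.

Basically, this looks like a problem where it is best to step back and ask whether you are starting from the point that makes it easiest to solve, and whether you are heading in the right direction.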
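As a sketch of what "getting it without the conversion to UTF-8" might look like: if the client exposes an InputStream (an assumption, since the question does not say how the response is received), the raw bytes can be read directly and, where text is needed, decoded with a charset that maps one byte to one char. The class `RawResponse` and the sample bytes are hypothetical stand-ins.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class RawResponse {
    // Drains the stream into a byte array without any charset conversion.
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // ISO-8859-1 maps every byte value to exactly one char,
    // so a char index in the result equals the byte offset in the input.
    static String asOneCharPerByte(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Stand-in for the real connection's stream (made-up sample bytes).
        byte[] sample = {'V', '1', 'Y', 'Y', 'Z', 'Z', 0x00, 0x00, 0x01, 0x0D};
        byte[] raw = readAll(new ByteArrayInputStream(sample));
        String text = asOneCharPerByte(raw);
        System.out.println(raw.length + " bytes, " + text.length() + " chars"); // 10 bytes, 10 chars
        System.out.println(text.substring(0, 6)); // V1YYZZ
        System.out.println(raw[9]);               // 13
    }
}
```

With the bytes in hand, indexing the array is already "byte by byte", and no UTF-8 length arithmetic is needed.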

Answer 4

I'm sure I'm missing something obvious...

String.getBytes();

If you want a proper object for working with the array, just wrap it using

ByteBuffer.wrap();

So you end up with:

String s = "OUTPUT FROM SERVER";
byte[] bytes = s.getBytes();
ByteBuffer bb = ByteBuffer.wrap(bytes);

What am I missing from the original question? :/
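For completeness, a small sketch of how the wrapped buffer can then be consumed, one byte or one fixed-length field at a time. The 6-byte header width is only an assumption based on the "V1YYZZ" prefix in the sample, and `BufferWalk` / `readAscii` are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BufferWalk {
    // Reads the next `length` bytes from the buffer and decodes them as ASCII text.
    static String readAscii(ByteBuffer bb, int length) {
        byte[] field = new byte[length];
        bb.get(field); // advances the buffer position by `length`
        return new String(field, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String s = "V1YYZZ0x0000010x0D";
        // An explicit charset avoids depending on the platform default.
        ByteBuffer bb = ByteBuffer.wrap(s.getBytes(StandardCharsets.US_ASCII));

        String header = readAscii(bb, 6); // assumed 6-byte header
        byte next = bb.get();             // a single byte: the character '0'
        System.out.println(header + " next=" + next + " remaining=" + bb.remaining());
    }
}
```

Because get() advances the position, repeated calls walk through the response in order without any manual index bookkeeping.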

How to get data byte by byte from a byte array
