Reading in and converting raw binary data in R

Time: 2017-08-29 12:06:47

Tags: r binary hex hexdump

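I have a binary file containing numeric values encoded as signed or unsigned integers of varying lengths (mostly 2 or 4 bytes). To process the data, I read the required part of the file into a raw vector with readBin() and then try to convert it to decimal. The problem is that R's built-in functions have limitations that I don't fully understand (for example, there is no unsigned long int); see the examples below.

How can I read unsigned ints of custom length from raw data? Is there a more suitable and elegant approach than the one shown below?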

library(dplyr)  # used here only for the %>% pipe

###############################################################################
# create an example raw vector of 24 bytes
set.seed(1)
raw <- sample(0:0xff, 24, replace = TRUE) %>% as.raw %>% print


###############################################################################
# approach with readBin() - not working
# read 2-byte unsigned integers left-to-right, not an issue
readBin(raw, size = 2, n = length(raw) / 2, integer(), endian = 'big', signed = FALSE)

# read 4-byte signed integers left-to-right, it's ok
readBin(raw, size = 4, n = length(raw) / 4, integer(), endian = 'big', signed = TRUE)

# first issue: readBin() can't read 4-byte unsigned integers
#   (signed = FALSE is only honoured for sizes 1 and 2)
readBin(raw, size = 4, n = length(raw) / 4, integer(), endian = 'big', signed = FALSE)

# second issue: readBin() can't read custom-size integers (size = 3 is rejected)
readBin(raw[1:3], size = 3, n = 1, integer(), endian = 'big')
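
One possible workaround for the first issue is to read the values as signed 4-byte integers and then shift the negative ones by 2^32, storing the result as doubles. This is only a sketch: `read_uint32` is a hypothetical helper name, not a base R function, and it assumes the input length is a multiple of 4.

```r
# Hypothetical helper (not part of base R): decode unsigned 32-bit integers
# from a raw vector and return them as doubles.
read_uint32 <- function(bytes, endian = "big") {
  stopifnot(is.raw(bytes), length(bytes) %% 4 == 0)
  # readBin() only offers signed 4-byte integers
  x <- readBin(bytes, what = integer(), size = 4,
               n = length(bytes) / 4, endian = endian)
  # the bit pattern 0x80000000 is R's NA_integer_, so handle it explicitly
  is_min <- is.na(x)
  x <- as.numeric(x)
  x[is_min] <- -2^31
  # negative values are unsigned values that wrapped around at 2^32
  neg <- x < 0
  x[neg] <- x[neg] + 2^32
  x
}

read_uint32(as.raw(c(0xff, 0xff, 0xff, 0xff)))
```

Doubles represent every integer up to 2^53 exactly, so no precision is lost for 32-bit values.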

###############################################################################
# approach with rawToBits() and packBits() - does not work either
# packBits() also treats an integer as signed
raw[1:4] %>% rawToBits %>% packBits('integer')
# and expects a length that is a multiple of 32 bits, so 2 bytes fail
raw[1:2] %>% rawToBits %>% packBits('integer')
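
The 32-bit length requirement of packBits() can be met by padding short inputs with zero bits. The sketch below assumes a single big-endian value of 1 to 3 bytes, so the sign bit is never set; `pack_uint` is a hypothetical name introduced only for illustration.

```r
# Hypothetical helper: decode ONE big-endian unsigned integer of 1 to 3 bytes.
pack_uint <- function(bytes) {
  stopifnot(is.raw(bytes), length(bytes) >= 1, length(bytes) <= 3)
  # reverse the bytes so that the bit vector runs least significant bit first
  bits <- rawToBits(rev(bytes))
  # zero bits for the unused high positions, up to 32 bits in total
  padding <- raw(32 - length(bits))
  packBits(c(bits, padding), type = "integer")
}

pack_uint(as.raw(c(0x01, 0x02)))
```

Because the top byte is always zero, the result fits into R's signed integer type without wrapping.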

###############################################################################
# manual approach - working
# note that this requires reversing the order of the raw vector,
#   because rawToBits() returns the least significant bit of each byte first
# this approach correctly converts the 32-bit unsigned int to decimal
#   but would be difficult to vectorize for multiple ints
#   (I guess the summing must be done in loops)
raw[4:1] %>% rawToBits %>% as.logical %>% which %>% {2^(. - 1)} %>% sum
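
The manual approach can be vectorized without explicit loops by working on whole bytes rather than bits: reshape the raw vector into a matrix with one column per integer and weight each byte by a power of 256. The following is a sketch under stated assumptions, not a definitive implementation: `raw_to_uint` is a hypothetical name, the input length must be a multiple of `size`, and `size` is capped at 6 bytes so that every result stays exactly representable as a double (below 2^53).

```r
# Hypothetical helper: decode consecutive unsigned integers of `size` bytes each.
raw_to_uint <- function(bytes, size, endian = c("big", "little")) {
  endian <- match.arg(endian)
  stopifnot(is.raw(bytes), size >= 1, size <= 6, length(bytes) %% size == 0)
  # one column per integer, one row per byte position
  m <- matrix(as.integer(bytes), nrow = size)
  # weight of each byte position: 256^(size - 1), ..., 256^0 for big-endian
  w <- 256^(if (endian == "big") (size - 1):0 else 0:(size - 1))
  # w is recycled down each column, so every byte gets its positional weight
  colSums(m * w)
}

raw_to_uint(as.raw(c(0x01, 0x02, 0x03)), size = 3)
```

The same call covers the 2-, 3- and 4-byte cases from the examples above, for instance `raw_to_uint(raw, size = 4)` for the 24-byte vector.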

0 answers:

No answers yet