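Inserting into a SQL Server VARBINARY column from an R script

Time: 2016-09-01 11:30:13

Tags: sql sql-server r tsql binary-data

I have a table, plots, with a column plot that stores the binary data of an image file. I am running a T-SQL query that calls an R script and gets back a data frame holding the data to insert. The data frame looks like this: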

    plot   name  date_from    date_to
1 ABCDEF  plot1 2016-08-25 2016-08-31
2 AAAAAA  plot2 2016-08-25 2016-08-31

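As you can see, the plot column already contains the raw data.

To clarify, what I want to do is insert two rows into the database using the data from the data frame (the data frame column names match the database columns).

The problem I am having with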

INSERT INTO dbo.plots
EXECUTE sp_execute_external_script
    @language = N'R'
    ,@script = N'source("path/to/r/script.R")'
    ,@output_data_1_name = N'output_dataset'

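is "Implicit conversion from data type nvarchar(max) to varbinary(max) is not allowed. Use the CONVERT function to run this query."

However, I am not sure how to fix this error. Where do I put the CONVERT function? Or is there another way?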

2 answers:

Answer 0: (score: 2)

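In SQL Server R Services, the character type maps to VARCHAR and the raw type maps to VARBINARY (see Working with R Data Types). To store the data as VARBINARY, the hex string has to be converted to raw bytes, which can be done either in R or in SQL. One option is to do the conversion in SQL by way of a temporary table (inspired by scsimon's comment).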
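
A minimal sketch of that temp-table approach, not necessarily the answerer's exact code. It assumes the R script returns plot as a hex string without a 0x prefix and with an even number of digits, as in the question's data frame. The staging table name #plots_staging and the column types chosen for name, date_from and date_to are assumptions, since the question does not show the definition of dbo.plots.

```sql
-- Staging table: plot arrives from R as text, so it is kept as text here.
-- Table name and the types of name/date_from/date_to are assumptions.
CREATE TABLE #plots_staging (
    plot      NVARCHAR(MAX),
    name      NVARCHAR(100),
    date_from DATE,
    date_to   DATE
);

-- Same call as in the question, but targeting the staging table.
INSERT INTO #plots_staging (plot, name, date_from, date_to)
EXECUTE sp_execute_external_script
    @language = N'R'
    ,@script = N'source("path/to/r/script.R")'
    ,@output_data_1_name = N'output_dataset';

-- Style 2 makes CONVERT parse the text as hex digits with no 0x prefix,
-- so 'ABCDEF' becomes the three bytes 0xABCDEF.
INSERT INTO dbo.plots (plot, name, date_from, date_to)
SELECT CONVERT(VARBINARY(MAX), plot, 2), name, date_from, date_to
FROM #plots_staging;

DROP TABLE #plots_staging;
```

The explicit CONVERT goes in the SELECT that feeds the final INSERT, because INSERT ... EXECUTE gives no place to transform individual columns. If the strings carry a 0x prefix, style 1 would be the one to use instead of style 2.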

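Answer 1: (score: 0)

Unfortunately, my version of SQL Server doesn't do all the cool R things you are doing. So the best I can offer is an R script that successfully imports binary data into a table, in the hope that you can make the necessary adjustments.

I am using a table on SQL Server defined as
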
CREATE TABLE [dbo].[InsertFile](
    [OID] [int] IDENTITY(1,1) NOT NULL,
    [filename] [varchar](50) NULL,
    [filedata] [varbinary](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

My R script is

library(RODBCext)
library(magrittr)

# My example just grabs all the text files out of a directory,
# but as long as you have the full filename, this will work.

file_name <- list.files([directory_to_files],
                        pattern = "[.]txt$",
                        full.names = TRUE)

file_content <- 
  vapply(
    file_name,
    function(x)
    {
      # read the binary data from the file
      readBin(x,
              what = "raw",
              n = file.info(x)[["size"]]) %>%
        # convert the binary data to a character string suitable for import
        as.character() %>%
        paste(collapse = "")
    },
    character(1)
  )

channel <- odbcConnect(...) # Create your connection object here

sqlExecute(
  channel = channel,
  query = paste0("INSERT INTO dbo.InsertFile ",
                 "(filename, filedata) ",
                 "VALUES ",
                 "(?, ?)"),
  data = list(filename = basename(file_name),
              filedata = file_content)
)

After executing the script, my dbo.InsertFile table had a new row for each file in file_name.
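
One way to check the result, offered as a small sketch rather than part of the original answer: DATALENGTH returns the number of bytes stored in a value, so each row's count can be compared with the file size that file.info() reports in R. The alias bytes_stored is introduced here for illustration only.

```sql
-- One row per inserted file; bytes_stored should equal the file's size on disk.
SELECT OID,
       filename,
       DATALENGTH(filedata) AS bytes_stored
FROM dbo.InsertFile
ORDER BY OID;
```

A count that is twice the file size would suggest the hex text itself was stored instead of the decoded bytes.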