Question

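I am receiving PDF content that is Base64-encoded. I am trying to decode it in NiFi with the Base64EncodeContent processor, and I send the decoded file out by email. Below is a small sample of the mail output.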

"No data available. ¹ Check if sent. . All documents are sent in pdf format to * 9:'³:> <âam¬'²@%é‚ÇŽÇ¢|ÀÈ™$ÉØ²§Uû ÷LÒTB¨l,îåù$$6N¬JCäŒÃ°‰_Ïg-æ¿;ž‰ìÛÖYl`õ?èÓÌ[ÿÿPK"

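How can I extract the PDF data sent by the third party?

I also tried decoding it with Java code, and that failed as well. The resulting PDF cannot be opened, and it contains garbage characters too.

The ConvertedJPGPDF.pdf file used below contains the Base64-encoded string.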

    String filePath = "C:\\Users\\xyz\\Desktop\\";
    String originalFileName = "ConvertedJPGPDF.pdf";
    String newFileName = "test.pdf";

    // Read the Base64 text from the input file.
    byte[] input_file = Files.readAllBytes(Paths.get(filePath + originalFileName));

    // byte[] decodedBytes = Base64.getDecoder().decode(input_file);
    byte[] decodedBytes1 = Base64.getMimeDecoder().decode(input_file);

    // Write the decoded bytes; try-with-resources closes the stream.
    try (FileOutputStream fos = new FileOutputStream(filePath + newFileName)) {
        fos.write(decodedBytes1);
    }

Answer 1

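You mentioned that the file already contains the Base64-encoded string:

"The ConvertedJPGPDF.pdf file used below contains the Base64-encoded string."

So you do not need to run the following line:

    byte[] encodedBytes = Base64.getEncoder().encode(input_file);

By doing that, you are encoding those bytes a second time.

Decode the input_file array directly, then save the resulting byte array to a .pdf file.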
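To illustrate why the extra encoding step matters, here is a minimal sketch. The class name DoubleEncodingDemo, the helper method names, and the sample bytes are made up for this illustration and are not part of the original code. It shows that encoding already-encoded text and then decoding once only gives back the Base64 text, not the original bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class DoubleEncodingDemo {

    // Decodes Base64 text exactly once; the MIME decoder tolerates line breaks.
    static byte[] decodeOnce(byte[] base64Text) {
        return Base64.getMimeDecoder().decode(base64Text);
    }

    // Reproduces the mistake: encodes the already-encoded text again, then decodes once.
    static byte[] encodeThenDecode(byte[] base64Text) {
        byte[] encodedAgain = Base64.getEncoder().encode(base64Text);
        return Base64.getMimeDecoder().decode(encodedAgain);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real PDF bytes.
        byte[] original = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);
        byte[] base64Text = Base64.getEncoder().encode(original);

        // Decoding once recovers the original bytes.
        System.out.println(Arrays.equals(decodeOnce(base64Text), original));
        // Encoding again before decoding does not recover them...
        System.out.println(Arrays.equals(encodeThenDecode(base64Text), original));
        // ...it only returns the Base64 text that was there to begin with.
        System.out.println(Arrays.equals(encodeThenDecode(base64Text), base64Text));
    }
}
```

If the output file still looks like Base64 text after decoding, an extra encoding step somewhere in the pipeline is a likely cause.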

Update:

ConvertedJPGPDF.pdf does not have to be named .pdf. Since its content is Base64-encoded, it is really a plain-text file.

In any case, the following code worked for me:

    String filePath = "C:\\Users\\xyz\\Desktop\\";
    String originalFileName = "ConvertedJPGPDF.pdf";
    String newFileName = "test.pdf";

    byte[] input_file = Files.readAllBytes(Paths.get(filePath+originalFileName));

    byte[] decodedBytes1 = Base64.getMimeDecoder().decode(input_file);

    Files.write(Paths.get(filePath+newFileName), decodedBytes1);
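If the decoded file still will not open, a quick sanity check can help narrow down the cause. The sketch below is an illustrative addition, not part of the code above: the class PdfHeaderCheck and its method names are hypothetical. It relies on the fact that a well-formed PDF normally begins with the ASCII marker "%PDF-", so decoded bytes that start with anything else suggest the input was not a Base64-encoded PDF, or was altered in transit.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PdfHeaderCheck {

    // ASCII marker expected at the start of a well-formed PDF file.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true when the data starts with the "%PDF-" marker.
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Decodes Base64 text with the MIME decoder, which ignores line breaks.
    static byte[] decode(byte[] base64Text) {
        return Base64.getMimeDecoder().decode(base64Text);
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the contents of the input file.
        byte[] sample = Base64.getEncoder()
                .encode("%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII));

        byte[] decoded = decode(sample);
        System.out.println("Decoded data looks like a PDF: " + looksLikePdf(decoded));
        // The still-encoded text does not carry the marker.
        System.out.println("Encoded text looks like a PDF: " + looksLikePdf(sample));
    }
}
```

In the real program, looksLikePdf(decodedBytes1) could be called just before writing the file, with a warning logged when it returns false.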

Hope this helps!
