How to combine multiple PDFs?

Date: 2017-12-31 15:58:39

Tags: python python-3.x pdf

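I want to create a script that reads all the PDF files in a directory, copies the second page of each file and writes it to a single output PDF (containing all the second pages).
I have written some code, but it gives me a PDF with blank pages. This is really strange, because I have another script that takes the second page of each PDF and creates a separate new PDF for each of those pages, and that script works. I think my problem may be related to addPage(). I'm using the PyPDF2 library to work with the PDF files.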

import pathlib
from PyPDF2 import PdfFileReader, PdfFileWriter

files_list = [file for file in pathlib.Path(__file__).parent.iterdir()
              if file.is_file() and not str(file).endswith(".py")]
total = len(files_list)
writer = PdfFileWriter()
for file in files_list:
    with open(file, 'rb') as infile:
        reader = PdfFileReader(infile)
        reader.decrypt("")
        writer.addPage(reader.getPage(1))
with open('Output.pdf', 'wb') as outfile:
    writer.write(outfile)
print('Done.')

2 answers:

Answer 0 (score: 0)

Take a look at PdfFileMerger.append: it lets you merge pages from several PDFs into a single result file.

append(fileobj, bookmark=None, pages=None, import_bookmarks=True)

Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.

Parameters:
fileobj                  A File Object or an object that supports the standard read
                         and seek methods similar to a File Object. Could also be a
                         string representing a path to a PDF file.
bookmark (str)           Optionally, you may specify a bookmark to be applied at the
                         beginning of the included file by supplying the text of
                         the bookmark.
pages                    can be a Page Range or a (start, stop[, step]) tuple to merge
                         only the specified range of pages from the source document into
                         the output document.
import_bookmarks (bool)  You may prevent the source document’s bookmarks
                         from being imported by specifying this as False.
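The `pages` tuple follows the same rules as Python's `range()`: a 0-based start, an exclusive stop and an optional step (as far as I can tell, PyPDF2 1.x literally iterates `range(*pages)` when merging). A quick stdlib check of which page indices a tuple selects, using a hypothetical helper `selected_pages`:

```python
def selected_pages(pages):
    """Hypothetical helper: 0-based page indices picked by a (start, stop[, step]) tuple."""
    # Same rules as range(): start is inclusive, stop is exclusive, step defaults to 1
    return list(range(*pages))


print(selected_pages((1, 2)))     # [1]: only the second page
print(selected_pages((0, 6, 2)))  # [0, 2, 4]: first, third and fifth pages
```

So to take just the second page of a document, the tuple is `(1, 2)`, not a single index.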

This seems better suited to what you are trying to do with PdfFileWriter.

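from PyPDF2 import PdfFileMerger

# ...

merger = PdfFileMerger()

# Passing a path lets the merger open the file and keep it open until write().
# pages=(1, 2) means 0-based start 1, exclusive stop 2: only the second page.
merger.append(filename1, pages=(1, 2))
merger.append(filename2, pages=(1, 2))

merger.write("document-output.pdf")
merger.close()  # release the files the merger opened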

Example adapted from this answer: https://stackoverflow.com/a/29871560/7505395

Answer 1 (score: 0)

Have you tried the code below? It comes from https://www.randomhacks.co.uk/how-to-split-a-pdf-every-2-pages-using-python/

import glob
import os

from PyPDF2 import PdfFileReader, PdfFileWriter

pdfs = glob.glob("*.pdf")

for pdf in pdfs:
    # Keep the input open until every chunk has been written,
    # since PyPDF2 may read page data from it during write()
    with open(pdf, "rb") as infile:
        inputpdf = PdfFileReader(infile)
        base, _ = os.path.splitext(pdf)

        # Round up so that an odd final page still gets its own file
        for i in range((inputpdf.numPages + 1) // 2):

            output = PdfFileWriter()
            output.addPage(inputpdf.getPage(i * 2))

            if i * 2 + 1 < inputpdf.numPages:
                output.addPage(inputpdf.getPage(i * 2 + 1))

            newname = base + "-" + str(i) + ".pdf"

            with open(newname, "wb") as outputStream:
                output.write(outputStream)
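With a chunk count of `numPages // 2`, the count rounds down, so an odd final page is dropped and the `if i * 2 + 1 < numPages` check can never fail. The pairing arithmetic can be checked without any PDFs; a stdlib sketch of the rounded-up version, using a hypothetical helper `page_pairs` that returns the 0-based page indices each output file would receive:

```python
def page_pairs(num_pages):
    """Hypothetical helper: 0-based page indices for each two-page output file."""
    chunks = []
    # Round up so an odd last page still forms its own one-page chunk
    for i in range((num_pages + 1) // 2):
        chunk = [i * 2]
        if i * 2 + 1 < num_pages:  # only false for the odd last page
            chunk.append(i * 2 + 1)
        chunks.append(chunk)
    return chunks


print(page_pairs(5))  # [[0, 1], [2, 3], [4]]
```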