Flask / Postgres - Displaying a PDF with PDF.js

Date: 2017-09-17 14:19:08

Tags: python postgresql pdf encoding pdfjs

I have a very simple application. Users upload a PDF file to a Postgres database through a web front end. That PDF should then be rendered in the browser with PDF.js.

I'm fairly sure my problem is an encoding issue, but I don't think I understand encodings well enough to solve it myself.

My model:

class Lesson(Base):
    __tablename__ = 'lessons'

    # Name of the lesson
    lesson_order = db.Column(db.Enum(LessonIndexes), nullable=False)
    name = db.Column(db.String(128), nullable=False)
    summary = db.Column(db.String(500))
    lesson_plan_id = db.Column(db.Integer(), ForeignKey('lesson_plans.id'), nullable=False)
    pdf = db.Column(db.LargeBinary())

My controller:

@mod_lp.route('/<lesson_plan_id>/create_lesson', methods=["POST"])
def create_lesson(lesson_plan_id):
    form = LessonForm()

    if form.validate_on_submit():
        # Read the upload only once the form has validated
        file = request.files['pdf']  # type: FileStorage
        lesson = Lesson(form.lesson_order.data, form.name.data, form.summary.data, lesson_plan_id,
                        pdf=file.read() # this line here
                        )
        db.session.add(lesson)
        db.session.commit()
    return redirect(url_for('lesson_plan.show', lesson_plan_id=lesson_plan_id))

This stores the data as:

%PDF-1.4
%����
1 0 obj
<</Creator (Mozilla/5.0 \(Macintosh; Intel Mac OS X 10_12_6\) AppleWebKit/537.36 \(KHTML, like Gecko\) Chrome/60.0.3112.113 Safari/537.36)
/Producer (Skia/PDF m60)
/CreationDate (D:20170916222407+00'00')
/ModDate (D:20170916222407+00'00')>>
endobj
2 0 obj
<</Filter /FlateDecode
/Length 1370>> stream
x���ݎ�4��<�������   qq$8�@%`aB�H�_�����T�E���ړ�c'�t�Z��[������}�{�I���@���

(etc...)

My JavaScript (taken from the PDF.js hello world example):

var pdfString = "{{ pdf_data}}";
var pdfData = atob(pdfString);
if (pdfData) {
    var loadingTask = PDFJS.getDocument({data: pdfData});
    loadingTask.promise.then(function (pdf) {
        console.log('PDF loaded');

        // Fetch the first page
        var pageNumber = 1;
        pdf.getPage(pageNumber).then(function (page) {
            console.log('Page loaded');

            var scale = 1.5;
            var viewport = page.getViewport(scale);

            // Prepare canvas using PDF page dimensions
            var canvas = document.getElementById('pdf-canvas');
            var context = canvas.getContext('2d');
            canvas.height = viewport.height;
            canvas.width = viewport.width;

            // Render PDF page into canvas context
            var renderContext = {
                canvasContext: context,
                viewport: viewport
            };
            var renderTask = page.render(renderContext);
            renderTask.then(function () {
                console.log('Page rendered');
            });
        });
    }, function (reason) {
        // PDF loading error
        console.error(reason);
    });
}

My current error is:

6:108 Uncaught DOMException: Failed to execute 'atob' on 'Window': The string to be decoded is not correctly encoded.

Things I have tried:

file.stream.getvalue()

file.stream.getvalue().decode("latin-1") # for whatever reason, this was the only 'decode' that didn't throw an error

file.stream.getvalue().decode("latin-1").encode()

base64.b64encode(file.stream.getvalue().decode("latin-1").encode())

But these all failed in various ways.

UPDATE:

If I send the binary data from the database to my template:

pdf_data = lesson.pdf

and skip calling atob on it:

var pdfData = pdfString;
if (pdfData) {
...

I get this error:

Error: Invalid XRef stream header
pdf.worker.js:340     at error (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:340:17)
    at XRef_readXRef [as readXRef] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:20943:13)
    at XRef_parse [as parse] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:20613:28)
    at PDFDocument_setup [as setup] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:26445:17)
    at PDFDocument_parse [as parse] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:26336:12)
    at http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:36120:28
    at Promise (<anonymous>)
    at LocalPdfManager_ensure [as ensure] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:36115:14)
    at LocalPdfManager.BasePdfManager_ensureDoc [as ensureDoc] (http://0.0.0.0:8080/static/js/pdfjs/build/pdf.worker.js:36067:19)

1 answer:

Answer 0 (score: 1)

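atob expects a base64-encoded string. I got a basic example working that at least calls atob successfully, and I'm pretty sure this is the problem you're seeing. You could probably just store the base64-encoded content in that Postgres table, so that you don't have to encode it on every request. 'source.pdf' is just a sample PDF I have on disk; you can swap in the data from your Postgres table instead.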

flask_app.py

from flask import Flask, request, render_template
import base64

app = Flask(__name__)


@app.route("/testing", methods=["GET"])
def get_test_file():
    with open("source.pdf", "rb") as data_file:
        data = data_file.read()
    encoded_data = base64.b64encode(data).decode('utf-8')
    return render_template("test.html", encoded_data=encoded_data)
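To use the data from the table instead of a file on disk, the encoding step can be pulled into a small helper. This is only a sketch: `pdf_to_base64` is a made-up name, and the sample bytes stand in for whatever `lesson.pdf` returns. The `bytes(...)` call is there on the assumption that the database driver may hand back a `memoryview` rather than `bytes` for a binary column.

```python
import base64


def pdf_to_base64(raw):
    """Return base64 text for a PDF held as bytes, bytearray, or memoryview."""
    # Normalise to bytes first, in case the driver returned a memoryview.
    data = bytes(raw)
    # b64encode returns bytes; its output is pure ASCII, so decoding is safe.
    return base64.b64encode(data).decode("ascii")


# Hypothetical stand-in for the value read from the database column.
sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
encoded = pdf_to_base64(sample)

# Decoding the text gives back exactly the bytes that were stored.
assert base64.b64decode(encoded) == sample
```

In the route above, this would mean passing something like `pdf_to_base64(lesson.pdf)` to `render_template` in place of `encoded_data`.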

test.html

<html>
<head>
</head>
<body>
  <script>
    var encoded_data = '{{ encoded_data }}';
    var pdf_data = atob(encoded_data);
  </script>
</body>
</html>
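As for why the attempts in the question went wrong, here is a small sketch using made-up sample bytes, assuming Python 3. Decoding with latin-1 never raises because every byte maps to exactly one character, but a bare `.encode()` then writes UTF-8, which changes every byte at or above 0x80. Separately, turning a bytes object into text gives its `b'...'` representation rather than the data, which is likely what reached PDF.js when the raw column value was put straight into the template.

```python
# Hypothetical sample: the first bytes of a PDF, including non-ASCII ones.
raw = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"

# latin-1 maps every byte to one character, which is why it never raises...
text = raw.decode("latin-1")
assert len(text) == len(raw)

# ...but str.encode() defaults to UTF-8, where each byte >= 0x80 becomes
# two bytes, so the result is no longer the original file.
reencoded = text.encode()
assert reencoded != raw
assert len(reencoded) == len(raw) + 4

# Encoding back with latin-1 is the lossless inverse.
assert text.encode("latin-1") == raw

# In Python 3, str() of a bytes object is its repr, not the data itself.
assert str(b"%PDF").startswith("b'")
```

Base64 avoids both problems because its output is plain ASCII text that survives templating unchanged.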