How do I convert a simple HTML file to PDF with wkhtmltopdf?

Time: 2013-03-28 22:48:48

Tags: html pdf wkhtmltopdf html-to-pdf

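Here is what I did:

  1. Created a Linux virtual machine in the Amazon cloud.
  2. Downloaded and compiled the wkhtmltopdf-qt and wkhtmltopdf sources, following the instructions at https://code.google.com/p/wkhtmltopdf/wiki/compilation. I ended up with a static build of wkhtmltopdf.
  3. Took this html (http://jsfiddle.net/mark69_fnd/8CtjB/):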

    <html>
      <head>
        <style type="text/css">p{font-family: sans-serif;};</style>
      </head>
      <body>
        <p>Let's Test</p>
      </body>
    </html>

  4. wkhtmltopdf test.html test.pdf

  5. Copied test.pdf to my Windows desktop, opened it, and got this: https://docs.google.com/file/d/0B2pbsdBJxJI3MV8zby14cGk5VWs/edit?usp=sharing
  6. I followed the guide strictly; the qt configure options were taken from ../wkhtmltopdf/static_qt_conf_base and ../wkhtmltopdf/static_qt_conf_linux, as the guide instructs.

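    Needless to say, I am somewhat disappointed with the result. Can anyone explain what I did wrong?

    P.S.

    I actually need to convert much more complex HTML, but there is no point discussing that while I cannot even convert a simple file.

    EDIT

    I want to emphasize that I do not normally work on Linux; I only spun up an Amazon-hosted Linux box. That means I have no X11 environment.

    Here is what I get when I try the prepackaged wkhtmltopdf: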

    ubuntu@ip-10-245-78-162:~$ which wkhtmltopdf
    ubuntu@ip-10-245-78-162:~$ /usr/bin/wkhtmltopdf
    -bash: /usr/bin/wkhtmltopdf: No such file or directory
    ubuntu@ip-10-245-78-162:~$ sudo apt-get install wkhtmltopdf
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following NEW packages will be installed:
      wkhtmltopdf
    0 upgraded, 1 newly installed, 0 to remove and 120 not upgraded.
    Need to get 0 B/104 kB of archives.
    After this operation, 303 kB of additional disk space will be used.
    Selecting previously unselected package wkhtmltopdf.
    (Reading database ... 36679 files and directories currently installed.)
    Unpacking wkhtmltopdf (from .../wkhtmltopdf_0.9.9-3_amd64.deb) ...
    Processing triggers for man-db ...
    Setting up wkhtmltopdf (0.9.9-3) ...
    ubuntu@ip-10-245-78-162:~$ l test.*
    -rw-r--r-- 1 ubuntu ubuntu 123 Mar 30 12:46 test.html
    ubuntu@ip-10-245-78-162:~$ cat test.html
    <html> <head> <style type="text/css">p{font-family: sans-serif;};</style> </head> <body> <p>Let's Test</p> </body> </html>
    ubuntu@ip-10-245-78-162:~$ /usr/bin/wkhtmltopdf test.html test.pdf
    wkhtmltopdf: cannot connect to X server
    ubuntu@ip-10-245-78-162:~$
    

    EDIT2

    1. Downloaded ftp://rpmfind.net/linux/fedora/linux/development/rawhide/x86_64/os/Packages/u/urw-fonts-2.4-14.fc19.noarch.rpm
    2. Followed the instructions at http://www.howtogeek.com/howto/ubuntu/install-an-rpm-package-on-ubuntu-linux/ to convert the rpm to deb format.
    3. Installed the deb.
    4. Generated the pdf, but still saw nothing but squares.
    5. Here is the transcript:

      ubuntu@ip-10-245-78-162:~$ sudo alien urw-fonts-2.4-14.fc19.noarch.rpm --scripts
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      warning: urw-fonts-2.4-14.fc19.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID fb4b18e6: NOKEY
      urw-fonts_2.4-15_all.deb generated
      ubuntu@ip-10-245-78-162:~$ sudo dpkg -i urw-fonts_2.4-15_all.deb
      Selecting previously unselected package urw-fonts.
      (Reading database ... 38529 files and directories currently installed.)
      Unpacking urw-fonts (from urw-fonts_2.4-15_all.deb) ...
      Setting up urw-fonts (2.4-15) ...
      Processing triggers for fontconfig ...
      ubuntu@ip-10-245-78-162:~$  ./wkhtmltopdf/bin/wkhtmltopdf test.html test.pdf
      Loading pages (1/6)
      Counting pages (2/6)
      Resolving links (4/6)
      Loading headers and footers (5/6)
      Printing pages (6/6)
      Done
      ubuntu@ip-10-245-78-162:~$
      

      EDIT3

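      I installed the xvfb-run package, and I can now run the default version (/usr/bin/wkhtmltopdf) through it. It does convert the simple test.html to pdf, but it fails on a complex html page containing Javascript code. It seems that /usr/bin/wkhtmltopdf cannot run any Javascript on the page being converted.

      I am still puzzled as to why the compiled version does not work.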
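
      The xvfb-run workaround from EDIT3 could be scripted roughly as follows. This is only a sketch: the helper names `build_command` and `convert` are made up for illustration, and it assumes that an empty `DISPLAY` variable means no X server is reachable, which matches the headless box described above but is not a general rule.

```python
import os
import shutil
import subprocess


def build_command(src, dst, display=None, wkhtmltopdf="wkhtmltopdf"):
    """Return the argv list for converting src to dst.

    Without an X display the command is prefixed with xvfb-run,
    which is what EDIT3 does by hand.
    """
    cmd = [wkhtmltopdf, src, dst]
    if not display:
        cmd = ["xvfb-run"] + cmd
    return cmd


def convert(src, dst):
    """Run the conversion, raising if the tool is missing or exits non-zero."""
    cmd = build_command(src, dst, display=os.environ.get("DISPLAY"))
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not on PATH")
    subprocess.run(cmd, check=True)
```

      Calling `convert("test.html", "test.pdf")` on the headless instance would then run the same command line as in the transcripts, with xvfb-run in front.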

      EDIT4

      I was unfair to the default wkhtmltopdf version. It does understand Javascript in the page; it successfully converted the following html:

      <html>
        <head>
          <style type="text/css">
            body {
              font-family: sans-serif;
            }
          </style>
        </head>
        <body id='body'>
          <script>
            document.getElementById('body').innerHTML = 'Hello world!';
          </script>
        </body>
      </html>
      

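      I will try to understand why it fails on the real page, but I do not know how to go about it other than stripping down the original page until I get a minimal failing one.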

      EDIT5

      OK, here is a minimal example that does not work with the default wkhtmltopdf version:

      <!DOCTYPE html>
      <html>
        <head>
          <style type="text/css">
              html, body {
                      height: 100%;
                      overflow: hidden;
              }
          </style>
        </head>
        <body>
          Hello World!
        </body>
      </html>
      

      The resulting pdf is empty. Here is the transcript:

      ubuntu@ip-10-245-78-162:~$ cat test2.html
      <!DOCTYPE html>
      <html>
        <head>
          <style type="text/css">
              html, body {
                      height: 100%;
                      overflow: hidden;
              }
          </style>
        </head>
        <body>
          Hello World!
        </body>
      </html>
      ubuntu@ip-10-245-78-162:~$ xvfb-run /usr/bin/wkhtmltopdf test2.html test2.pdf ; l test2.pdf
      Loading page (1/2)
      Printing pages (2/2)
      Done
      -rw-r--r-- 1 ubuntu ubuntu 1266 Mar 31 11:16 test2.pdf
      ubuntu@ip-10-245-78-162:~$ cat test2.html |sed 6d | xvfb-run /usr/bin/wkhtmltopdf - test2.pdf ; l test2.pdf
      Loading page (1/2)
      Printing pages (2/2)
      Done
      -rw-r--r-- 1 ubuntu ubuntu 4284 Mar 31 11:16 test2.pdf
      ubuntu@ip-10-245-78-162:~$
      

      Note how deleting line 6 (height: 100%;) changes the size of the resulting pdf file.
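
      The `sed 6d` experiment above generalizes to dropping each line in turn, which is one way to hunt for the line that triggers the blank output. A minimal sketch; `drop_line` and `single_line_variants` are hypothetical helper names, and feeding each variant to wkhtmltopdf and comparing output sizes is left to the caller.

```python
TEST2_HTML = """\
<!DOCTYPE html>
<html>
  <head>
    <style type="text/css">
      html, body {
        height: 100%;
        overflow: hidden;
      }
    </style>
  </head>
  <body>
    Hello World!
  </body>
</html>
"""


def drop_line(text, lineno):
    """Return text without its lineno-th line (1-based), like `sed <lineno>d`."""
    lines = text.splitlines(keepends=True)
    if not 1 <= lineno <= len(lines):
        raise ValueError(f"line {lineno} is out of range")
    return "".join(lines[:lineno - 1] + lines[lineno:])


def single_line_variants(text):
    """Yield (lineno, variant) pairs, each variant missing exactly one line."""
    for lineno in range(1, len(text.splitlines()) + 1):
        yield lineno, drop_line(text, lineno)
```

      Each variant can be piped to wkhtmltopdf on standard input, as the transcript does with `-` as the input file name.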

      EDIT6

      The custom build is statically linked, whereas the default build depends on quite a few WebKit shared libraries:

      Custom build:

      ubuntu@ip-10-245-78-162:~/wkhtmltopdf/bin$ l wkhtmltopdf
      -rwxr-xr-x 1 ubuntu ubuntu 35020224 Mar 31 22:26 wkhtmltopdf
      ubuntu@ip-10-245-78-162:~/wkhtmltopdf/bin$ ldd !$
      ldd wkhtmltopdf
              linux-vdso.so.1 =>  (0x00007fff195ff000)
              libXrender.so.1 => /usr/lib/x86_64-linux-gnu/libXrender.so.1 (0x00007fefc06db000)
              libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007fefc03a7000)
              libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fefc01a2000)
              librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fefbff9a000)
              libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fefbfd7d000)
              libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fefbfa7c000)
              libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fefbf780000)
              libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fefbf56a000)
              libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fefbf1aa000)
              /lib64/ld-linux-x86-64.so.2 (0x00007fefc08ef000)
              libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007fefbef8c000)
              libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007fefbed88000)
              libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007fefbeb82000)
      ubuntu@ip-10-245-78-162:~/wkhtmltopdf/bin$
      

      And now the default build:

      ubuntu@ip-10-245-78-162:/usr/bin$ l wkhtmltopdf
      -rwxr-xr-x 1 root root 233512 May  7  2011 wkhtmltopdf
      ubuntu@ip-10-245-78-162:/usr/bin$ ldd wkhtmltopdf
              linux-vdso.so.1 =>  (0x00007fff031ff000)
              libQtWebKit.so.4 => /usr/lib/x86_64-linux-gnu/libQtWebKit.so.4 (0x00007f28a33bc000)
              libQtGui.so.4 => /usr/lib/x86_64-linux-gnu/libQtGui.so.4 (0x00007f28a26ee000)
              libQtNetwork.so.4 => /usr/lib/x86_64-linux-gnu/libQtNetwork.so.4 (0x00007f28a23a1000)
              libQtCore.so.4 => /usr/lib/x86_64-linux-gnu/libQtCore.so.4 (0x00007f28a1ecf000)
              libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f28a1bcf000)
              libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f28a19b8000)
              libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f28a15f9000)
              libsqlite3.so.0 => /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f28a1356000)
              libXrender.so.1 => /usr/lib/x86_64-linux-gnu/libXrender.so.1 (0x00007f28a114b000)
              libgstapp-0.10.so.0 => /usr/lib/x86_64-linux-gnu/libgstapp-0.10.so.0 (0x00007f28a0f3f000)
              libgstinterfaces-0.10.so.0 => /usr/lib/x86_64-linux-gnu/libgstinterfaces-0.10.so.0 (0x00007f28a0d2d000)
              libgstpbutils-0.10.so.0 => /usr/lib/x86_64-linux-gnu/libgstpbutils-0.10.so.0 (0x00007f28a0b09000)
              libgstvideo-0.10.so.0 => /usr/lib/x86_64-linux-gnu/libgstvideo-0.10.so.0 (0x00007f28a08ed000)
              libgstbase-0.10.so.0 => /usr/lib/x86_64-linux-gnu/libgstbase-0.10.so.0 (0x00007f28a069a000)
              libgstreamer-0.10.so.0 => /usr/lib/x86_64-linux-gnu/libgstreamer-0.10.so.0 (0x00007f28a03b2000)
              libgobject-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0 (0x00007f28a0163000)
              libglib-2.0.so.0 => /lib/x86_64-linux-gnu/libglib-2.0.so.0 (0x00007f289fe6e000)
              libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f289fc50000)
              libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f289f91c000)
              libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f289f620000)
              libfontconfig.so.1 => /usr/lib/x86_64-linux-gnu/libfontconfig.so.1 (0x00007f289f3e9000)
              libaudio.so.2 => /usr/lib/x86_64-linux-gnu/libaudio.so.2 (0x00007f289f1d1000)
              libpng12.so.0 => /lib/x86_64-linux-gnu/libpng12.so.0 (0x00007f289efa9000)
              libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f289ed91000)
              libfreetype.so.6 => /usr/lib/x86_64-linux-gnu/libfreetype.so.6 (0x00007f289eaf5000)
              libSM.so.6 => /usr/lib/x86_64-linux-gnu/libSM.so.6 (0x00007f289e8ed000)
              libICE.so.6 => /usr/lib/x86_64-linux-gnu/libICE.so.6 (0x00007f289e6d2000)
              libXi.so.6 => /usr/lib/x86_64-linux-gnu/libXi.so.6 (0x00007f289e4c3000)
              libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f289e2b2000)
              libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f289e0ad000)
              librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f289dea5000)
              /lib64/ld-linux-x86-64.so.2 (0x00007f28a517e000)
              liborc-0.4.so.0 => /usr/lib/x86_64-linux-gnu/liborc-0.4.so.0 (0x00007f289dc29000)
              libgmodule-2.0.so.0 => /usr/lib/x86_64-linux-gnu/libgmodule-2.0.so.0 (0x00007f289da25000)
              libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f289d6ca000)
              libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007f289d4c1000)
              libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f289d284000)
              libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f289d065000)
              libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007f289ce3b000)
              libXt.so.6 => /usr/lib/x86_64-linux-gnu/libXt.so.6 (0x00007f289cbd5000)
              libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f289c9d1000)
              libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007f289c7cc000)
              libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f289c5c5000)
      ubuntu@ip-10-245-78-162:/usr/bin$
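
      To compare the two `ldd` listings without reading them line by line, the output can be parsed into a mapping and diffed. This is a sketch that assumes the line shapes seen in the transcripts above (`name => path (address)`, `name =>  (address)`, and a bare loader path); `parse_ldd` is a hypothetical helper name.

```python
def parse_ldd(output):
    """Map each library name in ldd output to its resolved path, or None."""
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            name, _, rest = line.partition("=>")
            fields = rest.split()
            # After "=>" comes either "path (address)" or only "(address)".
            has_path = bool(fields) and not fields[0].startswith("(")
            libs[name.strip()] = fields[0] if has_path else None
        else:
            # Lines without "=>", such as the dynamic loader entry.
            libs[line.split()[0]] = None
    return libs
```

      With both listings parsed, `set(parse_ldd(default_output)) - set(parse_ldd(custom_output))` lists the libraries that only the default build loads dynamically.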
      

      EDIT7

      Guys, I do not understand how wkhtmltopdf works for you. I started from scratch and did exactly this:

      1. Created a brand-new Ubuntu Amazon micro instance (free tier).
      2. sudo apt-get update
      3. sudo apt-get upgrade
      4. sudo apt-get install libx11-dev
      5. sudo apt-get install libfontconfig1-dev
      6. wget https://wkhtmltopdf.googlecode.com/files/wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2
      7. tar xjf wkhtmltopdf-0.11.0_rc1-static-amd64.tar.bz2
      8. Created test2.html with the content from EDIT5 (see the EDIT5 transcript).
      9. Ran wkhtmltopdf-amd64 on test2.html. The resulting pdf is empty!
      10. Deleted line 6 or line 7 from test2.html (the CSS height or overflow property), and suddenly it works!
      11. Can someone retrace my steps and confirm?

        EDIT8

        Installed CentOS 6.4 in a VMware VM on my laptop. Same result: wkhtmltopdf does not work with the simple html file above.

1 answer:

Answer 0: (score: 2)

Try adding a charset declaration inside the html head tag, like this:

<head>
  <meta charset="utf-8">
  ...
</head>
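
If the html is generated or post-processed by a script, the declaration can be added automatically. The sketch below is one way to do that with a regular expression; `ensure_charset` is a hypothetical helper, it leaves documents without a `<head>` tag untouched, and nothing in the question confirms that the charset declaration fixes the empty pdf.

```python
import re

META = '<meta charset="utf-8">'


def ensure_charset(html):
    """Insert a UTF-8 charset declaration right after <head> if none exists."""
    if re.search(r"<meta[^>]*charset", html, flags=re.IGNORECASE):
        return html  # some charset is already declared, leave it alone
    # Append the meta tag to the first opening <head ...> tag only.
    return re.sub(
        r"<head\b[^>]*>",
        lambda match: match.group(0) + META,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
```

Running the function twice on the same document gives the same result as running it once, so it is safe to apply in a pipeline.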