textract does not work on PDF files

Asked: 2018-08-18 09:42:38

Tags: python anaconda text-extraction

I am new to Python. I am using PyCharm 2018.2 with the latest version of Anaconda, on Windows 10.

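After resolving all the problems with installing textract on Windows 10, the installation completed successfully from the Anaconda Prompt. I have also set the project interpreter to \continuum\anaconda3\python.exe.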

My goal is to extract the text from a large PDF file and save it as a .txt file.

I tried textract's test_pdf.py file, but it does not work.

This is the resulting output:

"textract" is misspelled or could not be found (my own translation from German :-/)

So I tried my own approach based on the textract page, but that does not work either:

Code:

import textract
text = textract.process('pfad/large.pdf')

Result:

C:\Users\raz\AppData\Local\Continuum\anaconda3\python.exe "C:/Users/raz/Google Drive/FOM/Master/Master/NurText/Testo.py"
Traceback (most recent call last):
  File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\utils.py", line 85, in run
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
  File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] Das System kann die angegebene Datei nicht finden

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/raz/Google Drive/FOM/Master/Master/NurText/Testo.py", line 2, in <module>
    text = textract.process('pfad/large.pdf')
  File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\__init__.py", line 77, in process
    return parser.process(filename, encoding, **kwargs)
  File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\utils.py", line 46, in process
    byte_string = self.extract(filename, **kwargs)
  File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\pdf_parser.py", line 28, in extract
    raise ex
  File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\pdf_parser.py", line 20, in extract
    return self.extract_pdftotext(filename, **kwargs)
  File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\pdf_parser.py", line 43, in extract_pdftotext
    stdout, _ = self.run(args)
  File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\utils.py", line 92, in run
    ' '.join(args), 127, '', '',
textract.exceptions.ShellError: The command pdftotext pfad/large.pdf - failed with exit code 127
------------- stdout -------------
------------- stderr -------------
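
A note on reading this traceback: the FileNotFoundError is raised while subprocess tries to start the pdftotext command, and textract then reports exit code 127. That pattern suggests, though the output alone does not prove it, that the pdftotext executable (shipped with Poppler and Xpdf) is not on PATH. Below is a minimal diagnostic sketch under that assumption. It uses only the standard library; find_tool and pdf_to_text_file are hypothetical helper names and not part of textract, and the file names in the usage comment are placeholders.

```python
import shutil
import subprocess


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


def pdf_to_text_file(pdf_path, txt_path, tool="pdftotext"):
    """Call pdftotext directly and let it write the text to txt_path.

    Hypothetical helper: it bypasses textract and only works once the
    pdftotext executable can be found on PATH.
    """
    exe = find_tool(tool)
    if exe is None:
        raise FileNotFoundError(
            f"{tool} was not found on PATH; install a package that "
            "provides it (for example Poppler or Xpdf) and add its "
            "folder to PATH"
        )
    # pdftotext accepts the input PDF followed by the output text file.
    subprocess.run([exe, pdf_path, txt_path], check=True)


if __name__ == "__main__":
    # Prints None if Windows cannot locate pdftotext.
    print("pdftotext found at:", find_tool("pdftotext"))
    # Example call with placeholder paths:
    # pdf_to_text_file("pfad/large.pdf", "pfad/large.txt")
```

If the script prints None, the same lookup that textract performs through subprocess would fail too. PyCharm may need a restart after PATH is changed before it sees the new value.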

Thanks for your help.

0 answers:

No answers yet