Question

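I run my main script in Python 3.5 using the Spyder IDE, and I want to import functions from a script that only runs under Python 3.4. I was therefore advised to run the second script as a subprocess, like this: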

import subprocess
cmd = [r'c:\python34\pythonw.exe', r'C:\users\John\Desktop\scraper.py']
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = p.communicate()
print(stdout)
print(stderr)

The script being called is one of the examples from NikolaiT's search engine scraper (GoogleScraper):

# -*- coding: utf-8 -*-

import sys
from GoogleScraper import scrape_with_config, GoogleSearchError
from GoogleScraper.database import ScraperSearch, SERP, Link

def basic_usage():
    # See in the config.cfg file for possible values
    config = {
        'SCRAPING': {
            'use_own_ip': 'True',
            'search_engines': 'baidu',
            'num_pages_for_keyword': 3
        },
        'keyword': '苹果',
        'SELENIUM': {
            'sel_browser': 'chrome',
        },
        'GLOBAL': {
            'do_caching': 'False'
        }
    }

    try:
        sqlalchemy_session = scrape_with_config(config)
    except GoogleSearchError as e:
        print(e)

    # let's inspect what we got
    link_list = []
    for serp in sqlalchemy_session.serps:
        #print(serp)
        for link in serp.links:
            #print(link)
            link_list.append(link.link)
    return link_list



links = basic_usage()

print("test")
for link in links:
    print(link)

This script works fine when I run it in the Python 3.4 IDLE IDE, but when it is run as a subprocess as shown above, my main script prints the following UnicodeEncodeError:

\python34\scripts\googlescraper\GoogleScraper\caching.py", line 413, in parse_all_cached_files
    store_serp_result(serp, self.config)
  File "c:\python34\scripts\googlescraper\GoogleScraper\output_converter.py", line 123, in store_serp_result
    pprint.pprint(data)
  File "c:\python34\lib\pprint.py", line 52, in pprint
    printer.pprint(object)
  File "c:\python34\lib\pprint.py", line 139, in pprint
    self._format(object, self._stream, 0, 0, {}, 0)
  File "c:\python34\lib\pprint.py", line 193, in _format
    allowance + 1, context, level)
  File "c:\python34\lib\pprint.py", line 268, in _format
    write(rep)
  File "c:\python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-2: character maps to <undefined>

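Why does this only happen when the script is run indirectly? Thanks for any help, clarifying questions, or suggestions for improving my question.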

Answer 1

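In short, your problem is the following:

The Python IDE fully supports UTF-8 output, which is why the script you are calling runs fine there.
When the script is started as a subprocess, on the other hand, its output is encoded with the Windows default character set, Windows-1252 (cp1252), because you are on Windows. That character set cannot represent part of the output (the traceback ends in cp1252.py for exactly this reason).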
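To see the failure in isolation, here is a minimal sketch that needs no scraper at all. It assumes the offending output is the repr of the keyword from your config, which is what pprint would write; that is a guess based on "position 1-2" in your traceback, not something I have checked against GoogleScraper itself.

```python
# Assumption: pprint is writing repr('苹果'), i.e. quote, two CJK characters, quote.
text = repr("\u82f9\u679c")

error = None
try:
    # cp1252 has no mapping for CJK characters, so this raises.
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    error = exc
    print("cp1252 failed:", exc)

# UTF-8 can encode every Unicode character, so this round-trips cleanly.
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")
```

The same string encodes without trouble as UTF-8, which is why the script only breaks when its output stream falls back to cp1252.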

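Quick fix: if you invoke the script from the Windows command line as python foobar.py, you can run chcp 65001 before calling it. If you use the Spyder IDE, there should be a setting that lets you set the file/project encoding to UTF-8.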

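(You could also try adding # -*- coding: utf-8 -*- to the top of your main Python script, although that line only declares the encoding of the source file itself.)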
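Another option, which is not one of the fixes above but a separate approach you could try, is to force the child interpreter's standard streams to UTF-8 through the PYTHONIOENCODING environment variable and decode the captured bytes yourself. The sketch below is only an illustration: run_child_utf8 is a hypothetical helper name, and the child command is a stand-in that prints two non-ASCII characters. In your case you would pass your own pythonw.exe and scraper.py paths instead.

```python
import os
import subprocess
import sys


def run_child_utf8(cmd):
    """Run cmd with the child's stdio forced to UTF-8; return decoded (stdout, stderr)."""
    env = dict(os.environ)
    # PYTHONIOENCODING is read by the child Python interpreter at startup.
    env["PYTHONIOENCODING"] = "utf-8"
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, env=env)
    out, err = p.communicate()
    # The pipes deliver raw bytes, so decode them with the same encoding.
    return out.decode("utf-8"), err.decode("utf-8")


# Stand-in for the scraper: a child process that prints non-ASCII text.
child_cmd = [sys.executable, "-c", "print('\\u82f9\\u679c')"]
stdout, stderr = run_child_utf8(child_cmd)

# ascii() keeps this demo printable even on a console that is not UTF-8.
print(ascii(stdout.strip()))
```

Decoding in the parent also means print(stdout) shows text rather than a bytes literal, provided the console the parent writes to can display those characters.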

UnicodeEncodeError only when the script is run as a subprocess
