SyntaxError: Non-ASCII character. Python

Date: 2014-11-10 10:08:39

Tags: python unicode syntax syntax-error quoting

Can somebody tell me which character in the following is a non-ASCII character:

  

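Columns(str) – comma-separated list of values. Works only if format is tab or xls. For UnitprotKB, some possible columns are: id, entry name, length, organism. Some column names must be followed by a database name (i.e. ‘database(PDB)’). Again see uniprot website for more details. See also _valid_columns for the full list of column keyword.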

Basically I am defining a class and trying to give it a docstring that describes how it works:

def test(self,uniprot_id):
    '''
    Same as the UniProt.search() method arguments:
    search(query, frmt='tab', columns=None, include=False, sort='score', compress=False, limit=None, offset=None, maxTrials=10)


    query (str) -- query must be a valid uniprot query. See http://www.uniprot.org/help/text-search, http://www.uniprot.org/help/query-fields See also example below
    frmt (str) -- a valid format amongst html, tab, xls, fasta, gff, txt, xml, rdf, list, rss. If tab or xls, you can also provide the columns argument. (default is tab)
    include (bool) -- include isoform sequences when the frmt parameter is fasta. Include description when frmt is rdf.
    sort (str) -- by score by default. Set to None to bypass this behaviour
    compress (bool) -- gzip the results
    limit (int) -- Maximum number of results to retrieve.
    offset (int) -- Offset of the first result, typically used together with the limit parameter.
    maxTrials (int) -- this request is unstable, so we may want to try several time.
    Columns(str) -- comma-seperated list of values. Works only if format is tab or xls. For UnitprotKB, some possible columns are: id, entry name, length, organism. Some column names must be followed by a database name (i.e. ‘database(PDB)’). Again see uniprot website for more details. See also _valid_columns for the full list of column keyword. '

    '''        
    u = UniProt()
    uniprot_entry = u.search(uniprot_id)
    return uniprot_entry

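Without line 52, i.e. the line beginning with 'Columns' in the quoted docstring, this works as expected, but as soon as I describe what the columns are, I get the following error: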

SyntaxError: Non-ASCII character '\xe2' in file /home/cw00137/Documents/Python/Identify_gene.py on line 52, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Does anyone know what is going on?

2 answers:

Answer 0: (score: 4)

You are using 'fancy' curly quotes in that line:

>>> u'‘database(PDB)’'
u'\u2018database(PDB)\u2019'

There is a U+2018 LEFT SINGLE QUOTATION MARK at the start and a U+2019 RIGHT SINGLE QUOTATION MARK at the end.

Either use ASCII quotes (U+0027 APOSTROPHE or U+0022 QUOTATION MARK) or declare an encoding other than ASCII for your source file.

You are also using a U+2013 EN DASH:

>>> u'Columns(str) –'
u'Columns(str) \u2013'

Replace it with a U+002D HYPHEN-MINUS.

All three characters encode to UTF-8 with a leading E2 byte:

>>> u'\u2013 \u2018 \u2019'.encode('utf8')
'\xe2\x80\x93 \xe2\x80\x98 \xe2\x80\x99'

which is the byte you see reflected in the SyntaxError exception message.
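If you want to find such characters yourself rather than spotting them by eye, a small scan along these lines may help. This is only a sketch written for Python 3 (the session above is Python 2); `find_non_ascii` is a hypothetical helper name, and the sample string is a shortened stand-in for the offending docstring line.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, char, name) for every non-ASCII character in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # anything outside the 7-bit ASCII range
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col_no, ch, name))
    return hits


# Shortened stand-in for the docstring line that triggers the error.
sample = "Columns(str) \u2013 see \u2018database(PDB)\u2019"

for line_no, col_no, ch, name in find_non_ascii(sample):
    print("line %d, col %d: U+%04X %s, UTF-8 bytes %r"
          % (line_no, col_no, ord(ch), name, ch.encode("utf-8")))
```

To check a real file, you would read it as UTF-8 text first (for example with `open(path, encoding="utf-8").read()`) and pass the result to the function.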

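You may want to avoid using these characters in the first place. It could be that your OS is substituting them as you type, or that you are writing your code in a word processor rather than a plain-text editor and it is substituting them for you. You probably want to switch that feature off.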
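If the characters are already in the file, one way to clean them up is a simple character mapping. The following is a rough sketch for Python 3, not a complete solution: `asciify_punctuation` is a hypothetical helper, and the table covers only the three characters named above plus the curly double quotes, which are added here as an assumption because autocorrect features usually insert them too.

```python
# Map typographic punctuation to plain ASCII equivalents.
ASCII_REPLACEMENTS = {
    0x2018: "'",  # LEFT SINGLE QUOTATION MARK
    0x2019: "'",  # RIGHT SINGLE QUOTATION MARK
    0x2013: "-",  # EN DASH
    0x201C: '"',  # LEFT DOUBLE QUOTATION MARK (assumed extra, not in the question)
    0x201D: '"',  # RIGHT DOUBLE QUOTATION MARK (assumed extra, not in the question)
}


def asciify_punctuation(text):
    """Return text with the mapped typographic characters replaced by ASCII ones."""
    return text.translate(ASCII_REPLACEMENTS)


print(asciify_punctuation("Columns(str) \u2013 \u2018database(PDB)\u2019"))
```

Any non-ASCII character that is not in the table is left untouched, so it is worth scanning the result again before relying on it.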

Answer 1: (score: 1)

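I ran into the same problem and the same error before. Python 2 uses ASCII as the default source encoding. You can try declaring the following comment on the first or second line of your .py file: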

# -*- coding: utf-8 -*-
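The placement matters: the declaration is only recognised on line 1 or line 2, which leaves room for a shebang line above it. A simplified check could look like the sketch below, written for Python 3. The regular expression is the one given in PEP 263; `declared_encoding` is a hypothetical helper name, and the real interpreter applies a few extra rules (for instance about what may appear on the first line) that this sketch ignores.

```python
import re

# Pattern from PEP 263 for recognising an encoding declaration comment.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_bytes):
    """Return the encoding declared on line 1 or 2 of the source, or None."""
    for line in source_bytes.splitlines()[:2]:  # later lines are never checked
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


with_shebang = b"#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nx = 1\n"
too_late = b"import os\nimport sys\n# -*- coding: utf-8 -*-\n"

print(declared_encoding(with_shebang))
print(declared_encoding(too_late))
```

The first call finds the declaration on line 2; the second returns `None` because the comment sits on line 3, where it has no effect.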