Cython compile error - variable referenced before assignment

Asked: 2016-08-22 22:28:22

Tags: python cython

I am using Cython on a fairly large for loop, over a million iterations. Run as a regular Python program, it takes about 40 minutes.

This is vetdns.pyx. I declared the cdef variables just inside the function definition:
now = datetime.datetime.now()
today = now.strftime("%Y-%m-%d")
my_date = date.today()
dayoftheweek=calendar.day_name[my_date.weekday()]
#needed because of the weird naming and time objects vs datetime objects 
read_date = datetime.datetime.strptime(today, '%Y-%m-%d')
previous_day = read_date - datetime.timedelta(days=1)
yesterday = previous_day.strftime('%Y-%m-%d')

my_dir = os.getcwd()
# extracted = "extracted_"+today
outname = "alexa_all_vetted"+today
downloaded_file = "top-1m"+today+".zip"

INPUT_FILE="dns-all"
OUTPUT_FILE="dns_blacklist_"+dayoftheweek
REMOVE_FILE="dns_blacklist_upto"+yesterday
PATH = "/home/money/Documents/hybrid"
FULL_FILENAME= os.path.join(PATH, OUTPUT_FILE)
CLEANUP_FILENAME=os.path.join(PATH, REMOVE_FILE)
##cdef outname, INPUT_FILE, OUTPUT_FILE  labeled just inside function. 


def main():

    zip_file_url = "http://s3.amazonaws.com/alexa-static/top-1m.csv.zip"
    urllib.urlretrieve(zip_file_url, downloaded_file)
    ###naming variables affected in for loop
    cdef outname, INPUT_FILE, OUTPUT_FILE
    with zipfile.ZipFile(downloaded_file) as zip_file:
        for member in zip_file.namelist():
            filename = os.path.basename(member)
            # skip directories
            if not filename:
                continue

            # copy file (taken from zipfile's extract)
            source = zip_file.open(member)
            target = file(os.path.join(my_dir, filename), "wb")
            with source, target:
                shutil.copyfileobj(source, target)

    whitelist = open(outname,'w')
    with open(member,'r') as member:
        reader = csv.reader(member, delimiter=',')
        alexia_hosts = []
        for row in reader:
            alexia_hosts.append(row[1])
    whitelist.write("\n".join(alexia_hosts))

    file_out=open(FULL_FILENAME,"w")
    with open(INPUT_FILE, 'r') as dnsMISP:
        with open(outname, 'r') as f:
            alexa=[]
            alexafull=[]
            blacklist = []
            for line in f:
                line = line.strip()
                alexahostname=urltools.extract(line)
                alexa.append(alexahostname[4])
                alexafull.append(line)
            for line in dnsMISP:
                line = line.strip()
                hostname = urltools.extract(line)
    #           print hostname[4]
                if hostname[4] in alexa:
                    print hostname[4]+",this hostname is in alexa"
                    pass
                elif hostname[5] in alexafull:
                    print hostname[5]+",this hostname is in alexafull"
                else:
                    blacklist.append(line)

    file_out.write("\n".join(blacklist))

    file_out.close()


main()

I build it with this setup.py:

from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("vetdns.pyx")
)

But when I run

python setup.py build_ext --inplace

I get the following error:

Error compiling Cython file:
------------------------------------------------------------
...
            source = zip_file.open(member)
            target = file(os.path.join(my_dir, filename), "wb")
            with source, target:
                shutil.copyfileobj(source, target)

    whitelist = open(outname,'w')
                        ^
------------------------------------------------------------
vetdns.pyx:73:25: local variable 'outname' referenced before assignment

This may be a bit over my head right now, but I want to play with it anyway.

1 answer:

Answer 0 (score: 2)

You declare outname as a local variable on this line:

cdef outname, INPUT_FILE, OUTPUT_FILE

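But you never assign anything to it. Python requires a variable to be assigned before it is used; there is no default value that variables are initialized to.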

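I see you also have a global variable named "outname". If you want to use the global, you don't need the cdef inside your function. The same applies to your other global variables.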
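As a rough pure-Python analogy (this is not Cython, and the function names `shadowed` and `uses_global` are made up for illustration), a name that is local to a function hides the global of the same name, and reading it before it has a value fails. Plain Python reports this at run time as UnboundLocalError, whereas Cython catches the `cdef` case at compile time:

```python
outname = "alexa_all_vetted"  # module-level global, as in the question


def shadowed():
    # The assignment further down makes `outname` local to this function,
    # so this read happens before the local has a value.
    value = outname  # raises UnboundLocalError
    outname = "local value"
    return value


def uses_global():
    # No local declaration or assignment here, so the name lookup
    # falls through to the module-level global.
    return outname


if __name__ == "__main__":
    try:
        shadowed()
    except UnboundLocalError as exc:
        print("shadowed() failed:", type(exc).__name__)
    print("uses_global() returned:", uses_global())
```

Dropping the local declaration, as in `uses_global`, corresponds to removing the `cdef` line from `main()`.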

Update

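One thing you can try, which has worked well for me, is to pull the loop out into its own cythonized function. That leaves less Cython code to debug and optimize, and when most of the processing time is spent in a few lines of code (as is often the case), compiling just those lines can make a big difference. In practice it looks a bit like this: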

# my_script.py
import os
from my_helper import bottle_neck

def main():
    a = 12
    b = 22
    c = 999
    # More prep code
    print(bottle_neck(a, b, c))

main()

And in another file:

# my_helper.pyx
def bottle_neck(int a, int b, int c):
    # Silly example, this loop might never finish
    while a > 0:
        a = a & b
        b = a - c
        c = b * a
    return a, b, c

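Make sure you profile your code. It is frustrating to assume something is slow and only find out, after you have spent the time optimizing it, that it was actually fast all along.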
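A minimal sketch of one way to do that profiling with the standard library's cProfile and pstats modules. `hot_loop` and `profile_call` are hypothetical names used only for illustration; in the question's script you would pass in whichever function wraps the suspected bottleneck:

```python
import cProfile
import io
import pstats


def hot_loop(n):
    # Hypothetical stand-in for the code suspected of being slow.
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    # Run func under the profiler and return its result together with
    # a text report sorted by cumulative time.
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_call(hot_loop, 100000)
    print(report)
```

The functions near the top of the report are the ones worth moving into a .pyx helper. If the suspected loop does not show up there, compiling it is unlikely to help much.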