SQLite3 - executemany never finishes updating a large list in Python 3

Time: 2017-05-16 18:10:10

Tags: python sqlite

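I am trying to update about 500k rows in a SQLite database. I can create the rows quickly, but when I update them the process seems to hang indefinitely, and I get no error message. (An insert of the same size took 35 seconds; this update has been running for over 12 hours.)

The part of my code that performs the update is: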

if '_STATE' in col:
    pass  # first column: plain INSERT (omitted here, see the full code below)
else:
    counter = 1
    print("Starting to append result_list...")
    result_list = []
    # Pair each value with its 1-based row number: (new_value, row_id)
    for line in result:
        result_list.append((str(line), counter))
        counter += 1
    sql = 'UPDATE BRFSS2015 SET ' + col[1] + \
          ' = ? where row_id = ?'
    print("Executing SQL...")
    c.executemany(sql, result_list)
print("Committing.")
conn.commit()

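It prints "Executing SQL..." and presumably starts the executemany call, and that is where it gets stuck. The variable "result" is a list of records and, as far as I can tell, it is essentially the same data that the insert statement handles without any problem.

Am I misusing executemany? I have seen many threads about executemany(), but as far as I can tell all of them involve an error message rather than an indefinite hang.

For reference, my full code is below. Basically, I am trying to convert an ASCII file into a SQLite database. I know I could technically insert all the columns at the same time, but every machine I have access to is limited to 32-bit Python and runs out of memory (the file is very large, close to 1 GB of text).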

import pandas as pd
import sqlite3

ascii_file = r'c:\Path\to\file.ASC_'
sqlite_file = r'c:\path\to\sqlite.db'

conn = sqlite3.connect(sqlite_file)
c = conn.cursor()

# Taken from https://www.cdc.gov/brfss/annual_data/2015/llcp_varlayout_15_onecolumn.html
raw_list = [[1,"_STATE",2],
[17,"FMONTH",2],
# ... many other values here
[2154,"_AIDTST3",1],]

col_list = []
for col in raw_list:
    begin = (col[0] - 1)
    col_name = col[1]
    end = (begin + col[2])
    col_list.append([(begin, end,), col_name,])

for col in col_list:
    print(col)
    col_specification = [col[0]]
    print("Parsing...")
    data = pd.read_fwf(ascii_file, colspecs=col_specification)
    print("Done")
    result = data.iloc[:,[0]]
    result = result.values.flatten()
    sql = '''CREATE table if not exists BRFSS2015
             (row_id integer NOT NULL,
              ''' + col[1] +  ' text)'
    print(sql)
    c.execute(sql)
    conn.commit()
    sql = '''ALTER TABLE 
             BRFSS2015 ADD COLUMN ''' + col[1] + ' text'
    try:
        c.execute(sql)
        print(sql)
        conn.commit()
    except Exception as e:
        print("Error Happened instead")
        print(e)

    counter = 1  
    result_list = []
    for line in result:
        result_list.append((counter, str(line)))
        counter += 1

    if '_STATE' in col:
        counter = 1  
        result_list = []
        for line in result:
            result_list.append((counter, str(line)))
            counter += 1
        sql = 'INSERT into BRFSS2015 (row_id,' + col[1] + ')'\
               + 'values (?,?)'
        c.executemany(sql, result_list)
    else:
        counter = 1
        print("Starting to append result_list...")
        result_list = []
        for line in result:
            result_list.append((str(line),counter))
            counter += 1                 
        sql = 'UPDATE BRFSS2015 SET ' + col[1] + \
             ' = ? where row_id = ?'
        print("Executing SQL...")
        c.executemany(sql, result_list)
    print("Committing.")
    conn.commit()
    print("Committed... moving on to next column...")

1 Answer:

Answer 0 (score: 2)

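For every row that is updated, the database has to find that row first. (Inserts do not need this step.) If there is no index on the row_id column, the database has to scan the entire table for each individual update, so the total work grows roughly with the square of the number of rows.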
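One way to check this is to ask SQLite for its query plan before and after adding an index. The following is only a minimal sketch: it uses an in-memory database with the same table shape as the question, and the index name `idx_brfss_row_id` and the helper `update_plan` are made up for illustration. The exact wording of the plan text differs between SQLite versions, so no output is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same shape as the table in the question: row_id is a plain, unindexed column.
conn.execute("CREATE TABLE BRFSS2015 (row_id integer NOT NULL, _STATE text)")


def update_plan(connection):
    """Return SQLite's plan text for a single-row UPDATE filtered by row_id."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN "
        "UPDATE BRFSS2015 SET _STATE = 'x' WHERE row_id = 1"
    ).fetchall()
    # The human-readable detail text is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)


# Without an index the plan should report a full table SCAN.
plan_without_index = update_plan(conn)

# Hypothetical index name; any valid identifier would do.
conn.execute("CREATE INDEX idx_brfss_row_id ON BRFSS2015 (row_id)")

# With the index the plan should report a SEARCH using that index.
plan_with_index = update_plan(conn)

print(plan_without_index)
print(plan_with_index)
```

If the first plan mentions a scan and the second a search, every one of the roughly 500k updates was previously walking the whole table.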

It would be better to insert entire rows at once. If that is not possible, create an index on row_id or, better still, declare it as INTEGER PRIMARY KEY.
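A rough sketch of the whole-row approach that keeps memory use low: read the fixed-width file line by line, slice every column out of each line, and insert complete rows in batches. This deliberately replaces pandas with plain string slicing, which is an assumption and not what the question's code does (values are stored as stripped strings, and no header line is skipped). The function name `load_fixed_width`, the batch size, and the demo layout and data are all hypothetical; the layout entries follow the question's `[start, name, width]` shape with a 1-based start.

```python
import sqlite3
from itertools import islice


def load_fixed_width(lines, layout, conn, table="BRFSS2015", batch_size=10_000):
    """Insert whole rows parsed from fixed-width text, one batch at a time."""
    names = [name for _, name, _ in layout]
    # Convert 1-based start + width into 0-based slice bounds.
    spans = [(start - 1, start - 1 + width) for start, _, width in layout]

    column_defs = ", ".join(f'"{name}" text' for name in names)
    # INTEGER PRIMARY KEY makes row_id the table's key, so lookups by row_id
    # are fast and SQLite assigns 1, 2, 3, ... automatically on insert.
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" '
        f"(row_id INTEGER PRIMARY KEY, {column_defs})"
    )

    column_list = ", ".join(f'"{name}"' for name in names)
    placeholders = ", ".join("?" for _ in names)
    sql = f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})'

    # Generator: only one batch of parsed rows is held in memory at a time.
    rows = (tuple(line[a:b].strip() for a, b in spans) for line in lines)
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany(sql, batch)
        conn.commit()
        total += len(batch)
    return total


# Made-up two-column layout and data, purely for demonstration.
demo_layout = [[1, "_STATE", 2], [3, "FMONTH", 2]]
demo_lines = ["0101\n", "0212\n", "3607\n"]

conn = sqlite3.connect(":memory:")
inserted = load_fixed_width(demo_lines, demo_layout, conn, batch_size=2)
rows = conn.execute(
    'SELECT row_id, "_STATE", "FMONTH" FROM BRFSS2015 ORDER BY row_id'
).fetchall()
print(inserted, rows)
```

For the real file, an open file handle (for example `open(ascii_file)`) could be passed as `lines` together with the full `raw_list` as `layout`, so the file is never loaded into memory as a whole.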