Loop up to a specified index

Time: 2016-03-01 17:24:58

Tags: python list loops

I have 2 large lists, each with roughly 100,000 elements, one larger than the other, that I want to iterate over. My loop looks like this:

for i in list1:
   for j in list2:
       function()

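This loop currently takes far too long. list1 is the list whose entries have to be checked against list2, but beyond a certain index there are no more matching instances in list2. That means looping only up to that index would probably be faster, but I don't know how to do that.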
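To make the idea concrete: if the cut-off index is already known, the inner loop can be limited with itertools.islice (or a slice). This is only a minimal sketch; the value of cutoff and the placeholder data are assumptions made for illustration and do not come from the project.

```python
from itertools import islice

# Placeholder data, not the real lists from the project
list1 = [1, 2, 3]
list2 = [10, 20, 30, 40, 50]
cutoff = 3  # hypothetical: index in list2 from which nothing matches any more

pairs = []
for i in list1:
    # islice stops after `cutoff` items without copying list2
    # (list2[:cutoff] would work too, but builds a new list each time)
    for j in islice(list2, cutoff):
        pairs.append((i, j))

print(len(pairs))
```

Note that this still visits len(list1) * cutoff pairs, so it only helps when the cut-off is small compared to the length of list2.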

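In my project, list2 is a list of dictionaries with three keys: value, name and timestamp. list1 is an ordered list of timestamps. The function takes the value for a given timestamp and writes it to a csv file in the column for the corresponding name.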

Here is an example of the entries in list1:

[1364310855.004000, 1364310855.005000, 1364310855.008000]

list2 is a list of dictionaries, each with the keys name, value and timestamp.

In my final csv file, I should end up with something like this:

http://s000.tinyupload.com/?file_id=03563948671103920273

3 Answers:

Answer 0: (score: 2)

If you want speed, you should restructure the data in list2 so that lookups are faster:

# The following code converts list2 into a multivalue dictionary

from collections import defaultdict

list2_dict = defaultdict(list)

for item in list2:
    list2_dict[item['timestamp']].append((item['name'], item['value']))

This lets you look up a timestamp much more quickly:

print(list2_dict)

defaultdict(<type 'list'>, {
    1364310855.008: [('torque_at_transmission', -3), ('vehicle_speed', 0)], 
    1364310855.005: [('engine_speed', 0)], 
    1364310855.004: [('vehicle_speed', 0), ('accelerator_pedal_position', 0)]})

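With list2_dict the lookup is far more efficient, because each timestamp is found with a single dictionary access instead of a scan over all of list2: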

for i in list1:
    for j in list2_dict[i]:
        # here j is a tuple in the form (name, value)
        function()
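As a rough end-to-end sketch of this approach, the grouped dictionary can feed csv.DictWriter directly. The sample readings, the variable names by_timestamp and names, and the use of io.StringIO in place of a real file are all assumptions made for illustration; the actual data and function() from the question are not shown in the thread.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data, shaped the way the question describes it
list1 = [1364310855.004, 1364310855.005, 1364310855.008]
list2 = [
    {"name": "vehicle_speed", "value": 0, "timestamp": 1364310855.004},
    {"name": "engine_speed", "value": 0, "timestamp": 1364310855.005},
    {"name": "torque_at_transmission", "value": -3, "timestamp": 1364310855.008},
    {"name": "vehicle_speed", "value": 0, "timestamp": 1364310855.008},
]

# Group the readings by timestamp once: a single pass over list2
by_timestamp = defaultdict(list)
for item in list2:
    by_timestamp[item["timestamp"]].append((item["name"], item["value"]))

# One CSV column per signal name, in sorted order
names = sorted({item["name"] for item in list2})

out = io.StringIO()  # stand-in for an open output file
writer = csv.DictWriter(out, fieldnames=["timestamp"] + names,
                        restval="", lineterminator="\n")
writer.writeheader()
for ts in list1:
    row = {"timestamp": ts}
    # .get() avoids adding empty lists to the defaultdict for missing timestamps
    row.update(by_timestamp.get(ts, ()))
    writer.writerow(row)

print(out.getvalue())
```

The total work is one pass over list2 plus one pass over list1, rather than len(list1) * len(list2) iterations. Columns with no reading for a timestamp are left empty here (restval=""); that choice is an assumption, not something the question specifies.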

Answer 1: (score: 0)

It looks like you only want to use the elements of list2 at indices i*2 and i*2+1, i.e. elements 0,1 and 2,3, ...

You only need one loop.

for i in range(len(list1)):
    j = list2[i*2]      # element at the even index
    k = list2[i*2 + 1]  # element at the following odd index
    # Process function using j and k

This way you only process up to the end of the first list.
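The same pairing can be written without index arithmetic by zipping list1 with two strided slices of list2. This is only a sketch under this answer's assumption that list2 holds two consecutive entries per item of list1; the string data below is a placeholder.

```python
# Placeholder data: two list2 entries per list1 entry, plus some extras
list1 = ["t0", "t1", "t2"]
list2 = ["a0", "b0", "a1", "b1", "a2", "b2", "extra", "extra"]

triples = []
# list2[0::2] holds elements 0, 2, 4, ... and list2[1::2] holds 1, 3, 5, ...
# zip stops at the shortest input, so entries past the end of list1 are ignored
for t, j, k in zip(list1, list2[0::2], list2[1::2]):
    triples.append((t, j, k))

print(triples)
```

Because zip stops at the shortest input, this also avoids an IndexError when list2 has fewer than 2 * len(list1) elements.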

Answer 2: (score: 0)

I think the pandas module does exactly what you are after...

import ujson            # 'ujson' (Ultra fast JSON) is faster than the standard 'json'
import pandas as pd

filter_list = [1364310855.004000, 1364310855.005000, 1364310855.008000]

def file2list(fn):
    with open(fn) as f:
        return [ujson.loads(line) for line in f]

# Use pd.read_json('data.json') instead of pd.DataFrame(file2list('data.json'))
# if you have a proper JSON file
#
# df = pd.read_json('data.json')
df = pd.DataFrame(file2list('data.json'))

# filter DataFrame with 'filter_list'
df = df[df['timestamp'].isin(filter_list)]

# convert UNIX timestamps to readable format
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')

# pivot data frame
# fill NaN's with zeroes
df = df.pivot(index='timestamp', columns='name', values='value').fillna(0)

# save data frame to CSV file
df.to_csv('output.csv', sep=',')

#pd.set_option('display.expand_frame_repr', False)
#print(df)

output.csv

timestamp,accelerator_pedal_position,engine_speed,torque_at_transmission,vehicle_speed
2013-03-26 15:14:15.004,4.0,0.0,0.0,2.0
2013-03-26 15:14:15.005,0.0,5.0,0.0,0.0
2013-03-26 15:14:15.008,0.0,0.0,-3.0,1.0

PS I don't know where you get the [Latitude, Longitude] columns from, but adding them to the resulting DataFrame is very easy: just add the following lines before calling df.to_csv()

df.insert(0, 'latitude', 0)
df.insert(1, 'longitude', 0)

which results in:

timestamp,latitude,longitude,accelerator_pedal_position,engine_speed,torque_at_transmission,vehicle_speed
2013-03-26 15:14:15.004,0,0,4.0,0.0,0.0,2.0
2013-03-26 15:14:15.005,0,0,0.0,5.0,0.0,0.0
2013-03-26 15:14:15.008,0,0,0.0,0.0,-3.0,1.0