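I have a list of blog posts with two columns: the date each post was created and the unique ID of the person who created it.
I want to return the date of the most recent blog post for each unique ID. That sounds simple, but all of the date values are stored as strings, and none of the strings has a leading 0 when the month is less than 10.
I've been struggling with strftime and strptime but can't get them to return the right result.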
import csv
Posters = {}
with open('datetouched.csv','rU') as f:
    reader = csv.reader(f)
    for i in reader:
        UID = i[0]
        Date = i[1]
        if UID in Posters:
            Posters[UID].append(Date)
        else:
            Posters[UID] = [Date]
for i in Posters:
    print i, max(Posters[i]), Posters[i]
This returns the following output:
0014000000s5NoEAAU 7/1/10 ['1/6/14', '7/1/10', '1/18/14', '1/24/14', '7/1/10', '2/5/14']
0014000000s5XtPAAU 2/3/14 ['1/4/14', '1/10/14', '1/16/14', '1/22/14', '1/28/14', '2/3/14']
0014000000vHZp7AAG 2/1/14 ['1/2/14', '1/8/14', '1/14/14', '1/20/14', '1/26/14', '2/1/14']
0014000000wnPK6AAM 2/2/14 ['1/3/14', '1/9/14', '1/15/14', '1/21/14', '1/27/14', '2/2/14']
0014000000d5YWeAAM 2/4/14 ['1/5/14', '1/11/14', '1/17/14', '1/23/14', '1/29/14', '2/4/14']
0014000000s5VGWAA2 7/1/10 ['7/1/10', '1/7/14', '1/13/14', '1/19/14', '7/1/10', '1/31/14']
It returns 7/1/10 because that number (7) is greater than 1. I need the max value of the list returned as the exact same string value.
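The behaviour can be reproduced in isolation. This is a minimal sketch in Python 3 syntax (the code above is Python 2), using the first list from the output; the variable name `latest_as_text` is only illustrative. It shows that `max()` on plain strings compares them character by character, so the leading month digit decides the result.

```python
dates = ['1/6/14', '7/1/10', '1/18/14', '1/24/14', '7/1/10', '2/5/14']

# Strings are compared character by character, so the first character
# ('7' > '2' > '1') settles the comparison before the year is considered.
latest_as_text = max(dates)
print(latest_as_text)
```

The result is the entry with the largest leading character, not the most recent date.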
Answer 0 (score: 2)
Either parse the strings into dates with datetime.datetime.strptime() when loading the CSV, or pass max() a key function that parses them.
When loading:
from datetime import datetime
Date = datetime.strptime(i[1], '%m/%d/%y')
or when using max():
print i, max(Posters[i], key=lambda d: datetime.strptime(d, '%m/%d/%y')), Posters[i]
Demo of the latter:
>>> from datetime import datetime
>>> dates = ['1/6/14', '7/1/10', '1/18/14', '1/24/14', '7/1/10', '2/5/14']
>>> max(dates, key=lambda d: datetime.strptime(d, '%m/%d/%y'))
'2/5/14'
Your code could be simplified a little:
import csv
from datetime import datetime

posters = {}
with open('datetouched.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        uid, date = row[:2]
        # setdefault creates the list the first time a uid is seen
        posters.setdefault(uid, []).append(datetime.strptime(date, '%m/%d/%y'))
for uid, dates in posters.iteritems():
    print uid, max(dates), dates
The dict.setdefault() method sets a default value (here an empty list) whenever the key is not yet present.
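As a rough illustration of how setdefault() and a key function fit together, here is a small self-contained sketch in Python 3 syntax. The `rows` list is made-up sample data standing in for the CSV, and `parse` and `latest` are illustrative names. It keeps the original strings in the dictionary and only parses them for the comparison, so the maximum comes back as the exact input string.

```python
from datetime import datetime

# Hypothetical (uid, date) rows standing in for the CSV contents.
rows = [('A', '1/6/14'), ('A', '7/1/10'), ('B', '2/3/14'), ('A', '2/5/14')]

posters = {}
for uid, date in rows:
    # The first time a uid is seen, setdefault stores and returns a new list;
    # afterwards it returns the list that is already stored.
    posters.setdefault(uid, []).append(date)

def parse(d):
    # Month/day/two-digit-year; strptime accepts values without leading zeros.
    return datetime.strptime(d, '%m/%d/%y')

# Compare as dates, but keep the original string as the result.
latest = {uid: max(dates, key=parse) for uid, dates in posters.items()}
print(latest)
```

Storing the strings and parsing only inside the key function avoids having to format the date back afterwards, at the cost of parsing each string again on every max() call.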
Answer 1 (score: 2)
I would convert the dates to datetime objects when loading and store the results in a defaultdict, e.g.:
import csv
from collections import defaultdict
from datetime import datetime

posters = defaultdict(list)
with open('datetouched.csv','rU') as fin:
    csvin = csv.reader(fin)
    items = ((row[0], datetime.strptime(row[1], '%m/%d/%y')) for row in csvin)
    for uid, dt in items:
        posters[uid].append(dt)
for uid, dates in posters.iteritems():
    # print uid, list of datetime objects, and max date in same format as input
    print uid, dates, '{0.month}/{0.day}/{0:%y}'.format(max(dates))
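Turning a datetime back into the input style (no leading zeros, two-digit year) can be checked on its own. The following is a minimal sketch in Python 3 syntax; the helper name `to_input_format` is hypothetical and not part of the answer above.

```python
from datetime import datetime

def to_input_format(dt):
    # {0.month} and {0.day} are plain ints, so no leading zero is added;
    # {0:%y} passes '%y' to datetime.__format__, which works like strftime.
    return '{0.month}/{0.day}/{0:%y}'.format(dt)

parsed = datetime.strptime('2/5/14', '%m/%d/%y')
print(to_input_format(parsed))
```

A plain strftime('%m/%d/%y') call would zero-pad the month and day, which is why the month and day are taken from the attributes instead.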