Using Python

Posted: 2018-02-18 00:26:34

Tags: python pandas csv datetime input

I have a large CSV data file that looks like this:

id, Start Time, End Time
0, 2017-01-01 00:00:21, 2017-01-01 00:11:41
1, 2017-01-01 00:00:45, 2017-01-01 00:11:46
2, 2017-02-01 00:00:57, 2017-02-01 00:22:08
3, 2017-03-01 00:01:10, 2017-03-01 00:11:42
4, 2017-01-01 00:01:51, 2017-01-01 00:12:57

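This might be easier to do with pandas, but I don't have much experience with it. I've looked at modules such as arrow and datetime, and I want to filter the data based on the user's input and then return the filtered data to the user. For example: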

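import csv
import arrow

def get_month(city_data):
    month = input('\nWhich month? January, February, March, April, May, or June?\n')
    date = '1 ' + month + ', 2017'
    # 'YYYY-MM' prefix of the chosen month, e.g. '2017-01' for January
    prefix = arrow.get(date, 'D MMMM, YYYY').format('YYYY-MM')
    with open(city_data, 'r') as fin, open('userdata.csv', 'w', newline='') as fout:
        writer = csv.writer(fout)
        # fields are separated by ', ', so split on commas and skip the extra space
        for row in csv.reader(fin, skipinitialspace=True):
            # row[1] is Start Time; write every match instead of returning after the first
            if row[1].startswith(prefix):
                writer.writerow(row)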

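Am I on the right track? I suspect I'm going in the wrong direction with the date = '1 ' + month + ', 2017' part. Is there a way to filter the data using just an input such as January?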
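For comparison, here is a minimal standard-library sketch (no pandas, no arrow) of filtering by a bare month name. It assumes the file layout shown above: comma-separated, a space after each comma, and Start Time in the format YYYY-MM-DD HH:MM:SS. Instead of building a full date string, it turns the month name into its number with datetime.strptime(month, '%B').month and compares that with the month of each row's Start Time. The function name filter_by_month, and passing the month as a parameter instead of calling input() inside, are illustrative choices, not part of the original question.

```python
import csv
import io
from datetime import datetime


def filter_by_month(lines, month):
    """Return the rows whose 'Start Time' falls in the given month.

    `lines` is any iterable of CSV text lines (for example an open file);
    `month` is a full English month name such as 'January'. Matching is
    case-insensitive; '%B' uses the current locale's month names.
    """
    month_number = datetime.strptime(month, '%B').month
    # skipinitialspace=True removes the space after each comma,
    # so the header names are 'Start Time' rather than ' Start Time'
    reader = csv.DictReader(lines, skipinitialspace=True)
    result = []
    for row in reader:
        start = datetime.strptime(row['Start Time'], '%Y-%m-%d %H:%M:%S')
        if start.month == month_number:
            result.append(row)
    return result


# Hypothetical in-memory copy of the sample data from the question
sample = """id, Start Time, End Time
0, 2017-01-01 00:00:21, 2017-01-01 00:11:41
1, 2017-01-01 00:00:45, 2017-01-01 00:11:46
2, 2017-02-01 00:00:57, 2017-02-01 00:22:08
3, 2017-03-01 00:01:10, 2017-03-01 00:11:42
4, 2017-01-01 00:01:51, 2017-01-01 00:12:57
"""

january = filter_by_month(io.StringIO(sample), 'January')
print([row['id'] for row in january])  # ['0', '1', '4']
```

With a real file, pass the open file object instead of io.StringIO(sample); the matching rows can then be written out with csv.DictWriter using the fieldnames 'id', 'Start Time', 'End Time'.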

1 answer:

Answer 0 (score: 3)

For structured data like this, pandas offers an efficient solution:

from datetime import datetime
import pandas as pd

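# read data from file; skipinitialspace=True drops the space after each comma,
# so the columns are named 'Start Time' and 'End Time' (not ' Start Time')
df = pd.read_csv('data.csv', skipinitialspace=True)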

# this creates a dataframe as below:
#    id           Start Time             End Time
# 0   0  2017-01-01 00:00:21  2017-01-01 00:11:41
# 1   1  2017-01-01 00:00:45  2017-01-01 00:11:46
# 2   2  2017-02-01 00:00:57  2017-02-01 00:22:08
# 3   3  2017-03-01 00:01:10  2017-03-01 00:11:42
# 4   4  2017-01-01 00:01:51  2017-01-01 00:12:57

# cast string columns to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])

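def get_month(df):
    month = input('\nWhich month? January, February, March, April, May, or June?\n')
    # '%B' parses a full month name, so 'January' -> 1, 'February' -> 2, ...
    month_number = datetime.strptime(month, '%B').month
    # keep only the rows whose Start Time falls in that month
    return df[df['Start Time'].dt.month == month_number]

print(get_month(df))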
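A variant of the same pandas filter, sketched under the assumption that matching the month name directly is acceptable: Series.dt.month_name() returns English month names ('January', 'February', ...) by default, so the user's input can be compared after normalising its capitalisation, with no strptime step. Here parse_dates in read_csv replaces the two separate pd.to_datetime calls, and the function name filter_by_month_name and its month parameter (instead of input()) are illustrative choices that make the function easier to test.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample data from the question
sample = """id, Start Time, End Time
0, 2017-01-01 00:00:21, 2017-01-01 00:11:41
1, 2017-01-01 00:00:45, 2017-01-01 00:11:46
2, 2017-02-01 00:00:57, 2017-02-01 00:22:08
3, 2017-03-01 00:01:10, 2017-03-01 00:11:42
4, 2017-01-01 00:01:51, 2017-01-01 00:12:57
"""

# skipinitialspace strips the space after each comma (including in the header);
# parse_dates converts both time columns to datetime64 while reading
df = pd.read_csv(io.StringIO(sample), skipinitialspace=True,
                 parse_dates=['Start Time', 'End Time'])


def filter_by_month_name(df, month):
    """Keep rows whose 'Start Time' falls in `month` ('january', 'January', ...)."""
    return df[df['Start Time'].dt.month_name() == month.strip().capitalize()]


print(filter_by_month_name(df, 'january')['id'].tolist())  # [0, 1, 4]
```

In the interactive version, month would come from input() as in get_month above; an unrecognised name simply yields an empty DataFrame here, whereas strptime raises ValueError.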