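Creating a new column from data in PeriodIndex columns

Time: 2017-03-05 05:07:05

Tags: python-3.x pandas

I have a DataFrame with a MultiIndex of state and town names. The columns are quarterly housing data created via a PeriodIndex. I want to add a new column holding the ratio of two of those columns: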

housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(anal_start_col_name)].div(housing_data_compact_df[pd.Period(anal_end_col_name)])

Whenever I try to create this new column, I get the error:

DateParseError: Unknown datetime string format, unable to parse: P Ratio

Full code:

import pandas as pd

# Create housing cost dataframe
zillow_file = 'City_Zhvi_AllHomes.csv'    # from https://www.zillow.com/research/data/
zillow_df = pd.read_csv(zillow_file, header=0, usecols=[1, 2, *range(51, 251)], index_col=[1, 0]).dropna(how='all')

# rename state abbreviations in level 0 multiindex to full state name
zillow_df.reset_index(inplace=True)
zillow_df['State'] = zillow_df['State'].map(states)
zillow_df.set_index(['State','RegionName'], inplace=True)

housing_data_df = zillow_df.groupby(pd.PeriodIndex(zillow_df.columns, freq="Q"), axis=1).mean()


rec_start = '2000Q1'
rec_bottom = '2001Q1'

#Reduce Size to desired data
start_col = housing_data_df.columns.get_loc(pd.Period(rec_start))-1
end_col = housing_data_df.columns.get_loc(pd.Period(rec_bottom))

housing_data_compact_df = housing_data_df[[start_col,end_col]]

#This is where the issue occurs
housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(anal_start_col_name)].div(housing_data_compact_df[pd.Period(anal_end_col_name)])
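
The quarterly aggregation step in the code above can be tried in isolation. The following is a minimal sketch on made-up data, assuming monthly column labels such as '2000-01'; the town name and values are hypothetical. It transposes before grouping because grouping along axis=1 is deprecated in recent pandas releases.

```python
import pandas as pd

# Hypothetical monthly data: one row (a town), six monthly columns.
monthly = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]],
    index=['Town'],
    columns=['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'],
)

# Map every monthly label to the quarter it belongs to.
quarters = pd.PeriodIndex(monthly.columns, freq='Q')

# Group the transposed frame by quarter, average, and transpose back,
# so the result has one column per quarter.
quarterly = monthly.T.groupby(quarters).mean().T

print(quarterly)
```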

Here is some additional information that may or may not be helpful:

[In]: print(housing_data_compact_df.head())

                                  2000Q1         2001Q1
State        RegionName                                
New York     New York      503933.333333  465833.333333
California   Los Angeles   502000.000000  413633.333333
Illinois     Chicago       237966.666667  219633.333333
Pennsylvania Philadelphia  118233.333333  116166.666667
Arizona      Phoenix       205300.000000  168200.000000



[In]: print("Indices: " + str(housing_data_compact_df.index.names))
Indices: ['State', 'RegionName']


[In]: print(housing_data_compact_df.columns)
PeriodIndex(['2000Q1', '2001Q1'], dtype='period[Q-DEC]', freq='Q-DEC')
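
For reference, columns in a PeriodIndex can be looked up either by label with pd.Period or by position with get_loc and iloc. This is a small sketch on a hypothetical one-row frame, not the Zillow data:

```python
import pandas as pd

# Hypothetical frame with quarterly PeriodIndex columns.
cols = pd.PeriodIndex(['1999Q4', '2000Q1', '2001Q1'], freq='Q')
df = pd.DataFrame([[1.0, 2.0, 3.0]], columns=cols)

# Integer position of a given quarter among the columns.
pos = df.columns.get_loc(pd.Period('2000Q1'))

# Select columns by position; copy() makes the subset an independent frame.
subset = df.iloc[:, [pos - 1, pos + 1]].copy()

# Select a single column by its Period label.
by_label = df[pd.Period('2001Q1')]

print(pos)
print(subset)
```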

Things I have tried:

The problem seems to be related to the PeriodIndex columns. I tried converting the data with a direct cast:

[In]: housing_data_compact_df['P Ratio'] = float(housing_data_compact_df[pd.Period(start_col_name)]).div(float(housing_data_compact_df[pd.Period(end_col_name)]))

TypeError: cannot convert the series to <class 'float'>
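
This TypeError is separate from the PeriodIndex problem: float() expects a single value, not a whole Series, while .astype(float) converts element-wise. A small sketch with made-up numbers:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# float() on a multi-element Series raises TypeError.
raised = False
try:
    float(s)
except TypeError:
    raised = True

# astype(float) converts every element and returns a new Series.
converted = s.astype(float)

print(raised)
print(converted.dtype)
```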

I also tried .astype(), but I get the same error as without the conversion:

[In]: housing_data_compact_df['P Ratio'] = housing_data_compact_df[pd.Period(start_col_name)].astype(float).div(housing_data_compact_df[pd.Period(end_col_name)].astype(float))

DateParseError: Unknown datetime string format, unable to parse: P Ratio

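I also reset the index to try to break up the PeriodIndex, and then reindexed after the operation was done. However, that did not work on every system I tested it on, and it seems like a roundabout way to fix what I think should have a simple solution.

Question:

How can I create a new column holding the ratio of the data in these PeriodIndex columns?

Thanks in advance for your help.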

1 answer:

Answer 0: (score: 1)

You need strftime to convert the PeriodIndex columns to strings, and you need to add copy when selecting the subset. The error appears to come from pandas trying to parse the new label 'P Ratio' as a period when it is added to the PeriodIndex columns.

housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')

Full code (only small changes to make it work for me, otherwise it is your code, which is nice ;)):

zillow_file = 'http://files.zillowstatic.com/research/public/City/City_Zhvi_AllHomes.csv'
zillow_df = pd.read_csv(zillow_file,header=0,
                        usecols=[1,2] + list(range(51,251)), #changed for python 3
                        index_col=[1,0]).dropna(how='all')

# rename state abbreviations in level 0 multiindex to full state name
zillow_df.reset_index(inplace=True)
#no states in question, so commented
#zillow_df['State'] = zillow_df['State'].map(states)
zillow_df.set_index(['State','RegionName'], inplace=True)

housing_data_df=zillow_df.groupby(pd.PeriodIndex(zillow_df.columns, freq="Q"), axis=1).mean()

rec_start = '2000Q1'
rec_bottom = '2001Q1'

#Reduce Size to desired data
start_col = housing_data_df.columns.get_loc(pd.Period(rec_start))-1
end_col = housing_data_df.columns.get_loc(pd.Period(rec_bottom))

#add copy
#http://stackoverflow.com/q/42438987/2901002
housing_data_compact_df = housing_data_df[[start_col,end_col]].copy()
print (housing_data_compact_df.head())
                      2016Q3         2001Q1
State RegionName                           
NY    New York      599850.0            NaN
CA    Los Angeles   588750.0  233000.000000
IL    Chicago       207600.0  156933.333333
PA    Philadelphia  129950.0   55333.333333
AZ    Phoenix       197800.0  119600.000000

anal_start_col_name = '2016Q3'
anal_end_col_name = '2001Q1'

# parentheses allow the chained call to continue on the next line
a = (housing_data_compact_df[pd.Period(anal_start_col_name)]
     .div(housing_data_compact_df[pd.Period(anal_end_col_name)]))
housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')
housing_data_compact_df['P Ratio'] = a
print (housing_data_compact_df.head())
                      2016Q3         2001Q1   P Ratio
State RegionName                                     
NY    New York      599850.0            NaN       NaN
CA    Los Angeles   588750.0  233000.000000  2.526824
IL    Chicago       207600.0  156933.333333  1.322855
PA    Philadelphia  129950.0   55333.333333  2.348494
AZ    Phoenix       197800.0  119600.000000  1.653846

Another possible solution is to convert the columns to strings first and then select them by their string names:

housing_data_compact_df = housing_data_df[[start_col,end_col]].copy()
print (housing_data_compact_df.head())

anal_start_col_name = '2016Q3'
anal_end_col_name = '2001Q1'

housing_data_compact_df.columns = housing_data_compact_df.columns.strftime('%YQ%q')
# parentheses allow the chained call to continue on the next line
housing_data_compact_df['P Ratio'] = (housing_data_compact_df[anal_start_col_name]
                                      .div(housing_data_compact_df[anal_end_col_name]))

print (housing_data_compact_df.head())
                      2016Q3         2001Q1   P Ratio
State RegionName                                     
NY    New York      599850.0            NaN       NaN
CA    Los Angeles   588750.0  233000.000000  2.526824
IL    Chicago       207600.0  156933.333333  1.322855
PA    Philadelphia  129950.0   55333.333333  2.348494
AZ    Phoenix       197800.0  119600.000000  1.653846
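
The string-conversion approach from the answer can be checked without downloading the Zillow file. This is a minimal sketch on a hypothetical two-row frame; the index values and prices are made up for illustration.

```python
import pandas as pd

# Hypothetical data shaped like the question's frame:
# MultiIndex rows and quarterly PeriodIndex columns.
idx = pd.MultiIndex.from_tuples(
    [('New York', 'New York'), ('California', 'Los Angeles')],
    names=['State', 'RegionName'],
)
cols = pd.PeriodIndex(['2000Q1', '2001Q1'], freq='Q')
df = pd.DataFrame(
    [[500000.0, 250000.0], [300000.0, 400000.0]],
    index=idx,
    columns=cols,
)

# Compute the ratio while the columns are still periods.
ratio = df[pd.Period('2000Q1')].div(df[pd.Period('2001Q1')])

# Convert the column labels to plain strings such as '2000Q1',
# so a non-period label can be added afterwards.
df.columns = df.columns.strftime('%YQ%q')
df['P Ratio'] = ratio

print(df)
```

After the conversion the quarterly columns are selected with plain strings such as df['2000Q1'] rather than pd.Period objects.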