Question

I am trying to change the output of the following code:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame, Panel, bdate_range, DatetimeIndex, date_range
from pandas.tseries.holiday import get_calendar
from datetime import datetime, timedelta
import pytz as pytz
from pytz import timezone

start =  datetime(2013, 1, 1)

hr1 = np.loadtxt("Spot_2013_Hour1.txt")

index = date_range(start, end = '2013-12-31', freq='B')
Allhrs = Series(index)
Allhrs = DataFrame({'hr1': hr1})
df = Allhrs
indexed_df = df.set_index(index)
print indexed_df

Error:

  File "<ipython-input-61-c7890d8ccb07>", line 17, in <module>
    indexed_df = df.set_index(index)

  File "/Applications/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 2390, in set_index
    frame.index = index

  File "/Applications/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 1849, in __setattr__
    object.__setattr__(self, name, value)

  File "properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:38491)

  File "/Applications/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 400, in _set_axis
    self._data.set_axis(axis, labels)

  File "/Applications/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 1965, in set_axis
    'new values have %d elements' % (old_len, new_len))

ValueError: Length mismatch: Expected axis has 365 elements, new values have 261 elements
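The two lengths in this error can be reproduced with nothing but pd.date_range. set_index needs exactly one label per existing row. The loaded data has one row per calendar day, but a freq='B' range for 2013 only contains Monday-to-Friday dates. A minimal check:

```python
import pandas as pd

# One entry per calendar day in 2013.
daily = pd.date_range(start="2013-01-01", end="2013-12-31", freq="D")
# Business days only (Monday to Friday, no holiday calendar).
business = pd.date_range(start="2013-01-01", end="2013-12-31", freq="B")

print(len(daily))     # 365
print(len(business))  # 261
```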

Problem:

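I have a time series that I load from a txt file. The time series consists of 365 elements, i.e. every day of 2013. I need this txt file because I need to analyze every day.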

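I also need to analyze specific dates in 2013. So I want to change how the data is read, i.e. I only want to see business days. It would also be great to be able to view/print specific days.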

Any help is appreciated.

Answer 1

First, create a DataFrame (or Series) with a daily index covering the whole year:

index = pd.date_range(start='2013-1-1', end='2013-12-31', freq='D')
df = pd.DataFrame(hr1, index=index)

Next, use df.asfreq('B') to reduce df to business days:

import numpy as np
import pandas as pd

# hr1 = np.loadtxt("Spot_2013_Hour1.txt")
hr1 = np.random.random(365)  # random stand-in for the 365 daily values
index = pd.date_range(start='2013-1-1', end='2013-12-31', freq='D')
df = pd.DataFrame(hr1, index=index)
indexed_df = df.asfreq('B')  # keep only business days
print(indexed_df)
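A swapped-in alternative, not part of the original answer, is to filter the daily frame with a boolean mask on the weekday. This sketch uses random stand-in data in place of the text file and should keep the same rows as asfreq('B'):

```python
import numpy as np
import pandas as pd

# Random stand-in for the 365 values loaded from Spot_2013_Hour1.txt.
hr1 = np.random.random(365)
index = pd.date_range(start="2013-01-01", end="2013-12-31", freq="D")
df = pd.DataFrame(hr1, index=index)

# dayofweek is 0 for Monday ... 6 for Sunday; keep Monday to Friday.
weekdays_df = df[df.index.dayofweek < 5]

print(len(weekdays_df))                    # 261
print(weekdays_df.equals(df.asfreq("B")))  # True
```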

To set the frequency to business days while excluding specific dates, you can use offsets.CustomBusinessDay:

import pandas.tseries.offsets as offsets

holidays = ['2013-10-03', '2013-12-25']
business_days = offsets.CustomBusinessDay(holidays=holidays)
custom_df = df.asfreq(business_days)
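The same custom business-day calendar can also be built directly as an index, for example to reindex other data later. A minimal sketch using pd.bdate_range, where freq='C' selects custom business days and holidays lists the dates to skip (holidays is only honoured with a custom frequency such as 'C'):

```python
import pandas as pd

holidays = ["2013-10-03", "2013-12-25"]

# Custom business-day range: weekdays minus the listed holidays.
custom_index = pd.bdate_range(start="2013-01-01", end="2013-12-31",
                              freq="C", holidays=holidays)

print(len(custom_index))                           # 259
print(pd.Timestamp("2013-10-03") in custom_index)  # False
```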

So custom_df has two fewer days than indexed_df:
In [12]: len(custom_df)
Out[12]: 259

In [13]: len(indexed_df)
Out[13]: 261

and "holidays" such as '2013-10-03' are missing:

In [18]: '2013-10-03' in indexed_df.index
Out[18]: True

In [19]: '2013-10-03' in custom_df.index
Out[19]: False
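The question also asks how to print specific days. With a DatetimeIndex, rows can be picked by date label through .loc. A short sketch, again with random stand-in data instead of Spot_2013_Hour1.txt; the dates chosen are only illustrative:

```python
import numpy as np
import pandas as pd

hr1 = np.random.random(365)  # stand-in for the loaded data
df = pd.DataFrame(hr1, index=pd.date_range("2013-01-01", "2013-12-31", freq="D"))
indexed_df = df.asfreq("B")

# A single day, selected by its date label.
print(indexed_df.loc["2013-10-03"])

# Several specific days: pass a list of timestamps.
some_days = indexed_df.loc[pd.to_datetime(["2013-01-02", "2013-07-01"])]
print(some_days)

# A whole month via partial-string indexing on the DatetimeIndex.
march = indexed_df.loc["2013-03"]
print(len(march))  # 21
```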

It is also useful to know that the reindex method can be used to sub-select rows. For example, you can remove specific dates from indexed_df.index:

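# Index.difference computes the set difference. Newer pandas versions treat
# `-` between two DatetimeIndex objects as element-wise subtraction instead.
idx = indexed_df.index.difference(pd.DatetimeIndex(holidays))
custom_df2 = df.reindex(idx)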

The resulting custom_df2 is equal to custom_df:

In [35]: custom_df2.equals(custom_df)
Out[35]: True
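Another way to remove known dates, offered here as an alternative rather than the answer's own method, is DataFrame.drop with the holiday labels. Under the same stand-in data assumption, it should produce the same rows as custom_df:

```python
import numpy as np
import pandas as pd
import pandas.tseries.offsets as offsets

hr1 = np.random.random(365)  # stand-in for the loaded data
df = pd.DataFrame(hr1, index=pd.date_range("2013-01-01", "2013-12-31", freq="D"))
indexed_df = df.asfreq("B")
holidays = ["2013-10-03", "2013-12-25"]

custom_df = df.asfreq(offsets.CustomBusinessDay(holidays=holidays))

# Drop the holiday rows by label instead of building a new index first.
custom_df3 = indexed_df.drop(pd.DatetimeIndex(holidays))

print(len(custom_df3))               # 259
print(custom_df3.equals(custom_df))  # True
```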

But note that the indexes are slightly different:

In [36]: custom_df.index
Out[36]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01, ..., 2013-12-31]
Length: 259, Freq: C, Timezone: None

In [37]: custom_df2.index
Out[37]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01, ..., 2013-12-31]
Length: 259, Freq: None, Timezone: None

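custom_df has Freq: C, whereas custom_df2 has Freq: None. The freq attribute is used by some methods, such as snap and to_period. However, these methods also let you pass the desired frequency as an argument, so in practice I have not found this difference to be a big problem.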
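The point about passing the frequency as an argument can be checked with to_period. In this sketch the holiday-free index is built with a boolean mask (a choice made here for illustration), so its freq is None, and to_period('D') still converts it once the frequency is given explicitly:

```python
import numpy as np
import pandas as pd

hr1 = np.random.random(365)  # stand-in for the loaded data
df = pd.DataFrame(hr1, index=pd.date_range("2013-01-01", "2013-12-31", freq="D"))
indexed_df = df.asfreq("B")
holidays = pd.DatetimeIndex(["2013-10-03", "2013-12-25"])

# Boolean-mask selection with gaps leaves the index without a freq.
idx = indexed_df.index[~indexed_df.index.isin(holidays)]
custom_df2 = df.reindex(idx)
print(custom_df2.index.freq)  # None

# Supply the frequency directly instead of relying on index.freq.
periods = custom_df2.to_period("D")
print(periods.index.freqstr)  # D
```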

Changing the number of index entries in a DataFrame?
