Getting the unique values of one index level from a MultiIndex

Time: 2018-04-26 09:27:00

Tags: python pandas

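df is a dataset like the one shown below. name and date form the MultiIndex, and value holds the corresponding data (the stock's price in dollars).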

                               value
name             date               
122630 KS Equity 2013-01-02  13495.0
                 2013-01-03  13400.0
                 2013-01-04  13195.0
                 2013-01-07  13220.0
                 2013-01-08  12960.0
                 2013-01-09  12850.0
                 2013-01-10  13080.0
                 2013-01-11  12910.0
                 2013-01-14  13050.0
                 2013-01-15  12765.0
                 2013-01-16  12570.0
                 2013-01-17  12595.0
                 2013-01-18  12690.0
                 2013-01-21  12735.0
                 2013-01-22  12880.0
                 2013-01-23  12630.0
                 2013-01-24  12415.0
...                              ...
278240 KS Equity 2018-03-19  22855.0
                 2018-03-20  23690.0
                 2018-03-21  23275.0
                 2018-03-22  22285.0
                 2018-03-23  19460.0
                 2018-03-26  21110.0
                 2018-03-27  21080.0
                 2018-03-28  20535.0
                 2018-03-29  21605.0
                 2018-03-30  21785.0
291630 KS Equity 2018-03-16   9980.0
                 2018-03-19   9680.0
                 2018-03-20  10025.0
                 2018-03-21   9865.0
                 2018-03-22   9420.0
                 2018-03-23   8225.0
                 2018-03-26   8930.0
                 2018-03-27   8915.0
                 2018-03-28   8680.0
                 2018-03-29   9165.0
                 2018-03-30   9230.0
292340 KS Equity 2018-03-20  10050.0
                 2018-03-21  10050.0
                 2018-03-22  10090.0
                 2018-03-23   9750.0
                 2018-03-26   9815.0
                 2018-03-27   9925.0
                 2018-03-28   9745.0
                 2018-03-29   9890.0
                 2018-03-30   9970.0

My question is: how do I build a collection of unique datetimes that contains every date above, with no duplicates? I want something like:

all_dates = [datetime(2013,1,2,0), datetime(2013,1,3,0), datetime(2013,1,4,0),...datetime(2018,3,29,0), datetime(2018,3,30,0)]

I tried df.index(1).value, but got the error 'MultiIndex' object is not callable.
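The error occurs because df.index is an attribute holding the MultiIndex object, not a method, so calling it with (1) raises a TypeError. The minimal sketch below reproduces this on a tiny hand-built frame (a few sample rows from the table above, not the full dataset) and shows how to reach the second level by position or by name instead.

```python
import pandas as pd

# Small stand-in for the question's df: a (name, date) MultiIndex with one "value" column
idx = pd.MultiIndex.from_tuples(
    [
        ("291630 KS Equity", pd.Timestamp("2018-03-29")),
        ("291630 KS Equity", pd.Timestamp("2018-03-30")),
        ("292340 KS Equity", pd.Timestamp("2018-03-29")),
        ("292340 KS Equity", pd.Timestamp("2018-03-30")),
    ],
    names=["name", "date"],
)
df = pd.DataFrame({"value": [9165.0, 9230.0, 9890.0, 9970.0]}, index=idx)

# df.index is a MultiIndex object; calling it like a function fails
error_message = None
try:
    df.index(1)
except TypeError as exc:
    error_message = str(exc)

# Select the second level (position 1) instead of calling the index
dates_by_position = df.index.get_level_values(1)   # one entry per row, duplicates included
dates_by_name = df.index.get_level_values("date")  # same values, selected by level name
print(error_message)                          # 'MultiIndex' object is not callable
print(dates_by_position.equals(dates_by_name))  # True
print(len(dates_by_position))                 # 4
```

get_level_values returns one value per row, so the duplicates are still there at this point; removing them is what the answers below address.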

3 answers:

Answer 0 (score: 1)

Use get_level_values first, then unique, and finally convert the result to a list:

L = df.index.get_level_values('date').unique().tolist()
print (L[:10])
[Timestamp('2013-01-02 00:00:00'), Timestamp('2013-01-03 00:00:00'), 
 Timestamp('2013-01-04 00:00:00'), Timestamp('2013-01-07 00:00:00'), 
 Timestamp('2013-01-08 00:00:00'), Timestamp('2013-01-09 00:00:00'), 
 Timestamp('2013-01-10 00:00:00'), Timestamp('2013-01-11 00:00:00'), 
 Timestamp('2013-01-14 00:00:00'), Timestamp('2013-01-15 00:00:00')]
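The snippet above assumes the question's df already exists. The following self-contained sketch uses a tiny hand-built frame (hypothetical, assembled from a few rows of the table above) to show that each date appears only once in the result, even when several tickers share it.

```python
import pandas as pd

# Hypothetical miniature version of df: two tickers sharing some trading dates
idx = pd.MultiIndex.from_tuples(
    [
        ("291630 KS Equity", pd.Timestamp("2018-03-28")),
        ("291630 KS Equity", pd.Timestamp("2018-03-29")),
        ("292340 KS Equity", pd.Timestamp("2018-03-28")),
        ("292340 KS Equity", pd.Timestamp("2018-03-29")),
        ("292340 KS Equity", pd.Timestamp("2018-03-30")),
    ],
    names=["name", "date"],
)
df = pd.DataFrame({"value": [8680.0, 9165.0, 9745.0, 9890.0, 9970.0]}, index=idx)

# Pull out the "date" level, drop duplicates, and convert to a list of Timestamps
L = df.index.get_level_values("date").unique().tolist()
print(L)
# [Timestamp('2018-03-28 00:00:00'), Timestamp('2018-03-29 00:00:00'), Timestamp('2018-03-30 00:00:00')]
```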

For plain Python datetime objects, add to_pydatetime:

L = df.index.get_level_values('date').unique().to_pydatetime().tolist()
print (L[:10])
[datetime.datetime(2013, 1, 2, 0, 0), datetime.datetime(2013, 1, 3, 0, 0), 
 datetime.datetime(2013, 1, 4, 0, 0), datetime.datetime(2013, 1, 7, 0, 0), 
 datetime.datetime(2013, 1, 8, 0, 0), datetime.datetime(2013, 1, 9, 0, 0), 
 datetime.datetime(2013, 1, 10, 0, 0), datetime.datetime(2013, 1, 11, 0, 0), 
 datetime.datetime(2013, 1, 14, 0, 0), datetime.datetime(2013, 1, 15, 0, 0)]
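One caveat: unique() keeps values in the order they first appear, so when a later ticker has a date that earlier tickers lack, that date ends up out of chronological order. If a sorted all_dates is required, one option is sort_values() on the unique DatetimeIndex before converting. The frame below is a hypothetical subset constructed to trigger the effect. It is not the question's full data.

```python
import pandas as pd

# Hypothetical frame: the first ticker lacks 2018-03-16, which the second ticker has
idx = pd.MultiIndex.from_tuples(
    [
        ("278240 KS Equity", pd.Timestamp("2018-03-19")),
        ("278240 KS Equity", pd.Timestamp("2018-03-20")),
        ("291630 KS Equity", pd.Timestamp("2018-03-16")),
        ("291630 KS Equity", pd.Timestamp("2018-03-19")),
    ],
    names=["name", "date"],
)
df = pd.DataFrame({"value": [22855.0, 23690.0, 9980.0, 9680.0]}, index=idx)

dates = df.index.get_level_values("date")

# unique() keeps first-appearance order, so 2018-03-16 lands last here
in_appearance_order = dates.unique().to_pydatetime().tolist()

# sort_values() on the unique DatetimeIndex gives chronological order
all_dates = dates.unique().sort_values().to_pydatetime().tolist()
print(all_dates)
# [datetime.datetime(2018, 3, 16, 0, 0), datetime.datetime(2018, 3, 19, 0, 0), datetime.datetime(2018, 3, 20, 0, 0)]
```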

Answer 1 (score: 0)

CONSTRAINT Registration_unique_section_id UNIQUE (section_id)

Answer 2 (score: 0)

You can use the following technique to extract only the unique dates from the list of dates:

li = sorted(all_dates)

print([li[d] for d in range(0, len(li)) if not (li[d] == li[d-1] or (d
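The expression in this answer is cut off, so its exact condition is unknown. The following is a minimal sketch of the same general idea: sort first, then keep an element only if it is the first one or differs from its predecessor. It is not the answerer's exact code. The helper name unique_sorted and the sample list are hypothetical.

```python
from datetime import datetime


def unique_sorted(dates):
    """Return the distinct values of `dates` in ascending order.

    Sorting puts equal dates next to each other, so an element is kept
    only when it is the first one or differs from its predecessor.
    """
    li = sorted(dates)
    return [li[d] for d in range(len(li)) if d == 0 or li[d] != li[d - 1]]


# Hypothetical input with repeated dates, modeled on the question's all_dates
all_dates = [
    datetime(2013, 1, 3),
    datetime(2013, 1, 2),
    datetime(2013, 1, 3),
    datetime(2013, 1, 2),
    datetime(2018, 3, 30),
]
print(unique_sorted(all_dates))
# [datetime.datetime(2013, 1, 2, 0, 0), datetime.datetime(2013, 1, 3, 0, 0), datetime.datetime(2018, 3, 30, 0, 0)]
```

Because datetime objects are hashable, sorted(set(all_dates)) produces the same result in a single step.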