Question

I have a DataFrame that looks like this:

import numpy as np
import pandas as pd    

data = {'datetime' : ['2009-07-24 02:00:00', '2009-07-24 03:00:00','2009-07-24 04:00:00'],
     'value1' : ['a', np.nan ,'c'],
     'value2' : ['d','e','f']}
df = pd.DataFrame(data)
df = df.set_index(pd.DatetimeIndex(df['datetime']))
missing = df.loc[:, df.columns != ('datetime')]

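The data above is just a sample. In my larger dataset I have many missing values. I want to select all rows where the value in the 'value1' column is missing.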

missing_index = df[df['value1'].isnull()].index

This code gives me the index of all the missing values, but I want the actual rows, which in this case is the second row.

So I tried

df[missing_index]

but I get an error:

KeyError: "DatetimeIndex(['2009-07-24 03:00:00'], dtype='datetime64[ns]', name='datetime', freq=None) not in index"

Please help me with this. Thanks.

Answer 1

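The error comes from the fact that df[<something>] is used to get columns. When you call df[missing_index], pandas tries to find the labels in missing_index among the columns (the columns are also an index), and the timestamps are not column names.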

As @panktijk pointed out in a comment, the simplest way to do what you want is

df[df['value1'].isnull()]

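However, if for some reason (perhaps you want to manipulate them) you want to get the index first and then use it to pull out the sub-DataFrame, you can do the following: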

df.loc[missing_index]
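
To make the difference concrete, here is a minimal sketch that rebuilds the DataFrame from the question and compares the three lookups. The names column_lookup_failed, rows_by_mask and rows_by_label are illustrative and do not come from the original post, and the exact KeyError text may differ between pandas versions.

```python
import numpy as np
import pandas as pd

data = {'datetime': ['2009-07-24 02:00:00', '2009-07-24 03:00:00', '2009-07-24 04:00:00'],
        'value1': ['a', np.nan, 'c'],
        'value2': ['d', 'e', 'f']}
df = pd.DataFrame(data)
df = df.set_index(pd.DatetimeIndex(df['datetime']))

missing_index = df[df['value1'].isnull()].index

# df[...] with a list-like of labels looks them up among the columns,
# so passing row timestamps raises KeyError.
try:
    df[missing_index]
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# A boolean mask passed to df[...] selects rows directly.
rows_by_mask = df[df['value1'].isnull()]

# .loc looks the labels up in the row index instead of the columns.
rows_by_label = df.loc[missing_index]

print(rows_by_label)
```

Both rows_by_mask and rows_by_label should contain the single row for 2009-07-24 03:00:00; the second form is mainly useful when the index has to be modified before the lookup.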

Answer 2

I am using the index to capture the row numbers (starting from 0):

import pandas as pd
import numpy as np

data = {'datetime' : ['2009-07-24 02:00:00', '2009-07-24 03:00:00','2009-07-24 04:00:00', '2009-07-24 05:00:00'],
     'value1' : ['a', np.nan ,'c', np.nan],
     'value2' : ['d','e','f', 'g']}
df = pd.DataFrame(data)
df = df.set_index(pd.DatetimeIndex(df['datetime']))

listofnan = df.index[df['value1'].isnull()].tolist()

for i in listofnan:
    print(df.index.get_loc(i))
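
If only the integer positions are needed, the loop can probably be replaced by a single NumPy call. This is a sketch of an alternative rather than part of the original answer; positions is an illustrative name, and the four-row example is rebuilt here so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

data = {'datetime': ['2009-07-24 02:00:00', '2009-07-24 03:00:00',
                     '2009-07-24 04:00:00', '2009-07-24 05:00:00'],
        'value1': ['a', np.nan, 'c', np.nan],
        'value2': ['d', 'e', 'f', 'g']}
df = pd.DataFrame(data)
df = df.set_index(pd.DatetimeIndex(df['datetime']))

# flatnonzero returns the 0-based positions where the boolean mask is True.
positions = np.flatnonzero(df['value1'].isnull().to_numpy())
print(positions.tolist())  # [1, 3]

# The same positions can be used to fetch the rows themselves.
print(df.iloc[positions])
```

This avoids one get_loc call per missing row, which may matter on a large DataFrame.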

Selecting data from a DataFrame using a DatetimeIndex
