Data processing with Python Pandas

Asked: 2016-11-03 10:35:17

Tags: python pandas indexing dataframe

I have a CSV file in the following format:

API Name                Test Result    Risk Rating    Vulnerability Category

https://api-test.com      FAIL           LOW          Information Gathering
https://api-test1.com     PASS           MEDIUM       Authentication Test
https://api-test2.com     SKIP           HIGH         Web Service
https://api-test1.com     FAIL           CRITICAL     Configuration Management

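I am using the pandas library for data processing. As you can see from the table, some API URLs are repeated. What I want is to collect the details for each API into its own DataFrame. For example, the variable for the API "https://api-test1.com" should contain the following data: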

 API Name                Test Result    Risk Rating    Vulnerability Category

https://api-test1.com     PASS           MEDIUM         Authentication Test
https://api-test1.com     FAIL           CRITICAL       Configuration Management

Similarly, the variable for API2 should contain all the data related to API2. Thanks!

1 Answer:

Answer 0: (score: 1)

You can use the .duplicated(keep=False) method:

In [138]: df['API Name'].duplicated(keep=False)
Out[138]:
0    False
1     True
2    False
3     True
Name: API Name, dtype: bool

In [139]: df[df['API Name'].duplicated(keep=False)]
Out[139]:
                API Name Test Result Risk Rating    Vulnerability Category
1  https://api-test1.com        PASS      MEDIUM       Authentication Test
3  https://api-test1.com        FAIL    CRITICAL  Configuration Management
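
For anyone who wants to reproduce the session above, here is a minimal, self-contained sketch. It rebuilds the sample table from the question as an in-memory CSV; the comma delimiter and the names csv_text, mask and repeated are assumptions made for illustration, since the question does not show the raw file.

```python
import io

import pandas as pd

# Sample rows from the question. The comma delimiter is an assumption;
# the question only shows the table as whitespace-aligned text.
csv_text = """API Name,Test Result,Risk Rating,Vulnerability Category
https://api-test.com,FAIL,LOW,Information Gathering
https://api-test1.com,PASS,MEDIUM,Authentication Test
https://api-test2.com,SKIP,HIGH,Web Service
https://api-test1.com,FAIL,CRITICAL,Configuration Management
"""
df = pd.read_csv(io.StringIO(csv_text))

# keep=False flags every occurrence of a repeated value,
# not only the second and later ones.
mask = df['API Name'].duplicated(keep=False)

# Boolean indexing keeps only the rows whose API Name occurs more than once.
repeated = df[mask]
print(repeated)
```

Note that this selects the rows of every API that appears more than once. With the sample data only https://api-test1.com repeats, but if several APIs were repeated their rows would all end up in the same result.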

UPDATE: you don't need those variables (api1, api2, etc.), because you can always easily access the data in the DataFrame:

In [152]: apis = df['API Name'].unique()

In [153]: apis
Out[153]: array(['https://api-test.com', 'https://api-test1.com', 'https://api-test2.com'], dtype=object)

In [154]: for api in apis:
     ...:     print(df.loc[df['API Name'] == api])
     ...:
               API Name Test Result Risk Rating Vulnerability Category
0  https://api-test.com        FAIL         LOW  Information Gathering
                API Name Test Result Risk Rating    Vulnerability Category
1  https://api-test1.com        PASS      MEDIUM       Authentication Test
3  https://api-test1.com        FAIL    CRITICAL  Configuration Management
                API Name Test Result Risk Rating Vulnerability Category
2  https://api-test2.com        SKIP        HIGH            Web Service
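
If you do want one object per API, one possible alternative to the loop above (not part of the original answer) is to let groupby do the splitting and keep the pieces in a dictionary keyed by API name. This is only a sketch: the name by_api is hypothetical and the DataFrame is rebuilt inline from the question's sample rows.

```python
import pandas as pd

# Sample rows from the question, rebuilt inline so the sketch runs on its own.
df = pd.DataFrame({
    'API Name': ['https://api-test.com', 'https://api-test1.com',
                 'https://api-test2.com', 'https://api-test1.com'],
    'Test Result': ['FAIL', 'PASS', 'SKIP', 'FAIL'],
    'Risk Rating': ['LOW', 'MEDIUM', 'HIGH', 'CRITICAL'],
    'Vulnerability Category': ['Information Gathering', 'Authentication Test',
                               'Web Service', 'Configuration Management'],
})

# Iterating over a groupby yields (key, sub-DataFrame) pairs,
# so a dict comprehension maps each API name to its own rows.
by_api = {name: group for name, group in df.groupby('API Name')}

# Look up the rows for a single API by its URL.
print(by_api['https://api-test1.com'])
```

A dictionary scales to any number of APIs without having to invent a variable name for each one, and each value keeps the original row index.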