What happens if you "or" two Series whose indexes don't match?

Time: 2019-06-15 04:32:13

Tags: python pandas dataframe series

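So I created two Series of 100 elements each and "or"ed them together. But first I sorted the first Series, which means the indexes would not line up. I expected an error, or at least a wrong result. Instead I got a third Series with 126 elements! That really surprised me.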

Note the 4 "Richardson" rows in the billy_or_peter output listing: there are 4 values, two True and two False.

I thought some kind of "Cartesian product" might be happening, which would give 200 rows. But I got 126 rows, which is strange.

Any ideas?

# df (loaded elsewhere, not shown) holds 100 rows indexed by last_name.
# loc also accepts a boolean condition to filter rows of data:
# only the rows where the condition is True are returned.
only_billys = df.loc[df["first_name"] == "Billy", :]
print(only_billys)

only_peters = df.loc[df["first_name"] == "Peter", :]
print(only_peters)
print()

only_richardsons = df.loc["Richardson", :]
print(only_richardsons)
print()

# sort_index() reorders isBilly, so its index order no longer matches isPeter's
isBilly = (df["first_name"] == "Billy").sort_index()
print(isBilly.describe())
print()

isPeter = (df["first_name"] == "Peter")
print(isPeter.describe())
print()

billy_or_peter = isPeter | isBilly
print(billy_or_peter.describe())
print(billy_or_peter)

Output


(only_billys)
           id first_name      Phone Number       Time zone
last_name                                                 
Clark      20      Billy  62-(213)345-2549   Asia/Makassar
Andrews    23      Billy  86-(859)746-5367  Asia/Chongqing
Price      59      Billy  86-(878)547-7739   Asia/Shanghai

(only_peters)
            id first_name     Phone Number      Time zone
last_name                                                
Richardson   1      Peter  7-(789)867-9023  Europe/Moscow

(only_richardsons)
            id first_name      Phone Number      Time zone
last_name                                                 
Richardson   1      Peter   7-(789)867-9023  Europe/Moscow
Richardson  25     Donald  62-(259)282-5871   Asia/Jakarta

(isBilly.describe() - sorted index)
count       100
unique        2
top       False
freq         97
Name: first_name, dtype: object

(isPeter.describe())
count       100
unique        2
top       False
freq         99
Name: first_name, dtype: object

(billy_or_peter.describe() - 126 rows???)
count       126
unique        2
top       False
freq        121
Name: first_name, dtype: object

(billy_or_peter listing - notice 4 Richardsons where before there were only 2)
last_name
Adams         False
Allen         False
Andrews        True
Austin        False
Baker         False
Banks         False
Bell          False
Berry         False
Bishop        False
Black         False
Brooks        False
Brown         False
Bryant        False
Bryant        False
Bryant        False
Bryant        False
Burke         False
Butler        False
Butler        False
Butler        False
Butler        False
Carroll       False
Chapman       False
Chavez        False
Clark          True
Collins       False
Cook          False
Day           False
Day           False
Day           False
              ...  
Price          True
Reid          False
Reyes         False
Rice          False
*Richardson     True
*Richardson     True
*Richardson    False
*Richardson    False
Riley         False
Roberts       False
Robertson     False
Robinson      False
Rogers        False
Scott         False
Shaw          False
Shaw          False
Shaw          False
Shaw          False
Simmons       False
Snyder        False
Sullivan      False
Torres        False
Tucker        False
Vasquez       False
Wagner        False
Walker        False
Washington    False
Watkins       False
Wells         False
Williamson    False
Name: first_name, Length: 126, dtype: bool
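The original df isn't shown, so here is a minimal, self-contained reproduction on a small hypothetical frame. The names and rows are made up; they are not the original data. It mirrors the code above: one mask is sorted and the other isn't, and last_name contains "Richardson" twice. The OR returns more rows than the frame has. The Richardson rows come out as two True and two False, as in the listing above.

```python
import pandas as pd

# Hypothetical stand-in for the question's df: indexed by last_name,
# with "Richardson" appearing twice (Peter and Donald), as in the original data.
df = pd.DataFrame(
    {"first_name": ["Peter", "Billy", "Donald", "Anna", "Billy"]},
    index=pd.Index(
        ["Richardson", "Clark", "Richardson", "Adams", "Price"], name="last_name"
    ),
)

# Sorting reorders is_billy's index, so it no longer matches is_peter's order.
is_billy = (df["first_name"] == "Billy").sort_index()
is_peter = df["first_name"] == "Peter"

# pandas aligns the two indexes before applying |; with duplicate labels,
# the two Richardsons on each side are paired up 2 x 2 = 4 ways.
billy_or_peter = is_peter | is_billy

print(len(df), len(billy_or_peter))  # 5 7
print(billy_or_peter.loc["Richardson"].tolist().count(True))  # 2
```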

1 Answer

Answer 0 (score: 1)

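The mismatch isn't the problem: pandas aligns the two Series on their index before applying |. Your problem comes from duplicate index labels. When labels are duplicated, the alignment behaves like an outer join on the index, so every copy of a label on one side is paired with every copy of the same label on the other side. The 2 Richardsons in one Series and the 2 Richardsons in the other therefore produce 2 × 2 = 4 rows in your output.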
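The row count follows directly from that outer-join rule. The sketch below uses a hypothetical helper, aligned_length (not a pandas function), which predicts the length of the aligned result from the index label counts:

- A label present on both sides contributes count_left × count_right rows.
- A label present on only one side contributes its own count.

In the question, both masks come from the same df, so every label is shared. The result therefore has as many rows as the sum of the squared label counts, which equals 100 only when last_name is unique. Duplicated surnames push it to 126.

```python
import pandas as pd


def aligned_length(left: pd.Series, right: pd.Series) -> int:
    """Hypothetical helper: predict the length of left | right.

    Each label present on both sides contributes count_left * count_right
    rows; a label present on only one side contributes its own count.
    """
    lc = left.index.value_counts()
    rc = right.index.value_counts()
    # lc * rc aligns on the (unique) labels; labels missing from one side
    # become NaN, which sum() skips.
    paired = (lc * rc).sum()
    only_left = lc[~lc.index.isin(rc.index)].sum()
    only_right = rc[~rc.index.isin(lc.index)].sum()
    return int(paired + only_left + only_right)


a = pd.Series([True, False, False, True],
              index=["Richardson", "Richardson", "Clark", "Price"])
b = a.sort_index()  # same labels, different order
c = pd.Series([True, True], index=["Clark", "Adams"])

print(aligned_length(a, b), len(a | b))  # 6 6
print(aligned_length(a, c), len(a | c))  # 5 5
```

If every label is unique on both sides, the prediction collapses to the size of the plain union of labels. Reordering alone never changes the length; only duplicates multiply rows.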

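To show this more clearly, look at what happens when you add string DataFrames whose indexes are both duplicated and misaligned. For index 1 we get 6 (2 × 3) rows from the Cartesian product: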

import pandas as pd

df1 = pd.DataFrame(list('abcd'), index=[1,1,2,3])
df2 = pd.DataFrame(list('1243'), index=[1,1,3,1])
# label 1 occurs twice in df1 and three times in df2, so it yields 2 x 3 = 6 rows
print(df1 + df2)

     0
1   a1
1   a2
1   a3
1   b1
1   b2
1   b3
2  NaN
3   d4
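One possible way to avoid the blow-up is sketched below on another small hypothetical frame. This sketch is not taken from the original answer. It shows three options:

1. Build both masks from the same unsorted frame and sort only the combined result. Identical indexes mean no alignment happens.
2. Use Series.isin to get a single mask.
3. Move last_name out of the index with reset_index(), so every row has a unique label. Alignment then pairs rows one-to-one even if one mask is reordered.

Which one fits depends on the rest of your code.

```python
import pandas as pd

# Hypothetical frame with a duplicated "Richardson" label, as in the question.
df = pd.DataFrame(
    {"first_name": ["Peter", "Billy", "Donald", "Billy"]},
    index=pd.Index(["Richardson", "Clark", "Richardson", "Price"], name="last_name"),
)

# Option 1: combine masks that share the exact same index, then sort.
# Identical indexes mean pandas does no alignment, so no rows are multiplied.
mask = ((df["first_name"] == "Billy") | (df["first_name"] == "Peter")).sort_index()

# Option 2: a single mask via Series.isin.
mask_isin = df["first_name"].isin(["Billy", "Peter"]).sort_index()

# Option 3: give rows a unique index first; alignment then matches row to row
# even though is_billy is reordered.
flat = df.reset_index()
is_billy = (flat["first_name"] == "Billy").sort_index(ascending=False)
is_peter = flat["first_name"] == "Peter"
mask_flat = is_peter | is_billy

print(len(mask), len(mask_isin), len(mask_flat))  # 4 4 4
```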