Subtract in PySpark DataFrames

Time: 2019-08-13 18:31:46

Tags: python pyspark pyspark-sql

I would like to know how subtract works:

target_df = df.subtract(df1)

Does it return to target_df the rows of df that are not in df1, or the rows of df1 that are not in df?

1 answer:

Answer 0 (score: 0)

Let's assume the following example:

df1 has values as (1,2,3,4,5,6)
df2 has values as (3,4,5,6,7,8)

Then target_df = df1.subtract(df2) contains "the values in df1 minus the values common to both DataFrames", i.e.

(1,2,3,4,5,6) - (3,4,5,6) = (1,2)

You can check this by running the following code:

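from pyspark.sql import Row, SparkSession

# Reuse the active session, or create one when running outside a shell or notebook.
spark = SparkSession.builder.getOrCreate()
# Each DataFrame has a single column (auto-named _1) holding the sample values.
df1 = spark.sparkContext.parallelize([Row(1), Row(2), Row(3), Row(4), Row(5), Row(6)]).toDF()
df2 = spark.sparkContext.parallelize([Row(3), Row(4), Row(5), Row(6), Row(7), Row(8)]).toDF()
# Keep only the rows of df1 that do not appear in df2.
target_df = df1.subtract(df2)
target_df.show()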
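
To make the direction of the operation easier to see without a cluster, here is a minimal plain-Python sketch of the semantics. It is only a model, not Spark's implementation: the helper names subtract_distinct and subtract_all are hypothetical and rows are represented as plain integers. It assumes that subtract behaves like SQL's EXCEPT DISTINCT (duplicates in the result are dropped) and that exceptAll, available from Spark 2.4, behaves like EXCEPT ALL.

```python
from collections import Counter


def subtract_distinct(left, right):
    """Model of df_left.subtract(df_right): distinct rows of left that are absent from right."""
    right_rows = set(right)
    seen = set()
    result = []
    for row in left:
        if row not in right_rows and row not in seen:
            seen.add(row)
            result.append(row)
    return result


def subtract_all(left, right):
    """Model of df_left.exceptAll(df_right): each right row cancels at most one matching left row."""
    remaining = Counter(right)
    result = []
    for row in left:
        if remaining[row] > 0:
            remaining[row] -= 1  # this left row is cancelled by one right row
        else:
            result.append(row)
    return result


df1_rows = [1, 2, 3, 4, 5, 6]
df2_rows = [3, 4, 5, 6, 7, 8]

print(subtract_distinct(df1_rows, df2_rows))    # [1, 2]
print(subtract_distinct(df2_rows, df1_rows))    # [7, 8]
print(subtract_distinct([1, 1, 2, 3], [3]))     # [1, 2]
print(subtract_all([1, 1, 2, 3, 3], [3]))       # [1, 1, 2, 3]
```

The second call shows that swapping the operands changes the result, so the DataFrame on which subtract is called is the one whose rows are kept. Unlike this sketch, Spark does not guarantee any particular row order in the output.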