Join one DF with two others based on a condition - Scala Spark

Time: 2021-07-21 11:50:15

Tags: scala apache-spark

I am trying to join one DF with two others based on a condition. I have the following DFs: DF1, which I want to join with df_cond1 and df_cond2.

If DF1's InfoNum column is NBC I want to join with df_cond1, and if it is BBC I want to join with df_cond2, but I don't know how to do this.

DF1
+-------------+----------+-------------+
|  Date       | InfoNum  |   Sport     |
+-------------+----------+-------------+
|  31/11/2020 |   NBC    |  football   | 
|  11/01/2020 |   BBC    |  tennis     |
+-------------+----------+-------------+

df_cond1
+-------------+---------+-------------+
| Periodicity |   Info  | Description |
+-------------+---------+-------------+
|  Monthly    |  NBC    | DATAquality |
+-------------+---------+-------------+

df_cond2
+-------------+---------+-------------+
| Periodicity |   Info  | Description |
+-------------+---------+-------------+
|  Daily      |  BBC    | InfoIndeed  |
+-------------+---------+-------------+

final_df
+-------------+----------+-------------+-------------+
|  Date       | InfoNum  |   Sport     | Description |
+-------------+----------+-------------+-------------+
|  31/11/2020 |   NBC    |  football   | DATAquality | 
|  11/01/2020 |   BBC    |  tennis     | InfoIndeed  |
+-------------+----------+-------------+-------------+

I have been searching but have not found a good solution. Can you help me?

1 answer:

Answer 0: (score: 0)

Here is how you can do the join:

import spark.implicits._  // required for toDF and the $ column syntax

// Recreate DF1 from the question
val df = Seq(
  ("31/11/2020", "NBC", "football"),
  ("1/01/2020", "BBC", "tennis")
).toDF("Date", "InfoNum", "Sport")

val df_cond1 = Seq(
  ("Monthly", "NBC", "DATAquality")
).toDF("Periodicity", "Info", "Description")

val df_cond2 = Seq(
  ("Daily", "BBC", "InfoIndeed")
).toDF("Periodicity", "Info", "Description")

// Union the two lookup tables and join once on InfoNum == Info,
// then drop the lookup columns that are no longer needed
df.join(df_cond1.union(df_cond2), $"InfoNum" === $"Info")
  .drop("Info", "Periodicity")
  .show(false)
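
This works because df_cond1 and df_cond2 share the same schema and have disjoint Info values, so their union acts as a single lookup table and each DF1 row matches exactly one Description.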

Output:

+----------+-------+--------+-----------+
|Date      |InfoNum|Sport   |Description|
+----------+-------+--------+-----------+
|31/11/2020|NBC    |football|DATAquality|
|1/01/2020 |BBC    |tennis  |InfoIndeed |
+----------+-------+--------+-----------+
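
If you would rather express the condition literally (join df_cond1 when InfoNum is NBC, df_cond2 when it is BBC), a minimal sketch under the same assumptions as above (a SparkSession with spark.implicits._ in scope and the three DataFrames already defined) is to run two filtered joins and union the results; the nbcJoined and bbcJoined names are only illustrative:

// Rows whose InfoNum is NBC are joined against df_cond1 ...
val nbcJoined = df.filter($"InfoNum" === "NBC")
  .join(df_cond1, $"InfoNum" === $"Info")

// ... and rows whose InfoNum is BBC against df_cond2
val bbcJoined = df.filter($"InfoNum" === "BBC")
  .join(df_cond2, $"InfoNum" === $"Info")

// Both halves have the same columns, so they can be unioned back together
val final_df = nbcJoined.union(bbcJoined)
  .drop("Info", "Periodicity")

final_df.show(false)

Both approaches produce the final_df shown in the question; the filtered-join version is more verbose but makes the NBC/BBC branching explicit.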